Paginating a search request
You can paginate a search request in Federate by using the search_after parameter. The process starts by opening a Point-In-Time (PIT) on the parent indices at the root of the request. This operation returns an identifier, which is then passed to the search request to be paginated. This effectively caches the results of the request and keeps the hits consistent across pages. Subsequent pages are retrieved by re-executing the request with an updated search_after parameter. Finally, you must close the PIT to free memory.
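The whole workflow (open, page, close) can be sketched in Python as follows. This is a minimal sketch, not a client shipped with Federate: it assumes a hypothetical send(method, path, body) callable that performs the HTTP request and returns the decoded JSON response, that hits are listed under hits.hits with a sort array on each hit (as in Elasticsearch), and that a refreshed PIT identifier, if any, is returned under pit_id.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: send(method, path, body) -> decoded JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def paginate(send: Send, index: str, query: Dict[str, Any],
             size: int = 100, keep_alive: str = "5m") -> Iterator[Dict[str, Any]]:
    """Yield every hit of `query` against `index`, one page at a time."""
    # 1. Open a PIT on the parent indices at the root.
    pit_id = send("POST", f"/siren/{index}/_pit?keep_alive={keep_alive}", None)["id"]
    try:
        search_after: Optional[List[Any]] = None
        while True:
            body: Dict[str, Any] = {
                "sort": {"_shard_doc": "asc"},  # tiebreaker sort
                "pit": {"id": pit_id},
                "query": query,
                "size": size,
            }
            if search_after is not None:
                body["search_after"] = search_after
            # 2. No index in the path: the indices come from the PIT.
            response = send("GET", "/siren/_search", body)
            # Assumption: a refreshed PIT id, if any, is returned as "pit_id".
            pit_id = response.get("pit_id", pit_id)
            hits = response["hits"]["hits"]
            if not hits:
                break  # empty page: no more results
            yield from hits
            # 3. The next page starts after the sort value of the last hit.
            search_after = hits[-1]["sort"]
    finally:
        # 4. Always close the PIT to free memory.
        send("DELETE", "/siren/_pit", {"id": pit_id})
```

Because paginate is a generator, the finally clause also closes the PIT when the caller stops iterating early and the generator is closed.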
Open and close Point-In-Times
Federate provides two REST endpoints for opening and closing Point-In-Times on indices. For the duration of the PIT, the state of its indices remains unchanged, even if the indices are updated in the meantime. This allows search requests to be executed against a consistent view of the indices over a long period of time, unaffected by any concurrent changes.
The default duration of a PIT is 5 minutes. You can adjust it with the optional keep_alive parameter.
POST /siren/<index>/_pit (1)
POST /siren/<index>/_pit?keep_alive=10m (2)
DELETE /siren/_pit (3)
{
  "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
}
(1) Open a PIT with the default duration.
(2) Open a PIT with a custom duration.
(3) Close an existing PIT.
The POST method opens a PIT on the given index pattern and returns an identifier. The DELETE method closes the PIT referenced by the identifier in its body.
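The three calls above can be expressed as plain (method, path, body) tuples, which makes the shape of each call explicit. This Python sketch is illustrative only: the helper names open_pit_request and close_pit_request and the tuple format are hypothetical, not part of Federate.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical request representation: (HTTP method, path, JSON body or None).
Request = Tuple[str, str, Optional[Dict[str, Any]]]


def open_pit_request(index: str, keep_alive: Optional[str] = None) -> Request:
    """Build the request that opens a PIT on an index pattern.

    Without keep_alive, Federate applies the default duration (5 minutes).
    """
    path = f"/siren/{index}/_pit"
    if keep_alive is not None:
        path += f"?keep_alive={keep_alive}"
    return ("POST", path, None)


def close_pit_request(pit_id: str) -> Request:
    """Build the request that closes the PIT referenced by pit_id."""
    return ("DELETE", "/siren/_pit", {"id": pit_id})


print(open_pit_request("machine-*", keep_alive="10m"))
```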
Pagination
Paginating a search request requires the PIT identifier returned by the _pit REST API and a tiebreaker sort parameter. The sort parameter is needed to paginate hits: it adds a sort value to each hit in the search response, which is then passed to the search_after parameter. To get the next page, take the sort value of the last returned hit and set it as the search_after parameter.
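The rule for moving to the next page can be captured in a small Python function. The name next_search_after is hypothetical, and the sketch assumes the response lists its hits under hits.hits with a sort array on each hit, as Elasticsearch search responses do.

```python
from typing import Any, Dict, List, Optional


def next_search_after(response: Dict[str, Any]) -> Optional[List[Any]]:
    """Return the search_after value for the next page, or None when done.

    The value is the sort array of the last hit on the current page.
    """
    hits = response["hits"]["hits"]
    if not hits:
        return None  # empty page: there are no more results
    return hits[-1]["sort"]
```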
The tiebreaker |
Below is a search request that contains a join, where the parent set is machine-* and the child set is beat-*.
GET /siren/machine-*/_search
{
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "query": {
          "match_all": {}
        }
      }
    }
  }
}
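To make the structure of the join clause explicit, here is a hypothetical Python helper that builds it as a dictionary. It is a sketch, not a Federate client API, and it assumes, as the example suggests, that the first entry of on names the field of the parent set (id in machine-*) and the second the field of the child set (machine in beat-*).

```python
import json
from typing import Any, Dict, List, Optional


def join_query(child_indices: List[str], parent_field: str, child_field: str,
               child_query: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a join clause from the parent set to a child set.

    Assumption: "on" pairs a parent-set field with a child-set field.
    """
    if child_query is None:
        child_query = {"match_all": {}}  # keep every document of the child set
    return {
        "join": {
            "indices": child_indices,
            "on": [parent_field, child_field],
            "request": {"query": child_query},
        }
    }


# Body for GET /siren/machine-*/_search, matching the example above.
body = {"query": join_query(["beat-*"], "id", "machine")}
print(json.dumps(body, indent=2))
```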
First, open a PIT over the parent set at the root, i.e., over the index pattern machine-*:
POST /siren/machine-*/_pit
{
  "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
}
To retrieve the first page, issue the search request with the PIT identifier and a sort parameter. The index pattern that is normally part of the _search endpoint path is omitted: the indices resolved when the PIT was created are retrieved from the PIT identifier.
GET /siren/_search
{
  "sort": { (1)
    "_shard_doc": "asc"
  },
  "pit": { (2)
    "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
  },
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "query": {
          "match_all": {}
        }
      }
    }
  },
  "size": 2 (3)
}
(1) A sort explicitly set on the tiebreaker field _shard_doc.
(2) The PIT identifier returned by the call to the _pit REST API.
(3) The number of hits returned in a page.
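Building the first-page body from an ordinary search body amounts to adding the three annotated parameters. A minimal Python sketch; first_page_body is a hypothetical helper, not part of Federate.

```python
import copy
from typing import Any, Dict


def first_page_body(search_body: Dict[str, Any], pit_id: str,
                    size: int) -> Dict[str, Any]:
    """Turn a regular search body into the first page of a PIT search.

    The result is meant for GET /siren/_search without an index pattern,
    because the indices are resolved from the PIT identifier.
    """
    body = copy.deepcopy(search_body)  # leave the caller's body untouched
    body["sort"] = {"_shard_doc": "asc"}  # tiebreaker required for paging
    body["pit"] = {"id": pit_id}
    body["size"] = size  # number of hits per page
    return body
```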
To retrieve the following pages, add the search_after parameter, set to the sort value of the last returned hit.
Keep in mind that the PIT identifier can change between responses: always use the id from the latest response in the next request.
GET /siren/_search
{
  "sort": {
    "_shard_doc": "asc"
  },
  "pit": { (1)
    "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
  },
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "query": {
          "match_all": {}
        }
      }
    }
  },
  "size": 2,
  "search_after": [ (2)
    1
  ]
}
(1) The PIT id is set to the identifier returned by the latest response.
(2) The search_after parameter is set to the sort value of the last returned hit.
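The transition from one page to the next can be sketched in Python as follows. next_page_body is a hypothetical helper; it assumes, as with Elasticsearch PIT searches, that a refreshed identifier is returned under a top-level pit_id field, and it falls back to the current identifier when that field is absent.

```python
import copy
from typing import Any, Dict


def next_page_body(previous_body: Dict[str, Any],
                   response: Dict[str, Any]) -> Dict[str, Any]:
    """Build the body of the page that follows `response`.

    Raises ValueError when `response` holds no hits: pagination is over.
    """
    hits = response["hits"]["hits"]
    if not hits:
        raise ValueError("no hits left: stop paginating and close the PIT")
    body = copy.deepcopy(previous_body)
    # Assumption: a refreshed PIT id is returned under "pit_id"; otherwise
    # keep the id that is already in use.
    body["pit"] = {"id": response.get("pit_id", previous_body["pit"]["id"])}
    # Continue after the sort value of the last returned hit.
    body["search_after"] = hits[-1]["sort"]
    return body
```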
Examples with projection
Paginating a search request with a project clause in a nested join.
GET /siren/_search
{
  "sort": {
    "_shard_doc": "asc"
  },
  "pit": {
    "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
  },
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "query": {
          "join": {
            "indices": [
              "beat-*"
            ],
            "on": [
              "id",
              "id"
            ],
            "request": {
              "project": [
                {
                  "field": {
                    "name": "date"
                  }
                }
              ],
              "query": {
                "match_all": {}
              }
            }
          }
        }
      }
    }
  },
  "size": 2
}
Paginating a search request with a project clause in the root join.
GET /siren/_search
{
  "sort": {
    "_shard_doc": "asc"
  },
  "pit": {
    "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
  },
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "project": [
          {
            "field": {
              "name": "date"
            }
          }
        ],
        "query": {
          "match_all": {}
        }
      }
    }
  },
  "size": 2
}
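In both projection examples the project clause sits in the request object of a join; only the depth of that join differs. A hypothetical Python helper that adds such a clause to a join, sketched under the assumption that projected fields are listed as {"field": {"name": ...}} objects exactly as in the examples:

```python
import copy
from typing import Any, Dict, List


def with_projection(join_clause: Dict[str, Any],
                    fields: List[str]) -> Dict[str, Any]:
    """Return a copy of a join clause whose request projects `fields`."""
    clause = copy.deepcopy(join_clause)  # do not mutate the caller's clause
    clause["join"]["request"]["project"] = [
        {"field": {"name": name}} for name in fields
    ]
    return clause


# Root-join example: project the "date" field of the beat-* child set.
root_join = {"join": {"indices": ["beat-*"], "on": ["id", "machine"],
                      "request": {"query": {"match_all": {}}}}}
projected = with_projection(root_join, ["date"])
```

For the nested example, the same helper would be applied to the inner join before it is placed in the request of the outer join.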
Limitations
The pagination of a search request in Federate has the following limitations:
- The PIT identifier returned by the /siren/_pit REST API can only be used by a single search request.
- A join performed against virtual indices located on a remote Elasticsearch cluster is not supported if that remote cluster does not have the Federate plugin installed.
- Pagination is supported for joins with virtual indices on the child (right) side of the join only, within the limitations of virtual indices (for example, no field projection).
- Search slicing with PIT is not supported.
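Two of these limitations can be checked on the client before a request is sent. This Python sketch is hypothetical: PitGuard is not part of Federate, it treats the Elasticsearch slice request parameter as the marker of a sliced search, and it interprets "a single search request" as one query paginated page by page, so a PIT identifier first seen with one query is rejected when reused with another.

```python
import json
from typing import Any, Dict


class PitGuard:
    """Hypothetical client-side checks for two PIT pagination limitations."""

    def __init__(self) -> None:
        # Maps each PIT identifier to the query it was first used with.
        self._query_by_pit: Dict[str, str] = {}

    def check(self, body: Dict[str, Any]) -> None:
        """Raise ValueError if `body` hits a known PIT limitation."""
        if "pit" not in body:
            return  # not a PIT search: nothing to check
        if "slice" in body:
            raise ValueError("search slicing with a PIT is not supported")
        pit_id = body["pit"]["id"]
        # Canonical JSON so that key order does not matter when comparing.
        query = json.dumps(body.get("query"), sort_keys=True)
        first_query = self._query_by_pit.setdefault(pit_id, query)
        if first_query != query:
            raise ValueError("this PIT identifier belongs to another search request")
```

Successive pages of the same query pass the check, because only search_after and possibly the PIT identifier change between them.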