Paginating a search request

You can paginate a search request in Federate by using the search_after parameter. First, open a Point-In-Time (PIT) on the parent indices at the root of the request. This operation returns an identifier that you pass to the search request to be paginated. Passing the identifier effectively caches the results of the request and keeps the hits consistent across pages. To retrieve each subsequent page, re-execute the request with an updated search_after parameter. Finally, close the PIT to free memory.

Open and close Point-In-Times

Federate provides two REST endpoints for opening and closing Point-In-Times on indices. For the duration of the PIT, the state of the indices in the PIT remains unchanged, even if the indices are updated in that time. This allows search requests to be executed against a consistent view of the indices over a long period of time, unaffected by any changes made to them. The default duration for a PIT is 5 minutes. You can adjust the duration using the optional keep_alive parameter.

POST /siren/<index>/_pit (1)

POST /siren/<index>/_pit?keep_alive=10m (2)

DELETE /siren/_pit (3)
{
   "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
}
1 Open a PIT with the default duration.
2 Open a PIT with a custom duration.
3 Close an existing PIT.

The POST method opens a PIT on the given index pattern and returns an identifier. The DELETE method closes the PIT referenced by the identifier in its body.
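As an illustration, the two endpoints above can be prepared from a client with Python's standard library. This is a minimal sketch, not an official client: the function names and the base URL http://localhost:9200 are assumptions made for the example, and the requests are only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def open_pit_request(base_url, index, keep_alive=None):
    """Build (without sending) the POST request that opens a PIT on `index`."""
    # Keep '*' and ',' readable so index patterns such as machine-* survive as-is.
    quoted_index = urllib.parse.quote(index, safe="*,")
    url = f"{base_url}/siren/{quoted_index}/_pit"
    if keep_alive is not None:
        # Optional custom duration, for example "10m".
        url += "?" + urllib.parse.urlencode({"keep_alive": keep_alive})
    return urllib.request.Request(url, method="POST")


def close_pit_request(base_url, pit_id):
    """Build (without sending) the DELETE request that closes the PIT `pit_id`."""
    body = json.dumps({"id": pit_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/siren/_pit",
        data=body,
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )


# Hypothetical base URL, used only for illustration.
request = open_pit_request("http://localhost:9200", "machine-*", keep_alive="10m")
print(request.get_method(), request.full_url)
```

Each returned object can be passed to urllib.request.urlopen to actually send the request. Building and sending are kept separate so that the URLs and bodies can be inspected without a running cluster.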

Pagination

Paginating a search request requires the PIT identifier returned by the _pit REST API and a tiebreaker sort parameter. The sort parameter is needed to paginate hits: it adds a sort field to each hit in the search response, and that value is then passed to the search_after parameter. To get the next page, take the sort value of the last returned hit and set it as the value of search_after.

If the request already contains a sort, the tiebreaker sort parameter is added to it automatically.
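The step of turning one response into the next request can be sketched in Python as follows. The helper name next_page_body is hypothetical, and the sketch assumes the search response mirrors the Elasticsearch shape: hits under hits.hits, each carrying a sort array, and the latest PIT identifier in a top-level pit_id field.

```python
import copy


def next_page_body(previous_body, response):
    """Return the request body for the next page, or None if no hits are left.

    Assumes an Elasticsearch-style response (hits.hits[*].sort and pit_id).
    """
    hits = response["hits"]["hits"]
    if not hits:
        return None  # An empty page means pagination is finished.
    body = copy.deepcopy(previous_body)  # Leave the caller's body untouched.
    # The sort value of the last returned hit becomes search_after.
    body["search_after"] = hits[-1]["sort"]
    # Carry over the most recent PIT id when the response provides one.
    if "pit_id" in response:
        body["pit"] = {**body.get("pit", {}), "id": response["pit_id"]}
    return body


first_body = {"sort": {"_shard_doc": "asc"}, "pit": {"id": "A"}, "size": 2}
sample_response = {
    "pit_id": "B",
    "hits": {"hits": [{"_id": "1", "sort": [0]}, {"_id": "2", "sort": [1]}]},
}
print(next_page_body(first_body, sample_response))
```

The identifiers "A" and "B" and the sample hits are made-up values that only show how the fields move from the response into the next request.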

Below is a search request that contains a join, where the parent set is machine-*, and the child set is beat-*.

GET /siren/machine-*/_search
{
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "query": {
          "match_all": {}
        }
      }
    }
  }
}

A PIT is created over the parent set at the root, that is, over the index pattern machine-*. The response contains the PIT identifier:

POST /siren/machine-*/_pit
{
   "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
}

To retrieve the first page, issue the search request with the PIT identifier and a sort parameter. The index pattern that is normally passed as part of the _search endpoint is omitted: the indices resolved when the PIT was created are retrieved from the given PIT identifier.

GET /siren/_search
{
  "sort": { (1)
    "_shard_doc": "asc"
  },
  "pit": { (2)
    "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
  },
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "query": {
          "match_all": {}
        }
      }
    }
  },
  "size": 2 (3)
}
1 A sort explicitly set with the tiebreaker field _shard_doc.
2 The PIT identifier returned by the call to the _pit REST API.
3 The number of hits returned in a page.

To retrieve the next pages, add the search_after parameter, using the sort value from the last returned hit. Keep in mind that the PIT identifier can change: always use the id from the latest response in the new request.

GET /siren/_search
{
  "sort": {
    "_shard_doc": "asc"
  },
  "pit": { (1)
    "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
  },
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "query": {
          "match_all": {}
        }
      }
    }
  },
  "size": 2,
  "search_after": [ (2)
    1
  ]
}
1 The PIT id is set to the last returned PIT id.
2 The search_after is set to the sort value of the last returned hit.
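The whole sequence, first page followed by repeated search_after requests, can be sketched as a small Python generator. This is an illustration under stated assumptions rather than a reference implementation: search is a hypothetical callable that sends a body to GET /siren/_search and returns the decoded JSON response, the response is assumed to follow the Elasticsearch shape (hits.hits, sort, pit_id), and fake_search is an in-memory stand-in so the loop can run without a cluster.

```python
def paginate(search, pit_id, query, size=2):
    """Yield hits page by page until a page comes back empty.

    `search` is a hypothetical callable: it takes a request body (dict)
    and returns the decoded search response (dict).
    """
    body = {
        "sort": {"_shard_doc": "asc"},  # Explicit tiebreaker sort.
        "pit": {"id": pit_id},
        "query": query,
        "size": size,
    }
    while True:
        response = search(body)
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        body = dict(body)
        # Always reuse the PIT id from the latest response when present.
        body["pit"] = {"id": response.get("pit_id", body["pit"]["id"])}
        # Continue after the sort value of the last returned hit.
        body["search_after"] = hits[-1]["sort"]


def fake_search(body):
    """In-memory stand-in for the search endpoint, for demonstration only."""
    docs = [{"_id": str(i), "sort": [i]} for i in range(5)]
    after = body.get("search_after", [-1])[0]
    page = [d for d in docs if d["sort"][0] > after][: body["size"]]
    return {"pit_id": body["pit"]["id"], "hits": {"hits": page}}


for hit in paginate(fake_search, "example-pit-id", {"match_all": {}}):
    print(hit["_id"])
```

Once the generator is exhausted, the caller should still close the PIT with the DELETE endpoint to free memory.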

Examples with projection

The following request paginates a search with a project clause in a nested join.

GET /siren/_search
{
  "sort": {
    "_shard_doc": "asc"
  },
  "pit": {
    "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
  },
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "query": {
          "join": {
            "indices": [
              "beat-*"
            ],
            "on": [
              "id",
              "id"
            ],
            "request": {
              "project": [
                {
                  "field": {
                    "name": "date"
                  }
                }
              ],
              "query": {
                "match_all": {}
              }
            }
          }
        }
      }
    }
  },
  "size": 2
}

The following request paginates a search with a project clause in the root join.

GET /siren/_search
{
  "sort": {
    "_shard_doc": "asc"
  },
  "pit": {
    "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
  },
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "project": [
          {
            "field": {
              "name": "date"
            }
          }
        ],
        "query": {
          "match_all": {}
        }
      }
    }
  },
  "size": 2
}
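When such requests are generated by a program, the join clauses shown above can be assembled with a small helper. The sketch below is only one possible way to do it: join_query is a hypothetical function, and it reproduces just the fields that appear in the examples on this page (indices, on, request, project, query).

```python
import json


def join_query(indices, on, request_query=None, project_fields=None):
    """Build a join clause shaped like the examples in this section."""
    # Default to match_all, as in the examples.
    request = {"query": request_query or {"match_all": {}}}
    if project_fields:
        # One projection entry per field name.
        request["project"] = [{"field": {"name": name}} for name in project_fields]
    return {"join": {"indices": indices, "on": on, "request": request}}


# Project clause in the root join.
root = join_query(["beat-*"], ["id", "machine"], project_fields=["date"])

# Project clause in a nested join: the inner join is the outer request's query.
inner = join_query(["beat-*"], ["id", "id"], project_fields=["date"])
nested = join_query(["beat-*"], ["id", "machine"], request_query=inner)

print(json.dumps(nested, indent=2))
```

The resulting dictionary goes under the query key of the search body, next to the sort, pit and size entries.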

Limitations

The pagination of a search request in Federate has the following limitations:

  • The PIT identifier returned by the /siren/_pit REST API can only be used by a single search request.

  • A join against virtual indices located on a remote Elasticsearch cluster is not supported if that remote cluster doesn’t have the Federate plugin installed.

  • Pagination is supported for joins with virtual indices on the child (right) side of the join only, within the limitations of virtual indices (for example, no field projection).

  • Search slicing with PIT is not supported.