Connecting to remote datasources

The Siren Federate plugin offers two ways of interacting with data from external sources. It is possible to query external data directly by using virtual indices, or indirectly by using reflections. A datasource is either an Elasticsearch cluster or a JDBC database.

After a remote datasource is configured, an analyst who is using Siren Investigate can create dashboards by directly querying the remote datasources and displaying the resulting data alongside Elasticsearch data.

A remote Elasticsearch datasource causes the system to consider remote indices as local ones; that is, as virtual indices.

A JDBC datasource allows you to define an external database so that a view of the data can be created by ingesting some of its data into an Elasticsearch index.

Datasources and virtual indices can be managed by using the REST API.

Settings for remote datasources

The Siren Federate plugin stores the datasource configuration in two Elasticsearch indices:

  • .siren-federate-datasources: The index that is used to store the JDBC configuration parameters of remote datasources.

  • .siren-federate-indices: The index that is used to store the configuration parameters of virtual indices.

Other indices are also used for different features:

  • .siren-federate-ingestions: The index that is used to store the ingestion configurations.

  • .siren-federate-joblogs: The index that is used to store logs of ingestion jobs.

The Siren Federate Connector module supports the datasource node configuration settings. For more information, see the Connector module.

Virtual indices

After you create an Elasticsearch datasource with Siren Federate, you create virtual indices to map external indices.

How do you query virtual indices?

Virtual indices can be queried by using one of the following APIs:

  • Standard Elasticsearch API: Allows you to query a virtual index as if it were a standard Elasticsearch index. However, note that this method does not support joins.

  • Siren Federate API: Allows you to use the join query clause on virtual indices. The Siren Federate query planner pushes down to the remote datasources the computation of query operators such as filters, aggregations, and, if the Federate plugin is installed on the remote datasource, joins.

Siren Federate supports sophisticated join capabilities across both real indices and virtual indices.

There are three kinds of join operations:

  • Joins involving indices within the same external datasource: If the Siren Federate plugin is installed on the datasource, Siren Federate will simply push down the joins to the remote. If the plugin isn’t installed, then the cross back-end join operation is used. The performance and scalability depends on the datasource that Siren Federate is connected to.

  • Cross back-end joins (external datasource to external datasource, external datasource to Elasticsearch): The scalability of this operation is, in its current version, quite high from an external datasource to Elasticsearch, while limited in the opposite direction. Improvements are planned in future versions.

  • Joins across indices that are within the same Elasticsearch cluster. These are extremely scalable. Siren Federate augments existing Elasticsearch installations with an in-memory distributed computational layer. Search operations are pushed down to the Elasticsearch indices and then search results are distributed across the available Elasticsearch nodes for distributed join computation. This enables horizontal scaling, which leverages the entire cluster’s CPUs and memory.

Operations on virtual indices

The Siren Federate plugin supports the following operations on virtual indices:

  • get mapping

  • get field capabilities

  • search

  • msearch

  • get

  • mget

Search requests that involve a mixture of virtual and normal Elasticsearch indices, for example, when using a wildcard, are not supported and will be rejected. It is, however, possible to issue msearch requests that contain requests on normal Elasticsearch indices and virtual indices.

Elasticsearch index for interoperability with Search Guard and Elastic Stack Security. If an Elasticsearch index with the same name as the virtual index already exists and it is not empty, the virtual index creation will fail and the original Elasticsearch index is not removed.

Known limitations with configuring remote Elasticsearch datasources

The following limitations exist for all connectors:

  • A cross back-end join is limited to semi-join.

  • A cross back-end join supports only integer keys.

  • Cross back-end support has very different scalability according to the direction of the join. A join that involves sending IDs to a remote system can potentially be hundreds of times less scalable, to one where the keys are fetched from a remote system.

  • Cross cluster searches on virtual indices are not supported.