Data model

You must define the data that you want to explore by configuring the data model.

From the Data model application, you can create:

  • Index pattern searches: A search based on a pattern that matches one or more Elasticsearch indices or virtual indices. For more information, see Siren Investigate datasource configuration.

  • Child searches: A sub-search based on an index pattern search created by adding more specific filters or queries.

  • Entity identifiers (EIDs): A virtual entity concept used to connect data sets using common concepts. For example, a company is located in a city and an investor lives in a city. There is no direct relation between the two data sets, but investors can be found in the same city in which the company is located.

The underlying data can be stored either directly in an Elasticsearch index or in a virtual index mapped to an external JDBC datasource.

To create an index pattern search, click Create index pattern search.

Here, you can choose a name for the entity, add long and short descriptions, and use the icon picker and color picker to choose the icon and color associated with that entity in graph views, for example when you select the Graph View tab.

The colon (:) is deprecated in index names and should not be used.

For more information about creating an index pattern search, see Index pattern searches.

To remove an index pattern search, select it from the list of searches in the left menu and click Delete.

When you edit an index pattern search, you can view and edit multiple properties on the following tabs.

Info

The Info tab enables you to set the following properties:

  • Name of the search: The search name, typically describing returned entity types, for example: "users", "logs firewall 1", "financial articles".

  • Icon.

  • Color.

  • Label when visualized in the graph browser: How to compose the label used when records are visualized on the graph. This can be either a Scripted Label or a Document Field.

  • Short description.

  • Index pattern used by this search: The pattern that selects the indexes this search queries. By changing the pattern, you can point the search at a different index or at multiple indexes (* is a wildcard), for example, logstash-*. Changing this setting affects this search, the searches derived from it, and all associated visual components.

  • Time Filter field name: Used to filter events with the global time filter.
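To show how a wildcard pattern selects indexes, here is a minimal Python sketch. It assumes that `*` simply means "any sequence of characters". Python's `fnmatchcase` also interprets `?` and `[...]`, which index patterns may not, so treat this as an approximation. The function name and the index names are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_indices(pattern: str, indices: list[str]) -> list[str]:
    """Return the index names selected by a pattern such as 'logstash-*'.

    Approximation only: '*' matches any characters, as described for index
    patterns; fnmatchcase additionally treats '?' and '[...]' as wildcards.
    """
    return [name for name in indices if fnmatchcase(name, pattern)]


# Hypothetical index names.
indices = ["logstash-2024.01.01", "logstash-2024.01.02", "companies", "investors"]
print(matching_indices("logstash-*", indices))
# ['logstash-2024.01.01', 'logstash-2024.01.02']
```

Changing the pattern to `companies` would select only that one index. This is why editing the pattern affects every search and visualization built on top of it.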

Fields

The Fields tab displays properties for each field. A check mark indicates the presence of a property:

  • Name.

  • Type.

  • Format.

  • Searchable: Can be used in the filter bar.

  • Aggregatable: Can be used in visualization aggregations.

  • Excluded: Excluded from _source when it is fetched.

  • Primary key: The primary key of an entity identifier.

  • Single value: A field that is not an array.
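The "Single value" property marks a field that never holds an array. The sketch below is a hypothetical Python helper that checks this property against a sample of documents. It only illustrates what the property means and is not how Siren Investigate determines or stores it.

```python
def is_single_value(field: str, documents: list[dict]) -> bool:
    """Return True if no sampled document stores `field` as an array (list)."""
    return not any(isinstance(doc.get(field), list) for doc in documents)


# Hypothetical sample documents.
docs = [
    {"name": "Acme", "tags": ["ai", "robotics"]},
    {"name": "Globex", "tags": "finance"},
]
print(is_single_value("name", docs))  # True
print(is_single_value("tags", docs))  # False
```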

To configure the field, click the Edit icon in the field’s row:

  • Format: Enables you to control the way that specific values are displayed. It can also cause values to be completely changed and prevent highlighting in Discover from working. The following options are available:

    • Boolean

    • color

    • string (default)

    • truncated string

    • URL

  • Popularity.

  • Primary key.

  • Single value.

Click Update field to save your changes. Alternatively, click Cancel to abandon your changes.

Data

From the Data tab, you can:

  • Generate a dashboard and populate it with visualizations created automatically from the currently selected fields.

  • Create a child search.

  • Add filters.

Relations

The Relations tab enables you to define relations between entities:

  • Source entity: Select a Field.

  • Labels: Select or create a new label for each direction of the relation.

  • Target entity: Select a Search and a Field or select an entity identifier.

For more information, see Creating relationships.

Scripted fields

Scripted fields are computed in real time from your data. They can be used in visualizations and displayed in your documents. However, they cannot be searched.

Before you use scripted fields, see the Elasticsearch documentation about script fields and scripts in aggregations.

Scripted fields can be used to display and aggregate calculated values. Because they are computed in real time, they can be very slow and, if configured incorrectly, can make Siren Investigate unusable. There is no protection against unexpected exceptions caused by script errors.

Scripted fields use the languages that are enabled in the back-end system. By default, scripts use Painless, the Elasticsearch scripting language.

Ensure that you access the version of Elasticsearch documentation that matches the back-end version that you are using.

To access values in the document, use the following format:

doc['some_field'].value

Painless is powerful and easy to use. It provides access to many native Java APIs and has an easy-to-learn syntax.

Currently, Siren Investigate does not support named functions in Painless scripts.

Alternatively, you can use Lucene Expressions. These resemble JavaScript but are limited to basic arithmetic, bitwise and comparison operations.

Lucene Expressions have the following limitations:

  • Only numeric, boolean, date and geo_point fields may be accessed.

  • Stored fields are not available.

  • If a field is sparse (only some documents contain a value), documents missing the field will have a value of 0.
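The sparse-field rule can change results in ways that are easy to miss. The following Python sketch emulates that rule for an expression roughly equivalent to `doc['price'].value * doc['quantity'].value`. The field names and documents are hypothetical, and the helpers only mimic the documented "missing means 0" behavior. They are not the Lucene implementation.

```python
def expression_value(doc: dict, field: str) -> float:
    """Read a numeric field the way the limitation describes: missing -> 0."""
    value = doc.get(field)
    return 0.0 if value is None else float(value)


def revenue(doc: dict) -> float:
    # Emulates the expression: doc['price'].value * doc['quantity'].value
    return expression_value(doc, "price") * expression_value(doc, "quantity")


# The second document has no quantity, so it is treated as quantity = 0.
docs = [{"price": 9.5, "quantity": 4}, {"price": 12.0}]
print([revenue(d) for d in docs])  # [38.0, 0.0]
```

An average of this scripted field over both documents would therefore be 19.0 rather than 38.0, because the document without a value still contributes a 0.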

Lucene Expressions support the following operators and functions:

  • Arithmetic operators: +, -, *, /, %

  • Bitwise operators: |, &, ^, ~, <<, >>, >>>

  • Boolean operators (including the ternary operator): &&, ||, !, ?:

  • Comparison operators: <, <=, ==, >=, >

  • Common mathematic functions: abs, ceil, exp, floor, ln, log10, logn, max, min, sqrt, pow

  • Trigonometric library functions: acosh, acos, asinh, asin, atanh, atan, atan2, cosh, cos, sinh, sin, tanh, tan

  • Distance functions: haversin

  • Miscellaneous functions: min, max
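The haversin distance function computes a great-circle distance between two latitude/longitude points. The Python sketch below implements the standard haversine formula to show what such a call computes. It assumes decimal-degree inputs, a result in kilometers, and a mean Earth radius of 6371 km. Lucene's own implementation may use a different radius or approximation, so small differences are expected.

```python
import math

# Assumed mean Earth radius; Lucene may use a slightly different constant.
EARTH_RADIUS_KM = 6371.0


def haversin_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
print(round(haversin_km(0, 0, 0, 1), 1))  # 111.2
```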

Scripted fields have the following properties:

  • Name.

  • Language:

    • expression

    • painless (default)

  • Type:

    • Boolean

    • date

    • number

    • string

  • Format: Enables you to control the way that specific values are displayed. It can also cause values to be completely changed and prevent highlighting in Discover from working. The following options are available:

    • Boolean

    • bytes

    • color

    • duration

    • number (default)

    • percentage

    • string

    • URL

  • Popularity.

  • Primary key.

  • Single value.

  • Script.

Click Create field to save your changes. Alternatively, click Cancel to abandon your changes.

Options

On the Options tab, you can configure settings that reduce the amount of data fetched and improve system performance.

Source filters

Source filters can be used to exclude one or more fields when fetching the document source. The source is fetched when you view a document in Discover, or when a table displays results from a saved search in a dashboard. Each row is built using the source of a single document. If you have documents with large or unimportant fields, you may benefit from filtering those fields out at this lower level.

Note that multi-fields will incorrectly appear as matches in the table. These filters apply only to fields in the original source document, so matching multi-fields are not actually filtered.

Enter a string in the Source Filter field and click Add. Filters accept wildcards; for example, user* matches all fields whose names start with "user".
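The following Python sketch shows the effect of source filters on a fetched document. It assumes flat, top-level field names and `*` wildcards. Real Elasticsearch source filtering runs on the server and also handles nested object paths. The helper name and the sample document are hypothetical.

```python
from fnmatch import fnmatchcase


def apply_source_filters(source: dict, filters: list[str]) -> dict:
    """Drop top-level fields whose names match any source filter pattern."""
    return {
        field: value
        for field, value in source.items()
        if not any(fnmatchcase(field, pattern) for pattern in filters)
    }


doc = {"user_id": 42, "username": "jdoe", "message": "login ok", "raw_payload": "<large blob>"}
print(apply_source_filters(doc, ["user*", "raw_payload"]))
# {'message': 'login ok'}
```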

Sampling on the graph

In the Graph Browser, you can set a graph expansion limit, which controls how many records can be imported into a graph from a dashboard. This is called sampling.

For more information, see Sampling data in the graph.

Preventing expensive queries

When you use dashboards, removing a filter or setting a broad time range can sometimes cause the system to produce queries that are too expensive to process. To prevent this, you can set limits on the applied time range or on the number of documents that can be added to a dashboard.

For more information, see Preventing expensive queries.
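Conceptually, both limits act as a guard that runs before a query is executed. The Python sketch below models that idea. The limit values, the constant names, and the `check_query` function are all hypothetical. The actual settings are described in Preventing expensive queries.

```python
from datetime import datetime, timedelta

# Hypothetical limits, for illustration only.
MAX_TIME_RANGE = timedelta(days=90)
MAX_DOCUMENTS = 1_000_000


def check_query(start: datetime, end: datetime, estimated_docs: int) -> list[str]:
    """Return the reasons a dashboard query would be refused (empty list if allowed)."""
    problems = []
    if end - start > MAX_TIME_RANGE:
        problems.append("time range too broad")
    if estimated_docs > MAX_DOCUMENTS:
        problems.append("too many documents")
    return problems


print(check_query(datetime(2024, 1, 1), datetime(2024, 12, 31), 5_000))
# ['time range too broad']
```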

Revisions

The revision index feature lets you make manual changes to the documents of your index pattern searches and stores those changes in an additional 'revision index'.

For more information, see Setting up document editing.

Data model graph

On the Data model graph tab, you can visualize the connected data in a graph to see how the entities relate to one another.


After you have created an index pattern search, you can create more specific searches. For example, if your main index pattern search is Companies, you can create a narrower selection such as Companies from New York:

  1. Select the Companies index pattern search on the left menu.

  2. Go to the Data tab.

  3. Search for New York and press Enter.

  4. Click Create Child Search.

  5. Enter Companies from New York and click Save.

The child search appears nested under Companies on the left side.
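A child search can be thought of as the parent search plus extra restrictions. The Python sketch below models this idea and assumes that the child's criteria are applied as an additional AND filter. It covers only ordinary restrictive filters, because some query modifiers could in theory expand the result set. It also simplifies the full-text query "New York" to an equality test on a hypothetical `city` field.

```python
def run_search(documents, filters):
    """Return documents that satisfy every filter (each filter is a predicate)."""
    return [d for d in documents if all(f(d) for f in filters)]


# Hypothetical Companies data.
companies = [
    {"name": "Acme", "city": "New York"},
    {"name": "Globex", "city": "Boston"},
    {"name": "Initech", "city": "New York"},
]

parent_filters = []  # the Companies index pattern search: no extra filters
child_filters = parent_filters + [lambda d: d["city"] == "New York"]

parent = run_search(companies, parent_filters)
child = run_search(companies, child_filters)
print([d["name"] for d in child])  # ['Acme', 'Initech']
```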

Child searches cannot currently be nested, so there can be only one level of child search below the Index Pattern Search. This is because certain query modifiers, which appear as filters, could in theory even expand the result set. Note also that data model relations are always defined at the Index Pattern Search (or EID) level and are inherited by child searches.

The child search can now be selected and edited from the left side of the screen. To remove a child search, select it from the list and click Delete.

How to use entity identifiers

Previously, in Siren Platform, to join two indexes you had to specify a direct connection between them. For example, if you had two logs that could be connected by IP value, you specified a direct connection, which created a relational button between the two.

But what if many indexes contain IPs (or anything else: MAC addresses, user IDs, URLs, port numbers, transaction IDs, and so on) in multiple roles (source IP, destination IP), and it would be useful to join from any of these roles and indexes to any other role and index?

The new relational model enables this automatically.

For example, in this configuration, we have defined the IP concept as an EID and tied it to the other indexes where IPs appear. For each connection, we specify the name of the relation that describes the role of the IP in that index (for example, is it the source IP in that log, or the blocked IP?).

Relations Graph

Using only this configuration, you can now have buttons that explore the ontology and show you all possible matches across your data. With one click, you pivot to the target dashboard with the correct relational filter applied.
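Declaring each index-and-role connection to the EID once is enough to enable a pivot from every role to every other role. With n connections, that gives n × (n − 1) directed pivots. The Python sketch below enumerates these pivots. The index names, field names, and role labels are hypothetical examples.

```python
from itertools import permutations

# Hypothetical connections to an "IP" EID: (index, field, role of the IP in that index).
ip_connections = [
    ("firewall-logs", "src_ip", "source IP"),
    ("firewall-logs", "dst_ip", "destination IP"),
    ("apache-logs", "agent_ip", "agent IP"),
]


def pivots(connections):
    """Every (index, role) -> (index, role) navigation that the shared EID makes possible."""
    return [
        ((a_index, a_role), (b_index, b_role))
        for (a_index, _a_field, a_role), (b_index, _b_field, b_role) in permutations(connections, 2)
    ]


for source, target in pivots(ip_connections):
    print(source, "->", target)
```

Three connections produce six pivots. One of them goes from the destination IP in the firewall logs to the agent IP in the Apache logs, and none of them had to be declared as a direct relation.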

For example, to see the records of the Apache logs where the Agent IP matches the Destination IP in the current log, navigate from "Destination IP" as shown in the following picture:

Automatic relational buttons

Entity identifiers are ideal for anything that identifies "things" across indexes but does not have an index of its own (otherwise, you would pivot to that index). Examples include phone numbers, but also tags, labels from standalone indexes, and so on. In practice, a single Excel spreadsheet can be seen as a "knowledge graph" if you consider labels as identifiers that interconnect records. Here is an example with entity identifiers (Tissue and Organism) in a Life Science deployment.

Knowledge Graph
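To see how a plain spreadsheet becomes a graph when its label columns act as identifiers, consider the Python sketch below. It links two rows whenever they share a value in an identifier column. The rows and values are hypothetical, and only the column names Tissue and Organism come from the example above.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical spreadsheet rows.
rows = [
    {"id": "sample-1", "Tissue": "liver", "Organism": "mouse"},
    {"id": "sample-2", "Tissue": "liver", "Organism": "human"},
    {"id": "sample-3", "Tissue": "brain", "Organism": "mouse"},
]


def link_by_identifiers(rows, identifier_columns):
    """Connect rows that share a value in any identifier column."""
    by_value = defaultdict(list)
    for row in rows:
        for column in identifier_columns:
            by_value[(column, row[column])].append(row["id"])
    edges = set()
    for (column, value), ids in by_value.items():
        for a, b in combinations(ids, 2):
            edges.add((a, b, f"{column}={value}"))
    return edges


for edge in sorted(link_by_identifiers(rows, ["Tissue", "Organism"])):
    print(edge)
```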

Note that automatic connections between dashboards appear only when you use the new relational button. The old relational button still requires you to specify manually which relations to show and where.

Visualize

Again, this is how the new relational button appears in action.

Automatic relational buttons

Creating entity identifiers

  1. Click Create entity identifier.

  2. Enter a unique name in the Entity identifier name field.

  3. Provide a short description and select an icon to represent the EID.

  4. Click Create.

The entity identifier can now be selected and edited from the left side of the screen. To remove an entity identifier, select it from the list and click Delete.

Creating relationships

Relationships are defined by linking one class to other classes. However, it is not possible to define a relationship between two entity identifiers.

A relationship is defined as a join operation between two indices with the following fields:

  • Left Field: the field of the local index to join on;

  • Right Class: the EID or index pattern to connect to;

  • Right Field (only if the Right Class is an index pattern): the field of the right index to join with; and

  • Label: the label of the relation.
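When you navigate along a relation, the effect can be pictured as a semi-join: the left documents are kept only if their left field value appears in the right field of some right document. The Python sketch below models that filtering step on hypothetical data. It is a conceptual illustration, not the distributed join that Siren actually runs.

```python
def relational_filter(left_docs, left_field, right_docs, right_field):
    """Keep left documents whose left_field value appears in right_field of a right document."""
    right_values = {doc[right_field] for doc in right_docs if doc.get(right_field) is not None}
    return [doc for doc in left_docs if doc.get(left_field) in right_values]


# Hypothetical data: Left Field = companies.id, Right Field = investments.company_id.
companies = [{"name": "Acme", "id": "c1"}, {"name": "Globex", "id": "c2"}]
investments = [{"company_id": "c1", "amount": 5_000_000}]

print(relational_filter(companies, "id", investments, "company_id"))
# [{'name': 'Acme', 'id': 'c1'}]
```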

To create a new relation, click Add relation. You do not need to create a relation in both the originating and the target class; once created, it appears automatically in both edit screens.

The Data model graph tab shows the relations visually, with the currently selected class highlighted.

How to name relations

Naming is an important consideration when you configure a data model.

In Siren Platform, naming entities and relationships incorrectly will result in dashboards that are difficult to navigate.

When naming things, imagine yourself in the place of the user at the moment they perform the relational navigation. Say that you are looking at companies: how would you refer to investments?

One natural way is to say that a company received an investment. However, if you are thinking from the investment's point of view, you can say that it was secured by a company.

In the user interface, look at the directions of the arrows and think of X relationship Y and Y relationship X. For example:

How to name relations

In this case we are using two different verbs, but often the simple solution is to use the active and passive forms, for example "saw" and "seen by". Sometimes the inverse relation is the same as the original one, for example "brother of" or "spouse".

As a general rule, keep names short, for example "source" and "is source of".
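A quick way to test a pair of labels is to read the relation aloud in both directions, once as "X label Y" and once as "Y inverse Y-to-X label X". The Python sketch below does this for a hypothetical `Relation` record that stores one label per direction. The class is illustrative and is not part of any Siren API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    label: str          # read as "<source> <label> <target>"
    inverse_label: str  # read as "<target> <inverse_label> <source>"

    def describe(self) -> list[str]:
        """Render the relation as the two sentences a user sees when navigating."""
        return [
            f"{self.source} {self.label} {self.target}",
            f"{self.target} {self.inverse_label} {self.source}",
        ]


received = Relation("Company", "Investment", "received", "secured by")
for sentence in received.describe():
    print(sentence)
# Company received Investment
# Investment secured by Company
```

If either sentence sounds unnatural when read aloud, the label probably needs rewording.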

For more information about the component that provides the navigation between relationally connected dashboards, see Relational Navigator.

Using the Discover relations wizard

Siren Investigate can attempt to automatically identify the relational configuration between any set of Index Pattern Searches.

  1. Go to the Data model app and click the Relations tab.

  2. Click the Discover relations button at the bottom right of the screen.

  3. Select the index pattern searches to analyze for relations. The default settings for the wizard are usually good enough, but you can check the other tabs for additional controls.

  4. In the EID Patterns tab, you can define a regular expression that identifies particular types of data to match to EIDs, for example, URLs, email addresses, or IP addresses.

  5. More generation parameters are available in the Settings tab. The available presets suit most cases, but you can also edit the complete list of parameters for full control.

  6. Click Discover relation to start the detection procedure.
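As a rough idea of how EID patterns can classify a field, the Python sketch below checks sampled values against a few regular expressions. It picks the first EID type that matches at least a given share of the values. The patterns, the 90% threshold, and the function name are illustrative assumptions, not the wizard's built-in defaults or algorithm.

```python
import re

# Illustrative patterns only. The IPv4 pattern does not check that octets are <= 255.
EID_PATTERNS = {
    "IP address": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
    "Email address": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "URL": re.compile(r"^https?://\S+$"),
}


def classify_field(values, min_ratio=0.9):
    """Return the first EID type matching at least min_ratio of the non-empty values, or None."""
    values = [v for v in values if v]
    if not values:
        return None
    for eid, pattern in EID_PATTERNS.items():
        matched = sum(1 for v in values if pattern.match(v))
        if matched / len(values) >= min_ratio:
            return eid
    return None


print(classify_field(["10.0.0.1", "192.168.1.20", "8.8.8.8"]))  # IP address
print(classify_field(["alice@example.com", "bob@example.org"]))  # Email address
```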

Results

At the end of the detection process, all found relations are presented in a report on the Results screen.

The Suggested Relations tab displays found relations grouped by source endpoint (either a field or an Entity Identifier). Selecting a source endpoint will open the list of all its related target fields.

Fields in the report can be included or excluded from the output result by selecting or clearing the appropriate check box.

To give a clear idea of the data connected by the relations, clicking a target field opens an exploratory window with 100 sample documents per endpoint, with the matched values highlighted.

You can change a source endpoint by clicking the drop-down button before its name. This is useful to correct mistakes in the generation procedure, or if you prefer a different relation type (a direct relation or one passing through an Entity Identifier).

The names of the Entity Identifiers are also editable. Note that assigning the same Entity Identifier name on two different sources will merge them into a single output identifier.
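The merge rule above amounts to grouping source endpoints by their assigned EID name. The Python sketch below shows this grouping on hypothetical wizard output. It assumes names are compared as exact strings, which the documentation does not confirm (for example, with respect to letter case).

```python
from collections import defaultdict

# Hypothetical report entries: (source endpoint, EID name assigned in the report).
suggested = [
    ("web-logs.client_ip", "IP"),
    ("firewall.src_ip", "IP"),
    ("users.email", "Email"),
]


def merge_eids(assignments):
    """Group source endpoints by EID name; identical names collapse into one EID."""
    eids = defaultdict(list)
    for source, name in assignments:
        eids[name].append(source)
    return dict(eids)


print(merge_eids(suggested))
# {'IP': ['web-logs.client_ip', 'firewall.src_ip'], 'Email': ['users.email']}
```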

The Existing Relations tab in the report lets you compare the relations that already exist in the data set with the reported relations. Relations found in this tab do not also appear as Suggested Relations, because they already exist and should not be duplicated.

For further insight about the generation process, you can inspect the details about each analyzed field in the Per-Field Notes tab, and the full log of the procedure is available in the Log tab.

Temporary Relations

Click Set relations to close the report and put all selected relations in a temporary state. Temporary relations are shown in light blue on the Data model screen and are ignored in all other parts of Siren Investigate.

Temporary relations can also be seen in the Data model graph.

The temporary state gives you a final overview of the newly generated relations and lets you customize them further, as if they had been inserted manually.

Temporary relations are associated with the current browser session only. If you close the browser window, all temporary relations are lost.

You can review and save each temporary relation individually by clicking the disk icon, which makes the relation permanent.

Click Delete relation to discard a temporary relation permanently.

If you are sure that you want to keep all of the temporary relations, click Save All in the information banner at the top of the screen.

If you want to discard all of the temporary relations, click Remove All.

Saving an Index Pattern Search or Entity Identifier will also save all its temporary relations.
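The lifecycle of temporary relations can be summarized as a small state model. The Python sketch below is a hypothetical illustration of that workflow. Relations are `(source, target)` tuples, anything left in `temporary` is lost when the browser session ends, and each method is mapped to the UI action it imitates. None of these names are Siren Investigate APIs.

```python
class TemporaryRelations:
    """Hypothetical model of the temporary-relation workflow for one browser session."""

    def __init__(self):
        self.temporary = set()  # lost when the session ends
        self.saved = set()      # permanent

    def set_relations(self, relations):
        """Set relations: put the selected report entries in the temporary state."""
        self.temporary.update(relations)

    def save(self, relation):
        """Disk icon: make one temporary relation permanent."""
        self.temporary.remove(relation)
        self.saved.add(relation)

    def delete(self, relation):
        """Delete relation: discard one temporary relation."""
        self.temporary.discard(relation)

    def save_class(self, name):
        """Saving an index pattern search or EID also saves its temporary relations."""
        mine = {r for r in self.temporary if name in r}
        self.temporary -= mine
        self.saved |= mine

    def save_all(self):
        """Save All: make every temporary relation permanent."""
        self.saved |= self.temporary
        self.temporary.clear()

    def remove_all(self):
        """Remove All: discard every temporary relation."""
        self.temporary.clear()


session = TemporaryRelations()
session.set_relations({("Companies", "City"), ("Investors", "City"), ("Companies", "Investments")})
session.delete(("Investors", "City"))
session.save_class("Investments")
print(sorted(session.saved))      # [('Companies', 'Investments')]
print(sorted(session.temporary))  # [('Companies', 'City')]
```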