Set Up Federate

Configuring Logger

It is recommended to change Elasticsearch's default log configuration logger.action.level from debug to warn in order to avoid spurious log messages whenever a search request is cancelled.
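As a sketch, the change is made in the Elasticsearch distribution's config/log4j2.properties, where the action logger is declared with a debug level by default (property names as found in recent Elasticsearch distributions; verify against your version):

```properties
# config/log4j2.properties
logger.action.name = org.elasticsearch.action
logger.action.level = warn
```

Restart the node, or use the cluster settings API, for the new level to take effect.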

Off-heap memory management

Data is encoded in a columnar memory format and stored off-heap, reducing the pressure on the JVM and allowing fast and efficient analytical operations. Data is read directly from off-heap storage and decoded on the fly using zero-serialization (removing any serialization overhead) and zero-copy memory access (reducing CPU cycles and memory bandwidth overhead).

Siren Federate’s memory management allows for granular control over the amount of off-heap memory that can be allocated per node, per query, and per query operator, and can terminate queries when the memory circuit breaker detects too many off-heap memory requests. In addition, the garbage collector automatically releases intermediate computation results and reclaims their off-heap memory, reducing memory pressure.

Off-heap storage is used only on data nodes; master-only and coordinator-only nodes do not use off-heap memory.

Checking off-heap memory allocation

Federate provides a REST endpoint to retrieve statistics about the cluster (Nodes Statistics), which include the off-heap memory allocation (Memory Information).
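As a sketch, the statistics can be fetched with a plain HTTP request against a running node. The endpoint path below is an assumption; verify it against the Nodes Statistics reference for your Federate version:

```shell
# Assumed endpoint path and default host/port; adjust for your deployment.
curl -s -X GET "http://localhost:9200/_siren/nodes/stats?pretty"
```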

The allocated direct memory represents the chunks of off-heap memory pre-allocated for the root allocator. Because allocating off-heap memory is expensive, these chunks are kept and reused. The root allocator can then hand out off-heap memory buffers of various sizes very efficiently.

Setting off-heap memory

The amount of off-heap memory available to the root allocator can be configured with the siren.memory.root.limit setting in config/elasticsearch.yml.

However, this value is limited by Federate’s maximum direct memory limit. If you want to set a larger limit for the root allocator, you must first manually raise the maximum direct memory limit for Federate. This can be done with the siren.io.netty.maxDirectMemory system property.

For example, you can add this line to config/jvm.options of your Elasticsearch instance to increase the max direct memory limit to 2GB:

-Dsiren.io.netty.maxDirectMemory=2147483648

You can then set siren.memory.root.limit in config/elasticsearch.yml to 2147483647 (that is, 2147483648 - 1, as the limit must be strictly smaller than the max direct memory limit):

config/elasticsearch.yml:
siren.memory.root.limit: 2147483647

When you start the Elasticsearch instance, you should see the following info logs:

[2019-12-10T17:29:11,207][INFO ][i.s.f.c.i.m.BufferAllocatorService] [node_s0] Buffer allocator service starting with Unsafe access: true
[2019-12-10T17:29:11,207][INFO ][i.s.f.c.i.m.BufferAllocatorService] [node_s0] Buffer allocator service starting with directMemoryLimit=2147483648 (1)
[2019-12-10T17:29:11,233][INFO ][i.s.f.c.i.m.BufferAllocatorService] [node_s0] Buffer allocator service starting with defaultNumDirectArenas=5
[2019-12-10T17:29:11,236][INFO ][i.s.f.c.i.m.BufferAllocatorService] [node_s0] Instantiating root allocator with limit=2147483647 (2)

These info logs provide an overview of how Federate is configured; here we can see that:

(1) the max direct memory limit is correctly set to 2147483648
(2) the root allocator limit is correctly set to 2147483647

It is critical to ensure that the machine has enough memory to accommodate Federate’s max direct memory limit, the JVM max heap memory limit, and the OS. If the sum of Federate’s max direct memory limit and the JVM max heap memory limit does not leave enough memory for the OS, the OS might kill the Elasticsearch process (for example, the OOM killer on Linux systems).
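To make the arithmetic concrete, here is a small illustrative Python sketch (the helper name and the 16 GB OS floor are our own, not part of Federate) that checks a machine’s memory budget and reproduces the byte values used in the sizing examples below:

```python
# Illustrative memory-budget check; not part of Federate.
GIB = 2 ** 30

def check_memory_budget(total_ram_bytes, heap_bytes, direct_bytes, min_os_bytes=16 * GIB):
    """Return True if heap + Federate off-heap leave at least min_os_bytes for the OS."""
    return total_ram_bytes - heap_bytes - direct_bytes >= min_os_bytes

# The 64 GB example: 24 GB heap + 24 GB off-heap + 16 GB for the OS.
heap = 24 * GIB
max_direct = 24 * GIB          # -Dsiren.io.netty.maxDirectMemory=25769803776
root_limit = max_direct - 1    # siren.memory.root.limit must be strictly smaller

print(max_direct)   # 25769803776
print(root_limit)   # 25769803775
print(check_memory_budget(64 * GIB, heap, max_direct))  # True
```

A 32 GB heap with the same 24 GB off-heap would leave only 8 GB for the OS, and the check would fail.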

64GB Machine

These are the recommended settings for a cluster that needs to execute joins on large amounts of data:

  • 24 GB Heap (for Elasticsearch)

  • 24 GB Off-heap (for Federate)

  • 16 GB for the OS and OS Cache

config/jvm.options:
-Xmx24g
-Dsiren.io.netty.maxDirectMemory=25769803776
config/elasticsearch.yml:
siren.memory.root.limit: 25769803775

Otherwise, if Federate’s off-heap memory is not fully used, it is better to give more heap memory to Elasticsearch:

  • 32 GB Heap (for Elasticsearch)

  • 16 GB Off-heap (for Federate)

  • 16 GB for the OS and OS Cache

128GB Machine

  • 32 GB Heap (for Elasticsearch)

  • 64 GB Off-heap (for Federate)

  • 32 GB for the OS and OS Cache

config/jvm.options:
-Xmx32g
-Dsiren.io.netty.maxDirectMemory=68719476736
config/elasticsearch.yml:
siren.memory.root.limit: 68719476735

If Federate’s off-heap memory is not fully used, it is better to leave more memory for the OS and the OS cache.