High Availability (HA)
Siren Alert supports High Availability (HA) for reporting on node clusters. When a cluster's leader node fails, the leader's responsibilities are switched to another node so that the alerting system keeps running. The reporting functionality is muted on all but the leader node, which prevents duplicate reports from being sent for each alert.
Configuring Elasticsearch-based High Availability
Siren Alert can use Elasticsearch as the coordination backend for high availability. This allows all nodes in the cluster to coordinate leader election and failover using Elasticsearch, without relying on a local file system or other external mechanisms.
To enable and configure Elasticsearch-based HA, follow these steps:
Prerequisites
If Siren Alert cluster mode was used in the past, remove sentinl.settings.cluster and any nested properties from the investigate.yml file.
Ensure you have created a sirenalert user and that these settings are present on each node. See the security setup for more details.
....
investigate_access_control:
  sirenalert:
    elasticsearch:
      username: sirenalert
      password: password
....
Setup
- Set the following configuration in your investigate.yml file on each Investigate node in the cluster and then restart each node.
....
investigate_core:
  instance_name: <unique name for each node in the cluster>
  ....
high_availability:
  enabled: true
  timeout_ms: 1000   # optional
  refresh_ms: 2000   # optional
  ttl_ms: 5000       # optional
  log_level: silent  # optional
....
Ensure that you set investigate_core.instance_name to a unique name for each node.
If the instance_name is not set, it defaults to a randomly generated UUIDv4 ID. This makes later debugging much more difficult.
The defaults for the optional settings are shown above. They are generally sufficient for most installations, so the optional settings do not need to be added to the configuration.
+
timeout_ms::
A timeout (in milliseconds) for each individual request made by a node to Elasticsearch for HA coordination.
If a request takes longer than this value, it is considered failed.
This helps nodes quickly detect communication issues or failures.
+
refresh_ms::
The interval (in milliseconds) at which each node sends coordination requests to Elasticsearch.
A smaller value means nodes will check more frequently for leader status changes, resulting in faster leader election and failover.
However, setting this too low may cause excessive load on Elasticsearch due to frequent requests.
Adjust this value to balance responsiveness and system load.
+
ttl_ms::
The time-to-live (in milliseconds) for the elected leader node status in Elasticsearch.
If the leader does not update its status within this period, it is considered dead, and a new leader election will be triggered.
This ensures that leadership is transferred promptly if the current leader becomes unavailable.
+
log_level::
Controls the verbosity of HA-related logging. Possible values are silent, info, warn, error, or debug.
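To make the interplay between refresh_ms and ttl_ms more concrete, the following Python sketch models a lease-style leader election in memory. It is not Siren Alert's implementation: the helper names (`effective_settings`, `LeaderRecord`, `refresh`), the in-memory record standing in for the coordination document in Elasticsearch, and the exact expiry comparison are all assumptions made for illustration. Only the default values and the list of log levels come from the settings described above.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults and log levels as listed in the configuration above.
DEFAULTS = {"timeout_ms": 1000, "refresh_ms": 2000, "ttl_ms": 5000, "log_level": "silent"}
LOG_LEVELS = {"silent", "info", "warn", "error", "debug"}


def effective_settings(user_settings: dict) -> dict:
    """Hypothetical helper: overlay user-supplied settings on the documented defaults."""
    merged = {**DEFAULTS, **user_settings}
    if merged["log_level"] not in LOG_LEVELS:
        raise ValueError(f"unknown log_level: {merged['log_level']}")
    return merged


@dataclass
class LeaderRecord:
    """In-memory stand-in for the HA coordination document (real schema is an assumption)."""
    holder: Optional[str] = None
    renewed_at_ms: int = 0


def refresh(record: LeaderRecord, node: str, now_ms: int, ttl_ms: int) -> bool:
    """Run one coordination tick for `node`; return True if it is the leader afterwards.

    The current leader renews its lease. Any other node takes over only when
    the lease has not been renewed for longer than ttl_ms.
    """
    expired = record.holder is None or now_ms - record.renewed_at_ms > ttl_ms
    if record.holder == node or expired:
        record.holder = node
        record.renewed_at_ms = now_ms
        return True
    return False
```

With the default values, suppose node-a renews the lease at t = 0 and then stops. If node-b calls `refresh` every 2000 ms starting at t = 1000, its calls at t = 1000, 3000 and 5000 return False, and the call at t = 7000 returns True because more than ttl_ms has passed since the last renewal. In this model, a lower refresh_ms or ttl_ms shortens that gap at the cost of more frequent requests. A real deployment would also need the takeover to be an atomic operation on the shared document, which this sketch does not model.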
Notes
- There is a very small chance that an alert is lost if its execution occurs at the exact moment a new leader node is elected.
- The special document in the .siren index that is used for HA should not be deleted or modified manually.
For more information, see the main Siren Alert documentation or contact support.