NXLog Agent high availability (HA)
The two main components of HA are failover for fault tolerance and load balancing for maintaining an acceptable level of performance. NXLog Agent ships with built-in failover capabilities and integrates with third-party load-balancing solutions.
Failover
Failover nodes can be configured in several ways. The two most common models are active/passive and active/active:
- Active/passive: One node in the cluster is the active node, while the other nodes are on standby. If the active node fails, the cluster elects one of the standby nodes to become the new active node.
- Active/active: All nodes in the cluster are active, effectively performing load balancing. If one of the nodes fails, the cluster directs its traffic to one or more of the remaining nodes. This model leaves no nodes idle and, depending on the number of nodes, can increase performance.
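The difference between the two models can be sketched in a few lines of Python. This is an illustrative toy model only, not how NXLog Agent or any particular cluster manager implements failover. The `Node`, `ActivePassiveCluster`, and `ActiveActiveCluster` names are hypothetical, and node health is set by hand instead of being detected.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class ActivePassiveCluster:
    """One active node; the others stand by until the active node fails."""
    nodes: list
    active_index: int = 0

    def active(self) -> Node:
        # Elect the first healthy node if the current active node is down.
        if not self.nodes[self.active_index].healthy:
            for i, node in enumerate(self.nodes):
                if node.healthy:
                    self.active_index = i
                    break
            else:
                raise RuntimeError("no healthy node available")
        return self.nodes[self.active_index]

    def route(self, events):
        # Every event goes to the single active node.
        return {event: self.active().name for event in events}


@dataclass
class ActiveActiveCluster:
    """All healthy nodes share the traffic; failed nodes are skipped."""
    nodes: list

    def route(self, events):
        healthy = [n for n in self.nodes if n.healthy]
        if not healthy:
            raise RuntimeError("no healthy node available")
        # Simple round-robin over the nodes that are still up.
        return {e: healthy[i % len(healthy)].name for i, e in enumerate(events)}
```

In the active/passive cluster, marking the active node as unhealthy moves all traffic to one standby node. In the active/active cluster, the same failure only shrinks the pool that shares the traffic.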
Failover can also be classified by where it is managed. Some applications, like NXLog Agent, implement their own failover mechanisms (self-managed failover), while specialized software like HAProxy provides load balancing and failover from outside the application (externally managed failover):
- Self-managed failover: Also known as same-tier failover, this model operates entirely within the application and needs no additional external hosts. Each node in the cluster contains a failover configuration.
- Externally managed failover: Nodes are unaware of their peers and do not contain any configuration that defines them as part of a cluster. The cluster relies entirely on an external host to determine the active nodes and the viable peers. See Emulating Active/Passive Application Clustering with HAProxy for an example.
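As a rough illustration of the self-managed case, the sketch below shows a sender that carries its own ordered list of peers and moves to the next one when a connection attempt fails. It is a generic Python example under stated assumptions, not NXLog Agent code. The `connect_with_failover` function and its `connect` parameter are hypothetical names introduced here.

```python
import socket
from typing import Callable, Sequence, Tuple

Address = Tuple[str, int]


def tcp_connect(address: Address):
    """Default connector: open a TCP connection with a short timeout."""
    return socket.create_connection(address, timeout=3)


def connect_with_failover(
    peers: Sequence[Address],
    connect: Callable[[Address], object] = tcp_connect,
):
    """Try each peer in order; return (address, connection) for the first success."""
    errors = []
    for address in peers:
        try:
            return address, connect(address)
        except OSError as exc:
            # This peer is unreachable; remember why and try the next one.
            errors.append((address, exc))
    raise ConnectionError(f"all peers failed: {errors}")
```

The peer list lives in the sender itself, which is what makes this failover self-managed. In an externally managed setup, the sender would know only a single address, and an external host such as a proxy would decide which node receives the traffic.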
NXLog Agent includes built-in support for self-managed and externally managed failover, offering flexibility for different use cases.
The following diagram illustrates NXLog Agent instances collecting telemetry data from different sources and forwarding it to an NXLog Agent relay cluster. Each agent can define the same or different active and passive nodes. See Configure a relay cluster in failover mode for configuration instructions.
This architecture works with agent-based data collection. It works with agentless data collection only if the source supports failover configuration.
NXLog Agent modules that support failover
All network-based NXLog Agent modules support failover:
- NXLog Transport (im_batchcompress, om_batchcompress)
- DBI (im_dbi)
- Elasticsearch (om_elasticsearch)
- Raijin (om_raijin)
- Redis (im_redis)
- Remote Management (xm_admin)
- UDP (im_udp, om_udp, om_udpspoof)
Load balancing
The primary objective of a load balancer is to distribute the workload across a cluster of nodes to mitigate performance issues, such as:
- Sudden spikes in workload could overwhelm a standalone system, possibly bringing it down.
- In an unmanaged cluster, some compute resources will inevitably work at peak capacity while others sit idle due to the random nature of tasks and network traffic. A load balancer can efficiently distribute network traffic and tasks across multiple nodes, optimizing resource utilization and maximizing throughput.
You can also use load balancing to scale out your application and increase its performance and redundancy.
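One simple way to picture how a load balancer evens out utilization is a least-loaded policy, in which each new task goes to the node that has the least work so far. The sketch below is a minimal illustration of that idea only. It does not describe the algorithm of any specific load balancer, and the `assign_least_loaded` function and its numeric task costs are hypothetical.

```python
def assign_least_loaded(task_costs, node_names):
    """Assign each task to the node with the lowest accumulated load."""
    load = {name: 0 for name in node_names}
    assignment = {name: [] for name in node_names}
    for cost in task_costs:
        # Pick the least-loaded node; ties go to the node listed first.
        target = min(load, key=load.get)
        load[target] += cost
        assignment[target].append(cost)
    return load, assignment
```

Real load balancers typically rely on live signals such as open connections or health checks instead of known task costs, but the principle of steering work away from busy nodes is the same.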
Load balancing works in active/active mode, providing the most cost-effective use of a cluster and the highest performance. When comparing the following diagram with the previous one for Failover, two significant differences are apparent:
- An additional NXLog Agent Failover tier acts as a mediator between the Data Sources and the NXLog Agent Load Balancing cluster.
- All nodes in the NXLog Agent Load Balancing cluster are now active with no idle nodes.
For this reason, load balancers are associated first and foremost with performance rather than fault tolerance.
See Configure NXLog Agent load balancing for configuration examples.