EnterMedia Clustering Setup

For high availability clusters we recommend running at least three nodes to avoid a single point of failure or a split-brain issue. With three nodes, if one node goes down, your cluster will always have at least one master and one slave node up and running and a correct percentage of shards allocated.

EnterMedia recommends an architecture that includes two full EnterMedia nodes running on a single host sharing a data volume or storage resource, and one Elastic-only node on a remote host that will also require access to the storage volume.

For high availability we also recommend replicating this architecture in a secondary data center, setting up EnterMedia's Pulling Feature to keep both clusters in sync.

 

EnterMedia Nodes

Set up and mount a data storage resource on the host machine:

We recommend linking the mounted resource in the EnterMedia folder structure: /media/emsites/entermedianode/data. 
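
As a rough sketch, a mounted NFS share could be linked into that structure as follows (nfs.example.com:/export/emshare and the mount point are placeholders, adjust to your own storage resource):

 $ sudo mkdir -p /mnt/emshare
 $ sudo mount -t nfs nfs.example.com:/export/emshare /mnt/emshare
 $ sudo mkdir -p /media/emsites/entermedianode
 $ sudo ln -s /mnt/emshare /media/emsites/entermedianode/data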

Get a copy of the EnterMedia Deploy script:

 $ curl -o entermedia-docker.sh -jL docker.entermediadb.org
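
Depending on how the file is downloaded, you may also need to make the script executable before running it in the following steps:

 $ chmod +x entermedia-docker.sh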

Uncomment the port configuration parameters in the Docker run command:

 $ docker run -t -d \
     ...
     -p 93$NODENUMBER:9300 \
     -p 92$NODENUMBER:9200 \
     ...

This will add unique, custom transport and publish ports to the ElasticSearch configuration.

The default ElasticSearch communication ports are 9200 and 9300; a node built with node ID 10 through this script will be configured to publish and transport on ports 9210 and 9310.

*You need to keep your node numbers below 100 so that the final custom ElasticSearch ports are four digits long.
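
For example, with node ID 10 the $NODENUMBER substitution above resolves to the following ports (a quick shell check, just to illustrate the naming scheme):

 $ NODENUMBER=10
 $ echo "HTTP: 92$NODENUMBER, transport: 93$NODENUMBER"
 HTTP: 9210, transport: 9310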

Install 2 EnterMedia nodes sharing the same instance name but different node IDs:

 $ sudo ./entermedia-docker.sh entermedianode 10
 $ sudo ./entermedia-docker.sh entermedianode 20
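
Once both installs have finished, you can check that the containers are up and that ElasticSearch answers on the custom HTTP ports; this is a hedged check that assumes ports 9210 and 9220 (node IDs 10 and 20) are published on the host as configured above:

 $ docker ps
 $ curl -s http://localhost:9210
 $ curl -s http://localhost:9220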

 

Configure node.xml

Stop both nodes (/media/emsites/entermedianode/XX/stop.sh) and edit the node.xml file (/media/emsites/entermedianode/XX/tomcat/config/node.xml).
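
For the two nodes installed above, that means:

 $ bash /media/emsites/entermedianode/10/stop.sh
 $ bash /media/emsites/entermedianode/20/stop.sh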

Here you need to specify the other node's IP address and the local node's publishing host and port. Repeat the changes in the node.xml file of both nodes.

 <node id="entermedianode10" name="Cluster Node 01">
   <property id="cluster.name">unique-cluster</property>
   <!--ELASTIC PATHS CONFIG-->
   <property id="path.data">${webroot}/WEB-INF/elastic/${nodeid}/data</property>
   <property id="path.work">${webroot}/WEB-INF/elastic/${nodeid}/work</property>
   <!--CLUSTER CONFIG-->
   <property id="network.bind_host">0.0.0.0</property>
   <property id="discovery.zen.ping.unicast.hosts">192.168.100.100:9320</property>
   <property id="network.publish_host">192.168.100.100</property>
   <property id="transport.publish_port">9310</property>
 </node>
  • network.bind_host - Network interface(s) a node should bind to
  • network.publish_host - Single interface that the node advertises to other nodes in the cluster
  • discovery.zen.ping.unicast.hosts - Comma-separated list of the cluster nodes with custom port
  • transport.publish_port - Custom transport publishing port
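
Before moving on to the firewall, it can be useful to confirm that the configured publish host and custom transport ports are reachable from the remote Elastic-only host; a minimal check using netcat and the example values above (replace with your own IP and ports):

 $ nc -zv 192.168.100.100 9310
 $ nc -zv 192.168.100.100 9320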

 

Firewall

To set up a two (or more) machine ElasticSearch cluster and restrict access to it, you need to add some rules to the iptables configuration on both machines.

By default the Docker daemon writes rules under the DOCKER chain each time it starts; you need to bypass these by doing the following:

MACHINE1=181.218.144.5
MACHINE2=181.218.144.6
LOCAL_CONTAINER=172.12.0.12

 $ iptables -N CUSTOM_CLIENT
 $ iptables -A CUSTOM_CLIENT -s $MACHINE2 -p tcp --dport 9200 -d $LOCAL_CONTAINER -j ACCEPT
 $ iptables -A CUSTOM_CLIENT -s $MACHINE2 -p tcp --dport 9300 -d $LOCAL_CONTAINER -j ACCEPT
 $ iptables -A FORWARD -s $MACHINE2 -p tcp --dport 9200 -d $LOCAL_CONTAINER -j ACCEPT
 $ iptables -A FORWARD -s $MACHINE2 -p tcp --dport 9300 -d $LOCAL_CONTAINER -j ACCEPT
 $ iptables -t nat -A PREROUTING -p tcp -m tcp --dport 9214 -j DNAT --to-destination $LOCAL_CONTAINER:9200
 $ iptables -t nat -A PREROUTING -p tcp -m tcp --dport 9314 -j DNAT --to-destination $LOCAL_CONTAINER:9300
 $ iptables -R DOCKER 1 -p tcp --source 0.0.0.0/0 --destination $LOCAL_CONTAINER --dport 9314 -j CUSTOM_CLIENT
 $ iptables -R DOCKER 2 -p tcp --source 0.0.0.0/0 --destination $LOCAL_CONTAINER --dport 9214 -j CUSTOM_CLIENT

Lines 8 and 9 allow you to overwrite the rules created by the Docker daemon when it starts your instance with exposed ports (9314 and 9214 in this example). Depending on how many instances you run on the host machine, you might have more than two rules in your DOCKER chain. Be aware of which rules you are overwriting, or you might break a critical connection between your Docker container and the host machine's network interface.
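
To see exactly which rule numbers you are about to replace, you can list the DOCKER chain with line numbers first:

 $ iptables -L DOCKER -n -v --line-numbers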

You can now test your setup by running a curl from your workstation against $MACHINE1:9314, then running the same curl from MACHINE2. You should only get a result from MACHINE2; the curl from your workstation should be dropped by MACHINE1's firewall.
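
A minimal sketch of that test, using the example IPs above and the exposed HTTP port 9214 (which returns a readable ElasticSearch response when the connection is allowed):

 # From MACHINE2 - should return a JSON response
 $ curl -s --max-time 5 http://181.218.144.5:9214
 # From any other workstation - should time out, dropped by the firewall
 $ curl -s --max-time 5 http://181.218.144.5:9214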

 

Start the Cluster

Start each node individually:

 $ bash /media/emsites/entermedianode/10/start.sh
 $ bash /media/emsites/entermedianode/20/start.sh

You can verify the startup sequence in the logs (/media/emsites/entermedianode/10/logs.sh) or run the status tools in the next section. At this point you will need to set up NGINX to make your instances publicly accessible in the browser.
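
As a quick sanity check that both nodes have joined the same cluster, you can also query the ElasticSearch HTTP API directly (assuming node 10's HTTP port 9210 from earlier in this guide):

 $ curl -s 'http://localhost:9210/_cat/nodes?v'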

 

Cluster Status Tools

We provide a script with some basic ElasticSearch API endpoints. Run it to get cluster information. Find the health.sh script in your scripts area (/media/emsites/entermedianode/10/health.sh).

 $ ./health.sh [nodes | allocation | shards]
  • nodes  - Provides full information about all the nodes in the cluster.
  • allocation - Provides a snapshot of how many shards are allocated to each data node and how much disk space they are using.
  • shards - Detailed view of what nodes contain which shards.

Critical information to monitor in the default health.sh output is the status, which should always be green, and the active_shards_percent_as_number, which should be 100%.

 { "cluster_name" : "unique-cluster", "status" : "green", "timed_out" : false, "number_of_nodes" : 2, "number_of_data_nodes" : 2, "active_primary_shards" : 50, "active_shards" : 100, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 100.0 } 

 

Elastic-only Node (Data Node)

Now follow the instructions to set up an Elastic-only Node for your cluster.