Elasticsearch Cluster Settings

Configure elasticsearch cluster

For high availability clusters we do recommend having at least three nodes running to avoid having a single point of failure or a split-brain issue. Having 3 nodes, if one node go down, your cluster will always have at least one master and one slave node up and running and a correct percentage of shards allocated. EnterMedia recommended architecture includes two full EnterMedia nodes running on a single host sharing a data volume or storage resource and one Elastic-only node in a remote host that will also require access to the storage volume. For high availability we also recommend to replicate this architecture in a secondary data center, setting up EnterMedia's Pulling Feature to keep both clusters in-sync.

EnterMedia Nodes

Set up and mount a data storage resource in the host machine. We recommend to link the mounted resource in the EnterMedia folder structure: /media/emsites/entermedianode/data. Get a copy of the EnterMedia Deploy script:

 curl -o entermedia-docker.sh -jL docker.entermediadb.org

Uncomment the port configuration parameters in the Docker run command:

 docker run -t -d \ ... -p 93$NODENUMBER:9300 \ -p 92$NODENUMBER:9200 \ ...

This will add unique and custom transport and publish nodes to the ElasticSearch configuration. Default ElasticSearch communication ports are 9200 and 9300, a node built with node id 10 within this script will be configured to publish and transport on 9210 and 9310 ports. *You need to set your node numbers below 100 to make the final custom ElasticSearch ports four number long. Install 2 EnterMedia nodes sharing the same instance name but different node id:

 $ sudo ./entermedia-docker.sh entermedianode 10 $ sudo ./entermedia-docker.sh entermedianode 20

Configure node.xml

Stop both nodes (/media/emsites/entermedianode/XX/stop.sh) and edit the node.xml file (/media/emsites/entermedianode/XX/tomcat/config/node.xml), here you need to specify the other node IP and the local node publishing host and port. You need to repeat the changes on the 2 nodes node.xml file.

 <node id="entermedianode10" name="Cluster Node 01"> <property id="cluster.name">unique-cluster</property> <!--ELASTIC PATHS CONFIG--> <property id="path.data">${webroot}/WEB-INF/elastic/${nodeid}/data</property> <property id="path.work">${webroot}/WEB-INF/elastic/${nodeid}/work</property>  <!-- Repos path need to be shared r/w for all the Nodes --> <property id="repo.root.location">${webroot}/WEB-INF/elastic/repos</property>       <property id="path.repo">${webroot}/WEB-INF/elastic/repos</property> <!--CLUSTER CONFIG--> <property id="network.bind_host">0.0.0.0</property> <property id="discovery.zen.ping.unicast.hosts">192.168.100.100:9320</property> <property id="network.publish_host">192.168.100.100</property> <property id="transport.publish_port">9310</property> </node>

network.bind_host - Network interface(s) a node should bind
network.publish_host - Single interface that the node advertises to other nodes in the cluster
discovery.zen.ping.unicast.hosts - Comma-separated list of the cluster nodes with custom port
transport.publish_port - Custom transport publishing port

You can also setup Low and High disk usage watermarks for Elasticsearch. Low watermark defaults to 85%, meaning ES will not allocate new shards to nodes once they have more than 85% disk used. High watermark defaults to 90%, meaning ES will attempt to relocate shards to another node if the node disk usage rises above 90%. You can use % or absolute value (like 10gb).

 <property id="cluster.routing.allocation.disk.watermark.low">10gb</property> <property id="cluster.routing.allocation.disk.watermark.high">5gb</property>

Elasticsearch Nodes

For each nodes, the configuration is yml friendly

Go to

 /media/cluster/$SERVER/$NODE/config/elasticsearch.yml

You can configure your node the following way

 cluster.name:  network.publish_host:  discovery.zen.ping.unicast.hosts:  #comma sepparated discovery.zen.minimum_master_nodes:  discovery.zen.fd.ping_interval: 20s discovery.zen.fd.ping_timeout: 30s discovery.zen.fd.ping_retries: 6 path.repo: /opt/entermediadb/webapp/WEB-INF/elastic/repos node.name:  node.master: true node.data: true network.bind_host: 0.0.0.0 http.port: 9200 transport.publish_port: 9300 analysis: analyzer: tags: type: "custom" tokenizer: "keyword" filter: - "lowercase" lowersnowball: tokenizer: "standard" filter: - standard - lowercase - stemfilter filter: stemfilter: type: "snowball" language: "English"

Firewall

For easier to maintain firewall rules using tcp proxies please refrain to: IPTables Docker and NGINX configuration

To setup a two (or more) machine elasticsarch cluster and restrict the access to it it's needed to add some rules to your iptables on both machine.

The Docker daemon is by default writting some rules each time it starts under the DOCKER chain, you need to bypass it by doing the following:

MACHINE1=181.218.144.5
MACHINE2=181.218.144.6
LOCAL_CONTAINER=172.12.0.12

1. iptables -N CUSTOM_CLIENT
2. iptables -A CUSTOM_CLIENT -s $MACHINE2 -p tcp --dport 9200 -d $LOCAL_CONTAINER -j ACCEPT
3. iptables -A CUSTOM_CLIENT -s $MACHINE2 -p tcp --dport 9300 -d $LOCAL_CONTAINER -j ACCEPT
4. iptables -A FORWARD -s $MACHINE2 -p tcp --dport 9200 -d $LOCAL_CONTAINER -j ACCEPT
5. iptables -A FORWARD -s $MACHINE2 -p tcp --dport 9300 -d $LOCAL_CONTAINER -j ACCEPT
6. iptables -t nat -A PREROUTING -p tcp -m tcp --dport 9214 -j DNAT --to-destination $LOCAL_CONTAINER:9200
7. iptables -t nat -A PREROUTING -p tcp -m tcp --dport 9314 -j DNAT --to-destination $LOCAL_CONTAINER:9300
8. iptables -R DOCKER 1 -p tcp --source 0.0.0.0/0 --destination $LOCAL_CONTAINER --dport 9314 -j CUSTOM_CLIENT
9. iptables -R DOCKER 2 -p tcp --source 0.0.0.0/0 --destination $LOCAL_CONTAINER --dport 9214 -j CUSTOM_CLIENT

Line 8 and 9 allows you to overwrite the rules created by the Docker daemon when starting your instance with exposed ports (9314 and 9214 here in that example). Depending on how many instances you'll be running on your host machine you might have more than two rules in your DOCKER chain. Be aware of the rules you're trying to overwrite or you might break some critical connection between your Docker container and the host machine network interface.

You can now test your setup by running a curl from your workstation on $MACHINE1:9314 then run the same curl from MACHINE2. You should only get a results from MACHINE2 and the curl from your workstation should be dropped by the MACHINE1 firewall.

Start the Cluster

Start each node individually:

 $ /media/emsites/entermedianode/10/start.sh $ /media/emsites/entermedianode/20/start.sh

And you can verify the startup sequence in the logs (/media/emsites/entermedianode/10/logs.sh) or run the status tools in the next section. At this point you will need to setup your NGINX to be able to make public your instances in the browser.

Cluster Status Tools

We provide a script with some basics ElasticSearch API endpoints. Run it to get some cluster information. Find the health.sh script in your scripts area (/media/emsites/node01/11/health.sh).

 $ ./health.sh [nodes | allocation | shards]

nodes - Provides full information about all the nodes in the cluster.
allocation - Provides a snapshot of how many shards are allocated to each data node and how much disk space they are using.
shards - Detailed view of what nodes contain which shards.

Critical information to monitor in the default health.sh command output are the status, should be always green, and the active_shards_percent_as_number, should be 100%.

 { "cluster_name" : "unique-cluster", "status" : "green", "timed_out" : false, "number_of_nodes" : 2, "number_of_data_nodes" : 2, "active_primary_shards" : 50, "active_shards" : 100, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 100.0 }

Elastic-only Node (Data Node)

Now follow the instructions to setup an Elastic-only Node for your cluster.

Elasticsearch Cluster Settings

Configure elasticsearch cluster

EnterMedia Nodes

Configure node.xml

Elasticsearch Nodes

Firewall

Start the Cluster

Cluster Status Tools

Elastic-only Node (Data Node)

Versions

API - 10

Setup - 10

Features - 10

Administration - 10