Luis Ashurei

Thinking in Data

How to verify that the Elasticsearch sharding mechanism is working

Elasticsearch's distribution looks as simple as magic: all you need to do is start the server and enjoy.
Still, you need to understand how it works, don't you?

-- Elasticsearch is a peer-to-peer based system: nodes communicate with one another directly when operations are delegated or broadcast.
So how do we verify it? Two words: disable replication.


Download the latest stable version, unzip it, and make a copy of the directory to simulate a second node.

Modify `config/elasticsearch.yml` (create it if it does not exist)

# change the default cluster name to avoid possible conflicts
cluster.name: "my_elasticsearch"

# node name
node.name: "node1" # use "node2" on the second one

# disable replication
index.number_of_replicas: 0

Start up both servers


Check the cluster status

You should see two nodes. Next, index some data to see the sharding in action.
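
As a rough sketch of that check, the snippet below reads the cluster health endpoint and condenses the reply into one line. The base URL and the helper names `fetch_health` and `summarize_health` are assumptions for illustration; the fields it reads (`status`, `number_of_nodes`, `active_shards`, `unassigned_shards`) are part of the standard `_cluster/health` reply.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://localhost:9200"):
    """GET /_cluster/health from a running node and parse the JSON reply."""
    with urlopen(base_url + "/_cluster/health") as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_health(health):
    """Condense a cluster health reply into a single readable line."""
    return (
        "status={status} nodes={number_of_nodes} "
        "active_shards={active_shards} "
        "unassigned_shards={unassigned_shards}"
    ).format(**health)
```

With both nodes running, `print(summarize_health(fetch_health()))` should report `nodes=2`.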


Save ten records

$ curl -XPUT 'http://localhost:9200/_bulk' -d '
  {"line_id":1,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"ACT I"}
  {"line_id":2,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"SCENE I. London. The palace."}
  {"line_id":3,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"}
  {"line_id":4,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.1","speaker":"KING HENRY IV","text_entry":"So shaken as we are, so wan with care,"}
  {"line_id":5,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.2","speaker":"KING HENRY IV","text_entry":"Find we a time for frighted peace to pant,"}
  {"line_id":6,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.3","speaker":"KING HENRY IV","text_entry":"And breathe short-winded accents of new broils"}
  {"line_id":7,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.4","speaker":"KING HENRY IV","text_entry":"To be commenced in strands afar remote."}
  {"line_id":8,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.5","speaker":"KING HENRY IV","text_entry":"No more the thirsty entrance of this soil"}
  {"line_id":9,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.6","speaker":"KING HENRY IV","text_entry":"Shall daub her lips with her own childrens blood;"}
  {"line_id":10,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.7","speaker":"KING HENRY IV","text_entry":"Nor more shall trenching war channel her fields,"}
'
(If you are on Windows, you can use Firefox + RESTClient to submit the data, just like I did.)
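
Keep in mind that the bulk API expects newline-delimited JSON in which every document is preceded by an action line, and the body must end with a newline. A minimal sketch of building such a body follows. The index name `shakespeare` and type `line` mentioned in the usage note are hypothetical placeholders, not values from this post, and the `_type` field applies only to releases of this era.

```python
import json


def build_bulk_body(docs, index, doc_type, id_field="line_id"):
    """Return a bulk request body: one action line, then one source line, per doc.

    `index` and `doc_type` are supplied by the caller; `_type` is only
    accepted by older Elasticsearch releases.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk endpoint requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"
```

For example, `build_bulk_body(records, "shakespeare", "line")` produces text that can be sent as the request body of `_bulk`, where `records` is a list of dictionaries like the ten above.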

See the result via `http://localhost:9200/_search?q=*`; note that the total number of records is 10.


First, let's check the sharding status by looking at the index folder

One node has 2 shards (folders) and the other has 3.
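
If you would rather not eyeball the folders, a small helper can list them. This sketch assumes each shard is stored as a numerically named subdirectory of the index folder inside a node's data directory; the exact path depends on your version and settings, so `index_dir` is left for you to supply, and `list_shard_dirs` is a hypothetical helper name.

```python
import os


def list_shard_dirs(index_dir):
    """Return the shard numbers found as numeric subdirectories of index_dir."""
    return sorted(
        int(name)
        for name in os.listdir(index_dir)
        if name.isdigit() and os.path.isdir(os.path.join(index_dir, name))
    )
```

Calling it once per node on that node's copy of the index folder should show the shard numbers split between the two nodes.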

The query parameter `routing` can be used to search only a specific shard

-- only returns 4 records
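
Why does a routed search see only some of the records? Elasticsearch picks a shard as `hash(routing) % number_of_primary_shards`, with the routing value defaulting to the document id, so a search with a given `routing` value is sent only to the shard that value maps to. The sketch below illustrates the idea. Note that `zlib.crc32` is only a stand-in and not the hash function Elasticsearch actually uses, so the counts it produces will not match a real cluster.

```python
import zlib
from collections import Counter


def pick_shard(routing, num_primary_shards):
    """Map a routing value to a shard number (stand-in hash, not Elasticsearch's)."""
    return zlib.crc32(str(routing).encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards=5):
    """Count how many documents land on each shard."""
    return Counter(pick_shard(doc_id, num_primary_shards) for doc_id in doc_ids)


if __name__ == "__main__":
    # Ten ids spread over five primary shards, as in this walkthrough.
    for shard, count in sorted(distribute(range(1, 11)).items()):
        print("shard", shard, "holds", count, "documents")
```

Because the mapping is deterministic, the same routing value always reaches the same shard, which is what makes shard-specific searches possible.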

Then shut down the second node

$ curl -XPOST 'http://localhost:9200/_cluster/nodes/node2/_shutdown'

Query again to see

Oh yes, only the data on the live shards is returned.
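
A search reply also tells you when it is incomplete: its `_shards` section reports how many shards the query targeted and how many answered. The helpers below are a small sketch for reading those fields from a parsed reply; the function names are illustrative.

```python
def total_hits(search_response):
    """Return the hit count from a parsed search reply."""
    total = search_response["hits"]["total"]
    # Older releases report a plain integer; newer ones wrap it in an object.
    return total["value"] if isinstance(total, dict) else total


def is_partial(search_response):
    """True when fewer shards answered than the query targeted."""
    shards = search_response["_shards"]
    return shards["successful"] < shards["total"]
```

After node2 goes down, `is_partial` should return `True` for the same query, and `total_hits` should drop below 10.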

Check the cluster health, by the way

The status is "red", which means at least one primary shard is not available (if replication is enabled and node2 is down, the status will be "yellow").
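
The colour rule can be written down in a few lines. This is a simplified sketch of how the status follows from unassigned shards, not Elasticsearch's own code, and `health_color` is a hypothetical helper.

```python
def health_color(unassigned_primaries, unassigned_replicas):
    """Derive the cluster status colour from the unassigned shard counts."""
    if unassigned_primaries > 0:
        return "red"     # some data cannot be reached at all
    if unassigned_replicas > 0:
        return "yellow"  # all data is reachable, but some copies are missing
    return "green"       # every primary and replica is assigned
```

With replication disabled, the shards that lived on node2 have no other copy, so their primaries become unassigned and the result is "red". With one replica per shard, the surviving node still holds a copy of everything, and only the replicas lack a home, which gives "yellow".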

In Conclusion

Now you know it works as advertised. Don't disable replication unless you don't need reliability.
The next steps are to read up on each configuration setting, test, plan capacity, and so on.


25 Sep 2014