Suo Lu

Thinking in Data

| | email

How to verify elasticsearch sharding mechanism working

Elasticsearch distribution is looks as simple as a "magician", what you need to do is just start the server then enjoy.
However anyway you need to understand how it works, isn't it?

-- Elasticsearch is a peer to peer based system, nodes communicate with one another directly if operations are delegated / broadcast.
So how to verify? Two words: disable replication.

Prepare

Download the latest stable version, unzip it and clone one to simulate a second node.

Modify `config/elasticsearch.yml` (create if not exist)

# change the default name to avoid possible conflict
cluster.name: "my_elasticsearch"

# node name
node.name: "node1" # use "node2" in second one

# disable replication
index.number_of_replicas: 0

Start-up servers

/elasticsearch-home-folder/node1/bin/elasticsearch
/elasticsearch-home-folder/node2/bin/elasticsearch

Check cluster status

http://localhost:9200/_cluster/health?pretty=true
http://localhost:9200/_cluster/state
As you can see there are two nodes. Next is input data to see sharding.

Index

Save ten records

$ curl -XPUT 'http://localhost:9200/_bulk' -d '
  {"index":{"_index":"shakespeare","_type":"act","_id":0}}
  {"line_id":1,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"ACT I"}
  {"index":{"_index":"shakespeare","_type":"scene","_id":1}}
  {"line_id":2,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"SCENE I. London. The palace."}
  {"index":{"_index":"shakespeare","_type":"line","_id":2}}
  {"line_id":3,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"}
  {"index":{"_index":"shakespeare","_type":"line","_id":3}}
  {"line_id":4,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.1","speaker":"KING HENRY IV","text_entry":"So shaken as we are, so wan with care,"}
  {"index":{"_index":"shakespeare","_type":"line","_id":4}}
  {"line_id":5,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.2","speaker":"KING HENRY IV","text_entry":"Find we a time for frighted peace to pant,"}
  {"index":{"_index":"shakespeare","_type":"line","_id":5}}
  {"line_id":6,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.3","speaker":"KING HENRY IV","text_entry":"And breathe short-winded accents of new broils"}
  {"index":{"_index":"shakespeare","_type":"line","_id":6}}
  {"line_id":7,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.4","speaker":"KING HENRY IV","text_entry":"To be commenced in strands afar remote."}
  {"index":{"_index":"shakespeare","_type":"line","_id":7}}
  {"line_id":8,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.5","speaker":"KING HENRY IV","text_entry":"No more the thirsty entrance of this soil"}
  {"index":{"_index":"shakespeare","_type":"line","_id":8}}
  {"line_id":9,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.6","speaker":"KING HENRY IV","text_entry":"Shall daub her lips with her own childrens blood;"}
  {"index":{"_index":"shakespeare","_type":"line","_id":9}}
  {"line_id":10,"play_name":"Henry IV","speech_number":1,"line_number":"1.1.7","speaker":"KING HENRY IV","text_entry":"Nor more shall trenching war channel her fields,"}
'
(If you are in Windows you can use Firefox + RESTClient to input data just like I did.)

See result via `http://localhost:9200/_search?q=*`, note total number of records is 10.

Verify

Firstly let's check sharding status by look index folder

/elasticsearch-home-folder/node1/data/my_elasticsearch/nodes/0/indices/shakespeare/
/elasticsearch-home-folder/node1/data/my_elasticsearch/nodes/0/indices/shakespeare/
A node have 2 shards (folders) and the other have 3.

Query Parameter `routing` can be use for search only specific shard

http://localhost:9200/_search?q=*&routing=1,3
-- only return 4 records

Then shutdown the second node

$ curl -XPOST 'http://localhost:9200/_cluster/nodes/node2/_shutdown'

Query to see

http://localhost:9200/_search?q=*
Oh yes only data on live sharding are returned.

Check cluster health by the way

http://localhost:9200/_cluster/health?pretty=true
The status is "red" means some shard is not available (If cluster replication is enabled and node2 is down, status will be "yellow").

In Conclusion

Now you know it is working as said, don't disable replication unless you don't want reliability.
OK next step is read each of setting property, testing, capacity planning etc.

Reference

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery.html
http://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch

25 Sep 2014