Why would you distribute replica shards across nodes and not mirror replicas?

Question

I'm trying to understand how replicas work in Elasticsearch.

This is a very basic question, but can someone explain why one node doesn't simply hold all the replicas? Surely in this example, if one node fails, 50% of the shards will be lost?

AMtwo · Accepted Answer · 2021-03-09T22:48:11.513


In your example, there are two nodes.

The data is split into 4 shards (data sets): A, B, C, D.

Both nodes hold all 4 data sets: each node holds the primary (writable) shard for 2 data sets and the secondary (read-only) replica shard for the other 2.

This allows you to spread the workload across both nodes, with each node serving half of it. In the event of a node failure, the secondary replica shards on the surviving node can be promoted to primary, leaving a single node to serve 100% of the workload. There is no data loss because A, B, C, and D exist on both nodes.
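As a rough illustration, the layout and failover described above can be modelled in a few lines of Python. This is only a toy sketch: the node names (`node1`, `node2`), the `layout` dictionary, and the helpers `fail_node` and `available_shards` are hypothetical and are not part of any Elasticsearch API.

```python
# Toy model of the two-node layout described above (hypothetical names,
# not Elasticsearch API). Each node holds a copy of every shard.
layout = {
    "node1": {"A": "primary", "B": "primary", "C": "replica", "D": "replica"},
    "node2": {"A": "replica", "B": "replica", "C": "primary", "D": "primary"},
}


def fail_node(layout, failed):
    """Return a new layout with `failed` removed and replicas promoted."""
    # Copy the surviving nodes so the original layout is left untouched.
    survivors = {n: dict(s) for n, s in layout.items() if n != failed}
    for shard, role in layout[failed].items():
        if role != "primary":
            continue
        # The lost node held this shard's primary: promote the first
        # surviving replica copy, if one exists.
        for shards in survivors.values():
            if shards.get(shard) == "replica":
                shards[shard] = "primary"
                break
    return survivors


def available_shards(layout):
    """Shards that still have a primary copy on some node."""
    return {s for shards in layout.values()
            for s, role in shards.items() if role == "primary"}


after = fail_node(layout, "node1")
print(after)
print(sorted(available_shards(after)))  # ['A', 'B', 'C', 'D']
```

After `node1` is removed, every shard still has a primary on `node2`, which is why losing one node in this layout does not lose any data.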

edited Mar 09 '21 at 22:48

answered Mar 06 '21 at 18:22


You may want to clarify that this is specifically to distribute writes, because they happen on primary shards first. Reads could be distributed even if there were only one node holding all primary shards. – mustaccio Mar 06 '21 at 18:44
