Elasticsearch is powerful, but when you run updates across a large index, you might suddenly notice something scary: your cluster slows down, gets stuck, or even blocks writes entirely.
If you've hit this frustrating wall, don't worry. Let's break down why it happens, what's going on under the hood, and how to fix or avoid it.
❓ What's Really Happening?
When you update a document in Elasticsearch, it doesn't update it in place. Instead, Elasticsearch:
- Marks the old document as deleted,
- Indexes a new version of the document.
This means updates are effectively new writes plus old deletions.
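You can see this behavior directly: each update bumps the document's `_version`, while the old copy stays on disk as a delete marker until a segment merge removes it. A tiny illustration (the index name and field are placeholders):

```
PUT /your-index/_doc/1
{ "status": "active" }

POST /your-index/_update/1
{ "doc": { "status": "archived" } }

GET /your-index/_doc/1
```

The response to the final `GET` shows `"_version": 2` — the version-1 copy is not gone yet, just marked deleted.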
In small indexes, this isn't a problem.
But in huge indexes (millions of documents or more):
- Delete markers pile up in massive numbers.
- Segment files get bloated.
- Disk I/O becomes heavy.
- Cluster memory pressure rises.
Eventually, Elasticsearch pauses indexing or blocks updates to protect cluster health.
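You can watch this pressure building: the `docs.deleted` column in the cat indices API shows how many delete markers are waiting to be merged away (`your-index` is a placeholder):

```
GET _cat/indices/your-index?v&h=index,docs.count,docs.deleted,store.size
```

A `docs.deleted` count that keeps climbing during a mass update is an early warning sign of merge pressure.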
You might see errors like:

```
cluster_block_exception
```

or:

```
flood_stage disk watermark exceeded
```
📊 Visual: Lifecycle of an Update

```mermaid
graph LR
    A[Update request] --> B{Old document}
    B -->|Mark as deleted| C[Delete marker created]
    A --> D[New document version indexed]
    C --> E[Segments grow larger]
    D --> E
    E --> F[Merge pressure, disk usage rises]
    F --> G{Cluster may block}
```
🔥 Main Triggers for Blocking
| Cause | What happens |
|---|---|
| Disk watermarks | Elasticsearch blocks writes once disk usage crosses the flood-stage watermark (95% by default) |
| Segment merging pressure | Too many old segments trigger heavy merge operations |
| Memory pressure | High heap usage can cause slowdowns and request rejections |
| Overly aggressive index settings | Very short refresh intervals and unthrottled merges amplify the load |
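Before planning a mass update, it's worth checking which watermark thresholds your cluster is actually running with, including the defaults:

```
GET _cluster/settings?include_defaults=true&filter_path=**.disk*
```

By default, `low` is 85% (no new shards allocated to the node), `high` is 90% (shards relocated away), and `flood_stage` is 95% (indices on the node become read-only).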
🌍 Real World Example
Imagine you have:
- An index of 500 million documents
- You need to update a "status" field across all documents
- You run an `_update_by_query` across the whole index

Without precautions, your cluster may block or crash halfway through!
You might see:

```
[es-data-node] disk usage exceeded flood_stage watermark [95%], blocking writes
```

or:

```
too_many_requests_exception: [rejected execution of coordinating and primary thread due to too many concurrent requests]
```
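Here is a minimal sketch of a throttled version of that same update. The index name, field, and values are placeholders; the query parameters (`requests_per_second`, `scroll_size`, `wait_for_completion`, `conflicts`) are standard `_update_by_query` options:

```
POST /your-index/_update_by_query?requests_per_second=500&scroll_size=1000&wait_for_completion=false&conflicts=proceed
{
  "script": {
    "source": "ctx._source.status = 'archived'",
    "lang": "painless"
  },
  "query": {
    "term": { "status": "active" }
  }
}
```

Because `wait_for_completion=false` returns a task ID instead of blocking, you can follow progress (and cancel if disk pressure rises) via the tasks API:

```
GET _tasks?detailed=true&actions=*byquery
```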
🛡️ Safe Elasticsearch Mass Update Checklist
| Step | Action | ✅ Done |
|---|---|---|
| 1 | 🔍 Assess index size: check doc count, disk size, and shard distribution | ⬜ |
| 2 | 🧹 Delete old data: remove unnecessary docs or indices first | ⬜ |
| 3 | 🚨 Monitor disk watermarks: keep disk usage below 85% | ⬜ |
| 4 | 🛠️ Tune `refresh_interval`: set `index.refresh_interval: -1` (disable auto-refresh during the update) | ⬜ |
| 5 | ⚙️ Batch carefully: use `scroll_size` (e.g., 1000-5000) and control update rates | ⬜ |
| 6 | 📦 Use `_reindex` instead of `_update_by_query` (when possible) | ⬜ |
| 7 | 🧮 Adjust merge settings: slow merging slightly (`index.merge.scheduler.max_thread_count: 1`) | ⬜ |
| 8 | 🚦 Plan for throttling: use `requests_per_second` in `_update_by_query` | ⬜ |
| 9 | 📊 Set up cluster monitoring: watch heap usage, pending merges, and disk I/O | ⬜ |
| 10 | 📈 Temporarily raise the flood stage (only if necessary): bump the `flood_stage` watermark to 98% cautiously (see the example below this table) | ⬜ |
| 11 | 🧪 Test on a small index first: validate the process before the full production run | ⬜ |
| 12 | ✅ Run the update: monitor closely during execution | ⬜ |
| 13 | ♻️ Re-enable the refresh interval: after completion, reset `index.refresh_interval` (e.g., `1s`) | ⬜ |
| 14 | 🚀 Force merge (optional): optimize the index after major updates (see the example at the end of this post) | ⬜ |
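For step 10, the flood-stage watermark can be raised temporarily through the cluster settings API. Treat this as an escape hatch, not a fix:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
  }
}
```

As soon as the update completes, set the value back to `null` to restore the 95% default:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.flood_stage": null
  }
}
```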
📝 Quick Example: Adjust Settings Before and After a Mass Update
Before the mass update:

```
PUT /your-index/_settings
{
  "index": {
    "refresh_interval": "-1",
    "merge.scheduler.max_thread_count": "1"
  }
}
```
After the mass update:

```
PUT /your-index/_settings
{
  "index": {
    "refresh_interval": "1s"
  }
}
```
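Once the updated index is stable and heavy writes have stopped, an optional force merge reclaims the space held by delete markers (step 14 in the checklist):

```
POST /your-index/_forcemerge?max_num_segments=1
```

Force merging is I/O-intensive, so schedule it during a low-traffic window, and only on indices that are no longer receiving frequent writes.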
✨ Conclusion
Updating large Elasticsearch indexes can overwhelm your cluster unless you plan carefully.
By following this guide, you can:
- Update millions of documents safely,
- Avoid cluster slowdowns and outages,
- And maintain peak Elasticsearch performance!
Stay smart, stay scalable! 🚀
💡 Pro Tip: Always test mass update strategies on a non-production clone before executing them on live indices!