Mangesh
#elasticsearch

# 🚀 Why Elasticsearch May Block When Updating Large Indexes — And How To Fix It

Elasticsearch is powerful, but when you're running updates on a large index, you might suddenly notice something scary: your cluster slows down, gets stuck, or even blocks entirely.

If you've hit this frustrating wall — don't worry. Let's break down why it happens, what's going on under the hood, and how to fix or avoid it.


โ“ What's Really Happening?

When you update a document in Elasticsearch, it doesn't update it in place. Instead, Elasticsearch:

  1. Marks the old document as deleted,
  2. Indexes a new version of the document.

This means updates are effectively new writes plus old deletions.
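As a concrete sketch (index name, document id, and field values here are hypothetical), even a one-field partial update goes through this delete-and-reindex cycle; the `_version` in the response increments because a whole new document version was written:

```
POST /your-index/_update/1
{
  "doc": { "status": "inactive" }
}
```

The previous version stays on disk, marked as deleted inside its segment, until a merge physically removes it.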

In small indexes, this isn't a problem.

But in huge indexes (millions+ documents):

  • Massive delete markers pile up.
  • Segment files get bloated.
  • Disk I/O becomes heavy.
  • Cluster memory pressure rises.

Eventually, Elasticsearch pauses indexing or blocks updates to protect cluster health.

You might see errors like:

```
cluster_block_exception
```

or

```
flood_stage disk watermark exceeded
```

## 📊 Visual: Lifecycle of an Update

```mermaid
graph LR
A[Update request] --> B{Old document}
B -->|Mark as deleted| C[Delete marker created]
A --> D[New document version indexed]
C --> E[Segments grow larger]
D --> E
E --> F[Merge pressure, disk usage rises]
F --> G{Cluster may block}
```

## 🔥 Main Triggers for Blocking

| Cause | What Happens |
| --- | --- |
| Disk watermarks | Elasticsearch blocks writes once disk usage crosses the flood-stage watermark (95% by default) |
| Segment merging pressure | Too many old segments = heavy merge operations |
| Memory pressure | High heap usage can trigger slowdowns and rejections |
| Overly aggressive index settings | Small refresh intervals, low merge throttling |
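Before a big update, it's worth checking the watermark thresholds and per-node disk usage. A sketch (the defaults shown by `include_defaults` are low 85%, high 90%, flood_stage 95% on recent versions):

```
GET _cluster/settings?include_defaults=true&filter_path=defaults.cluster.routing.allocation.disk

GET _cat/allocation?v
```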

## 📈 Real World Example

Imagine you have:

  • An index of 500 million documents
  • You need to update a "status" field across all documents
  • You run an _update_by_query

Without precautions, your cluster may block or crash halfway through!

You might see:

```
[es-data-node] disk usage exceeded flood_stage watermark [95%], blocking writes
```

or

```
too_many_requests_exception: [rejected execution of coordinating and primary thread due to too many concurrent requests]
```
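A throttled version of that status update might look like the following sketch (index name, field value, and rate are illustrative, not tuned recommendations):

```
POST /your-index/_update_by_query?requests_per_second=500&scroll_size=2000&wait_for_completion=false
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.status = 'archived'"
  }
}
```

`wait_for_completion=false` returns a task id immediately, so the work runs in the background and can be monitored or cancelled instead of tying up the request.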

๐Ÿ›ก๏ธ Safe Elasticsearch Mass Update Checklist

| Step | Action | ✅ Done |
| --- | --- | --- |
| 1 | 🔍 Assess index size: check doc count, disk size, and shard distribution | ⬜ |
| 2 | 🧹 Delete old data: remove unnecessary docs or indices first | ⬜ |
| 3 | 🚨 Monitor disk watermarks: ensure disk usage < 85% | ⬜ |
| 4 | 🛠️ Tune refresh_interval: set `index.refresh_interval: -1` (disable auto-refresh during the update) | ⬜ |
| 5 | ⚙️ Batch carefully: use `scroll_size` (e.g., 1000–5000) and control update rates | ⬜ |
| 6 | 📦 Use `_reindex` instead of `_update_by_query` (when possible) | ⬜ |
| 7 | 🧮 Adjust merge settings: slow merging slightly (`max_thread_count: 1`) | ⬜ |
| 8 | 🧠 Plan for throttling: use `requests_per_second` in `_update_by_query` | ⬜ |
| 9 | 📈 Set up cluster monitoring: watch heap usage, pending merges, disk I/O | ⬜ |
| 10 | 🔓 Temporarily raise flood stage (only if necessary): bump the flood_stage watermark to 98% cautiously | ⬜ |
| 11 | 🧪 Test on a small index first: validate the process before the full production run | ⬜ |
| 12 | ✅ Run the update: monitor closely during execution | ⬜ |
| 13 | ♻️ Re-enable the refresh interval: after completion, reset `index.refresh_interval` (e.g., `1s`) | ⬜ |
| 14 | 📊 Force merge (optional): optimize the index after major updates | ⬜ |
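Steps 6, 10, and 14 can be sketched as follows (index names are placeholders; if you raise the flood-stage watermark, set it back to `null` afterwards to restore the default):

```
# Step 6: copy into a fresh index instead of updating in place
POST _reindex
{
  "source": { "index": "your-index" },
  "dest": { "index": "your-index-v2" }
}

# Step 10: raise the flood-stage watermark temporarily (revert when done)
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
  }
}

# Step 14: optional force merge after the mass update
POST /your-index/_forcemerge?max_num_segments=1
```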

## 📋 Quick Example: Adjust Settings Before and After Mass Update

Before mass update:

```
PUT /your-index/_settings
{
  "index": {
    "refresh_interval": "-1",
    "merge.scheduler.max_thread_count": "1"
  }
}
```

After mass update:

```
PUT /your-index/_settings
{
  "index": {
    "refresh_interval": "1s"
  }
}
```
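While the mass update runs, you can watch its progress and the cluster's pressure points (the task action filter matches `_update_by_query` tasks; adjust to taste):

```
GET _tasks?detailed=true&actions=*byquery

GET _cat/nodes?v&h=name,heap.percent,disk.used_percent

GET _cat/thread_pool/write?v
```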

## ✨ Conclusion

Updating large Elasticsearch indexes can overwhelm your cluster — unless you plan carefully.

By following this guide:

  • You can update millions of documents safely,
  • Avoid cluster slowdowns and outages,
  • And maintain peak Elasticsearch performance!

Stay smart, stay scalable! 🚀


๐Ÿ“ Pro Tip: Always test mass update strategies on a non-production clone before executing them on live indices!
