Elasticsearch Throttling Indexing

If segment merging falls behind indexing, Elasticsearch will throttle indexing requests to a single thread.

by James Carr · Jun. 10, 16

Recent adventures in Zapierland had me in a somewhat scary predicament. Starting at some point last week, I’d get an alert that the Elasticsearch cluster we run Graylog against had hit very high CPU, load, and memory usage. I was confused… we had more than enough horsepower to handle queries, and normally the cluster only uses about 10% of CPU on each node. I paused message processing (a nice feature of Graylog: it just sticks incoming messages into a local journal) and began looking around. Nothing seemed off. Nothing stuck out in the logs. I noticed that the CPU and load eventually dropped, shrugged my shoulders, and resumed message processing. Maybe a rogue query? Who knows.

A couple of days later it happened again. Pause, resume. Everything began operating normally. While I had some hunches, I had no solid evidence as to the cause, and I assumed that maybe it was time to expand the cluster. Since it wasn’t a fire (we just use Graylog for internal log searching of API requests), I added a card to our Trello board to kill two birds with one stone: upgrade Graylog and ES and bump the cluster size in the rebuilt cluster. I could handle the pause/unpause cycle in the short term.

As it continued happening daily, I got very curious. It only happened during peak load, when a lot of staff members were using it, and it always went back to 100% fine after a pause/unpause cycle of message processing. I looked around some more and decided we should tune the logging level a bit and see if anything stood out the next time it hit.
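As an aside, logger levels on Elasticsearch clusters of that era can be bumped on the fly through the cluster settings API rather than by editing logging.yml and restarting each node. The logger name below (index.engine, matching the category in the log lines that follow) and the host are assumptions, so take this as a sketch of the idea rather than the exact call we used:

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '
{
    "transient" : {
        "logger.index.engine" : "DEBUG"
    }
}'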

Sure enough, when our engineering team did a quick support blitz it happened again. This time, the Elasticsearch logs were filled to the brim with entries like the two below.

[2016-06-06 18:20:34,880][INFO ][index.engine ] [ip-10-0-0-2.ec2.internal] [graylog2_761][0] now throttling indexing: numMergesInFlight=5, maxNumMerges=4
[2016-06-06 18:20:40,804][INFO ][index.engine ] [ip-10-0-0-2.ec2.internal] [graylog2_761][0] stop throttling indexing: numMergesInFlight=3, maxNumMerges=4
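If you don’t want to wait for these log lines to show up, the stats APIs can surface the same information. The host, port, and index name below are assumptions (the index name is just the one from the logs above), so take this as a sketch:

curl -s 'http://localhost:9200/_nodes/stats/indices/merge?pretty'
curl -s 'http://localhost:9200/graylog2_761/_stats/merge,indexing?pretty'

The merges.current counter is the same number reported as numMergesInFlight above, and the indexing section includes a throttle_time_in_millis counter showing how long indexing has actually spent throttled.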

Well, that’s a bit curious. Off to Google. After a little searching, I came across this entry in the Elasticsearch reference.

Segment merging is computationally expensive, and can eat up a lot of disk I/O. Merges are scheduled to operate in the background because they can take a long time to finish, especially large segments. This is normally fine, because the rate of large segment merges is relatively rare.

But sometimes merging falls behind the ingestion rate. If this happens, Elasticsearch will automatically throttle indexing requests to a single thread. This prevents a segment explosion problem, in which hundreds of segments are generated before they can be merged. Elasticsearch will log INFO-level messages stating now throttling indexing when it detects merging falling behind indexing.

Elasticsearch defaults here are conservative: you don’t want search performance to be impacted by background merging. But sometimes (especially on SSD, or logging scenarios), the throttle limit is too low.

The key point comes right after that statement: what the default rate actually is.

The default is 20 MB/s, which is a good setting for spinning disks. If you have SSDs, you might consider increasing this to 100–200 MB/s.

Well, this seems promising, because we DO use SSDs. Following the documentation, I issued a curl call to raise the indices.store.throttle.max_bytes_per_sec rate.

PUT /_cluster/settings
{
    "persistent" : {
        "indices.store.throttle.max_bytes_per_sec" : "100mb"
    }
}
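For anyone reproducing this from the command line, the same settings update as a curl call (host and port are assumptions) looks roughly like:

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '
{
    "persistent" : {
        "indices.store.throttle.max_bytes_per_sec" : "100mb"
    }
}'

Using persistent (rather than transient) means the new limit survives a full cluster restart.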


With this setting in place, I had a few hours until the next support blitz. Suffice it to say, it has been several days and we haven’t had to pause message processing yet. This is definitely an important setting to get right when you tune the defaults on your cluster.
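A quick sanity check after the change is to read the cluster settings back and confirm the new limit shows up under the persistent block (host and port are assumptions again):

curl -s 'http://localhost:9200/_cluster/settings?pretty'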

Published at DZone with permission of James Carr, DZone MVB. See the original article here.
