CodexBloom - Programming Q&A Platform

Elasticsearch 8.5 Indexing Performance Issues with Large Bulk Requests

👀 Views: 128 💬 Answers: 1 📅 Created: 2025-06-14
elasticsearch performance bulk-indexing Java

I'm still learning this framework, and I'm seeing significant performance degradation when indexing large datasets into Elasticsearch 8.5 with bulk requests. My application is written in Java and uses the Elasticsearch `RestHighLevelClient` to perform the indexing. When a bulk request contains more than 10,000 documents, the response time increases drastically, often exceeding 30 seconds, which is unacceptable for my use case.

I have tried batching my data into smaller chunks (around 5,000 documents), but the performance is still not ideal. I've also confirmed that my Elasticsearch nodes have sufficient resources (CPU, memory, disk I/O) and that the cluster health is green.

Here's a snippet of the code I am using to perform the bulk indexing:

```java
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;

// Collect every document into a single bulk request (Document is my own class).
BulkRequest bulkRequest = new BulkRequest();
for (Document doc : documents) {
    IndexRequest indexRequest = new IndexRequest("my_index").id(doc.getId()).source(doc.toMap());
    bulkRequest.add(indexRequest);
}

// Send the whole batch synchronously and report any per-item failures.
BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
if (bulkResponse.hasFailures()) {
    System.err.println("Bulk indexing had failures: " + bulkResponse.buildFailureMessage());
}
```

I've also tried setting `refresh_interval` to `-1` during the bulk operation, but noticed little improvement. Monitoring the cluster during the process shows that it is not overwhelmed, so I suspect there is some configuration I am missing.

Is there a recommended best practice for optimizing bulk indexing performance in Elasticsearch 8.5? Is there a better approach, and any ideas what could be causing this?

For context: I'm using Java on macOS and Windows 11, and this is part of a larger CLI tool I'm building. I appreciate any insights!
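
For reference, the 5,000-document batching mentioned above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the exact code from the project: `BulkChunker`, `chunk`, and `sendInChunks` are hypothetical names, and the `sender` callback stands in for whatever builds a `BulkRequest` from one chunk and calls `client.bulk(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BulkChunker {

    /** Splits items into consecutive chunks of at most chunkSize elements. */
    public static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the sublist so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    /**
     * Passes each chunk to a caller-supplied sender and returns how many chunks were sent.
     * Assumption: in a real application the sender would wrap one bulk call per chunk.
     */
    public static <T> int sendInChunks(List<T> items, int chunkSize, Consumer<List<T>> sender) {
        int sent = 0;
        for (List<T> c : chunk(items, chunkSize)) {
            sender.accept(c);
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        // Stand-in data: 12,000 integers instead of real documents.
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 12_000; i++) {
            docs.add(i);
        }
        int sent = sendInChunks(docs, 5_000,
                c -> System.out.println("would send " + c.size() + " documents"));
        System.out.println("chunks sent: " + sent);
    }
}
```

With 12,000 stand-in items and a chunk size of 5,000, this produces three chunks (5,000, 5,000, and 2,000 items). Chunks are sent one after another here, so total time still grows with the number of chunks.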