CodexBloom - Programming Q&A Platform

Elasticsearch Bulk Indexing Performance Issues with Spring Data Elasticsearch

👀 Views: 79 💬 Answers: 1 📅 Created: 2025-06-13
elasticsearch spring-data performance Java

Hey everyone, I'm running into an issue that's driving me crazy, and I'm relatively new to this, so bear with me. I'm seeing significant performance problems when bulk indexing documents into an Elasticsearch instance using Spring Data Elasticsearch (version 4.3.3). I use a batch size of 1000 documents, but indexing takes much longer than expected: sometimes several minutes for just a few thousand documents. This is how I perform the bulk indexing:

```java
import java.util.ArrayList;
import java.util.List;
import org.springframework.data.elasticsearch.core.query.IndexQuery;
import org.springframework.data.elasticsearch.core.query.IndexQueryBuilder;

public void bulkIndex(List<MyDocument> documents) {
    List<IndexQuery> indexQueries = new ArrayList<>();
    for (MyDocument doc : documents) {
        // One IndexQuery per document, reusing the document's own ID
        IndexQuery indexQuery = new IndexQueryBuilder()
                .withId(doc.getId())
                .withObject(doc)
                .build();
        indexQueries.add(indexQuery);
    }
    // In 4.x, bulkIndex needs the target index; here it is derived from the entity class
    elasticsearchOperations.bulkIndex(indexQueries, MyDocument.class);
}
```

I've tried adjusting the batch size to 500 and 2000, but performance doesn't improve significantly. I'm also using the `spring-data-elasticsearch` module with its default settings, which might not be optimized for bulk operations. The Elasticsearch cluster has 2 nodes, and I have replicas configured for fault tolerance. Even so, indexing throughput is still underwhelming.

Additionally, I checked the Elasticsearch logs and see warnings like:

```
[WARN ][o.e.c.a.s.ShardStateAction] [node-1] [my_index][0] failed to write to the index
```

I'm not sure whether this affects the bulk indexing process or whether it's related to my index mappings and settings. My mapping only has two text fields and one keyword field:

```json
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "description": { "type": "text" },
      "tags": { "type": "keyword" }
    }
  }
}
```

Could the index settings or my approach be causing the slowdown? Which configurations should I adjust, or which best practices should I follow, to improve bulk indexing performance? Any help would be greatly appreciated!
This is part of a larger application I'm building, and I'm running it in a Docker container on Windows 10. This is my first time working with Java. What am I doing wrong, and how can I fix it?
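One common suggestion for this kind of slowdown is to send moderately sized bulk requests from a few threads rather than one large request from a single thread. Below is a minimal sketch in plain Java, assuming the caller supplies the function that actually sends one batch. `BatchIndexer`, `partition`, and `indexInParallel` are hypothetical names, not part of Spring Data Elasticsearch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper: splits documents into batches and sends them concurrently.
public final class BatchIndexer {

    private BatchIndexer() {}

    /** Splits items into consecutive sublists of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the subList view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Hands each batch to sendBatch on a fixed pool of worker threads and waits
     * for all of them. A batch that throws surfaces here as an ExecutionException.
     */
    public static <T> void indexInParallel(List<T> items, int batchSize, int workers,
                                           Consumer<List<T>> sendBatch)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(items, batchSize)) {
                pending.add(pool.submit(() -> sendBatch.accept(batch)));
            }
            for (Future<?> future : pending) {
                future.get(); // blocks until this batch is done; rethrows its failure
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With the method from the question, the callback might look something like `batch -> elasticsearchOperations.bulkIndex(toIndexQueries(batch), MyDocument.class)`, where `toIndexQueries` is a hypothetical helper containing the loop that builds the `IndexQuery` list. The batch size and worker count are knobs to measure against the 2-node cluster, not recommended values, and it is worth confirming that the configured client can be shared safely between threads.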
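Because the cluster keeps replicas and refreshes on the default schedule, another frequently suggested step is to disable refresh and set replicas to 0 for the duration of a large load, then restore both afterwards. Below is a minimal sketch using the JDK's `java.net.http` client against the index update-settings endpoint (`PUT /{index}/_settings`). It assumes Java 11+ and an unsecured cluster reachable at a base URL such as `http://localhost:9200`. `BulkLoadSettings` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for toggling index settings around a bulk load.
public final class BulkLoadSettings {

    private BulkLoadSettings() {}

    /** JSON body for PUT /{index}/_settings; "-1" as refreshInterval disables refresh. */
    public static String settingsBody(String refreshInterval, int numberOfReplicas) {
        return "{\"index\":{\"refresh_interval\":\"" + refreshInterval
                + "\",\"number_of_replicas\":" + numberOfReplicas + "}}";
    }

    /** Builds (but does not send) the update-settings request for one index. */
    public static HttpRequest updateSettingsRequest(String baseUrl, String index, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Sends the request and returns the HTTP status code (200 on success). */
    public static int send(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

Before the load you would send `settingsBody("-1", 0)`, and afterwards the values the index had before (the Elasticsearch defaults are `"1s"` and `1`). While replicas are at 0, the index has no redundancy until they are restored.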