Elasticsearch 8.5 Aggregation on Nested Fields Returns Inconsistent Results with Filters
I'm relatively new to Elasticsearch, so bear with me. I'm working on a project with Elasticsearch 8.5 and have hit a roadblock when performing aggregations on nested fields. I've tried several approaches, but none of them gives consistent results.

I have a document structure that looks like this:

```json
{
  "user": "john_doe",
  "posts": [
    {
      "title": "First Post",
      "tags": ["elasticsearch", "search"],
      "likes": 10
    },
    {
      "title": "Second Post",
      "tags": ["search", "database"],
      "likes": 5
    }
  ]
}
```

I want to get the average likes for posts with a specific tag. To achieve this, I've structured my query as follows:

```json
{
  "query": {
    "nested": {
      "path": "posts",
      "query": {
        "term": { "posts.tags": "elasticsearch" }
      }
    }
  },
  "aggs": {
    "average_likes": {
      "nested": { "path": "posts" },
      "aggs": {
        "filtered_posts": {
          "filter": {
            "term": { "posts.tags": "elasticsearch" }
          },
          "aggs": {
            "avg_likes": {
              "avg": { "field": "posts.likes" }
            }
          }
        }
      }
    }
  }
}
```

However, this returns inconsistent results. Sometimes I get the correct average of 10 likes, but other times it returns an average of 5, even though the documents appear unchanged. I've confirmed that the nested documents are indexed properly, and I can retrieve them with a straightforward `GET` request.

I've tried several variations of the query, including moving the filters around and using `bool` queries, but nothing yields consistent results. Here's one variation I attempted:

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "posts",
            "query": {
              "term": { "posts.tags": "elasticsearch" }
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "avg_likes": {
      "avg": { "field": "posts.likes" }
    }
  }
}
```

This change also led to unpredictable results. The hits returned by the same query without the aggregation are as expected.
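To make the expected numbers concrete, here is a minimal Python sketch (plain Python, not Elasticsearch itself) of what the first query is meant to compute on the sample document. It assumes `posts` is mapped as `nested`, so each post is evaluated on its own; the helper names `matching_roots`, `avg_all_posts`, and `avg_tagged_posts` are hypothetical and exist only for illustration.

```python
from statistics import mean

# Sample document copied from the question.
docs = [
    {
        "user": "john_doe",
        "posts": [
            {"title": "First Post", "tags": ["elasticsearch", "search"], "likes": 10},
            {"title": "Second Post", "tags": ["search", "database"], "likes": 5},
        ],
    }
]

TAG = "elasticsearch"


def matching_roots(docs, tag):
    """Mimic the top-level nested query: keep root documents that have
    at least one post carrying the tag."""
    return [d for d in docs if any(tag in p["tags"] for p in d["posts"])]


def avg_all_posts(docs):
    """Mimic a nested aggregation + avg with no filter: every post of
    every matched root document is counted."""
    return mean(p["likes"] for d in docs for p in d["posts"])


def avg_tagged_posts(docs, tag):
    """Mimic nested aggregation + filter aggregation + avg: only posts
    that carry the tag are counted."""
    return mean(p["likes"] for d in docs for p in d["posts"] if tag in p["tags"])


roots = matching_roots(docs, TAG)
unfiltered_avg = avg_all_posts(roots)        # averages likes 10 and 5
filtered_avg = avg_tagged_posts(roots, TAG)  # averages likes 10 only
print(unfiltered_avg, filtered_avg)
```

The point of the sketch is that the top-level query only selects root documents, while the `filter` aggregation inside the `nested` aggregation is what narrows the average to the tagged posts. On this single sample document the filtered average is 10, which is the value I expect from the first query every time.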
For context, I'm building a CLI tool that needs to run this aggregation, and I'm coming from a different tech stack, so the JSON query DSL is still new to me. Is there something I'm overlooking in the aggregation structure, or in how nested queries are supposed to interact with filters? If anyone has experience with nested aggregations in Elasticsearch or can point me to the correct form of the aggregation query, I would greatly appreciate it. Thanks in advance!