Elasticsearch 8.5 Aggregation on Nested Fields Returns Inaccurate Results in Complex Queries
Hey everyone, I'm running into an issue that's driving me crazy. I'm working with Elasticsearch 8.5 and trying to run aggregations on nested fields. My index contains documents where each product has multiple reviews stored as nested objects. When I aggregate the average rating from these reviews, the results seem inconsistent, especially when I filter on other, non-nested fields.

Here's a simplified version of my mapping:

```json
{
  "mappings": {
    "properties": {
      "product_id": { "type": "keyword" },
      "reviews": {
        "type": "nested",
        "properties": {
          "rating": { "type": "float" },
          "comment": { "type": "text" }
        }
      }
    }
  }
}
```

When I run the following aggregation query, I expect to get the average rating for each product, filtered by a specific category:

```json
{
  "query": {
    "term": { "category": "electronics" }
  },
  "aggs": {
    "products": {
      "terms": { "field": "product_id" },
      "aggs": {
        "average_rating": {
          "avg": {
            "script": { "source": "doc['reviews.rating'].value" }
          }
        }
      }
    }
  }
}
```

However, the average rating returned is often lower than what I expect based on the actual reviews for those products. Additionally, if I remove the category filter, I get a different average, which suggests that the filter affects the nested field aggregation in an unexpected way.

I've also tried using the `nested` aggregation:

```json
{
  "aggs": {
    "nested_reviews": {
      "nested": { "path": "reviews" },
      "aggs": {
        "filtered_reviews": {
          "filter": {
            "term": { "category": "electronics" }
          },
          "aggs": {
            "average_rating": {
              "avg": { "field": "reviews.rating" }
            }
          }
        }
      }
    }
  }
}
```

This also doesn't yield the correct results. I've read the Elasticsearch documentation on nested queries and aggregations, but I still can't figure out how to get accurate results. Is there a specific way to structure the aggregation or filter to correctly compute the average rating from nested documents when filtering by top-level fields?

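
In case it helps anyone reproduce this, here is a rough Python sketch of the request body I'm planning to try next: keep the top-level `category` filter in the `query`, bucket by `product_id`, and only then step into `reviews` with a `nested` aggregation. The function name `build_avg_rating_request` is just something I made up, the sketch assumes `category` is mapped as a `keyword` field (it isn't part of the simplified mapping I posted), and I haven't confirmed that it returns the right averages.

```python
import json


def build_avg_rating_request(category: str) -> dict:
    """Build a search body: filter on a top-level field, then aggregate nested reviews."""
    return {
        "size": 0,  # only aggregation results are needed, not the hits themselves
        "query": {
            # Assumption: `category` is a top-level keyword field on the product document.
            "term": {"category": category}
        },
        "aggs": {
            "products": {
                "terms": {"field": "product_id"},
                "aggs": {
                    # Step into the nested `reviews` documents of each product bucket.
                    "reviews": {
                        "nested": {"path": "reviews"},
                        "aggs": {
                            "average_rating": {"avg": {"field": "reviews.rating"}}
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into Kibana Dev Tools or sent with curl.
    print(json.dumps(build_avg_rating_request("electronics"), indent=2))
```

The difference from my two attempts is that the top-level filter stays outside the nested context and the `avg` reads `reviews.rating` from inside it, rather than through a script or a filter placed under `nested`.
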
Any help or insights on this would be greatly appreciated! For context: I'm using JSON on macOS.