PostgreSQL: Trouble with CTE performance when joining large datasets using ROW_NUMBER()

👀 Views: 55 💬 Answers: 1 📅 Created: 2025-06-12

I'm experiencing significant performance problems when using a Common Table Expression (CTE) with `ROW_NUMBER()` to partition and filter a large dataset before joining it with another table. I have two tables: `sales`, containing millions of records, and `products`, with a few thousand entries. The goal is to select the top 10 sales per product category by sales amount, and then join this filtered result with the `products` table to include product details.

Here’s the SQL query I’m using:

```sql
WITH RankedSales AS (
    SELECT s.product_id,
           s.amount,
           ROW_NUMBER() OVER (PARTITION BY s.category_id ORDER BY s.amount DESC) AS rank
    FROM sales s
)
SELECT p.product_name, rs.amount
FROM RankedSales rs
JOIN products p ON rs.product_id = p.id
WHERE rs.rank <= 10;
```

While this query returns the expected results, it takes a very long time to execute, and it gets slower as the `sales` table grows. I’ve tried adding indexes on `sales.category_id` and `sales.amount`, as well as on `products.id`, but performance hasn’t improved much. When I look at the execution plan, it shows that the CTE is materialized and the filtering happens after the join, which seems inefficient. I’ve also tried rewriting the query without a CTE, using a subquery instead, but performance remains largely the same.

Is there a better way to optimize this query? Could a different window function or indexing strategy help? Any insights would be appreciated!

This issue appeared after updating to a stable release.
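
One alternative that may be worth testing is a per-category `LATERAL` subquery backed by a composite index, so that PostgreSQL can read the 10 largest amounts for each category straight from the index instead of numbering every row in `sales`. This is only a sketch and has not been run against this schema. It assumes things not stated above: a `categories` table with an `id` column, and the index name `idx_sales_category_amount`, are both hypothetical.

```sql
-- Hypothetical index name. Ordering by (category_id, amount DESC) matches
-- the per-category "largest amounts first" lookup in the query below.
CREATE INDEX IF NOT EXISTS idx_sales_category_amount
    ON sales (category_id, amount DESC);

-- Assumes a categories table (not described in the post) with one row per category.
SELECT p.product_name, ts.amount
FROM categories c
CROSS JOIN LATERAL (
    -- Evaluated once per category: at most 10 rows, highest amounts first.
    SELECT s.product_id, s.amount
    FROM sales s
    WHERE s.category_id = c.id
    ORDER BY s.amount DESC
    LIMIT 10
) AS ts
JOIN products p ON p.id = ts.product_id;
```

Running both versions under `EXPLAIN (ANALYZE, BUFFERS)` should show whether the planner actually uses the composite index. Note one difference in behavior: rows in `sales` with a NULL `category_id` form their own partition in the `ROW_NUMBER()` version but are never returned by this variant.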