CodexBloom - Programming Q&A Platform

PostgreSQL: Performance issues with large datasets when using CTEs and JSON aggregation

👀 Views: 14 đŸ’Ŧ Answers: 1 📅 Created: 2025-06-11
postgresql performance json sql

I'm currently experiencing severe performance degradation when executing a query that uses Common Table Expressions (CTEs) along with JSON aggregation in PostgreSQL 13. My dataset contains over 1 million records, and the query takes several minutes to execute, which is unacceptable for my application. The goal is to fetch user data along with their associated orders and return it as JSON. Here's a simplified version of my query:

```sql
WITH user_orders AS (
    SELECT u.id AS user_id, u.name, o.id AS order_id, o.total
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
)
SELECT json_agg(u) FROM user_orders u;
```

In my tests, performance improves significantly when I drop the CTE and join the tables directly in a subquery:

```sql
SELECT json_agg(u)
FROM (
    SELECT u.id AS user_id, u.name, o.id AS order_id, o.total
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
) u;
```

However, I prefer using CTEs for clarity in my code. I have tried adding an index on the `user_id` column of the `orders` table, but it didn't yield the expected improvement (the rough DDL and how I inspected the plan are at the end of this post). I also analyzed the execution plan, and it shows that the CTE is materialized, which I suspect is causing the slowdown.

Is there a recommended approach to optimizing this query while still using CTEs? Are there any PostgreSQL configuration settings I could tweak to improve performance in scenarios like this? Any insight into handling large datasets efficiently in this context would be greatly appreciated. This is part of a larger CLI tool I'm building. What am I doing wrong?
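
For completeness, the index I added was along these lines (I didn't keep the exact statement, so the index name here is just a placeholder):

```sql
-- Plain B-tree index on the join column of orders; the name is illustrative only
CREATE INDEX idx_orders_user_id ON orders (user_id);
```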
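
And this is roughly how I inspected the plan, which is where I saw the CTE being materialized:

```sql
-- EXPLAIN (ANALYZE, BUFFERS) runs the query and reports actual per-node timings and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
WITH user_orders AS (
    SELECT u.id AS user_id, u.name, o.id AS order_id, o.total
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
)
SELECT json_agg(u) FROM user_orders u;
```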