MySQL Query Performance Issues with Large Data Set and Subquery Optimization
I'm facing significant performance issues in MySQL 8.0 with a query that uses subqueries to filter results from a large dataset. I have a table `orders` with over 1 million records and a table `customers` with around 500,000 records. The query is meant to fetch customer names along with their total order amounts, for customers who have made purchases above a certain threshold. Here's the query I'm currently using:

```sql
SELECT c.name,
       (SELECT SUM(o.amount)
        FROM orders o
        WHERE o.customer_id = c.id) AS total_amount
FROM customers c
WHERE c.id IN (SELECT o.customer_id
               FROM orders o
               WHERE o.amount > 100);
```

The query takes several minutes to execute, and I suspect the subqueries are the bottleneck. I added separate indexes on `orders.customer_id` and `orders.amount`, but performance did not noticeably improve.

I also tried a JOIN instead of the subqueries:

```sql
SELECT c.name, SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.amount > 100
GROUP BY c.name;
```

This returns results faster, but it doesn't filter customers based on having multiple qualifying orders, which is crucial for my application logic. It also sums only the orders above 100, whereas the first query sums all of a customer's orders.

Is there a more efficient way to structure this query that improves performance while keeping the logic of filtering customers by their order amounts? This happens in both development and production on macOS. Any insights or best practices for optimizing such queries in MySQL would be greatly appreciated!
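To make the intended logic concrete, here is an untested sketch of the shape of query being asked about. It rests on several assumptions that are not confirmed above: that a "qualifying" order is one with `amount > 100`, that "multiple" means at least two such orders, and that the total should cover all of a customer's orders, as in the first query. The index name `idx_orders_customer_amount` is hypothetical.

```sql
-- Hypothetical composite index: lets the per-customer aggregation
-- be answered from the index alone (customer_id, then amount).
CREATE INDEX idx_orders_customer_amount ON orders (customer_id, amount);

-- Aggregate orders once per customer, then join the small result to customers.
SELECT c.name, t.total_amount
FROM customers AS c
JOIN (
    SELECT o.customer_id,
           SUM(o.amount) AS total_amount   -- total over ALL orders (assumption)
    FROM orders AS o
    GROUP BY o.customer_id
    -- Assumption: keep customers with at least 2 orders above 100.
    HAVING SUM(CASE WHEN o.amount > 100 THEN 1 ELSE 0 END) >= 2
) AS t ON t.customer_id = c.id;
```

Grouping on `customer_id` inside the derived table, rather than on `c.name`, keeps two customers who share a name from being merged into one row. If the real requirement is a threshold on the total instead, the `HAVING` condition would become something like `SUM(o.amount) > 100`. Comparing the `EXPLAIN` output of each variant should show whether the composite index is actually used.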