MySQL 5.7: Slow performance of a subquery in the WHERE clause on large data sets

Created: 2025-06-12

I'm working on a personal project and have run into a performance problem with a query that uses a subquery in the WHERE clause. I'm using MySQL 5.7, and my `orders` table holds a large dataset (around 1 million rows). The query looks something like this:

```sql
SELECT *
FROM orders o
WHERE o.customer_id IN (
    SELECT c.id
    FROM customers c
    WHERE c.region = 'North'
);
```

This query runs very slowly. Execution time can exceed several seconds, which is unacceptable for my application. I've added indexes on `customer_id` in the `orders` table and on `region` in the `customers` table, but they don't seem to help much. When I run `EXPLAIN` on the query, it shows a full scan on the `customers` table, which I suspect is contributing to the slowness. I expected MySQL to optimize the subquery, but in this case it doesn't seem to be doing so efficiently.

I also tried rewriting the query with a JOIN instead of a subquery:

```sql
SELECT o.*
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'North';
```

This version performs significantly better, but I'm wondering whether there are other best practices I'm missing when using subqueries on large datasets. Is there a way to optimize the original query further without changing its structure? Any insight into how MySQL 5.7 handles subquery optimization would be greatly appreciated.

The query has to be served through a REST API, and this is for a CLI tool running on Ubuntu 20.04. I'd really appreciate any guidance.
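A sketch of how to check what MySQL 5.7 actually did with the `IN (SELECT ...)`. Since MySQL 5.6 the optimizer can convert an uncorrelated `IN` subquery like this one into a semijoin and run it with one of several strategies (FirstMatch, LooseScan, Materialization, DuplicateWeedout), governed by the `semijoin` flag in `optimizer_switch`. Running `SHOW WARNINGS` right after `EXPLAIN` prints the statement as the optimizer rewrote it. Only the table and column names from the question are used here.

```sql
-- Is the semijoin transformation enabled for this session?
-- Look for "semijoin=on" (the 5.7 default) in the returned string.
SELECT @@optimizer_switch;

-- Show the execution plan for the original query.
EXPLAIN
SELECT *
FROM orders o
WHERE o.customer_id IN (
    SELECT c.id
    FROM customers c
    WHERE c.region = 'North'
);

-- Run this immediately afterwards to see the statement as rewritten by the
-- optimizer; "semi join" in the text means the IN subquery was flattened.
SHOW WARNINGS;
```

If the semijoin rewrite happened, the `Extra` column of `EXPLAIN` typically shows a strategy marker such as `FirstMatch(...)`, `LooseScan(...)` or `Start temporary`/`End temporary`, or the plan contains a `MATERIALIZED` row. If the subquery instead appears as a `DEPENDENT SUBQUERY`, it is being re-evaluated per outer row, which would be consistent with the slowdown described.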
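The question mentions indexes on `orders.customer_id` and `customers.region` that don't seem to help. A sketch of checks that can explain that situation, assuming both tables live in the current default database: confirm the indexes exist as expected, confirm the join columns have matching types (and, for string columns, the same character set and collation, since a mismatch can prevent MySQL from using the index for lookups), and refresh the index statistics the optimizer relies on.

```sql
-- List the indexes MySQL actually has on both tables.
SHOW INDEX FROM orders;
SHOW INDEX FROM customers;

-- orders.customer_id and customers.id should have the same data type;
-- for string types, also the same character set and collation.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE,
       CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND (
        (TABLE_NAME = 'orders'    AND COLUMN_NAME = 'customer_id')
     OR (TABLE_NAME = 'customers' AND COLUMN_NAME IN ('id', 'region'))
  );

-- Refresh index statistics used for cost estimates, then re-run EXPLAIN.
ANALYZE TABLE orders, customers;
```

A full scan of `customers` is not automatically the culprit: if a large share of customers are in `'North'`, the optimizer may estimate a scan as cheaper than using the `region` index, and the cost that matters more is usually how `orders` is accessed for each matching customer.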
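To experiment with different plans while keeping the original `IN (SELECT ...)` structure, MySQL 5.7 supports optimizer hints, including `SEMIJOIN`/`NO_SEMIJOIN` with a named query block. A minimal sketch that asks for the Materialization strategy; `subq1` is an arbitrary label chosen for this example. Which strategy (if any) is faster depends on the data distribution, so treat this as something to measure with `EXPLAIN` and timings rather than a guaranteed fix.

```sql
-- Name the subquery's query block (subq1 is an arbitrary label) and ask the
-- optimizer to execute the semijoin using the Materialization strategy.
SELECT /*+ SEMIJOIN(@subq1 MATERIALIZATION) */ *
FROM orders o
WHERE o.customer_id IN (
    SELECT /*+ QB_NAME(subq1) */ c.id
    FROM customers c
    WHERE c.region = 'North'
);

-- For a session-wide comparison, toggle a single optimizer_switch flag;
-- flags not named in the assignment keep their current values.
SET SESSION optimizer_switch = 'semijoin=off';
-- ... re-run EXPLAIN and the query here to compare ...
SET SESSION optimizer_switch = 'semijoin=on';
```

Other strategy names accepted by the `SEMIJOIN` hint are `FIRSTMATCH`, `LOOSESCAN` and `DUPSWEEDOUT`; trying each one in turn is a quick way to see which plan suits this particular data set.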