CodexBloom - Programming Q&A Platform

GCP BigQuery Query Performance Issues When Joining Large Tables

👀 Views: 20 💬 Answers: 1 📅 Created: 2025-06-07
google-bigquery performance sql-optimization SQL

I'm integrating two systems and just started working with BigQuery, and I'm stuck on something that should probably be simple. I'm experiencing significant performance issues when executing queries that join two large tables. My current setup involves a `sales` table with around 100 million rows and a `products` table with about 50 million rows. The query I am running looks like this:

```sql
SELECT
  s.order_id,
  p.product_name,
  s.sale_amount
FROM `my_project.my_dataset.sales` AS s
JOIN `my_project.my_dataset.products` AS p
  ON s.product_id = p.product_id
WHERE s.sale_date BETWEEN '2023-01-01' AND '2023-01-31'
ORDER BY s.sale_amount DESC
```

The issue is that this query takes over 2 minutes to execute, which is unacceptable for our reporting needs. I have tried a few optimizations:

1. Ensured both tables have appropriate partitioning and clustering. The `sales` table is partitioned by `sale_date` and clustered by `product_id` (see the simplified DDL sketch at the end of this post).
2. Reviewed the columns returned by `SELECT *` to identify fields that could be excluded, but trimming the select list down to the columns shown above didn't make a noticeable difference.
3. Applied a `LIMIT` clause to the result set to test how execution behaves with fewer rows, but the time spent on the join itself remains high.

I also checked the execution details in the BigQuery console, and the majority of the time is spent in the join stage.

Are there additional strategies or best practices I can employ to improve the performance of this query? Is there a specific way to structure the tables or the query to reduce execution time? Any pointers in the right direction would be greatly appreciated. Thanks in advance!
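
For reference, here is roughly how the two tables are set up. This is a simplified sketch based on the description above; the real tables have more columns, and the column types shown here are assumptions (e.g. `STRING` product IDs, `NUMERIC` sale amounts):

```sql
-- Simplified sketch of the table definitions; column types are assumptions.
CREATE TABLE `my_project.my_dataset.sales` (
  order_id    STRING,
  product_id  STRING,
  sale_date   DATE,
  sale_amount NUMERIC
)
PARTITION BY sale_date   -- matches the WHERE filter on sale_date
CLUSTER BY product_id;   -- matches the join key

CREATE TABLE `my_project.my_dataset.products` (
  product_id   STRING,
  product_name STRING
)
CLUSTER BY product_id;   -- also clustered on the join key
```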