GCP BigQuery Query Performance Degradation When Using Nested JSON Structures

👀 Views: 744 💬 Answers: 1 📅 Created: 2025-06-14

I'm prototyping a solution and I'm stuck on something that should probably be simple. I'm seeing significant performance degradation when querying nested JSON structures in BigQuery. I have a dataset with a nested schema where one field contains an array of objects, and I'm using the `UNNEST()` function to flatten the results for analysis. The query worked fine on a smaller dataset, but it now takes several minutes to execute on larger data volumes.

Here's an example of the query I'm using:

```sql
SELECT a.id, a.name, b.item_name
FROM `my_project.my_dataset.my_table` AS a
LEFT JOIN UNNEST(a.items) AS b
WHERE a.category = 'electronics'
ORDER BY a.id;
```

I've tried optimizing the query by adding filters to the `UNNEST()` operation, but to no avail. The `items` field can contain anywhere from 1 to over 100 objects depending on the record, and I suspect that may be contributing to the slowdown. I also checked whether any partitioning or clustering columns are defined, but none appear in my schema.

Could this performance issue be due to the way nested structures are handled in BigQuery? Are there best practices for improving query performance when dealing with large nested JSON datasets, or should I consider flattening my data beforehand?

For context, I'm working on an API that needs to handle this, for a web app running on CentOS. I recently upgraded to Sql LTS. How would you solve this? Any insights would be greatly appreciated. Thanks in advance!
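For anyone answering, here is a minimal sketch of the two checks touched on above: confirming whether the table has partitioning or clustering columns, and building a clustered copy to compare against. It assumes `category` is a top-level `STRING` column, and the table name `my_table_clustered` is hypothetical. This is an illustration under those assumptions, not a measured fix.

```sql
-- Check whether any column of the table is used for partitioning or clustering.
-- is_partitioning_column is 'YES'/'NO'; clustering_ordinal_position is NULL
-- for columns that are not part of the clustering key.
SELECT
  column_name,
  is_partitioning_column,
  clustering_ordinal_position
FROM `my_project.my_dataset`.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'my_table';

-- Hypothetical clustered copy of the table (name is an assumption).
-- Clustering on the filtered column may let BigQuery skip blocks that
-- cannot match the category filter, so fewer rows reach UNNEST().
CREATE TABLE `my_project.my_dataset.my_table_clustered`
CLUSTER BY category
AS
SELECT *
FROM `my_project.my_dataset.my_table`;

-- Same query as in the question, pointed at the clustered copy.
SELECT a.id, a.name, b.item_name
FROM `my_project.my_dataset.my_table_clustered` AS a
LEFT JOIN UNNEST(a.items) AS b
WHERE a.category = 'electronics'
ORDER BY a.id;
```

Comparing bytes processed and slot time for the two versions in the query execution details would show whether the filter on `category` is actually pruning data before the array is flattened.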