CodexBloom - Programming Q&A Platform

Django ORM Performance Issues with Complex Queries and Large Datasets

👀 Views: 881 💬 Answers: 1 📅 Created: 2025-08-21
django orm performance Python

I'm running into significant performance problems in my Django application when querying a large dataset (around 1 million records) through the ORM. The query is fairly complex, with multiple filters and joins, and it takes up to 10 seconds to return. Here's a simplified version:

```python
from django.db.models import Sum

from myapp.models import Order

# Top 100 shipped orders ranked by the summed price of their items.
# select_related() pulls each order's customer in the same query via a JOIN.
results = (
    Order.objects.filter(status='shipped')
    .select_related('customer')
    .annotate(total=Sum('items__price'))
    .order_by('-total')[:100]
)
```

While testing, I noticed that the `SELECT` statement Django generates is quite large and includes multiple joins. I tried using `only()` to limit the fields fetched, but it didn't make a significant difference. I also enabled query logging and saw that the database was running a full table scan on the `Order` table, which seems inefficient for my use case.

I'm aware that `prefetch_related` might help in some cases, but I'm unsure how to apply it correctly given the structure of my data. I've also indexed the `status` and `customer_id` fields to improve lookup times. Here's the relevant part of my model definitions:

```python
from django.db import models


class Item(models.Model):
    # Assumed definition (not shown in full): the query's items__price
    # lookup implies that Item has a numeric price field.
    price = models.DecimalField(max_digits=10, decimal_places=2)


class Customer(models.Model):
    name = models.CharField(max_length=100)


class Order(models.Model):
    status = models.CharField(max_length=20)
    # Django indexes ForeignKey columns (customer_id) by default.
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    items = models.ManyToManyField(Item)
```

Despite these changes, performance is still poor, and the query often times out when I run it from the Django shell. Are there best practices or optimization techniques you would recommend for this scenario? Would raw SQL be a better approach, or are there ORM patterns I'm overlooking? I'm using Django 3.2 and PostgreSQL 12, and I'd appreciate any insights into optimizing this query. Any ideas how to fix this?
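To see why an index on `status` doesn't avoid the scan, it may help to look at roughly what SQL this ORM query compiles to. The sketch below is an illustration only: it uses Python's built-in `sqlite3` in place of PostgreSQL, and the table names (`myapp_order`, `myapp_order_items`, and so on) assume Django's default `<app>_<model>` naming. Because the `ORDER BY` is on the aggregated `total`, the database has to join and sum the items of every shipped order before `LIMIT 100` can apply; the `status` filter narrows the rows but cannot skip aggregating them. That is also likely why `only()` changed little: it trims columns, not the rows being summed. If a large share of orders are shipped, PostgreSQL's planner may reasonably prefer a sequential scan even with the index. The helper names `top_orders`, `explain`, and `build_demo_db` and the toy data are hypothetical.

```python
import sqlite3

# Approximate SQL shape of:
#   Order.objects.filter(status='shipped').select_related('customer')
#        .annotate(total=Sum('items__price')).order_by('-total')[:100]
# Table names assume Django's default naming for an app called "myapp".
TOP_ORDERS_SQL = """
SELECT o.id, c.name, SUM(i.price) AS total
FROM myapp_order AS o
INNER JOIN myapp_customer AS c ON c.id = o.customer_id
LEFT OUTER JOIN myapp_order_items AS oi ON oi.order_id = o.id
LEFT OUTER JOIN myapp_item AS i ON i.id = oi.item_id
WHERE o.status = ?
GROUP BY o.id, c.id, c.name
ORDER BY total DESC
LIMIT ?
"""

SCHEMA = """
CREATE TABLE myapp_customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE myapp_item (id INTEGER PRIMARY KEY, price NUMERIC NOT NULL);
CREATE TABLE myapp_order (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES myapp_customer (id)
);
CREATE TABLE myapp_order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES myapp_order (id),
    item_id INTEGER NOT NULL REFERENCES myapp_item (id)
);
CREATE INDEX myapp_order_status ON myapp_order (status);
CREATE INDEX myapp_order_customer_id ON myapp_order (customer_id);
"""


def top_orders(conn, status="shipped", limit=100):
    """Return (order_id, customer_name, total) rows, largest total first.

    Every order matching `status` is joined and summed before LIMIT applies,
    because any of them could end up in the top `limit`.
    """
    return conn.execute(TOP_ORDERS_SQL, (status, limit)).fetchall()


def explain(conn, status="shipped", limit=100):
    """Return SQLite's plan detail lines (wording varies by SQLite version)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + TOP_ORDERS_SQL, (status, limit))
    return [row[-1] for row in rows]


def build_demo_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO myapp_customer VALUES (?, ?)",
                     [(1, "Ada"), (2, "Linus")])
    conn.executemany("INSERT INTO myapp_item VALUES (?, ?)",
                     [(1, 5), (2, 20), (3, 50)])
    conn.executemany("INSERT INTO myapp_order VALUES (?, ?, ?)",
                     [(1, "shipped", 1), (2, "shipped", 2), (3, "pending", 1)])
    conn.executemany(
        "INSERT INTO myapp_order_items (order_id, item_id) VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 3), (3, 1), (3, 3)],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    for row in top_orders(demo):
        print(row)
    for line in explain(demo):
        print(line)
```

On the real database, the same inspection can be done from the Django shell: `print(qs.query)` shows the SQL Django builds, and `qs.explain(analyze=True)` returns PostgreSQL's actual execution plan (`QuerySet.explain()` has been available since Django 2.1). The SQLite plan from the sketch is not a substitute for PostgreSQL's; it only shows the structure of the work the query asks for.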