CodexBloom - Programming Q&A Platform

Django REST Framework - Handling Large CSV Imports with Async Functionality

👀 Views: 59 💬 Answers: 1 📅 Created: 2025-09-06
django celery rest-framework asyncio csv Python

During a migration project involving a legacy application, I'm tasked with optimizing how we handle large CSV file imports in a Django REST Framework application. The legacy system processed these synchronously, causing significant performance bottlenecks, so I've decided to offload the processing to background tasks using `djangorestframework` and `django-celery`. For context: I'm developing on Ubuntu, running Python in a Debian-based Docker container.

Here's a simplified version of what I've done:

```python
from rest_framework.views import APIView
from rest_framework.response import Response

from .tasks import process_csv_import


class CSVImportView(APIView):
    def post(self, request):
        file = request.FILES['file']
        process_csv_import.delay(file.read())  # offload to a Celery task
        return Response({'status': 'Import started'}, status=202)
```

The `process_csv_import` function, defined in my Celery tasks, looks like this:

```python
import io

import pandas as pd
from celery import shared_task


@shared_task
def process_csv_import(file_content):
    # file_content arrives as raw bytes, so decode before handing it to pandas
    df = pd.read_csv(io.StringIO(file_content.decode('utf-8')))
    # Process DataFrame...
```

Despite this setup, reading the entire file into a `StringIO` buffer leads to memory issues on very large imports (over 100MB). To mitigate this, I've explored reading the upload in chunks via Django's `UploadedFile.chunks()`, but wiring chunked reads into Celery has been tricky, since task arguments must be serializable and I can't pass the file object itself. I want to keep the API responsive while these large datasets are processed concurrently. Has anyone tackled similar issues? Is there a better way to manage memory while still leveraging async processing?

I'm also considering error handling: if the CSV import fails halfway through, I'd like to implement a rollback mechanism, but I'm unsure how to best integrate that with Celery tasks.

Below are rough sketches of the directions I'm considering for both problems; any advice would be greatly appreciated. Thanks in advance!
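To make the memory question concrete, here is the first direction I've sketched: stream the upload to a temporary file in the view and pass only the path through the broker, then let pandas iterate the file in chunks. This is only a rough sketch: `process_csv_import_from_path` and the chunk size are placeholder names/values of mine, and it assumes the Celery worker shares a filesystem (or Docker volume) with the web process.

```python
# views.py
import tempfile

from rest_framework.views import APIView
from rest_framework.response import Response

from .tasks import process_csv_import_from_path


class CSVImportView(APIView):
    def post(self, request):
        upload = request.FILES['file']
        # Stream the upload to disk chunk by chunk; the full file is never in memory
        with tempfile.NamedTemporaryFile(suffix='.csv', delete=False) as tmp:
            for chunk in upload.chunks():
                tmp.write(chunk)
        # Send only the path through the broker, not the file contents
        process_csv_import_from_path.delay(tmp.name)
        return Response({'status': 'Import started'}, status=202)


# tasks.py
import pandas as pd
from celery import shared_task


@shared_task
def process_csv_import_from_path(path):
    # Iterate in fixed-size chunks so memory stays bounded regardless of file size
    for chunk in pd.read_csv(path, chunksize=10_000):
        ...  # process each DataFrame chunk
```

My thinking is that this keeps the request handler fast (it only does disk I/O) and avoids pushing 100MB of CSV through the message broker, which I gather is an anti-pattern.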
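For the rollback question, my current thinking is to wrap the whole import in a single database transaction inside the task, so a failure partway through rolls back every row written so far. Again just a sketch, assuming a hypothetical `ImportedRecord` model whose fields match the CSV columns:

```python
import os

import pandas as pd
from celery import shared_task
from django.db import transaction

from .models import ImportedRecord  # hypothetical model matching the CSV columns


@shared_task
def process_csv_import_from_path(path):
    # A single transaction for the whole import: if any chunk raises,
    # everything inserted so far is rolled back together.
    with transaction.atomic():
        for chunk in pd.read_csv(path, chunksize=10_000):
            ImportedRecord.objects.bulk_create(
                ImportedRecord(**row) for row in chunk.to_dict('records')
            )
    # Delete the temp file only after a fully successful import,
    # so a failed run can still be inspected or re-run
    os.remove(path)
```

What I'm unsure about is whether holding one transaction open across a 100MB import is sensible, or whether per-chunk transactions with an explicit cleanup step on failure would behave better under load.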