CodexBloom - Programming Q&A Platform

How to optimize memory usage in Python using generators vs lists?

πŸ‘€ Views: 1 πŸ’¬ Answers: 1 πŸ“… Created: 2025-06-10
python memory-optimization generators

I'm working on a personal project: a data processing script in Python 3.9 that reads a large CSV file (over 1 GB) and processes it line by line. I've been using a list to store the processed results, but the script consumes a lot of memory and slows down significantly as the data size increases. I'm considering switching to generators to reduce memory usage, but I'm unsure how to implement this effectively.

Here's a simplified version of my current code:

```python
import csv

results = []
with open('large_file.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        processed_row = process_row(row)  # Assume this function processes each row
        results.append(processed_row)

print(len(results))  # Output the number of processed rows
```

While this works, the memory footprint is quite high. I want to refactor the code to use a generator function. Here's what I've attempted so far:

```python
import csv

def process_rows(file_path):
    with open(file_path, 'r') as file:
        reader = csv.reader(file)
        for row in reader:
            yield process_row(row)
```

Then I call it like this:

```python
results = list(process_rows('large_file.csv'))
```

However, I'm not seeing a significant improvement in memory usage. I also get the following warning when running the script:

```
UserWarning: The total number of rows exceeds 100,000.
```

Could this be related to the fact that I'm still converting the generator back into a list? What is the best practice for keeping memory usage low while still processing all rows? Should I avoid calling `list()` on the generator entirely? Any insights on how to efficiently handle large datasets in this context would be greatly appreciated.
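
For reference, here is a sketch of what I'm considering instead: consuming the generator directly and streaming each processed row out to a new file, so no list is ever built. This assumes `process_row` returns a list of fields that `csv.writer` can handle; the `process_row` stub and the `processed_output.csv` output path are just placeholders I made up for illustration:

```python
import csv

def process_row(row):
    # Stand-in for my real per-row processing logic
    return row

def process_rows(file_path):
    """Yield processed rows one at a time instead of accumulating them."""
    with open(file_path, 'r', newline='') as file:
        reader = csv.reader(file)
        for row in reader:
            yield process_row(row)

row_count = 0
with open('processed_output.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    for processed_row in process_rows('large_file.csv'):
        writer.writerow(processed_row)  # Write each row out immediately
        row_count += 1                  # Keep only a counter, not the rows themselves

print(row_count)  # Same count as before, without holding all rows in memory
```

My understanding is that this should keep peak memory roughly constant regardless of file size, since only one processed row is alive at a time, but I'd appreciate confirmation that this is the right pattern (or suggestions for a better one).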