Handling irregular delimiters in mixed CSV files with Python's csv module
I'm relatively new to this, so bear with me. I'm working with a set of mixed CSV files where some files use commas as delimiters while others use semicolons. When I try to read these files with Python's built-in `csv` module, I run into problems with rows whose delimiter doesn't match the one I pass to `csv.reader`, so the output is inconsistent.

Here's a simplified version of my code:

```python
import csv

# Example of how I'm currently reading the files
files = ['file1.csv', 'file2.csv']

for file in files:
    with open(file, 'r', newline='') as f:
        # All of my files end in .csv, so this check always falls back to ','
        if file.endswith('.csv'):
            delimiter = ','
        else:
            delimiter = ';'
        reader = csv.reader(f, delimiter=delimiter)
        for row in reader:
            print(row)
```

The files contain rows like the following, where the delimiter is sometimes inconsistent even within a single file:

```
Name, Age, Location
John Doe; 32; New York
Jane Smith, 28, Los Angeles
```

After running the script, the comma-delimited rows split correctly, but the semicolon-delimited rows come back as a single field:

```
['Name', ' Age', ' Location']
['John Doe; 32; New York']
['Jane Smith', ' 28', ' Los Angeles']
```

I've tried manually replacing semicolons with commas before parsing (see the sketch at the end of this post), but that feels inefficient and fragile. What's a better approach to handle this without pre-processing the files? I also want to make sure I maintain the integrity of the data.

This is for a desktop application running on Ubuntu 22.04 (Debian-based). I'm using Python 3.9 and am open to additional libraries if necessary. Any insights or best practices would be greatly appreciated!
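
For context, here is roughly what my current workaround looks like. It's just a sketch of the pre-processing approach I'd like to avoid; `read_mixed_csv` is a placeholder name, and the blind `replace` is exactly the part that worries me:

```python
import csv
import io

def read_mixed_csv(path):
    """Workaround: normalise semicolons to commas before parsing."""
    with open(path, 'r', newline='') as f:
        text = f.read()
    # Blindly replacing every ';' feels fragile: it would also mangle
    # semicolons that legitimately appear inside quoted fields.
    normalised = text.replace(';', ',')
    return list(csv.reader(io.StringIO(normalised)))

for row in read_mixed_csv('file1.csv'):
    print(row)
```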