Pandas: how to read a CSV with mixed delimiters and inconsistent quoting
I'm having trouble reading a CSV file that has mixed delimiters (commas and semicolons) and inconsistent quoting. Some fields are enclosed in double quotes, while others are not. Here is a sample of the file:

```
"name","age","location"
"John, Doe",30,"New York"
"Jane; Smith",28,Los Angeles
"Bob ""The Builder""",35,"San Francisco"
```

I'm using pandas 1.3.3 and reading the file with the following code:

```python
import pandas as pd

df = pd.read_csv('data.csv')
```

However, I get the following error:

```
ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4
```

I've tried various parameters such as `delimiter`, `quotechar`, and `escapechar`, but I still can't seem to get it right. For example:

```python
df = pd.read_csv('data.csv', delimiter=';', quotechar='"')
```

This throws another tokenization error. I also tried `error_bad_lines=False`, but that just skips the problematic lines instead of fixing the parsing problem.

Is there a way to read this CSV properly without preprocessing it manually? What's the best practice for handling mixed delimiters and inconsistent quoting in pandas?

For context: this is part of a larger Python application my team is building. I'm using Python on macOS, my development environment is Ubuntu 22.04, and the problem occurs in both development and production on Ubuntu 20.04. Any help would be greatly appreciated.
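
To help narrow down which rows actually break, here is a minimal diagnostic sketch that counts the fields in each row using Python's standard `csv` module rather than pandas. The names `SAMPLE` and `field_counts` are hypothetical and only for illustration, and the sample text is inlined instead of being read from `data.csv`. It is a way to inspect the data, not a fix.

```python
import csv
import io

# Sample rows copied from the question. For the real file, open it with
# open('data.csv', newline='') instead of using an inline string.
SAMPLE = '''"name","age","location"
"John, Doe",30,"New York"
"Jane; Smith",28,Los Angeles
"Bob ""The Builder""",35,"San Francisco"
'''


def field_counts(text, delimiter=","):
    """Return (row_number, field_count, row) for each parsed row.

    Hypothetical helper: row_number counts parsed records, which matches
    the physical line number only when no quoted field spans lines.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    return [(i, len(row), row) for i, row in enumerate(reader, start=1)]


rows = field_counts(SAMPLE)
for row_number, n_fields, row in rows:
    print(row_number, n_fields, row)
```

With a comma delimiter, all four sample rows come out as three fields each, including the doubled-quote row. That suggests the line triggering the error in the real file may differ from what is shown in the sample. Running the same check against the actual file should show which rows have an unexpected field count.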