CodexBloom - Programming Q&A Platform

Pandas read_csv not recognizing multi-line records with custom delimiters

👀 Views: 41 💬 Answers: 1 📅 Created: 2025-06-14
pandas csv dataframe Python

I'm working on a personal project, and I'm sure I'm missing something obvious here, but I'm running into an issue with Pandas' `read_csv` function. My CSV file contains multi-line records that use a custom delimiter (`|`) instead of the standard comma. The multi-line records are enclosed in double quotes and contain newline characters. Despite providing the `quotechar` and `delimiter` parameters, the resulting DataFrame still has incorrect row counts.

Here's a snippet of my code:

```python
import pandas as pd

df = pd.read_csv('data.csv', delimiter='|', quotechar='"')
print(df)
```

The error message I get is `ParserError: Error tokenizing data. C error: Expected 3 fields in line 42, saw 4`. I've checked the CSV structure, and it appears that some records are being split incorrectly at the newline characters. I've also tried setting the `lineterminator` parameter to `\n`, but that hasn't resolved the issue.

Here's a sample of my CSV data:

```plaintext
"id|name|description"
"1|John Doe|A sample entry with
newline"
"2|Jane Smith|Another entry"
```

The first entry should be treated as a single record, but it seems like Pandas is interpreting the newline as the end of the record, which misaligns the columns.

Is there a specific configuration or workaround I should use to ensure that multi-line records are read correctly? I would appreciate any guidance on best practices for handling such CSV formats in Pandas.

I'm working on a CLI tool that needs to handle this. This is for a microservice running on Ubuntu 20.04. Any ideas how to fix this?
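To make the two possible layouts concrete, here is a minimal reproduction sketch, not a confirmed fix. It assumes the quoted description spans two lines, and it uses in-memory strings instead of the real `data.csv`. `FIELD_QUOTED`, `ROW_QUOTED`, `read_field_quoted`, and `read_row_quoted` are hypothetical names made up for illustration. The first layout quotes only the field that contains the newline. The second wraps each whole record in quotes, as in the sample above, so each record is read as one column and then split on `|`.

```python
import io

import pandas as pd

# Assumed layout 1: only the field that contains the newline is quoted.
FIELD_QUOTED = (
    'id|name|description\n'
    '1|John Doe|"A sample entry with\nnewline"\n'
    '2|Jane Smith|Another entry\n'
)

# Assumed layout 2: every record is wrapped in quotes as a whole.
ROW_QUOTED = (
    '"id|name|description"\n'
    '"1|John Doe|A sample entry with\nnewline"\n'
    '"2|Jane Smith|Another entry"\n'
)


def read_field_quoted(text):
    """Parse '|'-delimited text where individual fields may be quoted."""
    return pd.read_csv(io.StringIO(text), sep='|', quotechar='"')


def read_row_quoted(text):
    """Parse text where each whole record is quoted.

    The '|' characters sit inside the quotes, so each record is read as a
    single field first and then split on '|' in a second step.
    """
    raw = pd.read_csv(io.StringIO(text), sep='|', quotechar='"', header=None)
    parts = raw[0].str.split('|', expand=True)
    parts.columns = parts.iloc[0].tolist()  # first record holds the header
    # Note: every value stays a string here, including the id column.
    return parts.iloc[1:].reset_index(drop=True)


if __name__ == '__main__':
    print(read_field_quoted(FIELD_QUOTED).shape)
    print(read_row_quoted(ROW_QUOTED).shape)
```

If the real file matches the second layout, the split step assumes no field contains a literal `|`, which would need separate handling.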