CodexBloom - Programming Q&A Platform

Regex to Match Comma-Separated Values with Optional Quoting in Python - Need Help with Edge Cases

👀 Views: 16 💬 Answers: 1 📅 Created: 2025-06-11
regex csv python

I'm maintaining legacy code and, as part of a migration, I'm trying to parse a CSV file where fields can be either plain text or quoted strings, and some fields may contain commas. I've spent hours debugging this. I'm using the following regex to match the fields:

```python
import re

pattern = r'("[^"]*"|[^,]+)'  # Match quoted strings or unquoted fields

with open('data.csv', 'r') as file:
    content = file.read()

matches = re.findall(pattern, content)
print(matches)
```

However, I'm running into issues when the fields contain escaped quotes or leading/trailing whitespace. For instance, given the input:

```
"field1, with comma", "field2 with "escaped" quotes", field3
```

The regex doesn't seem to capture the fields correctly, resulting in the output:

```
['field1', ' with comma', 'field2 with ', 'escaped', ' quotes', 'field3']
```

I need a regex that handles these cases accurately: it should keep the inner quotes intact and ignore leading/trailing whitespace around each field. I've tried using lookaheads and lookbehinds but couldn't get it to work without complicating the pattern too much.

Any suggestions for a regex pattern that could handle these edge cases? I'm using Python 3.9.1, and this is part of a larger microservice I'm building. Any examples would be super helpful. Thanks in advance!
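One possible direction for the sample line above, offered as a rough sketch rather than a definitive answer: treat a quote as the closing quote only when it is followed by optional whitespace and then a comma or the end of the line. The names `FIELD_RE` and `split_fields` are made up for illustration. The sketch assumes one record per line, no newlines inside fields, and that empty unquoted fields (as in `a,,b`) can be skipped.

```python
import re

# Hypothetical pattern, written for the sample input only.
FIELD_RE = re.compile(
    r'''
    "(?P<quoted>.*?)"(?=\s*(?:,|$))   # quoted field: closing quote must precede a comma or line end
    |
    (?P<plain>[^,\s][^,]*)            # unquoted field: starts at a non-space, runs to the next comma
    ''',
    re.VERBOSE,
)


def split_fields(line):
    """Split one CSV-like line into fields, dropping the outer quotes."""
    fields = []
    for match in FIELD_RE.finditer(line):
        if match.group('quoted') is not None:
            # Inner quotes are kept exactly as they appear in the input.
            fields.append(match.group('quoted'))
        else:
            # Leading whitespace is never matched; trim the trailing part here.
            fields.append(match.group('plain').rstrip())
    return fields


sample = '"field1, with comma", "field2 with "escaped" quotes", field3'
print(split_fields(sample))
# ['field1, with comma', 'field2 with "escaped" quotes', 'field3']
```

The lookahead is what lets the unescaped inner quotes around `escaped` survive, since neither of them is followed by a comma or the end of the line. If the data instead follows the usual CSV convention of doubling embedded quotes, the standard library `csv` module with `skipinitialspace=True` is probably a safer choice than any regex.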