CodexBloom - Programming Q&A Platform

Regex Fails to Match Custom CSV Format with Quoted Fields in Ruby - Edge Cases Not Handled

👀 Views: 3 💬 Answers: 1 📅 Created: 2025-06-08
regex csv ruby

This might be a silly question, but I'm trying to parse a custom CSV format in which fields can be quoted and may contain commas. Some fields can also contain escaped quotes (`\"`), and my current regex doesn't handle these edge cases correctly. Here's the pattern I'm using:

```ruby
pattern = /(?<=^|,)("(?:[^"\\]*(?:\\.)?)*"|[^,]*)/
```

I expected this to match each field correctly, but it fails on fields like this: `"field1, with a comma", "field2 \"escaped quote\""`

Here the pattern incorrectly splits `"field1, with a comma"` into two fields because of the comma inside the quotes. I've tried various modifications, such as adding lookaheads and lookbehinds, but I keep running into problems with fields that contain escaped quotes or are entirely unquoted. When I run the regex on the string, the output is missing some fields and formats others incorrectly:

```ruby
csv_string = '"field1, with a comma", "field2 \"escaped quote\"", value3'
matches = csv_string.scan(pattern)
puts matches.flatten
```

This results in:

```
field1
with a comma
field2
value3
```

I'm using Ruby 3.0.0 on macOS. Can anyone explain what's causing this and suggest a regex that handles these cases properly? Or is there a better approach to parsing this kind of CSV format? Any help would be greatly appreciated!
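One thing worth checking in the snippet above: the sample input has a space after each comma, so at every position the lookbehind `(?<=^|,)` allows (other than the start of the string), the next character is a space rather than `"`. The quoted alternative cannot start there, so `[^,]*` matches instead. The inner group `(?:[^"\\]*(?:\\.)?)*` can also match the empty string on every repetition, which invites heavy backtracking; `(?:[^"\\]|\\.)*` expresses "an ordinary character or a backslash escape" without that ambiguity. Rather than matching every field with a single `scan`, a small loop over Ruby's standard-library `StringScanner` tracks the position explicitly. This is a sketch, assuming the format is: comma-separated fields, optional whitespace around separators, and backslash escapes (`\"`, `\\`) inside double-quoted fields. `parse_line` is a hypothetical helper name, not part of any library.

```ruby
require "strscan"

# Hypothetical helper (sketch): split one line of the custom format into fields.
# Assumed format: fields separated by commas, optional whitespace around the
# separators, and backslash escapes (\" and \\) inside double-quoted fields.
def parse_line(line)
  scanner = StringScanner.new(line)
  fields = []
  loop do
    scanner.skip(/\s*/) # ignore whitespace before the field
    if scanner.scan(/"((?:[^"\\]|\\.)*)"/)
      # Quoted field: commas inside are literal. Drop the backslash from
      # each escape sequence, so \" becomes " and \\ becomes \.
      fields << scanner[1].gsub(/\\(.)/, '\1')
    else
      # Unquoted field: everything up to the next comma (possibly empty),
      # with trailing whitespace trimmed. An unterminated opening quote
      # also lands here and is kept as literal text.
      fields << scanner.scan(/[^,]*/).rstrip
    end
    scanner.skip(/\s*/) # ignore whitespace after the field
    break if scanner.eos?
    # After a field, only a comma may follow; anything else is malformed.
    unless scanner.skip(/,/)
      raise ArgumentError, "expected ',' at position #{scanner.pos}"
    end
  end
  fields
end

csv_string = '"field1, with a comma", "field2 \"escaped quote\"", value3'
p parse_line(csv_string)
# => ["field1, with a comma", "field2 \"escaped quote\"", "value3"]
```

Each loop iteration either stops at the end of the input or consumes a comma, so it always terminates, and empty fields (`a,,b`) come back as `""`. If you can change whatever produces the data instead, standard RFC 4180 CSV escapes a quote by doubling it (`""`), and Ruby's built-in `CSV.parse_line` handles that form, including commas inside quoted fields, without a custom regex. Note that the strict CSV parser treats a space between a comma and an opening quote as malformed by default.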