Regex to Match Custom CSV Format with Quoted Fields in Ruby - Edge Cases Not Handled
This might be a silly question, but I'm trying to parse a custom CSV format where fields can be quoted and may contain commas. Some fields can also contain escaped quotes, and my current regex doesn't handle these edge cases correctly.

Here's the regex pattern I'm using:

```ruby
pattern = /(?<=^|,)("(?:[^"\\]*(?:\\.)?)*"|[^,]*)/
```

I expected this to match fields correctly, but it fails on input like this:

`"field1, with a comma", "field2 \"escaped quote\""`

In this case, the pattern incorrectly splits `"field1, with a comma"` into two fields because of the comma inside the quotes. I've tried various modifications, such as adding lookaheads and lookbehinds, but I keep running into issues with fields that contain escaped quotes or that are entirely unquoted.

When I run the regex on the string, the output is missing some fields and formats others incorrectly:

```ruby
csv_string = '"field1, with a comma", "field2 \"escaped quote\"", value3'
matches = csv_string.scan(pattern)
puts matches.flatten
```

This results in:

```
field1
with a comma
field2
value3
```

I'm using Ruby 3.0.0 on macOS. Can anyone suggest a regex that properly handles these cases, or is there a better approach to parsing this type of CSV format? What's the best practice here, and any ideas what could be causing this? Any help would be greatly appreciated!
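For comparison, here is a minimal sketch of one alternative to a single regex: a small scanner built on `StringScanner` from Ruby's standard library. It is an illustration under assumptions, not a confirmed fix. `parse_line` is a hypothetical helper name, and the sketch assumes that fields are comma-separated, that spaces or tabs around a field are insignificant, that quoted fields use double quotes, and that a backslash escapes the character after it. Note that the sample input has a space after each comma, so a quoted field never starts directly after the `,`; the sketch skips that whitespace explicitly.

```ruby
require 'strscan'

# Hypothetical helper (not from the original post): splits one line of the
# custom format into an array of field strings.
def parse_line(line)
  scanner = StringScanner.new(line)
  fields = []

  loop do
    scanner.skip(/[ \t]*/) # assumption: padding before a field is not data

    if scanner.scan(/"((?:[^"\\]|\\.)*)"/)
      # Quoted field: group 1 is the body; drop the backslash from each escape.
      fields << scanner[1].gsub(/\\(.)/, '\1')
    else
      # Unquoted (or unterminated-quote) field: everything up to the next comma.
      fields << scanner.scan(/[^,]*/).rstrip
    end

    scanner.skip(/[ \t]*/)
    break if scanner.eos?

    # Anything other than a comma here means stray text after a quoted field.
    unless scanner.scan(/,/)
      raise ArgumentError, "unexpected text at offset #{scanner.pos}"
    end
  end

  fields
end

p parse_line('"field1, with a comma", "field2 \"escaped quote\"", value3')
# => ["field1, with a comma", "field2 \"escaped quote\"", "value3"]
```

Each alternative inside the quoted-field pattern consumes at least one character and the two cannot match the same text, which avoids the nested optional quantifiers of the original pattern. Whether trimming whitespace and unescaping are the right choices depends on what the custom format actually specifies.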