CodexBloom - Programming Q&A Platform

Regex for Validating Custom Log File Formats - implementing Multi-line Entries

👀 Views: 257 đŸ’Ŧ Answers: 1 📅 Created: 2025-08-30
regex python logging Python

Does anyone know how to I'm writing unit tests and I've been banging my head against this for hours. I'm sure I'm missing something obvious here, but I'm trying to create a regex pattern to validate entries in a custom log file format which can span multiple lines. The log format I am working with is as follows: ``` 2023-10-15 14:35:22 INFO Connection Established 2023-10-15 14:35:23 behavior Connection Failed: Timeout 2023-10-15 14:35:24 INFO Retrying... ``` Each log entry starts with a timestamp followed by a log level (INFO, behavior, etc.), and the message can contain any characters, including new lines, until the next timestamp. I initially tried a regex pattern like this: ```regex ^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (INFO|behavior|DEBUG): .*?)$\n? ``` However, I realized that this pattern fails to capture multi-line messages correctly. When I test it against the following log entries: ``` 2023-10-15 14:35:25 INFO Starting process... 2023-10-15 14:35:26 INFO Process detail: - Step 1 completed - Step 2 in progress ``` It only matches the first line and ignores subsequent lines due to the `.*?` not capturing newlines. I also tried using the `s` flag to allow dot (`.`) to match newlines, but I'm unsure how to implement it correctly due to the varying lengths of log entries. What would be the correct regex to match the entire log entry including multi-line messages? I'm using Python 3.9 and the `re` module, and I want to make sure it captures every entry accurately without missing or skipping any parts. I'm working on a service that needs to handle this. Any help would be greatly appreciated! I'm working on a service that needs to handle this. For reference, this is a production desktop app. Any ideas how to fix this? I'm developing on Ubuntu 20.04 with Python. Thanks in advance!