CodexBloom - Programming Q&A Platform

Parsing a Custom Log Format in Python - Handling Multiple Timestamp Formats

👀 Views: 39 💬 Answers: 1 📅 Created: 2025-06-14
python regex log-parsing

I've been banging my head against this for hours. I'm trying to parse a custom log file that contains multiple timestamp formats within the same line, and I'm running into issues with correctly identifying and extracting the timestamps. The log entries look like this:

```
2023-10-01 12:00:00 INFO User logged in 2023/10/02 14:30:00 ERROR Failed to load resource [2023-10-03 09:45:00] WARN Session timed out
```

I need to extract the timestamps and their corresponding log levels, but the varying formats are making my regex quite complex. My current approach uses the following regex:

```python
import re

# Date (YYYY-MM-DD or YYYY/MM/DD), a space, time (HH:MM:SS), whitespace, then the level
log_line_pattern = r'(?P<timestamp>[\d]{4}[-/][\d]{2}[-/][\d]{2}[ \s][\d]{2}:[\d]{2}:[\d]{2})\s(?P<level>INFO|ERROR|WARN)'
```

However, this fails to match the log lines consistently. When I test it, I often get `None` for matches, and the output looks like this:

```
[None, 'INFO']
[None, 'ERROR']
[None, 'WARN']
```

I've also tried splitting the lines into parts and parsing each section individually, but that approach is quite inefficient. I want to capture all timestamps and their corresponding log levels without missing any entries. Could someone help me refine this regex or suggest a better approach for parsing these log lines? What are the best practices for handling varying formats in log parsing, and could this be a known issue?

I'm using Python 3.10 and the `re` module, and my development environment is Linux.
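A minimal sketch of one possible fix, assuming the two date layouts in the sample (`YYYY-MM-DD` and `YYYY/MM/DD`, optionally wrapped in square brackets) are the only ones you need. The pattern above cannot match the bracketed entry because the closing `]` sits between the seconds and the `\s` the pattern requires; also, `[ \s]` is redundant because `\s` already matches a space. The sketch allows optional brackets, uses a backreference so both date separators must agree (`2023-10/01` is rejected), and uses `re.finditer` so every entry in a line is returned (`re.match` and `re.search` stop at the first match). It then normalizes `/` to `-` and parses the result with `datetime.strptime`. The names `ENTRY_RE`, `parse_timestamp`, and `extract_entries` are hypothetical, and the level is captured as any run of letters; replace it with an explicit alternation if you only want known levels.

```python
import re
from datetime import datetime

# Optional "[", then a date whose two separators must match ("-" or "/"),
# a space, HH:MM:SS, optional "]", whitespace, and a single-word level.
# Hypothetical pattern covering only the layouts seen in the sample.
ENTRY_RE = re.compile(
    r"\[?(?P<timestamp>\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2} \d{2}:\d{2}:\d{2})\]?"
    r"\s+(?P<level>[A-Za-z]+)"
)


def parse_timestamp(raw: str) -> datetime:
    """Normalize '/' to '-' so a single strptime format handles both layouts."""
    return datetime.strptime(raw.replace("/", "-"), "%Y-%m-%d %H:%M:%S")


def extract_entries(text: str) -> list[tuple[datetime, str]]:
    """Return (timestamp, level) for every entry found anywhere in text."""
    return [
        (parse_timestamp(m.group("timestamp")), m.group("level"))
        for m in ENTRY_RE.finditer(text)
    ]


if __name__ == "__main__":
    # Sample text modeled on the log excerpt in the question.
    sample = (
        "2023-10-01 12:00:00 INFO User logged in "
        "2023/10/02 14:30:00 ERROR Failed to load resource "
        "[2023-10-03 09:45:00] WARN Session timed out"
    )
    for ts, level in extract_entries(sample):
        print(ts.isoformat(), level)
    # 2023-10-01T12:00:00 INFO
    # 2023-10-02T14:30:00 ERROR
    # 2023-10-03T09:45:00 WARN
```

Because `finditer` scans the whole string, the same function works whether the entries sit on one line or on separate lines. If more timestamp layouts show up later, one option is to keep the regex loose and try a list of `strptime` formats in `parse_timestamp` until one succeeds, rather than growing a single regex.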