CodexBloom - Programming Q&A Platform

Parsing Custom Log Files in Python - Handling Timestamp Formats

👀 Views: 0 💬 Answers: 1 📅 Created: 2025-06-16
python regex datetime logging

I've been banging my head against this for hours. I'm sure I'm missing something obvious, but I've hit a roadblock on a project: I'm running into issues while parsing custom log files that have varying timestamp formats. My log entries look like this:

```
2023-10-01 14:32:10 INFO User logged in
2023/10/01 14:35 WARN Disk space low
01-10-2023 14:36:12 ERROR Failed to connect to database
```

The problem arises because the timestamps are not consistent in format. I want to parse these log entries and extract the timestamp, log level, and message into a structured format, but I'm getting `ValueError: time data '2023/10/01 14:35' does not match format` errors when trying to parse them with `datetime.strptime()`.

Here's a snippet of what I've tried:

```python
import re
from datetime import datetime

# Timestamp = first two whitespace-separated tokens, then level, then message.
log_pattern = re.compile(r'(?P<timestamp>\S+ \S+) (?P<level>\S+) (?P<message>.*)')

log_entries = [
    '2023-10-01 14:32:10 INFO User logged in',
    '2023/10/01 14:35 WARN Disk space low',
    '01-10-2023 14:36:12 ERROR Failed to connect to database'
]

parsed_logs = []
for entry in log_entries:
    match = log_pattern.search(entry)
    if match:
        timestamp_str = match.group('timestamp')
        # Try each known format in turn, falling through on ValueError.
        try:
            timestamp = datetime.strptime(timestamp_str, '%Y-%m-%d %H:%M:%S')
        except ValueError:
            try:
                timestamp = datetime.strptime(timestamp_str, '%Y/%m/%d %H:%M')
            except ValueError:
                timestamp = datetime.strptime(timestamp_str, '%d-%m-%Y %H:%M:%S')
        level = match.group('level')
        message = match.group('message')
        parsed_logs.append({'timestamp': timestamp, 'level': level, 'message': message})

print(parsed_logs)
```

Currently, I'm just trying to handle three timestamp formats, but when I run the code, I only get the first entry successfully parsed, whereas the other entries just throw ValueErrors. How can I improve this parsing logic to handle different timestamp formats more gracefully? I want to avoid deeply nested try-except blocks if possible and make it scalable for future formats as well.

Any advice or best practices would be greatly appreciated! I'm working on an API that needs to handle this. For context: I'm using Python on macOS. This is happening in both development and production on Windows 11. What am I doing wrong?
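
One common way to avoid the nested try-except blocks is to keep the candidate formats in a sequence and try each in turn. The following is only a minimal sketch under some assumptions: the names `TIMESTAMP_FORMATS`, `parse_timestamp`, and `parse_line` are hypothetical, the format list covers only the three layouts shown in the sample log, and lines that cannot be parsed are returned as `None` rather than raising.

```python
import re
from datetime import datetime

# Candidate formats, tried in order. Supporting a new layout means adding
# one entry here. Assumption: only the three sample layouts are needed.
TIMESTAMP_FORMATS = (
    '%Y-%m-%d %H:%M:%S',
    '%Y/%m/%d %H:%M',
    '%d-%m-%Y %H:%M:%S',
)

# Same pattern as in the question: two tokens of timestamp, level, message.
LOG_PATTERN = re.compile(r'(?P<timestamp>\S+ \S+) (?P<level>\S+) (?P<message>.*)')


def parse_timestamp(text):
    """Return a datetime for the first format that matches, else None."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not fit; try the next one
    return None


def parse_line(line):
    """Return a dict with timestamp, level and message, or None if unparseable."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    timestamp = parse_timestamp(match.group('timestamp'))
    if timestamp is None:
        return None
    return {
        'timestamp': timestamp,
        'level': match.group('level'),
        'message': match.group('message'),
    }


# Usage: parse one of the sample lines from the question.
record = parse_line('2023/10/01 14:35 WARN Disk space low')
print(record)
```

The order of `TIMESTAMP_FORMATS` matters when two formats could both accept the same string (for example day-first versus month-first dates), so the more specific or more likely format should come first. Returning `None` for unparseable lines is a design choice for this sketch; collecting or logging the rejected lines may be more useful in an API.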