CodexBloom - Programming Q&A Platform

Parsing Custom Log Format in Python - Handling Inconsistent Timestamp Formats

👀 Views: 1915 💬 Answers: 1 📅 Created: 2025-06-13
python logging datetime parsing

I'm working on a Python script to parse a custom log file where each line has a timestamp, a log level, and a log message. The challenge is that the timestamps are inconsistent: some lines use ISO 8601 format (`2023-10-12T14:30:00Z`), while others use a plain Unix timestamp (`1697116200`). My goal is to extract these timestamps and convert them into uniform datetime objects for further analysis.

Here's a snippet of my log lines:

```
2023-10-12T14:30:00Z INFO User logged in
1697116200 WARN Disk space low
2023-10-12T15:00:00Z ERROR Failed to connect to database
```

I initially tried using the `datetime` module with different parsing functions based on the log level, but it gets messy and I'm worried about performance for large log files. Here's what my code looks like:

```python
import datetime

log_lines = [
    '2023-10-12T14:30:00Z INFO User logged in',
    '1697116200 WARN Disk space low',
    '2023-10-12T15:00:00Z ERROR Failed to connect to database',
]

def parse_log_line(line):
    # Split into at most three fields: timestamp, level, message
    parts = line.split(' ', 2)
    timestamp = parts[0]
    log_level = parts[1]
    message = parts[2] if len(parts) > 2 else ''

    # Attempt to parse the timestamp
    try:
        if 'T' in timestamp:
            # ISO 8601: drop the trailing 'Z' (result is a naive datetime)
            parsed_time = datetime.datetime.fromisoformat(timestamp[:-1])
        else:
            # Unix epoch seconds (result is a naive datetime in local time)
            parsed_time = datetime.datetime.fromtimestamp(int(timestamp))
    except Exception as e:
        print(f'Error parsing timestamp: {e}')
        parsed_time = None

    return parsed_time, log_level, message

for line in log_lines:
    print(parse_log_line(line))
```

This code works for the current log lines, but I am concerned about its performance when handling larger files with potentially thousands of lines. Additionally, I sometimes get a `ValueError` for invalid timestamps.

Is there a more efficient way to handle the parsing, possibly with a single function that can recognize both timestamp formats? Also, how should I handle potential errors more gracefully, rather than just printing them? Any guidance or best practices would be appreciated!
I'm working on a web app that needs to handle this. Is there a better approach? This is part of a larger CLI tool I'm building. Am I missing something obvious?
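
To make the question more concrete, here is a minimal sketch of one possible direction, not a definitive answer. It assumes every timestamp is either all digits (Unix epoch seconds) or an ISO 8601 string, and that ISO timestamps without an offset are meant as UTC. The names `LogEntry`, `parse_timestamp`, and `parse_lines` are hypothetical, introduced only for illustration. Instead of printing errors, it collects them alongside the successfully parsed entries so the caller can decide what to do with them.

```python
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple, Tuple


class LogEntry(NamedTuple):
    """Hypothetical container for one parsed log line."""
    timestamp: datetime
    level: str
    message: str


def parse_timestamp(raw: str) -> datetime:
    """Return a timezone-aware UTC datetime for either supported format."""
    if raw.isdigit():
        # Unix epoch seconds; passing tz avoids a naive local-time result.
        return datetime.fromtimestamp(int(raw), tz=timezone.utc)
    # fromisoformat() rejects a trailing 'Z' before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def parse_log_line(line: str) -> LogEntry:
    """Parse '<timestamp> <level> [message]' or raise ValueError."""
    parts = line.rstrip("\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"expected '<timestamp> <level> [message]': {line!r}")
    message = parts[2] if len(parts) > 2 else ""
    return LogEntry(parse_timestamp(parts[0]), parts[1], message)


def parse_lines(
    lines: Iterable[str],
) -> Tuple[List[LogEntry], List[Tuple[int, str, str]]]:
    """Parse every line, collecting failures instead of printing them."""
    entries: List[LogEntry] = []
    errors: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(lines, start=1):
        try:
            entries.append(parse_log_line(line))
        except (ValueError, OverflowError, OSError) as exc:
            # Record line number, raw text, and reason for later reporting.
            errors.append((lineno, line, str(exc)))
    return entries, errors


sample = [
    "2023-10-12T14:30:00Z INFO User logged in",
    "1697116200 WARN Disk space low",
    "not-a-timestamp ERROR Failed to connect to database",
]
entries, errors = parse_lines(sample)
for entry in entries:
    print(entry.timestamp.isoformat(), entry.level, entry.message)
for lineno, line, reason in errors:
    print(f"line {lineno} skipped: {reason}")
```

Because `parse_lines` accepts any iterable of strings, an open file object can be passed directly, so the file is read line by line rather than loaded into memory at once. For files with thousands of lines, per-line checks of this kind are typically cheap, though profiling on real data is the only way to be sure.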