CodexBloom - Programming Q&A Platform

Parsing Custom Log Format in Python - implementing Timestamp and Key-Value Pairs

๐Ÿ‘€ Views: 146 ๐Ÿ’ฌ Answers: 1 ๐Ÿ“… Created: 2025-06-14
python regex parsing Python

I'm trying to parse a custom log format that includes a timestamp followed by key-value pairs, but I'm running into issues extracting the data correctly. The log entries look like this: ``` 2023-10-12 14:35:22 - user_id=42, action=login, status=success 2023-10-12 14:36:10 - user_id=53, action=logout, status=failed ``` I want to separate the timestamp and convert the key-value pairs into a dictionary. I've tried using regular expressions but I'm getting unexpected results, especially with entries that might have commas within values in the future. Hereโ€™s the code Iโ€™ve written so far: ```python import re log_entries = ''' 2023-10-12 14:35:22 - user_id=42, action=login, status=success 2023-10-12 14:36:10 - user_id=53, action=logout, status=failed ''' pattern = r'^(\S+ \S+) - (.*)$' results = [] for line in log_entries.strip().split('\n'): match = re.match(pattern, line) if match: timestamp = match.group(1) kv_pairs = match.group(2) kv_dict = {k.strip(): v.strip() for k, v in (pair.split('=') for pair in kv_pairs.split(','))} results.append({'timestamp': timestamp, 'data': kv_dict}) print(results) ``` The output Iโ€™m getting is as follows: ``` [{'timestamp': '2023-10-12 14:35:22', 'data': {'user_id': '42', 'action': 'login', 'status': 'success'}}, {'timestamp': '2023-10-12 14:36:10', 'data': {'user_id': '53', 'action': 'logout', 'status': 'failed'}}] ``` While this seems fine now, Iโ€™m concerned about future entries where values might have commas, such as `action=user update` or `status=login, another_action=logout`. Iโ€™m unsure how to modify my regex to accommodate that without breaking the existing structure. Any suggestions on how to improve this parsing approach while ensuring it remains robust against additional complexities? Thanks!