Parsing Custom Log Format in Python - implementing Timestamp and Key-Value Pairs
I'm trying to parse a custom log format that includes a timestamp followed by key-value pairs, but I'm running into issues extracting the data correctly. The log entries look like this: ``` 2023-10-12 14:35:22 - user_id=42, action=login, status=success 2023-10-12 14:36:10 - user_id=53, action=logout, status=failed ``` I want to separate the timestamp and convert the key-value pairs into a dictionary. I've tried using regular expressions but I'm getting unexpected results, especially with entries that might have commas within values in the future. Hereโs the code Iโve written so far: ```python import re log_entries = ''' 2023-10-12 14:35:22 - user_id=42, action=login, status=success 2023-10-12 14:36:10 - user_id=53, action=logout, status=failed ''' pattern = r'^(\S+ \S+) - (.*)$' results = [] for line in log_entries.strip().split('\n'): match = re.match(pattern, line) if match: timestamp = match.group(1) kv_pairs = match.group(2) kv_dict = {k.strip(): v.strip() for k, v in (pair.split('=') for pair in kv_pairs.split(','))} results.append({'timestamp': timestamp, 'data': kv_dict}) print(results) ``` The output Iโm getting is as follows: ``` [{'timestamp': '2023-10-12 14:35:22', 'data': {'user_id': '42', 'action': 'login', 'status': 'success'}}, {'timestamp': '2023-10-12 14:36:10', 'data': {'user_id': '53', 'action': 'logout', 'status': 'failed'}}] ``` While this seems fine now, Iโm concerned about future entries where values might have commas, such as `action=user update` or `status=login, another_action=logout`. Iโm unsure how to modify my regex to accommodate that without breaking the existing structure. Any suggestions on how to improve this parsing approach while ensuring it remains robust against additional complexities? Thanks!