Regex to Extract IPv6 Addresses with Embedded IPv4 from Logs in Python
I'm writing unit tests and trying to extract IPv6 addresses from a log file with a regular expression in Python, but I'm running into problems when an IPv4 address is embedded in (or appears alongside) the IPv6 address. My current code looks like this:

```python
import re

log_data = '''
User connected from 2001:0db8:85a3:0000:0000:8a2e:0370:7334 and then switched to 192.168.1.1
User connected from 2001:0db8:abcd:0012:0000:0000:0000:0001/64
'''

regex_pattern = r'(?:[\da-fA-F]{1,4}:){7}[\da-fA-F]{1,4}|(?:[\da-fA-F]{1,4}:){1,7}:|::(?:[\da-fA-F]{1,4}:){0,6}[\da-fA-F]{1,4}'
ipv6_addresses = re.findall(regex_pattern, log_data)
print(ipv6_addresses)
```

However, this approach fails to match the IPv6 address when it appears alongside the embedded IPv4 address. The output I'm getting is:

```
['2001:0db8:abcd:0012:0000:0000:0000:0001']
```

I expected to see both IPv6 addresses, but `re.findall` ignores the first one completely.

I've tried extending the regex to handle more variations, such as:

```python
regex_pattern = r'([\da-f]{1,4}:){7}[\da-f]{1,4}|([\da-f]{1,4}:){1,7}:|::([\da-f]{1,4}:){0,6}[\da-f]{1,4}|(?:[\da-f]{1,4}:){0,5}:(?:[\da-f]{1,4}:){1,2}(?:[\d]{1,3}\.){3}[\d]{1,3}'
```

But that still doesn't cover all cases. Can anyone suggest a more robust pattern that handles both plain IPv6 addresses and IPv6 addresses with an embedded IPv4 part? I'm using Python 3.9.1, this is for a service running on Ubuntu 22.04, and the solution needs to perform well on large log files.

Thanks in advance!
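A minimal reproduction that may help narrow this down, running the first pattern unchanged on two lines. With the sample above, the first log line does produce a match, because its IPv6 address is written as eight full groups and `192.168.1.1` is a separate token rather than part of the address. The pattern breaks on an IPv6 address whose last 32 bits are written as a dotted quad, such as the IPv4-mapped form `::ffff:192.168.1.1`: the third alternative consumes `::ffff:` plus the hex-looking digits `192` and stops at the first dot.

```python
import re

# First pattern from the question, unchanged (split across lines for readability).
pattern = re.compile(
    r'(?:[\da-fA-F]{1,4}:){7}[\da-fA-F]{1,4}'
    r'|(?:[\da-fA-F]{1,4}:){1,7}:'
    r'|::(?:[\da-fA-F]{1,4}:){0,6}[\da-fA-F]{1,4}'
)

samples = [
    # First line of log_data: the IPv4 address is a separate token.
    "User connected from 2001:0db8:85a3:0000:0000:8a2e:0370:7334 and then switched to 192.168.1.1",
    # IPv4-mapped IPv6 address: the dotted quad is part of the IPv6 address itself.
    "User connected from ::ffff:192.168.1.1",
]

results = [pattern.findall(line) for line in samples]
for line_matches in results:
    print(line_matches)
# ['2001:0db8:85a3:0000:0000:8a2e:0370:7334']
# ['::ffff:192']
```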