CodexBloom - Programming Q&A Platform

Regex for Extracting IP Addresses in Python - Handling Shorthand IPv6 Formats

👀 Views: 5 💬 Answers: 1 📅 Created: 2025-06-03
regex python ip-address Python

I'm trying to extract both IPv4 and IPv6 addresses from a log file using regex in Python, but my pattern doesn't match certain IPv6 formats. My current code looks like this:

```python
import re

log_data = '''
192.168.1.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0"
2001:0db8:85a3:0000:0000:8a2e:0370:7334 - - [10/Oct/2000:13:55:36 -0700] "GET /example HTTP/1.0"
'''

# Dotted-quad IPv4, or fully expanded IPv6 (exactly eight hex groups)
pattern = r'(?:(?:\d{1,3}\.){3}\d{1,3}|(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4})'
matches = re.findall(pattern, log_data)
print(matches)
```

This captures standard IPv4 addresses and fully expanded IPv6 addresses. When I run it, I get:

```
['192.168.1.1', '2001:0db8:85a3:0000:0000:8a2e:0370:7334']
```

However, it misses IPv6 addresses written in shorthand notation, where `::` stands in for one or more groups of zeros, such as `2001:db8:85a3::8a2e:370:7334`, `2001:db8:85a3::1`, and `::1`. I think the problem is that the IPv6 branch requires exactly seven `group:` repetitions followed by a final group, so none of these compressed forms can match.

How can I modify my pattern to capture these shorthand IPv6 representations? It also needs to stay efficient, since I'm processing a large log file (over a million lines).

This is part of a larger Python service backing a mobile app, and I recently upgraded to the latest Python release. Is there a better approach than a single regex?
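A possible direction, sketched under assumptions rather than offered as a definitive fix: instead of growing the IPv6 branch into one large pattern (each place `::` can appear needs its own alternative, and because Python's `re` tries alternatives in order, an earlier, shorter alternative can win and truncate a `findall` match), use a deliberately loose regex to find candidate tokens and let the standard-library `ipaddress` module decide which ones are real addresses. The names `CANDIDATE` and `extract_ips` are hypothetical, and the extra log lines in the demo are invented for illustration.

```python
import ipaddress
import re
from typing import Iterable, Iterator

# Hypothetical names (CANDIDATE, extract_ips): an illustrative sketch only.
# The regex is a cheap, permissive pre-filter that finds *candidates*:
# dotted quads, or 3-8 hex groups separated by colons where groups may be
# empty (which is what lets '::' through). Lookarounds stop it from matching
# a fragment of a longer token. ipaddress makes the final decision.
CANDIDATE = re.compile(
    r"""
    (?<![\w.])(?:\d{1,3}\.){3}\d{1,3}(?![\w.])                       # IPv4-looking token
    |
    (?<![\w:])(?:[0-9A-Fa-f]{0,4}:){2,7}[0-9A-Fa-f]{0,4}(?![\w:.])    # IPv6-looking token
    """,
    re.VERBOSE,
)


def extract_ips(lines: Iterable[str]) -> Iterator[str]:
    """Yield each valid IPv4/IPv6 address found in `lines`, as written in the log."""
    for line in lines:
        for candidate in CANDIDATE.findall(line):
            try:
                ipaddress.ip_address(candidate)  # raises ValueError if not a real address
            except ValueError:
                continue  # e.g. the '2000:13:55:36' timestamp or '999.1.1.1'
            yield candidate


if __name__ == "__main__":
    sample = '''\
192.168.1.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0"
2001:0db8:85a3:0000:0000:8a2e:0370:7334 - - [10/Oct/2000:13:55:36 -0700] "GET /example HTTP/1.0"
2001:db8:85a3::8a2e:370:7334 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0"
::1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0"
999.1.1.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0"
'''
    print(list(extract_ips(sample.splitlines())))
    # ['192.168.1.1', '2001:0db8:85a3:0000:0000:8a2e:0370:7334', '2001:db8:85a3::8a2e:370:7334', '::1']
```

For the large file, the pattern is compiled once and lines can be streamed (for example `with open(path) as fh: for ip in extract_ips(fh): ...`), so memory stays flat; `ipaddress` only runs on candidates, although a timestamp such as `2000:13:55:36` still produces one rejected candidate per line. Known gaps of this sketch: an IPv4-mapped address like `::ffff:192.0.2.1` is reported as just `192.0.2.1`, and an IPv6 address immediately followed by a period is skipped. If you prefer a canonical form over the text as logged, `str(ipaddress.ip_address(candidate))` returns the compressed representation.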