Regex for Extracting Dates in MM-DD-YYYY Format - Enforcing Leading Zeros
I'm working on a Python script that needs to extract dates formatted as MM-DD-YYYY from a block of text. The problem arises when I try to enforce leading zeros in the month and day. For example, '01-05-2023' should match, but '1-5-2023' should not. Additionally, I want to ensure that the month is always 01-12 and that the day is valid for that month (e.g., no '02-30-2023').

I've tried the following regex pattern:

```python
import re

text = 'The event is on 01-05-2023 and another one is on 1-5-2023.'

# Month 01-12, day 01-31, four-digit year; leading zeros are required
pattern = r'\b(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])-(\d{4})\b'

matches = re.findall(pattern, text)
print(matches)
```

This captures '01-05-2023' and correctly rejects '1-5-2023'. The output is as follows:

```
[('01', '05', '2023')]
```

However, the pattern is too permissive about days: it accepts any day from 01 to 31 regardless of the month, so it would also match dates that do not exist, such as '02-30-2023'.

I've considered adding logic after matching to validate the days, but I'd prefer to do this purely with regex if possible. Any advice on how to enhance my regex so that it enforces both the format and valid month-day combinations?

I'm coming from a different tech stack and still learning Python, so any suggestions would be helpful. I'm developing with Python on a Debian-based system (Ubuntu 22.04).
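For reference, here is a rough sketch of the post-match validation I mentioned as a fallback. It keeps the regex responsible for the format (including the leading zeros) and hands the calendar check to `datetime.strptime`. The helper name `extract_valid_dates` and the sample text are made up for illustration, and I'm assuming a day that doesn't exist in the given month and year should simply be dropped:

```python
import re
from datetime import datetime

# Same format check as my pattern, but with non-capturing groups so that
# findall() returns the whole date string instead of a tuple of parts.
DATE_RE = re.compile(r'\b(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])-\d{4}\b')


def extract_valid_dates(text):
    """Return MM-DD-YYYY strings found in text that are real calendar dates."""
    valid = []
    for candidate in DATE_RE.findall(text):
        try:
            # strptime raises ValueError for impossible dates such as 02-30-2023
            datetime.strptime(candidate, '%m-%d-%Y')
        except ValueError:
            continue
        valid.append(candidate)
    return valid


sample = 'Valid: 01-05-2023 and 02-29-2024. Invalid: 1-5-2023, 02-30-2023, 02-29-2023.'
print(extract_valid_dates(sample))  # ['01-05-2023', '02-29-2024']
```

The regex is still needed in this version because `strptime` with `%m` and `%d` also accepts single-digit values like '1-5-2023', so it can't enforce the leading zeros by itself.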