Regular expression HELP

I need a regular expression for the following pattern. There are over a thousand of these, all in the same pattern, that I need to extract from a PDF. Please advise.
2001 - 04/10/2013 To 06/23/105
2005 - 09/10/2017 To 06/23/109
2008 - 02/10/2019 To 06/23/2020

What do you need to extract specifically? What have you tried already?

I have a PDF file. Each new record begins with a pattern like 2001 - 04/10/2013 To 06/23/105.

So record 1 would be 2001 - 04/10/2013 To 06/23/105 (this pattern is constant throughout the PDF)
record 2 2005 - 09/10/2017 To 06/23/109
record 3 2008 - 02/10/2019 To 06/23/2020
…
etc.

I simply want a regex to match the pattern. The rest I can figure out.

import re

member = re.compile(r'^\d{4} - (.*)')
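
For comparison, here is a minimal sketch of a stricter pattern that spells out both date-like fields instead of capturing everything after the dash. It assumes the header always has the shape shown in the samples (a 4-digit number, " - ", a date, " To ", a second date whose last field has 2 to 4 digits); the name RECORD_HEADER and the sample text are made up for illustration.

```python
import re

# Assumed layout, taken from the three sample lines in this thread:
# 4 digits, " - ", NN/NN/NNNN, " To ", NN/NN/ followed by 2 to 4 digits.
RECORD_HEADER = re.compile(
    r"^\d{4}\s*-\s*\d{2}/\d{2}/\d{4}\s+To\s+\d{2}/\d{2}/\d{2,4}",
    re.MULTILINE,  # let ^ match at the start of every line, not just the text
)

# Hypothetical text standing in for whatever was extracted from the PDF.
sample = """2001 - 04/10/2013 To 06/23/105
some record text
2005 - 09/10/2017 To 06/23/109
more record text
2008 - 02/10/2019 To 06/23/2020
"""

# With no capturing groups, findall returns the full matched strings.
headers = RECORD_HEADER.findall(sample)
print(headers)
```

Using \s* and \s+ rather than literal spaces is a hedge against PDF text extraction changing the spacing; tighten it if the spacing turns out to be fixed.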

Interestingly, parts of those strings look like dates:

04/10/2013
09/10/2017
02/10/2019

and some others do not, at least to me:

06/23/105
06/23/109

Is that intended?

If so, what else can you say about what can appear in the places where the
non-dates are?

And finally, do you ultimately need to extract a single “str” value or a
subdivision of the text in groups? (e.g. the first date-looking part as a
datetime object, the record number from “record 2” as an int, …)
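
If a subdivision into groups is what is wanted, it could look roughly like the sketch below. This is only an illustration under assumptions: the leading 4-digit field is treated as a plain integer, the first date is read as MM/DD/YYYY, and the second date-like field is kept as raw text because its meaning is unclear. The names HEADER and parse_header are invented for this example.

```python
import re
from datetime import datetime

# Named groups for the three parts of a header line.
HEADER = re.compile(
    r"^(?P<number>\d{4}) - (?P<start>\d{2}/\d{2}/\d{4}) To (?P<end>\S+)$"
)

def parse_header(line):
    """Return the parts of a header line as a dict, or None if it does not match."""
    m = HEADER.match(line)
    if m is None:
        return None
    return {
        "number": int(m.group("number")),
        # Assumption: US-style month/day/year ordering.
        "start": datetime.strptime(m.group("start"), "%m/%d/%Y").date(),
        # Left as a string: values like 06/23/105 are not valid dates as written.
        "end_raw": m.group("end"),
    }

parsed = parse_header("2001 - 04/10/2013 To 06/23/105")
print(parsed)
```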

In the USA the MM/DD/YYYY form is regrettably common. In most of the
rest of the world the units are ordered ascending, e.g. DD/MM/YYYY.

This discrepancy causes much pain, particularly around ambiguity when the
DD and MM values don’t exclude one ordering or the other.

Cheers,
Cameron Simpson cs@cskk.id.au
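
The ambiguity described above can be shown with a small sketch that tries both orderings on the same text. The helper name possible_dates is hypothetical; datetime.strptime raises ValueError when the text does not fit a format, which is what rules an ordering out.

```python
from datetime import datetime

def possible_dates(text):
    """Return every reading of text that is a valid date, keyed by ordering."""
    results = {}
    for label, fmt in (("MM/DD/YYYY", "%m/%d/%Y"), ("DD/MM/YYYY", "%d/%m/%Y")):
        try:
            results[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            # This ordering is impossible for this text (e.g. month 23).
            pass
    return results

ambiguous = possible_dates("04/10/2013")    # valid under both orderings
unambiguous = possible_dates("06/23/2020")  # only valid as MM/DD/YYYY
print(ambiguous)
print(unambiguous)
```

In the samples from this thread, the 23 in 06/23/2020 can only be a day, which suggests (but does not prove) that the file uses MM/DD ordering throughout.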

This sample value in particular had me scratch my head as to what it might be:

06/23/105

Oh :frowning:

2005, badly done as an offset from 1900? Just a ludicrous guess though.

Cheers, Cameron
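
If that guess happens to be right, the odd values could be normalized along these lines. This is purely a sketch of the guess above, not something the data confirms: the rule "a 3-digit year is an offset from 1900" and the function name normalize_year are both assumptions.

```python
def normalize_year(year_text):
    """Turn a year field into a 4-digit year under the offset-from-1900 guess."""
    year = int(year_text)
    # Hypothetical rule: 3-digit values such as 105 mean 1900 + 105.
    if len(year_text) == 3:
        return 1900 + year
    # Anything else is returned unchanged.
    return year

years = [normalize_year(y) for y in ("105", "109", "2020")]
print(years)
```

It would be worth checking a few records against a known-good source before relying on this.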

I figured it out:

re.compile(r'^\d{4}\s\s-\s\s/*')
