I need a regular expression for the following pattern. There are over a thousand of these, all in the same pattern, that I need to extract from a PDF. Please advise.
2001 - 04/10/2013 To 06/23/105
2005 - 09/10/2017 To 06/23/109
2008 - 02/10/2019 To 06/23/2020
What do you need to extract specifically? What have you tried already?
I have a PDF file. Each new record begins with a pattern like 2001 - 04/10/2013 To 06/23/105.
So record 1 would be 2001 - 04/10/2013 To 06/23/105 (this pattern is constant throughout the PDF)
record 2 2005 - 09/10/2017 To 06/23/109
record 3 2008 - 02/10/2019 To 06/23/2020
…
etc.
I simply want a regex to match the pattern. The rest I can figure out.
import re
member = re.compile(r'^\d{4} - (.*)')
Interestingly, parts of those strings look like dates:
04/10/2013
09/10/2017
02/10/2019
and some others do not, at least to me:
06/23/105
06/23/109
Is that intended?
If so, what else can you say about what can appear in the places where the
non-dates are?
And finally, do you ultimately need to extract a single “str” value or a
subdivision of the text in groups? (e.g. the first date-looking part as a
datetime object, the record number from “record 2” as an int, …)
In the USA the MM/DD/YYYY form is regrettably common. In most of the
rest of the world the units are ordered ascending, e.g. DD/MM/YYYY.
This discrepancy causes much pain, particularly around ambiguity when the
DD and MM values don’t exclude one ordering or the other.
Cheers,
Cameron Simpson cs@cskk.id.au
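To illustrate the ordering ambiguity described above, here is a minimal sketch that tries both interpretations of a slash-separated date. The helper name possible_readings is made up for this example, and the two format strings are assumptions about which layouts might occur.

```python
from datetime import datetime

def possible_readings(text):
    """Return every valid date that a slash-separated string could mean."""
    readings = {}
    for label, fmt in (("MM/DD/YYYY", "%m/%d/%Y"), ("DD/MM/YYYY", "%d/%m/%Y")):
        try:
            readings[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # this ordering is impossible for the given numbers
    return readings

# Both orderings parse, but to different dates (10 April vs 4 October).
print(possible_readings("04/10/2013"))
# Only MM/DD/YYYY parses here, because there is no month 23.
print(possible_readings("06/23/2020"))
```

A value such as 06/23/2020 settles the question for that one line only; whether the whole file uses the same ordering still has to be confirmed from the source.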
This sample value in particular had me scratch my head as to what it might be:
06/23/105
Oh
2005, badly done as an offset from 1900? Just a ludicrous guess though.
Cheers, Cameron
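If the guess above is right (and it is only a guess, not confirmed by the original poster), a three-digit year such as 105 would be a count of years since 1900. A small sketch of that conversion, with normalize_year being a hypothetical helper name:

```python
def normalize_year(year_text):
    """Treat a 3-digit year as an offset from 1900 (an unconfirmed assumption);
    return any other year unchanged."""
    year = int(year_text)
    return year + 1900 if len(year_text) == 3 else year

for raw in ("105", "109", "2020"):
    print(raw, "->", normalize_year(raw))
```

Under this reading 105 becomes 2005 and 109 becomes 2009, while 2020 is left alone.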
I figured it out:
re.compile(r'^\d{4}\s\s-\s\s/*')
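For anyone landing here later, a stricter pattern for the whole header line might look like the sketch below. It assumes the layout seen in the three samples (a 4-digit number, a hyphen, an MM/DD/YYYY date, the word "To", then a date whose year has 3 or 4 digits) and that the PDF text has already been extracted to a string. The names RECORD_RE and find_records, and the group names, are invented for this example.

```python
import re

# Assumed layout: NNNN - MM/DD/YYYY To MM/DD/YYY[Y]
# \s+ tolerates the uneven spacing that PDF text extraction often produces.
RECORD_RE = re.compile(
    r"^(?P<record_id>\d{4})\s+-\s+"
    r"(?P<start>\d{2}/\d{2}/\d{4})\s+To\s+"
    r"(?P<end>\d{2}/\d{2}/\d{3,4})[ \t]*$",
    re.MULTILINE,  # let ^ and $ match at every line, not just the whole text
)

def find_records(text):
    """Return (record_id, start, end) string tuples for each header line."""
    return [m.group("record_id", "start", "end") for m in RECORD_RE.finditer(text)]

sample = """2001 - 04/10/2013 To 06/23/105
some record body text
2005 - 09/10/2017 To 06/23/109
2008  -  02/10/2019 To 06/23/2020
"""
for record in find_records(sample):
    print(record)
```

Named groups keep the three parts separate, so they can be converted to int or date values afterwards. If the real file deviates from the sample layout, the quantifiers will need adjusting.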