Looking for help on a project

Aharris · May 19, 2022, 9:10pm

My project is to automatically renew my library books whenever they are due, using Selenium. I get the element from the website as a string, and it prints (title, name of book, date, etc.). I’m trying to use a regular expression to pull the date out of that string. When I run the code, I get an error saying, "date = datetime.datetime.strptime(match.string, '%m-%d-%Y').date()
AttributeError: 'NoneType' object has no attribute 'string' "

my_table = driver.find_element(By.XPATH, '//*[@id="acct_checked_main_header"]/tbody/tr')  # 1 line   
for column in my_table.find_elements(By.XPATH, '//*[@id="acct_checked_main_header"]/tbody/tr/td'):  # 2 line
    book = column.text
    match = re.search(r'\\d{2}-\\d{2}-\\d{4}', book)
    date = datetime.datetime.strptime(match.string, '%m-%d-%Y').date()
    print(date)  # added line of code, may not need to use it

Aharris · May 19, 2022, 9:11pm

don’t mind my comments. They’re just to help me remember while I’m testing code

rob42 · May 19, 2022, 9:57pm

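I think that the issue is that re.search() is not finding a match and so returns None, which is why match.string fails. Note also that match.string is the whole string that was searched, not the matched text; for that you’d want the likes of match.group().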


rob42 · May 20, 2022, 3:05am

I’m not sure if this will help, but I’ve been reading up on regex and I’ve coded this as a part of my notes:

import re

dateString = 'the date in this string is May 20 2022'

year = re.search('[0-9][0-9][0-9][0-9]', dateString)

if year:
    print('Year found')
    yearFound = (year.span())
    print(dateString[yearFound[0]:yearFound[1]])
else:
    print('Year not found')

As an explainer, for anyone following this and does not know:

import re
re.search(<regex>, <string>)

returns a match object if a match is found, otherwise it returns None

In the above example, the match object is returned thus: <_sre.SRE_Match object; span=(34, 38), match='2022'>

span=(34, 38) is the slice notation: start & end positions of ‘2022’ which is the same as dateString[34:38]; that is to say that the match starts at character position 34 and extends up to (but not including) position 38.

The real power of regex is when you need to pattern match. So in this example we need to pattern match four consecutive digits. We can do this with character classes.

We can match any single character or a range of characters: [3] would match ‘3’ [R] would match ‘R’ and so on. To match a range, we can use the metacharacter -, so any digit between zero and nine would be [0-9]. We need to find a four digit year, so '[0-9][0-9][0-9][0-9]' means any string (notice that it’s enclosed with single quotes) of four digits between zero and nine, back-to-back: re.search('[0-9][0-9][0-9][0-9]', dateString)

We can then use the .span() method to get the (34, 38) tuple and assign it to a variable: yearFound = (year.span()). Its two values are the slice bounds, [yearFound[0]:yearFound[1]], so print(dateString[yearFound[0]:yearFound[1]]) displays the match, and we can then include that in the if branch.
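As a side note for anyone following along, here is a minimal sketch of the same search, assuming the same dateString as above. It uses [0-9]{4} as shorthand for four repeats of the class, and shows that match.group() gives the same text as slicing by the span.

```python
import re

dateString = 'the date in this string is May 20 2022'

# {4} repeats the [0-9] class four times, same as '[0-9][0-9][0-9][0-9]'
year = re.search(r'[0-9]{4}', dateString)

if year:
    start, end = year.span()         # the (start, end) tuple
    print(dateString[start:end])     # slice the original string by the span
    print(year.group())              # same text, without slicing
else:
    print('Year not found')
```

Both print calls show the same four digits, so group() is usually the shorter route once you know there is a match.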

The ‘gotcha’ here is that you’d need to know the format of the data to be searched.

rob42 · May 20, 2022, 12:07pm

The reason for this is down to the way you’ve constructed the match = re.search() . As is, you’ve turned it from a <class '_sre.SRE_Match'> to a <class 'NoneType'>.

I’m trying to get my head around your construction; you want to walk me through it?

edit: are you searching for e.g: 05/20/2020 with a format that is %m %d %Y

It may also help if we had a sample from book along with its data type.
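In case it helps, here is a rough sketch of why the original pattern can come back as None. The sample string is made up for the test (it is not the real page text), and it assumes the dates use slashes as in the 05/20/2020 example.

```python
import re

sample = 'due 05/20/2020'  # assumed sample, not the real page text

# In a raw string, '\\d' means a literal backslash followed by the letter d,
# so this pattern is not looking for digits at all.
broken = re.search(r'\\d{2}-\\d{2}-\\d{4}', sample)
print(broken)  # None

# A single backslash gives the digit class, and the separator has to match the data.
fixed = re.search(r'\d{2}/\d{2}/\d{4}', sample)
if fixed is not None:
    print(fixed.group())            # the matched text only
    print(fixed.string == sample)   # .string is the whole input, not the match
```

Checking for None before touching the match object avoids the AttributeError whichever pattern is used.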

Aharris · May 29, 2022, 3:45pm

Sorry for getting back late, but yes, I’m searching for any date in the format mm/dd/yyyy.
This is how each book is printed:

Title
Painless geometry
Author
Long, Lynette,
Renewals Remaining
2
Due Date
06/08/2022
Barcode
31019005774834
Call Number
516 LON

Aharris · May 29, 2022, 3:46pm

So basically I’m just trying to iterate through and just find the date so I can create a condition.
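For reference, a minimal sketch of one way to get from the text above to a condition, assuming every book prints in that layout. The helper name find_due_date and the fixed "today" value are made up for the example; in practice datetime.date.today() would take its place.

```python
import datetime
import re

book = '''Title
Painless geometry
Author
Long, Lynette,
Renewals Remaining
2
Due Date
06/08/2022
Barcode
31019005774834
Call Number
516 LON'''


def find_due_date(text):
    """Return the first mm/dd/yyyy date in text as a date object, or None."""
    match = re.search(r'\d{2}/\d{2}/\d{4}', text)
    if match is None:
        return None
    # The format string uses slashes because the page prints the date with slashes
    return datetime.datetime.strptime(match.group(), '%m/%d/%Y').date()


due = find_due_date(book)
print(due)

# Fixed date so the example is repeatable; swap in datetime.date.today() for real use
today = datetime.date(2022, 6, 8)
if due is not None and due <= today:
    print('time to renew')
```

Comparing date objects rather than strings means the condition still works across month and year boundaries.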

rob42 · May 29, 2022, 4:07pm

No worries.

Given that, it seems to me that all you need to do is to search for the word ‘Date’, knowing that said date will be after the implicit '\n'; no need for any Regex pattern match. Or am I missing something?

edit: not sure if this is of any help as the feedback is lacking, but this simple script may do what you need.

With this, there’s no need to even bother with finding the '\n' that follows the word ‘Date’:

strBook = '''Title
Painless geometry
Author
Long, Lynette,
Renewals Remaining
2
Due Date
06/08/2022
Barcode
31019005774834
Call Number
516 LON'''

intIndex = strBook.find('Due Date')
print(strBook[intIndex:intIndex+19])

Aharris · May 30, 2022, 12:53pm

Apologies for the lack of feedback. I’m just trying to go as long as I can figuring it out on my own, so that if I can’t, it’ll click better when I learn from someone. I replaced the regex pattern match with your last two lines of code and it sort of worked. There are extra letters, though. Will the extra space between each date affect creating a condition? Will “Due Date” affect iterating through the dates to check whether each one is equal to the current date? Could you explain how the code works, or more specifically the “+19”?

Here’s what it printed:

e



Due Date
06/08/2022



e



Due Date
06/08/2022



e



Due Date
06/08/2022

rob42 · May 30, 2022, 1:13pm

No worries; it’s just helpful to know one way or the other, if we’re both on the same page: no point in me trying to guide you in a direction that is doomed from the outset.

It’s good that you’re trying to solve this by thinking about it, rather than relying on someone else to do the thinking for you; way-to-go.

My solution removes the need for Regex, but by all means go down that road, as you’ll learn much more by finding different ways to solve the same problem. As I’ve already posted, I don’t follow your Regex logic, so that’s still a gray area for me, which is why I posted the method I use with Regex. Yes, there’s more than one method for Regex, but I find my example easier to follow: it’s just the way my head works.

To explain my code: intIndex points to the location of 'D' in the word ‘Due’. ‘Due Date’ is 8 characters and the implicit '\n' at the end of the word ‘Date’ is one more, so adding 9 gets you to the first character of the date; we then need another 10 characters to cover the date itself, which makes 19.
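To make the arithmetic concrete, here is a small sketch using a shortened sample string. It also shows what str.find() does when the label is missing, which may be where the stray 'e' comes from, assuming the loop also sees a piece of text that has no ‘Due Date’ in it and ends in 'e' (such as 'Title').

```python
strBook = 'Due Date\n06/08/2022\nBarcode'  # shortened sample

label = 'Due Date'
intIndex = strBook.find(label)

# len(label) is 8, plus 1 for the newline, so the date starts 9 characters in
start = intIndex + len(label) + 1
print(strBook[start:start + 10])   # just the 10 characters of the date

# find() returns -1 when the label is absent, and a slice that starts
# at -1 begins at the last character of the string
other = 'Title'
print(other.find(label))
print(other[-1:-1 + 19])
```

Checking for -1 before slicing would skip the pieces of text that do not contain the label.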

cheesebird · June 5, 2022, 3:41pm

This would find the date…

re.search or re.findall should work
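A quick sketch of the difference between the two, assuming mm/dd/yyyy dates as in the sample posted earlier. The page_text string is invented for illustration, including the second date.

```python
import re

# Assumed text covering two books; the second date is made up
page_text = 'Due Date\n06/08/2022\nBarcode\n31019005774834\nDue Date\n06/15/2022\n'

pattern = r'\d{2}/\d{2}/\d{4}'

# re.search stops at the first match and returns a match object (or None)
first = re.search(pattern, page_text)
if first is not None:
    print(first.group())

# re.findall returns every match as a list of strings (empty if there are none)
all_dates = re.findall(pattern, page_text)
print(all_dates)
```

re.search suits text that holds one book at a time, while re.findall suits a single string that holds the whole table.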