Extracting files from mutliple folders base on a date time input

I have a web directory containing multiple files arranged in folders by month/DOY.

For example.

2025/01/01

2025/01/02

2025/01/03

2025/01/04

each folder has 1000’s of files with the date and timestamp…
20250120_000246.html

My current code only works with one day at at time so I need some functionality to choose mutlple days and even better to select times of the files spanning mutliple folders.

url = 'https://blahblahblah/2025/'+Month+'/'+DOY+''

def listFD(url, ext=''):
    #page = requests.get(url).text
    page = requests.get(url, auth=(username,password)).text
    print (page,'this is page')
    soup = BeautifulSoup(page, 'html.parser')
    return [url + '/' + node.get('href') for node in soup.find_all('a') if node.get('href').endswith(ext)]




for file in listFD(url, ext):
   
          reep = requests.get(file, auth=(username,password))
          print (reep.text , 'this is reep')
          hmtler = reep.text
          NO_HTML.append(hmtler)

Any pointers most welcome

It’s not clear to me whether you can access these files via pathlib. Nor exactly what you’re trying to do.

But if you can use pathlib, I’ve done plenty of tasks similar to this using pathlib.Path(...).glob(...).

You can filter using simple string inequalities, because eg "20240530.html" < "20240601.html"

1 Like

No using this function to get the file names

Sorry, I can see you’re using requests.get() to access the files.
But you said you “have a web directory”. I also have a webdirectory that I can access using requests.get, but I can also access it using pathlib, because I ‘own’ the files and I have a local version stashed on my computer. Software I run in the cloud can also access it using pathlib, because it’s running on the same server where the directory lives. Even if you don’t have a local version already, if you want to access 10% or more of the files, and you have appropriate permissions, it might be worth making a local copy.
But that might all be impossible or forbidden. Hence “It’s not clear to me whether you can access these files via pathlib”.

1 Like

No I couldn’t get it to work with pathlib, so only with requests. I read the html data into a list and work from there.

Its not my website and I only have password type access.

1 Like