I have a web directory containing multiple files arranged in folders by month/DOY.
For example:
2025/01/01
2025/01/02
2025/01/03
2025/01/04
Each folder has thousands of files named with the date and timestamp…
20250120_000246.html
My current code only works with one day at a time, so I need some way to choose multiple days and, even better, to select files by time across multiple folders.
import requests
from bs4 import BeautifulSoup

url = 'https://blahblahblah/2025/' + Month + '/' + DOY

def listFD(url, ext=''):
    # Fetch the directory listing and return the URL of every link ending in ext
    #page = requests.get(url).text
    page = requests.get(url, auth=(username, password)).text
    print(page, 'this is page')
    soup = BeautifulSoup(page, 'html.parser')
    # href=True skips <a> tags without an href, which would otherwise raise on .endswith
    return [url + '/' + node.get('href') for node in soup.find_all('a', href=True) if node.get('href').endswith(ext)]

for file in listFD(url, ext):
    reep = requests.get(file, auth=(username, password))
    print(reep.text, 'this is reep')
    hmtler = reep.text
    NO_HTML.append(hmtler)
Sorry, I can see you’re using requests.get() to access the files.
But you said you “have a web directory”. I also have a web directory that I can access using requests.get, but I can also access it using pathlib, because I ‘own’ the files and have a local copy stashed on my computer. Software I run in the cloud can also access it using pathlib, because it’s running on the same server where the directory lives. Even if you don’t have a local copy already, if you want to access 10% or more of the files, and you have appropriate permissions, it might be worth making one.
But that might all be impossible or forbidden. Hence “It’s not clear to me whether you can access these files via pathlib”.
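Either way, the date logic is the same whether the files are reached over HTTP or through pathlib, so here is a rough sketch of that part on its own. It assumes the folders are laid out as year/month/day (as in the 2025/01/01 examples) and that every file name starts with a YYYYMMDD_HHMMSS stamp (as in 20250120_000246.html). The function names day_folders, file_time and select_in_range are made up for illustration, as is base_url in the usage comment.

```python
from datetime import datetime, timedelta

def day_folders(start, end):
    """Yield 'YYYY/MM/DD' for every day from start to end, inclusive."""
    day = start.date()
    while day <= end.date():
        yield day.strftime('%Y/%m/%d')
        day += timedelta(days=1)

def file_time(name):
    """Parse a name or URL ending in 'YYYYMMDD_HHMMSS.ext' into a datetime.

    Returns None for anything that does not match (e.g. 'index.html' or '../').
    """
    stem = name.rsplit('/', 1)[-1].split('.', 1)[0]
    try:
        return datetime.strptime(stem, '%Y%m%d_%H%M%S')
    except ValueError:
        return None

def select_in_range(names, start, end):
    """Keep only the names whose timestamp lies between start and end, inclusive."""
    selected = []
    for name in names:
        stamp = file_time(name)
        if stamp is not None and start <= stamp <= end:
            selected.append(name)
    return selected

# Possible use with the listFD function from the question
# (base_url is a placeholder for the part of the URL before the year):
#
# start = datetime(2025, 1, 1, 22, 0, 0)
# end = datetime(2025, 1, 3, 2, 0, 0)
# for folder in day_folders(start, end):
#     for file in select_in_range(listFD(base_url + '/' + folder, '.html'), start, end):
#         ...
```

With a local copy, the same select_in_range call should work on the names returned by pathlib's Path.iterdir() for each folder, since only the last path component is parsed.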