I'm running Python 3 under Cygwin on Windows, so it isn't clear to me where the problem lies. I am trying to use `HTMLSession` from `requests_html`. Here is my script:
```python
#!/usr/bin/python3
from bs4 import BeautifulSoup
from requests_html import HTMLSession
from urllib.parse import urljoin

print('Starting process')
session = HTMLSession()

def get_all_forms(url):
    """Returns all form tags found on a web page's `url` """
    # GET request
    print("getting page")
    res = session.get(url)
    # for javascript driven website
    print("Running Javascript")
    res.html.render()
    print("parsing url")
    soup = BeautifulSoup(res.html.html, "html.parser")
    return soup.find_all("form")

print(get_all_forms("https://blahblah"))
```
Executing `res.html.render()` fails with this traceback:
```
Traceback (most recent call last):
  File "./donotcall.py", line 23, in <module>
    print(get_all_forms("https://blahblah"))
  File "./donotcall.py", line 19, in get_all_forms
    res.html.render()
  File "/usr/local/lib/python3.8/site-packages/requests_html.py", line 586, in render
    self.browser = self.session.browser # Automatically create a event loop and browser
  File "/usr/local/lib/python3.8/site-packages/requests_html.py", line 730, in browser
    self._browser = self.loop.run_until_complete(super().browser)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/site-packages/requests_html.py", line 714, in browser
    self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "/usr/local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
```
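One way to find out why the browser exits is to reproduce the launch call from the traceback outside `requests_html`. The minimal sketch below rebuilds the keyword arguments visible at `requests_html.py` line 714 (`ignoreHTTPSErrors`, `headless`, `args`) and adds pyppeteer's `dumpio=True` launch option, which sends the browser's own stdout/stderr to the console instead of capturing them, so Chromium's real error message should become visible. `launch_kwargs` and `probe` are hypothetical helper names, and the `--no-sandbox` default for `args` is an assumption standing in for whatever browser args your `requests_html` version passes.

```python
def launch_kwargs(verify=True, browser_args=("--no-sandbox",)):
    """Rebuild the pyppeteer.launch() arguments seen in the traceback.

    dumpio=True is added so the browser's stdout/stderr reach the console
    instead of being captured, which should show why Chromium exits.
    """
    return {
        "ignoreHTTPSErrors": not verify,  # same expression requests_html uses
        "headless": True,
        # ASSUMPTION: stands in for requests_html's browser args; adjust to match yours.
        "args": list(browser_args),
        "dumpio": True,
    }


async def probe(**overrides):
    """Launch the browser directly, print its version, then close it."""
    import pyppeteer  # deferred so launch_kwargs() works without pyppeteer

    browser = await pyppeteer.launch(**{**launch_kwargs(), **overrides})
    try:
        print("Browser started:", await browser.version())
    finally:
        await browser.close()
```

Run it with `import asyncio; asyncio.run(probe())`. To test a particular browser binary, pyppeteer's `executablePath` launch option can be passed through, e.g. `asyncio.run(probe(executablePath=...))` with the path you want to try.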
From what I can find in my searches, the problem has something to do with pyppeteer (which drives Chrome/Chromium) and synchronization. I don't know where it is looking for Chrome. I don't have Chrome explicitly installed on my machine, but I do have an application that launches multiple chromium.exe processes. I also have chromedriver installed, but I can't find anywhere that says where it is expected to reside, nor am I sure that is even the problem. Can someone help me debug this? Thanks in advance.
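To answer "where is it trying to find Chrome?", a minimal sketch follows. It asks pyppeteer for the browser path it launches by default, via its top-level `executablePath()` helper (present in the pyppeteer versions I know of; treat that as an assumption for yours), and then checks that path with the standard library: whether the file exists, whether it is executable, and what `--version` prints. `describe_executable` and `pyppeteer_chromium_path` are hypothetical helper names.

```python
import os
import subprocess
import sys


def describe_executable(path):
    """Report whether `path` exists, is executable, and what `--version` prints."""
    info = {
        "path": path,
        "exists": os.path.isfile(path),
        "executable": os.access(path, os.X_OK),
        "version": None,
    }
    if info["exists"] and info["executable"]:
        try:
            result = subprocess.run([path, "--version"], capture_output=True,
                                    text=True, timeout=30)
            # Some binaries print their version on stderr instead of stdout.
            info["version"] = (result.stdout or result.stderr).strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            info["version"] = "failed to run: {}".format(exc)
    return info


def pyppeteer_chromium_path():
    """Return the browser path pyppeteer launches by default, or None."""
    try:
        import pyppeteer
    except ImportError:
        return None  # pyppeteer is not importable from this interpreter
    # ASSUMPTION: pyppeteer exposes executablePath() at package level.
    return pyppeteer.executablePath()


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    chromium = pyppeteer_chromium_path()
    if chromium is None:
        print("pyppeteer is not installed for this interpreter")
    else:
        print(describe_executable(chromium))
```

If the reported path does not exist, pyppeteer has not downloaded its bundled Chromium yet; if it exists but `--version` fails, the binary itself cannot start in this environment, which would be consistent with "Browser closed unexpectedly".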