Reading the text of an open URL using Python

Adeleh96 · May 3, 2023, 6:49am

I want to read data from a device at an IP address using Python. I can open the page in a web browser using Python:

```python
import webbrowser

webbrowser.open('IP address', new=2)  # new=2 opens the page in a new browser tab
```

and the data are shown in the browser. I want to write a Python program that converts this page to text, or to any other format that can be saved easily.
I tried:

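```python
import urllib.request as ul

url = 'IP address'  # placeholder: urlopen needs a full URL with a scheme, e.g. 'http://...'
req = ul.Request(url, headers={'User-Agent': 'Mozilla/5.0'})  # no leading space in the value
with ul.urlopen(req) as client:  # closes the connection when done
    htmldata = client.read()  # raw bytes
print(htmldata.decode('utf-8', errors='replace'))
```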

but I still can’t reach the data: the connection is closed because of the unknown user agent. However, I can see the data when I open the page manually in the Mozilla browser.

Any help is appreciated!

rob42 · May 3, 2023, 7:43am

I don’t know for sure, because I’ve never done what you are doing, but could it be that you need to send a fuller User-Agent string?

kknechtel · May 3, 2023, 4:58pm

You can go here to see the full User-Agent string that your browser sends:

https://www.whatismybrowser.com/detect/what-is-my-user-agent/

and copy and paste that into the code.
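A minimal sketch of that with `urllib.request`, for illustration only. The `BROWSER_UA` value below is just an example Firefox string (an assumption), so replace it with the exact string that page reports for your own browser; `build_request` and `fetch_text` are hypothetical helper names.

```python
import urllib.request

# Example full User-Agent string in the Firefox format. This exact value is an
# assumption: paste the string your own browser reports instead.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:112.0) "
    "Gecko/20100101 Firefox/112.0"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a GET request that identifies itself with a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download the page and decode it using the charset the server declares."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage would be something like `text = fetch_text("http://<device IP>/")`, after which `text` can be written to a file. If the device still closes the connection, it may be checking something other than the User-Agent header.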

DamirHanov · May 4, 2023, 9:26pm

As a side note, be aware of modern sites that use JS frameworks like React or Vue (how many are there, really?). When you open such a site in your browser, the page the server sends you is mostly empty: it contains a small HTML snippet and lots of JS imports. The browser fetches and executes those scripts, and that is what fills in the page content.

It might not be a problem for your current task, but IMO the majority of new sites follow this pattern.
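One rough way to tell whether you received such an empty shell is to compare the visible text with the number of `<script>` tags. A minimal sketch using only the standard library's `html.parser` (the class and function names are hypothetical); the same parser also gives a crude HTML-to-text conversion:

```python
from html.parser import HTMLParser


class TextAndScriptCounter(HTMLParser):
    """Collect visible text and count <script> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.script_count = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.script_count += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not script or stylesheet source.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def summarize_html(html: str) -> tuple[str, int]:
    """Return (visible text, number of <script> tags)."""
    parser = TextAndScriptCounter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.script_count
```

Little or no visible text combined with several scripts suggests the content is rendered by JS, so plain HTTP fetching alone won't show it.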

Rosuav · May 4, 2023, 9:30pm

If it’s done right, this actually makes it easier to scrape, because the relevant data is either stored in the page as inline JSON, or fetched in a separate API call (in which case you ignore the HTML and JS, and just do the same fetch).
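A sketch of the inline-JSON case, assuming the data sits in a `<script type="application/json">` tag. That is an assumption: sites vary, and some assign the JSON to a JS variable instead, which this won't catch. `InlineJSONExtractor` and `extract_inline_json` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class InlineJSONExtractor(HTMLParser):
    """Collect and parse the bodies of <script type="application/json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.blobs: list = []
        self._in_json = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json:
            self.blobs.append(json.loads("".join(self._buffer)))
            self._in_json = False


def extract_inline_json(html: str) -> list:
    """Return every JSON blob embedded in the page, parsed into Python objects."""
    parser = InlineJSONExtractor()
    parser.feed(html)
    parser.close()
    return parser.blobs
```

For the separate-API-call case, the browser's developer tools (Network tab) show which URL the page fetches; requesting that URL directly and passing the response body to `json.loads` skips the HTML entirely.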

Adeleh96 · May 9, 2023, 6:41am

Thank you for your replies. I finally solved the problem by using the socket library instead, so there is no need to open a browser.
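The post doesn't show the socket code. One possible sketch, assuming the device serves plain HTTP on port 80 and accepts a hand-written request, is below; the function names are hypothetical, and the OP's actual solution may differ (the device might even speak a non-HTTP protocol).

```python
import socket


def build_get_request(host: str, path: str = "/", user_agent: str = "Mozilla/5.0") -> bytes:
    """Hand-write a minimal HTTP GET request.

    HTTP/1.0 is used so the server won't reply with chunked transfer encoding
    and closes the connection after sending the response.
    """
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Connection: close",
        "",
        "",  # the trailing empty lines produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes) -> tuple[bytes, bytes]:
    """Split a raw HTTP response into (header block, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body


def fetch_raw(host: str, port: int = 80, path: str = "/", timeout: float = 10.0) -> bytes:
    """Send the request and read until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty bytes means the server closed the socket
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Usage might look like `head, body = split_response(fetch_raw("<device IP>"))`, after which `body` can be decoded and saved to a file.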