I’m using Python 3.10.2 and was scraping an HTML page with the code below:
import requests
import parsel
headers = {
    "cookie": "_ga=GA1.3.548065326.1728258946; _gid=GA1.3.257269445.1728258946; _gat_gtag_UA_17997319_1=1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
}
url = "https://parsel.readthedocs.io/en/stable/parsel.html"
response = requests.get(url=url, headers=headers)
html = response.text
selector = parsel.Selector(html)
name = selector.css(
    "#threadindex > div > ul > li:nth-child(2) > a::text"
).get()
print(selector)
And I got this error:
Traceback (most recent call last):
  File "c:\Users\frida\Documents\Projects\Pyget.py", line 22, in <module>
    selector = parsel.Selector(html)
  File "C:\ProgramData\miniconda3\lib\site-packages\parsel\selector.py", line 476, in __init__
    root, type = _get_root_and_type_from_text(
  File "C:\ProgramData\miniconda3\lib\site-packages\parsel\selector.py", line 357, in _get_root_and_type_from_text
    root = _get_root_from_text(text, type=type, **lxml_kwargs)
  File "C:\ProgramData\miniconda3\lib\site-packages\parsel\selector.py", line 309, in _get_root_from_text
    return create_root_node(text, _ctgroup[type]["_parser"], **lxml_kwargs)
  File "C:\ProgramData\miniconda3\lib\site-packages\parsel\selector.py", line 102, in create_root_node
    parser = parser_cls(recover=True, encoding=encoding, huge_tree=True)
  File "C:\ProgramData\miniconda3\lib\site-packages\lxml\html\__init__.py", line 1887, in __init__
    super().__init__(**kwargs)
  File "src\\lxml\\parser.pxi", line 1806, in lxml.etree.HTMLParser.__init__
  File "src\\lxml\\parser.pxi", line 858, in lxml.etree._BaseParser.__init__
LookupError: unknown encoding: 'b'utf8''
I tried different versions, but that didn’t help. Did I do anything wrong?