Got an error from Parsel, Please help

Friday_Anubis · October 7, 2024, 12:56am

I’m using Python 3.10.2 and was scraping an HTML page with the code below:

import requests
import parsel

headers = {
    "cookie": "_ga=GA1.3.548065326.1728258946; _gid=GA1.3.257269445.1728258946; _gat_gtag_UA_17997319_1=1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
}

url = "https://parsel.readthedocs.io/en/stable/parsel.html"

response = requests.get(url=url, headers=headers)
html = response.text

selector = parsel.Selector(html)
name = selector.css(
    "#threadindex > div > ul > li:nth-child(2) > a::text"
).get()

print(selector)

And I got this error:

Traceback (most recent call last):
  File "c:\Users\frida\Documents\Projects\Pyget.py", line 22, in <module>
    selector = parsel.Selector(html)
  File "C:\ProgramData\miniconda3\lib\site-packages\parsel\selector.py", line 476, in __init__
    root, type = _get_root_and_type_from_text(
  File "C:\ProgramData\miniconda3\lib\site-packages\parsel\selector.py", line 357, in _get_root_and_type_from_text
    root = _get_root_from_text(text, type=type, **lxml_kwargs)
  File "C:\ProgramData\miniconda3\lib\site-packages\parsel\selector.py", line 309, in _get_root_from_text
    return create_root_node(text, _ctgroup[type]["_parser"], **lxml_kwargs)
  File "C:\ProgramData\miniconda3\lib\site-packages\parsel\selector.py", line 102, in create_root_node
    parser = parser_cls(recover=True, encoding=encoding, huge_tree=True)
  File "C:\ProgramData\miniconda3\lib\site-packages\lxml\html\__init__.py", line 1887, in __init__
    super().__init__(**kwargs)
  File "src\\lxml\\parser.pxi", line 1806, in lxml.etree.HTMLParser.__init__
  File "src\\lxml\\parser.pxi", line 858, in lxml.etree._BaseParser.__init__
LookupError: unknown encoding: 'b'utf8''

I tried different versions, but that didn’t help. Did I do anything wrong?

barry-scott · October 7, 2024, 8:17am

It seems that you are processing an XML file that has a bad encoding defined. You need to get the bad XML file fixed.

Can you share the first few lines of the XML file?

Friday_Anubis · October 7, 2024, 9:15am

I don’t believe that’s the problem.

Running the script under conda threw this error, but when I ran it with Python directly, with the same packages, everything went fine.

Still, it’s very strange, because I can’t see anything wrong in my conda IDE.

So basically I continued my project with Python 3.8.10. I’d much appreciate it if you have any ideas about this.

barry-scott · October 7, 2024, 10:09am

The traceback clearly says otherwise.

But without the XML file to examine I cannot be absolutely sure there is not something else going on.
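For what it is worth, the encoding name in the last line of the traceback is literally the text b'utf8', which is the shape you get when a bytes object is turned into a string. The sketch below uses only the standard library (the helper name is_known_encoding is made up for illustration) to show that such a name cannot be looked up, while plain utf8 can. It is only a guess at how the name ended up looking like that; it does not show where the value comes from.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python's codec registry recognises this encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# repr() of a bytes object keeps the b'...' wrapper, which gives a string
# of the same shape as the encoding name quoted in the traceback.
mangled = repr(b"utf8")

print(mangled)
print(is_known_encoding("utf8"))
print(is_known_encoding(mangled))
```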

onePythonUser · October 7, 2024, 12:34pm

Hello,

Can you run the following separately, once in Conda and once in IDLE, for comparison:

import sys, parsel, requests

major, minor, micro = sys.version_info[:3]
print(f"Your Python version is {major}.{minor}.{micro}")

print('parsel version is: ', parsel.__version__)
print('requests version is: ', requests.__version__)

avi.gross · October 7, 2024, 3:43pm

Paul,

I would hazard a guess that you have at least two python setups on your machine and that the modules you have are found in different places.

Many people use pip to update or install modules, but the last time I used it, there was a different conda command to install for the Anaconda distribution.

I once had some problems I solved by uninstalling everything and just using python directly so there were no sources for possible confusion.

In your case, it is possible that the two ways of running your code are not using the same versions of packages, or something along those lines.
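A rough way to check this is to run the same small script in each setup and compare the output. The sketch below is only an illustration: the function names describe_module and environment_report are made up here, and it assumes the packages of interest are parsel, lxml and requests. It reports which interpreter is running and the file each module was actually loaded from.

```python
import importlib
import sys
from importlib import metadata


def describe_module(name):
    """Return (version, file path) for an importable module, or None if missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    try:
        # Version recorded by the installer (pip or conda), if any.
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        # Fall back to the module's own attribute, if it has one.
        version = getattr(module, "__version__", "unknown")
    return version, getattr(module, "__file__", "built-in")


def environment_report(names=("parsel", "lxml", "requests")):
    """Collect the interpreter path and the version/location of each module."""
    report = {"python": sys.version.split()[0], "executable": sys.executable}
    for name in names:
        report[name] = describe_module(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the executable path or any of the module paths differ between the two runs, the two setups are not really using the same packages.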