Why can I not catch this urllib.request.urlopen exception?

ullix · August 10, 2022, 7:32am

While the code seems simple enough:

try:
    with urllib.request.urlopen(url, timeout=0.05) as page:   # this is line 202
        response = page.read().strip().decode("UTF-8")
except Exception as e:
    exceptPrint(e, "urllib.request.urlopen")

why do I always get this double-exception, both times in line 202 (the with … line)?

10 09:27:54.744 DEBUG  : .15116    EXCEPTION: urllib.request.urlopen (<urlopen error timed out>) in file: gdev_wifiserver.py in line: 202
10 09:27:54.745 DEVEL  : .15117    Traceback (most recent call last):
  File "/opt/python/3.10.4/lib/python3.10/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/opt/python/3.10.4/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/python/3.10.4/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/python/3.10.4/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/python/3.10.4/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/opt/python/3.10.4/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/opt/python/3.10.4/lib/python3.10/http/client.py", line 941, in connect
    self.sock = self._create_connection(
  File "/opt/python/3.10.4/lib/python3.10/socket.py", line 845, in create_connection
    raise err
  File "/opt/python/3.10.4/lib/python3.10/socket.py", line 833, in create_connection
    sock.connect(sa)
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ullix/geigerlog/geigerlog/gdev_wifiserver.py", line 202, in getUrlResponse
    with urllib.request.urlopen(url, timeout=gglobs.WiFiServerTimeout) as page:
  File "/opt/python/3.10.4/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/python/3.10.4/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/opt/python/3.10.4/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/opt/python/3.10.4/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/opt/python/3.10.4/lib/python3.10/urllib/request.py", line 1377, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/opt/python/3.10.4/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error timed out>

steven.daprano · August 10, 2022, 9:05am

What is exceptPrint? Perhaps it is printing the traceback.

What happens if you change the line exceptPrint(...) to just print('error')?

domdfcoding · August 10, 2022, 10:51am

The first exception is raised by the socket module. It’s caught by urllib to raise a urllib-specific exception, irrespective of the underlying implementation.

The first exception used to be (before 3.10) a socket.timeout exception. It changed to a TimeoutError in 3.10, but because urllib catches it and re-raises, nothing changes for users of urllib.
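
A minimal sketch of how a caller might detect the timeout on any of these versions. The helper name is_timeout is made up for illustration and is not part of urllib. It relies on URLError.reason holding the original low-level exception, and the exceptions are built by hand so no network access is needed.

```python
import socket
import urllib.error


def is_timeout(exc):
    """Return True if exc is a URLError that wraps a socket timeout.

    Hypothetical helper, not part of urllib.
    """
    # URLError.reason holds the original low-level exception (or a string).
    # socket.timeout is an alias of TimeoutError since 3.10; naming both
    # keeps the check working on older versions as well.
    return isinstance(exc, urllib.error.URLError) and isinstance(
        exc.reason, (TimeoutError, socket.timeout)
    )


# Build the same kind of exception urllib raises, without touching the network.
timeout_err = urllib.error.URLError(socket.timeout("timed out"))
refused_err = urllib.error.URLError(ConnectionRefusedError(111, "Connection refused"))

print(is_timeout(timeout_err))   # True
print(is_timeout(refused_err))   # False
```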

ullix · August 10, 2022, 11:14am

exceptPrint() is this function:


import os
import sys
import traceback

def exceptPrint(e, srcinfo):
    """Print exception details (errmessage, file, line no)"""

    exc_type, exc_obj, exc_tb = sys.exc_info()
    fname                     = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]

    print("EXCEPTION: {} ({}) in file: {} in line: {}".format(srcinfo, e, fname, exc_tb.tb_lineno))
    print(traceback.format_exc())

ullix · August 10, 2022, 11:36am

@domdfcoding you are suggesting that the behavior changed with 3.10. So I tried Py3.7, 3.8, 3.9 also.

With them I am getting the same double-exception as with 3.10.

It is irritating. Is there anything I can do to prevent it?

vbrozik · August 10, 2022, 1:36pm

For troubleshooting do not catch the exception. The traceback would be more comprehensible without try-except and your custom exception printing. It would be easier for us to understand what is going on.

For exception handling in production code, do not catch the whole Exception class. You are effectively blinding yourself to errors. See for example this recent post:

There is also a problem in the urllib code:

github.com: python/cpython/blob/v3.10.6/Lib/urllib/request.py#L1351

                  # Proxy-Authorization should not be sent to origin
                  # server.
                  del headers[proxy_auth_hdr]
              h.set_tunnel(req._tunnel_host, headers=tunnel_headers)
          
          try:
              try:
                  h.request(req.get_method(), req.selector, req.data, headers,
                            encode_chunked=req.has_header('Transfer-encoding'))
              except OSError as err: # timeout error
                  raise URLError(err)
              r = h.getresponse()
          except:
              h.close()
              raise
          
          # If the server does not send us a 'Connection: close' header,
          # HTTPConnection assumes the socket should be left open. Manually
          # mark the socket to be closed when this response object goes away.
          if h.sock:
              h.sock.close()

The code raising a new exception should be:

            except OSError as err: # timeout error
                # Here we should probably test if it is really a timeout error?
                raise URLError(err) from err

The current code confusingly writes:

During handling of the above exception, another exception occurred:

The fixed code would write the following text which explains what is going on:

The above exception was the direct cause of the following exception:
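
A small sketch of the difference between the two forms, under the assumption that a hand-raised TimeoutError stands in for the real socket failure. The function names implicit, explicit and capture are made up for illustration. Which connecting sentence the traceback prints depends on whether __cause__ is set on the new exception.

```python
import urllib.error


def implicit():
    try:
        raise TimeoutError("timed out")
    except OSError as err:
        raise urllib.error.URLError(err)            # implicit chaining


def explicit():
    try:
        raise TimeoutError("timed out")
    except OSError as err:
        raise urllib.error.URLError(err) from err   # explicit chaining


def capture(func):
    """Call func and return the URLError it raises."""
    try:
        func()
    except urllib.error.URLError as exc:
        return exc


imp = capture(implicit)
exp = capture(explicit)

# Implicit: only __context__ is set, so the traceback says
# "During handling of the above exception, another exception occurred:"
print(type(imp.__context__).__name__, imp.__cause__)            # TimeoutError None

# Explicit: __cause__ is set, so the traceback says
# "The above exception was the direct cause of the following exception:"
print(type(exp.__cause__).__name__, exp.__suppress_context__)   # TimeoutError True
```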

I think an issue should be opened, right? The same problem is in the current code: cpython/Lib/urllib/request.py at main · python/cpython · GitHub

steven.daprano · August 10, 2022, 9:03pm

Well that’s your answer then. You are getting the double exception printed because you are catching it and printing it.

I don’t understand your question here. What is the problem?

If you don’t want to see the exception, then don’t print it. Handle the exception some other way.
If you don’t want the exception to happen at all, unfortunately, there is absolutely nothing you can do about that, because it is not in your control.
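
One way to keep printing the error but without the chained first traceback is sketched below. The name except_print is a hypothetical variant of the exceptPrint() posted earlier, and the nested try block only imitates what urllib does internally. The relevant piece is the chain=False argument of traceback.format_exception.

```python
import traceback
import urllib.error


def except_print(e, srcinfo):
    """Print one line about the exception plus only its own traceback.

    Hypothetical variant of exceptPrint(), for illustration only.
    """
    print("EXCEPTION: {} ({})".format(srcinfo, e))
    # chain=False leaves out __cause__ and __context__, so only one
    # traceback is rendered.
    text = "".join(
        traceback.format_exception(type(e), e, e.__traceback__, chain=False)
    )
    print(text)
    return text


try:
    try:
        raise TimeoutError("timed out")
    except OSError as err:
        raise urllib.error.URLError(err)   # same pattern as urllib's do_open
except Exception as e:
    report = except_print(e, "urllib.request.urlopen")
```

The low-level TimeoutError is not lost by doing this; it is still available as e.reason.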

When you try to connect to a website, there are many different exceptions that can occur, starting with low-level socket errors, to high-level HTTP/HTTPS errors.

You have to deal with them somehow, even if you deal with them by not catching the exception and just allowing them to halt your program.
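
A sketch of what dealing with them could look like for the code in the first post, catching the specific urllib exceptions instead of Exception. The function name get_url_response, the opener parameter and the fake_timeout stand-in are assumptions added here so the example runs without a network; they are not from the original code.

```python
import urllib.error
import urllib.request


def get_url_response(url, timeout=0.05, opener=urllib.request.urlopen):
    """Return the decoded body, or None if the request failed.

    Hypothetical rewrite; `opener` exists only so the function can be
    exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as page:
            return page.read().strip().decode("UTF-8")
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status (404, 500, ...).
        # HTTPError is a subclass of URLError, so it must come first.
        print("HTTP error {} for {}".format(e.code, url))
    except urllib.error.URLError as e:
        # No usable answer: connection refused, DNS failure, connect timeout.
        print("could not reach {}: {}".format(url, e.reason))
    except TimeoutError:
        # A timeout inside page.read() is not wrapped in URLError.
        # (Before 3.10 this would be socket.timeout instead.)
        print("timed out while reading from {}".format(url))
    return None


def fake_timeout(url, timeout):
    """Stand-in for urlopen that fails like the traceback in the first post."""
    raise urllib.error.URLError(TimeoutError("timed out"))


print(get_url_response("http://example.invalid/", opener=fake_timeout))
```

Anything not listed, such as a UnicodeDecodeError from decode(), still propagates and shows up as a normal traceback rather than being silently swallowed.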

ullix · August 11, 2022, 7:57am

What keeps confusing me is that this single line of code (the with urllib.request.urlopen(...) line, line 202) is producing 2 exceptions. I would have thought that execution stops once an exception has occurred?

abessman · August 11, 2022, 8:12am

Perhaps a minimal example might help in understanding:

try:
    raise ValueError
except ValueError as exc:
    raise RuntimeError(exc)

Result:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
RuntimeError

As @vbrozik mentioned above, this happens because the exception is re-raised without the from keyword. There is nothing you can do about it, except report it as a bug against urllib.

ullix · August 11, 2022, 8:38am

Sigh. Thanks.

steven.daprano · August 11, 2022, 11:25am

Sure. That would be a small improvement.

steven.daprano · August 11, 2022, 11:28am

Stating that the exception “is not re-raised correctly” is an exaggeration. There is nothing wrong or incorrect about the existing urllib exception handling; it just could be changed to give a very slightly nicer traceback.

Nicer for experts. Beginners probably won’t notice or understand the difference.

I expect it is because that part of the urllib code was written before Python had chained exceptions.

steven.daprano · August 11, 2022, 11:45am

Ah, now I understand your question!

Execution stops once an exception occurs unless the exception is caught with a try...except block. (A try...finally block runs its cleanup code but then lets the exception continue.)

Back in the ancient days of Python, if you had a bug inside the except or finally block, the exception that was raised by that bug meant that you lost any record of the original exception:

LOGFILE = 'some log file'

try:
    run_something()  #  <-- TypeError occurs here
except TypeError as err:
    # Log the error.
    log(err, LOGGFILE)  # Oops, a typo!

The NameError you get from the typo completely wiped out all information about the previous TypeError. And that would often make it hard to fix the bug.

Now imagine that block of code was inside a function, and that function was called from another try...except block. So one exception could cause another exception which causes a third exception, and so on.

So some time ago, the Python language was changed so that multiple exceptions would be chained together, so that when you had a chain of exceptions:

TypeError caused NameError which caused ImportError which caused …

you could see the entire chain, all the way back to the problem that started it all in the first place.
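
The chain is also available to code, not just in the printed traceback. The sketch below walks it by hand; exception_chain is a made-up helper name, and the nested try blocks are a contrived way to reproduce the TypeError, NameError, ImportError sequence described above.

```python
def exception_chain(exc):
    """Return the exceptions linked to exc, newest first.

    Hypothetical helper: follows __cause__ when set, otherwise
    __context__ unless it was suppressed with "from None".
    """
    chain = []
    while exc is not None:
        chain.append(exc)
        if exc.__cause__ is not None:
            exc = exc.__cause__
        elif not exc.__suppress_context__:
            exc = exc.__context__
        else:
            exc = None
    return chain


LOGFILE = "some log file"

try:
    try:
        try:
            raise TypeError("bad argument")
        except TypeError as err:
            print(err, LOGGFILE)          # typo on purpose -> NameError
    except NameError:
        raise ImportError("could not load the logger")   # raised on purpose
except ImportError as final:
    names = [type(e).__name__ for e in exception_chain(final)]

print(" <- ".join(names))   # ImportError <- NameError <- TypeError
```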

steven.daprano · August 11, 2022, 11:50am

On further thought, you suggested changing urllib to do this:

raise URLError(err) from err

Thinking about it, I don’t think that is correct. The URLError already contains an explicit reference to err, the error that caused it. So chaining the error as well seems unneeded.

Perhaps it should be:

raise URLError(err) from None

to break the chain and simplify the traceback.
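
A sketch comparing the three ways of raising, assuming a hand-raised TimeoutError in place of a real socket failure. The names open_like_urllib and render are made up; the try/except only mimics the shape of urllib's do_open.

```python
import traceback
import urllib.error


def open_like_urllib(mode):
    """Mimic the try/except in urllib's do_open with three raise styles."""
    try:
        raise TimeoutError("timed out")
    except OSError as err:
        if mode == "plain":
            raise urllib.error.URLError(err)
        if mode == "from err":
            raise urllib.error.URLError(err) from err
        raise urllib.error.URLError(err) from None


def render(mode):
    """Return the full traceback text for the given raise style."""
    try:
        open_like_urllib(mode)
    except urllib.error.URLError as exc:
        return "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )


for mode in ("plain", "from err", "from None"):
    text = render(mode)
    count = text.count("Traceback (most recent call last)")
    print(mode, "->", count, "traceback(s)")
# plain -> 2 traceback(s)
# from err -> 2 traceback(s)
# from None -> 1 traceback(s)
```

In all three cases the original TimeoutError remains reachable through the reason attribute of the URLError.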

ullix · August 11, 2022, 12:19pm

I would not be wondering about chaining of different exceptions, but in this case - see my initial post - the first exception message reads: <urlopen error timed out>
and the very last line of the 2nd exception message reads: <urlopen error timed out>.

This looks to me like two identical messages. So why re-raise (if this is the issue) with the exact same message?

vbrozik · August 11, 2022, 1:42pm

More important than the message is the exception class. The two exceptions in your traceback are:

TimeoutError: timed out

This one is part of the built-in exceptions and is raised at lower levels of the code:

urllib.error.URLError: <urlopen error timed out>

This one is part of urllib. The authors of the library decided to use library-specific exception classes. This allows you to catch just the exceptions raised by urllib. Because URLError itself does not convey the information that the problem was a timeout, the authors decided to add this information as the message of the exception (by copying the message from the lower-level exception). That is the reason you see the same message timed out for the second time in the final exception.
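
The relationship between the two messages can be checked directly, without any network access. This is only an illustration with hand-built exception objects; the variable names are made up.

```python
import urllib.error

low_level = TimeoutError("timed out")
high_level = urllib.error.URLError(low_level)

print(str(low_level))                    # timed out
print(str(high_level))                   # <urlopen error timed out>
print(high_level.reason is low_level)    # True
print(isinstance(high_level, OSError))   # True
```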

If we know that there should be nothing unexpected regarding the first exception, then I agree that the suggestion by Steven: raise URLError(err) from None should be used. In such a case you would see just the second (higher level) exception and just the second traceback.