Pipe HTML source into `webbrowser` when run as module

Dutcho · October 30, 2022, 3:13pm

Accept standard input as HTML source for `webbrowser` when run as module

TL;DR

Let’s make url optional for webbrowser in CLI, and default to standard input.

Current state

The standard library module webbrowser can be called from the command line to display any url in the standard browser.
This includes local file urls (but see penultimate section below).

This is useful for script usage, e.g.:

python generate.py | python -m markdown -x extra > temp.html
python -m webbrowser temp.html
del temp.html

Here, generate.py is a script generating markdown output,
markdown is a PyPI package converting markdown to HTML,
and webbrowser is the standard library module, run as script by the -m argument.

Current limitation

However, currently webbrowser requires the url to be present, so we cannot display HTML from standard input.
This limitation prevents (more streamlined) usage like the equivalent:

python generate.py | python -m markdown -x extra | python -m webbrowser

The pipe above would make usage easier and avoid the issues associated with temp.html: forgetting to delete it afterwards, overwriting a pre-existing temp.html file of the same name, and concurrency (two of the above scripts running in parallel and conflicting over the temporary file).

Proposal

I propose to make webbrowser’s url CLI parameter optional, and read from standard input when omitted.
I.e., the proposed usage would be python -m webbrowser [-n | -t] [url] (optional url), instead of the current python -m webbrowser [-n | -t] url (mandatory url).

I don’t propose this functionality to be available from other Python scripts (only from CLI).
One would still not be able to call webbrowser.open() without a url, as I don’t see a clear use case for that.
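To make the proposed command line concrete, here is a minimal sketch of the argument handling. It uses argparse purely for illustration (not necessarily how webbrowser parses its arguments), and parse_args and read_html are hypothetical helper names, not part of the standard library.

```python
import argparse
import sys


def parse_args(argv):
    """Parse a hypothetical `python -m webbrowser [-n | -t] [url]` command line."""
    parser = argparse.ArgumentParser(prog="python -m webbrowser")
    group = parser.add_mutually_exclusive_group()
    # The constants match the values webbrowser.open() accepts for `new`.
    group.add_argument("-n", dest="new", action="store_const", const=1,
                       default=0, help="open in a new browser window")
    group.add_argument("-t", dest="new", action="store_const", const=2,
                       default=0, help="open in a new browser tab")
    # nargs="?" makes the positional url optional, which is the proposed change.
    parser.add_argument("url", nargs="?", default=None,
                        help="URL to open; HTML is read from stdin if omitted")
    return parser.parse_args(argv)


def read_html(args, stdin=sys.stdin):
    """Return the HTML text from stdin when no url was given, else None."""
    return stdin.read() if args.url is None else None
```

With this shape, an invocation that provides a url behaves exactly as before, and only the previously failing case (no url) takes the new standard-input path.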

PyPI alternative

Implementation of above feature as a PyPI third-party module (e.g., stdin_browser - webbrowser for stdin) is possible.
But that would lose the convenience of python -m webbrowser being always available: users would need to pip install the module and remember when to use which. Depending on its implementation, it would also duplicate logic (e.g., the [-n | -t] argument handling), risking divergence over time between the standard library webbrowser and the third-party stdin_browser.

Benefits

As mentioned above, this feature will allow usage in pipes without manually managing temporary files.

As url was mandatory until now, the change is backward compatible: nothing changes when a url is provided, and omitting the url currently fails.

Implementation

I made a proof of concept implementation of the proposed change.

The PoC implementation prompts for input when used interactively (that is, when standard input is not a pipe).
That’s not a frequently expected use case, but otherwise users may not realize that webbrowser is waiting for input when it is run on its own (python -m webbrowser with no url and no pipe).
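One way to tell interactive use from a pipe is the isatty() method of the input stream. The sketch below is an assumption about how such a prompt could look, not the PoC's actual code; read_stdin_html and the prompt text are hypothetical.

```python
import sys


def read_stdin_html(stdin=None, prompt_stream=None):
    """Read HTML from stdin, prompting first if stdin is a terminal."""
    stdin = sys.stdin if stdin is None else stdin
    prompt_stream = sys.stderr if prompt_stream is None else prompt_stream
    if stdin.isatty():
        # Only shown for interactive use; piped input stays silent.
        print("Enter HTML, then end input "
              "(Ctrl-D, or Ctrl-Z followed by Enter on Windows):",
              file=prompt_stream)
    return stdin.read()
```

Writing the prompt to standard error keeps it out of any output that a surrounding script might capture.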

Internally, the PoC implementation (like the shell script) needs to write standard input to a temporary file, and then provide the temporary file’s pathname to the browser.
So the change does not achieve parallelism (piping on Linux only) between writing the HTML and displaying it.
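The temporary-file step could look roughly like the sketch below. It is an illustration under assumptions, not the PoC itself: write_temp_html and show_html are hypothetical names, and UTF-8 is an assumed encoding. Using tempfile gives each run a unique file name, which avoids the name clashes of a fixed temp.html.

```python
import tempfile
import webbrowser
from pathlib import Path


def write_temp_html(html: str) -> Path:
    """Write html to a uniquely named temporary .html file and return its path."""
    # delete=False: the file must outlive this function so the browser can read it.
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False,
                                     encoding="utf-8") as f:
        f.write(html)
    return Path(f.name)


def show_html(html: str, new: int = 0) -> Path:
    """Open html in the default browser; the caller removes the file later."""
    path = write_temp_html(html)
    # as_uri() turns the absolute path into a file:// URL for the browser.
    webbrowser.open(path.as_uri(), new=new)
    return path
```

The returned path still has to be removed at some point, which is where the timing issues discussed below come in.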

The PoC implementation has two timing issues and two assumptions (below).
These also exist in the shell script above, but become more explicit by moving the logic from shell to Python.

Wait for browser to start reading

The temporary file must not be removed until the browser reads it.

webbrowser starts the browser and then wants to remove the temporary file, but it doesn’t know when that is safe, as it can neither control nor determine whether the browser has actually started.

The time for browser startup is unpredictable.
My system (Windows 11 on Intel i5 with SSD) needs less than 0.5 seconds, but some systems can be slower.

The PoC implementation now uses a waiting time of 4 seconds, likely erring on the safe side for many systems.
But that may still be too short (e.g., when running from slow storage), in which case the file is gone before the browser can display it. Also, waiting longer than needed delays completion of the shell script.

Open point: Predictable performance: We need better usage data, or ideally (but unrealistically?), a way to check browser status.

As a workaround, we could consider adding a wait CLI argument, e.g., -w seconds (with default of 1).
However, this isn’t very user-friendly, as it shifts the responsibility to them.

Wait for browser to finish reading

The temporary file can only be removed after the browser completes reading it.

After the browser starts reading the temporary file, webbrowser cannot remove the temporary file until the browser is done reading it.

The time for reading the temporary file is unpredictable.
For sufficiently large HTML files (on my system, e.g., a 10 MB file containing an HTML table of 2,500 rows x 100 columns) the initial wait (described above, currently 4 seconds) may be insufficient, resulting in a PermissionError when webbrowser tries to remove the file.

The PoC implementation then informs the user, again waits (exponentially longer), and retries.
While this works, it’s not very “elegant”, and waiting longer than needed delays completion of the shell script.
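The wait-and-retry behavior described above can be sketched as follows. This is a minimal illustration rather than the PoC's code: remove_with_backoff, its parameters, and the doubling factor are assumptions, and the sleep parameter exists only so the function can be exercised without real delays.

```python
import os
import sys
import time


def remove_with_backoff(path, initial_wait=4.0, max_tries=5, sleep=time.sleep):
    """Remove path after waiting, doubling the wait after each PermissionError.

    Returns True once the file is removed, False if every attempt failed.
    """
    wait = initial_wait
    for attempt in range(max_tries):
        sleep(wait)  # give the browser time to start and read the file
        try:
            os.remove(path)
            return True
        except PermissionError:
            # Seen on Windows while another process still has the file open.
            wait *= 2
            if attempt + 1 < max_tries:
                print(f"{path} is still in use; retrying in {wait:g} s",
                      file=sys.stderr)
    return False
```

Capping the number of attempts means the script eventually finishes even if the browser never releases the file, at the cost of possibly leaving the temporary file behind.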

Open point: Alternatives: Any better ideas?

E.g., the mechanism could remain the same, but go silent (not report waiting to user) and try more often. That would remove (most) unneeded delay, but it may reduce users’ understanding (“why doesn’t my script finish?”).

E.g., webbrowser.open() can wait for the browser to close in some situations, specifically Unix without remote browser. But this doesn’t work universally.

Assumed non-blocking display by browser

The approach followed (in the shell script and in Python webbrowser) assumes that the browser does not lock the local file it displays.

Otherwise, webbrowser cannot remove the temporary file while the browser is open.
The above wait (for browser to finish reading) would actually retry until the user closes the tab in the browser.

This assumption holds for current Windows browsers I tested with.
However, it can be false for other browsers or platforms.

Open point: Extend coverage: Which (common?) browsers or platforms block?

Assumed local file usage of `webbrowser`

Officially, webbrowser doesn’t support local files, and therefore not temporary files either.

Still, the approach followed (in the shell script and in Python webbrowser) works for the current Windows browsers I tested with. However, it can fail for other browsers or platforms.

Adding support for standard input in CLI use may be considered contrary to not supporting local files.
However, that is not without precedent (“hack for local urls” in webbrowser.MacOSX.open()).

The proposed feature could be added to running webbrowser as a module for CLI usage, and documented with the caveat that it only works if the underlying webbrowser.open() and the actual browser support local files.

Open point: Fact finding: Do any (common?) browsers or platforms not support local files?

Next steps

What do you think of the above proposal, @birkenfeld (I believe = Georg Brandl = maintainer of webbrowser; sorry if incorrect) and @all?
Any inputs on above open points?
What needs doing to decide on implementation in the standard library?
What needs doing to implement in the standard library?

(edited to remove superfluous line-breaks due to different markdown flavours)

Rosuav · October 30, 2022, 4:40pm

Check if that’s true on any platform other than Windows.