Serve HTML from extensionless URLs in http.server

Fraetor · December 23, 2022, 6:29pm

gh-100463: Add handling for extensionless URLs in http.server.

For example, if you have a folder containing foo.html and run python3 -m http.server from it, you could use the URL http://localhost:8000/foo instead of http://localhost:8000/foo.html.

Pitch

A common pattern in URL design is to omit the file extension for pages. This makes the URL look nicer and keeps it technology-agnostic: whether the page is written in PHP or HTML, the technology can change without the link changing.

The motivation for this change is the common use of the Python HTTP server for static website development, especially among those learning HTML/CSS or working on restricted systems where additional software cannot be installed. It lets them follow the best practice of omitting extensions in links while still having navigation between pages work, which in my view aligns with the goal of http.server as a simple test/demo server for use during development.

This behaviour would also more closely match most other web servers, and so would be less surprising.

Security

Concerns about security were raised regarding this change. However, I do not believe there is any significant increase in attack surface here.

For starters, any file that would be accessible through this mechanism is already accessible through putting the extension in the URL, and thus it doesn’t give the attacker any new capabilities.
The change adds between one and three additional stat() calls (via os.path.exists()) per request, where three is the worst case unless additional extensions are specified. These calls are fast and cached, so this should not be a significant vector for denial of service attacks.
The URL is only appended to, and thus there should be minimal chance for undesired manipulation.
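
As a rough sketch of the lookup described above (this is not the code from the patch; the function name and the default suffix tuple are assumptions for illustration):

```python
import os

# Assumed defaults for illustration; the discussion only mentions html/htm.
DEFAULT_SUFFIXES = (".html", ".htm")


def resolve_extensionless(fs_path, suffixes=DEFAULT_SUFFIXES):
    """Return the filesystem path to serve for fs_path.

    An exact match always wins. Otherwise each suffix is appended in
    order and the first existing candidate is returned. If nothing
    matches, fs_path is returned unchanged so the caller can send a 404.
    """
    if os.path.exists(fs_path):  # first existence check
        return fs_path
    for suffix in suffixes:  # at most len(suffixes) further checks
        candidate = fs_path + suffix  # the path is only ever appended to
        if os.path.exists(candidate):
            return candidate
    return fs_path
```

With the two assumed suffixes, a request costs at most three existence checks, and a file is only ever reached at a path that could already have been requested directly.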

The only case where I can see this being an issue is if something relies on a particular URL returning a 404 error and instead gets served a page, but I can’t think of any realistic scenario where that would be the case. There are, however, some challenges around when to resolve a directory rather than a file.

Rosuav · December 23, 2022, 9:32pm

I don’t think this is really right for the vanilla -m http.server super-simple server. It should be pretty easy to do this with your own script (subclass the request handler and add your code there), but given that this isn’t meant for production work anyway, it’s never going to be able to do everything that you want it to.

Maybe this would be better done as a recipe in the docs, showing how easy it is to extend the basic functionality?

Fraetor · December 23, 2022, 10:09pm

I guess it boils down to what we think the http.server module should be. For me the use case is almost entirely manually running python3 -m http.server. My common use cases are as follows:

A temporary web server to use while I am fighting with CSS. (That is what this effort would help.)
A rudimentary way to transfer a file to another machine. (Quicker than finding a USB stick)
Some other program can take resources from a URL, and I want to give it something from my local computer.

http.server usually isn’t the 100% best tool for the job, but it is readily available, as pretty much all computers I use have Python installed, and as such it is good for these quick “hacks”. I’d like to improve it for my particular use case, and I don’t see a drawback to this change. (My implementation is less than 15 lines of code.)

Rosuav · December 23, 2022, 10:21pm

Quick file transfers are a great use-case for this, since it strictly just returns files from the file system. For the other use-cases, if changing your URL is too much of a hassle, it’s pretty easy to make your own Python script that imports http.server and makes its changes; you can then leave that script in your personal library and use that instead of python3 -m http.server.

The problem is that your use-case requires one small tweak, and someone else’s will require another small tweak, and so on, and the parameter list would quickly become untenable. The classes in http.server are designed to be extended, so it should be easy to create the functionality you want.
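
A minimal sketch of such a script, assuming that overriding translate_path is an acceptable place to hook in. The class name ExtensionlessHandler, the serve() helper and the suffix list are made up for illustration, and this is not the code from the proposed patch:

```python
import os
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class ExtensionlessHandler(SimpleHTTPRequestHandler):
    """Hypothetical handler that also tries .html/.htm for missing paths."""

    suffixes = (".html", ".htm")  # assumption: the suffixes discussed here

    def translate_path(self, path):
        fs_path = super().translate_path(path)
        # Exact matches (files or directories) and explicit "/" requests
        # keep the stock behaviour.
        if os.path.exists(fs_path) or fs_path.endswith("/"):
            return fs_path
        for suffix in self.suffixes:
            candidate = fs_path + suffix
            if os.path.isfile(candidate):
                return candidate
        return fs_path  # unchanged, so the base class sends its usual 404


def serve(port=8000):
    """Serve the current directory until interrupted."""
    with ThreadingHTTPServer(("", port), ExtensionlessHandler) as httpd:
        httpd.serve_forever()
```

Calling serve() at the bottom of the file and running it from the site folder would then stand in for python3 -m http.server.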

jack1142 · December 24, 2022, 8:14pm

I’m indifferent to whether this gets added (I suppose I see some narrow use case), but I do think it should not be the default and should require specifying a flag. I know that the patch still prefers an exactly matched file over an html/htm file, but I still don’t like the possibility of getting a different file than I requested just because there happened to be a file with that name plus an htm/html suffix.

Fraetor · December 24, 2022, 10:00pm

I guess it depends on whether people mainly use http.server for file sharing, in which case it would be better without this, or for web development, in which case it would be better to have it.

How do you use http.server?

Mainly for file transfer
Mainly for web development
For both
Mainly for something else (please post your use case)

And merry Christmas to those who celebrate it!

Rosuav · December 24, 2022, 10:24pm

I use it mainly for web dev, but I still think it’s better to not have it.

hugovk · December 27, 2022, 8:13am

For comparison, GitHub Pages does something similar:

Rosuav · December 27, 2022, 9:59am

GitHub Pages does a lot that http.server doesn’t do, including parsing Markdown into HTML. It’s a very different service.