Removing non-HTML (PDF, EPUB, etc) documentation downloads

AA-Turner · August 7, 2025, 4:09am

Thank you for the feedback @adorilson. Would EPUB be an acceptable format for these uses, or the offline HTML archive?

One other alternative is keeping the PDF downloads, but updating them less frequently (eg weekly or monthly, instead of every other day at present). How important is it for the PDF to be the most up-to-date?

A

ncoghlan · August 7, 2025, 4:20am

Reflecting recent releases seems like it would be important, but lagging a few weeks behind the online docs seems unlikely to matter otherwise.

Building them less frequently rather than skipping building them entirely would also mean that problems with the alternative format builds would still be noticed eventually.
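
As a rough illustration of the policy being discussed here (rebuild for each release, otherwise allow the downloads to lag by a few weeks), the decision could be sketched as below. The function name, its parameters, and the four-week limit are all assumptions made for the example and are not taken from the actual docs build scripts.

```python
from datetime import date, timedelta

# Hypothetical policy knob: how stale the PDF/EPUB builds may become.
MAX_AGE = timedelta(weeks=4)


def should_rebuild(last_build: date, last_built_release: str,
                   latest_release: str, today: date) -> bool:
    """Rebuild when a new release has appeared or the last build is too old."""
    if latest_release != last_built_release:
        # A new release (including bugfix releases) always triggers a build.
        return True
    # Otherwise rebuild only once the previous build has aged out.
    return today - last_build >= MAX_AGE
```

Keeping the age-based fallback, rather than building only on releases, is what would let breakage in the alternative formats surface between releases.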

matthewyu0311 · August 7, 2025, 4:09pm

One thing that the PDF (and presumably EPUB) versions do better than the HTML is a usable hierarchical table of contents, which PDF readers can display on the side. The HTML version just doesn’t do very well in this regard. Granted, the search functionality of the HTML version is 100% better than a very slow full-text search in a PDF thousands of pages long, but for the “book matter” (table of contents, appendix) the PDF reads more naturally to me.

hugovk · August 7, 2025, 5:59pm

Note that the current PDF download is a zip of some 35 PDF files, one for each section, so you can only ever use the table of contents or full-text search within a single file, unlike the HTML version.

Although the largest single PDF is the stdlib, at 2.3k pages, you might miss other useful info from other sections that does show up in the HTML search.
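
For anyone who wants to check how the download is split, a minimal sketch using the standard-library `zipfile` module can list the PDF members of the archive. The helper name is made up for the example, and the archive path is whatever file you downloaded.

```python
import zipfile


def list_pdfs(archive) -> list[str]:
    """Return the names of the PDF members of a zip archive, sorted."""
    # `archive` may be a path or a binary file-like object.
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name for name in zf.namelist() if name.lower().endswith(".pdf")
        )
```

Calling `len(list_pdfs(path_to_zip))` on the downloaded archive gives the number of separate PDFs that a reader would have to search one at a time.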

adorilson · August 7, 2025, 6:09pm

For EPUB, I don’t think so. I’ve never used EPUB. I downloaded it just now, and when I tried to open it, the OS didn’t recognize a program to open it with. Then I managed to open the file with Okular, but it was ugly, with no formatting applied. I don’t know if that is an EPUB feature or an Okular issue.

For HTML, maybe. Better than EPUB. But only for desktop platforms, as @pf_moore said. I am also concerned about the thousand files inside the zip file, and it is expected that the reader knows what an `index.html` file is. From a user perspective, PDF is the most portable and user-friendly.

Concerning updating the PDFs less frequently, I was going to suggest it. I was thinking of something more conservative, for instance, updating the PDF only when a new Python release is published (which includes bugfix releases, such as 3.13.X). Having an out-of-date file is something offline doc readers expect.

adorilson · August 7, 2025, 6:26pm

Reading this surprises me. Search in the PDF is fast enough for me.

Besides, in terms of precision, PDF search is much better. For instance, try to find “enumeration members” (even though the current PDF download is a zip of several files).

adorilson · August 7, 2025, 6:36pm

I have some questions here:

Why is it this way?
Why is the PDF this way while the EPUB is just one file with the entire doc?
Why is the howto section broken into several files?

matthewyu0311 · August 7, 2025, 9:50pm

The ergonomics of the HTML table of contents need improvement. Suppose I’m reading about the floor() function in the math module, and I want to look up the rounding modes of decimal.Decimal to compare their uses. Clicking on the top-left “Table of Contents” is going to get me lost in the abyss, and nothing below it is what I need. (The current HTML main table of contents starts with the What’s New of every release, then every changed module, the changelog, and setup and usage. The language and library references are halfway down because the aforementioned parts contain lots of headings.)

What I actually need to do is to scroll all the way to the top, click Numeric and Mathematical Modules which now appears on top, scroll down (Ctrl-F also works) to find the rounding modes in the decimal module.

Of course, I can open multiple tabs in the browser, one for math and one for decimal – the ability to open multiple tabs easily is an advantage of the HTML version. But this doesn’t make the main and side-by-side HTML table of contents to my left any easier to use – I’m just working around it. On the other hand, the table of contents of the PDF version of the library reference is displayed hierarchically by broad topic and then by module and by heading, making jumping to sibling modules and their headings much easier.

TomRitchford · August 8, 2025, 9:46am

I voted for removing non-HTML versions before seeing this argument, but it’s very persuasive: the people who currently use the one-file PDF solution are often people who are already extremely short on computing resources in the first place.

The best result would be “a single document that can be downloaded on a generic (e.g. android) cellphone and read without an internet connection”, am I right? We aren’t even providing this in PDF, apparently!

Is there some HTML document that could take the place of the PDF in these cases? What about a single page HTML rendering of the whole thing?

adorilson · August 8, 2025, 11:07am

No, we don’t.

Because each documentation section is a different file in the zip, but I see it as a feature (not a bug). Typically, each person is interested in a section at a different point in time. First, tutorial, then library, and so on. Therefore, having it in separate files looks to me like a good idea.

What could be improved here is to provide individual files to download, as well.

hugovk · August 8, 2025, 11:41am

Someone noticed on the docs mailing list yesterday, but I don’t know if that’s in response to this Discourse topic

We also got a report about this on the mailing list. But it was unnecessary to build both A4 and letter, and removing one saved some 15 hours of build time for a full build (all versions and languages)!

Hmm, this is somewhat comparing different things.

The HTML library reference table of contents is here, which can be found via the front page:

If we had a one-file PDF, we’d also get lost in an abyss of the table of contents

Although I do take your point that it can be natural to click “Table of Contents” from a page and end up on the huge one (and agree Ctrl-F works).

But no-one currently uses the one-file PDF solution because there is none.

Yes, we can use Sphinx’s singlehtml builder. Here’s a demo: 3.15.0a0 Documentation / CPython singlehtml docs · GitHub

It generates one big 44 MB contents.html, plus index.html and downloads.html because they’re from templates rather than .rst, and we’d need to adjust the links in the templates to point to the right places in the big file.

But it’s very big, and Chrome struggles with it. I tried to “print” to a 4,152-page PDF but it took too long.
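
For reference, here is a minimal sketch of driving such a build from Python and measuring the result. It relies on Sphinx's `singlehtml` builder and on `python -m sphinx` being equivalent to `sphinx-build`; the function names are made up for the example, the source and output directories are placeholders, and the real CPython docs build may pass further options.

```python
import subprocess
import sys
from pathlib import Path


def singlehtml_command(source_dir: str, out_dir: str) -> list[str]:
    """Return the argv for a Sphinx build using the singlehtml builder."""
    # `python -m sphinx` is equivalent to running the `sphinx-build` script.
    return [sys.executable, "-m", "sphinx", "-b", "singlehtml", source_dir, out_dir]


def build_and_measure(source_dir: str, out_dir: str) -> dict[str, float]:
    """Run the build and report each top-level HTML file's size in MB."""
    subprocess.run(singlehtml_command(source_dir, out_dir), check=True)
    return {
        path.name: path.stat().st_size / 1_000_000
        for path in Path(out_dir).glob("*.html")
    }
```

`build_and_measure` needs Sphinx installed and a docs checkout to run; `singlehtml_command` only assembles the command line, so it can be inspected on its own.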

hugovk · August 8, 2025, 12:02pm

It’s worth mentioning the amount of time it takes to build the docs, especially as we’re usually building 3 versions times 16 translations.

Three versions (3.13-3.15) of English HTML docs take about 6 minutes
The HTML for three versions and translations takes between 3.75 and 6.75 hours
The non-HTML for three versions and translations takes about 23.5 hours
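
To put those figures side by side, here is a small worked calculation. It assumes the HTML and non-HTML times simply add up to the length of a full build, which is an assumption made for illustration rather than something measured.

```python
# Figures quoted above, in hours, for three versions and all translations.
HTML_HOURS_LOW = 3.75
HTML_HOURS_HIGH = 6.75
NON_HTML_HOURS = 23.5


def non_html_share(html_hours: float, non_html_hours: float) -> float:
    """Fraction of a full build spent on non-HTML formats (additive model)."""
    return non_html_hours / (html_hours + non_html_hours)


for html_hours in (HTML_HOURS_LOW, HTML_HOURS_HIGH):
    total = html_hours + NON_HTML_HOURS
    share = non_html_share(html_hours, NON_HTML_HOURS)
    print(f"total {total:.2f} h, non-HTML share {share:.0%}")
```

Under that assumption a full build takes 27.25 to 30.25 hours, and the non-HTML formats account for roughly 78% to 86% of it.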

Security releases of 3.9-3.12 also need a rebuild for all languages.

There’s not just the regular docs builds. We build docs as part of the release process. Just this week with 3.13.6 we ran into problems with the PDF dependencies taking a long time to download and timing out the build.

It also takes a considerable amount of maintenance for PDF and EPUB.

PDF because of all the different LaTeX engines/libraries needed. Some translations need different ones. We sometimes get build failures due to specific characters in translations which can be hard to debug.
EPUB because it’s XHTML, and when we use modern HTML5 that isn’t strict XHTML it can break EPUB. This can require fixes in both our config and in Sphinx plugins.

nedbat · August 8, 2025, 12:37pm

Perhaps building PDF docs is the kind of work that could be offloaded from the core team to a separate community effort? Why does it have to be provided by the core team?

seektechnz · August 8, 2025, 1:40pm

There’s no option on the poll for it, but I have to vouch for the usefulness of PDF docs, especially if they’re in one single PDF for searchability and navigation. PDF vs EPUB feels like no contest to me as PDF is far more available on a variety of platforms, so I can’t see a reason for anything but HTML and PDF.

My personal use case for PDF docs is when I’m traveling, especially on a plane, and don’t have reliable internet for my work. It’s much much easier to use a PDF than to deal with an HTML folder structure. That said, I definitely don’t think that PDFs need to be updated constantly. Every release, even if only 3.x (minor, not revision) releases, would be a huge help.

toonarmycaptain · August 8, 2025, 2:04pm

PDFs are much better for reference/printing at times than raw HTML. I personally like that ePub will reflow, but it’s definitely hit and miss on support.

Likely a niche use case, but I know someone who’s been experimenting with supplying a local copy of the python docs to an LLM and getting more useful results with the PDF version by itself or in conjunction with the HTML.

ncoghlan · August 9, 2025, 12:05am

The downside is that when it does break it’s likely to be due to content changes (although that may be translated content rather than the original English content).

So I’ll suggest a variant on this: is there a reason we need non-HTML docs as a release artifact? Or could we instead just run the periodic build as normal during the release candidate process, so we’re unlikely to get a docs build failure in the first one after the release is tagged?

tjreedy · August 9, 2025, 2:55am

According to the OP’s chart, many other languages provide PDFs. Just one other provides EPUB, which I would not know how to use, and none provide any other format. To me, the most sensible change, given the above and the discussion, would be to keep PDF (built less frequently) and dump EPUB and the other formats. This is not a vote option, so I cast my vote here.

toonarmycaptain · August 9, 2025, 4:04am

Where do we vote?

Removing the PDF would break every K12 teacher who teaches python that I know of.
I suppose they might either stay on the most recent PDF-documented python version, or be forced to learn how to generate a probably sub-par PDF themselves.

If you want a specific blocker to HTML-only: some copiers in such environments will, in my experience, print a PDF from a thumb drive but won’t even recognize an HTML file.

AA-Turner · August 9, 2025, 4:13am

The first post has a poll/survey.

Please would you be able to elaborate? I’m not familiar with the American educational system. What are the use-cases for the PDF that aren’t covered by the website? Note as Hugo pointed out, there are actually ~35 PDFs – which of these are used or useful?

A

toonarmycaptain · August 9, 2025, 5:10am

I don’t have full info about versions/use etc. In my experience they don’t always have 1 to 1 devices/students, and don’t have multiple monitors, so having paper copies is useful for reference without needing a device.
I can also imagine double utility here where a student is using a table of contents etc to navigate a physical copy.

NB I know of at least 3-4 current teachers who use them; only 1-2 of those are in America (things change year to year), 1 in Africa/Asia, 1 in Australia.

I suppose another use case from that perspective would be kids with a device but no internet at home (or who have a cell but no internet on the laptop they’re issued) which is the case for quite a lot of kids around where I live.

I’ve just heard from teachers I’ve spoken to that they print out docs, which seems wild to me, but still (married to a teacher, worked in the schools myself, heard someone was printing, started asking other teachers).
Also, I only just now found out the number of files the docs/how-tos/reference are broken into. It doesn’t matter so much when someone is hitting print with a ream of paper. I’ve definitely had a download of what was probably an amalgamation of some of them, from a time when my personal connection was less great, although judging by what I see today, that might have come from a 3rd party relying on the original python.org files or the HTML. Either way, trusting direct downloads might not be an accurate measure if people are repackaging.

I still feel having at least PDF is important for accessibility/preservation etc. HTML is great, but not nearly as helpful for printing or reference (can’t imagine a hypothetical 8yo figuring out how to navigate an HTML file tree nearly as easily as a PDF, particularly on a mobile device). I can imagine having lots of versions is a considerable overhead, but hopefully keeping HTML and PDF is doable, even if it’s a per-major-version or per-minor deal for the PDF. That would probably serve the vast majority of the users for whom losing the PDF would be most impactful.