Thank you for the feedback @adorilson. Would EPUB be an acceptable format for these uses, or the offline HTML archive?
Another alternative is keeping the PDF downloads, but updating them less frequently (e.g. weekly or monthly, instead of every other day as at present). How important is it for the PDF to be the most up-to-date?
Reflecting recent releases seems like it would be important, but lagging a few weeks behind the online docs seems unlikely to matter otherwise.
Building them less frequently rather than skipping building them entirely would also mean that problems with the alternative format builds would still be noticed eventually.
One thing that the PDF (and presumably EPUB) versions do better than the HTML is a usable hierarchical table of contents, which PDF readers can display on the side. The HTML version just doesn't do very well in this regard. Granted, the search functionality of the HTML version is 100% better than a very slow full-text search in a PDF thousands of pages long, but for that "book matter" (table of contents, appendix) the PDF reads more naturally to me.
Note that the current PDF download is a zip of some 35 PDF files, one for each section, so you can only ever use the table of contents or full-text search within a single file, unlike the HTML version.
Although the largest single PDF is the stdlib, at 2.3k pages, you might miss other useful info from other sections that does show up in the HTML search.
For EPUB, I don't think so. I've never used EPUB. I downloaded it just now, and when I tried to open it, the OS didn't recognize a program to open it with. Then I managed to open the file with Okular. But it was ugly, without formatting applied. I don't know if that is an EPUB feature or an Okular issue.
For HTML, maybe. Better than EPUB. But only for desktop platforms, as @pf_moore said. I am also concerned about the thousand files inside the zip file, and it is expected that the reader knows what an `index.html` file is. From a user perspective, PDF is the most portable and user-friendly.
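To make the `index.html` concern concrete, here is a minimal sketch of what a reader has to do by hand with the offline HTML archive: unpack it and locate the top-level `index.html`. The helper name `extract_docs` and the archive name in the comment are hypothetical, and the sketch simply assumes the archive contains at least one `index.html`.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_docs(archive: Path, dest: Path) -> Path:
    """Extract a docs zip and return the shallowest index.html inside it."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    # The archive holds many index.html files (one per section);
    # the entry point is assumed to be the one closest to the root.
    candidates = sorted(dest.rglob("index.html"), key=lambda p: len(p.parts))
    if not candidates:
        raise FileNotFoundError("no index.html found in the archive")
    return candidates[0]


# Hypothetical usage (the archive name is an assumption):
#   import webbrowser
#   entry = extract_docs(Path("python-docs-html.zip"), Path("python-docs"))
#   webbrowser.open(entry.resolve().as_uri())
```

With a PDF, none of these steps are needed, which is the portability point being made here.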
As for updating the PDFs less frequently, I was going to suggest that. I was thinking of something more conservative, for instance, updating the PDF only when a new Python release is published (which includes bugfix releases, such as 3.13.X). Having an out-of-date file is something offline doc readers expect.
Reading this is surprising to me. Search in the PDF is fast enough for me.
Besides, in terms of precision, PDF search is much better. For instance, try to find "enumeration members". (This holds even though the current PDF download is a zip with several files.)
The ergonomics of the HTML table of contents need improvement. Suppose I'm reading about the floor() function in the math module, and I want to look up the rounding modes of decimal.Decimal to compare their uses. Clicking on the top-left "Table of Contents" is going to make me get lost in the abyss, and nothing below it is what I need. (The current HTML main table of contents starts with the What's New of every release, from every changed module to the changelog to setup and usage. The language and library references are halfway down, because the aforementioned parts contain lots of headings.)
Of course, I can open multiple tabs in the browser, one for math and one for decimal; the ability to open multiple tabs easily is an advantage of the HTML version. But this doesn't make the main and side-by-side HTML table of contents to my left any easier to use; I'm just working around it. On the other hand, the table of contents of the PDF version of the library reference is displayed hierarchically by broad topic and then by module and by heading, making jumping to sibling modules and their headings much easier.
I voted for removing non-HTML versions before seeing this argument, but it's very persuasive: the people who currently use the one-file PDF solution are often people who are already extremely short on computing resources in the first place.
The best result would be "a single document that can be downloaded on a generic (e.g. Android) cellphone and read without an internet connection", am I right? We aren't even providing this in PDF, apparently!
Is there some HTML document that could take the place of the PDF in these cases? What about a single page HTML rendering of the whole thing?
That's because each documentation section is a different file in the zip, but I see that as a feature (not a bug). Typically, each person is interested in a different section at a different point in time: first the tutorial, then the library, and so on. Therefore, having it in separate files looks like a good idea to me.
What could be improved here is to provide individual files to download, as well.
Someone noticed on the docs mailing list yesterday, but I don't know if that's in response to this Discourse topic.
We also got a report about this on the mailing list. But it was unnecessary to build both A4 and letter, and removing one saved some 15 hours of build time for a full build (all versions and languages)!
Hmm, this is somewhat comparing different things.
The HTML library reference table of contents is here, which can be found via the front page:
If we had a one-file PDF, we'd also get lost in an abyss of the table of contents.
Although I do take your point that it can be natural to click "Table of Contents" from a page and end up on the huge one (and agree Ctrl-F works).
But no-one currently uses the one-file PDF solution because there is none.
It generates one big 44 MB contents.html, plus index.html and downloads.html because they're from templates rather than .rst, and we'd need to adjust the links in the templates to point to the right places in the big file.
But it's very big, and Chrome struggles with it. I tried to "print" it to a 4,152-page PDF but it took too long.
It's worth mentioning the amount of time it takes to build the docs, especially as we're usually building 3 versions times 16 translations.
Three versions (3.13-3.15) of English HTML docs take about 6 minutes
The HTML for three versions and translations takes between 3.75 and 6.75 hours
The non-HTML for three versions and translations takes about 23.5 hours
Security releases of 3.9-3.12 also need a rebuild for all languages.
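As a rough back-of-the-envelope check on those figures, the sketch below estimates what share of a full build goes to the non-HTML formats. It assumes the HTML and non-HTML times are separate and simply add up, and that "3 versions times 16 translations" means 48 version/translation combinations; neither assumption is stated explicitly above.

```python
# Approximate figures quoted above; treat them as rough estimates.
VERSIONS = 3                  # 3.13, 3.14, 3.15
TRANSLATIONS = 16             # translations mentioned above
HTML_HOURS = (3.75, 6.75)     # HTML, all versions and translations (low, high)
NON_HTML_HOURS = 23.5         # PDF, EPUB and other non-HTML formats


def non_html_share(html_hours: float, non_html_hours: float) -> float:
    """Fraction of total build time spent on non-HTML formats,
    assuming the two figures are additive."""
    return non_html_hours / (html_hours + non_html_hours)


builds = VERSIONS * TRANSLATIONS
low = non_html_share(HTML_HOURS[1], NON_HTML_HOURS)   # slowest HTML build
high = non_html_share(HTML_HOURS[0], NON_HTML_HOURS)  # fastest HTML build

print(f"{builds} version/translation combinations")
print(f"non-HTML share of build time: {low:.0%} to {high:.0%}")
```

Under those assumptions, the non-HTML formats account for roughly three quarters or more of a full build.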
There's not just the regular docs builds. We build docs as part of the release process. Just this week with 3.13.6 we ran into problems with the PDF dependencies taking a long time to download and timing out the build.
It also takes a considerable amount of maintenance for PDF and EPUB.
PDF because of all the different LaTeX engines/libraries needed. Some translations need different ones. We sometimes get build failures due to specific characters in translations which can be hard to debug.
EPUB because it's XHTML, and when we use modern HTML5 that isn't strict XHTML it can break EPUB. This can require fixes in both our config and in Sphinx plugins.
Perhaps building PDF docs is the kind of work that could be offloaded from the core team to a separate community effort? Why does it have to be provided by the core team?
There's no option on the poll for it, but I have to vouch for the usefulness of PDF docs, especially if they're in one single PDF for searchability and navigation. PDF vs EPUB feels like no contest to me as PDF is far more available on a variety of platforms, so I can't see a reason for anything but HTML and PDF.
My personal use case for PDF docs is when I'm traveling, especially on a plane, and don't have reliable internet for my work. It's much much easier to use a PDF than to deal with an HTML folder structure. That said, I definitely don't think that PDFs need to be updated constantly. Every release, even if only 3.x (minor, not revision) releases, would be a huge help.
PDFs are much better for reference/printing at times than raw HTML. I personally like that EPUB will reflow, but it's definitely hit and miss on support.
Likely a niche use case, but I know someone who's been experimenting with supplying a local copy of the Python docs to an LLM and getting more useful results with the PDF version by itself or in conjunction with the HTML.
The downside is that when it does break it's likely to be due to content changes (although that may be translated content rather than the original English content).
So I'll suggest a variant on this: is there a reason we need non-HTML docs as a release artifact? Or could we instead just run the periodic build as normal during the release candidate process, so we're unlikely to get a docs build failure in the first one after the release is tagged?
According to the OP's chart, many other languages provide PDFs. Just one other provides EPUB, which I would not know how to use, and none provide any other format. To me, the most sensible change given the above and the discussion would be to keep PDF (built less frequently) and drop EPUB and the others. This is not a vote option, so I cast my vote here.
Removing the PDF would break every K12 teacher who teaches python that I know of.
I suppose they might either stay on the most recent PDF-documented python version, or be forced to learn how to generate a probably sub-par PDF themselves.
If you want a specific blocker to HTML-only: some copiers IME in such environments will print a PDF on a thumb drive, but won't even recognize an HTML file.
Please would you be able to elaborate? I'm not familiar with the American educational system. What are the use-cases for the PDF that aren't covered by the website? Note as Hugo pointed out, there are actually ~35 PDFs; which of these are used or useful?
I don't have full info about versions/use etc. In my experience they don't always have 1 to 1 devices/students, and don't have multiple monitors, so having paper copies is useful for reference without needing a device.
I can also imagine double utility here where a student is using a table of contents etc to navigate a physical copy.
NB I know of at least 3-4 current teachers who use them; only 1-2 of those are in America (things change year to year), 1 in Africa/Asia, 1 in Australia.
I suppose another use case from that perspective would be kids with a device but no internet at home (or who have a cell but no internet on the laptop they're issued), which is the case for quite a lot of kids around where I live.
I've just heard from teachers I've spoken to that they print out docs, which seems wild to me, but still (married to a teacher, worked in the schools myself, heard someone was printing, started asking other teachers).
Also, I only just now found out the number of files the docs/how-tos/reference are broken into. That doesn't matter so much when someone is hitting print with a ream of paper. I've definitely had a download of what was probably an amalgamation of some of them, from a time when my personal connection was less great, although judging by what I see today, that might have come from a 3rd party relying on the original python.org files or the HTML. Either way, trusting direct downloads might not be an accurate measure if people are repackaging.
I still feel having at least PDF is important for accessibility/preservation etc. HTML is great, but not nearly as helpful for printing or reference (I can't imagine a hypothetical 8yo figuring out how to navigate an HTML file tree nearly as easily as a PDF, particularly on a mobile device). I can imagine having lots of versions is a considerable overhead, but hopefully keeping HTML and PDF is doable, even if it's a per-major-version or per-minor deal for the PDF; that would probably serve the vast majority of the users for whom losing the PDF would be most impactful.