Removing non-HTML (PDF, EPUB, etc) documentation downloads

sharktide · August 9, 2025, 1:00pm

Once upon a time, a computer science teacher I know was teaching Python to his students. He had printed out a page from the Python docs and gave a copy to each kid as a reference, to try to teach 5th graders about function definitions and control statements, because they couldn’t be trusted to focus on the online docs.

(That’s just the idea, but he did something similar. I changed the story a bit to protect his privacy)

mcepl · August 11, 2025, 9:54am

We are talking about developer documentation; I don’t think we need to have offline documentation on Lego Mindstorms EV3. And which developer machine doesn’t have an EPUB reader, either in the web browser (possibly with some add-ons) or as a stand-alone application?

mcepl · August 11, 2025, 10:01am

Okular (with all due respect to it; Evince is in a similar boat) is an excellent PDF reader, but not such a great EPUB one. There are a whole lot of great EPUB readers (see the EPUB article on Wikipedia, but that’s just a very brief overview, there are many more), starting with the simplest ones like the terminal-based epy (I have just made extensive improvements to it), through various browser add-ons (e.g., EPUBReader), to stand-alone apps.

AA-Turner · September 18, 2025, 4:07pm

I discussed this at the Core Developer Sprint in Cambridge this week with several others, including the release manager for Python 3.14.

We resolved that given the low usage, balanced against the high effort and resource use in maintaining them, we will no longer build PDF versions of the documentation for download. We will keep EPUB and the archived HTML downloads, which can still be used for offline use, and we will not delete existing PDFs.

We’ll also update the documentation to provide advice on how to build a PDF locally, which will remain an option.

Thank you to everyone who participated in this discussion or voted in the poll; this was a very productive discussion.

A

seektechnz · September 20, 2025, 11:16am

I really appreciate all you do for CPython, and want to say thank you. I also must respectfully ask if this decision could be reconsidered. The poll unfortunately had some fatal flaws for some of us who do use the PDF copy, with no option to keep less frequent PDF doc builds, i.e. per 3.x release. I believe I couldn’t even give a response to the poll because this option wasn’t available, and there were a sizeable number of commenters who expressed interest in this being an option.

Not everyone can easily build PDFs themselves, and the (apparently automated) process, when only run on each 3.x release (potentially even decoupled from the automated release build) would have a minimal effect on effort or resource usage. We’re talking about a single build essentially once per year, providing for many users who still use PDF copies as shown by the discussion, even among the limited subset of users who use the forums. Also, we can see from your initial survey of other languages that PDF is arguably the most common form of offline documentation for other languages.

Again, thank you for all you do.

PS: I’m sorry but your usage link didn’t work for me, I’m not sure if it’s an issue on my end.

oscarbenjamin · September 20, 2025, 11:53am

I think it is important not to underestimate the effort involved in maintaining something like this. The discussion above has focussed on build times but I think if I was contemplating not providing e.g. PDF then in the first instance I would be concerned with how much developer time is needed to maintain the build so that it still works. Building less frequently does not reduce the need to make fixes and debug problems that arise as a result of the many changes in between one release and another. I’m not sure how this is handled in CPython but I can imagine that postponing builds until release time might mean that the small team managing the release is burdened with fixing problems caused by accumulated changes from everyone else.

MegaIng · September 20, 2025, 12:28pm

If the PDF build breaks in the future, then the decision here isn’t just “we don’t build them anymore”, but the decision is “they are no longer available, and you can’t build them yourself”. So the decision is essentially to exclude anyone who needs PDF completely, since, as you said, building them is not a guarantee.

seektechnz · September 20, 2025, 12:51pm

This is a very good point, and also when considering developer time to make sure a build is working this is only once per year as part of the release process, instead of trying to keep up with any such changes throughout the year. It simultaneously gets the PDF build more attention (to make sure it’s working) as part of release while also reducing the frequency (and maintenance workload and build times) to a very acceptable level for those users who do still want it.

oscarbenjamin · September 20, 2025, 1:36pm

Or it can be that someone else is maintaining them, perhaps like @nedbat suggested above. My point is that building them only at release time does not reduce the maintenance work and potentially concentrates the burden of that work in the wrong place. In projects where I make the releases I try to distribute work away from the release process as much as possible.

MegaIng · September 20, 2025, 2:31pm

My point is that one shouldn’t lie by saying “PDFs are still available if you build them yourself”. Unless someone else takes over the non-insignificant amount of maintenance work, PDF builds are broken and PDFs completely unavailable.

This is a pain that may well be acceptable for the Python core devs. But framing it as “PDFs aren’t gone” is dishonest.

adqm · September 20, 2025, 2:33pm

From that analytics page, it looks like there were more than 2.5k downloads of PDF zips/tarballs over the past week. I understand that compared to the rest of the docs those numbers are quite low, but ~300 unique downloads per day (maybe ~1 in 500 unique visitors if I’m counting right) isn’t nothing. There are also a couple of ways in which these numbers might underestimate:

analytics.python.org is blocked by uBlock Origin. Like @seektechnz, I couldn’t view the analytics page at first…until disabling my ad blocker. This also means that my visits to d.p.o aren’t included in the analytics numbers.
One benefit of offline docs is that they can be shared. But that kind of sharing (copying PDFs via USB drive, for example) doesn’t tick the analytics numbers up even though it would mean more people ultimately have those docs in hand.

What’s more, the PDF downloads seem to be more popular than HTML downloads by about a factor of 5, which echoes other thoughts expressed earlier in this thread that for folks who need an offline option (or just want one), PDF seems to be the choice.

I would also probably expect that the population at the core dev sprint and the population on this forum (i.e., the people who were able to participate in the conversation) are likely not the ones who depend on the offline versions of the docs; it might be hard to participate in the poll/discussion if you have seriously limited Internet access, for example. Analytics are certainly a way to try to get at how those folks are making use of the docs (or not making use of them, as the case may be), so I’m not suggesting that this decision was made lightly or without considering those people. But I agree with @adorilson’s original take that, even if the intention behind the change is not exclusionary, the end result could be interpreted as such.

In case it’s not clear from the above, I would also like to see this decision revisited. That said, though, I’ll readily admit ignorance of both the nature and the extent of the maintenance burden that the PDF builds add, so the actual question of whether these benefits are worth the maintenance burden is beyond me.

fungi · September 20, 2025, 4:25pm

From that analytics page, it looks like there were more than 2.5k downloads of PDF zips/tarballs over the past week. I understand that compared to the rest of the docs those numbers are quite low, but ~300 unique downloads per day (maybe ~1 in 500 unique visitors if I’m counting right) isn’t nothing.

If it’s at all like any of the other documentation sites I maintain for major open source projects, 999 out of every 1000 requests is from distributed LLM training crawlers masquerading as legitimate browser sessions (faked UA strings and all). They’re best identified by the fact that they come from random cloud IP addresses and follow every link on every page, but in essence I no longer trust “download counts” as a legitimate measure of anything (positive or negative). Meat-driven activity barely registers as minute variations against the background radiation of the AI gold rush now.
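
To put rough numbers on the two estimates, here is a small arithmetic sketch. The inputs are the figures quoted in this thread; the 999-in-1000 crawler share is an anecdotal estimate from other documentation sites and is only assumed to carry over to docs.python.org. The variable names are illustrative.

```python
# Figures quoted in the thread; nothing here is measured independently.
weekly_pdf_downloads = 2_500      # "more than 2.5k downloads ... over the past week"
unique_per_day = 300              # "~300 unique downloads per day"
visitors_per_download = 500       # "~1 in 500 unique visitors"
bot_share = 999 / 1000            # ASSUMPTION: crawler share seen on other sites

# Raw weekly count spread evenly over seven days.
raw_per_day = weekly_pdf_downloads / 7

# Daily unique visitors implied by the "1 in 500" ratio.
implied_daily_visitors = unique_per_day * visitors_per_download

# Downloads left per week if the assumed crawler share applied here.
human_per_week = weekly_pdf_downloads * (1 - bot_share)

print(f"raw downloads per day: {raw_per_day:.0f}")
print(f"implied unique visitors per day: {implied_daily_visitors}")
print(f"non-crawler downloads per week: {human_per_week:.1f}")
```

Under that assumption the weekly count shrinks to a handful of downloads, which is why the result depends almost entirely on how much crawler traffic the counts actually contain.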

AA-Turner · September 20, 2025, 8:29pm

I agree with and second the comments regarding burden on volunteers. To elaborate, the LaTeX/PDF version of the documentation is very much a special case. No other format requires substantial work to curate and maintain OS-level dependencies, or has similar problems with e.g. non-Latin characters. We only have a few people working on ‘documentation infrastructure’ in the Python project; anecdotally it feels like I have had to spend about as much time working on LaTeX issues as everything else put together.

For clarity, download statistics were not used as the final arbiter of this decision. I looked at the counts relative to views for other pages, rather than absolute numbers, but to be clear, both are low. I echo the points made that scraping will likely account for some non-zero amount of these counts.

On the question of framing, and ability to render the documentation to PDFs in the future, I feel it is important to note we are not wholesale removing the support we currently have. We will continue to accept patches and PRs to fix broken builds. What changes is that the Python project will no longer provide PDFs as pre-built artefacts. The Developer’s Guide can continue to list the instructions needed to create PDFs locally, and perhaps those instructions could even be improved.
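
As a rough illustration of what building locally could look like, here is a minimal sketch that drives Sphinx's `latexpdf` make-mode builder from Python. It is not the project's official procedure. It assumes a Unix-like system, a CPython checkout whose documentation sources live in `Doc/`, the documentation's Python dependencies already installed, and a working LaTeX toolchain. The helper names, the `Doc/build` output directory, and the `latexmk` check are illustrative assumptions.

```python
import shutil
import subprocess
from pathlib import Path


def latexpdf_command(doc_dir: Path, build_dir: Path) -> list[str]:
    """Return the sphinx-build invocation for a LaTeX-based PDF build."""
    # "-M latexpdf" is Sphinx's make-mode: it generates LaTeX sources and
    # then runs the LaTeX toolchain on them under <build_dir>/latex.
    return ["sphinx-build", "-M", "latexpdf", str(doc_dir), str(build_dir)]


def build_pdf(doc_dir: Path, build_dir: Path) -> None:
    """Run the PDF build, failing early if an expected tool is missing."""
    # ASSUMPTION: latexmk is the LaTeX driver available on this machine.
    for tool in ("sphinx-build", "latexmk"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(latexpdf_command(doc_dir, build_dir), check=True)
```

Calling `build_pdf(Path("Doc"), Path("Doc/build"))` from the root of a checkout would then attempt the build. Whether it succeeds still depends on the OS-level LaTeX dependencies described above.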

This decision can always be reversed, and in time I’d be happy to consider doing so, but on the current balance of demonstrated usage versus resource consumption (both volunteers and computers), removing PDFs is the right choice. [1]

A

[1] I would seriously consider switching to a non-LaTeX Sphinx backend for PDFs, but it would need substantial testing with the Python documentation.

AA-Turner · September 20, 2025, 10:45pm

I forgot to mention, but we will continue to publish the archived HTML files & the EPUB version, so there are still at least two mechanisms for accessing the documentation offline that are published and supported by the core developers.

A