Over the last couple of weeks and months, Python core dev has started to use Discourse as a platform to make decisions or to justify actions. Before Discourse we used email lists to come to agreements. Email lists like python-dev and python-committers served as the authoritative, primary channels. I’m not counting Zulip, because we just use it as a chat and not as an authoritative source for decisions.
While I have been enjoying Discourse so far, I see one issue: long-term archival and backup. With email lists, we had a simple archive on the primary mailing list server and multiple clones on news servers and mirrors like gmane, Google Mail, and so on. The distributed nature, simple file format, and simple access made mailing lists well suited to long-term archival.
But how are future core developers, researchers, and archivists going to access our discussions on Discourse 10, 20, 50, or even 500 years from now? Python has become an important programming language and is likely to be of interest to researchers in the future. I’m sensitized to the topic because I used to work at a company that dealt with archiving and publishing data ranging from 2,000-year-old manuscripts and medieval books to modern PDFs. Digital memory loss is a big issue for archivists these days.
Should we back up, archive, and publish Discourse at regular intervals in machine-readable formats like JSON? If we publish the dump, how are we going to deal with internal, non-public areas?
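As a rough illustration of what such a dump could look like, here is a minimal sketch that writes a batch of posts as newline-delimited JSON while skipping non-public categories. The field names (`category`, `raw`) and the category allow-list are assumptions for the example, not Discourse's actual export schema.

```python
import json

# Hypothetical allow-list of public categories; internal areas like a
# committers-only category would simply not appear here.
PUBLIC_CATEGORIES = {"Core Development", "Ideas"}

def archive_posts(posts, out_path):
    """Write posts from public categories to out_path as JSON lines.

    Returns the number of posts written. Posts whose category is not
    in the allow-list are dropped, so the published dump never
    contains non-public discussions.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for post in posts:
            if post.get("category") not in PUBLIC_CATEGORIES:
                continue  # skip internal, non-public areas
            f.write(json.dumps(post, ensure_ascii=False) + "\n")
            written += 1
    return written
```

One JSON object per line keeps the format trivially parseable, appendable, and diff-friendly, which matters for the kind of long-term, tool-agnostic access that made mailing list archives durable.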