See gh-96777 for what is kind-of an enhancement to the wave module, allowing it to open WAVE_FORMAT_EXTENSIBLE files that specify their format as PCM, as well as WAVE_FORMAT_PCM files.
In other words, there are two different header formats for identically formatted data, but we would reject one and accept the other. Python 3.12 will accept both header formats, but I’m unsure whether we ought to backport it?
I got involved in the issue because I hit it a few years back, though eventually solved my own problem using ffmpeg and shell scripts instead of Python. Currently I’m very 50/50 on whether it makes sense as a bugfix (“wave module rejects standard PCM headers”) or an enhancement (“support wave files using extensible header format”). I’m confident in the fix’s logic and don’t have real concerns there, but don’t feel strongly enough about it to EAFP the backport.
Maybe it helps to think about how this would be documented? What do you write in the Wnat’s New guide? What phrasing do you use in the docs for the wave module? Are you thinking of just 3.11 (it would go in 3.11.1) or also older releases?
Thanking about that didn’t help me decide So I’m looking for additional opinions.
The NEWS entry for the PR in 3.12 reads “Support reading wave files with the WAVE_FORMAT_EXTENSIBLE format in the wave module” which is suitably neutral. And the docs say nothing about supported formats, so I could still argue either way.
I also know nothing about wave formats, but I agree. I can imagine someone getting weird bugs because their code won’t process WAVE_FORMAT_EXTENSIBLE files when run in 3.11.0 and will when run in 3.11.1. To an extent, it probably depends on whether people think of both of these file formats as “the same”. Are they both given the same file extension?
Most people don’t think about different wave file encodings - they’re all .wav or .wave files, and usually PCM, which basically means uncompressed. So yeah, I’d say people usually consider them to be the same.
This is the most compelling reason so far to not backport. It seems that scipy has a better implementation anyway, so probably people who need it will go there. (A reasonably sized wave file is going to be really slow to process without numpy.)
Because I have both kinds of reasons, and still aren’t fully convinced which one I want to use
I’m sympathetic to the “it always ought to have worked” position on this one. But maybe the “why does this file work here (3.11.0) but not there (3.11.1)” issue outweighs that…
But that would argue against fixing any bugs in bugfix releases…
If you had backported this I wouldn’t have challenged it. And telling people the fix for some problem they have is to upgrade to a new bugfix release is much easier than to tell them to upgrade to a new feature release.
After chatting with Eric, have decided not to backport. Users who hit this prior to 3.12 can use pedalboard, which has much better file support anyway.
Interesting that the arguments against the backport are essentially arguments for the deprecation of the whole module, and could easily have been taken as arguments for rejecting the patch in the first place. I’m sure this is not the first time such a thing has happened – wasn’t the wave module on the initial list of modules to deprecate and delete? It seems to be in a somewhat precarious state.
The current wave module doc says that it works with uncompressed .wav files that are readable or writable but not both. There is no mention of requiring a particular internal header format for such files, which I suspect many users would not know about. (What error message is given when the ‘wrong’ one is encountered.)
gh-96777 does not change the doc, as it brings the code into conformance with the doc as written. This is normally called a bugfix. I know nothing of when the more flexible header was introduced and its occurrence in the wild, but If its support is considered an enhancement, then perhaps the 3.11/10 docs should document its non-support.
To me, 3rd party modules are not especially relevant to out patch classification.
This seems like a pretty good rule of thumb. I’d missed that the docs already said “uncompressed”, so if that equates to WAVE_FORMAT_PCM, then it should be backported.
However, I think if we’ve missed a range of uncompressed formats, it’s probably better to just update the older docs to clarify that only simple PCM formats are supported. One example that isn’t supported is WAVE_FORMAT_IEEE_FLOAT floating-point data, which is also uncompressed.[1]
So it seems like the right time to draw the line on what additional formats would be backported, and if it’s a narrow line, then we should stop before the most recent one. (I’d certainly not be opposed to someone taking up the module and adding all the formats they can, though only for new releases.)
Interestingly, the audioop module includes decompression functions for typical compressed wave files, even though wave won’t open them. ↩︎
A discrepancy between code and doc can be fixed by revising either. I agree here with the simpler solution of revising the doc to only promise what the code does (I would include main). Then add a doc patch to your PR that includes a New in 3.12 note. I agree than any further format additions should be 3.x.0 releases, with doc revision and a ‘New in’ note.
I understand I’m bumping a very old thread here; but at work we are using wave.py to generate 6 channel audio from lab equipment and I found out the headers were wrong, when converting them to FLAC. It turns out 3.12 got support for readingWAVE_FORMAT_EXTENSIBLE which is great, but it does not write that format, it writes the header for WAVE_FORMAT_PCM which according to the spec is limited to 2 channel 16-bit. So wave.py will write files with 6 channels or 24 bits or so happily, only the result will have invalid headers.
Backport of cpython version of wave, including fixes for handling bytes and pathlike objects that are currently targeted for python 3.15
Support for reading the WAVE_FORMAT_EXTENSIBLE that is introduced in 3.12, but you can install newwave all the way back to python 3.10
Support for writingWAVE_FORMAT_EXTENSIBLE
Support for reading and writing IEEE-formatted WAVs (GH-60729)
Support for handling WAVE metadata
plus some more bug fixes, plus more documentation.
In general, the wave module is very old and did not see much love. I would like to provide it some TLC but I’m unsure to what extent this is desired! Is there any guidance on this topic?
Producing invalid output feels like a normal bug, file a cpython issue and we should fix it regardless - even if the only viable stdlib fix were to raise an error rather than silently generating an invalid file. Such a fix is viable as a backport to bugfix releases (3.13 and 3.14) if it doesn’t require changing APIs.
The rest of what you are doing in newwave(ironically actually appropriate name!) are features, while we could might to accept some of those in the future, initially developing and iterating upon them in the PyPI project first makes more sense before we consider adding anything. That way they’re already available to existing Python users regardless of version.
yeah I specifically would like newwave to be a place where we can add functionality before it gets upstreamed to cpython. I am going to be filing bugs and pushing fixes for cpython in the coming period.
The problem with invalid headers is that it needs a little more changes than just a header. For 6 channel audio you must also specify more parameters for each channel, one channel is automatically mono, two is stereo, but beyond that the spec mandates a channel configuration to understand which channel is what. So the API will need to be adjusted.
my company has been writing 6 channel audio files with invalid headers using python for years at this point. If python will stop accepting it - or even if it will start writing CORRECT headers - those will be breaking changes. So I think cpython should be reluctant with these kind of changes…