Backporting support for `wave` formats

See gh-96777 for what is kind-of an enhancement to the wave module, allowing it to open WAVE_FORMAT_EXTENSIBLE files that specify their format as PCM, as well as WAVE_FORMAT_PCM files.

In other words, there are two different header formats for identically formatted data, but we would reject one and accept the other. Python 3.12 will accept both header formats, but I’m unsure whether we ought to backport it?

I got involved in the issue because I hit it a few years back, though eventually solved my own problem using ffmpeg and shell scripts instead of Python. Currently I’m very 50/50 on whether it makes sense as a bugfix (“wave module rejects standard PCM headers”) or an enhancement (“support wave files using extensible header format”). I’m confident in the fix’s logic and don’t have real concerns there, but don’t feel strongly enough about it to EAFP the backport.

Any thoughts?

Maybe it helps to think about how this would be documented? What do you write in the Wnat’s New guide? What phrasing do you use in the docs for the wave module? Are you thinking of just 3.11 (it would go in 3.11.1) or also older releases?

Thanking about that didn’t help me decide :slight_smile: So I’m looking for additional opinions.

The NEWS entry for the PR in 3.12 reads “Support reading wave files with the WAVE_FORMAT_EXTENSIBLE format in the wave module” which is suitably neutral. And the docs say nothing about supported formats, so I could still argue either way.

I know nothing about wave formats, but what you say here sounds to me like a new feature, not a bug fix, which means we shouldn’t backport it.

1 Like

I also know nothing about wave formats, but I agree. I can imagine someone getting weird bugs because their code won’t process WAVE_FORMAT_EXTENSIBLE files when run in 3.11.0 and will when run in 3.11.1. To an extent, it probably depends on whether people think of both of these file formats as “the same”. Are they both given the same file extension?

Most people don’t think about different wave file encodings - they’re all .wav or .wave files, and usually PCM, which basically means uncompressed. So yeah, I’d say people usually consider them to be the same.

This is the most compelling reason so far to not backport. It seems that scipy has a better implementation anyway, so probably people who need it will go there. (A reasonably sized wave file is going to be really slow to process without numpy.)

2 Likes

If you wanted a reason not to backport, why didn’t you say so? :slight_smile:

Because I have both kinds of reasons, and still aren’t fully convinced which one I want to use :wink:

I’m sympathetic to the “it always ought to have worked” position on this one. But maybe the “why does this file work here (3.11.0) but not there (3.11.1)” issue outweighs that…

But that would argue against fixing any bugs in bugfix releases… :slight_smile:

If you had backported this I wouldn’t have challenged it. And telling people the fix for some problem they have is to upgrade to a new bugfix release is much easier than to tell them to upgrade to a new feature release.

3 Likes

If the filenames look the same, I would consider it a bug fix and backport to 3.11.

After chatting with Eric, have decided not to backport. Users who hit this prior to 3.12 can use pedalboard, which has much better file support anyway.

2 Likes

Interesting that the arguments against the backport are essentially arguments for the deprecation of the whole module, and could easily have been taken as arguments for rejecting the patch in the first place. I’m sure this is not the first time such a thing has happened – wasn’t the wave module on the initial list of modules to deprecate and delete? It seems to be in a somewhat precarious state.

The current wave module doc says that it works with uncompressed .wav files that are readable or writable but not both. There is no mention of requiring a particular internal header format for such files, which I suspect many users would not know about. (What error message is given when the ‘wrong’ one is encountered.)

gh-96777 does not change the doc, as it brings the code into conformance with the doc as written. This is normally called a bugfix. I know nothing of when the more flexible header was introduced and its occurrence in the wild, but If its support is considered an enhancement, then perhaps the 3.11/10 docs should document its non-support.

To me, 3rd party modules are not especially relevant to out patch classification.

4 Likes

This seems like a pretty good rule of thumb. I’d missed that the docs already said “uncompressed”, so if that equates to WAVE_FORMAT_PCM, then it should be backported.

However, I think if we’ve missed a range of uncompressed formats, it’s probably better to just update the older docs to clarify that only simple PCM formats are supported. One example that isn’t supported is WAVE_FORMAT_IEEE_FLOAT floating-point data, which is also uncompressed.[1]

So it seems like the right time to draw the line on what additional formats would be backported, and if it’s a narrow line, then we should stop before the most recent one. (I’d certainly not be opposed to someone taking up the module and adding all the formats they can, though only for new releases.)


  1. Interestingly, the audioop module includes decompression functions for typical compressed wave files, even though wave won’t open them. ↩︎

A discrepancy between code and doc can be fixed by revising either. I agree here with the simpler solution of revising the doc to only promise what the code does (I would include main). Then add a doc patch to your PR that includes a New in 3.12 note. I agree than any further format additions should be 3.x.0 releases, with doc revision and a ‘New in’ note.

2 Likes

Joyfully,[1] the extra attention helped me find a bug in the patch. So now 3.12 gets a bugfix and 3.11 and 3.10 get doc clarifications.


  1. This is sarcasm :slight_smile: ↩︎

2 Likes