Adding random.shuffled to the random module (renamed thread)

pf_moore · September 4, 2022, 1:28pm

OK. If you don’t use PyPI then that (somewhat) explains your enthusiasm for adding things to the base language and stdlib. But frankly, you need to adjust your expectations.

PyPI is a fundamental part of the Python ecosystem, and in general, useful functionality should be published on PyPI in the first instance. There are exceptions where things go straight to the stdlib without “prior art” on PyPI, but they are rare and need compelling justification.

Sorry, but pretty much every idea for a new function or library that gets proposed on this forum needs to have an answer to the question “why not just publish it on PyPI?”

Stefan2 · September 4, 2022, 1:53pm

“maintain”: I don’t think this would need much maintenance (can’t even imagine any):

from random import shuffle

def shuffled(iterable):
    # Copy into a new list so the caller's iterable is left untouched.
    lst = list(iterable)
    shuffle(lst)
    return lst

“learn”: If this existed, shuffle’s doc could delete its paragraph about using sample(x, k=len(x)), so people wouldn’t have to learn more, just something different.

“confused”: If either is confusing, I’d say it’s sample(x, k=len(x)), not shuffled(x). The latter does exactly what it says in one word, while the former is lengthier, uses the wrong term, and you need to understand that it achieves proper shuffling (something the sample doc devotes a paragraph to and which caused at least one Stack Overflow question).

Another positive: like I said, this would be more efficient (both time and memory) than the sample way with its heavier algorithm behind it.
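To make the comparison concrete, here is a minimal sketch that puts the two spellings side by side. shuffled is the hypothetical helper from this post, not an existing function in the random module, and the deck data is made up. The sketch only shows that both spellings return a new list with the same elements and leave the input alone; it does not attempt to measure the time or memory difference.

```python
import random

def shuffled(iterable):
    """Hypothetical helper: return a new shuffled list, leaving the input untouched."""
    lst = list(iterable)
    random.shuffle(lst)
    return lst

deck = list(range(10))

via_sample = random.sample(deck, k=len(deck))  # the sample-based spelling
via_shuffled = shuffled(deck)                  # the proposed spelling

# Both results are permutations of the input, and the input keeps its order.
print(sorted(via_sample) == deck, sorted(via_shuffled) == deck, deck == list(range(10)))
```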

list.sort (third time you said str.sort, can’t ignore it anymore).

I find it clear, although I’d write it as sorted(shuffled(scores), key=scores.get). Very similar to something like sorted(sorted(names), key=len). (Maybe I’m biased, judging by Stack Overflow I’m more used to such multisorts than many, as I’m usually the one proposing them).
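As a rough illustration of that one-liner, assuming a made-up scores dict and the hypothetical shuffled helper discussed in this thread: the shuffle randomizes the order of the names, and the stable sort then ranks them by score while leaving tied names in whatever order the shuffle produced.

```python
import random

def shuffled(iterable):
    """Hypothetical helper, not part of the random module."""
    lst = list(iterable)
    random.shuffle(lst)
    return lst

# Made-up scores; "ann" and "cy" tie, as do "bo" and "di".
scores = {"ann": 3, "bo": 7, "cy": 3, "di": 7}

# Iterating a dict yields its keys, so shuffled(scores) is a shuffled list of names.
# sorted() is stable, so tied names keep the random order from the shuffle.
ranking = sorted(shuffled(scores), key=scores.get)
print(ranking)
```

The scores in the result are always in ascending order; only the order within each tie varies from run to run.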

Gouvernathor · September 4, 2022, 2:01pm

I’m not sure you understood my point on this. I’m not saying single-use functions are objectively bad, I said they were contradictory to my personal sense of readability, goodness and cleanness in code. And that as such, we should avoid the situation where single-use functions are the only alternative a dev has. That’s not the same.
If you want to use defined names instead of comments (I’m broadening your stance a bit), you should be allowed to do so, but I shouldn’t be compelled to.

I don’t see how having a list_then_shuffle_then_sort_by_second_element, alongside a filter_then_shuffle, alongside a list_then_map_using_a_filter_then_group_in_set, all of them single-use, is more readable than self-explanatory function call chains.
Do you have a reference for Python favoring the specific interpretation of the term “readability” that you’re using? And in fact, even in that interpretation of “readability”, how is sorted/random.shuffled not much more readable than list.sort/random.shuffle?

Because nobody would download a module containing only one function (or one subclass of random.Random), especially if it’s simple enough to be implemented using only existing functions of the random module.
Because using two random modules in parallel - the builtin one and the PyPI one - would be confusing, and using only the PyPI one would change code that doesn’t need to be changed.
Because the people who would have the most use for the new function are people who wouldn’t think of using random.sample as a shuffler, and I think these people would be even less likely to look for it on PyPI.
Because getting a shuffled copy of a sequence is an obvious feature that should not be missing from the random module, especially if it’s simple enough to be implemented using only existing functions of the random module.
Because deterministic pseudo-randomness code is both easy to implement and hard to check, which means the stdlib guarantee of correctness is a very important ingredient that’s missing from a PyPI version (or even a shared and copy/pasted snippet).
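As a sketch of the determinism mentioned in the last point, assuming the hypothetical shuffled helper from this thread plus an rng parameter that is purely an assumption of this example: two generators seeded with the same value should produce the same shuffle.

```python
import random

def shuffled(iterable, rng=random):
    """Hypothetical helper; the rng parameter is an assumption of this sketch."""
    lst = list(iterable)
    rng.shuffle(lst)  # works for the random module and for random.Random instances
    return lst

names = ["ann", "bo", "cy", "di", "ed"]

# Two generators seeded identically produce the same shuffle.
first = shuffled(names, rng=random.Random(2022))
second = shuffled(names, rng=random.Random(2022))
print(first == second)
```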

Thanks, if you hadn’t said that I would have written it a fourth time.

That’s not valid in 3.11+. Dicts cannot be passed where a sequence is expected, so you need the list(). It was missing from the first version of my example.

daniele · September 4, 2022, 2:03pm

What I find baffling in that one-liner, and what I guess Paul and others also find baffling, is not how it is coded but the logic it implements. I guess that what you are trying to do is work against the stable sort implemented by Python: elements that compare equal are preserved in the same order as in the source sequence. You seem not to want to preserve the order already present in the sequence. However, using pseudo-random sorting means that, ultimately, you do not care how tied elements are sorted, because a pseudo-random order for them is acceptable. The next logical step is to recognize that the order the elements already have in the sequence is just another instantiation of a random sequence, just one that appears to maybe have some structure. If I bumped into a line of code that randomizes the order of a sequence and then sorts it, I would immediately remove the randomization as useless.
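The stable-sort behaviour referred to here can be seen in a small sketch with made-up records: entries that compare equal under the key come out in the same relative order they went in.

```python
# Made-up (name, score) records, listed in arrival order.
records = [("amy", 3), ("bob", 1), ("cat", 3), ("dan", 1)]

# sorted() is stable: records with equal scores keep their input order.
by_score = sorted(records, key=lambda rec: rec[1])
print(by_score)
# [('bob', 1), ('dan', 1), ('amy', 3), ('cat', 3)]
```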

Gouvernathor · September 4, 2022, 2:10pm

No. It means that I actively want to lose the initial order in which they’re given…

…That’s the problem, it does have an initial structure (the insertion order in the dict, athlete names initially sorted alphabetically…), and I specifically want to erase it, that’s what the shuffle is there for. A truly random shuffling would work too in this specific example, yes. But that doesn’t change the issue here.

daniele · September 4, 2022, 2:11pm

Why sort something twice, if you can sort it once: sorted(names, key=lambda x: (len(x), x)) ?

Stefan2 · September 4, 2022, 2:13pm

You’re probably confusing that with using random.sample, which does want a sequence. But I’m using our hypothetical shuffled, which would take any iterable, just like sorted does.
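A minimal sketch of that difference, again assuming the hypothetical shuffled helper and made-up data: because the helper starts with list(iterable), it takes a dict, a generator, or a string, whereas random.sample raises TypeError when handed a dict.

```python
import random

def shuffled(iterable):
    """Hypothetical helper that accepts any iterable, like sorted() does."""
    lst = list(iterable)
    random.shuffle(lst)
    return lst

scores = {"ann": 3, "bo": 7, "cy": 3}

# A dict, a generator and a string are all fine, because list() consumes any iterable.
from_dict = shuffled(scores)
from_gen = shuffled(n * n for n in range(5))
from_str = shuffled("abc")

# random.sample, by contrast, rejects a dict because it is not a sequence.
try:
    random.sample(scores, k=len(scores))
    sample_accepts_dict = True
except TypeError:
    sample_accepts_dict = False
print(sample_accepts_dict)
```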

Stefan2 · September 4, 2022, 2:16pm

Because that double-sort is less code and because it’s often significantly faster (and takes less memory) than the more complicated single sort.
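For reference, a small sketch with made-up names showing that the two spellings give the same result. It makes no claim about speed or memory; that would need to be measured on realistic data, for example with timeit.

```python
names = ["cy", "bob", "al", "dee", "abe", "bo"]

# Two stable passes: alphabetical first, then by length.
two_pass = sorted(sorted(names), key=len)

# One pass with a composite (length, name) key.
one_pass = sorted(names, key=lambda x: (len(x), x))

print(two_pass == one_pass)
print(two_pass)
```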

Gouvernathor · September 4, 2022, 2:17pm

We’re veering a bit off subject, but the decision taken about the sequence-taking functions of random, all of them including choice and choices, is that they need actual sequences in order for the pseudo-randomization to be reliable and deterministic.

If you want them to accept dicts again, I think that’s a reasonable request given that dict order is now guaranteed to be stable insertion order, but it’s a completely separate issue that should be considered in another thread, because it’s not specific to this particular function.

pf_moore · September 4, 2022, 2:27pm

You are wilfully misrepresenting my point here, and doing so in a way which is frankly rather insulting. I’m done arguing with you, and I’d caution you to be careful of how you respond to people you disagree with, as you’re getting fairly close to violating the code of conduct at this point.

Stefan2 · September 4, 2022, 2:30pm

No. It means that I explicitly do care, that I want to remove any bias due to existing order.

For example, take a raffle at a fair where people guess the number of marbles in a jar. I write down their guesses and names in a list, and at the end of the day I give prizes to the three best guesses. I want to give people equal chances regardless of when during the day they stumbled onto it (not prioritize whoever happened to stumble onto it earlier).
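The raffle could be sketched roughly like this, with made-up names, guesses, and marble count. It uses only the existing random.shuffle on a copy, so nothing here depends on the proposed function.

```python
import random

actual = 412  # made-up marble count
# Made-up (name, guess) entries, in the order they were written down.
entries = [("ann", 400), ("bo", 424), ("cy", 410), ("di", 414), ("ed", 390), ("flo", 500)]

def error(entry):
    """Distance between a guess and the actual count."""
    return abs(entry[1] - actual)

# Shuffle a copy first so that ties are not decided by arrival order,
# then rely on the stable sort to rank by error.
pool = list(entries)
random.shuffle(pool)
winners = sorted(pool, key=error)[:3]
print(winners)
```

Here cy and di tie for the best guess and ann and bo tie for third place, so which of ann or bo wins the last prize depends on the shuffle rather than on who guessed first.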

Gouvernathor · September 4, 2022, 2:31pm

I’m not, I’m showing the consequences of what I understand of your stance. That’s what I described earlier as several single-use functions littering the namespace, instead of function call chains. And if that’s not what you argued was still better for reasons due to an interpretation of readability, then either I didn’t understand what you said or you didn’t explain it well enough.

By the way, if that was me willfully misrepresenting your point, then you did exactly that by repeating that I wanted to ban defining single-use functions, when I repeatedly argued for a live-and-let-live philosophy on that point. But I don’t think you did, because I keep a presumption of good faith on your end.

davidism · September 4, 2022, 2:56pm

This discussion has run its course. Please be considerate of maintainer time and perspective when opening a discussion in the Ideas category.