Add to random.shuffle, allowing shuffle of dict key/value pairs

brass75 · September 9, 2024, 7:04pm

I fixed Zorro… Didn’t bother to post it though…

>>> dict(sorted(zip(random.sample(list(alpha.keys()), len(alpha)), alpha.values()), key=lambda x: x[0]))
{'A': 'W', 'B': 'Y', 'C': 'B', 'D': 'H', 'E': 'C', 'F': 'V', 'G': 'D', 'H': 'Q', 'I': 'F', 'J': 'A', 'K': 'N', 'L': 'J', 'M': 'E', 'N': 'M', 'O': 'U', 'P': 'L', 'Q': 'S', 'R': 'P', 'S': 'G', 'T': 'O', 'U': 'X', 'V': 'R', 'W': 'I', 'X': 'Z', 'Y': 'T', 'Z': 'K'}

(I think I saw someone post this - or something close enough - which is why I didn’t bother posting it. Still a one liner.)
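Unrolled into steps, the one-liner above pairs a random permutation of the keys with the values in their original order, then sorts the pairs back into key order. The net effect is that each key receives a randomly chosen value. A minimal sketch, assuming `alpha` maps each uppercase letter to itself (the thread never shows how `alpha` was built):

```python
import random
import string

# Assumption: alpha maps each uppercase letter to itself (not shown in the thread).
alpha = {c: c for c in string.ascii_uppercase}

shuffled_keys = random.sample(list(alpha.keys()), len(alpha))  # keys in random order
pairs = zip(shuffled_keys, alpha.values())                     # re-pair with values in original order
cipher = dict(sorted(pairs, key=lambda kv: kv[0]))             # sort the pairs back into key order

print(cipher)
```

An equivalent and arguably clearer form shuffles the values instead of the keys: `dict(zip(alpha, random.sample(list(alpha.values()), len(alpha))))`.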

avi.gross · September 9, 2024, 7:05pm

Paul,

I hesitate to ask if you tested that code!

Well, since the function you use does not exist yet, maybe not.

But as I read it, you are scrambling the keys against random new values.

It might make more sense to shuffle the items as we have been doing.

jamestwebber · September 9, 2024, 7:06pm

That has been the goal for the entire thread. Also, you’re responding to Oscar, not Paul.

steven.rumbalski · September 9, 2024, 7:11pm

Because that’s how Guido wanted it. Here’s his justification on Python-Dev in 2003:

I’d like to explain once more why I’m so adamant that sort() shouldn’t
return ‘self’.

This comes from a coding style (popular in various other languages, I
believe especially Lisp revels in it) where a series of side effects
on a single object can be chained like this:
x.compress().chop(y).sort(z)
which would be the same as
x.compress()
x.chop(y)
x.sort(z)
I find the chaining form a threat to readability; it requires that the
reader must be intimately familiar with each of the methods. The
second form makes it clear that each of these calls acts on the same
object, and so even if you don’t know the class and its methods very
well, you can understand that the second and third call are applied to
x (and that all calls are made for their side-effects), and not to
something else.

I’d like to reserve chaining for operations that return new values,
like string processing operations:
y = x.rstrip("\n").split(":").lower()
There are a few standard library modules that encourage chaining of
side-effect calls (pstat comes to mind). There shouldn’t be any new
ones; pstat slipped through my filter when it was weak.
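The convention described in that quote is visible in the built-ins: methods that mutate return `None`, while functions and methods that build a new value return it and can be chained. Note that the string example in the quote would fail as written, because `split()` returns a list and lists have no `lower()` method. Lowering before splitting works. A small illustration:

```python
# In-place methods return None, so they cannot be chained.
nums = [3, 1, 2]
result = nums.sort()          # sorts nums in place
print(result)                 # None
print(nums)                   # [1, 2, 3]

# sorted() builds and returns a new list, leaving its argument untouched.
original = [3, 1, 2]
new = sorted(original)
print(original, new)          # [3, 1, 2] [1, 2, 3]

# String methods always return new strings, so they chain naturally.
x = "Alpha:Beta:Gamma\n"
y = x.rstrip("\n").lower().split(":")   # lower() must come before split()
print(y)                      # ['alpha', 'beta', 'gamma']
```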

oscarbenjamin · September 9, 2024, 7:21pm

I know; I just don’t agree. Sometimes it is best just to have the right function with the right name and the right signature even if it is just an alias for some special case of another function. I don’t think that sample(stuff, k=len(stuff)) is at all clear about intent compared to shuffled(stuff). Also shuffle is just awkward: it seems like the right tool but has been designed in a way that isn’t usually what you actually want in practice.

In any case I answered with shuffled because all answers to Avi’s question seemed to be missing the real question: it is not “why does the in-place shuffle function not return anything?” but rather “why do we have an in-place function when a function that returns a new result would usually be better?”.

brass75 · September 9, 2024, 7:35pm

I would support the addition of random.shuffled that returns a new object without modifying the existing one (similar to sorted and sort.) That would be useful as there are many times where the original order should be preserved.
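Today the non-mutating version is spelled `random.sample(stuff, k=len(stuff))`. A `shuffled()` in the spirit of `sorted()` could be a thin wrapper around that idiom. The name `shuffled` below is hypothetical; no such function exists in the standard library:

```python
import random

def shuffled(items, rng=random):
    """Hypothetical helper (not in the standard library): return a new list
    holding the elements of `items` in random order, leaving `items` untouched.

    `rng` may be the random module itself or a random.Random instance.
    """
    pool = list(items)                     # copy, so any iterable works as input
    return rng.sample(pool, k=len(pool))   # the sample(stuff, k=len(stuff)) idiom

deck = ["A", "K", "Q", "J"]
new_order = shuffled(deck)
print(deck)        # ['A', 'K', 'Q', 'J']: the original is unchanged
print(new_order)   # the same cards in random order
```

Passing a seeded `random.Random` instance as `rng` makes the result repeatable.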

avi.gross · September 9, 2024, 8:07pm

James,

Sorry if I had the wrong attribution. I do not get the same impression you did.

There are several scenarios here for the use of a dictionary where you want the items shuffled. For many uses you want key/value pairs kept intact. An example might be pairs like {"Queen_of_Spades" : 10} that provide a point value for a card, or {"Queen" : "Hearts"} that identify what kind of Queen. These would necessarily have to be shuffled as whole items.

What you are describing sounds more like a set of fixed slots, as in {"Card 1" : 5, "Card 2" : 10, "Card 3" : 2}, which may represent a deck of cards where each key names a position ("this is Card 2") and the values are what you want shuffled.

There are likely many scenarios you can come up with and each may have their own problem to solve.

I view a dictionary as isomorphic to a list of tuples, or more specifically a list of lists of size two. Sorting or shuffling such a list would normally treat the inner lists as the atoms to move around. Or, using a more advanced sort or shuffle, you could order them based on the first item in each, on the second, or on any other function of them.
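Treating the dictionary as a list of pairs, the two readings discussed in this thread differ only in what gets shuffled. A minimal sketch using only existing `random` functions (the card names and point values are made up for illustration):

```python
import random

points = {"Queen_of_Spades": 10, "Ace_of_Hearts": 11, "Two_of_Clubs": 2}

# Reading 1: keep each key/value pair intact and only change the dict's order.
items = list(points.items())
random.shuffle(items)                  # shuffle the (key, value) tuples as atoms
reordered = dict(items)

# Reading 2: keep the keys in place and deal the values out to them at random.
values = list(points.values())
random.shuffle(values)
reassigned = dict(zip(points, values))

print(reordered)
print(reassigned)
```

`reordered == points` is always `True`, because dict equality ignores order. Only the iteration order changed.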

But with such lists, you can do more, such as add additional things. So perhaps a closer analogue would be comparing a dict to some variety of dataframe in which additional columns cannot be added. When sorting a dataframe, though, you can indeed sort in various ways.

A serious question is what order is meant to be used for. A list has a .pop() method, while I note a dict has a .popitem() method. But can a tuple have anything like a pop when it is immutable? Yet it has an order when fed into a loop where removing is not needed.
Dictionaries, being mutable, can both be iterated over in some order and have an item removed in a particular order. I am not sure if sets are ordered.
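For reference, dicts preserve insertion order (guaranteed since Python 3.7), and `popitem()` removes the most recently inserted pair, much as `list.pop()` with no argument removes the last element by position. Sets make no ordering guarantee at all. A quick check:

```python
d = {"a": 1, "b": 2, "c": 3}
order_before = list(d)     # ['a', 'b', 'c']: insertion order
last = d.popitem()         # ('c', 3): the most recently inserted pair goes first
d["z"] = 26
order_after = list(d)      # ['a', 'b', 'z']: new keys go to the end

stack = [1, 2, 3]
top = stack.pop()          # 3: list.pop() removes the last element by position
print(order_before, last, order_after, top)
```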

jamestwebber · September 9, 2024, 8:13pm

The OP described using this for a cipher, i.e., swapping the key-value pairs around. They also posted the example where random.shuffle happens to work on a dictionary when the keys are sequential integers from 0. In their example you can see that doing this on a three-element dictionary has keys and values swapped around. So yeah, that was the initial idea.
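The accident the OP ran into can be reproduced directly. `random.shuffle` only uses `len(x)` and `x[i]` indexing, so a dict whose keys are exactly `0 .. n-1` happens to satisfy it: the values get permuted while the keys stay put. This relies on how the current implementation indexes its argument, not on documented behavior:

```python
import random

d = {0: "a", 1: "b", 2: "c", 3: "d"}
random.Random(42).shuffle(d)   # works only because the keys are 0..len(d)-1

print(d)                       # keys still 0, 1, 2, 3 in order; values permuted
```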

tjreedy · September 9, 2024, 11:47pm

If one insists on one line, I find the following easier to read.

random.shuffle(powerful); square = powerful.pop()

avi.gross · September 9, 2024, 11:52pm

James,

I went back and re-read the plain text message and this is what I saw:

The idea is to modify random.shuffle to shuffle dictionary’s key/value pairs even if the keys don’t happen to be a range of integers.

My impression at that time was that key/value pairs would remain pairs, but that asking to display the dictionary would show those pairs in some randomized order. Further, asking to view all keys and/or values would show them in the new order, removing a value would maintain the order, and popping an item, ditto.

Perhaps I misread it, but I have trouble seeing it as a reasonable request to make random.shuffle also shuffle dictionaries analogously to how it shuffles lists. In my mind this would not be analogous at all, since rearranging a list of inner lists normally preserves the inner lists, just in a new order.

Also, I was not paying close attention to the OP's request at first; I entered the conversation a bit later.

I am looking at the code in the rest of the message now. Indeed, for integer keys it shuffles the values while leaving the keys in the same order. I would not consider this expected behavior.

So, I looked for the source code for the function:

>>> import inspect
>>> import random
>>> print(inspect.getsource(random.shuffle))
    def shuffle(self, x):
        """Shuffle list x in place, and return None."""

        randbelow = self._randbelow
        for i in reversed(range(1, len(x))):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = randbelow(i + 1)
            x[i], x[j] = x[j], x[i]

The docstring suggests it expects a list. I do not see it converting a dict. It just happens to be able to use the hooks inside a dict, so an index like x[2] happens to mean "give me the value stored under the key 2". That is almost a coincidence, and it does indeed scramble the values while leaving the keys in place.

But since the code uses i and j as integer indexes, it cannot be expected to work for dictionaries with almost any keys other than the kind in the sample that happened to work. I also maintain that this is not the result I would want: I would expect items to be swapped, not just values.
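With any other keys, the loop in the source above fails on its first `x[j]` lookup, which is where the mystifying error comes from. A quick demonstration:

```python
import random

letters = {"a": 1, "b": 2, "c": 3}
caught = None
try:
    random.shuffle(letters)
except KeyError as exc:
    caught = exc.args[0]            # the integer index shuffle() tried to look up
print("KeyError:", caught)          # an int such as 0, 1 or 2, not a letter
```

The dict is left unchanged, because the lookup fails before any assignment happens.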

Looking further at the code he uses, yes, this is indeed the result he wants. It is just as easily produced with other tools and then turned into a dictionary using the methods suggested here, especially the one I thought was wrong when, it seems, I was looking at the wrong problem.

Making his simple non-Zorro alphabetic cipher is fairly straightforward with something like this, which builds the dict at the end:

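import string
import random

alf1 = list(string.ascii_lowercase)   # plaintext alphabet a..z, in order
alf2 = list(string.ascii_lowercase)   # a second copy that will be scrambled
random.shuffle(alf2)                  # shuffle the copy in place

coder = dict(zip(alf1, alf2))         # pair each letter with its substitute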

The contents of coder are:

>>> coder
{'a': 'd', 'b': 'k', 'c': 'h', 'd': 'g', 'e': 'b', 'f': 'm', 'g': 'u', 'h': 'i', 'i': 'l', 'j': 't', 'k': 'e', 'l': 'f', 'm': 'n', 'n': 'a', 'o': 'y', 'p': 'p', 'q': 's', 'r': 'z', 's': 'q', 't': 'x', 'u': 'w', 'v': 'o', 'w': 'c', 'x': 'r', 'y': 'v', 'z': 'j'}

The keys stay in alphabetical order, each paired with a scrambled value.
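Once `coder` exists, `str.maketrans` and `str.translate` apply it to text, and inverting the dict gives the decoder. A sketch that rebuilds `coder` the same way so it runs on its own:

```python
import random
import string

alf1 = list(string.ascii_lowercase)
alf2 = alf1[:]                     # copy before shuffling
random.shuffle(alf2)
coder = dict(zip(alf1, alf2))

encode_table = str.maketrans(coder)                              # letter -> substitute
decode_table = str.maketrans({v: k for k, v in coder.items()})   # substitute -> letter

secret = "attack at dawn".translate(encode_table)   # characters not in the table (spaces) pass through
plain = secret.translate(decode_table)
print(secret)
print(plain)                       # attack at dawn
```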

If this was the expected result, and you wanted to extend random.shuffle() so it checks the class of what it is working on, it could likely rebuild the dictionary's contents using a variant of the above (or something much better), with my alf1 as the keys and alf2 as the original values.

But this entire discussion now seems to rest on a coincidence that perhaps made the OP ask why that fluke did not work everywhere. The real problem is that it worked at all: the code arguably should refuse dictionaries outright and raise some kind of error.

Perhaps a better request would be to extend a class like dict, or some subclass, to support one or more methods that could be called on a dict. One would perhaps scramble just the values. Another might scramble intact key/value pairs.
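One way to read that suggestion is a small `dict` subclass with two explicit methods. The class and method names below are hypothetical, not part of the standard library. Both methods mutate in place and return `None`, following the usual stdlib convention:

```python
import random

class ShuffleDict(dict):
    """Hypothetical dict subclass; not part of the standard library."""

    def shuffle_values(self, rng=random):
        """Keep keys (and their order) fixed; redistribute the values at random."""
        values = list(self.values())
        rng.shuffle(values)
        for key, value in zip(list(self), values):
            self[key] = value          # reassigning an existing key keeps its position

    def shuffle_items(self, rng=random):
        """Keep each key/value pair intact; randomize the iteration order."""
        items = list(self.items())
        rng.shuffle(items)
        self.clear()
        self.update(items)             # reinsertion order defines the new order

d = ShuffleDict(a=1, b=2, c=3)
d.shuffle_items(random.Random(0))
after_items = dict(d)
keys_before = list(d)
d.shuffle_values(random.Random(0))
print(d)
```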

Objects are mainly meant to encapsulate the operations that can be done to them, rather than having external functions handle almost any object handed to them. There are exceptions, and some of that handling is done delicately by defining dunder methods that help the other function.

I apologize, again, if my misunderstanding led us down other paths. I could delete or edit messages, if that was appropriate, but then some replies may not seem quite right.

I also note a subtlety. Objects often have a choice on how and when they actually do something. As an example, if I add things to some implementation of a dict, could it choose to buffer the additions until it had say 5 or 10 by perhaps having a secondary storage area such as a list or dict and only periodically merge the results in? As long as it handled the searches and other needs, would it matter?

I can envision many other such things that may be done for speed or other considerations that are all okay as long as the external interfaces look the same. As an example, the old implementation of an ordered dictionary relied on a secondary way to index the keys and you can imagine the current implementation of dict being lazy about re-ordering in a way similar to this. Some implementations may never have to re-order unless asked to display the results.

avi.gross · September 10, 2024, 1:48am

Terry Jan Reedy:

avi.gross:
square = random.shuffle(powerful, ReturnOrig=True).pop()
If one insists on one line, I find the following easier to read.
random.shuffle(powerful); square = powerful.pop()

Terry,

I agree that many things can be written briefly, but my goal was far from a one-liner. I was showing an example of what I call pipelining, which brings other considerations.

I do lots of programming in languages like R, which have much more interesting ways of pipelining something step by step. What is important is that each step takes in some form of data and emits some other form of data, so that code using R's newer pipe operator |> can easily model a logical flow. As an example, I often write code using the dplyr package that looks like this:

mydata |>
    select(columns) |>
    filter(conditions) |>
    mutate(new=f(old), exist=g(exist)) |>
    group_by(criteria) |>
    arrange(order) |>
    summarize(details) |>
    t() |>
    print() |>
    ggplot(args) +
      geom_line(args) +
      ... +
      labs(...)

The details are not important but the workflow is. The above might take a dataframe with rows and columns and pass it as the first argument of a function that selects which columns to keep. The now narrower dataframe becomes the first argument of a function that applies one or more arbitrary conditions and keeps only the rows that meet them. One condition might be that some column on that row is greater than 65, and another that some other column does not contain NA (not available). The conditions can be quite complex, and what comes out is potentially a shorter dataframe.

The next step can mutate the data, such as making a new column from functions applied to any group of existing columns, for example adding together three columns of test scores or making something upper case. It can also change existing columns or drop a column, so what comes out may have more columns. For some operations, we need to group the data, such as when we have a categorical variable or two. We may then want to sort each group and make some kind of report with one row per group. Then I may want to transpose the data and print a copy on the screen while also passing the data on to a plotting utility. That utility uses an older sort-of pipelining method: a plus sign keeps adding variables and layers to a data structure, and at the end the result is implicitly printed, which in effect makes a graph pop up.

I have made way longer pipelines like this including adding in database joins and it is easy to reason and program this way and stop monkeying with making lots of temporary variables.

Back in Python, chaining calls on objects often looks like this:

result = object.f(...).g(...).h(...)

I sometimes see it spread over multiple lines using various tricks. The underlying object can keep changing type, as long as the next call is supported as a method on the current kind of object. You can start with something like an input() call that returns a string, follow that with a request to change it to lower case, then break it into words, then filter or count the words, and eventually take the resulting list of tuples and make a dictionary from it and, if it were a built-in, even randomize the order!
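In Python that kind of pipeline mostly works when every step returns a new value. A sketch of the steps described, using a fixed string in place of `input()` so it runs unattended. The final randomization uses the non-mutating `random.sample`, because `random.shuffle` returns `None` and would break the chain:

```python
import random
from collections import Counter

text = "The cat and the hat and THE bat"   # stand-in for input()

counts = dict(
    Counter(text.lower().split())          # lower-case, split into words, count them
    .most_common()                         # list of (word, count) tuples, most common first
)
# Randomize the order without mutating anything: sample() returns a new list of pairs.
scrambled = dict(random.sample(list(counts.items()), k=len(counts)))

print(counts)     # {'the': 3, 'and': 2, 'cat': 1, 'hat': 1, 'bat': 1}
print(scrambled)  # the same pairs in random order
```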

As with much of what I write, my examples are illustrations and not always realistic. But my original point was to show code that could be pipelined this way. The pipe breaks anywhere along the line if the function invoked changes an object but returns None rather than the object.

That does not make such a function bad, at least for other uses. It just does not meet my criteria and I might search for a different one or make my own but only if I feel it is worth it. Often, it isn’t and I can make a few smaller pipelines or none at all.

And, FYI, I hesitate to accept a line of code with a semicolon in it as a one-liner!

avi.gross · September 11, 2024, 2:18am

I have a cautious observation.

Has anyone noticed that @xxxxme is brand new and has never replied to anything many of us have written here?

Instead, quite a few of us have discussed this, and I have no problem with that. But looking at the profile, they have not posted or done anything else.

They could be a newbie. However, I have seen people post in other forums with questions and not follow up. From what I hear, some of those have been done to set up a discussion and sit back and watch.

In this case, I note that when I looked at the source code, I saw what looked like a setup. The only case where calling random.shuffle() with a dict argument reliably avoids an error is a dictionary whose keys are exactly the integers 0 through n-1. Almost any other case fails, justifiably. Even what it does with such keys is, in my opinion, a flaw. If the code simply checked whether the argument was suitable, meaning a list or some other mutable object with the right dunder methods for proper integer indexing, it would always raise an error when called with a dict.
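A check of the kind described could look like the wrapper below. `strict_shuffle` is a hypothetical name, not a stdlib function. It rejects any `Mapping` with a descriptive `TypeError` before delegating to `random.shuffle`, and it also rejects anything that is not a mutable sequence:

```python
import random
from collections.abc import Mapping, MutableSequence

def strict_shuffle(x, rng=random):
    """Hypothetical wrapper: shuffle a mutable sequence in place; reject mappings."""
    if isinstance(x, Mapping):
        raise TypeError(
            f"shuffle() needs a mutable sequence such as a list, not a {type(x).__name__}; "
            "shuffle list(d.items()) or list(d.values()) instead"
        )
    if not isinstance(x, MutableSequence):
        raise TypeError(f"shuffle() needs a mutable sequence, not {type(x).__name__}")
    rng.shuffle(x)

cards = [1, 2, 3, 4]
strict_shuffle(cards)                      # lists are accepted
message = ""
try:
    strict_shuffle({0: "a", 1: "b"})       # even the "lucky" integer-key dict is rejected
except TypeError as exc:
    message = str(exc)
print(message)
```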

Most of the discussion, including some of my messages, focused more on what a proposal to also support dictionaries might mean. That is fine, but in a sense there was a false premise here: that it already partially worked, so why does it fail elsewhere? It should not work at all now. And if it were made to work, it would likely be done with additional code like what we have discussed, which would not rely on numeric indexing.

I invite the poster to at least follow up here and let us know if any of this satisfied them. Generally, that would be considered a polite way to work with others here.

This topic did make me think, of all things, of JavaScript. When I last looked at it, they had an interesting concept where a main data structure was a sort of hash that in some ways resembled the Python dictionary but with some twists. If you used numeric keys like 1 and 666, it might treat the object more like an array or list, indexable so you could say object[1] or similar notation. But since it was not really an array, interesting things could happen, such as being able to leave missing indices or having it renumber things if stuff was changed, while it had to ignore any entries that were not numeric.

The reason it comes to mind is that the example we were shown that “worked” looked exactly like the JavaScript hash object used as an array, and it happened to be valid, if not intended, input to the code. But it would break if any of the keys were not numeric, or in many other cases such as skipped indices.

So, is the original request really a good request if based on some odd premises? Is it worth debating what the proper meaning of shuffling a dictionary might be? And, as many pointed out, perhaps many solutions already exist with most not needing or wanting to replace the contents in-place.

But as noted, why debate when the OP is not showing interest?

jamestwebber · September 11, 2024, 2:45am

People often bail on a discussion when people don’t show enthusiasm for their idea. Especially when someone keeps posting whole novels as responses.

avi.gross · September 11, 2024, 3:34am

Just FYI, James, I can be short.

I will take a hint.

xxxxme · September 13, 2024, 10:18pm

Thanks for your replies. I am probably best classed as a Python (and general coding) newbie. As you probably gathered, I was hoping random.Random('key').shuffle() would provide a repeatable way to shuffle values to different keys in a dictionary. Thanks for your suggested solutions.
It might be useful for people in my case if (as suggested in someone’s post), random.shuffle would always throw a descriptive error if called with a dictionary instead of working in rare cases and throwing a mystifying one in others.
I haven’t checked back in a while because I assumed the topic was dead after alternate suggestions started to be posted and I found one which worked. I’m going to stop checking back now as I have my answers. Again, thanks for your replies.
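For the record, the repeatable version the OP was after can be built from a seeded `random.Random` instance plus a shuffled list of values. A sketch, assuming a string passphrase as the seed. String seeds are accepted and give the same sequence on every run of the same Python version, though the `random` docs do not promise that `shuffle()` results stay identical across versions. `make_cipher` is a hypothetical name:

```python
import random
import string

def make_cipher(passphrase):
    """Hypothetical helper: map each lowercase letter to a substitute, repeatably per passphrase."""
    rng = random.Random(passphrase)          # a str seed is deterministic (not hash-randomized)
    letters = list(string.ascii_lowercase)
    substitutes = letters[:]                 # copy; the original order stays as the keys
    rng.shuffle(substitutes)
    return dict(zip(letters, substitutes))

c1 = make_cipher("key")
c2 = make_cipher("key")
print(c1 == c2)   # True: same passphrase, same mapping
```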