Add to random.shuffle, allowing shuffle of dict key/value pairs

random.shuffle currently does not work on dictionaries (unless the keys happen to be integers in a range starting at 0):

>>> import random
>>> a = {0: 'dead parrot', 1: 'cat named Eric', 2: 'inverted piglet'}
>>> random.shuffle(a)
>>> print(a)
{0: 'cat named Eric', 1: 'dead parrot', 2: 'inverted piglet'}
>>> b = {'parrot': 'dead', 'cat': 'named Eric', 'piglet': 'inverted'}
>>> random.shuffle(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.12/random.py", line 357, in shuffle
    x[i], x[j] = x[j], x[i]
                 ~^^^
KeyError: 2

The idea is to modify random.shuffle to shuffle dictionary’s key/value pairs even if the keys don’t happen to be a range of integers.
An example program taking advantage of this:

#!/usr/bin/env python

# Read a letter-substitution cipher using your secret code!

alpha = {'A': 'A', 'B': 'B', 'C': 'C', 'D': 'D', 'E': 'E', 'F': 'F', 'G': 'G', 'H': 'H', 'I': 'I', 'J': 'J', 'K': 'K', 'L': 'L', 'M': 'M', 'N': 'N', 'O': 'O', 'P': 'P', 'Q': 'Q', 'R': 'R', 'S': 'S', 'T': 'T', 'U': 'U', 'V': 'V', 'W': 'W', 'X': 'X', 'Y': 'Y', 'Z': 'Z'}

enciphered = input('Enter the secret message to decode: ')
seed = input('Enter your secret code: ')
print('Secret decoder ring activated!')

random.Random(seed).shuffle(alpha)
plain = ''
for letter in enciphered:
   plain += alpha[letter]
print('It says:', plain)

Edit: Fixed a typo in the code.

random.shuffle is documented to work on sequences. I think the fact that it works on dictionaries at all is just a quirk of the implementation[1], not intended behavior.


  1. specifically: it’s swapping pairs of keys indices based on the length of the input, so a dictionary with sequential integer keys happens to work ↩︎

3 Likes

The purpose of random.shuffle is to change the order of a list, in place.
What would that mean for a dict?

1 Like

Also, the desired behavior (swapping around the keys and values of a dictionary) can be implemented with random.sample

shuffled_dict = dict(zip(random.sample(d.keys(), k=len(d)), d.values()))
3 Likes

The ' at the end should be removed.

I was initially surprised that dict a worked, but LOL, it simulates a sequence of length 3 by having keys in range(3). The ability of dicts to simulate a sequence is used in practice to implement sparse arrays, where only a few indexes in a large range have non-default values.

1 Like

In my mind, it should treat a dict as a list that has much more freedom as to what can be an index and keep the keys in place (or not, it doesn’t matter) but shuffle the values, which is what happens with a list.

I would sooner expect random.shuffle to just change the iteration order of a dict than actually break the existing connections between the keys and their values. It might be worth having a dedicated random.shuffle_mapping function that does what you want, but I don’t think overloading random.shuffle is the right thing to do.

7 Likes

You can get the keys and values (or items) of the dictionary, shuffle, and then construct a dictionary from that.

3 Likes

As @Rosuav points out, you can write a quick script that does what you want by simply converting the dictionary to the newish dict_items iterable and converting that to a temporary list as shown below. You then can ask random.shuffle to change the temporary in place and finally point your dictionary variable back to making a new dictionary from that list of tuples that has been scrambled. I believe all recent versions of Python now maintain the dictionary order to be the same as the order you create items in so the shuffling should persist.

Here is the code that can also be used to make a function if you prefer.

# load any modules needed
import random

# For demo, make a demo list showing an order.
mydict = {"first" : 1, "second" : 11, "third" : 111}

# Extract the dictionary items in the
# same order to make a list of tuples of
# (key, value)
temp = list(mydict.items())

# shuffle the temporary list in place
random.shuffle(temp)
# use the temporary list to make a new dictionary
mydict = dict(temp)

Running the code on the demo shows a new order:

>>> import random
>>> mydict = {"first" : 1, "second" : 11, "third" : 111}
>>> 
>>> temp = list(mydict.items())
>>> 
>>> random.shuffle(temp)
>>> 
>>> mydict = dict(temp)
>>> 
>>> print(mydict)
{'third': 111, 'first': 1, 'second': 11}

Just to verify, running it again gets another order: {'first': 1, 'third': 111, 'second': 11} so it is working.

Here is the function version where you copy the result into whatever variable you want, perhaps into the same one for your needs:

def shuffle_dict(mydict):
  """ Accept a dictionary and return a sorted version"""
  import random
  temp = list(mydict.items())
  random.shuffle(temp)
  return(dict(temp))

But there are problems here as in it depends on how python deals with order. I did a test and the dictionary it produces is not in the order I gave it:

>>> import pprint
>>> 
>>> mydict = {
...   "first" : 1, 
...   "second" : 11, 
...   "third" : 111,
...   "forth": 1111,
...   "I take the fifth": 11111}

It seems to be alphabetical order with upper case first!

This is confusing as I am using python version 3.12.4 from June of this year. I thought this had already changed long ago.

So, perhaps if the order matters, you could consider using collections.OrderedDict()

That’s because pprint sorts dictionary keys by default. You can pass sort_dicts=False or use pprint.pp instead of pprint.pprint to preserve the default order.

1 Like

Thank you @bschubert for that clarification. Had I just shown the values without trying to format, I would have been happier. The error was introduced by me and hence the results were the same no matter how it was scrambled!

So, what I wrote as code should work as an example of, easily enough, getting the functionality without asking for a change from the folks who maintain and improve python.

Of course, adding something new instead of modifying random.shuffle() may be an easier way to go and, if needed, you can later create a random.shuffle_many_things()wrapper that tests for the class or some attributes of an object and dispatches the right function to do what you want.

But I hate to ask my next question. Side effects can be irksome in code.

random.shuffle() is designed to shuffle something in place. That may be a good thing for some purposes, but many people (and many languages) suggest doing as much as possible using immutable objects, even if it costs more in terms of time or memory. Python does have frozen versions of some data structures, including tuples or a frozen dictionary and these generally can not be shuffled in place nor should be. You would need to make a new version in the new order.

And what about other data structures? Is it meaningful to shuffle a set. As I inadvertently just showed, the order of presentation may not match the internal order if you do not tell pprint to use the original order.

And is it even meaningful to change the order of a generator that generates one item at a time and may not even be called to completion? I do not see any trivial way to do it other than generating all results (good luck with an infinite generator producing all primes) and then shuffling the list and using it to prime a new generator that is not really generating in the same way.

There has to be a limit on how many different objects you want a function to handle. At the moment, the limit does not include dictionaries for this function and probably quite a few other constructs too. How does one shuffle a matrix or higher dimensional array? We have suggested work-arounds albeit not exactly in-place.

I do note that one variation on dictionaries that I mentioned as an OrderedDict could, in principle allow changes in place. If an implementation of that dictionary included both an embedded dictionary plus an embedded list of keys in the order that various methods should process requests and that handles adding and deleting in rational ways, then that dictionary can be trivially randomized if it supplies a method. As an example, you could ask for the current keys of the dictionary, keep it locked, scramble the list and ask that it replace the old list after checking you did not leave out any or duplicate them and so on and then unlock. Or, as is done in some languages, you merely supply a list or vector of the numbers from 1:N scrambled and the method reorders accordingly from within the object.

So, in a mutable dictionary, in principle, you could extract all current items and delete them all at once in the dictionary then add them back and change it in-place.

But this does not strike me as a common need as other existing methods can get you a result that is similar enough while using a standard dict object.

This is not to disparage the original concern and idea. It is an academic discussion and of course others make the decisions on what to prioritize or even consider.

Just thinking of what shuffle should do to dicts, I think that if the dict is ordered then it should shuffle the key order, but keep the mapping. If the dict is not ordered then it should give an error.

1 Like

Regular dictionaries retain their insertion order. collections.OrderedDict have an order to them that can be manipulated (notably, you can move any key to the end). You could, for example, Fisher-Yates an OrderedDict; but I don’t really see the point, when you could simply take the .items(), shuffle that list, and then reconstruct the dictionary.

1 Like

Irrelevant if you’re shuffling a brand new object.

Chris,

There is an age-old debate with the details varying depending on paradigms in the language.

There are trade-offs between programmer convenience and various ideas about efficiency. Garbage collection makes people stop worrying about having all kinds of semi-duplicates around and the use of objects that hide the implementation details makes it better to use the objects based on their external interface. But I do not want to bring up so many of those ideas except to say that they can and do matter mainly when the objects are huge or the code is written in ways that do things really inefficiently like adding repeatedly to the end of a list in a language where that is expensive, rather than to the front and then reversing it.

In the case we are discussing, a dictionary can take up quite a bit of memory so it can be hashed efficiently. The values can be complex and may require a deep copy if the new dictionary gets a copy instead of a reference. So, if like some languages using immutable objects heavily, adding a million items one at a time to a dictionary can result in creating a million new dictionaries with each a tad larger and then reclaiming lots of memory.

So, I think python has room for people to make alternate implementations in their own modules. But, of course, that means programmers should choose what works and then not call methods that do not apply to their scenario – such as a random.shuffle.

@Rosuav may want to clarify the above.

In languages using immutable objects, I believe that there is an exception at object creation time in that initializing built-in objects often is done in a compact way without multiple versions being copied as you go.

But are you suggesting random.shuffle falls into this category by being called after an object has been created? I am not clear on how the function operates by changing the object, such as a list, internally. But like you, I am not convinced that an internal change is always the best way to go.

Just for a moment, consider a need to sort the contents of something like a list or dictionary. There are literally hundreds of sort algorithms out there and some, like a merge-sort work by taking subsets recursively. Many of these algorithms are made quite efficient if operating on a particular data structure.

But if you want to sort the contents of some other structure and the only hooks are to extract an arbitrary member or add a member to the end, you may be able to do it but might well do it much better as you described by extracting the contents, sorting them the usual external way, and then rebuilding it.

With an immutable object, that may be the main or only way to do it. With mutable objects, there may be a choice.

In playing around with the code, I have a dumb question. Why does random.shuffle normally return None when returning the (changed) object itself might be useful?

That is a very common question which has been answered multiple times. Basically, though it’s simply because that is the convention that has been used in Python for many years (the reasons it is this way are valid, but at this point “because that’s the convention” is probably at least as important if not more).

There’s probably a bunch of background research it’s worth doing to get a better feel for the design principles here, if you’re not aware of this convention. (Sorry, I don’t have time right now to find some useful links for you, but a search of the various Python forums and mailing lists should get some).

I would accept that answer, to a point! LOL!

But this reminds me a bit about some older languages I have used, like PASCAL, where there was a difference between a function and procedure as the latter is all side-effect and returns nothing.

I can think of other languages like R that have a concept of returning a result invisibly from a function so that the REPL does not print the value but yet, if you assign the return value to a variable, it is there.

As far as I know, that is not a concept in Python so the choice of returning a solid Null is understandable. But it does break any chaining.

Has anyone created a workaround as it seems simple enough. I mean a function you can call as a wrapper and give it an object to work on and another function to apply to it and then simply return the object itself?

If the object is indeed changed in place, that might work. Or, if I misunderstand what it means to change in place, maybe not!

It would be easy enough to test but I just moved off topic and will drop it.

I think this will get you what you want:

>>> alpha = {'A': 'A', 'B': 'B', 'C': 'C', 'D': 'D', 'E': 'E', 'F': 'F', 'G': 'G', 'H': 'H', 'I': 'I', 'J': 'J', 'K': 'K', 'L': 'L', 'M': 'M', 'N': 'N', 'O': 'O', 'P': 'P', 'Q': 'Q', 'R': 'R', 'S': 'S', 'T': 'T', 'U': 'U', 'V': 'V', 'W': 'W', 'X': 'X', 'Y': 'Y', 'Z': 'Z'}
>>> {k: v for k in random.sample(list(alpha.keys()), len(alpha)) for v in alpha.values()}
{'G': 'Z', 'F': 'Z', 'L': 'Z', 'J': 'Z', 'T': 'Z', 'N': 'Z', 'H': 'Z', 'P': 'Z', 'I': 'Z', 'S': 'Z', 'M': 'Z', 'K': 'Z', 'U': 'Z', 'E': 'Z', 'B': 'Z', 'X': 'Z', 'O': 'Z', 'V': 'Z', 'C': 'Z', 'W': 'Z', 'Z': 'Z', 'Y': 'Z', 'R': 'Z', 'A': 'Z', 'D': 'Z', 'Q': 'Z'}

Since this can be done in a 1 line comprehension I don’t think there’s really any need to change the functionality in random (either by adding a new function or modifying the existing one) for a use case that is probably not that common - breaking the link between key and value in a dictionary.

Prevention of bugs from users not learning (yet) or forgetting that the function is a procedure, a function with only side-effects, that creates no new objects. As it is, beginners often have trouble with aliases that they create themselves. Common question given someting like the following:

a = [1,2]
b = a
b[0] = 3

Why has a changed? Or even “I discovered a bug!”

4 Likes