At the moment, random.sample from a dict gives a reasonable error message. When running random.choice on a dict, at best you get a confusing KeyError, and at worst you get silently wrong results. Examples:
>>> random.choice({0: 'a'})
'a'
>>> random.choice({1: 'a'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.10/random.py", line 378, in choice
return seq[self._randbelow(len(seq))]
KeyError: 0
>>> random.choice({'foo': 'bar', **{i: str(10*i) for i in range(10)}})
'90'
>>> random.choice({'foo': 'bar', **{i: str(10*i) for i in range(10)}})
'10'
>>> random.choice({'foo': 'bar', **{i: str(10*i) for i in range(10)}})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.10/random.py", line 378, in choice
return seq[self._randbelow(len(seq))]
KeyError: 10
>>> random.choice({'foo': 'bar', **{i: str(10*i) for i in range(10)}})
'50'
As a first simple suggestion, I propose adding the error handling from random.sample to random.choice. As a reminder on my system random.sampleâs error handling looks like this:
>>> random.sample({}, 1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/random.py", line 466, in sample
raise TypeError("Population must be a sequence. For dicts or sets, use sorted(d).")
TypeError: Population must be a sequence. For dicts or sets, use sorted(d).
I have a proof-of-concept of this change.
Second, converting your dictionary to a list is one way to get a random choice. But it costs O(n) time and space. I have another proof-of-concept to get O(1) time and space random choice for dicts.
My proof-of-concept works as-is with todayâs CPython dicts, but, alas, exploits an undocumented feature in their stable C-API.
There have bne previous suggestions to make random.choice work on dicts, but they were either to do an implicit conversion to a list or to use reservoir sampling. Both implicit conversion to a list and (naive) reservoir sampling take O(n) time.
So I agree with the rejection of these issues. Both because of the reasons mentioned in their tickets, but also because you can easily implement your own versions of these functions that do either implicit conversion or reservoir sampling in a library. No change to the standard library or dict implementation is necessary.
My suggestion is to enable random choice from a dict in O(1) time. Without any conversions, neither implicit nor explicit.
(To be completely clear, my proof-of-concept needs some polish to deal properly with very sparse dicts, ie dicts that used to have lots and lots of entries but recently almost all of them got popped.)