Add a dict.sort() method

You are right. We may be implement inplace sort with some sort of lock like list.sort().

But as far as I read the list.sort() implementation, I don’t want to implement&maintain it for dict.
Since internal representation of dict and list are totally different, we can not reuse most part of list.sort().

I’m sorry, I meant __eq__.
Anyway, we can make dict empty during sort, like list.sort() does.

Marco, when people ask you “in what applications would someone need to
sort a dict?” the correct answer is not a link to a Google link. The
correct answer is to actually answer the question by explaining what
sort of applications need to sort dicts.

StackOverflow questions rarely focus on the Why, only the How. For
example, this link:

“I have a dictionary of values read from two fields in a database: a
string field and a numeric field. The string field is unique, so that is
the key of the dictionary. I can sort on the keys, but how can I sort
based on the values?”

There is no “Why” given. So this link is 100% totally, completely
useless to answer the question “What sort of applications need to sort
dicts?”

There has been little interest in this proposed functionality except
from you. Nobody reading this thread has said “Wow, what a great idea,
I’m going to write a PEP, prepare a patch, and get it added to the
language!” So if you want this, you are going to have to do the work
to convince people that it is a good idea.

Expecting them to hunt through pages and pages of google links and
questions on Stackoverflow will convince nobody.

So decide how much you care about this:

  1. Enough to work hard to get it.

  2. Not that much.

If the answer is 2, then I think this thread is all played out. You’ve
made a reasonable case that this could maybe be faster than sort and
update, but convinced nobody that the feature is useful enough to
bother. So you have two solutions:

def sorted_dict(adict, key=None):
    return dict(sorted(dict.items(), key=key))

def sort_dict_inplace(adict, key=None):
    # Needs locking to be thread safe.
    items = sorted(adict.items(), key=key)
    adict.clear()
    adict.update(items)

Or split out the code for dicts into your own private extension class
and make the changes to that.

Be happy, and move on.

But if the answer is 1 then:

  • Move this discussion to the Python-Ideas mailing list, where
    there is a bigger audience.

  • I recommend you actually write up a short but complete summary.
    Don’t just point people to this thread and tell them to read it.

  • Start studying successful PEPs to see what you will need to do.

  • Prepare to answer the technical objections. Even if you can’t
    code up a patch yourself, you need to at least give a plausible
    algorithm for how it might be done.

  • Remember that this has to work with shared key dicts as well as
    regular non-shared keys.

  • Justify why this is expected to be faster.

  • Justify why you need this to be faster. Prove your case by
    profiling your application and finding that sorting the dicts
    is a significant bottleneck in your code.

  • Justify the “Why?” questions by doing a survey of other people’s
    code and showing where they sort dicts. If you can find examples
    in the standard library, that’s even better.

  • Don’t expect others to do the work. Links to google won’t cut it.

  • Prepare to acknowledge and answer objections. Actually answer
    the objections, don’t just give hand-wavy answers.

  • Ask for a sponsor.

  • If you get a sponsor, write the PEP.

  • Wait for the Steering Council to give an answer.

3 Likes

Hello Inada Naoki,

Do you prefer to be called Inada-san in the Japanese style, or Naoki in
the English style? Or something else?

Thank you.

Thanks. That link mostly addresses how to sort a dict (which appears to be a popular question). I’m more after the “why” someone would want to do this, so concrete examples would be helpful.

If thousands of people that asks how to sort a dict does not convince you, it’s a really big problem and I really don’t have a remedy. Eric Smith said “Unless there’s a demonstrable and widespread need”. I think I demonstrated that there’s a widespread need.

1, but it’s useless, since Inada, the main maintainer of dict, already said he do not want to maintain a sort() in dict.

PS: @methane it seems list.sort() do a backup of the pointer of the list and set the pointer of the list to NULL. So the list appears empty while sorting, and the sort is done to the original object. After finished, the list is restored. It seems there’s no lock, and in this way it seems you don’t need it: why locking an empty list?

dict could do the same, backing up the ma_keys, sorting the backed up dk_entries, reconstruct the entire hashtable and then restore ma_keys. But if you have no intention to have this type of code in dict, I suppose the discussion is finished :slight_smile:

@methane There’s also other two solutions:

  1. convince list maintainers to move the common code of list_sort_impl and much of the functions called by it in a separate library. This way the code to maintain is shared among dict and list maintainers.

  2. create an intermediate list of PyDictEntry, more or less the equivalent of list(dict.items()). This will have less performance but it’s more maintainable, and anyway do not require the creation of an another dict.

No, you demonstrated that there are a lot of questions about how to sort a dict. There could be lots of reasons for asking that question:

  1. The user is misguided about what “sorting a dictionary” means, and thinks it’s what they want when it isn’t.
  2. The user wants to sort a dictionary, but for whatever reason doing so wouldn’t actually solve their problem.
  3. The user doesn’t actually have a problem to solve, they are just curious. Or they are asking as a theoretical question. Or they don’t know that dictionaries are ordered (or that OrderedDict exists) and think they want to sort when actually a dictionary’s natural ordering is what they need.
  4. The user actually needs a different data structure, not a dict at all.
  5. Or any one of many other reasons.

Someone needs to go through those questions and work out how many are “unrelated” examples like I suggest above, and how many are actually cases where the user has a genuine, real-world problem, and the correct solution to that problem is to sort a dictionary, and dict(sorted(my_dict.items())) is unacceptably slow, and the problem is small enough that the O(n) runtime of the dictionary build dominates the O(nlogn) sort time¹. And there need to be lots of such cases, not just a couple.

That’s a big ask, yes. But you claimed to be willing to work hard for this proposal. There’s a chunk of work you can do, and who knows, it might even convince people (if the facts come out the way you clearly expect them to). Otherwise, as @steven.daprano said, I think this discussion has nowhere left to go.

¹ Yes, I did some timings, to confirm that building a dictionary scales linearly whereas sorting is O(nlogn).

3 Likes

There are thousands of people asking any sort of questions “on the Internet”. I don’t think that this elevates all these things as legitimate problems that need attention and a solution. For example, there seem to be a lot of people asking how to compute the cubic root of a number using Python, even blog posts dedicated to the issue. I don’t think that this justifies adding a math.cubicroot() function.

Anyhow, I am under the impression that most often, when people ask to sort a dictionary, what they really asking for is for a method to iterate a dictionary in a specific order. This is already trivial. It is also already possible to sort dictionaries, this has been covered a dozen times in the thread already. You are not proposing to add a method to the dict class that implements the sorting in place. The only advantage is that it could be a bit faster, maybe. I don’t think that if dict gains a sort() method there would be a drastic reduction of the number of people that asks how to sort a Python dictionary on StackOverflow.

Python existed for decades without an order dictionary class in the standard library. When an ordered dictionary class was added it was used only sparingly. Only recently dictionaries have a guarantee for preserving being insertion order and AFAIK they gained this property only accidentally, not because it was deemed essential. In this context I find it hard to believe that there are many applications that would benefit from a potential improvement in the performance of dictionary sorting. Remember: you are not proposing to add the capability to sort dictionaries, it is already there, what you are proposing is to add a convenience method to the dict class, which, indeed adds some convenience and maybe could allow for some optimization, but would not add any new capability to the language.

Cheers,
Dan

I don’t mind, but please call me Inada, or Inada-san.

I know. And I call it “some sort of lock” because it looks similar to optimistic locking.

Yes. That’s exactly what I thought.

I don’t against it. It is OK.

I don’t want to implement another sort logic in dict. We can not share implementation with list because internal structure is totally different.

2 Likes

Curiously, some time ago Ethan Furman discussed on Python Ideas the introduction of a method in Enum that check if the Enum has a member without raising and exception. He said that it was a question asked a lot on StackOverflow. Guido van Rossum asked about the SO questions, Ethan linked them and nobody had raised some doubts about the widespread of the necessity.

Anyway, I’ve done a little experiment. I implemented a naive version of sort() for dict(). I have not yet implemented a mergesort for dict, so the speedup is not extraordinary, but also not negligible:

(venv) marco@buzz:~/sources/cpython_new_build_optimized$ sudo pyperf system tune
[sudo] password for marco: 
Tune the system configuration to run benchmarks

Actions
=======

CPU Frequency: Minimum frequency of CPU 0-3 set to the maximum frequency

System state
============

CPU: use 4 logical CPUs: 0-3
Perf event: Maximum sample rate: 1 per second
ASLR: Full randomization
Linux scheduler: No CPU is isolated
CPU Frequency: 0-3=min=max=2600 MHz
CPU scaling governor (intel_pstate): performance
Turbo Boost (intel_pstate): Turbo Boost disabled
IRQ affinity: irqbalance service: inactive
IRQ affinity: Default IRQ affinity: CPU 0-3
IRQ affinity: IRQ affinity: IRQ 0-17,51,67,120-131=CPU 0-3
Power supply: the power cable is plugged

Advices
=======

Linux scheduler: Use isolcpus=<cpu list> kernel parameter to isolate CPUs
Linux scheduler: Use rcu_nocbs=<cpu list> kernel parameter (with isolcpus) to not schedule RCU on isolated CPUs

(venv) marco@buzz:~/sources/cpython_new_build_optimized$ pyperf timeit --rigorous -s "d = {k:k for k in range(16)}" "dict(sorted(d.items()))"
.........................................
Mean +- std dev: 1.85 us +- 0.03 us
(venv) marco@buzz:~/sources/cpython_new_build_optimized$ pyperf timeit --rigorous -s "d = {k:k for k in range(16)}" "d.sort()"
.........................................
Mean +- std dev: 1.10 us +- 0.01 us
(venv) marco@buzz:~/sources/cpython_new_build_optimized$ pyperf timeit --rigorous -s "d = {k:k for k in range(1000)}" "dict(sorted(d.items()))"
.........................................
Mean +- std dev: 86.0 us +- 0.6 us
(venv) marco@buzz:~/sources/cpython_new_build_optimized$ pyperf timeit --rigorous -s "d = {k:k for k in range(1000)}" "d.sort()"
.........................................
Mean +- std dev: 80.7 us +- 1.0 us
(venv) marco@buzz:~/sources/cpython_new_build_optimized$ pyperf timeit --rigorous -s "d = {k:k for k in range(16, -1, -1)}" "dict(sorted(d.items()))"
.........................................
Mean +- std dev: 1.92 us +- 0.02 us
(venv) marco@buzz:~/sources/cpython_new_build_optimized$ pyperf timeit --rigorous -s "d = {k:k for k in range(16, -1, -1)}" "d.sort()"
.........................................
Mean +- std dev: 1.17 us +- 0.01 us
(venv) marco@buzz:~/sources/cpython_new_build_optimized$ pyperf timeit --rigorous -s "d = {k:k for k in range(1000, -1, -1)}" "dict(sorted(d.items()))"
.........................................
Mean +- std dev: 86.4 us +- 0.9 us
(venv) marco@buzz:~/sources/cpython_new_build_optimized$ pyperf timeit --rigorous -s "d = {k:k for k in range(1000, -1, -1)}" "d.sort()"
.........................................
Mean +- std dev: 81.0 us +- 1.1 us

(venv) marco@buzz:~/sources/cpython_new_build_optimized$ python
Python 3.11.0a0 (heads/dict_sort-dirty:fa106a685c, Jun  5 2021, 19:59:25) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> d = {k:k for k in range(16, -1, -1)}
>>> d
{16: 16, 15: 15, 14: 14, 13: 13, 12: 12, 11: 11, 10: 10, 9: 9, 8: 8, 7: 7, 6: 6, 5: 5, 4: 4, 3: 3, 2: 2, 1: 1, 0: 0}
>>> d.sort()
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16}
>>>

The testing build was an optimized one. Notice that:

  1. the keywords does not work for now. dict is only sorted from lowest to highest without any key function applied, and the sort is done on keys only
  2. the current implementation takes not into account shared dicts or holes in dict
  3. no success/sanity check is done for things like PyList_New

Anyway, I think this is promising. If I’ll be able to copy the mergesort of list and adapt it to dict, I suppose the speedup will be higher.