What are the criteria for sorting set types?

Hi,

First of all, please understand that I wrote it using a translator.

I’m using Python version 3.8.8

setTest1 = {'a','b','c',4}
setTest2 = {'1','2','3','4'}
setTest3 = {1,2,3,4}
setTest4 = set(['1',2,3,4])

print(setTest1)   # unordered
print(setTest2)   # unordered
print(setTest3)   # ordered
print(setTest4)   # unordered

Why is the set sorted only when all of its values are integers?

Please guide me and let me know if I am missing anything. Thanks in advance and I look forward to hearing from you.

Best Regards,

Jacob

Hi Jacob,

Sets are unordered, meaning the elements do not have any predictable order at all.

When you print a set, or iterate through one, yes, the items will appear in some order. This order may, by chance, correspond to some ‘original’ order, or it may, by chance, appear to be sorted. However, it can be different for different interpreters, for different versions of the same interpreter, or even for the same set declared two different ways.

Python 3.8.12 (9ef55f6fc369, Oct 24 2021, 20:12:27)
[PyPy 7.3.7 with MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>> {1,2,3,4}
{1, 2, 3, 4}
>>>> {4,3,2,1}
{4, 3, 2, 1}
>>>> {4,3,2,1} == {1,2,3,4}
True
>>>>

versus

Python 3.10.0 (tags/v3.10.0:b494f59, Oct  4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> {1,2,3,4}
{1, 2, 3, 4}
>>> {4,3,2,1}
{1, 2, 3, 4}
>>> {4,3,2,1} == {1,2,3,4}
True
>>>

Do not rely on the order sets present themselves in.
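
If you do need a predictable order, one option is to ask for it explicitly. A minimal sketch (the variable names are only for illustration):

```python
# Two sets built in different orders are still equal,
# because set equality ignores order.
a = {1, 2, 3, 4}
b = {4, 3, 2, 1}
same = (a == b)

# sorted() returns a new list in a guaranteed order,
# regardless of how the set happens to iterate.
ordered = sorted(b)
print(same, ordered)
```

The set itself stays unordered; only the list returned by sorted() has an order you can count on.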

Thank you for the answer.

I know there’s no order. But

When I printed each set 50 times, the set containing only integers always came out in sorted order, while the sets containing strings came out unsorted, as you would expect from a set.
It doesn’t seem to be a coincidence.

It’s weird that there’s no order.

setTest3 = {5,1,6,2,3,4,8,7}
print(setTest3)  # {1,2,3,4,5,6,7,8}

When the code above is executed, you can see that it is sorted in order, and no matter how many times you run it, it is sorted.

I’m teaching students, so I want to understand this clearly, and I don’t want to just tell them that it’s a Python version difference.
We’re developers who need to be sure of the cause and effect.

Please guide me and let me know if I am missing anything. Thanks in advance and I look forward to hearing from you.

Best Regards,

Jacob

Iteration order is not part of the set API, and it can change between releases or between different interpreter implementations. It should be regarded as an implementation detail and not relied on. It’s not a “coincidence”, but in terms of what the language guarantees, it might as well be a coincidence. In fact, in Python 3.5, the order of dictionaries and sets with string keys already changed from one run to the next because of hash randomization. A set is implemented as a hash table, which does not have a good notion of iteration order. Note that mathematical sets do not have a defined order either.

If you want to see the current CPython 3.11 implementation details (warning: subject to change!), you can look at the implementation of set.add here. The related implementation details of dicts are here.
This is NOT essential reading, and you shouldn’t change how you write code based on these implementation details.

There’s also the implementation detail of how hashes are computed. An integer’s hash is its value (if that is in the correct range):

>>> hash(17)
17
>>> hash(9999)
9999
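
As a rough illustration of the “correct range” part: in CPython, the hash of a non-negative integer is the value reduced modulo sys.hash_info.modulus, so small integers hash to themselves and very large ones wrap around. This is a sketch of implementation behaviour, not something to depend on:

```python
import sys

# The modulus used for numeric hashes (2**61 - 1 on typical 64-bit CPython builds).
modulus = sys.hash_info.modulus

small = hash(17)             # well inside the range, so unchanged
wrapped = hash(modulus + 5)  # outside the range, so it wraps around
print(small, wrapped)
```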

On the other hand, string hashing is randomized each time the interpreter starts (see PEP 456):

> py -c "print(hash('python'))"
-6583207031788558455
> py -c "print(hash('python'))"
7250536513776729053
> py -c "print(hash('python'))"
-2177859005664806641
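
You can see the same effect from within a script. The sketch below starts fresh interpreters via subprocess and uses the PYTHONHASHSEED environment variable to pin the seed. The helper name hash_in_new_interpreter and the seed value 123 are made up for this example:

```python
import os
import subprocess
import sys

def hash_in_new_interpreter(text, seed):
    """Return hash(text) as computed by a freshly started interpreter."""
    # Copy the current environment and pin the hash seed for the child process.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

# With a fixed seed, separate interpreter runs agree with each other.
first = hash_in_new_interpreter("python", 123)
second = hash_in_new_interpreter("python", 123)
print(first == second)
```

Without the variable set, each run picks a fresh random seed, which is why the three outputs above differ.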

Again, no one should care about iteration order – it’s not a set’s job to keep an order. If you download Python 3.12 in the future and everything is completely different and your code breaks because it was relying on order, that code had always been incorrect. I think teaching the difference between public API versus implementation details is a better lesson for students than being able to explain the micro-details of why you see exactly the output you do.

But if you want a little hint about why it generally seems that orders of sets of strings are “more random” than the orders of sets of ints, string hash randomization is probably why.

Also note that even sets of ints can be ordered in unexpected ways:

>>> {1, 2, 3}
{1, 2, 3}
>>> {7, 8, 9}
{8, 9, 7}
>>> {10, 20, 30}
{10, 20, 30}
>>> {10, 20, 30, 40, 50, 60}
{50, 20, 40, 10, 60, 30}
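
A very simplified model can reproduce these outputs. It assumes that each small integer lands in slot hash(n) % table_size with no collisions, and that iteration simply walks the slots in order. The function slot_order and the table sizes (8 and 16) are assumptions chosen for illustration, not values taken from the CPython source:

```python
def slot_order(values, table_size):
    """Order values by their slot in a toy hash table (no collision handling)."""
    slots = {}
    for v in values:
        index = hash(v) % table_size
        if index in slots:
            # The real implementation probes for another slot; this toy model gives up.
            raise ValueError("collision: not handled by this simplified model")
        slots[index] = v
    # Iterating a hash table means walking its slots from first to last.
    return [slots[i] for i in sorted(slots)]

print(slot_order([1, 2, 3], 8))
print(slot_order([7, 8, 9], 8))
print(slot_order([10, 20, 30, 40, 50, 60], 16))
```

In this model, 8 and 9 wrap around to slots 0 and 1 while 7 stays in slot 7, which is one way to read the {8, 9, 7} output. The real table size and collision handling are internal details that can change at any time.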

Thank you for your detailed answer.
Since they are new students, they ask a lot of wrong questions. I think this kind of question will come up less often if I explain things based on your answers.
Once again, thank you for taking the time to give us a detailed answer.

Here’s an answer you can hopefully give to students.

It is not the case that a set of integers is always sorted (as Dennis pointed out).

>>> {1024, 0, 1, 2, -1, -2, -3, -4}
{1024, 0, 2, 1, -1, -4, -3, -2}

The small positive integers sometimes appear sorted, because that way it is easier/faster for this particular version of Python to handle sets. But it is not something you can rely on.

I don’t think it’s a wrong question!
It may be a question that you can’t fully answer right now, because the students need to understand other things first. If someone wants to find out the answer, they’ll learn a lot about computer science along the way.

Thank you for answering.
Something was missing from my explanation to the students yesterday.
Based on your answer, I’ll explain more during today’s class.
As you said, I should encourage the students by showing them that asking questions and being curious lets them take a step further.
Thank you for taking time out of your busy schedule to help.

Hello, @Jacob83, and welcome to the Python Forum!

When learning about how particular features of Python, or any programming language, are implemented, it is helpful to students for them to think about some of the concepts behind why they are implemented in a specific manner.

For the current topic, students should consider the difference between how strings and integers are sorted. In general, when we sort integers, we put them in numerical order, and when we sort strings, we put them in lexicographic order. When we have a mixture of strings and integers, how should we sort them? This might be a good question for students to think about and discuss. For example, should we treat the integers in the mixture as if they were strings and sort everything in lexicographic order, or should we consider the order of a mixture of integers and strings to be undefined? It is reasonable for students to have opinions about this. These opinions could form the basis of a good discussion.
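
As a possible starting point for that discussion, here is a small sketch of what Python 3 does with a mixed set, plus two orderings a student might propose. The set contents and variable names are invented for this example:

```python
mixed = {'10', 'b', 9, 2}

# Python 3 refuses to compare str and int directly.
try:
    sorted(mixed)
    outcome = "sorted"
except TypeError:
    outcome = "TypeError"

# Opinion 1: treat everything as a string (lexicographic order).
as_strings = sorted(mixed, key=str)

# Opinion 2: group by type first (integers, then strings), then by value.
by_type = sorted(mixed, key=lambda x: (isinstance(x, str), x))

print(outcome)
print(as_strings)
print(by_type)
```

The two key functions give different results for the same set, which shows that the order comes from the rule we choose, not from the set itself.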

This is an important general concept. It is good for students to think about the distinction between what an object is, and how that object is presented.