Store set items in an orderly manner, on a first-come, first-serve basis

Doing it this way doesn’t affect any code. And it would be healthy for the mental health of many. Also, a command could be added specifying the order.

Example:
['es_AR', 'es_MX', 'es_MX', 'en']

How Set works now:
Set(['es_AR', 'es_MX', 'es_MX', 'en'])

Results now:
{'es_MX', 'en', 'es_AR'}

“I’ve never seen a random list so well implemented.”

It should be:
{'es_AR', 'es_MX', 'en'}

I find it hard to imagine someone knowing about this problem and using it to generate something random at such a deep level of the language instead of just using random, so I don’t think it will affect too many codes.

Doing this might affect performance or memory usage. And how would this apply to set operations.

This is an idea, but not really a bug in any way.

If there was no performance issues (memory or runtime) in an implementation: it may be more to consider.

I don’t think this change would be considered if it impacted performance of existing set operations.

2 Likes

My theory is that they relied on a dictionary to distinguish it from the list.

The currently widely adopted workaround is to use dict keys as a set, constructed with the dict.fromkeys method. Since dict keys maintain insertion order, they work like an ordered set.

The problem is that there’s currently no built-in way to perform an ordered set operation. While dict.keys offers a view to the keys as a set to perform set operations, it loses the original insertion order after any set operation.

If additional dict methods suggested in Copy a dictionary, except some keys can be implemented then a dict would be able to truly perform double duty as an ordered set. But until then, you can still use a comprehension over a dict to perform an ordered set operation.

1 Like

One need to understand how data structures differ from each other in the way stored data is arranged in memory.

Python Name Generic Name Mutable Ordered Contiguous Homogeneous Indexable Average Time Complexity
List Dynamic Array Yes Yes No No Yes Access: O(1)
/ Search: O(n)
/ Insert/Delete at end: O(1)*
/ Insert/Delete at beginning: O(n)
Tuple Fixed Array No Yes No No Yes Access: O(1)
/ Search: O(n)
Set Hash Set Yes No No No No Add/Remove/Contains: O(1) average, O(n) worst
/ Iteration: O(n)
Dict Hash Table / Hash Map Yes No** No No Yes*** Access/Insert/Delete: O(1) average, O(n) worst
NumPy Array Multi-dimensional Array Yes Yes Yes Yes Yes Access: O(1)
/ Vectorized operations: O(n) but very fast

*Amortized O(1) for append operations due to occasional resizing.
**As of Python 3.7, dicts maintain insertion order, but this is considered an implementation detail.
***Indexable by keys, not by numeric indices.

Also, important notes : python lists are not ‘linked lists’ but ‘dynamic arrays’. hashing is a way to allocate a unique identifier for identical objects, enabling fast and efficient data retrieval, typically with constant time complexity O(1).

There are other structures of interest (deque, heap, …) but they are more for ‘niche’ usage.
Also a remark, skiplists are not implemented in python, while they could be useful for several performance improvement cases.

→ if you need an ordered data structure, you need something else than a set, this is not depending on how python is implemented.

3 Likes

An important part of a set structure is that it is an unordered collection. I seem to remember from the discussions around adding an order to dict keys in CPython, that it came “for free” with an enhancement to the dict implementation, and, no similar worthwile enhancement would come from changing the set implementation.
Back in the day, we used dict keys, ignoring the values as a set before we got an explicit set type.
Maybe you could do the same to give yourself a set with insertion orderings?

1 Like

@Paddy3118 Interesting… I looked a bit further about the implementation change, I found this quick explanation with drawings quite clear :
https://softwaremaniacs.org/blog/2020/02/05/dicts-ordered/en/

2 Likes

Interesting, the same criteria could be applied to sets and it would make sense.

I clarify that I always refer to the insertion order, not the numerical order or any other meaning !