More Data Structures

Some languages have lots of data structures (e.g. Java, which has a hashed, linked, and treed/sorted version for essentially data structures). Some of them are (array)lists, dictionaries, heaps, sets, and queues. However, even basic, common data structures such as linked lists, sorted sets, sorted dictionaries, and multisets, aren’t built-in or at least in the standard library. Programs can be slow without these data structures being implemented. When implemented manually, it takes lots of time and can be very complicated to make from scratch, depending on the coder’s skill and experience. Can this please be implemented?

Note: I called hash maps in other languages dictionaries because that’s what they are called in Python.

Each data structure requires both initial and maintenance effort. Core developers are already ‘overbooked’. Python focuses on high quality implementations of the most common. Those used by CPython itself are mostly in builtins. One can find many others on https://pypi.org/.

2 Likes

Can this feature please be on the wait list?

Pranav, I think you’d gain significantly more traction for your proposal(s) if you worked on one or more data structures that you’d like to see added individually. My understanding is that the Python core devs have put lots of thought into the specific data structures that currently exist, how they’re implemented, and why they’ve chosen to only include those that they have. I’m not a core developer, but I don’t think “Programs can be slow” is an argument that will sway the devs.

That being said, I’d venture to guess that even if you follow that advice, you’ll still be encouraged to find an external Python package that implements whichever data structure you’re looking for, like Terry mentioned. This approach takes the middle ground between “takes lots of time and can be very complicated” for you (you don’t have to do it; it’s already in a package) and requires “skill and experience” (it’s probably built by someone with the necessary background). Python is designed to be modular in this way, and not everything you’re ever going to need is going to be packaged in the stdlib. It’s quite natural to pull in packages as you need them.

6 Likes

Python’s strength is often in it’s high quality third party libraries, learn to use them as you need:

1 Like

You may be interested in Nick’s answer to my question. @ncoghlan

You don’t need to implement them manually. You can just import the appropriate library from PyPI.

2 Likes

(Doubly) linked lists are in the stdlib as collections.deque.

Dictionaries and sets are built-in. Dicts remember insertion order (not sure about sets).

collections.Counter is a decent implementation of multisets.

The heapq module implements heap operations on lists.

For the rest, I agree with others that the purpose of the stdlib isn’t to fit each and every use cases; just use modules from PyPI.

2 Likes

Sets do not remember insertion order. While dicts do (and currently are guaranteed to by the spec), this was originally an implementation detail.

That said, a sorted set or dictionary is not the same thing as an ordered one. The goal is not to remember the order of insertion, but to a) get sorted results when iterating; b) implement search algorithms that depend on that sorted order (e.g.: extract values from a set within a certain range, or values corresponding to keys of a dictionary within a certain range; or find the “next” item/key equal to or greater than a search value in sub-linear time).

See Add `OrderedSet` to stdlib for a discussion of adding an OrderedSet data structure into python stdlib. My takeaways from that thread about one specific data structure are:

1: most people think it would be nice to have this object in the stdlib if there was no cost to having it in there.
2: There needs to be some discussion about exactly how each implementation will look
3: The biggest hurdle is finding someone(s) with the bandwidth make and maintain an implementation for each of these data structures.

For the specific case of a sorted set I find Flag useful, e.g.:

from enum import Flag
Color = Flag('Color', 'RED GREEN BLUE')

This can be used like a set, | is union, & is intersection, and ~ is inversion, e.g.:

set1 = Color.RED | Color.GREEN
set2 = Color.BLUE
union = set1 | set2
intersection = set1 & set2
inversion = ~set1
empty = Color(0)
universal = ~empty
print(universal)

Which prints:

Color.RED|GREEN|BLUE

The sets are automatically sorted in declaration order (point of discussion w.r.t. sets) and the universal set is closed (which I like).

1 Like