Syntax for aliases to keys of python dictionaries

billyeatcookies · April 11, 2022, 5:17pm

Description

There are cases where keys of python dictionaries may need aliases. When two keys point to the same value, duplication can also be avoided with this feature.

How it is done currently:

foo = {
    key1: "value",
    key2: "value",
    ...
}

Here “value” is repeated for every key. The more keys share the same value, the more duplication occurs.
See the following syntax:

Conceptual syntax

foo = {
    key1, key2, ...: "value"
}

Here, there is less duplication and the code looks cleaner. Do note that the punctuator , is only used to illustrate my concept. If it conflicts with the item separator of dictionaries (which is also ,), another punctuator can be used instead, e.g. |, ;, +

Use cases

This feature can be extremely useful for libraries that aim to be user friendly.
In big projects, when there is a confusion of key names (“was that show or display?”), this feature helps to avoid scrolling all the way up.
When key names are long, this feature helps to also accept acronyms or short names.

Examples

>>> foo = {
...     "person1", "p1": "yes",
...     "person2", "p2": "no",
... }
>>>
>>> foo["person1"]
'yes'
>>> foo["p1"]
'yes'

ericvsmith · April 11, 2022, 5:40pm

This is easy enough to do yourself, for example python - Data structure to implement a dictionary with multiple indexes? - Stack Overflow

I don’t think the need is common enough to need dedicated syntax. It might be worth putting on PyPI, if you implement this. I haven’t checked to see if there’s something already there.

steven.daprano · April 11, 2022, 5:51pm

It is not clear what you mean by “alias”.

If I do this:

d = {1, 2: "spam"}
d[1] = "eggs"

what is d equal to now?

{1: 'eggs', 2: 'spam'}
or {1: 'eggs', 2: 'eggs'}

billyeatcookies · April 12, 2022, 4:50am

According to my concept, the second one is correct. Also, by ‘alias’ I just meant letting the same value be accessed through multiple different keys. The comma may not be a suitable punctuator here since it's already used to separate items. But I hope you get my idea.

steven.daprano · April 12, 2022, 8:34am

Okay, in that case the easiest way to get aliases as you want them is with two dicts.

aliases = {2: 1}
data = {1: "spam"}

key = 2  # the key being looked up; here it is an alias
key = aliases.get(key, key)  # convert an alias to its canonical key
value = data[key]

The first dict is used to map the aliases to the canonical (real, official) key. Use the get() method to convert the alias to the canonical key, then do the actual data lookup you want.

Unfortunately this feature you want is not easily built into the dict class. The dict class fundamentally works with each key being distinct, and there is no easy way to say “make these two keys point to the same value”.

You might think it is easy, but it isn’t. It’s a bit like saying “Can’t we just make my car into a submarine?” The entire dict internal structure would have to be re-engineered, from scratch.

That’s why the simplest solution is to augment your dict with another dict to hold the aliases.
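For illustration, the two-dict approach above could be wrapped in a small class so that reads and writes through an alias both reach the canonical key. This is only a minimal sketch: `AliasDict` and its `_canonical` method are hypothetical names, not an existing library API.

```python
class AliasDict:
    """Hypothetical wrapper: a data dict plus an alias -> canonical key dict."""

    def __init__(self, data, aliases):
        self._data = dict(data)
        self._aliases = dict(aliases)

    def _canonical(self, key):
        # Unknown keys are treated as already canonical.
        return self._aliases.get(key, key)

    def __getitem__(self, key):
        return self._data[self._canonical(key)]

    def __setitem__(self, key, value):
        self._data[self._canonical(key)] = value


d = AliasDict({1: "spam"}, aliases={2: 1})
d[2] = "eggs"        # write through the alias
print(d[1], d[2])    # eggs eggs
```

Because only one value is stored per canonical key, assigning through either key changes what both keys return.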

billyeatcookies · April 20, 2022, 3:53pm

I do understand how hard it would be to implement such a feature, thanks for pointing that out. Re-engineering the entire dict internal structure from scratch seems really scary to think about.

Here my idea works more like syntactic sugar. At compile time, the Python interpreter would generate extra keys in the same dict for all the aliases we have specified.

So the code that the Python programmer writes looks like this

foo = {
    key1, key2: "value"
}

But the interpreter shall treat the aliases as normal dict keys, so extra code should be generated to look as follows.

foo = {
    key1: "value",
    key2: "value"
}

This avoids the re-engineering of the python dict internals and at the same time makes the code look a lot cleaner.
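To make the consequence of this expansion concrete: the generated dict would hold two ordinary, independent entries, so rebinding one key would leave the other untouched. A minimal sketch using today's syntax, with string keys standing in for key1 and key2:

```python
# What the proposed expansion would produce: two separate entries.
foo = {
    "key1": "value",
    "key2": "value",
}

foo["key1"] = "changed"  # rebinding one key does not touch the other
print(foo)  # {'key1': 'changed', 'key2': 'value'}
```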

stoneleaf · April 20, 2022, 4:47pm

Besides saving a few keystrokes, what does that gain us? In your example, "value" could be two different objects, but it seems to me that under usual circumstances they won’t be:

var = "value"
foo = {'key1':var, 'key2':var}

key1 and key2 both point to the same var.

billyeatcookies · April 20, 2022, 7:19pm

Saving a few keystrokes is the plus point here. It tends to be more readable and a lot cleaner with that.

I read about Compound statements of python match statements today, specifically the OR-patterns.
The code which is like:

match expression:
    case 1:
        return True
    case 2:
        return True
    case 3:
        return True
    case _:
        return False

can be boiled down to:

match expression:
    case 1 | 2 | 3:
        return True
    case _:
        return False

What was the plus point? Saving a few keystrokes; it made the code more readable and a lot cleaner.
The same reasoning is why dict keys need alias support.

Code which is like:

data = {
    'key1': True,
    'key2': True,
    'key3': True
}

should be able to just boil down to:

data = {
    'key1', 'key2', 'key3': True
}

I also did point out that having comma as a separator for key aliases may have conflicts with dict item separators (which are also commas), so this is just used for illustration here.

stoneleaf · April 20, 2022, 8:30pm

By “plus point” do you mean that’s the main reason, or it’s a bonus?

As a bonus, that’s fine, but as the main point… it’s going to take a lot more to add new syntax than it saves a few keystrokes.

I think a big difference here is that a match body is likely to be much larger than

return True

and not duplicating multiple lines of code is a much bigger advantage than just fewer keystrokes.

All that aside, do you have some real-world examples of where this would be helpful? Snippets of your own code where you would have used this feature?

steven.daprano · April 20, 2022, 9:49pm

foo = {
    key1, key2: "value"
}

I was sure that this was already legal code for a tuple key1, key2 as key. So sure that I had already written an email pointing it out, and decided at the last moment to test it.

It turned out that it isn’t legal code for a tuple key, it’s a syntax error.

So, strictly speaking, we could introduce this syntax as a short cut, and it would be fully backwards compatible.

But I’m sure I will not be the only person who will be confused by this.

people who wrongly expect the syntax {a, b: 1} to have a tuple key;
people who expect these keys a and b to be aliases, not just a shortcut.

Contrast that to the only benefits:

you save a few key strokes.

Given that this new feature is likely to be rarely used, I see it causing more confusion than benefit.

“makes the code look a lot cleaner”

That’s your personal opinion. I think it makes the code look worse. It looks like it should be an error, because I’ve mixed up set syntax {a, b} and dict syntax {a: 1, b: 1}.
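The claim that this is currently a syntax error can be checked without running the code, by asking the built-in `compile()` whether a string parses as an expression. A small sketch; `is_valid_expression` is a hypothetical helper name:

```python
def is_valid_expression(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


print(is_valid_expression("{a, b: 1}"))    # False: the proposed form
print(is_valid_expression("{(a, b): 1}"))  # True: dict with a tuple key
print(is_valid_expression("{a, b, 1}"))    # True: a set
```

Only parsing is tested here, so the names a and b do not need to exist.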

sweeneyde · April 21, 2022, 12:53am

If you don’t like writing

mapping = {
    'python': 'language',
    'perl': 'language',
    'javascript': 'language',
    'linux': 'operating system',
    'windows': 'operating system',
    'apple': 'fruit',
    'banana': 'fruit',
    'orange': 'fruit',
    'pineapple': 'fruit',
}

then you could always write the data in the order you want to write it in but then let your program do the transformation:

pairs = [
    (["python", "perl", "javascript"], "language"),
    (["linux", "windows"], "operating system"),
    (["apple", "banana", "orange", "pineapple"], "fruit"),
]

mapping = {
    key: value 
    for key_group, value in pairs
    for key in key_group
}

CAM-Gerlach · April 22, 2022, 9:07am

Ha, I had the same thought but likewise checked it, and discovered it is (probably wisely) invalid syntax.

In any case, while concision has value (something which I ought to do a better job at in my own writing), I’d emphatically agree that clarity and avoiding confusion and ambiguity is much more important than “saving a few characters”.

As @steven.daprano alludes to, from a syntax perspective {'key1', 'key2', 'key3': True} looks awfully close to not only {('key1', 'key2', 'key3'): True}, but also {'key1', 'key2', 'key3', True}, which are both syntactically valid and semantically plausible. And furthermore, if the semantics are not intuitively obvious even to Python core developers (Is it a tuple? Are copies made of the objects? Does modifying one affect the other?) then it's a fair bet they will cause the average user (like myself) substantial confusion.

Quercus · April 22, 2022, 11:32am

IDLE issues an Invalid syntax popup for this code when it reaches the colon:

f = {0, 1, 2, 3, 5, 8: "Fibonacci"}

Perhaps, upon finding the { delimiter, followed by a series of values separated by commas, the interpreter is expecting it to resolve as a set. That expectation is shattered when it reaches the colon.

Prior to testing that line of code, I also had thought the series of numbers might be regarded as a tuple key, that is, something equivalent to this, which is a valid dict:

f = {(0, 1, 2, 3, 5, 8): "Fibonacci"}

Yes.

vainaixr · April 27, 2022, 4:47am

one could use repeat from itertools to achieve this also, like,

from itertools import repeat
dict_ = dict(zip(["key1", "key2"], repeat("value")))
dict_

{'key1': 'value', 'key2': 'value'}

and even cycle could be useful,

from itertools import cycle
dict(zip(["person1", "person2", "p1", "p2"], cycle(["yes", "no"])))

{'p1': 'yes', 'p2': 'no', 'person1': 'yes', 'person2': 'no'}

one could create a ChainMap from this also,

from collections import ChainMap
ChainMap(*(map((lambda x, y: dict(zip(x, y))), (['a', 'b', 'c'], ['d', 'e', 'f'], 'g'), (repeat('1'), repeat('2'), '3'))))

ChainMap({'a': '1', 'b': '1', 'c': '1'}, {'d': '2', 'e': '2', 'f': '2'}, {'g': '3'})

or use starmap instead of map

from itertools import starmap
ChainMap(*starmap(lambda x, y: dict(zip(x, y)), ((['a', 'b', 'c'], repeat('1')), (['d', 'e', 'f'], repeat('2')), ('g', '3'))))

to pass an integer would have to make a list, as int is not iterable, something like,

ChainMap(*starmap(lambda x, y: dict(zip(x, y)), ((['a', 'b', 'c'], repeat('1')), (['d', 'e', 'f'], repeat('2')), ('g', [3]))))

if one wants to merge the dictionaries

d = {}
for i in starmap(lambda x, y: dict(zip(x, y)), ((['a', 'b', 'c'], repeat('1')), (['d', 'e', 'f'], repeat('2')), ('g', [3]))):
  d.update(i)

{'a': '1', 'b': '1', 'c': '1', 'd': '2', 'e': '2', 'f': '2', 'g': 3}

same thing using comprehension

d = {}
{d.update(i) for i in starmap(lambda x, y: dict(zip(x, y)), ((['a', 'b', 'c'], repeat('1')), (['d', 'e', 'f'], repeat('2')), ('g', [3])))}
d

same thing could be done using functools.reduce

import functools, operator as op
functools.reduce(op.or_, starmap(lambda x, y: dict(zip(x, y)), ((['a', 'b', 'c'], repeat('1')), (['d', 'e', 'f'], repeat('2')), ('g', [3]))))

if one wants to avoid repeating repeat then one could use map but then will have to repeat map

functools.reduce(op.or_, map(lambda x, y: dict(zip(x, y)), (['a', 'b', 'c'], ['d', 'e', 'f'], 'g'), (*(map(repeat, ('1', '2'))), [3])))

one could use tee also for this, but would have to specify how many times one wants to repeat, dont really think there is an advantage

from itertools import tee
dict(zip(['a', 'b', 'c'], [*map(list, tee([1, 2, 3, 4], 3))]))

{'a': [1, 2, 3, 4], 'b': [1, 2, 3, 4], 'c': [1, 2, 3, 4]}

Dutcho · May 1, 2022, 9:03pm

Or directly (using a bit of often-overlooked functionality instead of comprehension):

mapping = (dict.fromkeys(('python', 'perl', 'javascript'), 'language') |
           dict.fromkeys(('linux', 'windows'), 'operating system') |
           dict.fromkeys(('apple', 'banana', 'orange', 'pineapple'), 'fruit')
          )
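One thing worth knowing about `dict.fromkeys`: every key gets the same value object, not a copy. With strings that makes no difference, but with a mutable value such as a list, a change made through one key is visible through the others. A short sketch of the difference, using made-up example data:

```python
# dict.fromkeys stores the same list object under every key.
shared = dict.fromkeys(("linux", "windows"), [])
shared["linux"].append("kernel")
print(shared["windows"])  # ['kernel']

# A comprehension evaluates [] once per key, giving independent lists.
separate = {key: [] for key in ("linux", "windows")}
separate["linux"].append("kernel")
print(separate["windows"])  # []
```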

billyeatcookies · May 29, 2022, 7:42am

Picking the comma as the separator maybe was not good for the example. So I tried using the pipe operator ( | ).

Union types

As of 3.10, as per PEP 604, Python allows writing union types as X | Y.

The same concept combined with this idea will produce the following syntax:

foo = {
    key1 | key2 | ...: "value"
}

This also takes away the confusion.

MartinPacker · May 29, 2022, 7:49am

I wonder if one advantage of this would be a saving on memory. Probably wouldn’t matter for most use cases but might for really big data items.

steven.daprano · May 29, 2022, 8:47am

The pipe operator already has a meaning. When you enter:

mydict = { key1 | key2 : "value" }

the resulting key is key1|key2, whatever that evaluates as.

{5|12: "value"}     # key is 13, the bitwise OR of 5 and 12
{int|str: "value"}  # key is the union of int and str

Fredsch · July 4, 2022, 10:20pm

I was looking for the kind of feature you are requesting in this thread. But I was expecting a different syntax.

The original syntax is {key: value, key2: value, …}

Using commas or pipes without any kind of braces makes the syntax look arbitrary and makes it hard to tell where the values end and the keys start.

I was expecting lists as keys. These currently raise TypeError: unhashable type, so the new syntax would be backwards compatible.

I would find these extremely helpful for supporting synonyms and different spellings. It would look like this.

import numpy
Mydict = {
["colour", "color"]: "blue is an example of a ...",
["pants", "trousers"]: "pants.jpg",
["fast", "quick", "optimized"]: numpy.sort
}

Currently I have to duplicate rows or write some kind of extra code to decompose a list of lists of lists into what I want. Having this feature would in my opinion help increase readability.
Especially when the values themselves become dictionaries or lists, it would help not to duplicate lines and thus create copies.
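Something close to this can be written today with tuple keys (which are hashable) and a small expansion step. The sketch below is only one possible approach: `expand_aliases` is a hypothetical helper, and the built-in `sorted` stands in for `numpy.sort` so the example has no third-party dependency.

```python
def expand_aliases(grouped):
    """Hypothetical helper: flatten {(k1, k2, ...): value} into one entry per key."""
    flat = {}
    for keys, value in grouped.items():
        for key in keys:
            if key in flat:
                raise ValueError(f"duplicate key: {key!r}")
            flat[key] = value  # every alias refers to the same value object
    return flat


my_dict = expand_aliases({
    ("colour", "color"): "blue is an example of a ...",
    ("pants", "trousers"): "pants.jpg",
    ("fast", "quick", "optimized"): sorted,  # stand-in for numpy.sort
})
print(my_dict["trousers"])  # pants.jpg
```

A group with a single key has to be written as a one-element tuple, e.g. ("colour",), otherwise the string itself would be iterated character by character.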

billyeatcookies · July 13, 2024, 8:23am

The syntaxes suggested by me in the thread at first were ambiguous and not backwards compatible.

Although this topic is now dead, I would like to bump this very last suggestion from @Fredsch of using a list to map aliases to values.

This is backwards compatible as list is unhashable, something every user should know about hashmaps. So the parser may parse this as new aliases syntax.

data = {
    'key1': True,
    'key2': True,
    'key3': True
}

may be boiled down to

data = {
    ['key1','key2', 'key3']: True
}