Yes, I understand exactly how it works. The point of the proposal is that the way it currently works is unpleasant and non-obvious, and the requirement seems grounded more in historical reasons than in what actually makes sense. The fact that an interface is fully documented (at least, for those of us who understand terms like “Unicode ordinal” - which I suggest is a higher bar to clear than the functionality really demands) isn’t supposed to excuse these kinds of shortcomings.
Keep in mind that experienced developers still forget things, and may assume they know how to use something only to have to go through that cycle of debugging. The fact that they’re capable of such debugging, and solving the problem themselves, doesn’t make it any less obnoxious.
I agree with Paul Moore. It would be better to publish this implementation as a 3rd-party library on PyPI. If it becomes popular, it will likely end up in the Python core.
The problem with combining str.translate and str.maketrans is that translate() is designed not to do any preprocessing of the mapping it’s given - it immediately loops through the string, doing a lookup for each character. Ideally you’d store the result of maketrans() and reuse it for multiple translate() calls.
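For illustration, the reuse pattern looks something like this (a minimal sketch; the table contents and data here are made up):

lines = ['sample text', 'more text']
leet = str.maketrans({'a': '4', 'e': '3', 'o': '0'})  # build the mapping once
results = [line.translate(leet) for line in lines]    # reuse it for every call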
If you wanted to improve this, perhaps a better API would be to use a bespoke type, which can store the mappings more directly, and check them all.
I’m not entirely sure what you’re trying to say here. The mapping is a mapping either way. str.translate expects the keys to be Unicode ordinals rather than characters. My hypothesis is that this doesn’t meaningfully help the C implementation. Yes, presumably at the C level it can directly determine the code point value as it iterates through the string, and not need to create one-character string objects for the lookup routine. But it presumably does need to create integer objects to use the dict lookup routines.
The code shown is a Python mockup to explain the desired functionality by adapting it to the old interface (which is why it uses the one-argument form of maketrans). If it were implemented in C instead, it could directly do the lookup work with a mapping that uses single-character strings as keys, rather than ordinals - no maketrans call involved. Alternately, I could have written the mockup code to use the ''.join idiom instead.
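Since the mockup itself isn’t reproduced here, a rough sketch of the ''.join variant (the function and parameter names are mine):

def translated(s, char_map):
    # char_map uses single-character strings as keys - no maketrans involved
    return ''.join(char_map.get(c, c) for c in s)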
To showcase the idea fully as a separate package, it would need to include a C extension as well, I think. The point is to demonstrate having access to the small performance boost available that way, while not demanding the ordinal-key input format. I should attempt this, because it will help verify the hypothesis as well.
Of course, this still doesn’t allow for adding a method to the str type - that can only be done “from inside”, as far as I know. If the idea were accepted later, I would want it to work that way.
Update: no, it is possible and I just saw it in the other thread. Sorely tempted to show it off that way. A hack, yes, but it means the idea as presented in the package works as much as possible like what I envision.
Sure, there are cases where constructing the mapping is expensive compared to the translation, and I can separate the steps. The point is more about also having an interface that is simple to use for a one-shot. The existing design could easily have a fast path for when only a mapping is provided (no copy); then it would just need to provide another function for only creating the mapping without performing translation.
I don’t really like the idea of making a separate type, because it would essentially just wrap a dict and hide most of its functionality. It seems like the kind of thing Jack Diederich warned against.
I know about str.translate because I had one of those ciphering or filtering tasks at some point, did some research and figured out how to use it.
But now, when I started looking into the C code for str.translate, as well as how it was documented for 2.x (when str meant the bytes type), I got the distinct impression that it was always intended for tasks that involve dealing with code pages. The implementation is heavily reliant upon the same machinery that encode and decode use, and the 2.x documentation even talks about the codecs module and gives encodings.cp1251 as an example codec.
The problem is, none of the feasible tasks I can imagine make much sense this way, nor did they ever. Not least because explicit .decode and .encode methods exist, of course.
(Skip to the bottom paragraph if you don’t like rambling technical analysis I guess.)
You can’t use bytes.translate, and couldn’t effectively use str.translate, to decode bytes to Unicode, even for these single-byte encodings. It expects a mapping table that is bytes:
>>> import encodings.cp1251
>>> b'foo'.translate(encodings.cp1251.decoding_table)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'str'
In 2.x, of course you can supply a Unicode object, but somehow there is an implicit coercion of the bytes to Unicode before the translation is attempted:
>>> import encodings.cp1251
>>> 'foo'.translate(encodings.cp1251.decoding_table)
u'foo'
>>> '\xff'.translate(encodings.cp1251.decoding_table)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
Weirdly, it’s self that gets coerced, not the argument!
You also can’t readily use it to convert from bytes-masquerading-as-string in one code page to another code page. First off because that task is a non-starter in general (by design, only the ASCII range is reliable; most or all of the “extended ascii characters” represented by the current code page aren’t represented by any byte in the target code page; that’s why there are code pages). But also because the only error handling options you have are to delete the character or translate it to a replacement character like ? (not an actual Unicode replacement character, of course).
I mean, the approach does give better performance:
def cp850_to_cp1252(b, replace):
    return b.translate(matches, b'' if replace else delete)

def xcp850_to_cp1252(b, replace):
    return b.decode('cp850').encode('cp1252', errors=('replace' if replace else 'ignore'))
# test results:
>>> from timeit import timeit
>>> timeit("cp850_to_cp1252(bytes(range(256)), False)", globals=globals())
3.338275376241654
>>> timeit("xcp850_to_cp1252(bytes(range(256)), False)", globals=globals())
7.253022187855095
But look how much setup (and knowledge) is required:
import encodings.cp850, encodings.cp1252
# n.b. neither of these is actually ISO-8859-1!
windows = encodings.cp1252.decoding_table
latin1 = encodings.cp850.decoding_table
matches = [windows.find(latin1[i]) for i in range(256)]
delete = bytes(i for i in range(256) if matches[i] == -1)
# convert -1 values to 63 (ASCII question mark)
matches = bytes(63 if x == -1 else x for x in matches)
And, of course, we lose the option of strict or (for those who actually have a use for it) xmlcharrefreplace error handling.
Before you ask, no, the convoluted .find logic is necessary. You can’t go through the corresponding .encoding_tables - nor can you use them directly with translate. They’re instances of an EncodingMap class (defined by builtins despite not exposing a builtin name) that doesn’t do anything useful at the Python level:
>>> dir(encodings.cp1252.encoding_table) # in particular, notice there's no __getitem__
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'size']
>>> encodings.cp1252.encoding_table.size()
1119
>>> # and there are no magic attributes either:
>>> encodings.cp1252.encoding_table.__class__.__getattribute__ is object.__getattribute__
True
(Well, that was enlightening.)
Short version: I don’t really understand why the method was exposed in the first place, because its apparently intended use case makes little sense. It’s presented as an encoding and decoding tool, but it maps bytes to bytes and Unicode to Unicode. However, nowadays str.translate provides a powerful tool for filtering and “ciphering” strings - it replaces something like ''.join(mapping[x] for x in original), but faster; it would also be simpler, were it not for the interface weirdness. That weirdness, in turn, seems to be a historical artifact that’s hard to justify.
translate predates Unicode strings. In Python 2, where str was bytestrings, the translation table was a 256-character string. You can still do that in Python 3:
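A minimal reconstruction of what such a table looks like (my sketch; swapcase is restricted to ASCII letters here so that every entry stays exactly one character):

>>> import string
>>> swapcase = ''.join(chr(i).swapcase() if chr(i) in string.ascii_letters else chr(i) for i in range(256))
>>> 'Hello'.translate(swapcase)
'hELLO'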
That’s why the lookup is by ordinal rather than by the character itself: swapcase[72] == 'h'.
For Unicode strings, the translation table needs to be a mapping instead… and, yeah, the API doesn’t make that much sense nowadays, if you don’t know the history.
IMO, if you want to improve str.translate, it’s best to leave it alone, for code ported from py2, and design a new API, starting with the actual use cases – not “a better API for translate” :)
Well, that led me down an interesting path. Since 2.0, the str type has had both translate and encode methods (where, of course, encode has to decode implicitly first), alongside the newly added unicode type. Before that, in 1.x, str objects don’t appear to have had any methods besides __mod__ and the usual sequence methods; however, translate and maketrans existed as functions provided by the string standard library module. I couldn’t find documentation pages for versions before 1.4 through the legacy downloads pages, but I was able to find the source tarball releases, which apparently included LaTeX documentation files. Through these I could confirm that translate was added in 1.3, and maketrans in 1.4.
I’m aware of that part, and my historical examination required getting up close and personal with it. (I’m not re-uninstalling 2.7 this time, just in case I ever have to do something like this again.)
A bunch of historical notes
Doing even further reading, investigating the fact that .encode was provided since 2.0, but .decode was only added in 2.2… I checked out the What’s new in 2.2 document, along with whatever documentation I could find from that era. It comes across that 2.0/2.1 byte-based str objects could .encode just like Unicode strings, purely as an artifact of the attempt to handle the two types the same way (“These are the string methods which both 8-bit strings and Unicode objects support”). Of course, that would entail an implicit decoding first, apparently expecting plain ASCII. So there wasn’t actually a direct way to convert from bytes to Unicode with a specified encoding: .encode couldn’t do it, and .translate couldn’t do it (because it was only bytes-to-bytes).
2.2, of course, added the ability to .decode both “byte-strings” and Unicode, which in the latter case entails implicitly encoding first. Decoding functionally starts with bytes and ends with whatever the codec says it should, so this was also great for representing bytes-to-bytes transformation like ROT13, base64 encoding, compression… all sharing the same conceptual space as actual text codecs, and all being newly added. (I even remember using some of these somewhere around 2.4.) Although of course, only the actual text codecs would implicitly encode before decoding; things like .decode('zlib') can only work if every byte is just a byte and all byte values are valid.
It gets even weirder with 2.3. This is when the basestring ABC was added, despite all the functionality already being shared to a distressing degree. But this is also the version where translate starts being documented as having different parameters for Unicode strings! And yet you couldn’t use string.maketrans to build a mapping suitable for unicode.translate - even giving it two unicode inputs results in a str. That can be keyed with Unicode ordinals, but the resulting one-character strs don’t work with unicode.translate.
Then in 3.0, with the change to Unicode-by-default, that ABC was removed, citing the reasoning that the two types don’t have enough functionality in common. Yet I’m not sure I could name a 3.0 bytes method that wasn’t also present on str. Both also got a .translate, with the same interface split as before. But 3.0 only gave maketrans to str (remedied in 3.1). To build an old-school translation table for a bytes object in 3.0, you had to go back to… the string module. Yep.
I know we all know that the history of Unicode support in Python is not all that pleasant.
But what I’m trying to get at is, seen as a tool for handling text encoding, the translate method was never particularly useful. For all the time that separate bytes (whether treated as “byte-strings” or not) and Unicode types existed, it didn’t convert from one to the other, despite apparently leveraging the mechanisms for doing so.
Version-by-version breakdown
In 1.3 through 1.6, with considerable effort (keep in mind there were no list comprehensions, and joining a list l of strings would have looked like string.joinfields(l, '')), you could possibly have put together a translation table to map from one code page to another - at least for the subset of characters represented by both. But there wasn’t Unicode at all.
In 2.0 and 2.1, you didn’t have any convenient bytes->Unicode route, and translate couldn’t do anything encode couldn’t - at least, in terms of legitimate, recognized text encodings.
In 2.2, translate still couldn’t do anything decode couldn’t. It didn’t even have noticeable upsides for bytes-to-bytes transformations; the new codecs were designed with encode and decode in mind, and using those names allowed for being explicit about which direction you were going. Plus they could handle transformations that aren’t 1-to-1 byte mappings.
In 2.3 through 2.7, translate had a separate interface for Unicode objects, but whoever designed it apparently thought that while it makes sense to use arbitrary mapping types and let the values be Unicode text (even more than one character, now!), the keys still had to be ordinals only. The interface was barely connected to the original, but it still kept a weird and awkward restriction. And it was still not useful for encoding, since it mapped (as it still does) Unicode to Unicode. Meanwhile, the str equivalent mapped bytes to bytes, so that wasn’t useful for decoding. And you still had to lean on the string module for byte-based maketrans, and still didn’t have a unicode-based maketrans at all.
In 3.x, the types are properly separated, and translate and maketrans are fully integrated (except for bytes.maketrans not existing in 3.0). But it still doesn’t encode or decode text encodings.
It’s hard to tell whether the translate and maketrans methods were really ever intended for the purpose. If they were, it raises interesting questions, like:
Why weren’t they ever deprecated?
When the new interface for translate for Unicode strings was added in 2.3, why didn’t it get any maketrans support?
Why did 3.0 add that missing maketrans support to the new str type, while not adding the equivalent to bytes? Did devs just assume it was something Unicode strings were supposed to be able to do, because a thing with the same name was in the old string module?
If it was such a big deal in 3.x that bytes and Unicode were becoming distinct types, and the supposed purpose of translate was anything to do with text encoding, and the whole point of a major version bump was to be able to introduce breaking changes… why didn’t it get changed at that point to make bytes-to-Unicode or Unicode-to-bytes transformations?
But on the other hand: if not that, what was it intended for? (And why does the C implementation do the things it does?)
I can’t make sense of this distinction. The API I have in mind seems better suited to the use cases I’m seeing (and have thought of in my own code), though. For example, Django uses it for escaping some special characters for embedding in HTML and JavaScript contexts, mapping through some simple hard-coded dictionaries. A change like this would allow dropping a lot of ord calls and thus make the code more elegant. I also see it used for things like removing punctuation - routing through maketrans explicitly makes it that much clunkier. Torch uses it to replace several different characters with underscores, rather than chaining multiple .replace calls. (There’s a nested loop, so I’m guessing that explicitly using maketrans there is an intentional performance optimization; but I’m not about to make this use case worse, either.)
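To illustrate (a simplified sketch, not Django’s actual code):

# current API: keys must be ordinals
escapes = {ord('&'): '&amp;', ord('<'): '&lt;', ord('>'): '&gt;'}
print('a < b & c'.translate(escapes))   # a &lt; b &amp; c
# under the proposal (hypothetical), the ord calls would go away:
# 'a < b & c'.translate({'&': '&amp;', '<': '&lt;', '>': '&gt;'})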
It’s also in pathlib, where either of two pre-set mappings is computed using maketrans to avoid the ord calls. (Curiously, your fork came up first on my last GitHub search.)
The use cases you mention (with Django/Torch), and possibly some more. If you want to do some more archaeology, you might want to look at the Unix tr command; it looks to me like translate was inspired by that.
For things like replacing unwanted characters with underscores, in C back in 1999, a lookup table with (up to) 256 bytes would be the obvious choice. It wouldn’t need an explanation. The story of it becoming esoteric is quite interesting; thanks for digging up the Python leg of the journey :)
Nowadays, maybe we want something like teaching replace to do multiple replacements instead. It seems that limiting the matches to single characters isn’t particularly important.
Of course, multireplace has its own issues.
For the common case (I might have a little bit of availability bias here at the moment) of replacing single characters (which implies that matches can’t overlap), I already have a working interface right here (maybe it can be improved a bit).
For multiple-character matches, the semantics that one gets from the standard regex-based approach - i.e., scan the string until something replaceable is found, and don’t consider either the replaced characters or their replacements at all in future matches, by skipping past them - are the only semantics that make any sense to me. An iterated-replacement approach, aside from having unclear semantics for overlapping matches, has other problems caused by the sequencing: e.g., it can’t effectively “swap” two substrings, and it’s potentially slow due to needing multiple passes.
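For reference, a throwaway sketch of that regex-based approach (my own code; keys are sorted longest-first so longer candidates always win):

import re

def multireplace(s, mapping):
    # alternation tries keys longest-first; sub() scans left to right and
    # never reconsiders replaced text, so "swaps" work correctly
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, keys)))
    return pattern.sub(lambda m: mapping[m.group()], s)

print(multireplace('ab', {'a': 'b', 'b': 'a'}))  # 'ba', unlike chained .replace calls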
Using a trie seems to be essentially a special-purpose regex implementation - traversing the trie rather than a regex engine’s more general-purpose DFA in order to find matches. An approach that actually used regex should look for the longest matches first: matches of the same length can’t conflict; and if a shorter candidate were preferred over a longer one, the shorter one would have to be a prefix in order to conflict; and in this case the longer candidate never matches. However, actually building a regex or a formal trie seems like a lot of work for what will often be a throwaway task. What I think I’ll try first is just precomputing the longest key length in the mapping, then trying to look for matches of that length and backtracking down to 1. (Actually, I can recall a previous project where I did something similar rather than trying to implement a trie properly… I think it was part of a shortest-common-superstring algorithm…)
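Here’s what that first attempt might look like (hypothetical code; scan each position, trying the longest possible match first and backtracking down to single characters):

def multireplace_scan(s, mapping):
    longest = max(map(len, mapping))
    out, i = [], 0
    while i < len(s):
        for size in range(min(longest, len(s) - i), 0, -1):
            if s[i:i + size] in mapping:
                out.append(mapping[s[i:i + size]])
                i += size       # skip past the match; never rescan it
                break
        else:
            out.append(s[i])    # no match at this position; keep the character
            i += 1
    return ''.join(out)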
It is an easy function to implement with map or even a list comprehension, and after using Python for years I only recently heard of it. Honestly, I’ve never had a use case for it; usually it is replacing single or multiple chars for some sort of comprehension or encoding. Really an odd function to me, due to its limitations.
It may be an unusual function, but it is actually one of my favourites. I’ve answered many a Caesar Cipher question on Stack Overflow using it, and also one that used it to perform superscripting of integers.
I could point to plenty of functions in Python I haven’t used, like math.lgamma(x). Just because someone doesn’t ever use a function doesn’t make it odd.
str.translate() exists to provide a very fast quick-and-dirty way of defining a character mapping (charmap) codec and calling its .encode() method.
The reason why the mapping is from int (source Unicode ordinal) to int (target Unicode ordinal), str or None was performance in the original implementation. Using this approach, the mapping can be defined as a sequence (using the index position as the ordinal) or as a dictionary.
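For example (my illustration; both forms still work in Python 3, and an out-of-range index leaves the character unchanged):

>>> 'banana'.translate({ord('a'): 'A'})                # dict keyed by ordinal
'bAnAnA'
>>> table = [chr(i) for i in range(ord('a'))] + ['A']  # sequence; index = ordinal
>>> 'banana'.translate(table)
'bAnAnA'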
Python’s stdlib charmap codecs today use a more efficient way of defining charmap codecs based on a decoding table defined as a 256 char str (mapping bytes ordinals via their index position in the sequence to Unicode code points) and a fast lookup table called EncodingMap (a 3-level trie) which is generated from these decoding tables for encoding.
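You can see this from Python (a quick sketch; charmap_build() is undocumented, so treat the call as an implementation detail):

>>> import codecs, encodings.cp1252
>>> enc_map = codecs.charmap_build(encodings.cp1252.decoding_table)
>>> codecs.charmap_encode('abc', 'strict', enc_map)
(b'abc', 3)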
For more complete definitions of charmap codecs, have a look at the modules in the stdlib encodings package (e.g. cp1252.py). Those also allow decoding and are typically defined in terms of a decoding table, rather than an encoding table.
The codec subsystem in Python 2.x (see PEP 100) did not mandate input or output types for codecs. The system was designed to have the codecs define the supported types, in order to have a rich codecs eco-system and allow for composable codecs as well. As such it was easily possible to write codecs going from bytes (str in Py2) to text (unicode in Py2), bytes to bytes, text to text. To provide an easier way to access this functionality, .encode() and .decode() were made available on both str and unicode in Python 2. The term “encode” merely means: take the data and call the .encode() method on the referenced codec, nothing more (or less). Similar for “decode”. However, this generic approach via methods did not catch on and caused too much confusion, so it was dropped in Python 3 on the str (Unicode text in Py3) and bytes (binary data in Py3) types, leaving only the paths str.encode() → bytes and bytes.decode() → str accessible via methods. The more generic interface is still available via codecs.encode() and codecs.decode(), though.
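For instance (a small sketch; these codecs ship with CPython):

>>> import codecs
>>> codecs.encode('hello', 'rot13')                        # text-to-text
'uryyb'
>>> codecs.decode(codecs.encode(b'data', 'zlib'), 'zlib')  # bytes-to-bytes
b'data'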
Given how easy it is to use the fast builtin charmap codec directly (and without registering a complete codec), I’d recommend using this directly via codecs.charmap_encode() in a helper function and in a similar way as is done in the full code page mapping codecs, rather than relying on str.translate().
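If I understand the suggestion correctly, the helper might look something like this (my sketch, mirroring the stdlib code page modules; the function name is made up):

import codecs, encodings.cp1252

def to_cp1252(text, errors='strict'):
    # call the fast builtin charmap codec directly - no codec registration needed
    data, _length = codecs.charmap_encode(text, errors, encodings.cp1252.encoding_table)
    return data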
PS: We should really document the codecs.charmap_build() function used by those codecs.
Just to clarify, if it isn’t already obvious: str.translate() was never meant as a way to go from bytes to Unicode or the other way around (encode/decode). It was always meant to work on the type itself and only change data or remove data during the translation process, following the example of the Unix tr tool.
E.g. to remove certain characters from a string, upper-case a few chars, replace diacritics, etc. - and possibly even all in one operation.
As such, the method still proves useful for quickly normalizing strings or bytes. The main advantage is that you only have to prepare the mapping once and can then reuse it over and over again, with the looping over the data happening in C.
I have a general understanding of how these codecs work, but I don’t see how .translate is intended to relate to them. We established (I thought) that it isn’t a competing option (i.e. it doesn’t go from bytes->unicode or unicode->bytes), and it doesn’t appear to be used to implement the codecs either.
Unsurprising. It was prematurely generalized, in that handling text encodings is a fundamentally different task from compression or working with, say, Base64. Base64 converts between bytes (some file that needs to be encoded in a text medium) and text (ASCII-compatible, but still viewed as text), while text encoding goes the other way: text is the “decoded” form and bytes are the “encoded” form. And of course that layered on top of the confusion caused by all the implicit conversions (thus UnicodeDecodeError from encoding attempts and vice-versa). Honestly, I was always puzzled by 2.x holdouts who used the new text handling as a reason not to switch…
From what I can tell, typical modern uses of str.translate don’t look anything like what would make sense for what is still seen as a text encoding facility. But more importantly, what you say is “easy” is something I wouldn’t know how to do offhand, and I thought I had learned quite a bit in this corner. For that matter, codecs.charmap_encode doesn’t have a docstring and isn’t documented either, never mind charmap_build.
Not really: The implementation of str.translate() uses the same mapping logic as the charmap codec. However, it is geared towards same type conversions (also see below).
When I designed the codec subsystem, I did not want to create something that is limited to just text encodings. Codecs can be lots of things, with the general theme that you convert data into some other format and back again. Accordingly, I focused on a general purpose codec API specification which would allow implementing codecs for e.g. text conversion, compression, encryption, etc.
The main use in Python was handling text encodings to support the Unicode integration, but we also have codecs for e.g. hex representations, base64 representation, punycode, zlib compression, etc. Those are not compatible with the str and bytes methods in Python 3, but they use the same codec API and are available via codecs.encode() and codecs.decode().
Please scratch this part. The exposed charmap encode/decode functions do not allow for bytes → bytes or str → str conversions, so they would not be able to replace str.translate() directly.
That said, it is still very easy to create custom codecs based on the charmap codec, if you need to convert from bytes to str or vice-versa.
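A hedged sketch of what such a custom codec might look like, loosely following the stdlib encodings/cp1252.py pattern (the identity table and the codec name here are placeholders of mine):

import codecs

decoding_table = ''.join(map(chr, range(256)))         # placeholder identity table
encoding_table = codecs.charmap_build(decoding_table)  # undocumented; see above

def _encode(input, errors='strict'):
    return codecs.charmap_encode(input, errors, encoding_table)

def _decode(input, errors='strict'):
    return codecs.charmap_decode(input, errors, decoding_table)

def _search(name):
    if name == 'my_charmap':
        return codecs.CodecInfo(_encode, _decode, name='my_charmap')
    return None

codecs.register(_search)
print(b'caf\xe9'.decode('my_charmap'))  # 'café' with the identity table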