I recently came across some code that read my_string[:1], which seemed like an odd idiom for my_string[0]. I thought I'd measure how much slower slicing is compared to single-element access, and these are my local results:
$> python3.13 -m timeit '"this is a string"[:1]'
10000000 loops, best of 5: 27.3 nsec per loop
$> python3.13 -m timeit '"this is a string"[0]'
50000000 loops, best of 5: 7.73 nsec per loop
The question is: do people think it would be worth adding such an optimisation to the unicodeobject.c module? There is already some special-casing for slices that represent the full string, so there is some precedent.
The first generates a string, which afaik requires creating a new string object, while the second one just requires creating an integer, right? I think a fairer comparison would be str("string"[0]), but I might be wrong.
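For what it's worth, a quick check (a small sketch, not from the original posts) shows that in CPython both expressions actually produce str objects; there is no separate character type, so the "integer" intuition doesn't hold:

```python
s = "this is a string"

# Both indexing and slicing a str yield str objects in CPython.
assert type(s[0]) is str
assert type(s[:1]) is str
assert s[0] == s[:1] == "t"
print("both results are str and equal")
```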
I think this is a case where the proper fix is to make sure people are aware that they shouldn't use s[:1] or s[-1:] and should use s[0] or s[-1] instead. It's both clearer to read and more performant.
Yeah, I’ve used x[:1] == c to handle empty strings. It’s faster and shorter than x and x[0] == c or x.startswith(c). It’s a useful idiom when parsing data, and making it even a tiny bit faster would be nice.
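To make the difference between the two idioms explicit (a small illustrative sketch): slicing an empty string just returns an empty string, while indexing one raises, which is exactly why x[:1] == c needs no emptiness guard:

```python
# Slicing never raises; on an empty string it simply yields ''.
for x in ("abc", ""):
    print(x, "->", repr(x[:1]))

# Direct indexing, by contrast, needs the guard.
try:
    ""[0]
except IndexError:
    print("indexing an empty string raises IndexError")
```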
With s = 'abc' and sl = slice(1) (to save the time for building the slice object every time).
Code
from timeit import timeit
from statistics import mean, stdev
import sys
import random
setup = '''
s = 'abc'
sl = slice(1)
'''
funcs = ['s[0]', 's[:1]', 's[sl]', "'abc'[0]", 'pass', "s[0] if s else ''", 's and s[0]', 's[:]', 's[:0]']
times = {f: [] for f in funcs}
def stats(f):
    ts = [t * 1e9 for t in sorted(times[f])[:5]]
    return f'{mean(ts):5.1f} ± {stdev(ts):3.1f} ns '
for _ in range(25):
    random.shuffle(funcs)
    for f in funcs:
        t = timeit(f'{f};'*10, setup, number=10**5) / 1e6
        times[f].append(t)
for f in sorted(funcs, key=stats):
    print(stats(f), f)
print('\nPython:', sys.version)
Firstly, thanks everyone for chiming in, it was an interesting discussion!
That’s indeed the code I referred to when I posted originally and said that some special cases are already being handled. Your analysis seems the most relevant to me, as it predicts how much gain we could achieve if this optimisation were to be implemented, and, like you say, it’s not much really.
This is what I tried now after considering everyone’s feedback:
In [1]: import sys
In [2]: s = sys.argv[0]
In [3]: single_element_slice = slice(0, 1)
In [4]: null_slice = slice(1,1)
In [5]: null_slice2 = slice(0, 0)
In [6]: full_slice = slice(0, len(s))
In [7]: %timeit s[0]
15.9 ns ± 0.0599 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [8]: %timeit s[0]
16 ns ± 0.0257 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [9]: %timeit s[1]
16.2 ns ± 0.277 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [10]: %timeit s[null_slice]
22 ns ± 0.0195 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [11]: %timeit s[null_slice2]
22 ns ± 0.0241 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [12]: %timeit s[single_element_slice]
22.5 ns ± 0.0165 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [13]: %timeit s[full_slice]
26.6 ns ± 0.0355 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
The best-case scenario for slice-based access is the zero-length slices, so those indicate the overhead of parsing the slice object, etc., before constructing any new str object. So a one-element slicing operation that (locally) takes only half a nanosecond more on average than that isn't too bad, actually.
Indeed, I hadn’t realised my initial analysis was flawed by using a constant. I also hadn’t considered the overhead of re-creating the slice object itself. When using a variable, things do get worse for the direct-indexing case, and reusing the slice object also seems to make a difference, albeit a less noticeable one, so maybe it’s just noise at that point.
Indeed, the code was part of a parser, but by the time that code was hit the string was known to be non-empty, hence direct indexing into the first element made more sense to me. I figured direct indexing was faster, and while it is, the gain isn't much.
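That situation can be sketched roughly like this (hypothetical names, assuming the non-empty invariant the parser has already established by the time this code runs):

```python
def classify(token: str) -> str:
    # Precondition: token is non-empty, guaranteed by the caller,
    # so token[0] cannot raise IndexError here and no x[:1] guard
    # idiom is needed.
    if token[0] == "#":
        return "comment"
    if token[0].isdigit():
        return "number"
    return "word"

print(classify("#note"), classify("42"), classify("abc"))
```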