I was working on a project that would require some very large string concatenation. Because of this, I wanted to make sure that I was using the most efficient way to join the strings. I narrowed it down to either summing the strings (i.e. `str1 + str2`) or using the `join` method (i.e. `"".join([str1, str2])`).
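For reference, here is a minimal sketch of the two candidate approaches side by side. The values `"hello"` and `"world"` are just placeholders; both expressions build the same string.

```python
# Placeholder inputs for the two concatenation approaches
str1 = "hello"
str2 = "world"

summed = str1 + str2            # concatenation with the + operator
joined = "".join([str1, str2])  # str.join with an empty separator

# Both approaches produce an identical result
print(summed, joined)
```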
However, I noticed something strange when testing these two methods. Simply summing the strings is faster, but only when summing just two strings; if I want to concatenate three strings, it is faster to use the `join` method.
Can anyone explain why this is?
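One detail that may be relevant to the three-string case: Python evaluates `x + sep + new` left to right as `(x + sep) + new`, so each loop iteration performs two separate additions and creates an intermediate string. A minimal sketch that checks this with the standard `dis` module follows. The function names `add3` and `count_additions` are hypothetical, and the opcode name depends on the Python version (`BINARY_ADD` up to 3.10, `BINARY_OP` with a `+` argument from 3.11 on).

```python
import dis


def add3(x, sep, new):
    # Same expression shape as the loop body of add3fcn
    return x + sep + new


def count_additions(func):
    """Count the binary '+' instructions in func's bytecode."""
    count = 0
    for ins in dis.get_instructions(func):
        if ins.opname == "BINARY_ADD":  # Python 3.8 to 3.10
            count += 1
        elif ins.opname == "BINARY_OP" and ins.argrepr == "+":  # Python 3.11+
            count += 1
    return count


dis.dis(add3)                  # print the bytecode for inspection
print(count_additions(add3))   # two additions: (x + sep), then + new
```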
For reference, I am running the 64-bit version of Python 3.8.5 (downloaded from python.org) on a Windows 10 computer with 16 GB of RAM.
As an example, here is a simple script that (on my computer, at least) demonstrates this phenomenon:
```python
import timeit


def add2fcn():
    # Append one string per iteration using +
    x = 'a'
    new = 'b'
    for _ in range(100000):
        x = x + new


def join2fcn():
    # Append one string per iteration using str.join
    x = 'a'
    new = 'b'
    for _ in range(100000):
        x = "".join([x, new])


def add3fcn():
    # Append a separator and a string per iteration using +
    x = 'a'
    sep = '\\n'
    new = 'b'
    for _ in range(100000):
        x = x + sep + new


def join3fcn():
    # Append a separator and a string per iteration using str.join
    x = 'a'
    sep = '\\n'
    new = 'b'
    for _ in range(100000):
        x = "".join([x, sep, new])


number = 100

add2 = timeit.timeit(add2fcn, number=number)
print(f"Adding two strings takes {add2 / number} seconds")
join2 = timeit.timeit(join2fcn, number=number)
print(f"Joining two strings takes {join2 / number} seconds")
print("\n")
add3 = timeit.timeit(add3fcn, number=number)
print(f"Adding three strings takes {add3 / number} seconds")
join3 = timeit.timeit(join3fcn, number=number)
print(f"Joining three strings takes {join3 / number} seconds")
```
This results in the following output:
```
Adding two strings takes 0.020516059000000003 seconds
Joining two strings takes 0.210043305 seconds

Adding three strings takes 0.7843044419999999 seconds
Joining three strings takes 0.665828113 seconds
```
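Since the timings compare different code paths, it may be worth confirming that each `+`/`join` pair builds the same final string. A minimal sketch, assuming the benchmark functions are rewritten to return `x` and to take a hypothetical `n` parameter (kept small here so the check runs quickly):

```python
# Hypothetical variants of the benchmark functions that return their result


def add2(n=1000):
    x = 'a'
    new = 'b'
    for _ in range(n):
        x = x + new
    return x


def join2(n=1000):
    x = 'a'
    new = 'b'
    for _ in range(n):
        x = "".join([x, new])
    return x


def add3(n=1000):
    x = 'a'
    sep = '\n'
    new = 'b'
    for _ in range(n):
        x = x + sep + new
    return x


def join3(n=1000):
    x = 'a'
    sep = '\n'
    new = 'b'
    for _ in range(n):
        x = "".join([x, sep, new])
    return x


# Each pair should produce the same string: 'a' followed by n copies of the suffix
print(add2() == join2(), add3() == join3())
print(len(add3()))  # 1 + 2 * 1000 = 2001
```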