When asking about “optimization”, it’s important to specify what you’re optimizing for: execution time? Memory usage? Fewest characters/lines? Simplest/cleanest overall code?
In any case, your second function is definitely cleaner, but it can still be simplified a good deal further, which makes it both shorter and easier to understand, and also improves its performance. You can iterate over each char in the string while keeping a running set of the characters in the current window: whenever the new char is already in the set, discard characters from the left end of the window until it no longer is, then add the new char and update max_len with the current window size. I.e.:

def get_len_longest_unique_substring(the_string):
    max_len = 0
    window_start = 0
    unique_chars = set()
    for char in the_string:
        # Shrink the window from the left until char is unique again
        while char in unique_chars:
            unique_chars.remove(the_string[window_start])
            window_start += 1
        unique_chars.add(char)
        max_len = max(len(unique_chars), max_len)
    return max_len
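A quick way to convince yourself the logic is right is to check it against a few standard inputs (the test strings here are my own picks, not from your question):

```python
def get_len_longest_unique_substring(the_string):
    """Length of the longest substring with no repeated characters."""
    max_len = 0
    window_start = 0
    unique_chars = set()
    for char in the_string:
        # Shrink the window from the left until char is unique again
        while char in unique_chars:
            unique_chars.remove(the_string[window_start])
            window_start += 1
        unique_chars.add(char)
        max_len = max(len(unique_chars), max_len)
    return max_len

# "abc" is the longest run of unique characters -> 3
assert get_len_longest_unique_substring("abcabcbb") == 3
# All characters identical -> 1
assert get_len_longest_unique_substring("bbbbb") == 1
# "wke" -> 3
assert get_len_longest_unique_substring("pwwkew") == 3
# Empty string -> 0
assert get_len_longest_unique_substring("") == 0
```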
You can optimize for some specific cases. For example, if len(set(the_string)) == 1, all characters are the same, so you can return 1 immediately (your second case); if len(set(the_string)) == len(the_string), all characters are unique, so you can return len(the_string), which helps on shorter strings. On longer strings, you can instead return as soon as max_len == len(set(the_string)), since the number of distinct characters is an upper bound on the answer. All of this comes at the cost of extra code complexity and possibly slower performance in the general case, and there’s probably some low-level trickery you could do to speed things up further at the cost of making your code less readable. But all of that is almost certainly over-optimization unless you are dealing with on the order of millions of strings, as it will never pay back the development time it cost.
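For illustration, an early-exit variant might look something like this (a sketch only; the function name and the choice of which shortcuts to include are my own):

```python
def get_len_longest_unique_substring_early_exit(the_string):
    # The number of distinct characters is an upper bound on the answer,
    # so we can stop as soon as the running maximum reaches it.
    n_distinct = len(set(the_string))
    if n_distinct == len(the_string):
        # All characters unique: the whole string qualifies
        return n_distinct
    max_len = 0
    window_start = 0
    unique_chars = set()
    for char in the_string:
        while char in unique_chars:
            unique_chars.remove(the_string[window_start])
            window_start += 1
        unique_chars.add(char)
        max_len = max(len(unique_chars), max_len)
        if max_len == n_distinct:  # Can't do any better; bail out early
            break
    return max_len
```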
Comparing code length (as a proxy for complexity), the version above is notably shorter than both your 21-line first solution and your 18-line second one. Regarding execution time, I ran %timeit (via IPython) on all three:
>>> %timeit lengthOfLongestSubstring("abcabcbb")
16.9 µs ± 697 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit lengthOfLongestSubstring2("abcabcbb")
8.47 µs ± 403 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit get_len_longest_unique_substring("abcabcbb")
6.07 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
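If you don’t have IPython handy, you can get comparable numbers with the standard library’s timeit module (the harness below is my own sketch, and absolute times will of course differ by machine):

```python
import timeit


def get_len_longest_unique_substring(the_string):
    max_len = 0
    window_start = 0
    unique_chars = set()
    for char in the_string:
        while char in unique_chars:
            unique_chars.remove(the_string[window_start])
            window_start += 1
        unique_chars.add(char)
        max_len = max(len(unique_chars), max_len)
    return max_len


# Average seconds per call over 100,000 runs
n_runs = 100_000
per_call = timeit.timeit(
    lambda: get_len_longest_unique_substring("abcabcbb"),
    number=n_runs,
) / n_runs
print(f"{per_call * 1e6:.2f} µs per loop")
```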
You can see that your second iteration (pun not intended) is twice as fast due to the lack of a nested loop, while the version above is around 40% faster still due to its lower complexity.
On a slightly longer, all-unique string:
>>> %timeit lengthOfLongestSubstring(string.ascii_lowercase)
31.7 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit lengthOfLongestSubstring2(string.ascii_lowercase)
32 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit get_len_longest_unique_substring(string.ascii_lowercase)
11.6 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Both of your methods take essentially the same time, since the nested loop doesn’t come into effect, while the simpler method above is nearly three times faster.
On a very long string with many repeating segments, on the other hand:
>>> %timeit lengthOfLongestSubstring(string.ascii_lowercase * 1000)
57.6 ms ± 1.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit lengthOfLongestSubstring2(string.ascii_lowercase * 1000)
15.4 ms ± 580 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit get_len_longest_unique_substring(string.ascii_lowercase * 1000)
10.7 ms ± 259 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
You can see that your first function is nearly four times slower than your second due to the nested loop, while the simpler code above is around 40% faster again.
The moral of the story, though, is that unless you’re dealing with huge amounts of data, working on a very constrained system, or facing very tight response-time requirements, the most important thing is to first write code that is easy to read, understand and modify, as that will likely save far more net time in the long run than shaving a few microseconds off the runtime.