# Sliding window question -- how to optimize the loop usage

This is a standard question of finding the longest substring without repeating characters. My code does work, but I have two while loops of which I am not sure any one can be further simplified:

``````class Solution:
def lengthOfLongestSubstring(self, s: str) -> int:
n = len(s)
if n == 0:
return 0
curr_str = s[0]
max_str = s[0]
max_len = 1
curr_set = set(s[0])
i = 0
j = 1

while j < n:
while j < n and s[j] not in curr_set:
curr_str = s[i:j+1]
j += 1
if len(curr_str) > max_len:
max_len = len(curr_str)
curr_set.remove(s[i])
i += 1

return max_len
``````

I have ` while j < n` in the first while loop and also in the inner loop. I could not see a way to simplify the two ` while j < n` statements. It seems that both need to exist. Is it common for such sliding window questions? Is there any good way to simplify based on the current code? Thank you!!

PS: I know this `class Solution` is an empty, useless class that does nothing, but it is the way Lxxxcode does it, and I just copy my solution. I will certainly not have this class if I code for myself.

1 Like

I redo it. I think it would be easier.

``````class Solution:
def lengthOfLongestSubstring(self, s: str) -> int:
n = len(s)
if n == 0 or n ==1:
return n
set1 = set(s[0])
i = 0  # i: left; j: right
max_len = 1

for j in range(1,n):
if s[j] not in set1:
max_len = max(max_len, j+1-i)
else:
while s[i] != s[j]:
set1.remove(s[i])
i += 1
i += 1

return max_len
``````

It might be a standard question, but not standard enough for me to know what you mean by â€ślongest substring without repeating charactersâ€ť.

Can you give an example or two of input and expected output?

Or link to a discussion where the expected behaviour is described?

Sorry. This is the question:

Given a string s, find the length of the longest substring without repeating characters.

Example 1:

Input: s = â€śabcabcbbâ€ť
Output: 3
Explanation: The answer is â€śabcâ€ť, with the length of 3.
Example 2:

Input: s = â€śbbbbbâ€ť
Output: 1
Explanation: The answer is â€śbâ€ť, with the length of 1.
Example 3:

Input: s = â€śpwwkewâ€ť
Output: 3
Explanation: The answer is â€śwkeâ€ť, with the length of 3.
Notice that the answer must be a substring, â€śpwkeâ€ť is a subsequence and not a substring.

When asking about â€śoptimizationâ€ť, its important to specify what youâ€™re optimizing forâ€”execution time? Memory usage? Fewest characters/lines? Simplest/cleanest overall code?

In any case, your second function is definitely cleaner, but it still can be simplified a good deal further, which both makes it shorter and easier to understand, and also improves its performance. You can simply iterate over each `char` in the string, check if each is `in` your running `set`, set `max_len` to the `max` of the `set`'s `len` and the current `max_len` and clear the set if so, and then add the new character to the set.

I.e.

``````def get_len_longest_unique_substring(the_string):
max_len = 0
unique_chars = set()
for char in the_string:
if char in unique_chars:
max_len = max(len(unique_chars), max_len)
unique_chars.clear()
unique_chars |= {char}
return max(len(unique_chars), max_len)
``````

You can optimize for some specific cases, like `len(set(the_string)) == 1` (all characters are the same, so `return 1`, i.e. your second case), or `len(set(the_string)) == len(the_string)` (all characters are unique, so `return len(the_string)` to perform better on shorter strings, or returning immediately if `max_len == len(set(the_string))` to do better on longer ones, at the cost of code complexity and possibly slower performance on the general case, and thereâ€™s probably some low-level trickery you can do to speed things up, at the cost of making your code less readable, but all of that is almost certainly an over-optimization unless you are dealing with on the order of millions of strings, as it will never pay back the development time it cost.

Comparing code length (as a proxy for complexity) with your solution, it weighs in at 8 lines, relative to 21 in your first solution and 18 in your second. Regarding execution time, I ran `%timeit` (via IPython) on all three:

``````>>> %timeit lengthOfLongestSubstring("abcabcbb")
16.9 Âµs Â± 697 ns per loop (mean Â± std. dev. of 7 runs, 100000 loops each)

>>> %timeit lengthOfLongestSubstring("abcabcbb")
8.47 Âµs Â± 403 ns per loop (mean Â± std. dev. of 7 runs, 100000 loops each)

>>> %timeit get_len_longest_unique_substring("abcabcbb")
6.07 Âµs Â± 118 ns per loop (mean Â± std. dev. of 7 runs, 100000 loops each)
``````

You can see that your second iteration (pun not intended) is twice as fast due to the lack of a nested loop, but the version above does 50% faster still due to the lower complexity.

On a slightly longer, all-unique string:

``````>>> %timeit lengthOfLongestSubstring(string.ascii_lowercase)
31.7 Âµs Â± 1.47 Âµs per loop (mean Â± std. dev. of 7 runs, 10000 loops each)

>>> %timeit lengthOfLongestSubstring2(string.ascii_lowercase)
32 Âµs Â± 1.79 Âµs per loop (mean Â± std. dev. of 7 runs, 10000 loops each)

>>> %timeit longest_unique_substring(string.ascii_lowercase)
11.6 Âµs Â± 275 ns per loop (mean Â± std. dev. of 7 runs, 100000 loops each)
``````

Both of your methods take essentially the same time, since the nested loop doesnâ€™t come into effect, while the simpler method above is nearly three times faster.

On a very long string with many repeating segments, on the other hand:

``````>>> %timeit lengthOfLongestSubstring(string.ascii_lowercase * 1000)
57.6 ms Â± 1.71 ms per loop (mean Â± std. dev. of 7 runs, 10 loops each)

>>> %timeit lengthOfLongestSubstring2(string.ascii_lowercase * 1000)
15.4 ms Â± 580 Âµs per loop (mean Â± std. dev. of 7 runs, 100 loops each)

>>> %timeit longest_unique_substring(string.ascii_lowercase * 1000)
10.7 ms Â± 259 Âµs per loop (mean Â± std. dev. of 7 runs, 100 loops each)
``````

You can that your first function is nearly four times lower than your second due to the nested loop, while the simpler above code is around 2/3rds faster.

The moral of the story, though, is that unless youâ€™re dealing with huge data, lots of data, working on a very constrained system or have very limited response time, the most important thing is to first write code that is easy to read, understand and modify, as that will likely save far more net time in the long run than shaving a few microseconds off the run time time.

2 Likes