Count() doesn't count all occurrences

I wonder why count() doesn’t count all occurrences. I found that the results of the following code miss several occurrences of the substrings "_2_" and "_2." (the underscores stand for spaces).

pocty_pismen_v_textu = "1 9 8 8 9 2 3 9 6 4 3 5 2 2 7 7 5 1 15 3 2 6 3 8 4 2 3 4 6. 1 4 7 9 2 8 9 5 8 4 2 2 3 9 1 10 7 4 2 8 5. 9 2 4 9 7 5 1 10 2 7 7 3. 3 2 2 7 5 5 4 3 6 4 5 4 8 8 4 2 2 6. 5 2 6 7 2 9 8 5 7 5 10 9 1 10 13 4 2 8 2 2 8 10 2. 10 6 6 1 9. 5 3 9 2 8 8 11 6 5 4 9 3 10 5 10 4 6. 7 10 9 2 2 8 9 9 6 1 10 2 5 3 4 7 7 1 7 10. 9 9 10 6 2 12 7 9 2 5 8 7 7. 6 4 10 7 2 10 11 11 7 6 9 1 10 7 2 7 10 10 4 8 5 6 4 1 7 4 8. 4 4 5 2 4 10 3 3 2 8 7 2 6 8 2 3 1 9 6 1 6 7 1 8 11. 4 12 3 3 2 11 1 14 8 2 2 5 10 10. 5 3 7 5 2 6 4 11 3 7 2 2 7 13 1 10 10 6 2 8. 9 6 4 5 4 2 8. 8 5 4 9 1 6 5 6 3 1 1 5 5. 5 1 11 8 6 6 7 1 10 1 7 11 5 2 7 2 8 1 14."

print((pocty_pismen_v_textu.count(" 1 ")))
print((pocty_pismen_v_textu.count(" 1.")))
print((pocty_pismen_v_textu.count(" 2 ")))
print((pocty_pismen_v_textu.count(" 2.")))

The long string holds the lengths of the words in a series of sentences. I built it as a string because I wanted to display the numbers without the commas that e.g. a tuple would show. I am counting the number of words which have 3 or 4 letters.

When I try the code, I get these results:

>>> pocty_pismen_v_textu = "1 9 8 8 9 2 3 9 6 4 3 5 2 2 7 7 5 1 15 3 2 6 3 8 4 2 3 4 6. 1 4 7 9 2 8 9 5 8 4 2 2 3 9 1 10 7 4 2 8 5. 9 2 4 9 7 5 1 10 2 7 7 3. 3 2 2 7 5 5 4 3 6 4 5 4 8 8 4 2 2 6. 5 2 6 7 2 9 8 5 7 5 10 9 1 10 13 4 2 8 2 2 8 10 2. 10 6 6 1 9. 5 3 9 2 8 8 11 6 5 4 9 3 10 5 10 4 6. 7 10 9 2 2 8 9 9 6 1 10 2 5 3 4 7 7 1 7 10. 9 9 10 6 2 12 7 9 2 5 8 7 7. 6 4 10 7 2 10 11 11 7 6 9 1 10 7 2 7 10 10 4 8 5 6 4 1 7 4 8. 4 4 5 2 4 10 3 3 2 8 7 2 6 8 2 3 1 9 6 1 6 7 1 8 11. 4 12 3 3 2 11 1 14 8 2 2 5 10 10. 5 3 7 5 2 6 4 11 3 7 2 2 7 13 1 10 10 6 2 8. 9 6 4 5 4 2 8. 8 5 4 9 1 6 5 6 3 1 1 5 5. 5 1 11 8 6 6 7 1 10 1 7 11 5 2 7 2 8 1 14."
>>> print((pocty_pismen_v_textu.count(" 1 ")))
21
>>> print((pocty_pismen_v_textu.count(" 1.")))
0
>>> print((pocty_pismen_v_textu.count(" 2 ")))
34
>>> print((pocty_pismen_v_textu.count(" 2.")))
1

Are those the results you get?

Exactly what do you think the results should be instead, and why?

Can you demonstrate the problem using a shorter input?

Yes, I am getting the same results.

I think the results should be:

>>> print((pocty_pismen_v_textu.count(" 2 ")))
42

It should be higher, because when I count the occurrences of 2 by hand (see image), I get 42.

The code is missing 8 occurrences of " 2 ". So I wonder whether I understand how count() works, because when I count the occurrences of " 2 2 " I also get 8. It looks like it might be removing the first " 2 " from the string, leaving "2 " behind, which I am not counting. Does it work that way?

There just aren’t that many 2s in there. Generally whenever you find yourself questioning the core Python library for something this simple, first consider if you’ve done-goofed and messed up yourself. You probably have.

I did a Ctrl+F "Find All" in Notepad++. There are only 35 occurrences of " 2 " and 2 of " 2.".

Couldn’t you have demonstrated the issue with just print('1 2 2 3'.count(' 2 ')) (returning 1, expecting 2)?

I’m not questioning the core Python library. How did you come to that conclusion? I wonder where the mistake is, and it can easily be on my side.

But I cannot agree with your count. I counted by hand above, and it actually gives me 43. When I put the string into an online tool, I get the same number.

So, if there is no error in the doubled 2 (you didn’t confirm that), is there an error in the gap?

Ah.

>>> help(str.count)
count(...)
    S.count(sub[, start[, end]]) -> int
    
    Return the number of non-overlapping occurrences of substring sub in
    string S[start:end].  Optional arguments start and end are
    interpreted as in slice notation.

Note: non-overlapping. If you look for " 2 " with spaces, then "1 2 2 1" would have overlapping matches.
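To make the difference concrete, here is a minimal sketch on that short string. The helper name count_overlapping is made up for illustration and is not part of the standard library; it simply calls str.find repeatedly and advances one character at a time, so matches that share a space are all found. Since the real goal seems to be counting word lengths, the last lines show an alternative that splits the string into tokens, assuming the periods only mark sentence ends and can be dropped.

```python
from collections import Counter

text = "1 2 2 1"

# str.count resumes after the end of each match, so the shared space is used once.
print(text.count(" 2 "))  # 1


def count_overlapping(s, sub):
    """Count occurrences of sub in s, including overlapping ones (hypothetical helper)."""
    count = 0
    pos = s.find(sub)
    while pos != -1:
        count += 1
        # Advance by one character only, so overlapping matches are not skipped.
        pos = s.find(sub, pos + 1)
    return count


print(count_overlapping(text, " 2 "))  # 2

# Alternative: drop the periods, split on whitespace, and count whole tokens.
sample = "1 2 2 1. 3 2."
token_counts = Counter(sample.replace(".", "").split())
print(token_counts["2"])  # 3
```

Counting tokens avoids the problem altogether, because no token has to share its surrounding spaces with a neighbour.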

Maybe I could have, but I am a newbie and the idea didn’t occur to me.

Thanks, that explains the cause. Even though I had read the help, it didn’t occur to me to look at it again when the code didn’t work as expected. I will try to keep that in mind next time.

@Juandev - I wouldn’t feel too bad about that. I’ve been bitten a few times by the fact that standard string search is “non-overlapping”, even though I’ve been writing Python for a long time. This is the case both in normal string search and in regex searches.
(There are other string search algorithms, besides the standard builtin and regex ones, that can return overlapping results, if you ever want those.)

You can use regex for overlapping occurrences, though.
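One common way to do that, sketched here on a short sample string, is to wrap the pattern in a lookahead. A lookahead matches at a position without consuming any characters, so the next search can start just one character further on. The variable names are only for illustration.

```python
import re

text = "1 2 2 1"

# A plain pattern consumes the characters it matches, so results cannot overlap.
plain_matches = re.findall(r" 2 ", text)
print(len(plain_matches))  # 1

# The lookahead (?=...) is zero-width: it finds every position where " 2 " starts.
overlapping_matches = re.findall(r"(?= 2 )", text)
print(len(overlapping_matches))  # 2
```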

Of course you can. My point was simply that people sometimes forget that regex search by default also only returns non-overlapping matches. Also, this problem can kind of sneak up on you, even if you are fully aware of the general way regexes work, since the regex could be dynamically constructed and contain partially overlapping alternatives…
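A small sketch of that last point, using a made-up pattern and input: two alternatives that can overlap in the text produce only one match in a default search, and wrapping them in a lookahead with a capturing group recovers both.

```python
import re

text = "abc"

# "ab" matches first and consumes "b", so "bc" is never found.
default_result = re.findall(r"ab|bc", text)
print(default_result)  # ['ab']

# A zero-width lookahead with a capturing group reports every alternative
# that matches at each position, including overlapping ones.
overlapping_result = re.findall(r"(?=(ab|bc))", text)
print(overlapping_result)  # ['ab', 'bc']
```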