Count() doesn't count all occurrences

Juandev · September 21, 2023, 8:44am

I wonder why count() doesn’t count all occurrences. I found that the results of the following code miss several occurrences of "_2_" and "_2." (where "_" stands for a space).

pocty_pismen_v_textu = "1 9 8 8 9 2 3 9 6 4 3 5 2 2 7 7 5 1 15 3 2 6 3 8 4 2 3 4 6. 1 4 7 9 2 8 9 5 8 4 2 2 3 9 1 10 7 4 2 8 5. 9 2 4 9 7 5 1 10 2 7 7 3. 3 2 2 7 5 5 4 3 6 4 5 4 8 8 4 2 2 6. 5 2 6 7 2 9 8 5 7 5 10 9 1 10 13 4 2 8 2 2 8 10 2. 10 6 6 1 9. 5 3 9 2 8 8 11 6 5 4 9 3 10 5 10 4 6. 7 10 9 2 2 8 9 9 6 1 10 2 5 3 4 7 7 1 7 10. 9 9 10 6 2 12 7 9 2 5 8 7 7. 6 4 10 7 2 10 11 11 7 6 9 1 10 7 2 7 10 10 4 8 5 6 4 1 7 4 8. 4 4 5 2 4 10 3 3 2 8 7 2 6 8 2 3 1 9 6 1 6 7 1 8 11. 4 12 3 3 2 11 1 14 8 2 2 5 10 10. 5 3 7 5 2 6 4 11 3 7 2 2 7 13 1 10 10 6 2 8. 9 6 4 5 4 2 8. 8 5 4 9 1 6 5 6 3 1 1 5 5. 5 1 11 8 6 6 7 1 10 1 7 11 5 2 7 2 8 1 14."

print(pocty_pismen_v_textu.count(" 1 "))
print(pocty_pismen_v_textu.count(" 1."))
print(pocty_pismen_v_textu.count(" 2 "))
print(pocty_pismen_v_textu.count(" 2."))

The long string holds the lengths of the words in each sentence. I built it as a string because I wanted to display it without the commas that would appear if it were, e.g., a tuple. I am counting the number of words that have 3 or 4 letters.
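For the underlying task (how many words have a given length), a minimal sketch that sidesteps substring matching altogether: split the string into tokens and count whole tokens with `collections.Counter`. It assumes the lengths are separated by whitespace and that a trailing "." only marks the end of a sentence; the helper name `word_length_counts` and the `sample` string are hypothetical.

```python
from collections import Counter


def word_length_counts(text: str) -> Counter:
    """Count how often each word length occurs (hypothetical helper).

    Assumes lengths are whitespace-separated and that a trailing "."
    only marks the end of a sentence, so it is replaced by a space.
    """
    tokens = text.replace(".", " ").split()
    return Counter(int(token) for token in tokens)


sample = "1 9 2 2 6. 3 4 2."
counts = word_length_counts(sample)
print(counts[2])              # 3
print(counts[3] + counts[4])  # words with 3 or 4 letters: 2
```

If the lengths are kept as a list of ints in the first place (call it `lengths`, a hypothetical name), `" ".join(map(str, lengths))` still gives the comma-free display, and `Counter(lengths)` can be used directly.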

kknechtel · September 21, 2023, 9:01am

When I try the code, I get these results:

>>> pocty_pismen_v_textu = "1 9 8 8 9 2 3 9 6 4 3 5 2 2 7 7 5 1 15 3 2 6 3 8 4 2 3 4 6. 1 4 7 9 2 8 9 5 8 4 2 2 3 9 1 10 7 4 2 8 5. 9 2 4 9 7 5 1 10 2 7 7 3. 3 2 2 7 5 5 4 3 6 4 5 4 8 8 4 2 2 6. 5 2 6 7 2 9 8 5 7 5 10 9 1 10 13 4 2 8 2 2 8 10 2. 10 6 6 1 9. 5 3 9 2 8 8 11 6 5 4 9 3 10 5 10 4 6. 7 10 9 2 2 8 9 9 6 1 10 2 5 3 4 7 7 1 7 10. 9 9 10 6 2 12 7 9 2 5 8 7 7. 6 4 10 7 2 10 11 11 7 6 9 1 10 7 2 7 10 10 4 8 5 6 4 1 7 4 8. 4 4 5 2 4 10 3 3 2 8 7 2 6 8 2 3 1 9 6 1 6 7 1 8 11. 4 12 3 3 2 11 1 14 8 2 2 5 10 10. 5 3 7 5 2 6 4 11 3 7 2 2 7 13 1 10 10 6 2 8. 9 6 4 5 4 2 8. 8 5 4 9 1 6 5 6 3 1 1 5 5. 5 1 11 8 6 6 7 1 10 1 7 11 5 2 7 2 8 1 14."
>>> print((pocty_pismen_v_textu.count(" 1 ")))
21
>>> print((pocty_pismen_v_textu.count(" 1.")))
0
>>> print((pocty_pismen_v_textu.count(" 2 ")))
34
>>> print((pocty_pismen_v_textu.count(" 2.")))
1

Are those the results you get?

Exactly what do you think the results should be instead, and why?

Can you demonstrate the problem using a shorter input?

Juandev · September 21, 2023, 9:45am

Yes, I am getting the same results.

I think the results should be:

>>> print((pocty_pismen_v_textu.count(" 2 ")))
42

They should be higher, because when I count the occurrences of 2 by hand (see image), I get 42.

The code is missing 8 occurrences of " 2 ". So I wonder whether I understand how count() works, because if I count the occurrences of " 2 2 ", I also get 8. It looks as if it might be consuming the first " 2 " from the string, leaving "2 ", which my pattern does not match. Does it work that way?

JamesParrott · September 21, 2023, 10:06am

There just aren’t that many 2s in there. Generally whenever you find yourself questioning the core Python library for something this simple, first consider if you’ve done-goofed and messed up yourself. You probably have.

I did a Ctrl+F "Find All" in Notepad++. There are only 35 occurrences of " 2 " and 2 of " 2.".

[Screenshot: Ctrl+F]

pochmann · September 21, 2023, 10:30am

Couldn’t you have demonstrated the issue with just print('1 2 2 3'.count(' 2 ')) (returning 1, expecting 2)?

Juandev · September 21, 2023, 10:38am

I’m not questioning the core Python library. What made you think I was? I’m wondering where the mistake is, and it could easily be on my side.

But I can’t agree with your count. I counted by hand above and actually get 43, and an online tool gives me the same number.

So, if there is no error in the doubled 2 (you didn’t confirm that), is there an error in the gap?

kknechtel · September 21, 2023, 10:39am

Ah.

>>> help(str.count)
count(...)
    S.count(sub[, start[, end]]) -> int
    
    Return the number of non-overlapping occurrences of substring sub in
    string S[start:end].  Optional arguments start and end are
    interpreted as in slice notation.

Note: non-overlapping. If you look for " 2 " with spaces, then "1 2 2 1" contains two matches that overlap on the middle space, so only the first one is counted.
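To see why "1 2 2 1" yields only one match, here is a hedged sketch that mimics non-overlapping counting with `str.find`: after each hit, the search resumes just past the end of that hit, so the space shared by the two " 2 " candidates has already been consumed. Resuming one character after the start of each hit instead gives an overlapping count. The function names are hypothetical, and this illustrates the behaviour rather than how CPython actually implements `str.count`.

```python
def count_non_overlapping(s: str, sub: str) -> int:
    """Mimic str.count for a non-empty sub: resume after each whole match."""
    if not sub:
        raise ValueError("sub must be non-empty")  # avoid an infinite loop
    count, start = 0, 0
    while (pos := s.find(sub, start)) != -1:
        count += 1
        start = pos + len(sub)  # skip past the entire match
    return count


def count_overlapping(s: str, sub: str) -> int:
    """Resume one character after the start of each match instead."""
    count, start = 0, 0
    while (pos := s.find(sub, start)) != -1:
        count += 1
        start = pos + 1  # the next match may reuse characters of this one
    return count


s = "1 2 2 1"
print(s.count(" 2 "))                   # 1
print(count_non_overlapping(s, " 2 "))  # 1
print(count_overlapping(s, " 2 "))      # 2
```

This matches the guess earlier in the thread: the first " 2 " takes the middle space with it, so the next candidate starts with "2 " and no longer matches.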

Juandev · September 21, 2023, 10:40am

Maybe I could have, but I am a newbie and that idea didn’t occur to me.

Juandev · September 21, 2023, 10:43am

Thanks, that explains the cause. Even though I had read the help, it didn’t occur to me to look at it again when the code didn’t work as expected. I will try to keep that in mind next time.

hansgeunsmeyer · September 21, 2023, 6:26pm

@Juandev - I wouldn’t feel too bad about that - I’ve been bitten by the issue that standard string search is “non-overlapping” a few times even though I’ve been writing Python for a long time. This is the case both in normal string search and in regex searches.
(There are actually other string search algorithms, besides the standard built-in and regex ones, that can return overlapping results, if you ever want those.)
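A small illustration of the point that regex search is non-overlapping as well: `re.findall` and `re.finditer` also resume scanning after the end of each match, so on the "1 2 2 1" example they agree with `str.count`.

```python
import re

s = "1 2 2 1"
print(s.count(" 2 "))                               # 1
print(len(re.findall(" 2 ", s)))                    # 1: regex matches don't overlap either
print([m.start() for m in re.finditer(" 2 ", s)])   # [1]
```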

pochmann · September 21, 2023, 6:49pm

You can use regex for overlapping occurrences, though.
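One common way to do that (a sketch, not code from the thread) is a zero-width lookahead: `(?= 2 )` matches an empty string at every position where " 2 " begins, so consecutive matches never consume the shared space. A different route that avoids the overlap question is to match whole "2" tokens with word boundaries, which also picks up "2." at a sentence end and skips the 2 inside "12".

```python
import re

s = "1 2 2 1"
# The lookahead consumes no characters, so every start position is checked.
print(len(re.findall(r"(?= 2 )", s)))   # 2

# Alternative: count standalone "2" tokens, including one followed by ".".
text = "1 2 2 1. 12 2."
print(len(re.findall(r"\b2\b", text)))  # 3
```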

hansgeunsmeyer · September 21, 2023, 7:01pm

Of course you can. My point was simply that people sometimes forget that regex search by default also only returns non-overlapping matches. Also, this problem can kind of sneak up on you, even if you are fully aware of the general way regexes work, since the regex could be dynamically constructed and contain partially overlapping alternatives…
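As a sketch of how partially overlapping alternatives can bite, assume a pattern built at runtime by joining search terms with `|` (the terms here are made up): once one alternative matches, the scan resumes after it, so an alternative starting inside that match is never reported. Wrapping the alternation in a capturing lookahead reports both.

```python
import re

terms = ["ab", "bc"]  # hypothetical, dynamically supplied search terms
pattern = "|".join(map(re.escape, terms))  # "ab|bc"

text = "abc"
print(re.findall(pattern, text))             # ['ab']  ("bc" is skipped)
print(re.findall(f"(?=({pattern}))", text))  # ['ab', 'bc']
```

The lookahead version finds every start position, so it will also report matches that overlap, which may or may not be what you want.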