Perform Case Insensitive Search

dgdg · July 30, 2021, 3:39pm

This code works and returns the 5 lines before each line containing the string ‘JUMP’, but I need to make the search case-insensitive.

Best, Dave

from collections import deque

def search(lines, pattern, history=1):
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)

# Example use on a file
if __name__ == '__main__':
    with open('test.txt') as f:
        for line, prevlines in search(f, 'JUMP', 5):
            for pline in prevlines:
                print(pline, end='')
            print(line, end='')
            print('X'*20)

test.txt

I saw the cow
he did jump
over the moon

1
2
3
4
5
6
7
8
9
10
jump

nothing here
1
2
3
4
5
JUMP

aroberge · July 30, 2021, 7:27pm

Perhaps if pattern.lower() in line.lower() …

dgdg · July 30, 2021, 10:09pm

Thanks

Thought to add a regular expression, but it errors.

search.re(f, (/JUMP/i), 5):

Best,
Dave

cameron · July 30, 2021, 10:50pm

That’s because that isn’t valid Python code. See the “re” module for
using regular expressions in Python.
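For illustration, here is a minimal sketch of what a regex-based variant might look like, assuming the goal is still a plain-text match that ignores case. The name `search_regex` is hypothetical and not from the original post; `re.compile`, `re.escape` and `re.IGNORECASE` are standard library names.

```python
import re
from collections import deque

def search_regex(lines, pattern, history=1):
    # Hypothetical helper name, used only for this sketch.
    # re.escape() treats the pattern as literal text; drop it if you
    # actually want regex syntax in the pattern.
    regex = re.compile(re.escape(pattern), re.IGNORECASE)
    previous_lines = deque(maxlen=history)
    for line in lines:
        if regex.search(line):
            # Yield a copy so callers can keep the result after the
            # deque has moved on (the original yields the deque itself).
            yield line, list(previous_lines)
        previous_lines.append(line)
```

It would be called the same way as the original, e.g. `search_regex(f, 'JUMP', 5)`.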

Definitely the simplest and easiest thing is André’s suggestion -
lowercase both strings, then search.

Only reach for regular expressions when things get quite difficult
without them. For anything complex they’re cryptic and error prone. Not
to mention expensive - they’re inherently more complex than simple fixed
string stuff. Of course, there is a threshold where they’re a sensible
choice, but too many people reach for them as their first choice.
Generally they are a middle ground: undesirable for simple stuff,
suitable for some more complicated stuff, and undesirable again for
serious parsing (eg language grammars).

Cheers,
Cameron Simpson cs@cskk.id.au

steven.daprano · July 31, 2021, 4:57am

Hi André,

The casefold method of strings is better for case-insensitive testing.
It is similar to lowercase, but covers some unusual cases such as German
sharp s ß better.

Even there, things like Turkish dotted i and dotless ı won’t work
correctly, so if you are dealing with Turkic languages, you may need
some custom functions.
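A small sketch of the difference, assuming the same generator shape as the original `search` function. The name `search_casefold` is hypothetical; `str.casefold` is the built-in method mentioned above.

```python
from collections import deque

def search_casefold(lines, pattern, history=1):
    # Hypothetical helper name, used only for this sketch.
    needle = pattern.casefold()  # fold the pattern once, outside the loop
    previous_lines = deque(maxlen=history)
    for line in lines:
        if needle in line.casefold():
            # Yield a copy of the history rather than the live deque.
            yield line, list(previous_lines)
        previous_lines.append(line)

# lower() leaves the sharp s alone, casefold() expands it to "ss"
print("STRASSE".lower() == "straße".lower())        # False
print("STRASSE".casefold() == "straße".casefold())  # True
```

For plain ASCII input such as test.txt, `lower()` and `casefold()` give the same result, so this only matters once non-English text shows up.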

dgdg · July 31, 2021, 5:19pm

That did it! Thanks guys. Best, Dave

> from collections import deque
> 
> def search(lines, pattern, history=1):
>     previous_lines = deque(maxlen=history)
>     for line in lines:
>         if pattern.lower() in line.lower():
>             yield line, previous_lines
>         previous_lines.append(line)
> 
> # Example use on a file
> if __name__ == '__main__':
>     with open('test.txt') as f:
>         for line, prevlines in search(f, 'JUMP', 5):
>             for pline in prevlines:
>                 print(pline, end='')
>             print(line, end='')
>             print('X'*20)