A little help with a simple intersection

I have:

with open('file1.txt', 'r') as file1:
    words1 = set(file1.read().split())

with open('file2.txt', 'r') as file2:
    words2 = set(file2.read().split())

common_words = words1.intersection(words2)
print(common_words)

This works; however, I need some assistance with 1) finding the common_words such that case doesn’t matter and 2) ordering the intersected words by the order in which they are found in file2.

file1 has an alphabetical list of words, file2 is several paragraphs of text.

TIA - hox

I got a little antsy and put in my request to chat.openai.com. It gave me the following which seems to work:

with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:
    # Read the contents of each file and split them into words
    words1 = set(f1.read().lower().split())
    words2 = f2.read().lower().split()

    # Find the common words between the two files
    common_words = [word for word in words2 if word in words1]

    # Output the common words in the order they are found in file2
    for word in common_words:
        print(word)

How does it look to the pros in here?
tia,
hox

I’d open the files one at a time, unless you want to quit immediately if
you can’t open both. Personally, I’d consider that unlikely in normal
use so I’d just read each file on its own.

words1 looks fine. For a large file it is more memory efficient to
read it a line at a time instead of reading the whole file into memory
with f1.read(), e.g.:

 words1 = set()
 with open('file1.txt', 'r') as f1:
     for line in f1:
         words1.update(line.lower().split())

You can do the same to make words2.
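Since the same loop would serve both files, it could be wrapped in a small helper. This is only a sketch: the name load_words is made up for illustration, and it assumes plain text files in the default encoding.

```python
def load_words(path):
    """Return the set of lowercased, whitespace-separated words in a text file."""
    words = set()
    with open(path, 'r') as f:
        # Read one line at a time so the whole file is never held in memory.
        for line in f:
            words.update(line.lower().split())
    return words
```

With that, words1 = load_words('file1.txt') and words2 = load_words('file2.txt') would build both sets. Note that a set does not remember the order of the words in the file.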

To recite the intersecting words in file2 order, you’ve got 2
approaches:

  • load all the file2 words, then intersect with words1 - don’t forget
    that sets have an intersection operation; then scan the words from
    file2 (using a separate list of those words you also kept) and print or
    not depending on whether they’re in the intersection set
  • read file2 progressively, iterating over the words; if a word exists
    in words1, print it and (maybe) discard it from words1 if you do
    not want to repeat a word; this obviates any need for a words2 set
    etc
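
A rough sketch of both approaches, in case it helps. The function names and the unique flag are invented for this example, and each function takes any iterable of lines (an open file works) rather than a filename. Words are split on whitespace only, so "cherry." with a trailing full stop will not match "cherry".

```python
def common_two_pass(lines1, lines2):
    """Approach 1: load everything, intersect, then scan file2's words in order."""
    words1 = set()
    for line in lines1:
        words1.update(line.lower().split())
    # Keep file2's words in a list as well, because a set has no order.
    words2_list = []
    for line in lines2:
        words2_list.extend(line.lower().split())
    common = words1.intersection(words2_list)
    # Repeated words in file2 are reported every time they occur.
    return [word for word in words2_list if word in common]


def common_streaming(lines1, lines2, unique=True):
    """Approach 2: read file2 progressively and test each word against words1."""
    words1 = set()
    for line in lines1:
        words1.update(line.lower().split())
    found = []
    for line in lines2:
        for word in line.lower().split():
            if word in words1:
                found.append(word)
                if unique:
                    # Discard the word so it is reported only once.
                    words1.discard(word)
    return found


if __name__ == '__main__':
    file1_lines = ["apple\n", "banana\n", "cherry\n"]
    file2_lines = ["Cherry pie and Apple tart\n", "then more cherry jam\n"]
    for word in common_streaming(file1_lines, file2_lines):
        print(word)
```

With real files you would pass the open file objects instead of the lists. The second approach never builds a words2 collection at all, which is the point of it for a large file2.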

Cheers,
Cameron Simpson cs@cskk.id.au
