# Find the frequency of words of more than three letters

I want to find the frequency of all the words in my text file that have more than three letters, so I can plot a distribution curve of them.
This is my code so far:

``````
import re
for word in open('play (4).txt','r'):
    re.findall len(word>=3).count
``````

Can someone look over it and advise me on where to go with it?

This is not valid Python code.

If you really want to use regular expressions, my advice is to first remove all words of three letters or fewer, then count the rest:

``````
word = re.sub(r"\b\w{1,3}\b", "", word)
``````

Now you can count the rest
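A minimal sketch of that remove-then-count approach, assuming the text is already held in a string. The sample `text` is made up for illustration, and `collections.Counter` is just one possible way to do the counting. The pattern `\b\w{1,3}\b` is used so that one- and two-letter words are dropped along with three-letter ones.

```python
import re
from collections import Counter

# Made-up sample text; in practice this would come from reading the file.
text = ("The cat sat on the mat while another cat watched the bird "
        "and the bird watched back")

# Remove every word of one to three characters.
long_only = re.sub(r"\b\w{1,3}\b", "", text)

# Split what is left on whitespace and count each remaining word.
counts = Counter(long_only.lower().split())
print(counts)
```

Only words of four or more characters survive the substitution, so `counts` never contains short words such as "cat" or "the".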

Posting code that isn’t even Python doesn’t convince us that you’ve actually tried to do this before asking us.

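``````
dist = {}
with open('play (4).txt', 'r') as fp:
    for line in fp:                  # Read the file one line at a time.
        for word in line.split():    # Split each line on whitespace.
            if len(word) > 3:
                dist[len(word)] = 1 + dist.get(len(word), 0)
``````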

The result, `dist`, is a dictionary that maps each word length to the number of words with that length.
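A runnable sketch of that idea, using a couple of hypothetical sample lines in place of the file so it can be tried without `play (4).txt`. It builds the same kind of `dist` dictionary and then prints the lengths in ascending order.

```python
# Hypothetical sample lines standing in for the contents of the file.
lines = [
    "To be or not to be that is the question",
    "Whether tis nobler in the mind to suffer",
]

dist = {}
for line in lines:
    for word in line.split():
        if len(word) > 3:
            # Add one to the running total for this word length.
            dist[len(word)] = dist.get(len(word), 0) + 1

# Print each length with its count, shortest length first.
for length in sorted(dist):
    print(length, dist[length])
```

Note that splitting on whitespace leaves any punctuation attached to the words, so it counts towards their length.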

Hi,

I just had an idea for code I could potentially run in Python. I have to do a few things before I can actually use it, but I wanted to think ahead. I haven’t run the code I posted yet, but I thought I might be along the right lines.

For the code that you have given, would I then `print(word)`?

Are you required to use regex or can you use any other method?

I do not have to use regex; I have just been using it for a while, so I am becoming slightly more confident with it.

Here’s another solution that works for me. By looking at the different solutions you can get an idea how this works. BTW, I recommend you do a full Python tutorial.

``````
import sys  # Needed for sys.exit().
from os.path import exists
import re  # For regex.

file1 = 'play (4).txt'  # Get one file.
# See if the file actually exists.
if not exists(file1):
    print(f"ERROR: File {file1} does not exist")
    sys.exit()  # Exit program.

filein = open(file1, 'r')
textin = filein.read()  # Read the whole file into one string.
filein.close()
# Split on non-word characters which is \W.
textlist = re.split(r'\W', textin)  # Turn our file into a list.
cnt = 0
for word in textlist:  # Loop through each word.
    if len(word) > 3:
        cnt += 1

print(f"Words with at least 4 characters: {cnt}")

r'''This is my file contents:
This is the play 4 file [with brackets].
And with {braces}.
[Some more brackets].
'''
``````

Thank you for sharing

1. Why do you open the file for reading and then close it?
2. Does `re.split` split the text into one string per word, and so create a list of all the words in the document?
3. Is the count there so the computer counts every time it sees a string of more than 3/4/5 etc. letters?

Will this print each of the words with more than three letters in chronological order, like a dictionary? E.g. would it print:
the: 28
find: 22
testing: 20
etc.

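Because we already have the data in the variable `textin`. The whole file contents is in `textin` as a single string, with lines separated by `\n` (when reading in text mode, Python translates your OS's line ending, such as CRLF on Windows, to `\n`).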
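A small sketch of why closing the file is safe, assuming the contents are read into `textin` with `filein.read()` before the file is closed. The temporary file and its two lines are made up so the example is self-contained.

```python
import os
import tempfile

# Write a small temporary file so the example does not need a real text file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

filein = open(path, "r")
textin = filein.read()   # The whole file is now held in one string.
filein.close()           # Closing the file does not affect textin.

print(repr(textin))      # The string is still available after closing.
os.remove(path)          # Clean up the temporary file.
```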

Yes, this is a regular expression split on non-word characters. Word characters are `\w`, non-word characters are `\W`. This method does not include punctuation in your words, so you are only counting word characters (letters, digits and the underscore).
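To see what the split produces, here is a short sketch run on one line taken from the sample file contents in the program above. The list comprehension that drops empty strings is an optional extra, not part of the original program.

```python
import re

# One line from the sample file contents shown in the program above.
textin = "This is the play 4 file [with brackets]."

# Split on every single non-word character.
textlist = re.split(r"\W", textin)
print(textlist)

# Two separators in a row (such as " [") leave an empty string behind;
# drop those if they get in the way.
words = [w for w in textlist if w]
print(words)
```

The empty strings are harmless in the counting loop, because their length is 0 and they never pass the `len(word) > 3` test.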

Yes.

No, I just showed you some steps, I will let you do some research to do the rest. If you run the program you will see what it does.

Hi,

I have tried this code:
What I have tried to do is ask the computer to find all words in my text that are longer than three letters, hence my `finding=re.findall`. Then I have attempted to ask the computer to tell me how many of each of these words there are.

``````
file = open('file2.txt','r')
#I do not think there will be any words with more than 100 letters so I
#put that as my maximum
count = 0
for word in finding:
    word.count()
    print(word)
``````

However, with this I get the error `TypeError: 'builtin_function_or_method' object is not iterable`. So then I tried putting the `read.split()` above the `finding =re.findall(r'\b\w{3,100}\b',read.split())`, but I got the error `TypeError: expected string or bytes-like object`.

The `re.findall` in your program is not the right syntax. Did you read the docs at re — Regular expression operations — Python 3.11.8 documentation? What did you learn?
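For reference, `re.findall` takes the pattern first and then a single string, so passing it the list returned by `split()` raises the `TypeError: expected string or bytes-like object` quoted above. A minimal sketch of a call that works, using made-up sample text in place of the file contents and `collections.Counter` as one possible way to count the matches:

```python
import re
from collections import Counter

# Made-up sample text; in practice this would be the string returned by file.read().
text = "Testing finds the bugs that testing should find, and the rest is luck."

# Pattern first, then ONE string (not a list): words of four or more characters.
finding = re.findall(r"\b\w{4,}\b", text.lower())

# Count how many times each of those words occurs, most frequent first.
counts = Counter(finding)
for word, n in counts.most_common():
    print(f"{word}: {n}")
```

The quantifier `{4,}` means "four or more", so no upper limit such as 100 is needed.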