I have written a short program that creates a dictionary with the number of each letter in a text file. Thanks to your help here, I am quite happy with the result.
However, I keep running into one irritating error.
The code below should count the letters for a given language, but it raises this error:
NameError: name 'dutch' is not defined
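The message can be reproduced in isolation. This is only a minimal sketch: the helper `show` is hypothetical and not part of the program. It assumes the cause is that an unquoted word is looked up as a variable name, while a quoted word is a string.

```python
# Hypothetical minimal reproduction, not part of the actual program.
def show(taal):
    # Build a file name from a language name given as a string
    return taal + ".txt"

print(show("dutch"))  # quoted: a string literal is passed

try:
    show(dutch)  # unquoted: Python looks for a variable called dutch
except NameError as err:
    message = str(err)
    print(message)
```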
What I am trying to do is pass the language as the function argument, have the function look for the file named after it (for example dutch.txt), and run the counter on that file.
I assume that this is again an issue with Python not accepting a string as part of a file path, but I have tried several ways to work around it, without success.
The program works when I enter 'dutch.txt' as the argument, but I would prefer to pass just the language name.
This is the code:
"""
This program builds a dictionary from a given text, containing the count of each letter
"""
import os
from pathlib import Path
languages = ["dutch", "english", "french", "german", "italian", "spanish"]
source_dir = Path('C:/Users/gwovi/PycharmProjects/VickyGwosdz_LanguageDetector')
files = source_dir.glob('*.txt')
# Function that creates a dictionary with the count of each letter
def letterfrequency(input_tekst):
    d = dict()  # create a new dictionary for the file to be counted
    for letter in input_tekst:  # if the letter is already in the dict, add 1; otherwise set the counter to 1
        if letter in d:
            d[letter] += 1
        else:
            d[letter] = 1
    return d
# Combine all txt files per language into one large input file
for file in os.listdir(source_dir):
    if file.endswith(".txt"):
        # Iterate through the list of languages
        for language in languages:
            if language in file:
                # For each matching language, open the file, read it and close it,
                # then append its contents to the combined txt file for that language.
                # os.listdir returns bare file names, so join them with source_dir.
                with open(source_dir / file, 'r', encoding='utf-8') as f2:
                    input_txt = f2.read()
                with open(language + '.txt', 'a+', encoding='utf-8') as f3:
                    f3.write(input_txt)
# Analyse requested language
def languagefrequencycounter(taal):
    tetellentaal = taal.read_text(encoding='utf_8')
    no_space_taal = tetellentaal.replace(" ", "")  # Remove all spaces
    getelde_taal = letterfrequency(no_space_taal)
    with open(taal + '-geteld.txt', 'w', encoding='utf-8') as f4:
        f4.write(str(getelde_taal))

languagefrequencycounter(dutch)
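
For comparison, here is a minimal sketch of how the intended call could look. It assumes the language is passed as a quoted string and the file name is built from it inside the function. The second parameter `source_dir` and the temporary demo directory are assumptions added for illustration, not part of the program above. `letterfrequency` is repeated in compact form so the sketch runs on its own.

```python
from pathlib import Path
import tempfile


def letterfrequency(input_tekst):
    # Count how often each character occurs in the text
    d = {}
    for letter in input_tekst:
        d[letter] = d.get(letter, 0) + 1
    return d


def languagefrequencycounter(taal, source_dir):
    # taal is a plain string such as "dutch"; source_dir is an assumed extra
    # parameter pointing at the folder that holds the combined txt files.
    input_path = Path(source_dir) / (taal + ".txt")
    tetellentaal = input_path.read_text(encoding="utf-8")
    no_space_taal = tetellentaal.replace(" ", "")  # Remove all spaces
    getelde_taal = letterfrequency(no_space_taal)
    output_path = Path(source_dir) / (taal + "-geteld.txt")
    output_path.write_text(str(getelde_taal), encoding="utf-8")
    return getelde_taal


# Demo in a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dutch.txt").write_text("een aap", encoding="utf-8")
    result = languagefrequencycounter("dutch", tmp)
    output_exists = (Path(tmp) / "dutch-geteld.txt").exists()

print(result)
```

In this version the same string serves both to find the input file and to name the output file, so the argument no longer has to be a `Path` object or a complete file name.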