I am an old guy working in the area of NLP. I cut my teeth on Perl and C. I am learning Python now and am trying to write a script that takes input from a text file, preprocessor.rul, whose structure is given below, applies it to another text file, corpus.txt (the file to be processed), and writes the result to a third file, corpus.out. All the files are in UTF-8, since I work with Indic scripts.
The structure of preprocessor.rul is as follows:
The character to be changed is on the left-hand side and the replacement character on the right-hand side, the two separated by a greater-than sign (>). An example is given below, in Latin script for ease of comprehension:
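If every left-hand side really is a single character, the closest Python analogue of Perl's `tr///` is `str.translate` with a table built by `str.maketrans`. Below is a minimal sketch, assuming one rule per line with no spaces around the `>`; `build_table` is a hypothetical helper name and the rules are made-up Latin examples.

```python
def build_table(rule_lines):
    """Build a str.translate table from 'x>y' rule lines.

    Assumes every left-hand side is exactly one Unicode code point; the
    right-hand side may be any string (an empty one deletes the character).
    """
    mapping = {}
    for raw in rule_lines:
        line = raw.rstrip("\r\n")  # drop the line ending only, keep spaces
        if ">" not in line:
            continue  # skip blank or malformed lines
        source, target = line.split(">", 1)  # split at the first '>' only
        if len(source) != 1:
            raise ValueError(f"not a single code point: {source!r}")
        mapping[source] = target
    return str.maketrans(mapping)


# Hypothetical rules: a -> b, c -> d, and delete x.
table = build_table(["a>b\n", "c>d\n", "x>\n"])
print("abcx cab".translate(table))  # prints "bbd dbb"
```

Two caveats with this approach: `translate` applies all mappings at once, so with rules `a>b` and `b>c` the string "ab" becomes "bc" rather than "cc"; and many Indic letters as written are sequences of code points (a consonant plus a vowel sign or virama), which cannot be keys in a translate table.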
corpus.txt is a text file in UTF-8, and the resulting output should also be in UTF-8.
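In Python 3, `open()` decodes text with a default encoding that can depend on the platform, so for Indic text it is safest to pass `encoding` explicitly on every read and write. A minimal sketch; the helper names `read_utf8` and `write_utf8` are hypothetical, and the `"utf-8-sig"` codec is used on input on the assumption that some files may start with a byte-order mark added by an editor.

```python
import os
import tempfile


def read_utf8(path):
    # "utf-8-sig" decodes ordinary UTF-8 and also strips a leading BOM if present
    with open(path, encoding="utf-8-sig") as f:
        return f.read()


def write_utf8(path, text):
    # plain "utf-8" on output, so no BOM is written
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Round-trip a Devanagari string through a temporary file.
sample = "नमस्ते दुनिया\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    write_utf8(path, sample)
    print(read_utf8(path) == sample)  # prints True
```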
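I have attempted a script, but it does not give the desired result:

```python
import fileinput

# My attempt. Note: text_to_search and replacement_text are never defined,
# and inplace=True rewrites the rule file itself rather than corpus.txt.
with fileinput.FileInput("preprocess.rul", inplace=True, backup=".bak") as file:
    for line in file:
        print(line.replace(text_to_search, replacement_text), end="")
```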
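A minimal sketch of the whole pipeline, assuming the rule format described above (one `source>target` pair per line, split at the first `>`, no spaces around it). The function names `load_rules`, `apply_rules` and `preprocess` are hypothetical. Unlike the attempt above, it reads the rules into a list first, leaves preprocessor.rul untouched, and writes to corpus.out; it uses `str.replace`, so left-hand sides of several code points (such as a consonant plus vowel sign) work too.

```python
def load_rules(path):
    """Read 'source>target' lines from a UTF-8 rule file, keeping file order."""
    rules = []
    with open(path, encoding="utf-8-sig") as f:
        for raw in f:
            line = raw.rstrip("\r\n")
            if ">" not in line:
                continue  # skip blank or malformed lines
            source, target = line.split(">", 1)
            if source:  # an empty left-hand side would match everywhere
                rules.append((source, target))
    return rules


def apply_rules(text, rules):
    # Applied top to bottom, like a chain of Perl s///g substitutions:
    # each rule sees the output of the rules before it.
    for source, target in rules:
        text = text.replace(source, target)
    return text


def preprocess(rule_path="preprocessor.rul", in_path="corpus.txt",
               out_path="corpus.out"):
    rules = load_rules(rule_path)
    with open(in_path, encoding="utf-8-sig") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:  # stream the corpus one line at a time
            dst.write(apply_rules(line, rules))
```

Calling `preprocess()` with no arguments uses the three file names from the question. Because the corpus is processed line by line, a rule whose left-hand side contains a line break would never match; if rule order should not matter, a single regular-expression pass over the alternatives is another option.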
Any help in getting this script working would be greatly appreciated.
Since I am new to the forum, please excuse any goof-up in posting.