Substring replace using variables as pattern and replacement string

I have a file containing records with 4 values, e.g.:
1,Genesis,Gen,Gen
2,Eksodus,Eks,Exo
3,Levitikus,Lev,Lev
4,Numeri,Num,Num
5,Deuteronomium,Deut,Deu
6,Josua,Jos,Jos
7,Rigters,Rig,Jdg
8,Rut,Rut,Rth
9,1 Samuel,1 Sam,1Sa

I need to read these into an array, then read a second file and, in each of its records, replace every occurrence of the 3rd field from the first file (Gen, Eks, Lev, Num, Deut, etc.) with the 1st field followed by a full stop (1., 2., 3., etc.).

So if the second file has a record reading:

Ps. 19:3; Num 3:1…
This needs to be replaced by:
19.19:3; 4.3:1…

How would I achieve this? I have just started learning Python, so I am really a novice.

I have a sample set of files and my attempt which I can send to someone who is prepared to look at it.

Have a look at the csv module.
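
As a rough sketch of what that could look like: the snippet below uses csv.reader to build a lookup from the 3rd field (the abbreviation) to the 1st field (the book number). The sample rows are copied from the question and held in an in-memory buffer so the snippet runs on its own; the name abbrev_to_number is just an illustrative choice, and for the real file you would pass an opened "Boeke.txt" instead.

```python
import csv
import io

# Sample rows from the question, in memory so the sketch is self-contained.
# For the real file, something like this should work instead:
#   open("Boeke.txt", newline="", encoding="utf-8")
sample = io.StringIO(
    "1,Genesis,Gen,Gen\n"
    "2,Eksodus,Eks,Exo\n"
    "3,Levitikus,Lev,Lev\n"
    "4,Numeri,Num,Num\n"
    "5,Deuteronomium,Deut,Deu\n"
    "9,1 Samuel,1 Sam,1Sa\n"
)

abbrev_to_number = {}  # hypothetical name: abbreviation -> book number
for row in csv.reader(sample):
    if len(row) < 3:
        continue  # skip blank or malformed lines
    abbrev_to_number[row[2]] = row[0]

print(abbrev_to_number["Num"])    # prints: 4
print(abbrev_to_number["1 Sam"])  # prints: 9
```

A dict keyed by abbreviation avoids having to keep two parallel lists in step.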

My problem is not so much getting the data into an array; it is that my “replace” does not work.

Here is my code:

# Convert
import codecs
import re
def FixXrefs(xref):
      global bookname
      global bookno
      lg.write("----- Entered function\n")
      instuff=xref
      otstuff=""
      n=len(bookname)
      for i in bookno:
          j=int(i)-1
          pattern = bookname[j]+" "
          replaceWith = i+"."
          #lg.write("|"+pattern+"|"+replaceWith+"|\n")
          # otstuff=instuff.replace(pattern, replaceWith)
          #ostuff=re.sub(pattern,replaceWith,instuff)
          xs=instuff.find(pattern)
          xe=xs+1
          print (xs, xe, pattern)
          loop=True
          if xs>0:
             while loop==True:
                   if instuff[xs:xe]==pattern:
                      print("Pattern: ",pattern,"found") 
                      loop=False
                   xe=xe+1   
             otstuff=instuff[0:xs]+replaceWith+instuff[xe:len(instuff)]
      instuff=otstuff.replace(":", ".")
      return instuff
              
lg=open("log.txt", "w")
bookname=[]
bookno=[]
#set up bookname/bookno table
bn=codecs.open("Boeke.txt", "r", "utf-8")
while True:
      l=bn.readline()
      l.strip('\n')
      if ("" == l):
            lg.write("End of bookname/no file reached\n")
            break
      #parse string
      words = l.split(",")
      bookno.append(words[0])
      bookname.append(words[2])
bn.close()
# Now read the file to reformat
fn=codecs.open("Conv.txt", "r", "utf-8")
while True:
      irec=fn.readline()
      pline="INREC\n"+irec
      lg.write(pline)
      if ("" == irec):
        lg.write ("End of Conv file reached\n")
        break;
      nxref = FixXrefs(irec)
      pline="NXREF\n"+nxref
      lg.write(pline)
lg.close()
fn.close()
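
For comparison, here is a minimal sketch of one way the replacement step could be done with re.sub. In the loop above, every match rebuilds otstuff from the unchanged instuff, so at most one replacement survives, and the test xs>0 skips a match at position 0. The sketch instead builds a single pattern from all abbreviations and replaces them in one pass. The names build_replacer and books are made up for illustration, the optional full stop after the abbreviation is an assumption based on the “Ps.” example, and the sketch follows the expected output in the question (colons are left alone).

```python
import re

def build_replacer(abbrev_to_number):
    """Return a function that replaces 'Abbrev ' (or 'Abbrev. ') with 'N.'."""
    # Try longer abbreviations first so that "1 Sam" is not shadowed
    # by a shorter name that happens to be a prefix of it.
    names = sorted(abbrev_to_number, key=len, reverse=True)
    # re.escape guards against abbreviations containing regex metacharacters.
    alternatives = "|".join(re.escape(name) for name in names)
    # Abbreviation, optional full stop, then at least one whitespace character.
    pattern = re.compile(r"\b(" + alternatives + r")\.?\s+")

    def replace(text):
        return pattern.sub(lambda m: abbrev_to_number[m.group(1)] + ".", text)

    return replace

# A few rows from the sample table in the question.
books = {"Gen": "1", "Eks": "2", "Lev": "3", "Num": "4", "Deut": "5", "1 Sam": "9"}
fix_xrefs = build_replacer(books)

print(fix_xrefs("Gen 1:1; Num 3:1; 1 Sam 2:3"))  # prints: 1.1:1; 4.3:1; 9.2:3
```

Because the pattern requires whitespace after the abbreviation, a full name such as “Numeri” is not touched.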

I can see that you’re writing the ‘fixed’ line to “log.txt”, but not anywhere else.

Incidentally, is there any particular reason why you’re using codecs.open(...) instead of just open(...)?

Hi Matthew,
I am just writing it to log.txt to see the result (when it works, it will be written to an output file).
The input files are UTF-8, and I thought that codecs.open was what is required for that.
I tried without codecs and got:

Traceback (most recent call last):
  File "C:/Python311/Test.py", line 52, in <module>
    irec=fn.readline()
  File "C:\Python311\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 353: character maps to <undefined>

Instead of:

bn=codecs.open("Boeke.txt", "r", "utf-8")

you’d use:

bn = open("Boeke.txt", "r", encoding="utf-8")

and so on.
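
A small sketch of that pattern, assuming nothing beyond the built-in open: a with block closes the file automatically, and iterating over the file object replaces the while True / readline() loop. The helper name read_lines is hypothetical, and the temporary file only exists so the snippet can run by itself.

```python
import os
import tempfile

def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, "r", encoding="utf-8") as f:
        # Iterating over the file yields one line at a time until EOF.
        return [line.rstrip("\n") for line in f]

# Demo with a throwaway file standing in for "Boeke.txt".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Boeke.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("1,Genesis,Gen,Gen\n2,Eksodus,Eks,Exo\n")
    lines = read_lines(path)

print(lines)  # prints: ['1,Genesis,Gen,Gen', '2,Eksodus,Eks,Exo']
```

Note that str.rstrip returns a new string, so its result has to be used or assigned; calling it on its own line leaves the original unchanged.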

Incidentally, Windows stores UTF-8 in files with a BOM at the start, but other OSes like Linux don’t, so, when reading, it’s probably better to specify the encoding as “utf-8-sig” instead of “utf-8”.

Very very few Windows programs store a BOM and they should be considered buggy. Don’t coddle them. Just use “utf-8”.
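
To see what difference the two encodings make in practice, here is a small demonstration using throwaway files. It only shows how “utf-8” and “utf-8-sig” behave when reading; which one to prefer is the judgment call discussed above. The file names are made up for the demo.

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with_bom = os.path.join(tmp, "with_bom.txt")
    without_bom = os.path.join(tmp, "without_bom.txt")

    # Write one file with a UTF-8 BOM and one without.
    with open(with_bom, "wb") as f:
        f.write(codecs.BOM_UTF8 + "Gen 1:1\n".encode("utf-8"))
    with open(without_bom, "wb") as f:
        f.write("Gen 1:1\n".encode("utf-8"))

    # "utf-8" keeps the BOM as a leading U+FEFF character.
    with open(with_bom, encoding="utf-8") as f:
        plain = f.read()
    # "utf-8-sig" strips a BOM if one is present...
    with open(with_bom, encoding="utf-8-sig") as f:
        sig = f.read()
    # ...and reads a BOM-less file unchanged.
    with open(without_bom, encoding="utf-8-sig") as f:
        sig_no_bom = f.read()

print(repr(plain))       # prints: '\ufeffGen 1:1\n'
print(repr(sig))         # prints: 'Gen 1:1\n'
print(repr(sig_no_bom))  # prints: 'Gen 1:1\n'
```

A stray U+FEFF at the start of the first line matters here because it would become part of the first field and stop it from matching.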