My code is not doing anything to my file when I run it

This is my code below:

a = open('play (4).txt','r')
d = a.read()
e = d.replace('\[\[ \]\]','')
f = d.replace('\{ \}','')
a.close()

q = open('play (4).txt','w')
w = q.write(e)
q.write(f)
q.write(g)
q.close()

but when I run it, I can’t see any changes happening in my text.
I wanted to remove all the areas that have some text inside [ ], { } , but it is not working!!

Quick question: are you trying to replace brackets using regular expressions or not? Because now you are not using regexps.

@FelixLeg’s diagnosis seems correct!

Even if you were using regular expressions, the strings you’re using would only delete strings looking like [[ ]] and { }, with exactly one space between them.

The correct regular expressions are tricky - the second one would be this: r'\{[^}]*\}'. You need the r in front of the string so that the backslashes reach the regex engine unchanged. The first one would be significantly harder to write because of the doubled [[.

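Also, you are writing both e and f, which will duplicate most of the file, and you then write g, which isn’t even defined, so the script should stop with a NameError at that line. :astonished:

Also, closing streams manually is not a good idea: if an exception is raised before close() is reached, the file stays open, and it is easy to forget the call altogether.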

Streams that are obtained by open() act as context managers, so you can do this:

with open('play (4).txt') as fp:
    d = fp.read()
#
# Edit the string d here
#
with open('play (4).txt', 'w') as fp:
    fp.write(d)

Hi,

Thank you for the response.
I was using regexps but I wasn’t getting any results. I think at this point I am trying anything I can because I have been working on this for ages.

This is the code I have been working on:

import re
with open('play (4).txt') as play:
    d = play.read()
    
e = re.sub(r'\{[^]*\}','',d)

with open('play (4).txt','w') as plays:
    plays.write(e)

but I get an error: unterminated character set
I am so so lost

okay so I think I fixed it so I do not get the error anymore:

import re
with open('play (4).txt') as play:
    d = play.read()
    
e = re.sub(r'\{\[^\]*\}','',d)

with open('play (4).txt','w') as plays:
    plays.write(e)

However, the only output I got was the runcell line. Nothing has changed in my play (4).txt document (i.e. nothing enclosed in [ ] and { } has been removed from my file).

A ^ placed first inside a [] pair creates a negated character class. Because of that, the re module’s regex compiler tries to read which characters to negate. A ] directly after the ^ is taken as one of those characters rather than as the end of the class, so the class is never closed and the compiler throws the error you mentioned.
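To make the error concrete, here is a minimal sketch. It assumes the pattern that was intended earlier in the thread was r'\{[^}]*\}' (a negated class that excludes the closing brace); the sample string is made up for illustration.

```python
import re

# The pattern from the failing code: the "]" right after "^" is read as a
# member of the class, so the class never gets a closing bracket.
try:
    re.compile(r'\{[^]*\}')
    compiled_ok = True
    error_message = ''
except re.error as exc:
    compiled_ok = False
    error_message = str(exc)

# Assumed intended pattern: "{", then any run of characters except "}", then "}".
sample = 'keep {drop this} keep'  # made-up sample text
cleaned = re.sub(r'\{[^}]*\}', '', sample)

print(compiled_ok, error_message)
print(repr(cleaned))
```

Because [^}] also matches newline characters, this form should remove brace blocks that span several lines without needing any extra flag.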

If you want to find a string with many ^-s between { and } then this will work:

r'\{\^*\}'

However I don’t know if that is what you really want. Could you elaborate?

Aha, so that’s what you want :wink:

Then I think this would work:

e = re.sub(r'\{.*?\}','',d)
e = re.sub(r'\[.*?\]','',e)

How it works:

  • first we have \{ (or \[ in the second line). Because we want to find a literal { or [, we escape these two characters (they have a special meaning in regexps)
  • then .*? matches any character, repeated as few times as possible. Because we don’t want the match to run past the first closing } (or ]), I put a ? after the * to make the operator non-greedy
  • and at the end we match the closing } (or ])

So, these two lines probably will do what you want :smiley:
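As a quick check of the two substitutions, here is a small sketch run on a made-up single-line string instead of the real file (the sample text is an assumption, not taken from play (4).txt).

```python
import re

# Made-up sample containing one {...} block and one [...] block.
sample = "SATURNINUS: Noble patricians {aside} patrons [Exit] of my right"

step1 = re.sub(r'\{.*?\}', '', sample)  # remove {...} including the braces
step2 = re.sub(r'\[.*?\]', '', step1)   # then remove [...] including the brackets

print(step1)
print(step2)
```

Note that the spaces on either side of a removed block are left in place, so the result contains double spaces where a block used to be.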

Sorry, just to clarify so in the future I can use these right, the .*? means that every time the computer comes across [ ] or { } in the text file it will remove it rather than just the first time.
I do want to match the closing brackets as well because I want to remove anything that is inside the closed brackets, including the brackets.
What do you mean by non greedy?

This code has mostly worked, thank you! But there are still sections where random text has been left or the brackets are still there.
For example:
]

[The Tomb of the ANDRONICI appearing; the Tribunesand Senators aloft. Enter, below, from one side,SATURNINUS and his Followers; and, from the otherside, BASSIANUS and his Followers; with drum and colour
SATURNINUS:Noble patricians, patrons of my right, Defend the justice of my cause with arms, And, countrymen, my loving followers, Plead my successive title with your swords: I am his first-born son, that was the last That wore the imperial diadem of Rome; Then let my father’s honours live in me, Nor wrong mine age with this indignity.
BASSIANUS:Romans, friends, followers, favorers of my right, If ever Bassianus, Caesar’s son, Were gracious in the eyes of royal Rome, Keep then this passage to the Capitol And suffer not dishonour to approach The imperial seat, to virtue consecrate, To justice, continence and nobility; But let desert in pure election shine, And, Romans, fight for freedom in your choice.
[Enter MARCUS ANDRONICUS, aloft, with the crow

you can see that a random closing bracket has been left and a whole chunk of text included in [ ] also. Why has it done this?

Does your sample text contain newline characters? And do you also want to remove the text between [ and ] when there is a newline inside it?

If so, then you have to add the re.DOTALL flag like so:

e = re.sub(r'\{.*?\}','',d, flags=re.DOTALL)
e = re.sub(r'\[.*?\]','',e, flags=re.DOTALL)
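The effect of the flag can be seen in a small sketch. The sample string below is made up and simply contains a bracketed stage direction that is broken across two lines.

```python
import re

# Made-up sample: the bracketed part contains a newline.
text = "before [Enter MARCUS,\naloft] after"

# Without the flag, "." does not match "\n", so the pattern cannot reach "]".
without_flag = re.sub(r'\[.*?\]', '', text)

# With re.DOTALL, "." matches newlines too and the whole block is removed.
with_flag = re.sub(r'\[.*?\]', '', text, flags=re.DOTALL)

print(repr(without_flag))
print(repr(with_flag))
```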

As for what non-greedy means:
Sometimes you want to use a “strong” operator such as . (which matches any character) to match a block of text, and then match something else after it. Unfortunately, a strong operator may (and will) also “eat” the text that the following part of the pattern was supposed to match.

An example will show it better: let’s suppose you want to remove all HTML tags from a string. A naïve regexp may look like this:

r'<.*>'

Unfortunately this will change a text from this:

A <a href="">link</a> and some <b>bold</b> text.

into this:

A  text.

Because the .* operator has matched as much text as possible. That’s why it is called a greedy operator. But if we write it like this “.*?”, then it will match only as little text as possible, without “eating” the text that should be matched by the next part of the regexp.
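The difference can be checked directly on the example string from this post. This is only a sketch for illustrating greedy and non-greedy matching; a regexp is not a robust way to process real HTML.

```python
import re

html = 'A <a href="">link</a> and some <b>bold</b> text.'

# Greedy: one match that runs from the first "<" to the last ">".
greedy = re.sub(r'<.*>', '', html)

# Non-greedy: each match stops at the nearest ">", so every tag is removed separately.
lazy = re.sub(r'<.*?>', '', html)

print(repr(greedy))
print(repr(lazy))
```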

Did you get this working? I made a few changes and this works. It removes all [brackets] and {braces} from the file.

import sys # Needed for sys.exit()
from os.path import exists

file1 = 'play (4).txt' # Get one file.
# See if the file actually exists.
if not exists(file1):
    print(f"ERROR: File {file1} does not exist")
    sys.exit() # Exit program early.

filein = open(file1, 'r') # Reuse the name we just checked.
d = filein.read() # Read whole file.
e = d.replace(r'[','') # .replace() matches literal text; the r prefix is optional here.
e = e.replace(r']','')
e = e.replace(r'{','')
e = e.replace(r'}','')
filein.close()

# Write a different file so we don't lose our original data. 
file2 = 'playout.txt'
fout = open(file2, 'w')
res = fout.write(e)
fout.close()
print("The file now contains:")
print(e)

Thank you for explaining!

For this part:

e = re.sub(r'\{.*?\}','',d, flags = re.DOTALL)
e = re.sub(r'\[.*?\]','',e, flags = re.DOTALL)

why are there two lines for it? Could you not put it on one?
Then I tried this

e = re.sub(r'\(\{.*?\}\)','',d, flags = re.DOTALL)
e = re.sub(r'\(\[.*?\]\)','',e, flags = re.DOTALL)

I basically wanted to also remove everything enclosed in ( ) in the text, but it does not seem to have done anything.

Because a regular expression is unable, per (mathematical) definition, to “remember” what it matched earlier in order to decide what it should match later. In other words: regexps can’t process hierarchical data.

If we wrote something like this:

r'(\{|\[).*?(\}|\])'

then it would match any text between any two brackets, but not necessarily brackets of the same type:

A sample {text with [bracket] inside} outside
         ^^^^^^^^^^^^^^^^^^^^

This would produce some strange outputs like the ones you are experiencing now
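Here is a short sketch of that mismatch, using the sample sentence from this post and the single mixed pattern.

```python
import re

sample = "A sample {text with [bracket] inside} outside"
mixed = r'(\{|\[).*?(\}|\])'

# The match starts at "{" but stops at the first closing bracket of either kind.
first_match = re.search(mixed, sample).group(0)

# Removing it leaves the tail of the brace block, including a stray "}".
result = re.sub(mixed, '', sample)

print(repr(first_match))
print(repr(result))
```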

You could do both at the same time with:

e = re.sub(r'\{.*?\}|\[.*?\]','',d, flags = re.DOTALL)

This is what the pattern means:

\{      Match a literal "{"
.*?     Match any characters lazily (as few as possible)
\}      Match a literal "}"
|       Or
\[      Match a literal "["
.*?     Match any characters lazily (as few as possible)
\]      Match a literal "]"
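The combined pattern could be wrapped up for reuse roughly as follows. The names BRACKETED and strip_bracketed are hypothetical, chosen only for this sketch, and the sample text is made up.

```python
import re

# Hypothetical helper names; compiled once so the pattern can be reused.
BRACKETED = re.compile(r'\{.*?\}|\[.*?\]', flags=re.DOTALL)

def strip_bracketed(text):
    """Remove every {...} and [...] block, including blocks that span lines."""
    return BRACKETED.sub('', text)

sample = "a {x\ny} b [z] c"  # made-up sample with a newline inside the braces
cleaned = strip_bracketed(sample)
print(repr(cleaned))
```

Each alternative still stops at the first closing bracket of its own kind, so nested blocks such as [outer [inner] rest] would not be removed cleanly by this sketch.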

That has worked - thank you so much!!!

Now that I have the file, can it be saved anywhere? Or can I just keep referring to it as ‘file2’ in my code and the computer will know where to refer to.

I think you, Mr Fish, have just won the Internet :smiley: I would never ever have thought the OP wanted to remove only the brackets, without the text between them :wink:

Oh my god no it hasn’t worked I’m so sorry - you are right I did want to remove the text between them!!!

I’m so sorry - been working on this far too long

I almost spilt my cup of tea when I read this :smiley: :wink:

Then I think the solution from @MRAB should do the trick :slight_smile:

I’m so sorry! I’m a bit of a nightmare when it comes to these things.

I do not understand why this:

file = open('play (4).txt','r')
d = file.read() # Read whole file.
e = d.replace(r'[.*?]','') 
e = e.replace(r'{.*?}','')
e = e.replace(r'(.*?)','')
file.close()

New_file = 'play 2.txt' #I've written it to a new file. 
New = open(New_file, 'w')
Write_in = New.write(e)
New.close()
print(e)

hasn’t worked.
It hasn’t removed [ ], { } or ( ) with all the text inside of them.

This is so frustrating because I don’t understand why it won’t just do what I am asking.

The .replace method looks for literal text only.

d.replace(r'[.*?]','') will look for the actual character sequence [.*?], i.e. open bracket, dot, star, question mark, close bracket.
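A small sketch of the difference, on a made-up string that happens to contain the literal characters [.*?] as well as an ordinary bracketed word:

```python
import re

text = "keep [drop] and the literal [.*?] here"  # made-up sample

# str.replace: only the exact five characters "[.*?]" are removed.
literal = text.replace('[.*?]', '')

# re.sub with escaped brackets: every [...] block is removed.
# re.sub takes three arguments: the pattern, the replacement, and the string.
by_pattern = re.sub(r'\[.*?\]', '', text)

# Unescaped, [.*?] is a character class meaning "one of . * ?".
char_class = re.sub(r'[.*?]', '', "a.b*c?d")

print(repr(literal))
print(repr(by_pattern))
print(repr(char_class))
```

So moving from .replace to re.sub means escaping the brackets in the pattern and passing the text to search as the third argument.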

oh, so can I replace the

.replace

with

re.sub

then with the same code? e.g:

e = re.sub(r'[.*?]','')