[SOLVED] Problem in version - delete the topic

Just found out that it was a problem in my 3.8 installation.
It worked in another version.
Sorry for the mess; could an admin please remove the topic?

Can you provide a failing example? One thing to note is you should be using r strings for regular expression patterns. These treat \ as literal so you can properly specify backreferences.

Seems to be working here:

>>> re.sub(r'[^\x00-\x7F]', '', '# $ % . , / ơ')
'# $ % . , / '
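
To illustrate the point about r strings, here is a minimal sketch of how the same pattern behaves with and without the r prefix. The sample input 'aab' and the variable names are made up for illustration; they are not from the original question.

```python
import re

# Without the r prefix, Python interprets the escape before re ever sees it.
backspace = '\b'       # one character: a backspace (0x08)
word_boundary = r'\b'  # two characters: a backslash followed by 'b'

# '(a)\1' becomes "(a)" followed by the control character chr(1),
# so it does not behave as a backreference and finds nothing in 'aab'.
plain = re.sub('(a)\1', 'X', 'aab')

# r'(a)\1' keeps the backslash, so \1 is a backreference to group 1
# and the pattern matches the doubled 'aa'.
raw = re.sub(r'(a)\1', 'X', 'aab')

print(len(backspace), len(word_boundary))
print(repr(plain), repr(raw))
```

Only the raw-string version performs the substitution, which is why the r prefix matters whenever a pattern contains backslashes.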

Can you elaborate on what you’re actually trying to do?

Thanks, I didn’t know about these “r” strings. That’s why I say that regexps in Python are painful compared to regexps in JS and PHP, which are much cleaner.

I tried it here with the “r string” and it didn’t work either. Maybe it is something in my installation.

If you read my topic carefully, I said I want to “remove non-ASCII from a text” with re.sub.
Please read it again in full.

I’m left wondering if you’re trying to pretend that non-ASCII text is just weird funny characters that don’t matter, and so you want to stick your head in the sand and pretend that we’re in the 1990s. Python is the wrong tool for that job, and it’s not a bug.

Regardless of why they’re doing it, it would be good to see exactly what the code is doing and what’s going wrong. Because as @facelessuser’s example shows, there isn’t a bug in re.

Yes, but it would be good to see the specific failing code. Without
that, we cannot suggest any changes.

Also, remember that if you’re modifying a text file, the bytes in the
file are an encoding of the text stored. It is meaningless to read and
rewrite the file without knowing its encoding.

These days UTF-8 is very common, but it is by no means universal.
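
As a rough sketch of what that means in practice, the following reads a file with an explicit encoding, strips the non-ASCII characters, and writes it back. The helper name strip_non_ascii_in_file, the sample file, and the choice of UTF-8 are assumptions for illustration; the real file may well use a different encoding.

```python
import re
import tempfile
from pathlib import Path


def strip_non_ascii_in_file(path, encoding="utf-8"):
    """Hypothetical helper: decode, strip non-ASCII, re-encode the same way."""
    text = Path(path).read_text(encoding=encoding)
    cleaned = re.sub(r'[^\x00-\x7F]', '', text)
    Path(path).write_text(cleaned, encoding=encoding)
    return cleaned


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("# $ % . , / ơ\n", encoding="utf-8")
    result = strip_non_ascii_in_file(sample)

# Decoding the same bytes with the wrong encoding yields different text,
# so the regexp would be working on characters that were never in the file.
misread = "ơ".encode("utf-8").decode("latin-1")

print(repr(result))
print(ascii(misread))
```

The misread value has two characters where the original had one, which is the kind of mismatch that makes it look as if re.sub is misbehaving when the real problem is the decoding step.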

Can you show us:

  • the exact values you’re using for the text variable?
  • the failing code, using r'' strings to specify the regexps?

Eg:

 import re

 text = 'something here'
 print(repr(re.sub(r'[^\x00-\x7F]', ' ', text)))

Note the r'' “raw string” prefix, which ensures that you get an actual
backslash in the regexp definition.
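
Since the behaviour apparently differed between installations, a slightly fuller report along these lines might help pin it down. This is only a sketch: the sample text is borrowed from the earlier example in this thread and should be replaced with the actual failing value, and it removes the characters (replacement '') rather than substituting a space.

```python
import re
import sys

# Sample value taken from the earlier example; substitute the real text.
text = '# $ % . , / ơ'

# Code points above 0x7F are the ones the pattern is meant to remove.
non_ascii = [hex(ord(c)) for c in text if ord(c) > 0x7F]
cleaned = re.sub(r'[^\x00-\x7F]', '', text)

print(sys.version)    # which interpreter actually ran the code
print(ascii(text))    # ascii() shows non-ASCII characters as escapes
print(non_ascii)
print(repr(cleaned))
```

Posting the output of all four print calls shows the interpreter version, the exact input, and the result in one place.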
