RegEx subtraction classes

Hello everybody!

I have started using the regex module (the latest 2020 release) in Python 3.8.1, but for some reason I can’t get a simple expression that selects only consonants to work.
I have tried these pieces of code:

s = "test string"
f = regex.findall("[a-z]--[aeiou]", s)

s = "test string"
f = regex.findall("[[a-z]--[aeiou]]", s)

s = "test string"
f = regex.findall("[a-z--aeiou]", s)

s = "test string"
f = regex.findall("[a-z]&&[^aeiou]", s)

In each case, I get an empty list.
Can someone kindly tell me where my error is? Is it something to do with escaping characters? I don’t have a clue…

Many many thanks!!

Hi Fabrizio!

There is a note in the standard library documentation for the re module:

Support of nested sets and set operations as in Unicode Technical Standard #18 might be added in the future. This would change the syntax, so to facilitate this change a FutureWarning will be raised in ambiguous cases for the time being. That includes sets starting with a literal '[' or containing literal character sequences '--' , '&&' , '~~' , and '||' . To avoid a warning escape them with a backslash.
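The warning that note describes can be seen directly. A minimal sketch, assuming a recent CPython where the stdlib re module still emits this FutureWarning; the helper name compile_and_collect_warnings is made up for illustration:

```python
import re
import warnings


def compile_and_collect_warnings(pattern):
    """Compile `pattern` with the stdlib re module and return the warning categories raised.

    Hypothetical helper, for illustration only.
    """
    re.purge()  # clear re's compiled-pattern cache so the pattern really gets re-parsed
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        re.compile(pattern)
    return [w.category for w in caught]


# '&&' inside a set is ambiguous, so re warns about a possible future set intersection.
print(compile_and_collect_warnings("[a-z&&aeiou]"))    # [<class 'FutureWarning'>]
# Escaping the ampersands makes them unambiguous literals, so nothing is raised.
print(compile_and_collect_warnings(r"[a-z\&\&aeiou]"))  # []
```

In both cases re treats the ampersands as literal characters; the warning only flags that the meaning may change in the future.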

So the feature that you are after isn’t part of the normal regular expression repertoire and isn’t implemented. I’m afraid that if you want a set of consonants, you are going to have to write it explicitly: [bcdfghj-np-tv-z]
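A minimal sketch of that explicit approach with the stdlib re module. Building the class from string.ascii_lowercase is just one way to do the "subtraction" in Python instead of in the pattern; the variable names are illustrative:

```python
import re
import string

VOWELS = "aeiou"

# Keep every lowercase ASCII letter that is not a vowel.
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)
# CONSONANTS == "bcdfghjklmnpqrstvwxyz"

hand_written = re.compile(r"[bcdfghj-np-tv-z]")  # ranges typed by hand
generated = re.compile(f"[{CONSONANTS}]")         # class built from the string above

s = "test string"
print(hand_written.findall(s))  # ['t', 's', 't', 's', 't', 'r', 'n', 'g']
print(generated.findall(s))     # same result

# re.IGNORECASE lets the same class match uppercase consonants too.
print(re.findall(r"[bcdfghj-np-tv-z]", "Test String", re.IGNORECASE))
# ['T', 's', 't', 'S', 't', 'r', 'n', 'g']
```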

This can actually be achieved with a simple negative lookahead:

>>> import re
>>> s = "test string"
>>> f = re.findall("(?![aeiou])[a-z]", s)
>>> print(f)
['t', 's', 't', 's', 't', 'r', 'n', 'g']
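The same lookahead trick can also stand in for set intersection. A sketch using only the stdlib re module; the helper names set_difference and set_intersection are hypothetical, and each argument is assumed to be a complete single-character class such as "[a-z]":

```python
import re


def set_difference(base, excluded):
    """Pattern for one char in `base` but not in `excluded` (like [base--excluded])."""
    # Negative lookahead rejects the position if `excluded` matches there,
    # then `base` consumes one character.
    return f"(?!{excluded}){base}"


def set_intersection(first, second):
    """Pattern for one char that is in both classes (like [first&&second])."""
    # Positive lookahead checks `first` without consuming; `second` then consumes.
    return f"(?={first}){second}"


s = "test string"
print(re.findall(set_difference("[a-z]", "[aeiou]"), s))
print(re.findall(set_intersection("[a-z]", "[^aeiou]"), s))
# Both print ['t', 's', 't', 's', 't', 'r', 'n', 'g']
```

This only works because each class matches exactly one character, so the lookahead and the consuming class always look at the same character. Note that [^aeiou] on its own would also match the space; the intersection with [a-z] is what excludes it.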

This can also be achieved with the regex library:

>>> import regex
>>> s = "test string"
>>> f = regex.findall("[[a-z]--[aeiou]]", s, regex.V1)
>>> print(f)
['t', 's', 't', 's', 't', 'r', 'n', 'g']
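A sketch of a few equivalent ways to get version 1 behaviour, based on the regex module’s documentation (the module is third-party and must be installed separately, e.g. with pip install regex). The inline (?V1) flag enables version 1 for a single pattern. In version 1, implicit union binds tighter than the set operators, so the inner brackets are optional. The operator has to sit inside a set, though: outside brackets, -- and && are just literal characters.

```python
import regex  # third-party module: pip install regex

s = "test string"

# Version 1 requested with the flag argument, as in the answer above.
print(regex.findall(r"[[a-z]--[aeiou]]", s, regex.V1))

# Version 1 requested inline, so no flag argument is needed.
print(regex.findall(r"(?V1)[[a-z]--[aeiou]]", s))

# Implicit union binds tighter than '--', so [a-z--aeiou] is read as [[a-z]--[aeiou]].
print(regex.findall(r"[a-z--aeiou]", s, regex.V1))

# Intersection: characters that are lowercase letters and also not vowels.
print(regex.findall(r"[[a-z]&&[^aeiou]]", s, regex.V1))
```

Each call should give the same list of consonants as the stdlib lookahead version.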

Hi James!
Thank you so much for your super-swift answer!
In fact, I found that it is possible.
You need to enable version 1 of the regex module, which defaults to version 0. Then all the expressions I posted above are processed correctly.
See the following for reference:

Hi Rhodri,

You may have missed that Fabrizio is talking about the third-party regex module, not the stdlib re module.

It has a much more extensive set of regex features than re.

Thanks Steven, I had indeed missed that. Knowing nothing about the regex library, I’ll bow out now.