I want to create a regex to match a Unicode letter followed by any number of letters, digits, spaces, hyphens, or underscores.
If the first character only had to be an ASCII letter, it would be easy: [A-Za-z][-\w ]*
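To illustrate the limitation, here is a minimal sketch of the ASCII-only version, assuming Python 3 and the standard `re` module (the sample strings and variable names are made up for illustration):

```python
import re

# ASCII-only version: one ASCII letter, followed by any number of
# letters, digits, spaces, hyphens, or underscores.
ascii_pattern = re.compile(r"[A-Za-z][-\w ]*")

# Hypothetical sample inputs, chosen only to show the behaviour.
samples = ["foo_bar 42", "Émile-2", "9lives"]

# fullmatch requires the whole string to fit the pattern.
results = {s: bool(ascii_pattern.fullmatch(s)) for s in samples}
print(results)
```

The second sample is the case I care about: it starts with a non-ASCII letter, so the ASCII character class rejects it even though `\w` would accept the rest of the string.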
But what do I replace [A-Za-z] with to match any Unicode letter?
I know that if I use the regex module from PyPI I could use [\w--[0-9_]] or simply \p{L}, but it would be nice to stick to the standard library.