xancho
(Dancho)
April 5, 2023, 12:06pm
My issue is the same as in this topic:
However, the suggested solutions won’t work for me. Here is my code:
```python
import re
import string

def func(line: str):
    # TODO: one-shot split fails
    delimiters = string.punctuation[:10]
    line_words = re.split(delimiters, line)
    # but splitting in steps does
    split_line = line.split(" ")
    for delimiter in string.punctuation:
        for unsplit_word in split_line:
            unsplit_word.split(delimiter)
    return split_line

print(func('x="y z"'))
```
Why does re.split fail in this case?
abessman
(Alexander Bessman)
April 5, 2023, 1:14pm
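Your code does not work because `re.split` takes a regular expression as its first argument. `string.punctuation[:10]` is `!"#$%&'()*`, which is a valid regex but does not match anything in your input (in fact it cannot match any string, because the `$` in the middle is an end-of-string anchor). Since there is no match, `re.split` returns your input unchanged, as a one-element list.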
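To see this directly, you can compare the raw pattern with an escaped copy made by `re.escape`. Unescaped, `$` is an end-of-string anchor and `()*` is an empty group repeated zero or more times, so the pattern fails even on a string that contains all ten characters in order. The names `raw`, `literal` and `text` below are just for illustration:

```python
import re
import string

raw = string.punctuation[:10]  # the characters !"#$%&'()*, used as a regex as-is
literal = re.escape(raw)       # same characters, escaped so each is matched literally

text = "a!\"#$%&'()*b"         # contains the ten characters in order

# Unescaped, '$' is an end-of-string anchor (and '()*' is an empty group),
# so the pattern cannot match even when the characters are present.
print(re.search(raw, text))               # None
print(re.split(raw, 'x="y z"'))           # ['x="y z"']

# Escaped, the pattern matches the literal run of characters.
print(re.search(literal, text).group())   # !"#$%&'()*
```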
To create a regex that matches any of the characters in `string.punctuation[:10]`, place them between `[]`, like so:
```python
>>> re.split(f"[{string.punctuation[:10]}]", 'x="y z"')
['x=', 'y z', '']
```
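It looks like you also want to split at whitespace. In that case, just add `\s` to the regex. Use a raw f-string (`rf'...'`) so that Python passes `\s` through to the regex instead of warning about an invalid string escape:
```python
>>> re.split(rf'[{string.punctuation[:10]}\s]', 'x="y z"')
['x=', 'y', 'z', '']
```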
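If you later want every character in `string.punctuation` rather than only the first ten, it is safer to pass them through `re.escape` before putting them in brackets, because characters such as `\`, `]`, `^` and `-` have special meaning inside a character class. A minimal sketch, where the helper name `split_words` is hypothetical; it also splits on runs of delimiters and drops the empty strings that appear when a delimiter sits at the start or end of the input:

```python
import re
import string

# Any ASCII punctuation or whitespace, one or more in a row.
# re.escape keeps characters like ']', '\\', '^' and '-' literal inside [].
DELIMS = re.compile(rf"[{re.escape(string.punctuation)}\s]+")

def split_words(line: str) -> list[str]:
    """Hypothetical helper: split on punctuation/whitespace, drop empty pieces."""
    return [word for word in DELIMS.split(line) if word]

print(split_words('x="y z"'))  # ['x', 'y', 'z']
```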
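As an aside, note that in your code, the following lines do nothing:
```python
for delimiter in string.punctuation:
    for unsplit_word in split_line:
        unsplit_word.split(delimiter)
```
because you don’t save the output of `unsplit_word.split(delimiter)`. Strings are immutable, so `str.split` returns a new list and leaves `unsplit_word` untouched; `split_line` is still just the result of `line.split(" ")`.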
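If you prefer to keep the step-by-step approach, the pieces have to be collected into a new list on every pass. A minimal sketch of that idea (the function name `split_in_steps` is hypothetical), filtering out the empty strings left where delimiters were adjacent or at the edges:

```python
import string

def split_in_steps(line: str) -> list[str]:
    """Hypothetical helper: split on spaces, then on each punctuation character."""
    words = line.split(" ")
    for delimiter in string.punctuation:
        pieces = []
        for word in words:
            # str.split returns a new list; keep it instead of discarding it
            pieces.extend(word.split(delimiter))
        words = pieces
    # Drop empty strings produced by adjacent or trailing delimiters
    return [word for word in words if word]

print(split_in_steps('x="y z"'))  # ['x', 'y', 'z']
```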