Excel:Regex string contains, double words

I am creating a new excel column using python based on substrings containing positions in a text as follows:

bla bla Left bla part —> Left.

Upper Left side of blabla —> Upper left.

(and more combo’s of positions)

As you can see In many such cases, Double words(e.g. ‘left upper’) are containing the single word which is also searched for with a different command.

To find the double name position I have tried the & operation using regex: df.loc[df[‘Figure’].str.contains(r’^(?=.*Upper)(?=.*Left)’),‘Location’] = ‘Left’. That seems to work. However, running the simple #df.loc[df[‘Figure’].str.contains(‘LEFT’), ‘Location’] = ‘Left’ overwrites all of these regex functions with ‘left’. I have tried the case=false option of str.contains(df.loc[df[‘col’].str.contains(‘Upper’, case=False), ‘position’] = ‘left’) to exlcude the ones that contain 'upper’when creating the ‘left’ value. But that does not work

Any solution?

Thanks in advance

I don’t think that’s something you can do with standard pandas vectorized assignments. You’ll need to loop over all the rows of your DataFrame in Python in one way or another.

Something along the lines of:

import re
def location_from_figure_string(s):
    "some code involving re.match or re.sub"

df['Position'] = [location_from_figure_string(s) for s in df['Figure']]

Using the DataFrame.apply() method might also be an option.

You sound like you can figure out the actual string manipulation function yourself :slight_smile: