Excel:Regex string contains, double words

wol · March 9, 2021, 2:02pm

I am creating a new Excel column using Python, based on substrings that describe positions in a text, as follows:

bla bla Left bla part —> Left.

Upper Left side of blabla —> Upper left.

(and more combinations of positions)

As you can see, in many such cases the two-word positions (e.g. ‘left upper’) contain a single word that is also searched for by a different command.

To find the two-word positions I have tried an AND condition using regex lookaheads: df.loc[df['Figure'].str.contains(r'^(?=.*Upper)(?=.*Left)'), 'Location'] = 'Left'. That seems to work. However, running the simple df.loc[df['Figure'].str.contains('LEFT'), 'Location'] = 'Left' afterwards overwrites all of the regex results with 'Left'. I have also tried the case=False option of str.contains (df.loc[df['col'].str.contains('Upper', case=False), 'position'] = 'left') to exclude the rows that contain 'upper' when creating the 'left' value, but that does not work.

Any solution?

Thanks in advance

tjol · March 10, 2021, 8:26am

I don’t think that’s something you can do with standard pandas vectorized assignments. You’ll need to loop over all the rows of your DataFrame in Python in one way or another.

Something along the lines of:

import re
def location_from_figure_string(s):
    "some code involving re.match or re.sub"

df['Position'] = [location_from_figure_string(s) for s in df['Figure']]

Using the DataFrame.apply() method might also be an option.

You sound like you can figure out the actual string manipulation function yourself.
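For illustration, here is a minimal sketch of what such a function might look like. The rule list, the labels, and the word-boundary patterns are assumptions made up for this example, not something taken from the original data. The idea is to check the most specific (two-word) positions first, so a plain "left" rule never gets the chance to overwrite an "upper left" match.

```python
import re

# Hypothetical rules, checked in order: two-word positions come first so that
# "Upper Left" is matched before the plain "Left" rule can claim the string.
RULES = [
    (re.compile(r"^(?=.*\bupper\b)(?=.*\bleft\b)", re.IGNORECASE), "Upper left"),
    (re.compile(r"^(?=.*\bupper\b)(?=.*\bright\b)", re.IGNORECASE), "Upper right"),
    (re.compile(r"\bleft\b", re.IGNORECASE), "Left"),
    (re.compile(r"\bright\b", re.IGNORECASE), "Right"),
]


def location_from_figure_string(s):
    """Return the label of the first rule that matches, or None if none does."""
    if not isinstance(s, str):  # e.g. NaN coming from an empty Excel cell
        return None
    for pattern, label in RULES:
        if pattern.search(s):
            return label
    return None


# Plain-Python check on the example strings from the question.
figures = ["bla bla Left bla part", "Upper Left side of blabla", "no position here"]
positions = [location_from_figure_string(s) for s in figures]

# With a DataFrame, the same function plugs into the line shown above:
# df['Position'] = [location_from_figure_string(s) for s in df['Figure']]
```

Because the lookaheads only require both words to appear somewhere in the string, "left upper" and "Upper Left" end up with the same label. New combinations can be added to the list as long as the longer ones stay above the single-word rules.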
