Finding strings in DF syntax [SOLVED]

Hi

I have a bunch of data in a polars dataframe. Now I would like to target specific entries by matching string values in a column. The names have the form A01-B01, and I would like to iterate through a subset of the entries: A01 to A10 and B01 to B48. For clarity: this would mean that I want 480 hits. Is there a neat way of doing this in Python?

Quasi code:

for all rows in DF
   if A[iterator1]-B[iterator2] == string name in polars DF
      do something cool

for i in range(10):
    for j in range(48):
        ...  # your logic here

That would be how I would do it. Nested loops of the size you need.


Thanks!

I got that part but not how to do the regular expression for the string in Python.

Edit: I read your post a bit carelessly.

I tried this, but I don’t know how to set up the regular expression rules. I only get one hit: A01-B03

for row in df.iter_rows(named=True):
    if(re.match("A[00][01]-B[00][03]", row["ID"])):
        print(row["ID"])
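A note on why that pattern only gives one hit: square brackets define a character class that matches exactly one character, so [00] matches a single "0" and [03] matches "0" or "3". The pattern therefore only accepts A00/A01 followed by B00/B03. Below is a rough sketch of a pattern for A01 to A10 and B01 to B48, assuming every ID is "A" plus two digits, a hyphen, then "B" plus two digits. The names ID_PATTERN, is_valid_id, and sample_ids are made up for illustration.

```python
import re

# Assumed ID layout: "A" + two digits, "-", "B" + two digits.
# A part: 01-09 or 10.  B part: 01-09, 10-39, or 40-48.
ID_PATTERN = re.compile(r"A(0[1-9]|10)-B(0[1-9]|[1-3][0-9]|4[0-8])")

def is_valid_id(value: str) -> bool:
    # fullmatch requires the whole string to match, not just a prefix.
    return ID_PATTERN.fullmatch(value) is not None

# Hypothetical sample values standing in for the "ID" column.
sample_ids = ["A01-B03", "A10-B48", "A00-B01", "A11-B01", "A01-B49"]
matches = [s for s in sample_ids if is_valid_id(s)]
print(matches)  # ['A01-B03', 'A10-B48']
```

Inside the loop from the post above, the check would then read: if is_valid_id(row["ID"]).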

You want something that will match the valid row IDs?

valid_ids = [f'A{a+1}-B{b+1}' for a in range(10) for b in range(48)]
for row in df.iter_rows(named=True):
    if row["ID"] in valid_ids:
        print(row["ID"])

That should work and be simpler than all that regex searching.

Okay, it works, but how do I cover IDs such as A02-B12? The range A01 to A04, for instance.

Can you print out the results that aren’t hitting so I can see them? That might help (you and me) figure this out. It could be something as simple as changing the if to:

if row["ID"].strip() in valid_ids:

because of unexpected whitespace.
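One way to print the IDs that aren't hitting, as a rough sketch: collect everything that is not in valid_ids and show it with repr(), which makes stray spaces and leading zeros visible. The sample_ids list is invented here and only stands in for the values of the "ID" column.

```python
# Same comprehension as above, stored as a set for fast membership tests.
valid_ids = {f"A{a+1}-B{b+1}" for a in range(10) for b in range(48)}

# Hypothetical sample standing in for the values of df["ID"].
sample_ids = ["A1-B1", " A2-B3 ", "A01-B01", "A10-B48"]

# Keep every ID that fails the lookup so it can be inspected.
misses = [s for s in sample_ids if s not in valid_ids]
for s in misses:
    # repr() shows quotes around the value, exposing whitespace.
    print(repr(s))
```

In this made-up sample the two misses are an ID with surrounding spaces and an ID with zero-padded numbers, which are the two usual suspects.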

Your code worked like a charm, sorry about that. I just put in A10-B10 and got one hit. How do I write it to cover the first nine, like A01-B01, A02-B01, etc.?

Change the list comprehension to:

valid_ids = [f'A{a+1:02}-B{b+1:02}' for a in range(10) for b in range(48)]

The :02 zero-pads the number to a minimum width of 2 digits, so 1 becomes 01 while 10 stays 10.
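To see what the format spec does and how the pieces fit together, here is a small sketch. It assumes the IDs are always zero-padded to two digits; the rows list is invented and only mimics what df.iter_rows(named=True) yields. A set is used instead of a list so each lookup does not scan all 480 entries.

```python
# "02" means: pad with zeros up to a minimum width of 2.
print(f"{1:02}", f"{9:02}", f"{10:02}", f"{123:02}")  # 01 09 10 123

# All 10 * 48 valid IDs, from A01-B01 up to A10-B48.
valid_ids = {f"A{a:02}-B{b:02}" for a in range(1, 11) for b in range(1, 49)}

# Hypothetical rows standing in for df.iter_rows(named=True).
rows = [{"ID": "A01-B01"}, {"ID": "A10-B48"}, {"ID": "A11-B01"}, {"ID": "A1-B1"}]

hits = [row["ID"] for row in rows if row["ID"] in valid_ids]
print(len(valid_ids), hits)  # 480 ['A01-B01', 'A10-B48']
```

If you would rather skip the Python loop, polars expressions have is_in, so something like df.filter(pl.col("ID").is_in(list(valid_ids))) should give the matching rows directly (untested against your dataframe).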


Thanks a lot dude!
