Anyone please help me understand this --> if site_q05[site_q05 < lowerlimit].shape[0] > 1:

Hello Experts…first off, sorry for this newbie question.
I encountered the code below:

if site_q05[site_q05 < lowerlimit].shape[0] > 1:

I understand the if statement, but it has two conditions:
1st = site_q05 < lowerlimit (inside the bracket)
2nd = site_q05[site_q05 < lowerlimit].shape[0] > 1

Please help me interpret this code…including the shape[0]…thank you.

site_q05 < lowerlimit will be either False or True, which are equal to 0 or 1. The subscription will pick out one of at least two items in site_q05. What is odd is that site_q05 is supposed to be both subscriptable and comparable to lowerlimit, which would seem to be a number, which is not subscriptable. That aside, the item from site_q05 is subscripted and the resulting item compared with 1. The expression might be clearer with parentheses.

((site_q05[site_q05 < lowerlimit]).shape[0]) > 1

Oh thank you @tjreedy…so you mean:
1st = it will be either true or false…let’s say site_q05 is greater than lowerlimit, then it is false. Thus, it will not proceed?
On the other hand, let’s say site_q05 is less than lowerlimit, then it is true…Thus, it will proceed to check if shape[0] is greater than 1?

Please correct me if I am wrong…thanks

Not necessarily. We don’t know what kind of things site_q05 and lowerlimit are, but site_q05 is probably not a list, since most objects do not support less/greater than comparisons with lists.

I would hazard to guess that site_q05 is a numpy.array or a pandas.DataFrame, based on the presence of the shape property. In which case site_q05 < lowerlimit is an index mask (e.g. [True, False, False, True, ...]), and site_q05[site_q05 < lowerlimit] is a subarray of site_q05 containing only the elements at the positions where the mask is True.

site_q05[site_q05 < lowerlimit].shape[0] is then the size of the first dimension of the subarray, and the if checks whether or not it is greater than 1.
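
To make the three steps concrete, here is a minimal sketch. It assumes site_q05 is a pandas Series, and the numbers and the value of lowerlimit are made up, since the thread does not show the real data.

```python
import pandas as pd

# Hypothetical values standing in for site_q05; not the real data.
site_q05 = pd.Series([4.0, 1.5, 7.2, 0.8])
lowerlimit = 2.0  # hypothetical threshold

mask = site_q05 < lowerlimit   # element-wise comparison, gives a boolean Series
below = site_q05[mask]         # keeps only the entries where the mask is True
count_below = below.shape[0]   # number of entries that were kept

print(mask.tolist())   # [False, True, False, True]
print(below.tolist())  # [1.5, 0.8]
print(count_below)     # 2

if count_below > 1:
    print("more than one value is below the limit")
```

So the comparison inside the brackets is not a second if condition; it only selects which values to keep, and the single condition being tested is whether more than one value was kept.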

Alexander, that makes sense. Rhett, unless you know something about numpy and pandas, this expression will not make sense. I know just a bit and was puzzled because I have read about but never seen an index mask in the wild. In any case, if there is no exception, the whole expression will be evaluated and the if body will execute if (obscure value) > 1.

Hello @tjreedy…here is more complete code. I really need to understand this…sorry to keep disturbing you…please help explain…I don’t know who else to ask…thanks

q05 = df[col].quantile(0.10)
q50 = df[col].quantile(0.5)
q95 = df[col].quantile(0.90)
sigma = 6
val = 1.55

groupby_site = df.groupby("ch_num")

lowlimit = q50 - (sigma * (q50 - q05) / val)
site_q05 = groupby_site[col].quantile(0.05)
if ((site_q05[site_q05 < lowlimit]).shape[0]) > 1:
    print("Limit error")

Especially the if statement part…thanks again

OK, so what I did was print site_q05 < lowlimit.
Below is the result:

ch_num
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False

A total of 20, and ch_num 8 is the only True.

Then I printed ((site_q05[site_q05 < lowlimit]).shape[0]).
The result is 1…my question is, why 1? Is it because there is only one True?

The q05, q50, and q95 variables hold the first, fifth, and ninth deciles of the data in the col column of the df DataFrame (the fifth decile is more commonly known as the median). The q05 and q95 variables seem improperly named; they should either be named q10 and q90 or be created with .quantile(0.05) and .quantile(0.95), respectively.
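
A quick way to see the naming mismatch is to run the same quantile calls on data where the answers are easy to read off. This is only a sketch: the Series of integers 0 to 100 is made up and has nothing to do with the real df.

```python
import pandas as pd

# Hypothetical data: the numbers 0..100, so each quantile is easy to check.
values = pd.Series(range(101), dtype=float)

first_decile = values.quantile(0.10)  # what the posted code stores in q05
median = values.quantile(0.5)         # what the posted code stores in q50
ninth_decile = values.quantile(0.90)  # what the posted code stores in q95

true_q05 = values.quantile(0.05)      # a genuine 5% quantile, for comparison
true_q95 = values.quantile(0.95)      # a genuine 95% quantile, for comparison

print(first_decile, median, ninth_decile)
print(true_q05, true_q95)
```

With this data the first pair of calls lands near 10 and 90, while the genuine 5% and 95% quantiles land near 5 and 95, which is why the names q05 and q95 are misleading.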

sigma is most likely a standard deviation of some kind.

I cannot guess what val might refer to.

groupby_site = df.groupby("ch_num") splits the data into groups according to the values of the column named “ch_num”.

lowlimit = q50 - (sigma * (q50 - q05) / val) calculates a scalar value of some kind, based on the standard deviation, median, first decile, and val. I do not recognize the formula.

site_q05 = groupby_site[col].quantile(0.05) calculates the 5% quantile of each of the groups.

As discussed above, (site_q05[site_q05 < lowlimit]).shape[0] is the size of the first dimension of a subframe created by taking only the entries of site_q05 whose values are less than lowlimit. If this size is greater than 1 (that is, at least two groups fall below lowlimit), “Limit error” is printed.
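
Here is a small end-to-end sketch of the grouped check. The DataFrame, the column name "reading", and the hand-picked lowlimit are all assumptions for illustration; the real df, col, and the computed limit are not shown in the thread.

```python
import pandas as pd

# Hypothetical data: three groups (ch_num 0, 1, 2) with three readings each.
df = pd.DataFrame({
    "ch_num":  [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "reading": [10.0, 11.0, 12.0, 10.5, 11.5, 12.5, 1.0, 2.0, 3.0],
})
col = "reading"   # hypothetical column name
lowlimit = 5.0    # hypothetical limit, chosen by hand instead of the formula

# One 5% quantile per group; the result is a Series indexed by ch_num.
site_q05 = df.groupby("ch_num")[col].quantile(0.05)

# Keep only the groups whose 5% quantile is below the limit.
flagged = site_q05[site_q05 < lowlimit]
n_flagged = flagged.shape[0]

print(site_q05)
print(n_flagged)

if n_flagged > 1:
    print("Limit error")
else:
    print("fewer than two groups below the limit")
```

In this made-up data only group 2 falls below the limit, so n_flagged is 1 and "Limit error" is not printed.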

Since the index mask site_q05 < lowlimit contains only one True value, the subframe site_q05[site_q05 < lowlimit] contains only the value at that specific index. The size of the subframe’s first dimension is therefore 1.
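
If it helps, the same count can be read in a few equivalent ways. The sketch below rebuilds a situation like yours with made-up numbers (20 groups, only ch_num 8 below a hypothetical limit).

```python
import pandas as pd

# Hypothetical stand-in for site_q05: 20 groups, all at 10.0 except ch_num 8.
site_q05 = pd.Series([10.0] * 20, index=pd.RangeIndex(20, name="ch_num"))
site_q05.loc[8] = 1.0
lowlimit = 5.0  # hypothetical limit

mask = site_q05 < lowlimit
subset = site_q05[mask]

print(subset.shape)     # (1,)
print(subset.shape[0])  # 1
print(len(subset))      # 1
print(int(mask.sum()))  # 1, because True counts as 1 and False as 0
```

All three counts agree, so shape[0] here is simply the number of True values in the mask.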

Hello @abessman and @tjreedy…I experimented with the code below:

s = pd.Series([1,3,5,1])
xx = 3
s[s < xx]

out
0    1
3    1
dtype: int64

Can someone please explain why the result shows
0 1
3 1

Thanks in advance.

A pandas.Series instance has an index and a series of values. The leftmost column is the index. When you create a subarray using an index mask, it contains only the values and indices where the mask is True. In this case, the first (index 0) and final (index 3) values met the mask condition.
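
A short sketch using the same Series shows the index and the values separately. The reset_index call at the end is an optional extra, in case you want the surviving rows renumbered from 0.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 1])
xx = 3

filtered = s[s < xx]

print(filtered.index.tolist())  # [0, 3]  original positions that passed the mask
print(filtered.tolist())        # [1, 1]  the values at those positions

# Optional: discard the original labels and renumber from 0.
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```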