How to shorten assigning many variables in Python code?

Hi all,
I’m new to Python.
I have an example that plots many boxplots to compare each data column across many testers.
My code already works, but there is a problem.
We have many columns to check.
If we assign the variables manually, line by line, this example would need 10 data columns * 5 testers = 50 lines of manual assignment.
How can we assign variables for 100 data columns with 10 testers? :sleepy: :sleepy: :sleepy:

import matplotlib.pyplot as plt
import pandas as pd
 
file = 'E:/Document/python/Book1.xlsx'
df = pd.read_excel(file)
data1_tester1 = df[df['Tester']=='Tester 1']['Data1']
data1_tester2 = df[df['Tester']=='Tester 2']['Data1']
data1_tester3 = df[df['Tester']=='Tester 3']['Data1']
data1_tester4 = df[df['Tester']=='Tester 4']['Data1']
data1_tester5 = df[df['Tester']=='Tester 5']['Data1']
data2_tester1 = df[df['Tester']=='Tester 1']['Data2']
data2_tester2 = df[df['Tester']=='Tester 2']['Data2']
data2_tester3 = df[df['Tester']=='Tester 3']['Data2']
data2_tester4 = df[df['Tester']=='Tester 4']['Data2']
data2_tester5 = df[df['Tester']=='Tester 5']['Data2']
ax1 = plt.subplot(2,1,1)
ax1.boxplot([data1_tester1,data1_tester2,data1_tester3,data1_tester4,data1_tester5])
ax1.set_xticklabels('')
ax2 = plt.subplot(2,1,2)
ax2.boxplot([data2_tester1,data2_tester2,data2_tester3,data2_tester4,data2_tester5],labels=['Tester 1','Tester 2','Tester 3','Tester 4','Tester 5'])
plt.show()

Here is my code result: [image]
My file: [attachment]

Have you considered using lists and loops?

data1_testers = []

for i in range(5):
    data1_testers.append(df[df['Tester']==f'Tester {i + 1}']['Data1'])

data2_testers = []

for i in range(5):
    data2_testers.append(df[df['Tester']==f'Tester {i + 1}']['Data2'])

Or something similar.


It worked. Thanks so much!
But how can I use a for loop over the columns?
In this example I have 10 data columns, so I can write the for loop 10 times.
If the DataFrame is bigger (about 100 columns), how can I do it?

Here’s a more general solution:

import matplotlib.pyplot as plt
import pandas as pd

num_testers = 5
num_data = 2

file = 'E:/Document/python/Book1.xlsx'
df = pd.read_excel(file)

data_tester = []

for data_index in range(num_data):
    data_tester.append([])

    for tester_index in range(num_testers):
        data_tester[-1].append(df[df['Tester'] == f'Tester {tester_index + 1}'][f'Data{data_index + 1}'])

for data_index in range(num_data):
    ax = plt.subplot(num_data, 1, data_index + 1)

    if data_index == num_data - 1:
        ax.boxplot(data_tester[data_index], labels=[f'Tester {tester_index + 1}' for tester_index in range(num_testers)])
    else:
        ax.boxplot(data_tester[data_index])
        ax.set_xticklabels('')

plt.show()

Thanks so much. :heart: :heart: :heart:
This is the result I wanted.

import matplotlib.pyplot as plt
import pandas as pd

file = 'E:/Document/python/Book1.xlsx'
df = pd.read_excel(file)
cols = df.columns[2:12]
testers = ['Tester 1', 'Tester 2', 'Tester 3', 'Tester 4', 'Tester 5']
num_data = len(cols)  # number of data columns
num_testers = len(testers)  # number of testers
data_tester = []

for data_index in cols:
    data_tester.append([])
    
    for tester_index in testers:
        data_tester[-1].append(df[df['Tester']==tester_index][data_index])

fig = plt.figure(figsize= (10,30))

for data_index in range (num_data):
    ax = plt.subplot(num_data, 1, data_index + 1)
    ax.set_title(cols[data_index],loc='left')
    if data_index == num_data - 1:
        ax.boxplot(data_tester[data_index], labels=[f'Tester {tester_index + 1}' for tester_index in range(num_testers)])
    else:
        ax.boxplot(data_tester[data_index])
        ax.set_xticklabels('')
plt.tight_layout(h_pad=2)
plt.show()

You could determine the number of testers from the spreadsheet.
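
As a minimal sketch of that idea, the tester names and data columns can both be read from the DataFrame instead of being typed in. The small DataFrame below is made up to stand in for the spreadsheet, and the assumption that every data column name starts with "Data" is taken from the example code above, not from the actual file:

```python
import pandas as pd

# Made-up stand-in for the spreadsheet; the real df comes from pd.read_excel(file).
df = pd.DataFrame({
    "No": [1, 2, 3, 4, 5, 6],
    "Tester": ["Tester 1", "Tester 1", "Tester 2", "Tester 2", "Tester 3", "Tester 3"],
    "Data1": [1.0, 1.2, 0.9, 1.1, 1.3, 1.4],
    "Data2": [5.0, 5.5, 4.8, 5.1, 5.2, 5.9],
})

# Tester names in order of first appearance in the "Tester" column.
testers = list(df["Tester"].unique())

# Data columns, assuming their names all start with "Data".
cols = [c for c in df.columns if c.startswith("Data")]

num_testers = len(testers)
num_data = len(cols)
```

With these in place, the nested loops above need no hard-coded tester list or column slice.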

Generally speaking, reaching for a for loop with vectorized data structures (NumPy/Pandas) in cases like this is a well-known beginner antipattern: it is generally much less efficient (sometimes by orders of magnitude), less concise and less idiomatic than using the native NumPy/Pandas operations, and it throws away most of the benefits of using NumPy/Pandas in the first place.
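
To make the contrast concrete, here is a small sketch on made-up data (the column names mirror the example, the values are invented). It builds the per-tester lists once with a loop of boolean-mask filters and once with a single groupby, and both give the same result:

```python
import pandas as pd

# Made-up data with the same column names as the example spreadsheet.
df = pd.DataFrame({
    "Tester": ["Tester 1", "Tester 2", "Tester 1", "Tester 2"],
    "Data1": [1.0, 2.0, 3.0, 4.0],
})

# Loop-based: one boolean-mask filter over the whole frame per tester.
loop_result = {}
for tester in df["Tester"].unique():
    loop_result[tester] = df[df["Tester"] == tester]["Data1"].tolist()

# Grouped: one groupby call splits the column by tester.
grouped_result = df.groupby("Tester")["Data1"].apply(list).to_dict()
```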

Using some basic Pandas operations, we can massage our data into the format needed for the base Matplotlib boxplot, without the need for a tortuous and inefficient for loop. Specifically, we convert the data from wide to long format (i.e., making each “DataN” value a separate row rather than a column) using pd.wide_to_long, group on the “DataN” number using df.groupby, create our subplots up front per the number of groups, and finally iterate over the groups and subplots together, plotting a boxplot of “Data” for each group, split by “Tester”. This is simpler, more concise and more idiomatic than the previous approach, and will likely also be faster on larger dataframes:

import matplotlib.pyplot as plt
import pandas as pd

file = 'Book1_data.xlsx'
df = pd.read_excel(file)

groups = pd.wide_to_long(df, stubnames=["Data"], i=["No", "Tester"], j="N").groupby("N")
figure, axes = plt.subplots(len(groups), 1, figsize=(10, 40))
for (group_n, group_df), ax in zip(groups, axes):
    ax.boxplot(group_df["Data"].groupby("Tester").apply(list))
    ax.set_title(f"Data {group_n}")
plt.show()
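
If the wide-to-long step is unfamiliar, this small sketch shows what pd.wide_to_long does on its own. The two-row DataFrame is made up; only the column names ("No", "Tester", "Data1", "Data2") are taken from the code above:

```python
import pandas as pd

# Made-up wide-format data: one column per "DataN".
wide_df = pd.DataFrame({
    "No": [1, 2],
    "Tester": ["Tester 1", "Tester 2"],
    "Data1": [1.0, 2.0],
    "Data2": [10.0, 20.0],
})

# Long format: one row per (No, Tester, N), with a single "Data" column.
# The numeric suffix of each "DataN" column becomes the value of "N".
long_df = pd.wide_to_long(
    wide_df, stubnames=["Data"], i=["No", "Tester"], j="N"
).reset_index()

print(long_df)
```

Each original row becomes two rows here, one for N = 1 and one for N = 2, which is what makes grouping on “N” possible.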

However, this itself is really another XY problem, as you’ve helpfully made clear in your complete and detailed explanation (thanks!). Since your actual goal is just a boxplot by tester and “DataN”, you can achieve the same or better result with one line of code using Pandas’ built-in DataFrame.plot.box method:

import pandas as pd

file = 'Book1.xlsx'
df = pd.read_excel(file)

df.plot.box(column=[c for c in df.columns if "Data" in c], by="Tester", layout=(-1, 1), figsize=(10, 40))

This can be passed a Matplotlib axes and returns Matplotlib objects, so it works just like your existing Matplotlib plot, with far less work.


I’m using Pandas’ built-in df.boxplot().
How can I fill each tester’s box with a different color?
E.g. tester 1 filled red, tester 2 filled blue, tester 3 filled green, …

I’m not aware of a simple way to do that natively with either the Pandas boxplot methods or the Matplotlib plotting functions that they wrap; e.g. this approach on SO might work but involves a lot of complexity.
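
For reference, here is a rough sketch of the kind of manual work involved when calling Matplotlib's ax.boxplot directly on a single axes. It is not the Pandas by= version, the data and colors are made up, and it may or may not match the SO approach mentioned above. It relies on patch_artist=True so that the boxes are drawn as fillable patches:

```python
import matplotlib.pyplot as plt

# Made-up values per tester; in practice these would come from the DataFrame.
data_by_tester = {
    "Tester 1": [1.0, 1.2, 0.9, 1.4],
    "Tester 2": [2.0, 2.1, 1.8, 2.4],
    "Tester 3": [3.0, 3.3, 2.7, 3.1],
}
colors = ["red", "blue", "green"]  # one fill color per tester, in the same order

fig, ax = plt.subplots()

# patch_artist=True makes each box a patch whose face color can be set.
result = ax.boxplot(list(data_by_tester.values()), patch_artist=True)
ax.set_xticklabels(list(data_by_tester.keys()))

for patch, color in zip(result["boxes"], colors):
    patch.set_facecolor(color)

# Call plt.show() here to display the figure.
```

This has to be repeated for every subplot, which is where the extra complexity comes from.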

However, this is easy to do with Seaborn, a higher-level interface to Matplotlib, which also produces nicer-looking plots overall by default. Just swap out df.boxplot for seaborn.catplot, converting your columns from wide to long first, and each tester will be given its own color automatically:

import pandas as pd
import seaborn

file = 'Book1_data.xlsx'
df = pd.read_excel(file)
df_plot = pd.wide_to_long(df, stubnames="Data", i="No", j="N").reset_index()
seaborn.catplot(data=df_plot, x="Tester", y="Data", row="N", kind="box", aspect=1.5, sharey=False)

See the catplot documentation for more details and examples, including how to set which colors are used for which tester.
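
As one possible sketch of fixing the colors yourself, you can pass a palette that maps each tester name to a color. The data below is made up in the same long format as df_plot, and the specific color choices are just the ones from the question. Setting hue to the same column as x, with dodge=False and legend=False, is an assumption about what works across Seaborn versions rather than the only way to do it:

```python
import pandas as pd
import seaborn

# Made-up long-format data standing in for df_plot.
df_plot = pd.DataFrame({
    "Tester": ["Tester 1", "Tester 2", "Tester 3"] * 4,
    "N": [1] * 6 + [2] * 6,
    "Data": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5, 10.0, 20.0, 30.0, 15.0, 25.0, 35.0],
})

# Explicit tester-to-color mapping.
palette = {"Tester 1": "red", "Tester 2": "blue", "Tester 3": "green"}

grid = seaborn.catplot(
    data=df_plot, x="Tester", y="Data", row="N", kind="box",
    hue="Tester", palette=palette, dodge=False, legend=False,
    aspect=1.5, sharey=False,
)
```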