You can think of a list comprehension as just a slightly specialized for-loop that handles creating a new list and appending elements to it for you. It’ll be a bit cleaner and perform a bit better, but from the perspective of NumPy/Pandas it’s all the same thing: both step through the data one element at a time in the Python interpreter, so both will perform orders of magnitude worse than native NumPy/Pandas vectorized operations on larger arrays/dataframes, especially when nested 3 (!) levels deep like here.
Instead of a for-loop/comprehension (which, for your purposes, are basically two spellings of the very same thing), you want to use native vectorized NumPy/Pandas operations to replace as many layers of loops as you can (innermost first). These process the whole array or dataframe in one call, which is both cleaner and far faster than iterating manually (sometimes by a factor of millions).
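Here is a minimal toy illustration of what “vectorized” means in this context. It is not taken from your code. It runs the same elementwise computation once as a list comprehension and once as a single NumPy expression. Both give identical results. The difference is that the comprehension evaluates its body in the Python interpreter once per element, while the vectorized form hands the whole array to NumPy’s compiled loop:

```python
import numpy as np

values = np.arange(100_000, dtype=np.int64)

# List comprehension: the body runs once per element in the Python interpreter.
squares_loop = np.array([v * v + 1 for v in values])

# Vectorized: a single expression; the per-element loop runs inside NumPy.
squares_vec = values * values + 1

print(np.array_equal(squares_loop, squares_vec))  # True
```

Timing the two lines (e.g. with `%timeit`) is a quick way to see the gap on your own machine. The exact ratio depends on array size and hardware.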
In this case, your example above is not reproducible or complete. It references variables `Jeff`, `Lbo`, and `Leff` (typo?), none of which are defined, and their names are not very descriptive, so it is difficult for any reader to know what they represent. Also, the second code block fails to parse with a `SyntaxError`, because the first `for` line contains a spurious level of indentation. You should always make sure you can copy and paste your examples into a new file and have them actually work. Otherwise, we can’t use your code without manually fixing it and guessing what you meant, which is not great for either you or us.
However, I’m going to assume that `Jeff` is the list of people and `Lbo` is the list of workdays (the code is no different if you swap them), and that `Leff` is a misspelling of `Jeff` (or vice versa), as it isn’t obvious how the code is intended to work otherwise. I’m also going to assume `len(Jeff) > len(Lbo)`. If they were equal, you could simply swap the names in the initial call creating the dataframe to get the same rows your for-loop produces (though in a different order). If `Jeff` were shorter, your code would fail with an error. So, for test purposes, I’ll assume:
```python
import itertools
import pandas as pd

Jeff = ["Amina", "Bob", "Cristina", "Deshawn"]
Lbo = [f"Day {n}" for n in range(1, 4)]
df = pd.DataFrame(itertools.permutations(Jeff, len(Lbo)), columns=Lbo)
```
(Note that I eliminated the redundant `list` and `np.array` calls.)
So, we have:
```
>>> print(df)
       Day 1     Day 2     Day 3
0      Amina       Bob  Cristina
1      Amina       Bob   Deshawn
2      Amina  Cristina       Bob
3      Amina  Cristina   Deshawn
4      Amina   Deshawn       Bob
5      Amina   Deshawn  Cristina
6        Bob     Amina  Cristina
7        Bob     Amina   Deshawn
8        Bob  Cristina     Amina
9        Bob  Cristina   Deshawn
10       Bob   Deshawn     Amina
11       Bob   Deshawn  Cristina
12  Cristina     Amina       Bob
13  Cristina     Amina   Deshawn
14  Cristina       Bob     Amina
15  Cristina       Bob   Deshawn
16  Cristina   Deshawn     Amina
17  Cristina   Deshawn       Bob
18   Deshawn     Amina       Bob
19   Deshawn     Amina  Cristina
20   Deshawn       Bob     Amina
21   Deshawn       Bob  Cristina
22   Deshawn  Cristina     Amina
23   Deshawn  Cristina       Bob
```
Running your corrected block of example code:
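```python
P = pd.DataFrame(columns=Jeff, index=df.index)
for k in range(0, len(df)):
    for column in P.columns:
        L = df.iloc[k].values.tolist()
        for l in range(0, len(L)):
            if column == L[l]:
                # Use P.at[row, col]: the chained P[column].iloc[k] = ... form
                # writes to a temporary copy under pandas Copy-on-Write.
                P.at[k, column] = df.columns[l]
```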
results in:
```
>>> print(P)
    Amina    Bob  Cristina  Deshawn
0   Day 1  Day 2     Day 3      NaN
1   Day 1  Day 2       NaN    Day 3
2   Day 1  Day 3     Day 2      NaN
3   Day 1    NaN     Day 2    Day 3
4   Day 1  Day 3       NaN    Day 2
5   Day 1    NaN     Day 3    Day 2
6   Day 2  Day 1     Day 3      NaN
7   Day 2  Day 1       NaN    Day 3
8   Day 3  Day 1     Day 2      NaN
9     NaN  Day 1     Day 2    Day 3
10  Day 3  Day 1       NaN    Day 2
11    NaN  Day 1     Day 3    Day 2
12  Day 2  Day 3     Day 1      NaN
13  Day 2    NaN     Day 1    Day 3
14  Day 3  Day 2     Day 1      NaN
15    NaN  Day 2     Day 1    Day 3
16  Day 3    NaN     Day 1    Day 2
17    NaN  Day 3     Day 1    Day 2
18  Day 2  Day 3       NaN    Day 1
19  Day 2    NaN     Day 3    Day 1
20  Day 3  Day 2       NaN    Day 1
21    NaN  Day 2     Day 3    Day 1
22  Day 3    NaN     Day 2    Day 1
23    NaN  Day 3     Day 2    Day 1
```
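If `df` isn’t guaranteed to be the complete set of permutations, the triple loop can still be replaced by a reshape. The sketch below is one possible approach, not the only one. It uses `melt` to put `df` into long format, with one record per (row, day) cell. It then uses `pivot` so each person becomes a column holding the day they were assigned. It assumes no name appears twice in the same row, which is true for permutations and is required because `pivot` needs unique row/name pairs. The names `long` and `P_vec` are just illustrative:

```python
import itertools

import pandas as pd

Jeff = ["Amina", "Bob", "Cristina", "Deshawn"]
Lbo = [f"Day {n}" for n in range(1, 4)]
df = pd.DataFrame(itertools.permutations(Jeff, len(Lbo)), columns=Lbo)

# Long format: keep the original row number, one record per (row, day) cell.
long = df.melt(ignore_index=False, var_name="day", value_name="name")
long = long.rename_axis("row").reset_index()

# Pivot: each person becomes a column whose cell holds the day they were
# assigned in that row; people absent from a row get NaN.
P_vec = long.pivot(index="row", columns="name", values="day")

# Match the loop version's layout: columns in Jeff's order, no axis names.
P_vec = P_vec.reindex(columns=Jeff)
P_vec.index.name = None
P_vec.columns.name = None

print(P_vec.head(3))
```

Because both reshapes run inside pandas, this avoids the per-cell Python loop while making no assumption about how `df` was generated.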
Now, if you know a priori that `df` is just the permutations of `Jeff` over `Lbo` (which, going by what you’ve stated, you do), you can construct `P` directly, without `df`, from only `Jeff` and `Lbo`, by calling `itertools.permutations` with the arguments swapped. The only complication is padding `Lbo` with `NaN`s to the proper length:
```python
import numpy as np  # for np.nan

Lbo_padded = Lbo + [np.nan] * max(0, len(Jeff) - len(Lbo))
P2 = pd.DataFrame(itertools.permutations(Lbo_padded, len(Jeff)), columns=Jeff)
```
You can verify that this gives the same result as your code above (ignoring row order):
```python
# Sort both frames by every column so row order no longer matters, then compare.
P, P2 = (frame.sort_values(by=list(frame.columns)).reset_index(drop=True) for frame in (P, P2))
print(P.equals(P2))
```
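One caveat does not affect the examples here, because names outnumber days by exactly one. `itertools.permutations` treats elements as unique by position, not by value. So if `len(Jeff) - len(Lbo)` is 2 or more, the `NaN` pads get permuted among themselves and `P2` contains duplicate rows. Below is a minimal sketch with an assumed 4 names and 2 days, using `drop_duplicates` as one possible fix:

```python
import itertools
import math

import numpy as np
import pandas as pd

Jeff = ["Amina", "Bob", "Cristina", "Deshawn"]
Lbo = ["Day 1", "Day 2"]  # assumed: two fewer days than names, so two NaN pads

Lbo_padded = Lbo + [np.nan] * max(0, len(Jeff) - len(Lbo))
P2 = pd.DataFrame(itertools.permutations(Lbo_padded, len(Jeff)), columns=Jeff)

# The two pads are interchangeable, so each distinct assignment appears twice.
print(len(P2), math.perm(len(Jeff), len(Lbo)))  # 24 12

# drop_duplicates treats NaN values as equal, which collapses the repeats.
P2_unique = P2.drop_duplicates(ignore_index=True)
print(len(P2_unique))  # 12
```

This still generates every ordering of the pads before discarding the repeats, so the wasted work grows quickly with the size of the gap.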
Even on the small example dataframe above, this direct approach is roughly 100x faster than your original for-loop solution (369 µs vs 36.4 ms, not counting the dataframe creation time for the original solution):
```
%timeit original_solution()
36.4 ms ± 719 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit new_solution()
369 µs ± 466 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
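The bodies of `original_solution` and `new_solution` aren’t shown above. What follows is a hypothetical reconstruction. The function signatures are assumptions, as is the use of the standard-library `timeit` module in place of IPython’s `%timeit`. Matching the note above, `df` is built outside the timed call. Absolute numbers will vary by machine and pandas version, so no output is shown:

```python
import itertools
import timeit

import numpy as np
import pandas as pd


def original_solution(df, people):
    """Hypothetical wrapper around the loop version (indentation fixed)."""
    P = pd.DataFrame(columns=people, index=df.index)
    for k in range(len(df)):
        for column in P.columns:
            L = df.iloc[k].values.tolist()
            for l in range(len(L)):
                if column == L[l]:
                    # df.index is a RangeIndex, so label k is also position k.
                    P.at[k, column] = df.columns[l]
    return P


def new_solution(people, days):
    """Hypothetical wrapper around the direct construction from permutations."""
    days_padded = days + [np.nan] * max(0, len(people) - len(days))
    return pd.DataFrame(itertools.permutations(days_padded, len(people)), columns=people)


Jeff = ["Amina", "Bob", "Cristina", "Deshawn"]
Lbo = [f"Day {n}" for n in range(1, 4)]
# Built outside the timing, as in the comparison above.
df = pd.DataFrame(itertools.permutations(Jeff, len(Lbo)), columns=Lbo)

for label, fn in [("original", lambda: original_solution(df, Jeff)),
                  ("new", lambda: new_solution(Jeff, Lbo))]:
    per_call = timeit.timeit(fn, number=10) / 10
    print(f"{label}: {per_call * 1e3:.3f} ms per call")
```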
If we scale this up a modest amount, to 6 names by 5 days (720 rows x 5 columns):
```python
Jeff = [f"Name {n}" for n in range(6)]
Lbo = [f"Day {n}" for n in range(5)]
```
Then the direct creation solution is over 1000x faster (1.52 s vs 1.07 ms):
```
%timeit original_solution()
1.52 s ± 8.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit new_solution()
1.07 ms ± 6.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Likewise, on a dataframe of 8 names x 7 days (40k rows x 7 columns), it’s roughly 8500x faster (2 min 56 s vs 20.7 ms):
```
%timeit -r 1 -n 1 original_solution()
2min 56s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
%timeit new_solution()
20.7 ms ± 72.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
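One rough way to see why the gap widens is to count the work. `df` has one row per k-permutation of n names, i.e. n!/(n-k)! rows for k days. The innermost `if column == L[l]` check in the loop runs rows × n × k times, each as a Python-level operation. Here is a quick check of those counts for the three sizes benchmarked above (`math.perm` needs Python 3.8+):

```python
import math

# (number of names, number of days) for the three benchmarks above
sizes = [(4, 3), (6, 5), (8, 7)]

counts = []
for n, k in sizes:
    rows = math.perm(n, k)      # n! / (n - k)! rows in df
    comparisons = rows * n * k  # times the innermost `if column == L[l]` runs
    counts.append((rows, comparisons))
    print(f"{n} names x {k} days: {rows} rows, {comparisons} comparisons")
```

This gives 24, 720 and 40320 rows, with 288, 21600 and 2257920 comparisons respectively.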
If for whatever reason this does not satisfy the (unstated) constraints of your problem, there are other possible solutions, but you’ll need to specify those constraints first.