Hi,
I have a large pandas DataFrame and I need to sort each row in a loop; the same sort order should also be applied to the header. The code works, but it is really slow. I tried converting the DataFrame to a 2D NumPy array, which helped a little, but it is still slow.
In my code I have a NumPy matrix: I select one row at a time, sort it with argsort, and apply the same order to the first row (the header). Is there any way to speed this up? Any suggestions are welcome.
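Assuming the goal is to sort every row independently and reorder the header labels with each row's permutation, the Python loop can probably be replaced by one vectorized `np.argsort(..., axis=1)` call. A minimal sketch, where `sort_rows_with_header`, `data` and `header` are hypothetical names standing in for the DataFrame values and its column labels:

```python
import numpy as np


def sort_rows_with_header(data, header):
    """Sort each row of `data` and reorder `header` with the same per-row permutation.

    data:   2D array, one record per row (hypothetical stand-in for df.to_numpy())
    header: 1D array of column labels (hypothetical stand-in for df.columns)
    Returns (sorted_values, per_row_headers), both with the same shape as `data`.
    """
    order = np.argsort(data, axis=1)                         # permutation for every row at once
    sorted_values = np.take_along_axis(data, order, axis=1)  # apply each row's own permutation
    per_row_headers = np.asarray(header)[order]              # label order for each row
    return sorted_values, per_row_headers


data = np.array([[3.0, 1.0, 2.0],
                 [0.5, 0.9, 0.1]])
header = np.array(["a", "b", "c"])
values, labels = sort_rows_with_header(data, header)
print(values.tolist())  # [[1.0, 2.0, 3.0], [0.1, 0.5, 0.9]]
print(labels.tolist())  # [['b', 'c', 'a'], ['c', 'a', 'b']]
```

With a DataFrame `df` this would be called as `sort_rows_with_header(df.to_numpy(), df.columns.to_numpy())`. The work happens inside NumPy's compiled code instead of one Python-level iteration per row, which is usually where a row-by-row loop loses most of its time.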
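I may have found a faster way: with PyTorch on the GPU, sorting the same data took only 0.78 seconds, compared with 1.60 seconds for NumPy's `argsort` and 1.22 seconds for PyTorch on the CPU.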
```python
import time

import numpy as np
import torch

# 100 million rows x 2 columns of float64 (about 1.6 GB of data)
n = np.random.random((100000000, 2))


def func_numpy():
    start = time.perf_counter()
    n.argsort()  # argsort along the last axis, i.e. within each row
    end = time.perf_counter()
    print("{:.2f} second".format(end - start))


def func_torch():
    t = torch.tensor(n)  # copy into a CPU tensor (not timed)
    start = time.perf_counter()
    t.sort()  # sorts along the last dim; returns (values, indices)
    end = time.perf_counter()
    print("{:.2f} second".format(end - start))


def func_torch_gpu():
    t = torch.tensor(n).cuda()  # copy to the GPU (not timed)
    start = time.perf_counter()
    t.sort()  # CUDA launches are asynchronous; no torch.cuda.synchronize() before `end`
    end = time.perf_counter()
    print("{:.2f} second".format(end - start))


func_numpy()      # 1.60 second
func_torch()      # 1.22 second
func_torch_gpu()  # 0.78 second
```
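The benchmark times only the sort itself. Two details are worth keeping in mind when timing GPU code: CUDA kernels are launched asynchronously, so without `torch.cuda.synchronize()` a wall-clock timer may stop before the sort has actually finished, and the first CUDA call can include one-time setup costs. The sketch below adds a warm-up and explicit synchronization, and also shows how the indices returned by `torch.sort` could be applied to the header, as the question asks. The helper names `sort_rows_torch` and `time_gpu_sort` are hypothetical, and the sketch assumes the data fits in GPU memory, falling back to the CPU when no GPU is present.

```python
import time

import numpy as np
import torch


def sort_rows_torch(data, header, device=None):
    """Row-wise sort with PyTorch; returns sorted values and per-row header labels.

    Uses the GPU when one is available, otherwise the CPU (an assumption of this sketch).
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    t = torch.from_numpy(data).to(device)
    values, order = torch.sort(t, dim=1)   # `order` plays the role of argsort's output
    order_np = order.cpu().numpy()         # copying back to the host waits for the GPU
    return values.cpu().numpy(), np.asarray(header)[order_np]


def time_gpu_sort(data):
    """Time only the on-GPU row sort, excluding transfers and first-call overhead."""
    t = torch.from_numpy(data).cuda()
    torch.sort(t, dim=1)          # warm-up call absorbs one-time CUDA setup costs
    torch.cuda.synchronize()
    start = time.perf_counter()
    torch.sort(t, dim=1)
    torch.cuda.synchronize()      # wait until the sort has really finished
    return time.perf_counter() - start


data = np.random.random((1000, 2))
header = np.array(["x", "y"])
values, labels = sort_rows_torch(data, header)
if torch.cuda.is_available():
    print("{:.4f} second".format(time_gpu_sort(data)))
```

Note that the host-to-device copy is excluded from `time_gpu_sort`, as in the original benchmark; for a one-off sort of data that starts out in NumPy, that copy may be a significant part of the total cost.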