Column-wise calculations

quickhelps · January 5, 2022, 5:39pm

It seems simple but I can’t seem to find an efficient way to solve this in Python 3: Is there is a loop I can use in my dataframe that subtracts every column after the first column, from the first column, so that I can add that new subtracted column to a new dataframe?

Then I would like to move on to subtract every column after the second column, from the second column, and follow the same logic throughout the 18 columns where I append or add that new subtracted column to the new dataframe

Here are first 4 lines of code for the 1st and 2nd columns I am using to my dataframe (spotrates), but I have 18 columns and I know it would be easier to create a loop, and I am adding on to the end of my existing dataframe, when I want the subtracted column to be inserted to a new one.

spotrates['3m-on'] = spotrates.iloc[:,1] - spotrates.iloc[:,0]
spotrates['6m-on'] = spotrates.iloc[:,2] - spotrates.iloc[:,0]
spotrates['6m-3m'] = spotrates.iloc[:,2] - spotrates.iloc[:,1]
spotrates['9m-3m'] = spotrates.iloc[:,3] - spotrates.iloc[:,1]

Here is the definition I am trying to create, but it only solves for the difference of the first 2 columns

def swaps(data):
    i<len(data.columns)
    col1 = data.iloc[:,i]
    col2 = data.iloc[:,i+1]
    for col1, col2 in data.columns:
        return col2 - col1

TobiasHT · January 5, 2022, 8:25pm

Hello Nashmia,
so I think data frames are in pandas, and I’m not so familiar with the framework, that’s why I’m going to create my own version of a data frame based on lists to illustrate my simple solution. I should first of all say that there are probably many other ways of implementing an algorithm for this problem in faster and more efficient ways, but this solution is fairly sufficient for a simple to moderate use case like this one. Also I’m not sure how well versed you are with python generally, but I’m sure this solution won’t throw you off, so here goes.

import random

# create a matrix of random numbers with 18 rows and 18 columns
data_frame = [[random.random() for _ in range(18)] for _ in range(18)]

storage = [] #container that is going to store the results of our analysis

def subtract (a,b):
    # function to call to subtract independent components of the matrix
    return a - b

def run_analysis (frame,store):
    #find the first column we shall use
    for first_col_index in range(len(frame)):
        temp = [] # temporary store to place value from analysis of columns greater than current column
        for sec_col_index in range(len(frame)):
            # find second column with which we shall subtract values from the first column
            if (sec_col_index <= first_col_index):
                # if the column is below the current column or is equal to the current column, then skip to next column
                continue
            else:
                # if column above our current column, the subtract values in the column
                # and keep the result in temp
                result = [r for r in map(subtract,frame[sec_col_index],frame[first_col_index])]
                temp.append(result)
        # save the complete analysis in the store
        store.append(temp)

Now you can run the analysis on our data frame but calling the run_analysis function with the data_frame and storage elements.

run_analysis(data_frame,storage)

One thing I was confused with your issue is what you were planning to do with the results and where (and how) you were planning to store them. So in my solution, I just placed them in the storage list.

quickhelps · January 5, 2022, 10:00pm

Thank you so much! I’ll try this out in a bit and let you know if I encounter any issues, I’m not the most well-versed in python unfortunately.

Once I get the results, I should have multiple columns that contain the subtraction of the columns from the 1st column , which I would like to assign to a new dataframe, rather than a list, so that I can perform statistical analysis on the newly derived dataframe. I hope that was a little more clear!

I’ve attached the beginning of my dataframe as a picture here:

quickhelps · January 5, 2022, 10:02pm

Essentially, I want a new dataframe that has the ‘Dates’ as the index and the rest to look like the following: