Accessing A Csv Script

4ulalia · August 20, 2024, 1:46pm

Hello! I am trying to code an algorythm that calculates long distance electrostatic interactions. I have a csv file where i have the average diatnce between two aminoacids, used to calculate the value of the free energy of these interactions. I introduce a sequence and, if there is a pair of charged aminoacids (not necessarly consecutive) i want it to get the value of r. It seems that it’s not doing this right now and i am getting really low values of G.
The csv code looks somethin like:

||Pos| i+1| i+2| i+3|…
|—|—|—|—|—|
DD| 5.5| 7.3| 5.5|
|DE| 5.8| 8.6| 6.4|
|DK| 7.3| 10.5| 6.2|
|DR| 7.2| 11.7| 5.8|
and so on.

This is my current code:

electroH_df = pd.read_csv("electro_helix.csv")
electroC_df = pd.read_csv("electro_coil.csv")

def calculate_energies(segment, electro_df, temperature_cels):
    K = np.sqrt(8 * np.pi * (e**2) * N_A * I / (1000 * k_b * (temperature_cels + 273)))

    length = len(segment)
    energies = np.zeros(length)  # Initialize energies array

    for i in range(length):
        total_energy = 0.0
        for j in range(i + 1, length):
            res1 = segment[i]
            res2 = segment[j]

            if res1 in charge_dict and res2 in charge_dict:
                charge1 = charge_dict[res1]
                charge2 = charge_dict[res2]

                key = f"{res1}{res2}"

                if key in electro_df['Pos'].values:
                    row = electro_df[electro_df['Pos'] == key].iloc[0]
                    position_difference = abs(j - i)
                    
                    if position_difference < len(row):
                        r = row.iloc[position_difference]
                        r = r * 1e-10  # Convert to meters
                        
                        if r > 0:
                            G_electro = ((e**2) * charge1 * charge2) / (3 * np.pi * epsilon_0 * epsilon_r * r) * np.exp(-K * r)
                            energies[i] += G_electro
                            energies[j] += G_electro  # Add interaction to both residues
                            print(f"Calculated K: {K}")
                            print(f"Sample r values: {r}")
                            print(f"Sample G_electro values: {G_electro}")


    return energies

def electrostatic_energy(segment, electroH_df, electroC_df, temperature_cels):
    
    helix_energies = calculate_energies(segment, electroH_df, temperature_cels)
    coil_energies = calculate_energies(segment, electroC_df, temperature_cels)
    
    energy_difference = [h - c for h, c in zip(helix_energies, coil_energies)]
    total_energy_difference = sum(energy_difference)
    
    print(f"Helix energies: {helix_energies}")
    print(f"Coil energies: {coil_energies}")
    print(f"Energy difference: {energy_difference}")
    print(f"Total energy difference: {total_energy_difference}")

    return total_energy_difference

The main part of the code looks like:

# constant\
R = 1.987204258e-3
epsilon_0 = 8.854e-12  # Permittivity of free space in F/m
Q = 0.5*1.60217663e-19   # Assumed total charge of the helix macrodipole : "half a charge" from 1998 paper
e= 1.60217663e-19 #Coulombs
epsilon_r = 85.763 # Relative permittivity for water (this can be adjusted based on the environment)
N_A = 6.022e23 #Avogadro's number
I = 0.1 #DEfAULT (should be an input)
k_b = 1.380649e-23 # m^2 kg s^-2 K^-1

# define user input
def main():
    print("Welcome to the helicity calculator!")
    sequence = input("Enter your sequence: ").strip().upper()
    while True:
        try:
            temperature_kelv = float(input("Enter your temperature in K: "))
            break
        except ValueError:
            print("Invalid input. Please enter a numeric value for temperature.")
    
    # convert kelvin to celsius
    temperature_cels = temperature_kelv - 273.0
    
    # extract segments after obtaining sequence
    segments = extract_segments(sequence)

    helix_length = calculate_alpha_helix_length(sequence)
    
    # helicity
    results = calculate_helicity(sequence, temperature_cels, helix_length)
    
    # print results (aa with corresponding helicity)
    print("\nHelical propensity per residue:")
    for result in results['helical_propensity']:
        aa = result[0]
        propensity = result[1]
        print(f"AA: {aa}, Hel: {propensity}") # AA for amino acid, Hel for helical propensity
    
    print(f"\nAverage helical propensity: {results['percent_helix']}")
    
    # Print electrostatic energies
    print("\nElectrostatic energies per residue:")
    for i, energy in enumerate(results['electrostatic_energies']):
        print(f"Residue {sequence[i]} at position {i + 1}: Electrostatic energy = {energy}")

I believe I’m not accessing correctly to the csv file, but if someone sees another way to implement my problem i would really appreciate. Thank yoiu !

c-rob · August 21, 2024, 10:12am

I don’t know how to do what you are doing, but looking at your code, I don’t see any obvious errors.

Using the python debugger step through each line. After each line check the value of variables.

Is it what you expect?
Do you have parentheses in the right place to ensure correct order of operations?
Does it make sense, for debugging purposes, to break up your formulas into multiple steps?

Run the program in debug mode like this:

python -m pdb myprog.py parameters go here

There are other debuggers available for Python as well. pdb is just one of them that comes with Python.