Problem with a calculation (market data) using pandas

Hello dear forum, my first post here.

I’m currently stuck and have been working on the same problem for days.

I am writing code that calculates certain parameters from market data. At the moment I am trying to calculate the trading range of the Asian session, but I am running into the following problem…
As soon as I filter the Asian time span and the first value is not 00:00:00, the Asia HOTD/LOTD M15 values are no longer displayed or calculated (NaN). Strangely enough, the time values derived from them (Time Asia LOTD/HOTD M15) are still correct. I need those price values so that I can calculate the difference and thus obtain the trading range.
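Here is a minimal, self-contained sketch of the behaviour with made-up numbers (not my real data); the column names are only for illustration, it is just meant to show the symptom:

~~~
import pandas as pd

# Tiny made-up day: four 15-minute candles, only the 'High' column matters here.
df = pd.DataFrame({
    'Date': ['22/10/2002'] * 4,
    'Time': ['00:00:00', '03:00:00', '05:00:00', '08:00:00'],
    'High': [69.28, 69.50, 69.71, 69.43],
})

# Filter a session that does NOT start at 00:00:00 ...
asia = df[(df['Time'] >= '03:00:00') & (df['Time'] <= '07:00:00')]

# ... compute a per-day value on the filtered frame and write it back.
df['Asia High'] = asia.groupby('Date')['High'].transform('max')

print(df)
# The 00:00:00 row (the one I later keep after drop_duplicates) shows NaN for 'Asia High'.
~~~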

More details on my data source:

  • Time zone: EET
  • Columns: Date, Time (15-minute intervals, representing 15-minute trading candles), Open, High, Low, Close, Volume (important: the high/low of the day can occur in either the Low or the High column; the highest value is not always in High and the lowest is not always in Low, see the small sketch below this list)
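Because of that, I always look at both price columns when searching for the extreme of a day. A tiny made-up example of what I mean (values invented for illustration):

~~~
import pandas as pd

day = pd.DataFrame({
    'High': [69.28, 69.25, 69.29],
    'Low':  [69.25, 69.30, 69.215],   # 69.30 sits in 'Low' but is above every 'High' value
})

day_high = day[['High', 'Low']].max(axis=1).max()   # 69.30
day_low  = day[['High', 'Low']].min(axis=1).min()   # 69.215
print(day_high, day_low)
~~~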

Here is a sample of the data source for one day (15-minute data):

~~~
22/10/2002;00:00:00;69.28;69.28;69.25;69.25;0
22/10/2002;00:15:00;69.245;69.25;69.2;69.22;0
22/10/2002;00:30:00;69.235;69.29;69.215;69.28;0
22/10/2002;00:45:00;69.28;69.28;69.26;69.275;0
22/10/2002;01:00:00;69.29;69.3;69.275;69.29;0
22/10/2002;01:15:00;69.28;69.285;69.26;69.285;0
22/10/2002;01:30:00;69.29;69.32;69.26;69.275;0
22/10/2002;01:45:00;69.265;69.285;69.245;69.255;0
22/10/2002;02:00:00;69.255;69.26;69.21;69.23;0
22/10/2002;02:15:00;69.225;69.275;69.225;69.255;0
22/10/2002;02:30:00;69.26;69.29;69.235;69.235;0
22/10/2002;02:45:00;69.245;69.435;69.24;69.42;0
22/10/2002;03:00:00;69.43;69.495;69.42;69.435;0
22/10/2002;03:15:00;69.425;69.485;69.415;69.46;0
22/10/2002;03:30:00;69.465;69.495;69.375;69.46;0
22/10/2002;03:45:00;69.45;69.51;69.445;69.5;0
22/10/2002;04:00:00;69.51;69.62;69.51;69.585;0
22/10/2002;04:15:00;69.59;69.59;69.545;69.57;0
22/10/2002;04:30:00;69.575;69.605;69.56;69.595;0
22/10/2002;04:45:00;69.6;69.66;69.585;69.655;0
22/10/2002;05:00:00;69.66;69.67;69.6;69.61;0
22/10/2002;05:15:00;69.615;69.71;69.615;69.695;0
22/10/2002;05:30:00;69.69;69.69;69.63;69.675;0
22/10/2002;05:45:00;69.67;69.69;69.64;69.685;0
22/10/2002;06:00:00;69.68;69.68;69.585;69.59;0
22/10/2002;06:15:00;69.58;69.6;69.565;69.575;0
22/10/2002;06:30:00;69.585;69.585;69.515;69.53;0
22/10/2002;06:45:00;69.525;69.54;69.475;69.515;0
22/10/2002;07:00:00;69.505;69.51;69.485;69.505;0
22/10/2002;07:15:00;69.5;69.5;69.375;69.395;0
22/10/2002;07:30:00;69.405;69.465;69.385;69.42;0
22/10/2002;07:45:00;69.425;69.43;69.295;69.365;0
~~~
etc...
~~~
import pandas as pd

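# 15-minute candles: the file has no header row and uses ';' as the separator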
df_hourly = pd.read_csv(r"C:\Users\XXXX\OneDrive\Desktop\BACKTEST\audjpy\audjpy-15m-eet.csv", sep=';', header=None)
df_hourly = df_hourly.rename(columns={0: 'Date', 1: 'Time', 2: 'Open_hourly', 3: 'Low_hourly', 4: 'High_hourly', 5: 'Close_hourly'})

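# All 96 quarter-hour timestamps a complete trading day must contain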
required_times = pd.date_range('00:00:00', '23:45:00', freq='15T').strftime('%H:%M:%S').tolist()

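# Dates whose recorded times cover every required 15-minute slot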
complete_hourly_dates = df_hourly['Date'][
    df_hourly.groupby('Date')['Time'].transform(
        lambda x: set(x).issuperset(required_times)
    )
]

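# Daily (D1) candles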
df_daily = pd.read_csv(r"C:\Users\XXXX\OneDrive\Desktop\BACKTEST\audjpy\audjpy-1d.csv", sep=';', header=None)
df_daily = df_daily.rename(columns={0: 'Date', 1: 'Time', 2: 'Open_daily', 3: 'Low_daily', 4: 'High_daily', 5: 'Close_daily'})

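# Keep only the days that have a complete set of 15-minute candles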
df_hourly = df_hourly[df_hourly['Date'].isin(complete_hourly_dates)]

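# Attach the daily open/close to every 15-minute row of the same date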
merged_df = pd.merge(df_hourly, df_daily[['Date', 'Open_daily', 'Close_daily']], on='Date', how='left', suffixes=('_hourly', '_daily'))

reference_df = df_daily[['Date', 'Low_daily', 'High_daily']]

merged_df = pd.merge(merged_df, reference_df, on='Date', how='left', suffixes=('_hourly', '_daily'))

merged_df['HOTD (D1)'] = merged_df.groupby('Date')['Low_daily'].transform('min')
merged_df['LOTD (D1)'] = merged_df.groupby('Date')['High_daily'].transform('max')

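# Intraday (M15) extremes of the day, searched across both price columns since either one can hold the true high/low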
merged_df['LOTD (M15)'] = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('min').min(axis=1)
merged_df['HOTD (M15)'] = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('max').max(axis=1)

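# Row index of the maximum (and, below, the minimum) of each of the two price columns, per date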
index_hotd_m15 = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_lotd_m15 = merged_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()

time_hotd_m15 = merged_df.loc[index_hotd_m15, ['Date', 'Time']].values
time_lotd_m15 = merged_df.loc[index_lotd_m15, ['Date', 'Time']].values

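# Wrap the (Date, Time) pairs in small frames so they can be merged back on Date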
time_hotd_m15_df = pd.DataFrame(time_hotd_m15, columns=['Date', 'Time_HOTD_M15'])
time_lotd_m15_df = pd.DataFrame(time_lotd_m15, columns=['Date', 'Time_LOTD_M15'])

merged_df = pd.merge(merged_df, time_hotd_m15_df, on='Date', how='left')
merged_df = pd.merge(merged_df, time_lotd_m15_df, on='Date', how='left')

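# Day range converted to pips (one pip = 0.01 for this JPY pair)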
merged_df['M15_HOTD_LOTD_Diff'] = (merged_df['HOTD (M15)'] - merged_df['LOTD (M15)']) / 0.01

#************************* THIS IS WHERE THE PROBLEM ARISES: the session should run from 03:00:00 to 07:00:00,
#************************* but as soon as the start time is anything other than 00:00:00, Asia HOTD/LOTD M15 is no longer displayed (NaN)!

asia_df = merged_df[(merged_df['Time'] >= '00:00:00') & (merged_df['Time'] <= '07:00:00')]

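# Asian-session extremes per date, computed on the filtered rows only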
asia_lotd_m15 = asia_df.groupby('Date')[['Low_hourly', 'High_hourly']].transform('min').min(axis=1)
asia_hotd_m15 = asia_df.groupby('Date')[['High_hourly', 'Low_hourly']].transform('max').max(axis=1)

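# Write the session values back onto the full frame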
merged_df['Asia LOTD M15'] = asia_lotd_m15
merged_df['Asia HOTD M15'] = asia_hotd_m15

merged_df['Asia HOTD LOTD Diff'] = (merged_df['Asia HOTD M15'] - merged_df['Asia LOTD M15']) / 0.01

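# Same candle-time lookup as above, restricted to the Asian session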
index_asia_hotd_m15 = asia_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_asia_lotd_m15 = asia_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()

time_asia_hotd_m15 = merged_df.loc[index_asia_hotd_m15, ['Date', 'Time']].values
time_asia_lotd_m15 = merged_df.loc[index_asia_lotd_m15, ['Date', 'Time']].values

time_asia_hotd_m15_df = pd.DataFrame(time_asia_hotd_m15, columns=['Date', 'Time_Asia_HOTD_M15'])
time_asia_lotd_m15_df = pd.DataFrame(time_asia_lotd_m15, columns=['Date', 'Time_Asia_LOTD_M15'])

merged_df = pd.merge(merged_df, time_asia_hotd_m15_df, on='Date', how='left')
merged_df = pd.merge(merged_df, time_asia_lotd_m15_df, on='Date', how='left')

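# News calendar: keep JPY/AUD events and join each day's headlines into one cell per date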
df_news = pd.read_csv(r"C:\Users\xxxx\OneDrive\Desktop\BACKTEST\News\neue_news_eet_formatted2.csv", sep=';', header=None)
df_news = df_news.rename(columns={0: 'Date', 1: 'Time', 2: 'Currency', 3: 'Significance', 4: 'NewsName'})

filtered_news = df_news[(df_news['Currency'].isin(['JPY', 'AUD'])) & (df_news['Significance'].isin(['N', 'M', 'H']))]

grouped_news = filtered_news.groupby('Date', as_index=False)

merged_df = pd.merge(merged_df,
                     grouped_news.agg({'NewsName': lambda x: ';'.join(x),
                                       'Significance': lambda x: ';'.join(x)}),
                     on='Date', how='left')

merged_df['NewsName'].replace('', pd.NA, inplace=True)
merged_df['Significance'].replace('', pd.NA, inplace=True)

merged_df['Weekday'] = pd.to_datetime(merged_df['Date'], format='%d/%m/%Y').dt.strftime('%A')

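# Collapse to one row per date (drop_duplicates keeps only the first row of each date)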
merged_df = merged_df.drop_duplicates(subset=['Date'])

merged_df['Entry Count'] = range(1, len(merged_df) + 1)
merged_df.set_index('Entry Count', inplace=True)

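# Bearish if the day's high printed before its low, Bullish if the low came first, Neutral if both fell in the same candle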
merged_df.loc[merged_df['Time_HOTD_M15'] < merged_df['Time_LOTD_M15'], 'Signal'] = 'Bearish'
merged_df.loc[merged_df['Time_HOTD_M15'] > merged_df['Time_LOTD_M15'], 'Signal'] = 'Bullish'
merged_df.loc[merged_df['Time_HOTD_M15'] == merged_df['Time_LOTD_M15'], 'Signal'] = 'Neutral'

pd.set_option('display.max_rows', 100)

print(merged_df[['Date', 'Weekday', 'Signal', 'HOTD (D1)', 'LOTD (D1)', 'HOTD (M15)', 'LOTD (M15)', 'M15_HOTD_LOTD_Diff', 'Time_HOTD_M15', 'Time_LOTD_M15', 'Asia HOTD M15', 'Asia LOTD M15', 'Asia HOTD LOTD Diff', 'Time_Asia_HOTD_M15', 'Time_Asia_LOTD_M15', 'NewsName', 'Significance']])
~~~

I also noticed something else that I just can't figure out:

Depending on the order in which I list Low_hourly and High_hourly, different values come out at the end. As it stands now everything comes out correctly, but shouldn't the code universally find the time of the highest value across Low_hourly/High_hourly and the time of the lowest value across Low_hourly/High_hourly, regardless of the column order?

~~~
index_hotd_m15 = merged_df.groupby('Date')[['Low_hourly', 'High_hourly']].idxmax().values.flatten()
index_lotd_m15 = merged_df.groupby('Date')[['High_hourly', 'Low_hourly']].idxmin().values.flatten()
~~~
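What I would have expected to be order-independent is something along these lines; this is just a sketch with made-up helper column names (row_high/row_low) that I have not wired into the rest of the code:

~~~
# Collapse the two price columns into one row-wise series first, then locate the
# extreme of that single series per date, so the column order cannot matter.
merged_df['row_high'] = merged_df[['Low_hourly', 'High_hourly']].max(axis=1)
merged_df['row_low'] = merged_df[['Low_hourly', 'High_hourly']].min(axis=1)

index_hotd_m15 = merged_df.groupby('Date')['row_high'].idxmax()
index_lotd_m15 = merged_df.groupby('Date')['row_low'].idxmin()
~~~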

I would be infinitely grateful if someone could help me further.

Thanks in advance and sorry for the long text!

Hi,

can you please add three wavy apostrophes (~), both above and below your code, so that it is legible.
The key to use is the one just below the Esc key, in tandem with the Shift key. You will be able to see how your code appears in the preview on the right side.

For example, by adding three of these ~ in series, both one line above and one line below,
the code will appear as:

~~~
code here
~~~

Done! Thank you.

Great! Now please make sure that all of the indentations in your code are preserved here
so that community members can review your code just as you intended.

Also, for your title, it would probably help if you added '… using pandas package' so that community
members who use this module often, or are very familiar with it, would be more likely to respond to your query.

I have updated the title.