I’m trying to make a custom class, CsvFrame, that is a dataframe backed by either pandas or polars.
For that I wrote the code below:
class CsvFrame:
    def __init__(self, engine, *args, **kwargs):
        if engine == 'polars':
            import polars as pl
            pl.DataFrame.__init__(pl.read_csv(*args, **kwargs))
        if engine == 'pandas':
            import pandas as pd
            pd.DataFrame.__init__(pd.read_csv(*args, **kwargs))
Now when I instantiate an object, there are two problems:
there is no HTML representation of the dataframe in my vscode-jupyter
none of the methods or attributes of a dataframe are available
import io
input_text = '''
col1,col2
A,1
B,2
'''
cfr = CsvFrame('polars', io.StringIO(input_text))
# problem 1
cfr # <__main__.CsvFrame at 0x1fd721f32c0>
# problem 2
cfr.melt()
AttributeError: 'CsvFrame' object has no attribute 'melt'
When I try class CsvFrame(pd.DataFrame, pl.DataFrame), I get AttributeError: 'CsvFrame' object has no attribute '_mgr'. In any case, I don’t think it’s a good idea to inherit from both at the same time, because they share method names such as melt().
Can you help me fix that? Is there a technique to achieve what I’m looking for, please?
Do you need CsvFrame to be an actual class? Can’t you just make a function which returns either a pandas.DataFrame or a polars.DataFrame depending on your choice of engine?
def make_csv_frame(engine, *args, **kwargs):
    if engine == 'polars':
        import polars as pl
        return pl.read_csv(*args, **kwargs)
    if engine == 'pandas':
        import pandas as pd
        return pd.read_csv(*args, **kwargs)
Since pandas and polars have incompatible APIs, you will of course need to handle any differences between the APIs which are relevant to your use case.
Does the shared method name matter? Prefixing a function with its library alias makes it explicit which library it comes from, so you avoid calling the wrong one:
a = pd.melt('arguments here')
b = pl.melt('arguments here')
Because pl and pd are so close in spelling, make the aliases a bit more verbose to avoid spelling errors or typos:
a = pnda.melt('arguments here')
b = polr.melt('arguments here')
Just a suggestion.
Update:
I modified the script that you provided a bit, but only for the polars case. You will have to do the same for the pandas case so that you can call the attribute. I was able to get it partially working: I created an attribute and set it to the dataframe that is created during instantiation. Using that attribute, I have access to the melt method. I passed in data that is closer to an actual use case, however, borrowed from this source: Python Polars: A Lightning-Fast DataFrame Library – Real Python
Here is the modified code.
import numpy as np

num_rows = 5000
rng = np.random.default_rng(seed=7)

buildings_data = {
    "sqft": rng.exponential(scale=1000, size=num_rows),
    "year": rng.integers(low=1995, high=2023, size=num_rows),
    "building_type": rng.choice(["A", "B", "C"], size=num_rows)}

class CsvFrame:
    def __init__(self, engine, *args, **kwargs):
        if engine == 'polars':
            import polars as pl
            # Keep the dataframe on the instance so its methods stay reachable
            self.attr1 = pl.DataFrame(*args, **kwargs)
        if engine == 'pandas':
            # Not updated yet: this branch needs the same change as the polars one
            import pandas as pd
            pd.DataFrame.__init__(pd.read_csv(*args, **kwargs))

cfr = CsvFrame('polars', buildings_data)
print(cfr.attr1.melt())
# or this way:
print(cfr.attr1)
I don’t get the highlighted error that you are experiencing.
In the same question asked on StackOverflow, the OP has clarified in the comments that a custom class is needed to add more methods and attributes.
I’m reposting my answer to the StackOverflow question here as a reference:
You can use a factory function that returns an instance of a subclass of either polars.DataFrame or pandas.DataFrame based on the given engine:
def CsvFrame(engine, *args, **kwargs):
    if engine == 'polars':
        from polars import DataFrame, read_csv
    elif engine == 'pandas':
        from pandas import DataFrame, read_csv
    else:
        raise ValueError(f'Unsupported engine {engine}')

    class _CsvFrame(DataFrame):
        def __new__(cls, *args, **kwargs):
            return read_csv(*args, **kwargs)
        # more methods can be added here

    return _CsvFrame(*args, **kwargs)
...
cfr = CsvFrame('polars', io.StringIO(input_text))
cfr.melt()
Alternatively, you can make the data frame returned by read_csv of the chosen engine an instance attribute of your CsvFrame class, which acts as a proxy object to the data frame by delegating attribute lookups to it through the __getattr__ method: