How to switch between parent classes during child instantiation?

VERBOSE · May 2, 2024, 9:12am

Hi Python,

I’m trying to make a custom class CsvFrame that is a dataframe made either with pandas or polars.

For that, I wrote the code below:

class CsvFrame:
    def __init__(self, engine, *args, **kwargs):
        if engine == 'polars':
            import polars as pl
            pl.DataFrame.__init__(pl.read_csv(*args, **kwargs))

        if engine == 'pandas':
            import pandas as pd
            pd.DataFrame.__init__(pd.read_csv(*args, **kwargs))

Now when I instantiate an object, there are two problems:

there is no HTML representation of the dataframe in my vscode-jupyter
none of the methods or attributes of a dataframe are available

import io

input_text = '''
col1,col2
A,1
B,2
'''

cfr = CsvFrame('polars', io.StringIO(input_text))

# problem 1
cfr # <__main__.CsvFrame at 0x1fd721f32c0>

# problem 2
cfr.melt()
AttributeError: 'CsvFrame' object has no attribute 'melt'

When I try class CsvFrame(pd.DataFrame, pl.DataFrame), I get AttributeError: 'CsvFrame' object has no attribute '_mgr'. In any case, I don’t think it’s a good idea to inherit from both at the same time, because they share method names such as melt().

Can you help me fix that? Is there a technique to achieve what I’m looking for?

abessman · May 2, 2024, 10:26am

Do you need CsvFrame to be an actual class? Can’t you just make a function which returns either a pandas.DataFrame or a polars.DataFrame depending on your choice of engine?

def make_csv_frame(engine, *args, **kwargs):
    if engine == 'polars':
        import polars as pl
        return pl.read_csv(*args, **kwargs)

    if engine == 'pandas':
        import pandas as pd
        return pd.read_csv(*args, **kwargs)

Since pandas and polars have incompatible APIs, you will of course need to handle any differences between the APIs which are relevant to your use case.
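
One caveat with the function above: if engine is neither 'polars' nor 'pandas', it falls through and returns None without complaint. Below is a minimal sketch of a stricter variant, assuming you would rather get a ValueError in that case. The constant SUPPORTED_ENGINES and the use of importlib are choices made for this illustration, not something either library requires.

```python
import importlib

# Hypothetical constant for this sketch: the engines we are willing to load.
SUPPORTED_ENGINES = ("pandas", "polars")


def make_csv_frame(engine, *args, **kwargs):
    """Read a CSV with the chosen engine and return that library's DataFrame."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"Unsupported engine {engine!r}")
    # Import lazily so only the requested library has to be installed.
    module = importlib.import_module(engine)
    # Both pandas and polars expose a module-level read_csv function.
    return module.read_csv(*args, **kwargs)
```

The engine name doubles as the module name here, which keeps the function short but only works because both libraries happen to call their reader read_csv.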

onePythonUser · May 2, 2024, 6:46pm

Hi,

Does it matter? If you prefix each function call with its library alias, it is always explicit which library’s implementation you are calling, so you avoid using the wrong one by accident.

a = pd.melt('arguments here')
b = pl.melt('arguments here')

Because pl and pd are so close in spelling, make the aliases a bit more verbose to avoid spelling errors or typos:

a = pnda.melt('arguments here')
b = polr.melt('arguments here')

Just a suggestion.

Update:

I modified the script that you provided a bit, but only for the polars case. You will have to do the same for the pandas case so that you can call the attribute there too. I was able to get it partially working: I created an attribute and set it to the dataframe created during instantiation. Through that attribute, I have access to the melt method. I also passed in data closer to an actual use case, borrowed from this source: Python Polars: A Lightning-Fast DataFrame Library – Real Python

Here is the modified code.

import numpy as np

num_rows = 5000
rng = np.random.default_rng(seed=7)
buildings_data = {
         "sqft": rng.exponential(scale=1000, size=num_rows),
         "year": rng.integers(low=1995, high=2023, size=num_rows),
         "building_type": rng.choice(["A", "B", "C"], size=num_rows)}

class CsvFrame:

    def __init__(self, engine, *args, **kwargs):

        if engine == 'polars':
            import polars as pl
            self.attr1 = pl.DataFrame(*args, **kwargs)

        if engine == 'pandas':
            import pandas as pd
            pd.DataFrame.__init__(pd.read_csv(*args, **kwargs))

cfr = CsvFrame('polars', buildings_data)
print(cfr.attr1.melt())
# or this way:
print(cfr.attr1)

I don’t get the highlighted error that you are experiencing.

blhsing · May 3, 2024, 1:40am

In the same question asked on StackOverflow, the OP has clarified in the comments that a custom class is needed to add more methods and attributes.

I’m reposting my answer to the StackOverflow question here as a reference:

You can use a factory function that returns an instance of a subclass of either polars.DataFrame or pandas.DataFrame based on the given engine:

def CsvFrame(engine, *args, **kwargs):
    if engine == 'polars':
        from polars import DataFrame, read_csv
    elif engine == 'pandas':
        from pandas import DataFrame, read_csv
    else:
        raise ValueError(f'Unsupported engine {engine}')

    class _CsvFrame(DataFrame):
        def __new__(cls, *args, **kwargs):
            return read_csv(*args, **kwargs)
        # more methods can be added here

    return _CsvFrame(*args, **kwargs)

...

cfr = CsvFrame('polars', io.StringIO(input_text))
cfr.melt()
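
One thing worth checking with this approach: because __new__ returns the object produced by read_csv rather than an instance of cls, the result is a plain DataFrame, so methods defined on _CsvFrame would not be reachable from it. The sketch below shows that behaviour with stand-in classes. Frame, read_frame, SubFrame and describe_source are hypothetical names, used so the example runs without pandas or polars installed.

```python
class Frame:
    """Stand-in for pandas.DataFrame or polars.DataFrame."""

    def __init__(self, rows):
        self.rows = rows


def read_frame(rows):
    """Stand-in for read_csv: always builds a plain Frame."""
    return Frame(rows)


class SubFrame(Frame):
    def __new__(cls, rows):
        # Returns a Frame, not a SubFrame, so Python skips SubFrame.__init__
        # and the caller never receives an instance of this subclass.
        return read_frame(rows)

    def describe_source(self):
        return "loaded from CSV"


obj = SubFrame([("A", 1), ("B", 2)])
print(type(obj).__name__)               # Frame
print(isinstance(obj, SubFrame))        # False
print(hasattr(obj, "describe_source"))  # False
```

If the extra methods are the point of the custom class, the proxy approach described next avoids this issue, since the methods live on the wrapper itself.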

Alternatively, you can make the data frame returned by read_csv of the chosen engine an instance attribute of your CsvFrame class, which acts as a proxy object to the data frame by delegating attribute lookups to it through the __getattr__ method:

class CsvFrame:
    def __init__(self, engine, *args, **kwargs):
        if engine == 'polars':
            from polars import read_csv
        elif engine == 'pandas':
            from pandas import read_csv
        else:
            raise ValueError(f'Unsupported engine {engine}')
        self.df = read_csv(*args, **kwargs)

    def __getattr__(self, name):
        return getattr(self.df, name)

...

cfr = CsvFrame('polars', io.StringIO(input_text))
cfr.melt()
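
Here is a minimal sketch of how custom methods can sit alongside the delegated ones in the proxy approach. It uses a stand-in FakeFrame class (hypothetical, so the example runs without pandas or polars) and adds a guard in __getattr__, on the assumption that you want a clean AttributeError rather than infinite recursion if df has not been set yet, for example during copying or unpickling.

```python
class FakeFrame:
    """Stand-in for the data frame returned by read_csv."""

    def __init__(self, rows):
        self.rows = rows

    def head(self, n=1):
        return self.rows[:n]


class FrameProxy:
    def __init__(self, df):
        self.df = df

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails. If "df" itself is
        # missing, looking up self.df here would recurse forever, so bail out.
        if name == "df":
            raise AttributeError(name)
        return getattr(self.df, name)

    def row_count(self):
        # A custom method defined on the proxy, found by normal lookup.
        return len(self.df.rows)


proxy = FrameProxy(FakeFrame([("A", 1), ("B", 2)]))
print(proxy.head())       # delegated to FakeFrame.head -> [('A', 1)]
print(proxy.row_count())  # defined on the proxy -> 2
```

Note that Python looks up special methods such as __len__ or __getitem__ on the type, not on the instance, so __getattr__ does not forward them. Calls like len(proxy) or proxy['col1'] would need explicit methods on the proxy.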