Future of Pandas vs Polars

I am trying to understand whether it’s worth switching from pandas to polars. Polars seems to be much faster, but I am wondering whether the switch makes sense. In particular: 1) Are there any drawbacks? 2) Is pandas expected to close the gap in computational speed, especially with the release of numpy 2.0 in the coming months?


I don’t know the answer to your questions, but I did find a good blog post that explicitly addresses the first one: “Polars vs. pandas: What's the Difference?” on the DataSpell Blog. According to that post, one main current drawback is interoperability with PyTorch and scikit-learn.


It looks like scikit-learn may be looking at switching to polars as the default dataframe library in its examples. See the GitHub issue “RFC switch to Polars as the default dataframe lib in our examples” (Issue #28341, scikit-learn/scikit-learn).


My two cents, coming from someone who primarily uses pandas, is that it’s definitely worth learning polars.

I see them as complementary right now: polars for the consistent API and speed, and pandas for the sheer breadth of capabilities and for updating legacy code.

“Are there any drawbacks?”

The main drawback I’m aware of is that polars doesn’t have all the functionality that pandas does. Compatibility is another one, but it’s less of a problem now, since you can call pl.from_pandas() on a pandas DataFrame and .to_pandas() on a polars DataFrame to switch back and forth between the two.