crystal_revenge 2 hours ago

> Pandas is widely adopted and deeply integrated into the Python ecosystem.

This is pretty laughable. Yes, there are very DS-specific tools that make good use of Pandas, but `to_pandas` in Polars trivially solves this. The fact that Pandas always feels like injecting some weird DSL into existing Python codebases is one of the major reasons why I really don't like it.

> If you are dealing with huge data sets, you are probably using Spark or something like Dask already where jobs can run in the cloud. If you need speed and efficiency on your local machine, you use NumPy outright. And if you really, really need speed, you rewrite it in C/C++.

Have you used Polars at all? Or, for that matter, written significant Pandas outside of a notebook? The number one benefit of Polars, imho, is that it works using Expressions that allow you to trivially compose and reuse fundamental logic when working with data, in a way that works well with other Python code. This solves the biggest problem with Pandas, which is that it does not abstract well.

Not to mention that Pandas is a really poor dataframe experience outside of its original use case, which was financial time series. The entire multi-index experience is awful, and I know that either you are calling `reset_index` multiple times in your Pandas logic or you have bugs.
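
A small sketch of the `reset_index` dance, using toy data I made up: a multi-key `groupby` moves the keys into a MultiIndex, so code that expects them as ordinary columns breaks until you flatten it again.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["x", "y", "x", "y"],
    "sales": [10, 20, 30, 40],
})

# Grouping by two keys returns a frame indexed by a MultiIndex.
grouped = df.groupby(["region", "product"]).sum()

# The group keys are no longer columns, so grouped["region"] would raise KeyError.
keys_are_columns = "region" in grouped.columns

# reset_index turns the index levels back into regular columns.
flat = grouped.reset_index()
```

Passing `as_index=False` to `groupby` avoids this particular case, but you have to remember to do it at every call site.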