Remix.run Logo
ActorNightly 12 hours ago

>Excellent article

Except its not. Data science in python pretty much requires you to use numpy. So his example of mean/variance code is a dumb comparison. Numpy has mean and variance functions built in for arrays.

Even when using raw python in his example, some syntax can be condesed quite a bit:

groups = defaultdict(list) [groups[(row['species'], row['island'])].append(row['body_mass_g']) for row in filtered]

It takes the same amount of mental effort to learn python/numpy as it does with R. The difference is, the former allows you to integrate your code into any other applicaiton.

dragonwriter 10 hours ago | parent | next [-]

> Numpy has mean and variance functions built in for arrays.

Even outside of Numpy, the stdlib has the statistics packages which provides mean, variance, population/sample standard deviation, and other statistics functions for normal iterables. The attempt to make Python out-of-the-box code look bad was either deliberately constructed to exaggerate the problems complained of, or was the product of a very convenient ignorance of the applicable parts of Python and its stdlib.

ModernMech 10 hours ago | parent | prev [-]

I dunno. Numpy has its own data types, its own collections, its own semantics which are all different enough from Python, I think it's fair to consider it a DSL on its own. It'd be one thing if it was just, operator overloading to provide broadcasting for python, but Numpy's whole existence is to patch the various shortcomings Python has in DS.