jakobnissen 15 hours ago
Excellent article, except that the author probably should have gated their substantiation of the claim behind a cliffhanger, as other commenters have mentioned. The author's priorities are sensible, and indeed, with that set of priorities, it makes sense to end up near R. However, they're not universal among data scientists. I've been a data scientist for eight years, and have found that this kind of plotting and dataframe wrangling is only part of the work. I find there is usually also some file juggling, parsing, and what the author calls "logistics". And R is terrible at logistics. It's also bad for writing maintainable software. If you care more about logistics and maintenance, your conclusion is pushed towards Python, which still does okay in the dataframes department. If you're ALSO frequently concerned about speed, you're pushed towards Julia. None of these are wrong priorities. I wish Julia were better at being R, but it isn't, and it's very hard to be both R and useful for general programming.

Edit: Oh, and I should mention: I also teach and supervise students, and I KEEP seeing students use pandas to solve non-table problems, like trying to represent a graph as a dataframe. Apparently some people are heavily drawn to using dataframes for everything. If you're one of those people, reevaluate your tools, but also, R is probably for you.
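A minimal sketch of the non-dataframe alternative for the graph case, assuming a small undirected graph given as hypothetical `(u, v)` edge pairs: a plain adjacency dict answers neighbour and reachability questions directly, whereas an edge-list dataframe would typically need a repeated join or filter for every hop. The function names here are made up for illustration.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency list (node -> set of neighbours) from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def reachable(adj, start):
    """Return every node reachable from start, using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical example edges: two disconnected components.
edges = [("a", "b"), ("b", "c"), ("d", "e")]
adj = build_adjacency(edges)
print(sorted(reachable(adj, "a")))  # ['a', 'b', 'c']
```

Each hop is a dictionary lookup rather than a table operation, which is the kind of structure a graph problem usually wants.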
ActorNightly 12 hours ago
> Excellent article

Except it's not. Data science in Python pretty much requires you to use numpy, so his example of mean/variance code is a dumb comparison: numpy has mean and variance functions built in for arrays. Even when using raw Python in his example, some syntax can be condensed quite a bit:

  from collections import defaultdict
  groups = defaultdict(list)
  for row in filtered: groups[(row['species'], row['island'])].append(row['body_mass_g'])

It takes the same amount of mental effort to learn Python/numpy as it does R. The difference is that the former allows you to integrate your code into any other application.
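To make the numpy point concrete, here is a sketch assuming `filtered` is a list of dicts with the `species`, `island`, and `body_mass_g` keys used in the snippet above; the sample rows are made up for illustration. One caveat when comparing against R: `np.var` defaults to the population variance (`ddof=0`), while R's `var()` divides by n - 1, so `ddof=1` is passed here to match R.

```python
from collections import defaultdict

import numpy as np

# Made-up rows standing in for the article's filtered data.
filtered = [
    {"species": "Adelie", "island": "Biscoe", "body_mass_g": 3700.0},
    {"species": "Adelie", "island": "Biscoe", "body_mass_g": 3900.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5000.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5400.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5200.0},
]

# Group body masses by (species, island).
groups = defaultdict(list)
for row in filtered:
    groups[(row["species"], row["island"])].append(row["body_mass_g"])

# numpy computes the per-group statistics; ddof=1 gives the sample
# variance (n - 1 denominator), matching R's var().
stats = {
    key: (np.mean(masses), np.var(masses, ddof=1))
    for key, masses in groups.items()
}

for (species, island), (mean, var) in sorted(stats.items()):
    print(f"{species:<8}{island:<8}mean={mean:.1f} var={var:.1f}")
```

Forgetting `ddof=1` is a common source of small discrepancies when cross-checking Python results against R.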
| ||||||||||||||
a_bonobo 3 hours ago
> I find there is usually also some file juggling, parsing, [...]

I'd say I'm 50/50 Python/R for exactly this reason: I write Python code on HPC or a server to parse many, many files, then I get some kind of MB-scale summary data that I analyse locally in R. R is not good at looping over hundreds of files in the gigabyte range; Python is not good at producing pretty insights from the summary. A tool for every task.
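A rough sketch of that split, assuming hypothetical tab-separated input files whose first column is a category label: Python streams through each file line by line (so no file is held whole in memory) and writes a small summary CSV that can then be loaded in R with `read.csv()`. The file layout, glob pattern, and the `summarize_files` name are all made up for illustration.

```python
import csv
import glob
import os
from collections import Counter


def summarize_files(pattern, out_path):
    """Count rows per (file, label) across all files matching a glob pattern.

    Assumes (hypothetically) tab-separated lines whose first field is a label.
    """
    counts = Counter()
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        with open(path) as fh:
            for line in fh:  # iterates lazily; never reads a whole file at once
                label = line.rstrip("\n").split("\t", 1)[0]
                if label:
                    counts[(name, label)] += 1
    # Write a compact, tidy table for downstream analysis (e.g. read.csv in R).
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "label", "n"])
        for (name, label), n in sorted(counts.items()):
            writer.writerow([name, label, n])
    return counts


# Hypothetical usage on a server: summarize_files("data/*.tsv", "summary.csv")
```

The output has one row per file and label, so it stays small even when the inputs are large, which is what makes the local R step cheap.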