lenerdenator 15 hours ago

> I think people way over-index Python as the language for data science. It has limitations that I think are quite noteworthy. There are many data-science tasks I’d much rather do in R than in Python. I believe the reason Python is so widely used in data science is a historical accident, plus it being sort-of Ok at most things, rather than an expression of its inherent suitability for data-science work.

Python doesn't need to be the best at any one thing; it just has to be serviceable for a lot of things. You can take someone who has expertise in a completely different domain in software (web dev, devops, sysadmin, etc.) and introduce them to the data science domain without making them learn an entirely new language and toolchain.

dmurray 15 hours ago

That's not why it's used in data science though. Lots of data scientists use Python all day and have no concept of ever working in a different field.

It's used in data science because it's used in data science.

vkazanov 15 hours ago

It's used in data science because no other language has this level of library support.

And it got this unprecedented level of support because, right from the start, it made clear syntax and (perceived) simplicity its focus.

There is also a sort of cumulative effect from being nice for algorithmic work.

Guido's long-term strategy won over numerous other strong candidates for this role.

passivegains 13 hours ago

I think the key thing that isn't obvious to most data scientists is that they're not using Python because it meets their needs; they're using it because we've failed them. Twice.

1. Data scientists aren't programmers, so why do they need a programming language? The tools they should be using don't exist. They'd need programmers to make them, and all we have to offer is... more programming languages.

2. The giant problem at the heart of modern software: the most important feature of a modern programming language is being easy to read and write, and this feature is conspicuously absent from most important languages.

They're trapped. They can't do what they need without a programming language, but there are only a handful they can possibly use. The real reason Python ended up with such good library support is that they never really had a choice.

mohaine 15 hours ago

But data science usually isn't an island.

Use whatever you want on your one-off personal projects, but use something friendlier to the non-data-science side of the stack if you ever want your model to run directly in a production workflow.

Productionizing R models is quite painful. The usual approach is simply to rewrite them in something other than R.

dmurray 13 hours ago

I've soured a lot on directly productionizing data science code. It's normally an unmaintainable mess.

If you write it in R and then rewrite it in C (better: rewrite it in English with the R as helpful annotations, then have someone else rewrite it in C), at least there is some chance you've thought about the abstractions and operations that are actually necessary for your problem.

bsder 8 hours ago

Partially, but it's also because 90% of your work in "data science" isn't direct analysis.

You need to get the data from somewhere. Do you need to scrape it? Python is okay at scraping. Oh, after it's scraped, we looked at it and it's in ObtuseBinaryFormat0.0.LOL.Beta and, what do you know, somebody wrote a converter for that in Python. And we need to clean all the broken entries out of it, and Python is decent at that. Etc.
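To make the "clean all the broken entries" step concrete, here is a minimal sketch using only the Python standard library. It assumes, hypothetically, that the converter has already produced CSV text with `id`, `name`, and `value` columns; the column names, the sample data, and the `clean_rows` helper are all illustrative and not from any real tool.

```python
import csv
import io

# Hypothetical converter output: assume the odd binary format was already turned into CSV.
RAW = """id,name,value
1,alpha,3.5
2,,4.0
3,gamma,not-a-number
4,delta,7.25
"""


def clean_rows(text: str) -> list[dict]:
    """Keep only rows with a non-empty 'name' and a 'value' that parses as a float."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        name = (row.get("name") or "").strip()
        try:
            # Missing fields come back as None, so `or ""` makes float() raise ValueError.
            value = float(row.get("value") or "")
        except ValueError:
            continue  # broken numeric field: drop the row
        if not name:
            continue  # missing name: drop the row
        cleaned.append({"id": int(row["id"]), "name": name, "value": value})
    return cleaned


if __name__ == "__main__":
    for record in clean_rows(RAW):
        print(record)
```

Nothing here is the best tool for the job; the point is that the standard library's `csv` module gets a serviceable cleanup done without a second language.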

The trick is that while Python may or may not be anybody's first choice for a particular task, Python is an okay second or third choice for most tasks.

So you can learn Python. Or you can learn <best language> and <something else>. And if <something else> is Python, was <best language> sufficiently better than Python to be worth the time spent learning it?

lenerdenator 15 hours ago

That's probably true now, but at one point, they were looking for people to start doing data science, and were pulling people from other domains.