autoexec 4 hours ago

Machine learning and data science are not new things in science. It's great that we have the ability to share and work with existing data sets, collect data remotely with sensors, and build software to create models, but we'll always need people to go out and collect updated data, place sensors, and verify that what the models predict is actually happening.

> Scientists who run long-term ecological studies, in particular, report that they struggle to find funding.

It's cheaper and easier to do stuff sitting at a desk. In theory that's a good thing if it means more work gets done, but field work has to happen too. For many people it's the best part of the job, for others it's a pain that has to be suffered through to get the data they need. Hopefully there's room (and funding) for both kinds of people to do the work they want.

analog31 4 hours ago

I'm a scientist in industry. It's remarkable how many smart people think that science can be done without data. I've heard managers ask: "Why do we need to gather data? Can't we just model it? The customer doesn't want to see data. They just want an answer."

There's also a strong belief in "statistical magic." Faced with a bad or insufficient data set, someone will say: "Let's give the data to <statistician> and have them work their magic on it."

That the results actually have to be influenced by the data in some way is something that has to be explained to people. In all of my years as a scientist, I've learned that there's still no substitute for good measurements. Good data can be cheaper than analysis of bad data.

scarmig 2 hours ago

Data science is not fundamentally about data or science. It's about either justifying decisions that have already been made or delegating decisions to an unbiased casting of bones to let the gods decide.

toofy 3 hours ago

> It's remarkable how many smart people think that science can be done without data.

It’s so important that we write these down, so that when these people have forgotten why they’re not making any progress and go searching for answers, they’ll find what we wrote and say “ohhh, we had too much hubris, thought we were smarter than everyone else, and didn’t listen to how important actually going outside is.”

rjsw 4 hours ago

People I'm currently working with are using AI to try to extract data from the text of published papers; getting access to the raw data sets doesn't seem to be a priority.

anamax an hour ago

The data is supposed to be available from the authors in almost all cases, but in many (most?) cases the authors won't provide it.

autoexec 40 minutes ago

It should be easily accessed without even having to ask (links should be automatically provided and maintained by whatever entity published the paper).