zeehio | 3 days ago
I find that the tricky part of good data analysis is knowing the biases in your data, which often stem from the data collection process and are not contained in the data itself. I have seen plenty of overoptimistic results caused by improper construction of training, validation, and test sets, or by using bad metrics to evaluate trained models. It is not clear to me that this project will help overcome those challenges, and I am a bit concerned that if this project or similar ones become popular, these problems may become more prevalent.

Another concern is that the "customer" asking the question usually wants a specific result (something significant, some correlation...). If, through an LLM connected to this tool, my customer finds something that is wrong but aligned with what he/she wants, then as a data scientist/statistician I will face the challenge of making the customer understand that the LLM gave a wrong answer. More work for me.

Maybe with some well-behaved datasets and proper context this project becomes very useful; we will see :-)
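To make the split-construction concern concrete, here is a minimal sketch of one common failure mode: when rows are grouped by something the collection process introduced (a patient, a batch, a site), a naive row-level random split puts correlated rows of the same group on both sides, leaking information into the test set. The dataset, the subject grouping, and the function names here are all hypothetical, just to illustrate the idea:

```python
import random

random.seed(0)

# Toy dataset: 20 hypothetical subjects, 5 correlated measurements each.
# The "subject" grouping stands in for any structure the collection
# process introduces (patient, batch, site, instrument run...).
rows = [(subj, f"measurement_{subj}_{i}") for subj in range(20) for i in range(5)]

def naive_split(rows, test_frac=0.2):
    """Row-level random split: ignores the grouping entirely."""
    shuffled = rows[:]
    random.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def group_split(rows, test_frac=0.2):
    """Group-level split: a subject's rows all land on one side."""
    subjects = sorted({s for s, _ in rows})
    random.shuffle(subjects)
    test_subjects = set(subjects[: int(len(subjects) * test_frac)])
    train = [r for r in rows if r[0] not in test_subjects]
    test = [r for r in rows if r[0] in test_subjects]
    return train, test

def leaked_subjects(train, test):
    # Subjects appearing on both sides of the split: their correlated
    # rows leak information from training into evaluation.
    return {s for s, _ in train} & {s for s, _ in test}

train, test = naive_split(rows)
print("naive split, leaked subjects:", len(leaked_subjects(train, test)))

train, test = group_split(rows)
print("group split, leaked subjects:", len(leaked_subjects(train, test)))
```

The point is that nothing in the rows themselves flags the leakage; you have to know, from outside the data, that the measurements are grouped. That is exactly the context an LLM querying the dataset directly does not have.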
rbartelme | 3 days ago | parent
I agree with all of this. I've worked in optical engineering, bioinformatics, and data science writ large for over a decade; knowing the data collection process is foundational to statistical process control and statistical design of experiments. I've watched former employers light cash on fire chasing results from methods similar to what this MCP runs on the backend, due to a lack of measurement/experimental context.