| ▲ | Show HN: Good AI Task – a tool for asking AI what it can and can't do (goodaitask.com) | |
| 6 points by jmt710 14 hours ago | 3 comments | ||
Describe a task, and the AI will give you a breakdown of whether it can do it well, poorly, or somewhere in between. I built it mostly because I kept getting asked "what is AI even good for?" and fumbling the answer. The most fun use is testing it on things you already know it can't do and seeing how it explains why. | ||
| ▲ | SsgMshdPotatoes 13 minutes ago | parent | next | |
A paper somewhat related to the question of how well this works is "Do Large Language Models Know What They Don't Know? Evaluating Epistemic Calibration via Prediction Markets". There, a model is given a prediction-market scenario (one that has already closed in reality, but not from the model's point of view) and is tasked to predict the outcome AND state its confidence in that prediction. The conclusion turns out to be "systematic overconfidence across all models". Probably worth keeping an eye on such research; it might let you improve the product over time as new results come out. | ||
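The evaluation described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual methodology: the metric (Brier score) and the tiny hand-made dataset are my assumptions, chosen as a common way to quantify the gap between stated confidence and realized accuracy.

```python
# Sketch: checking calibration on resolved prediction-market questions.
# Each pair is (model's stated confidence that the event happens, actual outcome).
# Data here is made up for illustration.
forecasts = [
    (0.90, 1),  # model said 90% yes; market resolved yes
    (0.80, 0),  # model said 80% yes; market resolved no
    (0.95, 1),
    (0.70, 0),
]

# Brier score: mean squared error between confidence and outcome (lower = better).
brier = sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

# Crude overconfidence signal: average confidence in the predicted class
# minus actual accuracy at the 0.5 decision threshold.
accuracy = sum((p >= 0.5) == (o == 1) for p, o in forecasts) / len(forecasts)
avg_conf = sum(max(p, 1 - p) for p, _ in forecasts) / len(forecasts)
overconfidence_gap = avg_conf - accuracy
```

A positive `overconfidence_gap` across many questions is the "systematic overconfidence" pattern the paper reports; a well-calibrated model would have a gap near zero.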
| ▲ | adrianwaj 10 hours ago | parent | prev | next | |
Obrari looks cool - see my previous comment about desiloing for an idea to use there. Also, perhaps it could be relaunched for niche and composite hardware devices that have a special use? I have a good domain for that use case. Also for Obrari: why not price in crypto too? Use subdomains like [coin name here].obrari.com - good for marketing. You could even tie it in with crowd-funding, e.g. https://ccs.getmonero.org/ - but replicate that for every coin. | ||
| ▲ | Isolated_Routes 12 hours ago | parent | prev | |
This is interesting. I'm curious how you calibrate it. I've found that AI is sometimes very confident about being able (or unable) to do something, and I've had to pick its response apart to find the real answer. Are you getting accurate results in a single pass? | ||