rainsford an hour ago

I have generally moved from bearish to bullish on the future of current AI technology, but the persistent inaccuracy with basic facts, even as the models improve significantly, continues to give me pause.

As an example, creating recipes with Claude Opus based on flavor profiles and preferences feels magical, right up until the point at which it can't accurately convert between tablespoons and teaspoons. It's like the point in the movie where a character is acting nearly right, but something is a bit off, and then it turns out they're a zombie who's going to try to eat your brain. This note-taking example feels similar. It nearly works in some pretty impressive ways, then fails at the important details in a way that something capable of everything AI can allegedly do really shouldn't.

It's these failures that make me more and more convinced that while current-generation AI can do some pretty cool things if you manage it right, we're not actually on the right track to achieve real intelligence. The persistence of these incredibly basic failure modes even as models advance makes it fairly obvious that more of the same advancement isn't actually going to fix them.

Brian_K_White an hour ago | parent | next [-]

I hate to help provide possible solutions to an entire process I don't approve of, but maybe the fuzzy tools need old-style deterministic tools the same way, and for the same reasons, that we do.

So instead of an LLM trying to answer a math or reasoning question by finding a statistical match with other similar groups of words it found on 4chan, the All-In podcast, and a terrible recipe for soup written by a terrible cook, it can use a calculator when it needs a calculator answer.
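
Roughly what that could look like, as a toy Python sketch (the tool-call protocol here is made up for illustration, not any vendor's actual API):

    # Deterministic "calculator" tool for unit conversion.
    # 1 US tablespoon = 3 US teaspoons: a fixed fact the model
    # should delegate to a tool rather than guess at.
    CONVERSIONS = {("tbsp", "tsp"): 3.0, ("tsp", "tbsp"): 1 / 3.0}

    def convert(amount: float, src: str, dst: str) -> float:
        return amount * CONVERSIONS[(src, dst)]

    def handle(model_output: dict) -> str:
        # Hypothetical protocol: the model emits either plain text
        # or a structured tool call it wants executed on its behalf.
        if model_output.get("tool") == "convert":
            args = model_output["args"]
            result = convert(args["amount"], args["src"], args["dst"])
            return f"{result:g} {args['dst']}"
        return model_output["text"]

    # The model asks for 2 tbsp in tsp instead of guessing:
    print(handle({"tool": "convert",
                  "args": {"amount": 2, "src": "tbsp", "dst": "tsp"}}))
    # -> 6 tsp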

colechristensen an hour ago | parent [-]

No, they just need to be trained to have adversarial self-review "thinking" processes.

You ask an LLM "What's wrong with your answer?" and you get pretty good results.
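
Something like this loop, sketched in Python (`ask` is a stand-in for whatever chat-completion call you use; it's not a real library function):

    def self_reviewed_answer(ask, question: str) -> str:
        # Pass 1: draft an answer.
        draft = ask(question)
        # Pass 2: adversarially critique the draft.
        critique = ask(f"Question: {question}\nDraft answer: {draft}\n"
                       "What's wrong with this answer? List concrete errors.")
        # Pass 3: revise in light of the critique.
        return ask(f"Question: {question}\nDraft answer: {draft}\n"
                   f"Critique: {critique}\n"
                   "Write a corrected final answer.")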

binary0010 42 minutes ago | parent [-]

Or the original output was perfect and the adversarial "rethinking" switches it to an incorrect result.

themafia an hour ago | parent | prev [-]

> we're not actually on the right track to achieve real intelligence.

Real intelligence means saying "I don't know" when you don't know, or asking for help, or even just refusing to help, with the subtext being that you don't want to appear stupid.

The models could ostensibly do this when they have low confidence in their own results, but they don't. What I don't know is whether that's because it would be very computationally difficult or because it would harm the reputation of the companies charging a good sum to use them.

cmrdporcupine 27 minutes ago | parent | next [-]

That's just not how they work, really. They don't know what they don't know and their process requires an output.

I think they're getting better at it, but that's likely more a matter of the SOTA models' parameter counts getting bigger and bigger than anything else.

colechristensen an hour ago | parent | prev | next [-]

You can TELL the models to do this and they'll follow your prompt.

"Give me your answer and rate each part of it for certainty by percentage" or similar.

mylifeandtimes 23 minutes ago | parent [-]

could you please tell me how it generates that certainty score?

colechristensen 4 minutes ago | parent [-]

The whole thing is a statistical model; that's just what it is. No, I cannot in any reasonable way dissect how an LLM works to a level that would satisfy a skeptic.
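
The percentage it reports is itself just more sampled text. The closest thing to an actual statistical confidence is the per-token log-probabilities some APIs expose; a toy example of turning those into a sequence-level score (the logprob values are invented):

    import math

    # Hypothetical per-token log-probabilities for a short answer,
    # of the kind some APIs return. The values are made up.
    token_logprobs = [-0.05, -0.40, -0.02, -1.10]

    # Joint probability of the whole sequence: exp of summed logprobs.
    seq_prob = math.exp(sum(token_logprobs))
    print(f"sequence probability: {seq_prob:.3f}")   # ~0.208

    # Length-normalized (perplexity-style) per-token confidence.
    avg = math.exp(sum(token_logprobs) / len(token_logprobs))
    print(f"per-token confidence: {avg:.3f}")        # ~0.675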

bluefirebrand an hour ago | parent | prev [-]

My theory is that it's because the people building the models, and in charge of directing where they go, love the sycophantic yes-man behavior the models display.

They don't like hearing "I don't know"