photonthug 3 days ago
After we fix all the simple specious reasoning of stuff like Alexander-the-Great and agree to outsource certain problems to appropriate tools, the high-dimensional analogs of stuff like Datasaurus [0] and Simpson's paradox [1] etc. are still going to be a thing. But we'll be so disconnected from the representation of the problems we're trying to solve that we won't even be aware of the possibility of any danger, much less able to actually spot it.

My takeaway re: chain-of-thought specifically is this. If the answer to "LLMs can't reason" is "use more LLMs", and then the answer to the problems with that is to run the same process in parallel N times and vote/retry/etc., it just feels like a scam aimed at burning through more tokens. Hopefully chain-of-code [2] is better, in that it's at least trying to force LLMs into emulating a more deterministic abstract machine instead of rolling dice.

Trying to eliminate things like code, formal representations, and explicit world-models in favor of implicit representations and inscrutable oracles might be good business, but it's bad engineering.

[0] https://en.wikipedia.org/wiki/Datasaurus_dozen

[1] https://towardsdatascience.com/how-metrics-and-llms-can-tric...

[2] https://icml.cc/media/icml-2024/Slides/32784.pdf
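To make the Simpson's-paradox point [1] concrete in code, here's a tiny sketch using the classic kidney-stone figures (purely illustrative, nothing LLM-specific): treatment A wins within every subgroup, yet B wins on the pooled totals, because the subgroup sizes are unbalanced.

    # Simpson's paradox in miniature: A beats B within each subgroup,
    # but B beats A once the subgroups are pooled.
    groups = {
        # group: (A successes, A trials, B successes, B trials)
        "small stones": (81, 87, 234, 270),
        "large stones": (192, 263, 55, 80),
    }

    def rate(successes, trials):
        return successes / trials

    for name, (a_s, a_n, b_s, b_n) in groups.items():
        print(f"{name}: A={rate(a_s, a_n):.1%}  B={rate(b_s, b_n):.1%}")

    a_s = sum(v[0] for v in groups.values())
    a_n = sum(v[1] for v in groups.values())
    b_s = sum(v[2] for v in groups.values())
    b_n = sum(v[3] for v in groups.values())
    print(f"pooled: A={rate(a_s, a_n):.1%}  B={rate(b_s, b_n):.1%}")

A wins both subgroups (93.1% vs 86.7%, 73.0% vs 68.8%) but loses the pooled comparison (78.0% vs 82.6%). Any per-group structure the model never surfaces can flip a conclusion the same way.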
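And for scale, the "run it N times and vote" idea (self-consistency) is really just the sketch below; ask_llm is a hypothetical stand-in for whatever sampling call you'd use, and every one of those N samples bills tokens whether or not it agrees with the majority.

    import random
    from collections import Counter

    def ask_llm(prompt: str) -> str:
        # Placeholder: a real call would sample a chain-of-thought completion
        # at nonzero temperature and extract just the final answer.
        return random.choice(["42", "42", "41"])

    def self_consistency(prompt: str, n: int = 10) -> str:
        # N independent samples, then a majority vote over the final answers.
        answers = [ask_llm(prompt) for _ in range(n)]
        winner, _count = Counter(answers).most_common(1)[0]
        return winner

    print(self_consistency("What is 6 * 7?"))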
dingnuts 3 days ago
> it just feels like a scam aimed at burning through more tokens.

IT IS A SCAM TO BURN MORE TOKENS. You will know it is no longer a scam when you either: 1) pay a flat price with NO USAGE LIMITS, or 2) pay per token with the ability to mark a response as bullshit and get a refund for those wasted tokens.

Until then, the incentives are the same as a casino's, which means IT IS A SCAM.
befictious 3 days ago
> it just feels like a scam aimed at burning through more tokens.

I have a growing tinfoil-hat theory that the business model of LLMs is the same as the 1-900-psychic numbers of old. For just 25¢, 1-900-PSYCHIC will solve all your problems in just 5 minutes! Still need help?! No problem! We'll work with you until you get your answers, for only 10¢ a minute, until you're happy!

Eerily similar.
jmogly 3 days ago
To me the problem is that if a piece of information is not well represented in the training data, the LLM will always tend toward bad token predictions related to that information.

I think the next big thing in LLMs could be figuring out how to tell whether a token was just a "fill-in" or guess vs. a well-predicted token. That way you could have some sort of governor that kills a response if it's getting too guessy, or at least provides some other indication that the provided tokens are likely hallucinated. Maybe there's some way to do it based on the geometry of how the neural net activated for a token, or some other more statistics-based approach; idk, I'm not an expert.
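Something like that governor could probably be prototyped today from the per-token log-probabilities many APIs already return. A rough sketch, where the probability threshold and the "guessy fraction" cutoff are made-up numbers and low per-token probability is only a crude proxy for "the model is guessing":

    import math

    def guessy_tokens(tokens: list[str], logprobs: list[float],
                      threshold: float = math.log(0.2)) -> list[str]:
        # Flag tokens the model assigned low probability to.
        return [tok for tok, lp in zip(tokens, logprobs) if lp < threshold]

    def governor_ok(tokens: list[str], logprobs: list[float],
                    max_guess_fraction: float = 0.3) -> bool:
        # Accept the response only if the share of low-confidence tokens
        # stays under the cutoff; otherwise kill or caveat it.
        flagged = guessy_tokens(tokens, logprobs)
        return len(flagged) / max(len(tokens), 1) <= max_guess_fraction

    # Hypothetical usage with per-token logprobs returned by an API:
    toks = ["The", " capital", " of", " Freedonia", " is", " Chlorophon"]
    lps = [-0.1, -0.2, -0.05, -2.9, -0.1, -4.1]
    print(governor_ok(toks, lps))  # False: 2/6 of the tokens look like guesses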