NitpickLawyer 5 hours ago

> I don't see this getting better.

We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

datsci_est_2015 2 hours ago | parent | next [-]

I’ve seen this style of take so much that I’m dying for someone to name a logical fallacy for it, like “appeal to progress” or something.

Step away from LLMs for a second and recognize that "Yesterday it was X, so today it must be X+1" is a naive take, and one humans fall into the trap of believing very easily (see: flying cars).

Gareth321 40 minutes ago | parent [-]

In finance we say "past performance does not guarantee future returns." Not because we don't believe that, statistically, returns will continue to grow at rate x, but because there is a chance that they won't. In reality the bias is actually in favour of these getting better faster, but there is a chance they do not.

snemvalts 2 hours ago | parent | prev | next [-]

The scaling law is a power law, requiring orders of magnitude more compute and data for better accuracy from pre-training. Most companies have maxed it out.

For RL, we are arriving at a similar point: https://www.tobyord.com/writing/how-well-does-rl-scale

Next stop is inference scaling with longer context windows and longer reasoning. But instead of being a one-off training cost, it becomes a running cost.

In essence we are chasing ever smaller gains in exchange for exponentially increasing costs. This energy will run out. There needs to be something completely different from LLMs for meaningful further progress.
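The diminishing-returns point can be sketched numerically. This is a toy power-law curve with a made-up exponent, not a fit to any real model:

```python
# Power-law scaling sketch: loss ~ compute^(-alpha).
# alpha = 0.05 is an illustrative constant, not a measured value.
def loss(compute: float, alpha: float = 0.05) -> float:
    return compute ** -alpha

# Each 100x increase in compute buys a smaller absolute improvement:
for c in (1e20, 1e22, 1e24):
    print(f"compute {c:.0e} -> loss {loss(c):.3f}")
# -> 0.100, 0.079, 0.063
```

The first 100x of extra compute cuts the toy loss by 0.021, the next 100x by only 0.016: the same multiplicative spend buys ever smaller absolute gains.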

Validark 4 hours ago | parent | prev | next [-]

I tend to disagree that improvement is inherent. Really I'm just expressing an aesthetic preference when I say this, because I don't disagree that a lot of things improve. But it's not a guarantee, and it does take people doing the work and thinking about the same thing every day for years. In many cases there's only one person uniquely positioned to make a discovery, and it's by no means guaranteed to happen. Of course, in many cases there are a whole bunch of people who seem almost equally capable of solving something first, but I think if you say things like "I'm sure they're going to make it better" you're leaving to chance something you yourself could have an impact on. You can participate in pushing the boundaries or even making a small push on something that accelerates someone else's work. You can also donate money to research you are interested in to help pay people who might come up with breakthroughs. Don't assume other people will build the future, you should do it too! (Not saying you DON'T)

3abiton 3 hours ago | parent | prev | next [-]

The problem class is very structured, which makes it "easier", yet the results are undeniably impressive.

nopinsight 3 hours ago | parent | prev | next [-]

LLMs in some form will likely be a key component in the first AGI system we (help) build. We might still lack something essential. However, people who keep doubting AGI is even possible should learn more about The Church-Turing Thesis.

https://plato.stanford.edu/entries/church-turing/

benterix 2 hours ago | parent [-]

This is a long read on things most people here know at least in some form. Could you point to a particular fragment or a quote?

number6 4 hours ago | parent | prev | next [-]

But can it count the R's in strawberry?

Paradigma11 4 hours ago | parent | next [-]

That question is equivalent to asking a human to add the wavelengths of those two colors and divide it by 3.

snovv_crash 4 hours ago | parent | next [-]

Unless you're aware of hyperspectral image adapters for LLMs they aren't capable of that either.

szszrk 3 hours ago | parent | prev | next [-]

Unfair - the human beats the AI in this comparison, as a human will instantly answer "I don't know" instead of yelling a random number.

Or at best "I don't know, but maybe I can find out" and proceed to finding out. But they are unlikely to shout "6" just because they heard that number once when someone talked about light.

koliber 3 hours ago | parent [-]

> human will instantly answer "I don't know" instead of yelling a random number.

Seems you've never worked with Accenture consultants?

thegabriele 2 hours ago | parent | prev [-]

Why is that?

Paradigma11 30 minutes ago | parent | next [-]

Because LLMs don't have a textual representation of the text they consume. It's just vectors to them. Which is why they are so good at ignoring typos: the vector distance is so small it makes no difference to them.

Aditya_Garg 4 hours ago | parent | prev [-]

Yes, it's ridiculously good at stuff like that now. I dare you to try to trick it.

frizlab 4 hours ago | parent [-]

https://news.ycombinator.com/item?id=47495568

thedatamonger 4 hours ago | parent [-]

What bothers me is not this particular issue (it will certainly disappear now that it has been identified), but that we have yet to identify the category of these "stupid" bugs ...

sigmoid10 4 hours ago | parent [-]

We already know exactly what causes these bugs. They are not a fundamental problem of LLMs; they are a problem of tokenizers. The actual model simply doesn't get to see the same text that you see. It can only infer this stuff from related info it was trained on. It's as if someone asked you how many 1s there are in the binary representation of this text: you'd need to convert it first to think it through, or use some external tool, even though your computer never sees the text as anything but binary.
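The binary analogy is easy to make concrete. A quick sketch of the question a human would face:

```python
# Count the set bits across the UTF-8 bytes of a string, i.e. the
# "how many 1s in the binary representation of this text?" question.
def ones_in_binary(text: str) -> int:
    return sum(bin(b).count("1") for b in text.encode("utf-8"))

print(ones_in_binary("strawberry"))  # -> 42
```

You read the word "strawberry" all the time, yet you can't answer this without converting and counting, because you never perceive the byte-level representation. The LLM's relationship to individual letters inside a token is analogous.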

datsci_est_2015 2 hours ago | parent [-]

Okay but, genuinely not an expert on the latest with LLMs, but isn’t tokenization an inherent part of LLM construction? Kind of like support vectors in SVMs, or nodes in neural networks? Once we remove tokenization from the equation, aren’t we no longer talking about LLMs?

fenomas an hour ago | parent [-]

It's not a side effect of tokenization per se, but of the tokenizers people use in actual practice. If somebody really wanted an LLM that can flawlessly count letters in words, they could train one with a naive tokenizer (like just ascii characters). But the resulting model would be very bad (for its size) at language or reasoning tasks.

Basically it's an engineering tradeoff. There is more demand for LLMs that can solve open math problems, but can't count the Rs in strawberry, than there is for models that can count letters but are bad at everything else.
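The tradeoff is easy to illustrate with a toy sketch. The subword split below is made up for illustration, not a real BPE vocabulary:

```python
# A hypothetical subword split: the model sees three opaque token IDs,
# and the letter 'r' is not directly visible inside any of them.
subword_tokens = ["str", "aw", "berry"]

# A naive character-level tokenizer exposes every letter as its own
# token, so counting letters becomes a trivial scan:
char_tokens = list("strawberry")
print(char_tokens.count("r"))  # -> 3
```

The catch is sequence length: character-level tokens make every input several times longer, which is a big part of why practical models pay the subword price.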

saidnooneever 4 hours ago | parent | prev [-]

If you let a million monkeys bash typewriters... something something book.