aprilthird2021 5 days ago

> We're just moving up the abstraction ladder, like we did with compilers.

We're not, because you still have to check all the outputted code. You didn't have to check every compilation step of a compiler. It was testable, actual code, not non-deterministic output from English-language input.

elAhmo 5 days ago | parent | next [-]

I would bet a significant amount of money that many LLM users don’t check the output. And as tools improve, this will only increase.

The number of users actually checking the output of a compiler is nonexistent. You just trust it.

LLMs are moving in that direction, whether we like it or not.

Jensson 5 days ago | parent | next [-]

> The number of users actually checking the output of a compiler is nonexistent. You just trust it.

Quite a few people who work on low-level systems do this. I have done it a few times to debug build issues: one time a single file suddenly made compile times go up by orders of magnitude. The compiler had inlined a big sort procedure inside an unrolled loop, so the sorting code was duplicated hundreds of times over in a single function, creating a gigantic binary that took ages to compile since the compiler then tried to optimize that giant function.

That is slow in both runtime and compile time, so I added an attribute to not inline the sort there, and all the issues disappeared. The sort was never marked for inlining, so the compiler just made a bad call here; it shouldn't have inlined such a large function into an unrolled loop.
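
Roughly what that fix looks like, assuming a GCC/Clang-style toolchain (a sketch only; the function names are made up, the noinline attribute is the actual mechanism):

    #include <stddef.h>

    /* Keep the compiler from pasting the sort's body into every iteration
       of an unrolled caller. GCC/Clang attribute; MSVC spells it
       __declspec(noinline). */
    __attribute__((noinline))
    static void sort_items(int *items, size_t n) {
        for (size_t i = 1; i < n; i++) {      /* stand-in for the big sort */
            int key = items[i];
            size_t j = i;
            for (; j > 0 && items[j - 1] > key; j--)
                items[j] = items[j - 1];
            items[j] = key;
        }
    }

    void process_buckets(int buckets[][16], size_t count) {
        /* A loop the compiler may unroll; without the attribute above, the
           whole sort body can end up duplicated once per unrolled iteration. */
        for (size_t i = 0; i < count; i++)
            sort_items(buckets[i], 16);
    }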

aprilthird2021 5 days ago | parent | prev | next [-]

Of course they don't. That's why things like the NX breach happen. That's also why they don't learn anything when they use these tools and their brains stagnate.

__loam 5 days ago | parent | prev [-]

Well they're not improving that much anymore. That's why Sam Altman is out there saying it's a bubble.

CuriouslyC 5 days ago | parent [-]

This is incorrect. They are improving; you just don't understand how to measure and evaluate it.

The Chinese models are getting hyper-efficient and really good at agentic tasks. They're going to overtake Claude as the agentic workhorses soon, for sure; Anthropic is slow-rolling its research and the Chinese labs are smoking. Speed and agentic ability don't make big headlines, but they really matter.

GPT-5 might not impress you with its responses to pedestrian prompts, but it is a science/algorithm beast. I understand what Sam Altman was saying about how unnerving its responses can be: it can synthesize advanced experiments and pull in research from diverse areas to improve and optimize algorithms in a way that's far beyond the other LLMs. It's like having a myopic autistic-savant postdoc to help me design experiments; I have to keep it on target and focused, but the depth of its suggestions is pretty jaw-dropping.

pessimizer 5 days ago | parent | prev [-]

> We're not, because you still have to check all the outputted code.

To me, that's what makes it an abstraction layer rather than just a servant or an employee. You have to break your entire architecture into units small enough that you know you can coax the machine into outputting good code for each of them. The AI can't be trusted as far as you can throw it, but the distance from you to how far you can throw it is the abstraction layer.

An employee you can just tell to make it work; they'll kill themselves trying to do it, or be replaced if they don't; eventually something will work, and you'll take all the credit for it. AI is not experimenting, learning, and growing; it stays stupid. The longer it thinks, the wronger it thinks. You deserve the credit (and the ridicule) for everything it does that you put your name on.

-----

edit: and this thread seems to think that you don't have to check what your high-level abstraction is doing. That's probably why most programs run like crap. You can't expect something you write in e.g. Python to do the most algorithmically sensible thing, even if you wrote the algorithm just like the textbook said. It may make weird choices (maybe optimal for the general case, but horrifically bad for yours) that mean it's not really running your cute algorithm at all, or maybe your cute algorithm is being starved by some other thread that you had no idea it depended on. It may have made correct choices when you started writing, then decided to make wrong choices after a minor patch version change.

To pretend that perfection is a necessary condition for abstraction is not something anybody would say outright. Never. All we ever talk about is leaky abstractions.

Remember when GTA Online's loading times, which (a counterfactual, because we'll never know) probably decimated sales, playtime, and at the very least the marketing of the game, turned out to be caused by rescanning some large, unnecessary JSON array (iirc) hundreds of times a second? That's probably a billion-dollar mistake, all because some function that was being blindly called was never reexamined, and because nobody profiled properly (i.e. checked the output).
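
The shape of that bug, if the write-ups I remember are right, was the classic accidentally-quadratic one: a helper that rescans everything loaded so far on every call. A rough sketch of the pattern (not the actual game code; all names here are invented):

    #include <stddef.h>

    #define MAX_ITEMS 70000

    static long ids[MAX_ITEMS];
    static size_t count;

    /* The blindly called, never-reexamined helper: each call is a linear
       rescan of everything loaded so far, so loading n items costs O(n^2)
       overall. Nothing looks wrong at the call site; only profiling the
       load screen shows where the time goes. */
    static int already_loaded(long id) {
        for (size_t i = 0; i < count; i++)
            if (ids[i] == id)
                return 1;
        return 0;
    }

    void load_item(long id) {
        if (!already_loaded(id) && count < MAX_ITEMS)
            ids[count++] = id;    /* a hash set would make this O(1) amortized */
    }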