LLMs should be trained on and directly output binary.

klodolph 5 hours ago | parent | next [-]

On the off chance that you’re serious, that would result in disastrously bad output. The difference between “jmp $+15” and “jmp $+16” is inscrutable and the LLM would not be able to pick the right one without tooling.

That tooling is a compiler. The higher the level, the better the chance that the LLM can be steered to good output. Machine code is hopeless, don’t bother.

pjmlp 5 hours ago | parent | next [-]

That compiler does wonders with languages that have UB in their specs, especially when its optimization passes rely on heuristics.

Also, there are dynamic compilers where the shape of the machine code changes as the code executes, and each single execution will certainly generate different sequences, depending on how the program executes and where it is running.

Deterministic JIT code generation, at least in optimising compilers, is not a solved problem.

faangguyindia 5 hours ago | parent | prev | next [-]

What about AOT optimization, which brings AOT closer to JIT performance? Isn't that something an LLM + harness can easily do?

	klodolph 5 hours ago | parent [-]
		I think the idea that AOT is inherently faster than JIT, or vice versa, has been thoroughly debunked. You can have LLMs help you optimize code, but I don’t think you can do this unattended for non-trivial code.

jenadine 5 hours ago | parent | prev [-]

> The difference between “jmp $+15” and “jmp $+16” is inscrutable

I don't see why that's the case. An LLM trained on binary would totally see it, no?

Also, the tooling could be running the tests and a debugger.

klodolph 5 hours ago | parent | next [-]

> I don't see why that's the case. An LLM trained on binary would totally see it, no?

It would not. You find the correct version by counting the number of bytes to the destination. LLMs are famously bad at this kind of problem (counting).
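To make the counting concrete, here is a minimal sketch in Python. The instruction lengths are made up for illustration (they are not taken from any real program), and the helper names are hypothetical. It assumes the usual x86 conventions: `$` in an assembler such as NASM means the start of the current instruction, while the encoded rel8 displacement of a short jmp is measured from the end of the jump.

```python
# Hypothetical instruction lengths in bytes, in program order (illustrative only).
# Index 0 is the jump itself: a short jmp on x86 is 2 bytes (opcode + rel8).
LENGTHS = [2, 5, 3, 1, 4, 3, 1]


def start_offsets(lengths):
    """Byte offset at which each instruction starts."""
    offsets, pos = [], 0
    for size in lengths:
        offsets.append(pos)
        pos += size
    return offsets


def dollar_offset(lengths, jump_index, target_index):
    """N in "jmp $+N": bytes from the start of the jump to the start of the target."""
    offsets = start_offsets(lengths)
    return offsets[target_index] - offsets[jump_index]


def encoded_rel8(lengths, jump_index, target_index):
    """The displacement byte itself, which x86 measures from the end of the jump."""
    return dollar_offset(lengths, jump_index, target_index) - lengths[jump_index]


def lands_on_boundary(lengths, jump_index, n):
    """True if "$+n" points at the start of some instruction."""
    offsets = start_offsets(lengths)
    return (offsets[jump_index] + n) in offsets


n = dollar_offset(LENGTHS, 0, 5)
print(n, encoded_rel8(LENGTHS, 0, 5))
print(lands_on_boundary(LENGTHS, 0, n), lands_on_boundary(LENGTHS, 0, n + 1))
```

With these made-up lengths the correct jump is `$+15`, and `$+16` lands in the middle of a three-byte instruction. Nothing in the text of the two instructions tells them apart; only the byte count does.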

> Also, the tooling could be running the tests and a debugger.

The test needs to provide a good amount of signal. That’s too hard if you are throwing machine code at the wall.

In order for debuggers to work, you need some kind of model that describes what the code should do and what state the computer should be in after each instruction. That model is high-level code.

I can understand the intuitive appeal of training LLMs with machine code, but all of my experience with LLMs suggests that they are incredibly ill-suited to the task, and we just don’t have the capacity to train them to make useful machine code.

zx8080 5 hours ago | parent [-]

Can "LLMs are bad at counting" be generalized to "LLMs are better at complex stuff but make more mistakes on simple stuff"?

fluoridation 5 hours ago | parent | next [-]

I would phrase it as "LLMs are good at big picture stuff and bad at fine detail", or to put it another way, they're accurate, but imprecise and with low reproducibility.

	bregma an hour ago | parent | next [-]
		It is my experience that it's the opposite. LLMs are very very precise but wildly inaccurate. They might give you 17 significant digits but be off by 10 orders of magnitude, to use a metaphor.
	benj111 an hour ago | parent | prev [-]
		But where does that leave us when programmers treat themselves as architects with the AI doing the drudge work, as seems to be the fashion? It then means you have two parties focussing on the big picture and no one focussing on the details.

ozlikethewizard 5 hours ago | parent | prev | next [-]

It's more that LLMs are better at vague problems with multiple imperfect solutions, and struggle with problems that require precision.

klodolph 5 hours ago | parent | prev [-]

No, I don’t think so. LLMs are good at a lot of simple tasks, but bad at certain simple tasks. Moravec’s paradox in a new iteration.

It applies to humans too. Calculus is “simple” but it takes something like sixteen years to train a human to do it, if all goes well. Meanwhile, most humans think that inverse kinematics is, like, the easiest thing in the world (it’s a super complicated task).

	fluoridation 4 hours ago | parent [-]
		Calculus is definitely the harder task, considering it took a species developing the cognitive capacity for symbolic reasoning for it to show up, whereas any animal can figure out how to position its limbs. Yeah, we figured out how to make CAS programs before inverse kinematics software, but that's because computers were made to solve numerical problems, not to replace the cerebella of chordates.

dezgeg 2 hours ago | parent | prev [-]

Even if it could, it would be ridiculously token-inefficient to update a huge number of addresses whenever some small change is made in the middle of a binary.
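A minimal sketch of that ripple effect, in Python, using a toy program described only by instruction lengths and a list of jumps. The layout, the jump list, and the helper names are all hypothetical; this is not a real assembler or linker.

```python
def start_offsets(lengths):
    """Byte offset at which each instruction starts."""
    offsets, pos = [], 0
    for size in lengths:
        offsets.append(pos)
        pos += size
    return offsets


def displacements(lengths, jumps):
    """For each (source_index, target_index) jump, the "$+N" value it needs."""
    offsets = start_offsets(lengths)
    return [offsets[target] - offsets[source] for source, target in jumps]


def insert_instruction(lengths, jumps, at, size):
    """Insert a new instruction of `size` bytes before index `at`.

    Jumps keep pointing at the same original instructions, so any index at or
    after the insertion point moves up by one.
    """
    new_lengths = lengths[:at] + [size] + lengths[at:]

    def shift(index):
        return index + 1 if index >= at else index

    new_jumps = [(shift(source), shift(target)) for source, target in jumps]
    return new_lengths, new_jumps


# Made-up layout: six instructions and three jumps between them.
lengths = [2, 5, 3, 2, 4, 1]
jumps = [(0, 4), (3, 1), (3, 5)]

before = displacements(lengths, jumps)
new_lengths, new_jumps = insert_instruction(lengths, jumps, at=2, size=3)
after = displacements(new_lengths, new_jumps)

changed = sum(1 for old, new in zip(before, after) if old != new)
print(before, after, changed)
```

Every jump that spans the insertion point needs a new displacement, so one three-byte edit rewrites two of the three jumps here. In source code the same change touches one line, and the assembler or linker recomputes the rest.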

xiaoyu2006 5 hours ago | parent | prev | next [-]

It should not. Abstraction in software engineering brings intelligence. (Compression correlates with intelligence.)

	shshshjaja 5 hours ago | parent | next [-]
		runApp() Done! Excellent abstraction. High intelligence.
	frwrfwrfeefwf 5 hours ago | parent | prev | next [-]
		people don't get this
	dyauspitr 5 hours ago | parent | prev [-]
		Why? I mean this is all emergent, right? And it’s not like humans ever work at this level. It would be very interesting to see what sort of outputs and abstractions an LLM comes up with.

bandrami 5 hours ago | parent | prev | next [-]

Generative algorithms have been studied for decades now and while they have led to some interesting results they're a bad fit for LLMs because there's no such thing as a "plausible" binary: a small perturbation yields an unusable result.
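As a rough illustration of how small a "small perturbation" can be, here is a sketch in Python with a toy decoder. The decoder is hypothetical and knows only four x86 opcodes (`0x90` nop, `0xC3` ret, `0xEB` jmp rel8, `0xE9` jmp rel32) and their total lengths; a real disassembler handles far more.

```python
# A few real x86 opcodes with their total instruction length in bytes
# (opcode byte plus operand bytes). Everything else about this decoder is a toy.
OPCODES = {
    0x90: ("nop", 1),
    0xC3: ("ret", 1),
    0xEB: ("jmp rel8", 2),
    0xE9: ("jmp rel32", 5),
}


def decode(code):
    """Walk the byte string and name each instruction the toy decoder sees."""
    names, i = [], 0
    while i < len(code):
        name, size = OPCODES.get(code[i], ("(unknown)", 1))
        names.append(name)
        i += size
    return names


original = bytes([0xEB, 0x01, 0x90, 0x90, 0xC3])
# Flip a single bit in the first byte: 0xEB becomes 0xE9.
flipped = bytes([original[0] ^ 0x02]) + original[1:]

print(decode(original))
print(decode(flipped))
```

One flipped bit turns the two-byte jump into a five-byte one, which then swallows the following `nop`, `nop`, and `ret` as displacement bytes. The perturbed program is a different program, not a slightly worse version of the same one.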

fulafel 5 hours ago | parent | prev | next [-]

Technically they are, just a subset. But still a practical one, they're frequently used to produce executable files.

rvz 5 hours ago | parent | prev | next [-]

I think you forgot the "/s"

wahnfrieden 5 hours ago | parent | prev | next [-]

[flagged]

junior44660 4 hours ago | parent | prev [-]

[flagged]