mickdarling 4 hours ago

This is where the desire to NOT anthropomorphize LLMs actually gets in the way.

We have mechanisms for ensuring output from humans, and those are nothing like the mechanisms for ensuring output from a compiler. We have checks on people; we have whole industries of people whose entire careers are spent managing people, who manage other people, who manage other people.

With regard to predictability, LLMs essentially behave like people. The same kind of checks that we use for people are needed for them, not the same kind of checks we use for software.

bigstrat2003 3 hours ago | parent | next [-]

> The same kind of checks that we use for people are needed for them...

The whole benefit of computers is that they don't make stupid mistakes like humans do. If you give a computer the ability to make random mistakes, all you have done is make the computer shitty. We don't need checks, we need to not deliberately make our computers worse.

skydhash 4 hours ago | parent | prev [-]

> The same kind of checks that we use for people are needed for them

Those checks work for people because humans (and most living beings) respond well to reward/punishment mechanisms. It's the whole basis of society.

> not the same kind of checks we use for software.

We do have systems that are non-deterministic (computer vision, various forecasting models…). We judge those by their accuracy and their likelihood of producing false positives or false negatives (when the system is a classifier). Why not use those metrics?
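
Concretely, that kind of evaluation looks something like the sketch below (a minimal illustration, not from the thread; the binary-classifier setup and all names are assumptions):

    # Evaluate a non-deterministic binary classifier the way skydhash suggests:
    # by accuracy and false positive / false negative rates, not by demanding
    # deterministic output. Assumes non-empty, equal-length boolean sequences.
    def evaluate(predictions, labels):
        tp = sum(1 for p, y in zip(predictions, labels) if p and y)
        tn = sum(1 for p, y in zip(predictions, labels) if not p and not y)
        fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
        fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
        return {
            "accuracy": (tp + tn) / len(labels),
            "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
            "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        }

    # e.g. evaluate([True, False, True], [True, True, True])
    # -> {'accuracy': 0.67, 'false_positive_rate': 0.0, 'false_negative_rate': 0.33}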

wizzwizz4 3 hours ago | parent [-]

Because by those metrics, LLMs aren't very good.

LLM code completion compares unfavourably to the (heuristic, nigh-instant) picklist implementations we used to use, both at the low level (how often does it autocomplete the right thing?) and at the high level (despite many believing they're more effective, the average programmer is less effective when using AI tools). We need reasons to believe that LLMs are great and do all things, so we look for measurements that paint them in a good light (e.g. lines of code written, time to first working prototype, inclination to output the Doom source code verbatim).

The reason we're all using (or pretending to use) LLMs now is not because they're good. It's almost entirely unrelated.