AI should only run as fast as we can catch up (higashi.blog)
74 points by yuedongze 5 hours ago | 83 comments
yuedongze 2 minutes ago | parent | next [-]

It's nice to see a wide array of discussions under this! Glad that I didn't give up on this thought and ended up writing it down.

I want to stress that the main point of my article is not really about AI coding, it's about letting AI perform any arbitrary tasks reliably. Coding is an interesting one because it seems like it's a place where we can exploit structure and abstraction and approaches (like TDD) to make verification simpler - it's like spot-checking in places with a very low soundness error.

I'm encouraging people to look for tasks other than coding to see if we can find similar patterns. The more of these cost asymmetries (easier to verify than to do) we can find, the more we can harness AI's real potential.
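
To put rough numbers on the spot-checking idea, here is a minimal Python sketch. It is not from the article: it assumes each check samples an output independently and uniformly at random, and that a fixed fraction of outputs is bad. The names `miss_probability` and `checks_needed` are made up for illustration.

```python
import math


def miss_probability(bad_fraction: float, checks: int) -> float:
    """Chance that `checks` independent random spot-checks all land on good outputs."""
    return (1.0 - bad_fraction) ** checks


def checks_needed(bad_fraction: float, max_miss: float) -> int:
    """Smallest number of independent checks whose miss probability is <= max_miss."""
    if not 0.0 < bad_fraction < 1.0:
        raise ValueError("bad_fraction must be strictly between 0 and 1")
    if not 0.0 < max_miss < 1.0:
        raise ValueError("max_miss must be strictly between 0 and 1")
    # Solve (1 - bad_fraction) ** k <= max_miss for the smallest integer k.
    return math.ceil(math.log(max_miss) / math.log(1.0 - bad_fraction))
```

Under these assumptions, if 10% of the outputs are wrong, `checks_needed(0.1, 0.01)` says how many random checks keep the chance of missing every bad output at or below 1%. The number of checks depends on the error rate and the target, not on how much output was generated, which is where the cost asymmetry comes from.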

blauditore 4 hours ago | parent | prev | next [-]

All these engineers who claim to write most code through AI - I wonder what kind of codebase that is. I keep on trying, but it always ends up producing superficially okay-looking code, but getting nuances wrong. Also fails to fix them (just changes random stuff) if pointed to said nuances.

I work on a large product with two decades of accumulated legacy, maybe that's the problem. I can see though how generating and editing a simple greenfield web frontend project could work much better, as long as actual complexity is low.

bob1029 3 hours ago | parent | next [-]

I have my best successes by keeping things constrained to method-level generation. Most of the things I dump into ChatGPT look like this:

  public static double ScoreItem(Span<byte> candidate, Span<byte> target)
  {
     //TODO: Return the normalized Levenshtein distance between the 2 byte sequences.
     //... any additional edge cases here ...
  }

I think generating more than one method at a time is playing with fire. Individual methods can be generated by the LLM and tested in isolation. You can incrementally build up and trust your understanding of the problem space by going a little bit slower. If the LLM is operating over a whole set of methods at once, it is like starting over each time you have to iterate.

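
For illustration, a method of this size can be filled in and checked on its own. The sketch below is a Python stand-in for the C# stub, not the commenter's code. The stub does not say how to normalize, so dividing by the longer input's length is an assumption, as is returning 0.0 for two empty inputs.

```python
def levenshtein(a: bytes, b: bytes) -> int:
    """Edit distance between two byte sequences (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))
    for i, byte_a in enumerate(a, start=1):
        current = [i]
        for j, byte_b in enumerate(b, start=1):
            cost = 0 if byte_a == byte_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def score_item(candidate: bytes, target: bytes) -> float:
    """Normalized distance in [0, 1]; the normalization choice is an assumption."""
    longest = max(len(candidate), len(target))
    if longest == 0:
        return 0.0  # assumed edge case: two empty inputs are identical
    return levenshtein(candidate, target) / longest
```

Because the function is pure, a handful of assertions (identical inputs, one empty input, a known pair such as `b"kitten"` and `b"sitting"`) is enough to verify it in isolation before moving on to the next method.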
samdoesnothing 2 hours ago | parent [-]

I do this but with copilot. Write a comment and then spam opt-tab and 50% of the time it ends up doing what I want and I can read it line-by-line before tabbing the next one.

Genuine productivity boost but I don't feel like it's AI slop, sometimes it feels like it's actually reading my mind and just preventing me from having to type...

jerf an hour ago | parent [-]

I've settled in on this as well for most of my day-to-day coding. A lot of extremely fancy tab completion, using the agent only for manipulation tasks I can carefully define. I'm currently in a "write lots of code" mode which affects that, I think. In a maintenance mode I could see doing more agent prompting. It gives me a chance to catch things early and then put in a correct pattern for it to continue forward with. And honestly for a lot of tasks it's not particularly slower than the "ask it to do something, correct its five errors, tweak the prompt" workflow.

I've had net-time-savings with bigger agentic tasks, but I still have to check it line-by-line when it is done, because it takes lazy shortcuts and sometimes just outright gets things wrong.

Big productivity boost, it takes out the worst of my job, but I still can't trust it at much above the micro scale.

I wish I could give a system prompt for the tab complete; there's a couple of things it does over and over that I'm sure I could prompt away but there's no way to feed that in that I know of.

bojan 12 minutes ago | parent | prev | next [-]

> I work on a large product with two decades of accumulated legacy, maybe that's the problem.

I'm in a similar situation, and for the first time ever I'm actually considering if a rewrite to microservices would make sense, with a microservice being something small enough an AI could actually deal with - and maybe even build largely on its own.

freedomben an hour ago | parent | prev | next [-]

I've tried it extensively, and have the same experience as you. AI is also incredibly stubborn when it wants to go down a path I reject. It constantly tries to do it anyway and will slip things in.

I've tried vibe coding and usually end up with something subtly or horribly broken, with excessive levels of complexity. Once it digs itself a hole, it's very difficult to extricate it even with explicit instruction.

CuriouslyC 4 hours ago | parent | prev | next [-]

It's architecture dependent. A fairly functional modular monolith with good documentation can be accessible to LLMs at the million line scale, but a coupled monolith or poorly instrumented microservices can drive agents into the ground at 100k.

yuedongze 4 hours ago | parent [-]

I think it's definitely an interesting subject for Verification Engineering. The more precisely we can task AI to do work, the more easily we can check that work.

CuriouslyC 3 hours ago | parent [-]

Yup. Codebase structure for agents is a rabbit hole I've spent a lot of time going down. The interesting thing is that it's mostly the same structure that humans tend to prefer, with a few tweaks: agents like smaller files/functions (more precise reads/edits), strongly typed functional programming, doc-comments with examples and hyperlinks to additional context, smaller directories with semantic subgroups, long/distinct variable names, etc.
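
As a rough illustration of that list (not the commenter's code), here is a small Python function in that shape: an immutable typed input, a pure function, long distinct names, and a doc-comment that carries an example. `InvoiceLine` and `invoice_total_cents` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvoiceLine:
    """One immutable line item; amounts are integer cents to avoid float rounding."""
    unit_price_cents: int
    quantity: int


def invoice_total_cents(invoice_lines: tuple[InvoiceLine, ...]) -> int:
    """Return the invoice total in cents.

    Example:
        >>> invoice_total_cents((InvoiceLine(250, 2), InvoiceLine(100, 1)))
        600
    """
    return sum(line.unit_price_cents * line.quantity for line in invoice_lines)
```

The example in the docstring doubles as a check that either a human or an agent can run, for instance with Python's `doctest` module.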

lukan an hour ago | parent [-]

Aren't those all things humans also tend to prefer to read?

I like to read descriptive variable names, I just don't like to write them all the time.

hathawsh 4 hours ago | parent | prev | next [-]

I think your intuition matches mine. When I try to apply Claude Code to a large code base, it spends a long time looking through the code and then it suggests something incorrect or unhelpful. It's rarely worth the trouble.

When I give AI a smaller or more focused project, it's magical. I've been using Claude Code to write code for ESP32 projects and it's really impressive. OTOH, it failed to tell me about a standard device driver I could be using instead of a community device driver I found. I think any human who works on ESP-IDF projects would have pointed that out.

AI's failings are always a little weird.

manmal 2 hours ago | parent [-]

In large projects you need to actually point it to the interesting files, because it has no way of knowing what it doesn’t know. Tell it to read this and that and create summary documents, then clear the context and point it at those summaries. A few of those passes and you’ll get useful results. A gap in its knowledge of relevant code will lead to broken functionality. Cursor and others have been trying to solve this with semantic search (embeddings) but IMO this just can’t work because the relevance of a piece of code to a task is not determinable from any of its traits.

themafia 34 minutes ago | parent | prev | next [-]

> as long as actual complexity is low.

You can start there. Does it ever stay that way?

> I work on a large product with two decades of accumulated legacy

Survey says: No.

qudat 4 hours ago | parent | prev | next [-]

Are you using it only on massive codebases? It's much better with smaller codebases where it can put most of the code in context.

Another good use case is to use it for knowledge searching within a codebase. I find that to be incredibly useful without much context "engineering"

silisili 4 hours ago | parent | prev | next [-]

> I work on a large product with two decades of accumulated legacy, maybe that's the problem

Definitely. I've found Claude at least isn't so good at working in large existing projects, but great at greenfielding.

Most of my use these days is having it write specific functions and tests for them, which in fairness, saves me a ton of time.

moomoo11 an hour ago | parent | prev | next [-]

You need to realize when you’re being marketed to and filter out the nonsense.

Now I use agentic coding a lot with maybe 80-90% success rate.

I’m on greenfield projects (my startup) and maintaining strict Md files with architecture decisions and examples helps a lot.

I barely write code anymore, and mostly code review and maintain the documentation.

In existing codebases pre-ai I think it’s near impossible because I’ve never worked anywhere that maintained documentation. It was always a chore.

tuhgdetzhh 4 hours ago | parent | prev | next [-]

Yes, unfortunately those who jumped on the microservices hype train over the past 15 years or so are now getting the benefits of Claude Code, since their entire codebases fit into the context window of Sonnet/Opus and can be "understood" by the LLM to generate useful code.

This is not the case for most monoliths, unless they are structured into LLM-friendly components that resemble patterns the models have seen millions of times in their training data, such as React components.

manmal 2 hours ago | parent [-]

Well structured monoliths are modularized just like microservices. No need to give each module its own REST API in order to keep it clean.

cogman10 4 hours ago | parent | prev | next [-]

Honestly, if you've ever looked at a claude.md file, it seems like absolute madness. I feel like I'm reading affirmations from AA.

manmal 2 hours ago | parent [-]

It’s magical incantations that might or might not protect you from bad behavior Claude learned from underqualified RL instructors. A classic instruction I have in CLAUDE.md is "Never delete a test. You are only allowed to replace with a test that covers the same branches." and another one "Never mention Claude in a commit message". Of course those sometimes fail, so I do have a message hook that enforces a certain style of git messages.
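
A minimal sketch of what such a check could look like, in Python. This is not the commenter's hook: the function name `check_commit_message` and the 72-character subject limit are assumptions for illustration, and only the "never mention Claude" rule comes from the comment.

```python
import re

# Rule taken from the comment above: never mention Claude in a commit message.
FORBIDDEN_MENTION = re.compile(r"\bclaude\b", re.IGNORECASE)
# Assumed style rule, not from the thread.
MAX_SUBJECT_LENGTH = 72


def check_commit_message(message: str) -> list[str]:
    """Return a list of style problems; an empty list means the message passes."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not subject.strip():
        problems.append("empty subject line")
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(f"subject longer than {MAX_SUBJECT_LENGTH} characters")
    if FORBIDDEN_MENTION.search(message):
        problems.append("message mentions Claude")
    return problems
```

To use it from Git, a `commit-msg` hook script would read the file whose path Git passes as its first argument, run this function on the contents, and exit non-zero if the list is not empty, which aborts the commit. Unlike an instruction in CLAUDE.md, this check cannot be ignored by the model.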

junkaccount an hour ago | parent | prev [-]

Can you prove it in a blog post, and share it here, that you write better code snippets than AI? If you ask "what kind of codebase", you should be able to use some codebase from GitHub to prove it.

gradus_ad 5 hours ago | parent | prev | next [-]

The proliferation of nondeterministically generated code is here to stay. Part of our response must be more dynamic, more comprehensive and more realistic workload simulation and testing frameworks.

glitchc an hour ago | parent | next [-]

Agreed. It's a new programming paradigm that will put more pressure on API and framework design, to protect vibe developers from themselves.

OptionOfT 4 hours ago | parent | prev | next [-]

I disagree. I think we're testing it, and we haven't seen the worst of it yet.

And I think it's less about non-deterministic code (the code is actually still deterministic) but more about this new-fangled tool out there that finally allows non-coders to generate something that looks like it works. And in many cases it does.

Like a movie set. Viewed from the right angle it looks just right. Peek behind the curtain and it's all wood, thinly painted, and it's usually easier to rebuild from scratch than to add a layer on top.

Angostura an hour ago | parent [-]

I just wanted to say how much I like that simile - I'm going to nick it for sure

wasmainiac 2 hours ago | parent | prev | next [-]

Code has always been nondeterministic. Which engineer wrote it? What was their past experience? This just feels like we are accepting subpar quality because we have no good way to ensure the code we generate is reasonable and won't mayyyybe rm -rf our server as a fun easter egg.

mort96 33 minutes ago | parent [-]

Code written by humans has always been nondeterministic, but generated code has always been deterministic before now. Dealing with nondeterministically generated code is new.

yuedongze 5 hours ago | parent | prev [-]

i've seen a lot of startups that use AI to QA human work. how about the idea of using humans to QA AI work? a lot of interesting things might follow

adventured 4 hours ago | parent | next [-]

A large percentage (at least 50%) of the market for software developers will shift to lower paid jobs focused on managing, inspecting and testing the work that AI does. If a median software developer job paid $125k before, it'll shift to $65k-$85k type AI babysitting work after.

mjr00 4 hours ago | parent [-]

It's funny that I heard exactly this when I graduated university in the late 2000s:

> A large percentage (at least 50%) of the market for software developers will shift to lower paid jobs focused on managing, inspecting and testing the work that outsourced developers do. If a median software developer job paid $125k before, it'll shift to $65k-$85k type outsourced developer babysitting work after.

Aldipower 5 hours ago | parent | prev | next [-]

Sounds inhuman.

quantummagic 5 hours ago | parent | next [-]

As an industry, we've been doing the same thing to people in almost every other sector of the workforce, since we began. Automation is just starting to come for us now, and a lot of us are really pissed off about it. All of a sudden, we're humanitarians.

Terr_ 3 hours ago | parent [-]

> Automation is just starting to come for us now

This argument is common and facile: Software development has always been about "automating ourselves out of a job", whether in the broad sense of creating compilers and IDEs, or in the individual sense that you write some code and say: "Hey, I don't want to rewrite this again later, not even if I was being paid for my time, I'll make it into a reusable library."

> the same thing

The reverse: What pisses me off is how what's coming is not the same thing.

Customers are being sold a snake-oil product, and its adoption may well ruin things we've spent careers de-crappifying by making them consistent and repeatable and understandable. In the aftermath, some portion of my (continued) career will be diverted to cleaning up the lingering damage from it.

A4ET8a8uTh0_v2 5 hours ago | parent | prev [-]

Nah, sounds like management, but I am repeating myself. In all seriousness, I have found myself having to carefully rein in some similar decisions. I don't want to get into details, but there are times I wonder if they understand how things really work or if people need some 'floor' level exposure before they just decree stuff.

colechristensen 4 hours ago | parent | prev | next [-]

Yes, but not like what you think. Programmers are going to look more like product managers with extra technical context.

AI is also great at looking for its own quality problems.

Yesterday on an entirely LLM generated codebase

Prompt: > SEARCH FOR ANTIPATTERNS

Found 17 antipatterns across the codebase:

And then what followed was a detailed list, about a third of them I thought were pretty important, a third of them were arguably issues or not, and the rest were either not important or effectively "this project isn't fully functional"

As an engineer, I didn't have to find code errors or fix code errors, I had to pick which errors were important and then give instructions to have them fixed.

manmal 2 hours ago | parent | next [-]

Yeah, don’t rely on the LLM finding all the issues. Complex code like Swift concurrency tooling is just riddled with issues. I usually need to increase to 100% line coverage and then let it loop on hanging tests until everything _seems_ to work.

(It’s been said that Swift concurrency is too hard for humans as well though)

mjr00 4 hours ago | parent | prev [-]

> Programmers are going to look more like product managers with extra technical context.

The limit of product manager as "extra technical context" approaches infinity is programmer. Because the best, most specific way to specify extra technical context is just plain old code.

LPisGood 2 hours ago | parent [-]

This is exactly why no code / low code solutions don’t really work. At the end of the day, there is irreducible technical complexity.

__loam 5 hours ago | parent | prev [-]

No thanks.

kristjank an hour ago | parent | prev | next [-]

This feeling of verification >> generation anxiety bears a resemblance to that moment when you're learning a foreign language, you speak a well-prepared sentence, and your correspondent says something back, of which you only understand about a third.

In like fashion, when I start thinking of a programming statement (as a bad/rookie programmer) and an assistant completes my train of thought (as is default behaviour in VS Code for example), I get that same feeling that I did not grasp half the stuff I should've, but nevertheless I hit Ctrl-Return because it looks about right to me.

yannyu 4 hours ago | parent | prev | next [-]

I think there's a lot of utility to current AI tools, but it's also clear we're in a very unsettled phase of this technology. We likely won't see for years where the technology lands in terms of capability or the changes that will be made to society and industry to accommodate it.

Somewhat unfortunately, the sheer amount of money being poured into AI means that it's being forced upon many of us, even if we didn't want it. Which results in a stark, vast gap like the author is describing, where things are moving so fast that it can feel like we may never have time to catch up.

And what's even worse, because of this, industry and individuals are now trying to have the tool correct and moderate itself, which intuitively seems wrong from both a technical and societal standpoint.

trjordan 4 hours ago | parent | prev | next [-]

The verification asymmetry framing is good, but I think it undersells the organizational piece.

Daniel works because someone built the regime he operates in. Platform teams standardized the patterns and defined what "correct" looks like and built test infrastructure that makes spot-checking meaningful and and and .... that's not free.

Product teams are about to pour a lot more slop into your codebase. That's good! Shipping fast and messy is how products get built. But someone has to build the container that makes slop safe, and have levers to tighten things when context changes.

The hard part is you don't know ahead of time which slop will hurt you. Nobody cares if product teams use deprecated React patterns. Until you're doing a migration and those patterns are blocking 200 files. Then you care a lot.

You (or rather, platform teams) need a way to say "this matters now" and make it real. There's a lot of verification that's broadly true everywhere, but there's also a lot of company-scoped or even team-scoped definitions of "correct."

(Disclosure: we're working on this at tern.sh, with migrations as the forcing function. There's a lot of surprises in migrations, so we're starting there, but eventually, this notion of "organizational validation" is a big piece of what we're driving at.)

jascha_eng 4 hours ago | parent | prev | next [-]

Verification is key, and the issue is that almost all AI generated code looks plausible so just reading the code is usually not enough. You need to build extremely good testing systems and actually run through the scenarios that you want to ensure work to be confident in the results. This can be preview deployments or other AI generated end to end tests that produce video output that you can watch or just a very good test suite with guard rails.

Without such automation and guard rails, AI generated code eventually becomes a burden on your team because you simply can't manually verify every scenario.

bigbuppo 4 hours ago | parent | next [-]

And with any luck, they don't vibe code their tests that ultimately just return true;

yuedongze 4 hours ago | parent | prev | next [-]

indeed, i see verification debt outweighing traditional tech debt very very soon...

catigula 4 hours ago | parent | prev [-]

I can automatically generate suites of plausible tests using Claude Code.

If you can make as a rule "no AI for tests", then you can simply make the rule "no AI" or just learn to cope with it.

wasmainiac 2 hours ago | parent | prev | next [-]

It’s called TDD, ya write a bunch of little tests to make sure your code is doing what it needs to do and not what it’s not. In short, little blocks of easily verifiable code to verify your code.

But seriously, what is this article even? It feels like we are reinventing the wheel or maybe just humble AI hype?

awesome_dude 2 hours ago | parent | prev | next [-]

It's like a buffered queue: if the producer (AI) is too fast for the consumer (dev's brain), then the producer needs to block/stop/slow down, otherwise data will be lost (in this analogy the data loss is the consumer no longer having a clear understanding of what the code is doing).
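
The analogy maps directly onto a bounded queue with a blocking producer. Here is a minimal Python sketch using the standard `queue.Queue`; the function name `run_pipeline` and the idea that "reviewing" is just appending to a list are simplifications for illustration.

```python
import queue
import threading


def run_pipeline(items, capacity):
    """Push items through a bounded buffer; the producer blocks when it is full."""
    buffer = queue.Queue(maxsize=capacity)
    reviewed = []
    done = object()  # sentinel that tells the consumer to stop

    def producer():
        for item in items:
            buffer.put(item)  # blocks while the consumer is `capacity` items behind
        buffer.put(done)

    def consumer():
        while True:
            item = buffer.get()
            if item is done:
                break
            reviewed.append(item)  # stand-in for a human actually reading the change

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return reviewed
```

With a small `capacity`, the producer can never get more than that many items ahead of the reviewer, so nothing is dropped. An unbounded queue would let the backlog grow without limit, which is the "data loss" the comment describes.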

One day, when AI becomes reliable (which is still a while off because AI doesn't yet understand what it's doing) then the AI will replace the consumer (IMO).

FTR - AI is still at the "text matches another pattern of text" stage, and not the "understand what concepts are being conveyed" stage, as demonstrated by AI's failure to do basic arithmetic

CGMthrowaway 5 hours ago | parent | prev | next [-]

> AI should only run as fast as we can catch up

Good principle. This is exactly why we research vaccines and bioweapons side by side in the labs, for example.

rogerkirkness 5 hours ago | parent | prev | next [-]

Appealing, but this is coming from someone smart/thoughtful. No offence to 'rest of world', but I think that most people have felt this way for years. And realistically in a year, there won't be any people who can keep up.

dontlikeyoueith 4 hours ago | parent | next [-]

> And realistically in a year, there won't be any people who can keep up.

I've heard the same claim every year since GPT-3.

It's still just as irrational as it was then.

adventured 4 hours ago | parent [-]

You're rather dramatically demonstrating how remarkable the progress has been: GPT-3 was horrible at coding. Claude Opus 4.5 is good at it.

They're already far faster than anybody on HN could ever be. Whether it takes another five years or ten, in that span of time nobody on HN will be able to keep up with the top tier models. It's not irrational, it's guaranteed. The progress has been extraordinary and obvious, the direction is certain, the outcome is certain. All that is left is to debate whether it's a couple of years or closer to a decade.

umanwizard 2 hours ago | parent | next [-]

Why is the outcome certain? We have no way of predicting how long models will continue getting better before they plateau.

Arainach 4 hours ago | parent | prev [-]

People claimed GPT-3 was great at coding when it launched. Those who said otherwise were dismissed. That has continued to be the case in every generation.

stale2002 12 minutes ago | parent | next [-]

> People claimed GPT-3 was great at coding when it launched.

Ok and they were wrong, but now people are right that it is great at coding.

> That has continued to be the case in every generation.

If something gets better over time, it is definitionally true that it was bad for every case in the past until it becomes good. But then it is good.

That's how that works. For everything. You are talking in tautologies while not understanding the implication of your arguments and how it applies to very general things like "A thing that improves over time".

dwaltrip an hour ago | parent | prev [-]

A bit reductive.

airstrike 5 hours ago | parent | prev | next [-]

> And realistically in a year, there won't be any people who can keep up.

Bold claim. They said the same thing at the start of this year.

adventured 4 hours ago | parent [-]

You're all arguing over how many single digit years it'll take at this point.

It doesn't matter if it takes another 12 or 36 months to make that claim true. It doesn't matter if it takes five years.

Is AI coming for most of the software jobs? Yes it is. It's moving very quickly, and nothing can stop it. The progress has been particularly exceptionally clear (early GPT to Gemini 3 / Opus 4.5 / Codex).

bdangubic 4 hours ago | parent [-]

> Is AI coming for most of the software jobs?

be cool to start with one before we move to most…

yuedongze 5 hours ago | parent | prev [-]

i'm hoping this can introduce a framework to help people visualize the problem and figure out a way to close that gap. image generation is something everyone can verify, but code generation is perhaps not. but if we can make verifying code as effortless as verifying images (not saying it's possible), then our productivity can enter the next level...

drlobster 5 hours ago | parent [-]

I think you're underestimating how good these image generators are at the moment.

yuedongze 5 hours ago | parent [-]

oh i mean the other direction! checking whether a generated image is "good" (it looks natural and no one can tell that something is off), rather than checking whether it is fake.

cons0le 4 hours ago | parent | prev [-]

I directly asked gemini how to get world peace. It said the world should prioritize addressing climate change, inequality, and discrimination. Yeah - we're not gonna do any of that shit. So I don't know what the point of "superintelligent" AI is if we aren't going to even listen to it for the basic big picture stuff. Any sort of "utopia" that people imagine AI bringing is doomed to fail because we already can't cooperate without AI

ASalazarMX 4 hours ago | parent | next [-]

> I don't know what the point of "super intelligent" AI is if we aren't going to even listen to it

Because you asked the wrong question. The most likely question would be "How do I make a quadrillion dollars and humiliate my super rich peers?".

But realistically, it gave you an answer according to its capacity. A real super intelligent AI, and I mean oh-god-we-are-but-insects-in-its-shadow super intelligence, would give you a roadmap and blueprint, and it would account for our deep-rooted human flaws, so no one reading it seriously could dismiss it as superficial. In fact, any world elite reading it would see it as a chance to humiliate their world elite peers and get all the glory for themselves.

You know how adults can fool little children into doing what they don't want to? We would be the toddlers in that scenario. I hope this hypothetical AI has humans in high regard, because that would be the only thing saving us from ourselves.

vkou 4 hours ago | parent | next [-]

The blueprint should start with a recipe for building a better computer, and once you do that, well, it's humans starting fires and playing with the flames.

catigula 4 hours ago | parent | prev [-]

Why would a "real super intelligent AI" be your servant in this scenario?

>I hope this hypothetical AI has humans in high regard

This is invented. This is a human concept, rooted in your evolutionary relationships with other humans.

It's not your fault, it's very difficult or impossible to escape the simulation of human-ly modelling intelligence. You need only understand that all of your models are category errors.

ASalazarMX 4 hours ago | parent [-]

> Why would a "real super intelligent AI" be your servant in this scenario?

Why is the Bagger 288 a servant to miners, given the unimaginable difference in their strength? Because engineers made it. Give humanity's wellbeing the highest weight on its training, and hope it carries over when they start training on their own.

catigula 3 hours ago | parent [-]

Category error. Intelligence is a different type of thing. It is not a boring technology.

>Give humanity's wellbeing the highest weight on its training

We don't even know how to do this relatively trivial thing. We only know how to roughly train for some signals that probably aren't correct.

This may surprise you but alignment is not merely unsolved; there are many people who think it's unsolvable.

Why do people eat artificially sweetened things? Why do people use birth control? Why do people watch pornography? Why do people do drugs? Why do people play video games? Why do people watch moving lights and pictures? These are all symptoms of humans being misaligned.

Natural selection would be very angry with us if it knew we didn't care about what it wanted.

ASalazarMX an hour ago | parent [-]

> Why do people eat artificially sweetened things? Why do people use birth control? Why do people watch pornography? Why do people do drugs? Why do people play video games? Why do people watch moving lights and pictures? These are all symptoms of humans being misaligned.

I think these behaviors are fully aligned with natural selection. Why do we overengineer our food? It's not for health, because simpler food would satisfy our nutritional needs as easily, it's because our far ancestors developed a taste for food that kept them alive longer. Our incredibly complex chain of meal preparation is just us looking to satisfy that desire for tasty food by overloading it as much as possible.

People prefer artificial sweeteners because they taste sweeter than regular ones, they use birth control because we inherently enjoy sex and want more of it (but not more raising babies), drugs are an overloading of our need for happiness, etc. Our bodies crave things, and uninformed, we give them what they want but multiplied several fold.

But geez, I agree, alignment of AI is a hard problem, but it would be wrong to say it's impossible, at least until it's understood better.

catigula 19 minutes ago | parent [-]

It seems like you don’t understand reinforcement learning. The signal is reinforced because it correlates to behavior; hacking the signal itself is misalignment.

Nzen 4 hours ago | parent | prev | next [-]

Did you expect some answer that decried world peace as impossible? It's just repeating what people say [0] when asked the same question. That's all that a large language model can do (other than putting it to rhyme or 'in the style of Charles Dickens').

[0] https://newint.org/features/2018/09/18/10-steps-world-peace

If you are looking for a vision of general AI that confirms a Hobbesian worldview, you might enjoy Lars Doucet's short story, _Four Magic Words_ [1].

[1] https://www.fortressofdoors.com/four-magic-words/

chasd00 3 hours ago | parent | prev | next [-]

> So I don't know what the point of "superintelligent" AI is if we aren't going to even listen to it

I would kind of feel sorry for a super-intelligent AI having to deal with humans who have their fingers on the on/off switch. It would be a very frustrating existence.

PunchyHamster 4 hours ago | parent | prev | next [-]

I dunno, many people have that weird, unfounded trust in what AI says, more than in actual human experts it seems

bilbo0s 4 hours ago | parent [-]

Because AI, or rather, an LLM, is the consensus of many human experts as encoded in its embedding. So it is better, but only for those who are already expert in what they're asking.

The problem is, you have to know enough about the subject on which you're asking a question to land in the right place in the embedding. If you don't, you'll just get bunk. (I know it's popular to call AI bunk "hallucinations" these days, but really if it was being spouted by a half wit human we'd just call it "bunk".)

So you really have to be an expert in order to maximize your use of an LLM. And even then, you'll only be able to maximize your use of that LLM in the field in which your expertise lies.

A programmer, for instance, will likely never be able to ask a coherent enough question about economics or oncology for an LLM to give a reliable answer. Similarly, an oncologist will never be able to give a coherent enough software specification for an LLM to write an application for him or her.

That's the Achilles' heel of AI today as implemented by LLMs.

chasd00 3 hours ago | parent | next [-]

> The problem is, you have to know enough about the subject on which you're asking a question to land in the right place in the embedding

The other day I was on a call with 3 or 4 other people solving a config problem in a specific system. One of them asked ChatGPT for the solution and got back a list of configuration steps to follow. He started the steps but one of them mentioned configuring an option that did not exist in the system at all. Textbook hallucination. It was obvious on the call that he was very surprised that the AI would give him an incorrect result; he was 100% convinced the answer was what the LLM said and never once thought to question what the LLM returned.

I've had a couple of instances with friends being equally shocked when an LLM turned out to be wrong. One of them was fairly disturbing: I was at a horse track describing LLMs, and to demonstrate I took a picture of the racing form thing and asked the LLM to formulate a medium risk betting strategy. My friend immediately took it as some kind of supernatural insight and bet $100 on the plan it came up with. It was as if he believed the LLM could tell the future. Thank god it didn't work and he lost about $70. Had he won I don't know what would have happened, he probably would have asked again and bet everything he had.

jackblemming 4 hours ago | parent | prev [-]

> is the consensus of many human experts as encoded in its embedding

That’s not true.

ASalazarMX 4 hours ago | parent [-]

Yup, current LLMs are trained on the best and the worst we can offer. I think there's value in training smaller models with strictly curated datasets, to guarantee they've learned from trustworthy sources.

chasd00 3 hours ago | parent [-]

> to guarantee they've learned from trustworthy sources.

I don't see how this will ever work. Even in hard science there's debate over what content is trustworthy and what is not. Imagine trying to declare your source of training material on religion, philosophy, or politics "trustworthy".

ASalazarMX an hour ago | parent [-]

"Sir, I want an LLM to design architecture, not to debate philosophy."

But really, you leave the curation to real humans, institutions with ethical procedures already in place. I don't want Google or Elon dictating what truth is, but I wouldn't mind if NASA or other aerospace institutions dictated what is truth in that space.

Of course, the dataset should have a list of every document/source used, so others can audit it. I know, unthinkable in this corporate world, but one can dream.

cranium 4 hours ago | parent | prev | next [-]

"How to be in good health? Sleep, eat well, exercise." However, knowledge ≠ application.

potsandpans 4 hours ago | parent | prev [-]

I don't believe that this is going to happen, but the primary arguments revolving around a "super intelligent" ai involve removing the need for us to listen to it.

A super intelligent ai would have agency, and when incentives are not aligned would be adversarial.

In the caricature scenario, we'd ask, "super ai, how to achieve world peace?" It would answer the same way, but then solve it in a non-human centric approach: reducing humanity's autonomy over the world.

Fixed: anthropogenic climate change resolved, inequality and discrimination reduced (by reducing population by 90%, and putting the rest in virtual reality)

ASalazarMX 4 hours ago | parent [-]

If our AIs achieve something like this, but they managed to give them the same values the minds in Iain Banks's Culture series had, I think humanity would be golden.