spzb 7 hours ago

I have yet to see a vibe-coded success that isn't a small program that already exists in multiple forms in the training data. Let's see something ground-breaking. If AI coding is so great and is going to take us to 10x or 100x productivity, let's see it generate a new, highly efficient compression algorithm or a state-of-the-art travelling salesman solution.

boplicity 7 hours ago | parent | next [-]

> Let's see something ground-breaking

Why? People don't ask hammers to do much more than bash nails into walls.

AI coding tools can be incredibly powerful -- but shouldn't that power be focused on what the tool is actually good at?

There are many, many times that AI coding tools can and should be used to create a "small program that already exists in multiple forms in the training data."

I do things like this very regularly for my small business. It's allowed me to do things that I simply would not have been able to do previously.

People keep asking AI coding tools to be something other than what they currently are. Sure, that would be cool. But they absolutely have increased my productivity 10x for exactly the type of work they're good at assisting with.

Teknomadix 6 hours ago | parent | next [-]

>People don't ask hammers to do much more than bash nails into walls.

“It resembles a normal hammer but is outfitted with an little motor and an flexible head part which moves back and forth in a hammering motion, sparing the user from moving his or her own hand to hammer something by their own force and by so making their job easier”

https://gremlins.fandom.com/wiki/Electric_Hammer

pigpop 6 hours ago | parent [-]

Good reference and a funny scene but doesn't quite hit home because we have invented improved hammers in the form of pneumatic nail guns and even cordless nailers (some pneumatic and some motorized) which could truly be called an "electric hammer".

With this context the example may support the quote: nail guns do make driving nails much faster and easier, but that's all they do. You can't pull a nail with a nail gun, and you can't use it for any of the other things a regular hammer can do. They do 10x your ability to drive nails, though.

On the other hand, LLMs are significantly more multi-purpose than a nail gun.

ncallaway 7 hours ago | parent | prev | next [-]

> People keep asking AI coding tools to be something other than what they currently are.

I think it's for a very reasonable reason: the AI coding tool salespeople are often selling the tools as something other than what they currently are.

I think you're right that if you calibrate your expectations to what the tools are actually capable of, there's definitely value there. It would be nice if the marketing around AI did the same thing.

onion2k 7 hours ago | parent | next [-]

AI sales pitches seem to be very much aligned with productivity improvement: "do more of the same but faster" or "do the same with fewer people". No one is selling "do more".

BeetleB 5 hours ago | parent | prev [-]

> I think it's for a very reasonable reason: the AI coding tool salespeople are often selling the tools as something other than what they currently are.

And if this submission was an AI salesperson trying to sell something, the comment/concern would be pertinent. It is otherwise irrelevant here.

BobbyJo 7 hours ago | parent | prev | next [-]

Yes! I can't tell you the number of times I thought to myself "If only there was a way for this problem to be solved once instead of being solved over and over again". If that is the only thing AI is good at, then it's still a big step up for software IMO.

fiyec30375 6 hours ago | parent [-]

It's true. Why should everyone look up the same API docs and make the same mistakes when AI can write it for you instantly and correctly?

blauditore 7 hours ago | parent | prev | next [-]

Because that's the vision many of the companies selling AI are pitching. Saying that what it can do now is actually already good enough might be true, but it's also moving the goalposts compared to what was promised (or feared, depending on who you ask).

simonw 6 hours ago | parent | next [-]

One of the many important skills needed to navigate our weird new LLM landscape is ignoring what the salespeople say and listening to the non-incentivized practitioners instead.

darkerside 7 hours ago | parent | prev [-]

Can we get specific? What company and salesperson made what claim?

Let's not disregard interesting achievements because they are not something else.

Arisaka1 2 hours ago | parent | prev | next [-]

>Why?

Because I keep asking myself: if AI is here and our output is charged up, why do I keep seeing more of the same products, just with an "AI" sticker slapped on top? From a group of technologists like HN and the startup world, who live on the edge of evolution and revolution, maybe my expectations were a bit too high.

All I see is the equivalent of "look how fast my new car got me to the supermarket, when I'm not too demanding about which supermarket I end up at, and all I want is milk and eggs". Which is 100% fine, but at the end of the day I eat the same omelette as always. In this metaphor, I don't feel the slightest bit behind, or any sense of FOMO, if I cook my omelette slowly. I guess I have more time for my kids if I see the culinary arts as just a job. And it's not like restaurants suddenly get all their tables booked faster just because everyone cooks omelettes faster.

>It's allowed me to do things that I simply would not have been able to do previously.

You're not the one doing them. Me barking orders at John Carmack himself doesn't make me a Quake co-creator, and even if I micromanage his output like the world's most toxic know-it-all micromanager, I'm still not Carmack.

On top of that, you would have been able to do them previously, if you had cared enough to upskill to the point where token feeding isn't needed for you to feel productive. Tons of programmers broke barriers and solved problems that hadn't been solved by anyone in their companies before.

I don't see why "I previously couldn't do this" is a bragging point. The LLMs that you're using were trained on the same Google results you could've gotten if you had searched.

boplicity 7 hours ago | parent | prev | next [-]

To be clear, I see a lot of "magical thinking" among people who promote AI. They imagine a "perfect" AI tool that can basically do everything better than a human can.

Maybe this is possible. Maybe not.

However, it's a fantasy. Granted, it is a compelling fantasy. But it's not one based on reality.

A good example:

"AI will probably be smarter than any single human next year. By 2029, AI is probably smarter than all humans combined.” -- Elon Musk

This is, of course, ridiculous. But, why should we let reality get in the way of a good fantasy?

falcor84 6 hours ago | parent [-]

> AI will probably be smarter than any single human next year.

Arguably that's already so. There's no clear single dimension for "smart"; even within the exact sciences, I wouldn't know how to judge, e.g., "Who was smarter, Einstein or von Neumann?". But for any particular "smarts competition", especially a time-limited one, I'd expect Claude Opus 4.5 and Gemini 3 Pro to get higher scores than any single human.

darkwater an hour ago | parent [-]

So we are back to the original comment that generated this thread: why hasn't AI generated a new and better compression algorithm, for example?

spzb 7 hours ago | parent | prev [-]

> Why? People don't ask hammers to do much more than bash nails into walls.

No one is propping up a multi-billion-dollar tech bubble by promising hammers that do more than bash nails, so as a point of comparison it makes no sense.

pigpop 6 hours ago | parent | next [-]

The software development market is measured in tens to hundreds of billions of dollars, depending on which parts you're looking at, so inventing a better hammer (a development tool) can be expected to drive billions of dollars of value. How many billions depends on how good a tool it turns out to be in the end. And that's only counting software; it's also directly applicable to all media (image, video, audio, text) and some scientific domains (genetics, medicine, materials, etc.).

falcor84 6 hours ago | parent | prev [-]

That's nitpicking; in this manner you can dismiss any analogy by finding an aspect in which it differs from the original comparandum.

anon7000 7 hours ago | parent | prev | next [-]

You’re right, but at the same time, 99% of software people need has already been done in some form. This gets back to the article on “perfect software” [1] posted last week. This bookshelf is perfect for the guy who wrote it and there isn’t anything exactly like it out there. The common tools on the App Store (goodreads) don’t fit his needs. But he was able to create a piece of “perfect software” that exactly meets his own goals and his own design preferences. And it was very easy to accomplish with LLMs, just by putting together pieces of things that have been done before.

This is still pretty great!

1: https://outofdesk.netlify.app/blog/perfect-software

pigpop 5 hours ago | parent [-]

Yes, that's an excellent framing of where we're at and the role that LLM-generated software is excelling in. Custom software has been out of reach for many people who would benefit from it, because it requires either a lot of money to pay someone to build it or a lot of time to learn how to build it yourself and execute on that process.

Right now you can essentially use services like Claude as a custom software "app store" (though I'd really call it a service), where you can say "I'd like an app that does X" and, depending on the scope, get that app as a Claude Artifact in a few minutes or, if you're familiar with software development and build/deployment processes, in a few hours to days as a more traditional software artifact which you can host somewhere or install locally. Google is working hard to make this even more achievable for non-developers with Google AI Studio https://aistudio.google.com/ and Firebase Studio https://firebase.studio/

josu 6 hours ago | parent | prev | next [-]

I was able to lead these two competitions using LLM agents, with no prior Rust or C++ knowledge. They both have real-world applications.

- https://highload.fun/tasks/15/leaderboard

- https://highload.fun/tasks/24/leaderboard

In both cases my score showed other players that there were better solutions and pushed them to improve their scores as well.

jungturk 7 hours ago | parent | prev | next [-]

Much of the coding we do is repetitive and exists in the training data, so I think it's pretty great if AI can eliminate that toil and liberate the meat to focus on the creative work.

SOLAR_FIELDS 7 hours ago | parent [-]

There’s a reason they call working at Google “shuffling protobufs” for the vast majority of engineers. Most software work isn’t innovative compression algorithms. It’s moving data around, which is a well understood problem

lowkey_ 3 hours ago | parent | prev | next [-]

You're just asking for the opposite of what AI does.

90-99% of an engineer's work isn't entirely novel coding that has never existed before, so by succeeding at what "already exists", it can take us to 10x-100x productivity.

The automation of all that work is groundbreaking in and of itself.

I think that, for a while into the future at least, humans will be relegated to generating that groundbreaking work, and the AI will increasingly handle the rest.

skrotumnisse 7 hours ago | parent | prev | next [-]

I find this type of comment depressing. This is a time for exploration and learning new things, and this is a perfect way to do so. It's a small project that solves the problem. Time better spent vibe coding it than evaluating existing alternatives.

fny 6 hours ago | parent | prev | next [-]

I have worked on out-of-sample problems, and AI absolutely struggles, but it dramatically accelerates the research process. Testing ideas is cheap, support tools are quick to write, and the LLM is a tremendous research tool in itself.

More generally, I do think LLMs grant 10x+ performance for most common work: most of what people do manually is in the training data (which is why there's so much of it in the first place). 10x+ in those domains can in theory free up more brain space to solve the problems you're talking about.

My advice to you is to tone down the cynicism, and see how it could help you. I'll admit, AI makes me incredibly anxious about my future, but it's still fun to use.

falcor84 6 hours ago | parent | prev | next [-]

I have yet to see an "AI doesn't impress me" comment that added anything to the discussion. Yes, there's always going to be a state of the art, and things that are as yet beyond it.

MontyCarloHall 7 hours ago | parent | prev | next [-]

Forget utterly groundbreaking things, I want to hear maintainers of complex, actively developed, and widely used open-source projects (e.g. ffmpeg, curl, openssh, sqlite) start touting a massive uptick in positive contributions, pointing to a concrete influx of high-quality AI-assisted commits. If AI is indeed a 10x force multiplier, shouldn't these projects have seen 10 years' worth of development in the last year?

Don't get me wrong, AI is at least as game-changing for programming as StackOverflow and Google were back in the day. Being able to not only look up but automatically integrate things into your codebase that already exist in some form in the training data is incredibly useful. I use it every day, and it's saved me hours of work for certain specific tasks [0]. For tasks like that, it is indeed a 10x productivity multiplier. But since these tasks only comprise a small fraction of the full software development process, the rest of which cannot be so easily automated, AI is not the overall 10x force multiplier that some claim.

[0] https://news.ycombinator.com/item?id=45511128

simonw 6 hours ago | parent | next [-]

> I want to hear maintainers of complex, actively developed, and widely used open-source projects (e.g. ffmpeg, curl, openssh, sqlite) start touting a massive uptick in positive contributions

That's obviously not going to happen, because AI tools can't solve for taste. Just because a developer can churn out working code with an LLM doesn't mean they have the skills to figure out what the right working code to contribute to a project is, and how to contribute it in a way that makes the maintainers' lives easier and not harder.

That skill will remain rare.

(Also SQLite famously refuses to accept external contributions, but that's a different issue.)

SQLite 3 hours ago | parent [-]

No, Simon, we don't "refuse". We are just very selective and there is a lot of paperwork involved to confirm the contribution is in the public domain and does not contaminate the SQLite core with licensed code. Please put the false narrative that "SQLite refuses outside contributions" to rest. The bar is high to get there, but the SQLite code base does contain contributed code.

mtlynch 2 hours ago | parent | next [-]

Dr. Hipp, I love SQLite, but I also had simonw's misapprehension that the project did not accept contributions. The SQLite copyright page says:

> Contributed Code

> In order to keep SQLite completely free and unencumbered by copyright, the project does not accept patches. If you would like to suggest a change and you include a patch as a proof-of-concept, that would be great. However, please do not be offended if we rewrite your patch from scratch.

I realize that the section, "Open-Source, not Open-Contribution" says that the project accepts contributions, but I'm having trouble understanding how that section and the "Contributed Code" section can both be accurate. Is there a distinction between accepting a "patch" vs. accepting a "contribution?"

If you're planning to update this page to reduce confusion of the contribution policy, I humbly suggest a rewrite of this sentence to eliminate the single and double negatives, which make it harder to understand:

> In order to keep SQLite in the public domain and ensure that the code does not become contaminated with proprietary or licensed content, the project does not accept patches from people who have not submitted an affidavit dedicating their contribution into the public domain.

Could be rewritten as:

> In order to keep SQLite in the public domain and prevent contamination of the code from proprietary or licensed content, the project only accepts patches from people who have submitted an affidavit dedicating their contribution into the public domain.

[0] https://sqlite.org/copyright.html

simonw 2 hours ago | parent [-]

Yes, that "does not accept patches" line must have been where I picked up my incorrect mental model.

simonw 3 hours ago | parent | prev [-]

Thanks for the correction, and sorry for getting that wrong. I genuinely didn't know that.

Found that paperwork here: https://www.sqlite.org/copyright-release.html

I will make sure not to spread that misinformation further in the future!

Update: I had a look in fossil and counted 38 contributors:

  # Install Fossil and clone the SQLite repository
  brew install fossil
  fossil clone https://www.sqlite.org/src sqlite.fossil
  # Count check-in ('ci') events per user
  fossil sql -R sqlite.fossil "
    SELECT user, COUNT(*) as commits
    FROM event WHERE type='ci'
    GROUP BY user ORDER BY commits DESC
  "

Blogged about this (since it feels important to help spread the correction about this): https://simonwillison.net/2025/Dec/29/copyright-release/
zwnow 7 hours ago | parent | prev | next [-]

> Being able to not only look up but automatically integrate things into your codebase that already exist in some form in the training data is incredibly useful.

Until it decides to include code it gathered from a Stack Overflow post from 15 years ago, probably introducing security issues, or makes up libraries on the fly, or, even worse, tries to make you install libs that were part of a data-poisoning attack.

MontyCarloHall 7 hours ago | parent [-]

It's no different from supervising a naïve junior engineer who also copy/pastes from 15 year old SO posts (a tale as old as time): you need to carefully review and actually grok the code the junior/AI writes. Sometimes this ends up taking longer than writing it yourself, sometimes it doesn't. As with all decisions in delegating work, the trick is knowing ahead of time whether this will be the case.

spzb 7 hours ago | parent | next [-]

Naive junior engineers eventually learn and become competent senior engineers. LLMs forget everything they "learn" as soon as the context window gets too big.

MontyCarloHall 7 hours ago | parent | next [-]

Very true! I liken AI to having an endless supply of newly hired interns with near-infinite knowledge but intern-level skills.

cheevly 5 hours ago | parent | prev [-]

There are like a dozen well-established ways to overcome this. Learn how to use the basic tools and patterns, my dude.

zwnow 6 hours ago | parent | prev [-]

I have yet to see a junior try to install random or nonexistent libs.

pigpop 5 hours ago | parent [-]

If you forced them to try it from memory without giving them access to the web you sure would.

anthonypasq 7 hours ago | parent | prev | next [-]

The creator of Claude Code said on Twitter that he hasn't opened an IDE in a month and has merged 200 PRs.

MontyCarloHall 7 hours ago | parent [-]

Might the creator of Claude Code have some … incentives … to develop like that, or at least claim that he does?

As someone who frequently uses Claude Code, I cannot say that a year's worth of features/improvements have been added in the last month. It bears repeating: if AI is truly a 10x force multiplier, you should expect to see a ~year's worth of progress in a month.

simonw 6 hours ago | parent [-]

Opus 4.5 is just a few days over a month old. Boris would have had access to that for a while before its release though.

shimman 5 hours ago | parent [-]

Boris is someone that is employed by Anthropic and has a massive stake in them going public, standing to make millions.

They are by definition a biased source and shouldn't be cited as if they were a neutral one.

simonw 4 hours ago | parent [-]

Nobody here claimed that Boris wasn't a biased source.

I do however think he is not an actively dishonest source. When he says "In the last thirty days, I landed 259 PRs -- 497 commits, 40k lines added, 38k lines removed. Every single line was written by Claude Code + Opus 4.5." I believe he is telling the truth.

That's what dogfooding your own product looks like!

https://twitter.com/bcherny/status/2004887829252317325

spzb 7 hours ago | parent | prev [-]

curl in particular is being plagued by AI-slop security reports, which are actively slowing development by forcing the maintainers to triage crap when they could be working on new features (or, you know, enjoying their lives), e.g. https://www.theregister.com/2025/07/15/curl_creator_mulls_ni...

leleat 7 hours ago | parent [-]

On the other hand, we had this story[^1], where the maintainer of curl mentions a bunch of actually useful reports by someone using AI tools.

[^1]: https://news.ycombinator.com/item?id=45449348

pranavm27 7 hours ago | parent | prev | next [-]

It's good that way, right? Let me as a human do the interesting thinking my brain is meant for, while the AI does what its chips were built for.

I am happy as is, tbh, not even looking for AGI and all. I just want the LLM to be close enough to my thinking scale that it doesn't feel like "why am I talking with this robot?".

SJMG 7 hours ago | parent | prev | next [-]

https://thenewstack.io/how-deepminds-alphatensor-ai-devised-...

Not either of the species of algorithms you've described, but still an advance.

spzb 7 hours ago | parent [-]

That's about as far removed from vibe coding as you can get. It's the result of an algorithm developed for a specific purpose by researchers at one of the most advanced machine learning companies.

fiyec30375 6 hours ago | parent [-]

Who really cares? The goalpost of "AI is useless because I can't vibe code novel discoveries" is a strawman. AI and vibe coding are transformational. So are AI-enhanced efforts to solve longstanding, difficult scientific problems. If cancer is cured with AI assistance, does it really matter if it was vibe-cured or state-of-the-art-lab-cured?

spzb 5 hours ago | parent [-]

Ironic to call it a strawman whilst making a strawman yourself. I never said AI was useless, I said vibe coding hasn't produced anything novel.

bdcravens 4 hours ago | parent | prev | next [-]

10x productivity can also mean allowing a single staff member to do the boring work of 10 humans.

blks 3 hours ago | parent | prev | next [-]

It will not generate anything novel because it can’t.

plaidfuji 6 hours ago | parent | prev | next [-]

… or, let’s see humans who are now 10-100x more productive (due to automation of mundane tasks that are already part of the training data) do the things you’re asking for.

Forgeties79 7 hours ago | parent | prev | next [-]

And to add to this: for some reason people really bristle if you say that many LLMs are just search with extra steps. This feels like an extension of that. It's just reinventing the wheel over and over again based on a third party's educated guess (admittedly often a solid approximation, but still not exact) of what a wheel may be. It all seems like a rather circuitous way to accomplish things, unless your goal isn't to build a wheel but rather to tinker and experiment with the concept of a wheel and learn something in the process. Totally valid, but I'm pretty sure that's not what OpenAI et al. are pitching lol

zellyn 7 hours ago | parent | prev | next [-]

trifling.org is an entire Python coding site, offline-first (localStorage after first load), with docs, turtle graphics, canvas, and an avatar editor, vibe coded from start to finish, with all conversations in the GitHub repo here: https://github.com/zellyn/trifling/tree/main/docs/sessions

This is going to destroy my home network, since I never moved it off the little Lenovo box sitting in my laundry room beside the Eero waypoint, but I’m out of town for three days, so

Granted, the seed of the idea was someone posting about how they wired Pyodide to Ace in 400 lines of JavaScript, so I can't truly argue it's non-trivial.

As a light troll of Hacker News, only AI-written contributions are accepted.

[Edit: the true inception of this project was my kid learning Python at school and trinket.io inexplicably putting Python 3, but not 2, behind the paywall. Alas, Securly will not let him and his classmates actually access it.]

belter 6 hours ago | parent | prev | next [-]

Every two months, I run a very simple experiment to decide whether I should stop shorting NVDA... Think of it as my personal Pelican on a Bike test. :-)

Here is how it works: I take the latest state-of-the-art model, usually one of the two or three currently being hyped, and ask it to create a short document that teaches Java, Python, or Rust in 30 to 60 minutes, complete with code examples. Then I ask the same model to review its own artifact for correctness and best practices.

What happens next is remarkably consistent. The model produces a glowing review, confidently declaring the document “production ready”… while the code either does not compile, contains obvious bugs, or relies on outright bad practices.

When I point this out, the model apologizes profusely and generates a “fixed” version which still contains errors. I rinse and repeat until I give up.

This is still true today, including with models like Opus 4.5 and ChatGPT 5.2. So whenever I read comments about these models being historic breakthroughs, I can’t help but imagine they are mostly coming from teams proudly generating technical debt at 100× the usual speed.

Things go even worse when you ask the model to review a cloud architecture...

gjimmel 6 hours ago | parent | next [-]

Ok, but if you wrote some massive corpus of code with no testing it probably would not compile either.

I think if you want to make this a useful experiment you should use one of the coding assistants that can test and iterate on its code, not some chatbot which is optimized to impress nontechnical people while being as cheap as possible to run.

belter 6 hours ago | parent [-]

>> Chatbot which is optimized to impress nontechnical people

Is that how we call Opus 4.5 now? :-)

rabf 5 hours ago | parent [-]

That depends a lot on the system prompt and the tooling available to the model. Are you trying this in Claude Code or Factory.ai, or are you using a chat interface? The difference in outcome can be large.

belter 2 hours ago | parent [-]

Random anecdotes from the Internet say no:

"I paid for the $100 Claude Max plan so you don't have to - an honest review" -https://www.reddit.com/r/ClaudeAI/comments/1l5h2ds/i_paid_fo...

pigpop 5 hours ago | parent | prev | next [-]

I'm sorry, but I don't quite believe you, because I've done exactly this for learning much more complicated topics. For fun I've been learning about video game programming in the Odin programming language, using a Claude project where I have Opus 4.5 write tutorials, including working code examples designed to be integrated with each other into a larger project. We've covered maze generation, Delaunay triangulation, MSTs, state machines, rendering via Raylib and RayGUI, and tweening for animations. All of those worked quite well with only very minor corrections, which Opus was also very helpful in diagnosing and fixing. I also had it produce a full tutorial on implementing a relational database in Odin, but I haven't had time to work my way through all of it yet.

This is all with a somewhat niche language like Odin, which I wouldn't expect to have much training data, so you'll excuse my incredulity that you couldn't get usable introductory code for much more commonly used languages like Java and Python.

I'm wondering if your test includes allowing the models to run their code in order to validate it and then fix it using the error output? Would you be willing to share the prompts and maybe some examples of the errors?

I haven't had many problems working in Claude Code even with full on "vibe coding". One notable recent exception was in writing integration tests for a p2p app that uses WebRTC, XTerm.js, and Yjs where it ran into some difficulty creating a testing framework that involved a headless client and a local MQTT broker where we had to fork a few child processes to test communication between them. Opus got bogged down working on its own so I stepped in and got things set up correctly (while chatting with Opus through the web interface instead of CC). The problem seemed to be due to overfilling the context since the test suite files were too long so I could have probably avoided the manual work by just having Opus break those up first.

cheevly 6 hours ago | parent | prev [-]

You clearly live in a different reality than me entirely. Complete opposite experience.

belter 6 hours ago | parent [-]

Would you kindly detail how your experience is different? "Complete opposite experience" without details doesn't really say much.

wonderwonder 5 hours ago | parent | prev | next [-]

Microsoft is currently hiring engineers to rewrite their entire codebase in Rust via vibe coding, something to the tune of a million lines of code per developer per month.

cube2222 7 hours ago | parent | prev | next [-]

> If AI coding is so great and is going to take us to 10x or 100x productivity

That seems to be a strawman here, no? Sure, there exist people/companies claiming 10x-100x productivity improvements. I agree it's bullshit.

But the article doesn't seem to be claiming anything like this - it's showing the use of vibe-coding for a small personalized side-project, something that's completely valid, sensible, and a perfect use-case for vibe-coding.

dboreham 7 hours ago | parent | prev | next [-]

This comment is wrong in two ways:

1. Current LLMs do much better than produce "small programs that already exist in multiple forms in the training data". Of course the knowledge they use does need to exist somewhere in training data, but they operate at a higher level of abstraction than simply spitting out programs they've already seen whole cloth. Way higher.

2. Inventing a new compression algorithm is beyond the expectations of all but the most wild-eyed LLM proponents, today.

rabf 5 hours ago | parent | next [-]

"the knowledge they use does need to exist somewhere in training data", I'm not to sure about that. The current coding enviroments for AI give the models a lot of reasoning power with tooling to test, iterate and web search. They frequently look at the results of their code runs now and try different approaches to get the desired result. Its common for them to write their own tests unprompted and re-evaluate accordingly.

blauditore 7 hours ago | parent | prev [-]

2. is not really true. There are famous people claiming that AI will fix climate change, so we as humans should stop bothering.

rvz 7 hours ago | parent | prev | next [-]

> let's see it generate a new, highly efficient compression algorithm or a state-of-art travelling salesman solution.

This is the "promise" that was being sold here, and in reality we haven't yet seen anything innovative, or even a sophisticated, original, groundbreaking discovery, from an LLM; most of the claims have been faked or are unverified.

Most of the 'vibe-coding' uses here are quite frankly performative, or used as 'content' for someone's blog.
