heresalexandria 5 hours ago

My observation is that a lot of folks still discounting the capabilities or impact of AI either aren't working with frontier intelligence or aren't using it right.

While the coding horse has been beat within an inch of its life already, I'd recommend throwing Codex on 5.5 high thinking with Computer Use + auto approve at the next thing you're about to spend 5+ minutes on to start to get a feel for how well it handles a broad range of work across literally any surface you interact with today. Use voice mode & mobile app for remote control to seriously watch the friction break down.

Is it always perfect? Maybe not - but for a dramatically increasing slate of tasks it's becoming a no brainer to offload the busywork and raise the bar on what a single person can do.

It's natural to have hype when you see where this already is and where it's going.

lilbigdoot 4 hours ago | parent | next [-]

I tried to have multiple models convert a simple TextMate grammar to a Vim one, and none of them could do it. They couldn't even use the right names between the regex matches and the color definitions. I tried for about 30 minutes. It took me about 5 minutes by hand.

I tried having them work on an LSP. The fact I got a one-shot, half-working autocomplete based on my existing work was cool, but again, they flailed on incredibly simple things like file path normalization / converting from a URI, and I had to rewrite a decent amount of code. I don't think I saved any time.

People keep throwing this out there but I keep wondering where are the receipts? I am seeing less interesting software released, anecdotally I know, since AI has taken hold, than before.
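
For reference, the "converting from a URI" step mentioned above can be sketched roughly as follows. This is a minimal sketch, not the poster's code: the function name `file_uri_to_path` and the `windows` flag are hypothetical, and it assumes the client sends plain `file://` URIs with no host component.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import unquote, urlparse


def file_uri_to_path(uri: str, windows: bool = False) -> str:
    """Convert a file:// URI (hypothetical helper) to a normalized path string."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    # Percent-decode: %20 becomes a space, %3A becomes ':' (some clients
    # encode the colon after a Windows drive letter).
    path = unquote(parsed.path)
    if windows:
        # "/c:/Users/me" -> "c:/Users/me": drop the slash before the drive letter.
        if len(path) >= 3 and path[0] == "/" and path[2] == ":":
            path = path[1:]
        # Upper-case the drive letter so "c:" and "C:" compare equal.
        if len(path) >= 2 and path[1] == ":":
            path = path[0].upper() + path[1:]
        return str(PureWindowsPath(path))
    return str(PurePosixPath(path))


print(file_uri_to_path("file:///home/me/my%20project/main.rs"))
# /home/me/my project/main.rs
print(file_uri_to_path("file:///c%3A/Users/me/a.txt", windows=True))
# C:\Users\me\a.txt
```

The usual traps are the percent-encoded drive colon and drive-letter case, which is why both are handled explicitly before the path is compared against anything else.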

heresalexandria 4 hours ago | parent [-]

Did you try providing it documentation for the respective formats (via browsing/tool use or input to the prompt)? And were you using a modern thinking model from Anthropic or OpenAI?

The crucial breakdown here sounds like either lack of proper context/harness or insufficiently capable model (there's a huge gulf between GPT-5.5/Opus 4.8/Fable class models and anything not from the big three) or both.

overgard 4 hours ago | parent | prev | next [-]

You know, people could just put their money where their mouth is. I.e., go build something amazing instead of talking endlessly about how they're going to build amazing stuff. That's why this feels like so much theatre.

heresalexandria 4 hours ago | parent [-]

That's exactly what the people in my orbit and whom I'm watching are doing, and some of their outputs are fueling the excitement.

If you aren't seeing remarkable things being done with this tech, I'd argue you aren't looking hard enough. I understand there's a lot of noise obscuring the signal, but that's always the case with a "big thing."

clydethefrog 4 hours ago | parent [-]

Can you share concrete examples of the outputs that are fueling your excitement?

heresalexandria 4 hours ago | parent [-]

This was pretty cool, knocking out a problem that the best minds in maths couldn't for 80 years: https://openai.com/index/model-disproves-discrete-geometry-c...

Also this is a remarkable (and realistic) evaluation of where these systems are for general work which speaks to both the room to grow as well as the pace: https://www.remotelabor.ai/

For some practical examples of what the leading consumer grade AI can do, Ethan Mollick consistently has great writeups with demos: https://www.oneusefulthing.org/p/what-it-feels-like-to-work-...

overgard 3 hours ago | parent [-]

The first link is OpenAI, who, uh, might be a little biased.

The second link looks lab-sponsored.

The third link is a blog post with a lot of hype and little substance. (I tried Fable twice and was unimpressed.)

I'd love to see an example of something someone made that is not affiliated with these corporations.

The thing is, I imagine there ARE examples; AI isn't completely useless, but the ratio of signal to hype noise around this is out of control. My guess is 99% of what's built with AI is useful to approximately one person. When I hear AI boosters talk it sounds like there's an entire economy of billion dollar corporations I never noticed, and then I usually find out someone got excited because they prompted Claude to make a flight simulator (where the plane flies sideways).

therealdrag0 an hour ago | parent [-]

It’s very subjective, so there’ll be endless debates about this. It’s a speed boost; to some that is amazing, to others it’s bleh.

All my examples are private, which also probably applies to many other folks here. In my personal projects and friends' stealth startups that are greenfield, the boost is 5-10x for coding. For my midsized enterprise employer, we have tracking of over a hundred diverse projects and the average reduction in dev years to completion is about 50%. Not 10x, but not chump change either.
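
To put the two kinds of figures above on one scale: a fractional reduction r in time to completion corresponds to a speedup of 1 / (1 - r). A minimal sketch of that conversion follows; the function names are hypothetical, and it assumes the scope of work is the same before and after.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Speedup factor implied by a fractional cut in time to completion."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def reduction_from_speedup(speedup: float) -> float:
    """Fractional cut in time to completion implied by a speedup factor."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


print(f"{speedup_from_reduction(0.5):.1f}x")  # 2.0x
print(f"{reduction_from_speedup(5):.0%}")     # 80%
print(f"{reduction_from_speedup(10):.0%}")    # 90%
```

So a 50% reduction is a 2x speedup, while 5-10x would mean an 80-90% reduction in time to completion.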

overgard 25 minutes ago | parent [-]

You know, even before AI whenever I heard someone be like "my business idea is so revolutionary I MUST KEEP IT A SECRET" my eyes would roll out of my head. I promise you, your ideas are not nearly as amazing as you think they are and absolutely nobody wants to steal them.

gmm1990 5 hours ago | parent | prev | next [-]

I find that thinking/agent mode sometimes makes it worse, or comes up with the same thing and just takes a long time. But I’m sure it’ll be different with Fable for a few months until that hype blows over.

heresalexandria 4 hours ago | parent [-]

Something a lot of folks struggling with these systems don't get is that the instruction and management of them is often quite important - just because they're capable doesn't mean they're mind readers.

Most of the skepticism I encounter on this front is due to lack of proper direction, process involving planning and review before execution, and appropriate attention given to evaluation and feedback loops.

If you asked the smartest person in the world to YOLO a task with the sort of instruction the average denier uses to evaluate an LLM, you'd likely find they wouldn't get back what they were expecting either - and if you're evaluating on subpar models/tools, you shouldn't be surprised to get subpar results.

lilbigdoot 4 hours ago | parent | next [-]

I asked Qwen 3.7 pro to create a C# project that takes a string and reverses it, with a single file WASM target. It spun wheels for over 30 minutes and got nothing.

I use LLMs all the time to help me diagnose bugs and work through my designs, but again and again, I am super unimpressed by their coding abilities. I can see how in some cases with a proper harness they probably do a decent job at certain tasks, but almost everything I try to do, they flail.

heresalexandria 4 hours ago | parent [-]

Qwen is a lightweight locally hosted model that's many months behind the SoTA available from the big three. While the crowd here (myself included) is excited for locally hosted models to catch up to the usable baseline, they aren't there yet, regardless of what benchmarks you based that selection on.

gmm1990 4 hours ago | parent | prev | next [-]

This seems to be a very generic/common response to any AI critique. It kind of reinforces my point: there are a lot of situations where the appropriate harness isn’t some agent that’s set to ultra high thinking mode. Chat mode gives the better response and answers the question more quickly.

heresalexandria 3 hours ago | parent [-]

That's fair, I do agree that you don't need a harness or ultra-high thinking mode for many problems. Many folks evaluate without those things on a task that would benefit from them, which leads to the sort of attitudes in this article and its comments section, and that's where my comment was coming from.

If you're just saying different tools are best suited for different problems, apologies - that's my take as well.

tines 5 hours ago | parent | prev | next [-]

> no brainer

An excellent epithet for people who depend on AI!

heresalexandria 5 hours ago | parent [-]

The same attitude has been directed at various points through history at people "who depend on the internet," "who depend on computers," and "who depend on machines."

I was told growing up "you won't always have a calculator in your pocket" and yet now my phone has an offline LLM on it.

tines 4 hours ago | parent | next [-]

And if you can’t see the difference between those things, you’ll probably never know.

skydhash 4 hours ago | parent | prev | next [-]

> I was told growing up "you won't always have a calculator in your pocket" and yet now my phone has an offline LLM on it.

But does your world stop when the phone is out of power? And does every task require a roundtrip to the internet?

heresalexandria 4 hours ago | parent [-]

Offline models are becoming increasingly capable - merely a few years ago it would've been unthinkable to run the LLM I have on my phone even on my MacBook Pro.

Are you suggesting that losing electricity in the modern age (entirely absent AI) doesn't upend one's world?

You seem to be saying "we should avoid this thing because we'll become dependent on it," but we're highly dependent on all manner of technology for all sorts of things and would seem to be better for it.

ThrowawayR2 4 hours ago | parent | prev [-]

> "I was told growing up "you won't always have a calculator in your pocket" and yet now my phone has an offline LLM on it."

While I think the person you are responding to has made a low quality comment, I will say that it is very, very revealing that so many AI advocates actually seem proud of their absence of basic math skills.

heresalexandria 3 hours ago | parent | next [-]

Never said I was bad at math, but I am aware of the fact that computers can do math better and faster than me - and with our powers combined...

senordevnyc 3 hours ago | parent | prev [-]

lol, they literally said nothing about their math skills

nradov 4 hours ago | parent | prev | next [-]

I should use it to read and vote on HN comments so that I don't have to waste time doing that myself.

shepherdjerred 4 hours ago | parent [-]

You’re absolutely right! This would save so much time.

Sent from my Claude Code

preommr 5 hours ago | parent | prev | next [-]

I hate that these discussions go nowhere because there's no common metric anymore.

I have no idea what stuff like "is it always perfect?" means because it varies so much from person to person. Too many people have different expectations, are working on different problems, or have different standards or goals for there to be a common constructive discussion.

heresalexandria 4 hours ago | parent [-]

Totally agree that the lack of a common base of evaluation is terrible for the discussion, and benchmaxing only contributes to this.

The only way to get a sense for these systems is to use them on things you know well, and everyone knows different things at different levels.

People also tend to underestimate how fast this is moving and base their take on dated and subpar systems for a variety of reasons, a key one being that the firehose is too big for any one person to have a proper focus on all of it.

lowsong 4 hours ago | parent | prev | next [-]

Why do all arguments from AI boosters boil down to this same cycle:

A new model is released, AI fans hail it as a huge shift in whatever metrics the AI vendor has gamed this time, and all criticism is shrugged off as "not up to date" and met with "try the new model!" Then, once level heads actually put the claims to the test and find them wanting, criticism is met with "you're just not using it right, you have to learn how to prompt/context/loop engineer for best results" until the next model comes out and this argument repeats.

wseqyrku 3 hours ago | parent | next [-]

> AI fans hail it as huge shift in whatever metrics the AI vendor has gamed this time

Hint: Those are not AI fans. (See the top comment for context.)

heresalexandria 4 hours ago | parent | prev [-]

It shouldn't be a surprise that the baseline for "best" shifts as better tech comes out, but that doesn't make dated models any less capable than they were when they came out.

Skeptics continue to move the goalposts on what it takes for this to matter, but the facts that frontier systems are making novel maths & sciences discoveries and that I can run an LLM on my phone for simple tasks, which would've been unthinkable a few years ago, are testaments to the direction of the tech.

lowsong 4 hours ago | parent [-]

Ironically it is you who is most at risk from the direction this is taking.

When this house of cards collapses, AI research dries up, and companies pivot to the next hype cycle there will be a generation of people left with atrophied skills and lingering addiction and psychosis. The most flexible will bounce back just fine, but many will never recover from this damage.

Sincerely, I hope you're in the former category.

slopinthebag 5 hours ago | parent | prev [-]

Or they aren’t building a SaaS with React or a TUI with TypeScript, which is about the only thing that LLMs have “solved”.

lilbigdoot 4 hours ago | parent | next [-]

Seriously, I've been on a sabbatical and have no pressure to use AI. I keep trying on my compiler, and for coding they flail so hard every time. I love how good they are at helping me diagnose bugs; it's so much better than scouring Google like I used to, hoping for some result that matches what I'm seeing. Even then, they hallucinate all the time and just say odd things that don't make sense. As soon as you ask them something obscure they just.... merp

heresalexandria 4 hours ago | parent [-]

This sounds like you may be using subpar models and/or tools - have you had this experience using Codex with GPT-5.5 on at least "high" reasoning or on Claude Code using Opus 4.8 (both with ability to browse web and sufficient context for your project)?

heresalexandria 5 hours ago | parent | prev [-]

They're literally doing novel research. The smartest mathematicians in the world couldn't solve Erdős' planar unit distance problem for 80 years, and OpenAI's models knocked that out a couple months ago.

This stuff is moving fast, and if you aren't evaluating SoTA on at least a quarterly basis, you're going to have a bad time.

senordevnyc 3 hours ago | parent [-]

I admire your efforts, but you’re tilting at windmills with your comments here. There’s a loud contingent of AI skeptics on HN who haven’t updated their tired arguments in years. Their fervent hatred of AI is mere religious dogma at this point, not anything empirical.

The upside is that we’re competing against these folks. Let them continue to stick their heads in the sand.

heresalexandria 2 hours ago | parent [-]

For sure, I appreciate your comment - this is a tough crowd, but it's their loss.