Razengan 5 days ago

I did ask the AI first, about some things that I already knew how to do.

It gave me horribly inefficient or long-winded ways of doing it. In the time it took for "prompt tuning" I could have just written the damn code myself. That lowered my confidence in anything else it suggested about things I didn't already know.

Claude still sometimes insists that iOS 26 isn't out yet. Sigh. I suppose I just have to treat it as an occasional alternative to Google/StackOverflow/Reddit for now. No way would I trust it to write an entire class, let alone an app, and be able to sleep at night (not that I sleep at night, but that's beside the point).

I think I prefer Xcode's built-in local model approach, where it just offers sane autocompletions based on your existing code. For example, if you already wrote a Dog class it can make a Cat class and change `bark()` to `meow()`.

theshrike79 5 days ago | parent | next [-]

You can write the "prompt tuning" down in AGENTS.md and then you only need to do it once. This is why you need to keep working with different models: you get a feel for what each is good at and how to steer it closer to your style and preferences without having to start from scratch every time.

I personally have a git submodule built specifically for shared instructions like that. It contains the assumptions and defaults for my specific style of project for 3 different programming languages. When I update it in one project, all my projects benefit.

This way I don't need to tell whatever LLM I'm working with to use modernc.org/sqlite for database connections, for example.

Razengan 4 days ago | parent [-]

> You can write the "prompt tuning" down in AGENTS.md and then you only need to do it once.

Yeah, I just mean: I know how to "fix" the AI for things that I already know about.

But how would I know if it's wrong or right about the stuff I DON'T know?? I'd have to go Google shit anyway to verify it.

This is me asking ChatGPT 5 about ChatGPT 5: https://i.imgur.com/aT8C3qs.png

Asking about Nintendo Switch 2: https://i.imgur.com/OqmB9jG.png

Imagine if AI was somebody's first stop for asking about those things. They'd be led to believe they weren't out when they in fact were!

theshrike79 3 days ago | parent [-]

There's your problem right there.

Don't use it as a knowledge machine, use it as a tool.

Agentic LLMs are the ones that work. The ones that "use tools in a loop to achieve a goal"[0]. I just asked Claude to "add a release action that releases the project as a binary for every supported Go platform" to one of my GitHub projects. I can see it worked because the binaries appeared as a release. It didn't "hallucinate" anything nor was it a "stochastic parrot". It applied a well-known pattern to a situation perfectly. (OK, it didn't use a build matrix, but that's just me nitpicking.)

In your cases the LLM should've seen that you were asking about current events or news and used a tool that fetches information about them. Instead it just defaulted to whatever was baked into its training data and failed spectacularly.
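A minimal sketch of what "tools in a loop" can look like, assuming a model that returns either a tool request or a final answer. Everything here is hypothetical: `fake_model`, `fake_web_search`, the `TOOLS` registry and the dict-shaped decisions are stand-ins for illustration, not any vendor's actual API.

```python
from typing import Callable, Dict, List


def fake_web_search(query: str) -> str:
    """Hypothetical stand-in for a real search tool; returns canned text."""
    return f"search results for: {query}"


# Hypothetical tool registry: tool name -> function taking and returning a string.
TOOLS: Dict[str, Callable[[str], str]] = {"web_search": fake_web_search}


def fake_model(goal: str, observations: List[str]) -> dict:
    """Hypothetical stand-in for an LLM call.

    With no observations yet it requests a search; once it has one,
    it answers from the latest observation instead of from memory.
    """
    if not observations:
        return {"action": "tool", "tool": "web_search", "input": goal}
    return {"action": "final", "answer": f"Based on {observations[-1]!r}"}


def run_agent(goal, model=fake_model, tools=TOOLS, max_steps=5) -> str:
    """Call the model repeatedly, running requested tools, until it answers."""
    observations: List[str] = []
    for _ in range(max_steps):
        decision = model(goal, observations)
        if decision["action"] == "final":
            return decision["answer"]
        tool = tools[decision["tool"]]
        observations.append(tool(decision["input"]))
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` cap is a design choice in this sketch: it keeps a model that never settles on an answer from looping forever.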

AIs have a branding issue, because AI != AI which isn't AI. There are so many flavours that it's hard to figure out what people are talking about when they say "AI slop is crap" when I can see every day how "AI" makes my life easier by automating away the mundane crap.

[0] https://simonwillison.net/2025/Sep/18/agents/

simonw 5 days ago | parent | prev [-]

> Claude still sometimes insists that iOS 26 isn't out yet.

How would you imagine an AI system working that didn't make mistakes like that?

iOS 26 came out on September 15th.

LLMs aren't omniscient or constantly updated with new knowledge. Which means we have to figure out how to make use of them despite them not having up-to-the-second knowledge of the world.

Razengan 5 days ago | parent [-]

> How would you imagine an AI system working that didn't make mistakes like that?

I mean, if the user says "Use the latest APIs as of version N" and the AI thinks version N isn't out yet, then it should CHECK on the web first, it's right there, before second guessing the user. I didn't ask it whether 26 was out or not. I told it.
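The "check before second-guessing the user" behaviour described above could be sketched roughly like this. It is only an illustration: `resolve_version_claim`, its parameters and the returned labels are hypothetical names, and versions are assumed to be plain comparable numbers.

```python
def resolve_version_claim(user_version, known_latest, lookup):
    """Decide how to treat a version the user asked for.

    user_version: version the user told the assistant to target
    known_latest: newest version in the model's (possibly stale) training data
    lookup: callable returning the live latest version, or None if it cannot check
    """
    if user_version <= known_latest:
        return "proceed"  # nothing surprising, no lookup needed
    live = lookup()  # the model's knowledge may be stale, so check first
    if live is None:
        return "proceed_with_caveat"  # cannot verify; defer to the user
    if user_version <= live:
        return "proceed"  # the user was right, the training data was old
    return "ask_user"  # even the live source disagrees; ask instead of insisting
```

For example, `resolve_version_claim(26, 18, lambda: 26)` returns `"proceed"`: the stale training data says 18, but the live lookup confirms 26.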

Oh, but I guess AIs aren't allowed to have free use of Google's web search or scrape other websites, eh?

> iOS 26 came out on September 15th.

It was in beta all year and the APIs were publicly available on Apple's docs website. If I told it to use version 26 APIs then it should just use those instead of gaslighting me.

> LLMs aren't omniscient or constantly updated with new knowledge.

So we shouldn't use them if we want to make apps with the latest tech? Despite what the AI companies want us to believe.

You know, on a more general note, I think all AIs should have a toggle between "Do as I say" (Monkey Paw) and "Do what I mean"

simonw 5 days ago | parent [-]

Was this Claude Code or Claude.ai or some other tool that used Claude under the hood?

Different harnesses have different search capabilities.

If I'm doing something that benefits from search I tend to switch to ChatGPT because I know it has a really good search feature available to it. I don't trust Claude's as much.

Razengan 4 days ago | parent [-]

I used the Claude website and Mac desktop app for a relatively standard iOS SwiftUI project.

I used Claude Code with VS Code for some Godot stuff, and even there it sometimes gave outdated and outright made-up APIs (functions that seemed like they should exist but did not etc.)

simonw 4 days ago | parent [-]

Unfortunately, LLMs mostly suck at Swift and SwiftUI from what I've heard. Both still change pretty often, so there aren't enough fresh examples in the training data.

As primarily a Python/JavaScript programmer I don't have that problem!

Razengan 4 days ago | parent [-]

They're terrible at anything new, including knowing about THEMSELVES and their latest versions.

This is me asking ChatGPT 5 about ChatGPT 5: https://i.imgur.com/aT8C3qs.png

Asking about Nintendo Switch 2: https://i.imgur.com/OqmB9jG.png

This could be solved, and LLMs could be a lot more useful, if they could be a wrapper around live web search: just search for this shit, scrape the top few results, and summarize the info for me.
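The search, scrape, summarize flow can be sketched as a small pipeline. This is an assumption-heavy outline: `answer_with_search` is a hypothetical name, and `search`, `fetch` and `summarize` are injected callables standing in for a real search API, an HTTP fetcher and an LLM call.

```python
def answer_with_search(question, search, fetch, summarize, top_k=3):
    """Answer a question from live pages instead of from training data.

    search(question) -> list of URLs, best first
    fetch(url) -> page text (may raise if the site refuses or fails)
    summarize(question, pages) -> answer built from (url, text) pairs
    """
    urls = search(question)[:top_k]  # keep only the top few results
    pages = []
    for url in urls:
        try:
            pages.append((url, fetch(url)))
        except Exception:
            continue  # skip pages that block scraping or fail to load
    if not pages:
        return "No sources could be retrieved."
    return summarize(question, pages)
```

Passing the three steps in as functions keeps the sketch independent of any particular search provider, which matters given the access restrictions complained about in this thread.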

But that's a stillborn dream, crippled because Google won't let 3rd-party AIs use their search willy-nilly and websites don't want to be scraped :(

Don't get me wrong: I see the potential in AIs/LLMs and I think they could be amazing for everything, but like every awesome thing, they're hampered by corporate (and government) idiocy.

simonw 4 days ago | parent [-]

Claude Code has a neat fix for that - it knows to look at its own documentation if you ask it questions about itself: https://simonwillison.net/2025/Oct/24/claude-code-docs-map/

I've had great results from ChatGPT running the "GPT-5 Thinking" model since that almost always opts to run a search before it attempts to answer a question.

Here's what I got from that for your Switch 2 question: https://chatgpt.com/share/69089028-db8c-8006-b238-1d6946e791...

Screenshot of the searches it ran here: https://gist.github.com/simonw/048ffb895dd6b94419f0b4e066143...

Razengan 4 days ago | parent [-]

A month ago, when I asked Claude (on the website) about its privacy options and stuff, it always pointed me to the Anthropic website to look it up myself.

Another annoying example: I thought Google's Gemini would be search-first since, well, they're Google.

I asked Gemini to search for Airbnb rooms in an area and give me a summarized list.

It told me it can't and I could do it myself.

I told it again.

Again it told me it can't, but here's how I could do it myself.

I told it it sucks and that ChatGPT etc. can do it for me.

Then it went and, I don't know, scraped Airbnb or reused a previous search it must have had, and pulled up rooms with an Airbnb link for each.

This could actually be THE absolute killer app for a lot of people, if AI could plan your trip from a single sentence: "I'm free next week. I'd like to go to A, B, or C for a couple days. What's a cheap flight and a room within this budget near X area?" and if it could also go and make a booking through your accounts it would be orgasmic. Finally we would have what people in the 1960s thought computers would be doing in 2000 :')

But as it is, in their current state you have to wade through quite a bit of dumbassery.