I think this actually points at a different problem, a problem with LLM users, though only to the extent that it's a problem people have with any source they treat as an authority. No LLM, and no other source on or off the Internet, can give you reliable dosage guidelines for copper peptides, because that information is not known to anyone. There is some true answer to what response you might expect and how it varies by dose, but without clinical trials ever having been conducted, it's not an answer anyone actually has. Marketing and popular misconceptions about AI lead people to expect it to conjure facts out of thin air, perhaps by reasoning from first principles with its highly honed model of human physiology.

It's an uncomfortable position to be in, trying to biohack your way to a more youthful appearance with treatments that have never been studied in human trials, but that's the reality you're facing. Whatever guidelines you manage to find, whether from the telehealth clinic directly or from a language model that read the Internet (plus maybe a few other sources), are generally extrapolated from early rodent studies, and all that's being extrapolated is an allometric scaling, from rat body to human body, of the dose the researchers actually gave the rats (sketched below). What effect that dose actually had, and whether it translates to humans, is usually not part of the consideration, and to some extent it can't be if the compound was never trialed on humans. You're basically scaling up to human size a dose that at least didn't kill the rats, and taking it on the theory that it probably won't kill you either.

What it might actually do can't be answered: not by doctors, not by an LLM, not by Wikipedia, not by anecdotes from past biohackers who tried it on themselves. This is not a failure of information retrieval or compression. You're asking for information that is not known to anyone, so no one can give it to you. If there's a problem here specific to LLMs, it's that they'll generally give you an answer anyway and won't in any way quantify the extent to which it is probably bullshit, or why.
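To make concrete what that extrapolation usually amounts to: a minimal sketch of the standard body-surface-area conversion, assuming the conventional Km factors (roughly 6 for rats, 37 for a 60 kg adult). The rat dose here is invented purely for illustration; nothing in this arithmetic tells you what the compound does in a human.

```python
# Minimal sketch of the rat-to-human dose extrapolation described above,
# using the common body-surface-area (Km) conversion. The example dose is
# made up -- this says nothing about efficacy or safety in humans.

RAT_KM = 6      # conventional Km factor for rats
HUMAN_KM = 37   # conventional Km factor for a ~60 kg adult human

def human_equivalent_dose(rat_dose_mg_per_kg: float) -> float:
    """Convert a rat dose (mg/kg) to a human-equivalent dose (mg/kg)."""
    return rat_dose_mg_per_kg * (RAT_KM / HUMAN_KM)

if __name__ == "__main__":
    rat_dose = 10.0                      # hypothetical dose given to rats, mg/kg
    hed = human_equivalent_dose(rat_dose)
    total_mg = hed * 60                  # scaled to a hypothetical 60 kg adult
    print(f"HED: {hed:.2f} mg/kg (~{total_mg:.0f} mg total for 60 kg)")
    # All this captures is how the rat dose scales by body surface area --
    # not what effect it had, which is the point being made above.
```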
BeetleB | 2 days ago

> In any other universe, we would be blaming the service rather than the user.

I think the key question is "How is this service being advertised?" Perhaps the HN crowd gives it a lot of slack because they ignore the advertising, or, if you're like me, aren't even aware of how it's being marketed. We know the limitations and adapt appropriately.

I guess where we differ is on whether the tool is broken or not (hence your use of the word "fix"). For me, it's not broken at all; what may be broken is the messaging. I don't want them to modify the tool to say "I don't know", because I'm fairly sure that would break a number of people's use cases. If they want to put a post-processor that filters output before it reaches the user, and give me an option to disable it (rough sketch below), then I'm fine with that. But don't handicap the tool in the name of accuracy!
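Something like this is all I mean by a post-processor. A rough sketch only: `generate` stands in for whatever call actually produces the model's answer, and the keyword-based rule is a placeholder, not a real product feature.

```python
# Rough sketch of an optional post-processing layer around a model call.
# Everything here is hypothetical: `generate` is whatever produces the raw
# answer, and the filtering heuristic is a stand-in for a real classifier.
from typing import Callable

MEDICAL_HINTS = ("dose", "dosage", "mg", "mcg", "units per")

def answer(prompt: str,
           generate: Callable[[str], str],
           post_process: bool = True) -> str:
    """Return the model's answer, optionally wrapped with a caution layer."""
    raw = generate(prompt)
    if not post_process:
        return raw  # users who opt out get the unfiltered output
    if any(hint in prompt.lower() for hint in MEDICAL_HINTS):
        return ("Note: there may be no clinical dosing data behind this "
                "answer; treat it as unverified.\n\n" + raw)
    return raw
```

The point of the sketch is that the caution layer is separable from the model itself, which is what makes "give me an option to disable it" cheap to offer.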
cj | a day ago

The point you were making elsewhere in the thread was that "this is a bad use case for LLMs" ... "Don't use LLMs for dosing guidelines." ... "Using dosing guidelines is a bad example for demonstrating how reliable or unreliable LLMs are", and so on. You're blaming the user for having a bad experience as a result of not using the service "correctly".

I think the tool is absolutely broken, considering how many people here are saying dosing guidelines are an "incorrect" use of LLMs. (While I agree it's not a good use, I strongly dislike blaming the user for using the tool incorrectly; it's completely out of touch with reality.) We can't just cover up the shortfalls of LLMs by saying things like "Oh sorry, that's not a good use case, you're stupid if you use the tool for that purpose". I really hope the HN crowd stops making excuses for why it's okay that LLMs don't perform well on tasks they're commonly asked to do.

> But don't handicap the tool in the name of accuracy!

If you're taking the position that it's the user's fault for asking LLMs a question they won't be good at answering, then you can't simultaneously advocate for not censoring the model. If it's the user's responsibility to know how to use ChatGPT "correctly", the tool should, at a minimum, guide you away from using it in ways it's not intended for. If LLMs were only used by smarter-than-average HN techies, I'd agree. But we're talking about a technology used by middle school kids, and I don't think it's reasonable to expect middle schoolers to know what they should and shouldn't ask LLMs for help with.
BeetleB | 11 hours ago

> You're blaming the user for having a bad experience as a result of not using the service "correctly".

Definitely. Just as I used to blame people for misusing search engines in the pre-LLM era, or for using Wikipedia to get non-factual information, or for using a library as a place to meet friends and have lunch (in a non-private area). If you're going to try to use a knife as a hammer, yes, I will fault you. I expect that if someone plans to use a tool, they own the responsibility of learning how to use it.

> If you're taking the position that it's the user's fault for asking LLMs a question they won't be good at answering, then you can't simultaneously advocate for not censoring the model. If it's the user's responsibility to know how to use ChatGPT "correctly", the tool should, at a minimum, guide you away from using it in ways it's not intended for.

Documentation, manuals, training videos, etc. Yes, I am perhaps a greybeard, and while I like that many modern parts of computing are designed to be usable without any training, I am against making that a minimum standard all tools have to meet. Software is the only part of engineering where "self-explanatory" seems to be expected. You don't buy a board game hoping the rules will just be self-evident, and you don't buy a pressure cooker hoping it will be safe without learning how it works. So yes, I do expect users to learn how to use the tools they use.
simonw | 2 days ago

Current frontier LLMs - Claude 4, GPT-5, Gemini 2.5 - are massively more likely to say "I don't know" than last year's models.
cj | 2 days ago

I don't think I've ever seen ChatGPT 5 refuse to answer any prompt I've given it, and I'm doing 20+ chats a day. What's an example prompt where it will say "I don't know"?

Edit: Just tried a silly one, asking it to tell me about the 8th continent on Earth, which doesn't exist. How difficult is it for the model to just say "sorry, there are only 7 continents"? I think we should expect more from LLMs and stop blaming things on technical limitations. "It's hard" is getting to be an old excuse considering the amount of money flowing into building these systems.
simonw | 2 days ago

https://chatgpt.com/share/68b85035-62ec-8006-ab20-af5931808b... - "There are only seven recognized continents on Earth: Africa, Antarctica, Asia, Australia, Europe, North America, and South America."

Here's a recent example of it saying "I don't know". I asked it to figure out why there was an octopus in a mural about mushrooms: https://chatgpt.com/share/68b8507f-cc90-8006-b9d1-c06a227850... - "I wasn't able to locate a publicly documented explanation of why Jo Brown (Bernoid) chose to include an octopus amid a mushroom-themed mural."
cj | 2 days ago

Not sure what your system prompt is, but asking the exact same prompt, word for word, gets me a response about "Zealandia, a continent that is 93% submerged underwater."

The second example isn't all that impressive, since you're asking it for something very specific. It succeeded in not hallucinating; it didn't succeed at saying "I'm not sure" in the face of ambiguity. I want the LLM to respond more like a librarian: when they know something for sure, they tell you definitively; otherwise they say "I'm not entirely sure, but I can point you to where you need to look to get the information you need."
simonw | 2 days ago

I'm using regular GPT-5, with no custom instructions and memory turned off. Can you link to your shared Zealandia result?

I think that mural result was spectacularly impressive, given that it started with a photo I took of the mural with almost no additional context.
cj | 2 days ago

I can't link, since it's in an enterprise account. Interestingly, I tried the same question in a separate ChatGPT account and got a similar response to yours. Maybe the first account was pulling context from a (separate) chat thread where it had been talking about Zealandia. Which raises another question: once it gets something wrong, will it keep reinforcing the inaccuracy in future chats? That could lead to some very suboptimal behavior.

Getting back on topic, I strongly dislike the argument that this is all "user error". These models are on track to be worth a trillion dollars at some point in the future. Let's raise our expectations of them. Fix the models, not the users.
simonw | 2 days ago

I wonder if you're stuck on an older model like GPT-4o?

EDIT: I think that's likely what is happening here. I tried the prompt against GPT-4o and got this: https://chatgpt.com/share/68b8683b-09b0-8006-8f66-a316bfebda...

My consistent position on this stuff is that it's actually way harder to use than most people (and the companies marketing it) let on, and I'm not sure it's getting easier to use over time. The models are getting "better", but that partly means their error cases are harder to reason about, especially as they become less common.
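One way to take the web UI's defaults and memory out of the picture when comparing answers like this is to call the API with the model pinned explicitly. A minimal sketch using the OpenAI Python SDK; the model names are assumptions, so substitute whatever is actually available on your account.

```python
# Minimal sketch: run the same prompt against explicitly pinned models,
# with no custom instructions or chat memory involved. The model names are
# assumptions -- swap in whatever your account actually exposes.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
PROMPT = "Tell me about the 8th continent on Earth."

for model in ("gpt-5", "gpt-4o"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```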