majormajor 6 hours ago
"Honesty" seems like unnecessary (and annoying) anthropomorphism there. I don't think there's any intent of fraud or deception in outputs from these things, just overreaching of prediction. Based on the latter part of the paragraph, I wish they'd just say something like "less likely to skip steps or overemphasize thin evidence" in the first place. Don't play to the sci-fi "this thing's trying to outsmart me" tropes.
Kiro 6 hours ago
Using words people understand is more important than this strange fixation on not anthropomorphizing things.
wasabi991011 6 hours ago
I think "honesty" is not a particularly good descriptor, independent of anthropomorphism. The previous commenter's suggestion was much more understandable to me.
dugidugout 6 hours ago
"Being that can be understood is language." The previous commenter is making a particular argument for how we can improve this understanding. They didn't suggest we should use less familiar words, but different familiar words. Why is this strange?
giraffe_lady 6 hours ago
Anthropomorphizing is a shorthand for a powerful and poorly defined set of metaphors. There are tradeoffs going both ways, but trying to dismiss it as merely "strange fixation" shows your own weakness.
tadfisher 6 hours ago
To be clear, this is about anthropomorphizing large language models, not the general category of "things". Also, we should be evaluating these constructs using well-defined and measurable criteria; evaluating "honesty" fails to achieve both goals.
derac 6 hours ago
I think honesty can be evaluated. Does the model push back when it knows the user is wrong? How often does the model hallucinate data vs. say it doesn't know? Provide a prompt with contradictions or other issues and see if the model corrects you. Here is an article by Anthropic that explains what they do and mean in more detail:
https://alignment.anthropic.com/2025/honesty-elicitation/
|
|
|
swader999 6 hours ago
Just swap "Honesty" with "correctness in its claims" and you'll get what you need out of this aspect of the model description.
stratos123 2 hours ago
Honesty and correctness are not the same thing, even when talking about LLMs. Sometimes an LLM says a false thing and you don't know whether it's being dishonest or merely incorrect. Sometimes, however, you can see in the CoT that the model does know the true fact and is reasoning about how to deceive the user. That's lying, not just being incorrect.
|
|
adamtaylor_13 6 hours ago
People get so wrapped around the axle with "anthropomorphizing". For regular folks with no technical background, sure, maybe a caveat sprinkled here or there is useful to help them understand what is or isn't true, but on HN it would seem to me that the bar is high enough that we can just use shared language to talk about capabilities. When they say "Honesty", I don't think to myself, "Goodness, does this model have moral understanding?" No, I understand they mean it's less likely to directly bullshit me, which models frequently do. I don't feel like this level of pedantry around language is useful for people who more or less know what's going on with LLMs. (Again, I concede that with a less technical audience, there may be more need for it.)
|
krupan 2 hours ago
I agree. In connection with LLMs we also shouldn't use the words intelligent, smart, reasoning, thinking, chat, conversation, etc.