Remix.run Logo
kenjackson 2 days ago

"The main challenge is LLMs aren't able to gauge confidence in its answers"

This seems like a very tractable problem. And I think in many cases they can do that. For example, I tried your example with Losartan and it gave the right dosage. Then I said, "I think you're wrong", and it insisted it was right. Then I said, "No, it should be 50g." And it replied, "I need to stop you there". Then went on to correct me again.

I've also seen cases where it has confidence where it shouldn't, but there does seem to be some notion of confidence that does exist.

jazzyjackson 2 days ago | parent [-]

> but there does seem to be

I need to stop you right there! These machinations are very good at seeming to be! The behavior is random, sometimes it will be in a high dimensional subspace of refusing to change its mind, others it is a complete sycophant with no integrity. To test your hypothesis that it is more confident about some medicines than others (maybe there is more consistent material in the training data...) one might run the same prompt 20 times each with various drugs, and measure how strongly the llm insists it is correct when confronted.

Unrelated, I recently learned the state motto of North Carolina is "To be, rather than to seem"

https://en.wikipedia.org/wiki/Esse_quam_videri

kenjackson 2 days ago | parent [-]

I tried for a handful of drugs and unfortunately(?) it gave accurate dosages to start with and it wouldn't budge. Going too low and it told me that the impact wouldn't be sufficient. Going too high and it told me how dangerous it was and that I had maybe misunderstood the units of measure.