blamestross 3 days ago

This seems less accurate than `return 1.0`

Using unboundedly unreliable systems to evaluate reliability is just a bad premise.

lock1 3 days ago

Can't wait for (((LLM) Hallucination Risk Calculator) Risk Calculator) Risk Calculator to propagate & magnify the error even further! /j

cowboylowrez 3 days ago

Have multiple LLMs and a voting quorum, sort of like how we elect politicians. It'll work just as well, I guarantee it!

wongarsu 3 days ago

Back in the GPT-2 days I did use that technique. I also ran the same model multiple times with slightly different prompts and picked the most common response. It doesn't cure all problems, but it does lead to better results. It isn't very good for your wallet, though: every vote is another full model call.