
Posted by jadelcastillo 5 days ago

Hallucination Risk Calculator (github.com)
118 points | 41 comments
blamestross 5 days ago|
This seems less accurate than `return 1.0`

Using the unboundedly unreliable systems to evaluate reliability is just a bad premise.

lock1 5 days ago|
Can't wait for (((LLM) Hallucination Risk Calculator) Risk Calculator) Risk Calculator to propagate & magnify the error even further! /j
cowboylowrez 5 days ago||
Have multiple LLMs and a voting quorum, sort of how we elect politicians. It'll work just as well, I guarantee it!
wongarsu 5 days ago||
Back in the GPT-2 days I did use that technique. Also just running the model multiple times with slightly different prompts and choosing the most common response. It doesn't cure all problems, but it does lead to better results. It isn't very good for your wallet though.
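A minimal sketch of that majority-vote approach, assuming a hypothetical `query_model` helper standing in for whatever LLM client you actually use (not anything from the linked repo):

```python
# Ask the model several times with slightly varied prompts and keep the
# most common answer. `query_model` is a placeholder for a real LLM call.
from collections import Counter

def query_model(prompt: str) -> str:
    """Stand-in for your provider's SDK call."""
    raise NotImplementedError

def majority_vote(question: str, n_samples: int = 5) -> str:
    # Slight prompt variations reduce the chance that every sample
    # falls into the same failure mode.
    variations = [
        f"{question}",
        f"Answer concisely: {question}",
        f"Think step by step, then answer: {question}",
        f"{question} Reply with only the answer.",
        f"As an expert, answer: {question}",
    ]
    answers = [query_model(variations[i % len(variations)]).strip()
               for i in range(n_samples)]
    # The most common normalized answer wins.
    counts = Counter(a.lower() for a in answers)
    best, _ = counts.most_common(1)[0]
    return best
```

The wallet problem is baked in: cost scales linearly with `n_samples`, so you trade money for a modest reliability bump.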
CuriouslyC 5 days ago||
Neat. I should extend this idea to emit signals when a model veers into "This is too hard, so I'll do a toy version that I pass off as real code, including complete bullshit test cases, so you'll really have to dig to find out why something isn't working in production," or "You told me to do 12 things, and hey, I just did one of them, aren't you proud of me?"

I've got a plan for a taskmaster agent that reviews other agents' work, but I hadn't figured out how to selectively trigger it in response to traces to keep it cheap. This might work if extended; a rough sketch follows.
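A hedged sketch of that idea: run a cheap heuristic over the agent's trace and only invoke the expensive taskmaster agent when something looks off. The names (`Trace`, `run_reviewer`, the phrase list) are illustrative assumptions, not part of the linked project.

```python
# Cheap trace screening that gates an expensive reviewer agent.
from dataclasses import dataclass

@dataclass
class Trace:
    task_items: list[str]        # what the user asked for
    completed_items: list[str]   # what the agent claims it finished
    diff_text: str = ""          # code the agent produced
    notes: str = ""              # the agent's own commentary

SUSPICIOUS_PHRASES = ("simplified version", "placeholder", "for brevity",
                      "in a real implementation", "mock", "toy example")

def needs_review(trace: Trace) -> bool:
    # Signal 1: requested tasks that were silently dropped.
    dropped = set(trace.task_items) - set(trace.completed_items)
    # Signal 2: language that often accompanies stubbed-out work.
    hedging = any(p in (trace.diff_text + trace.notes).lower()
                  for p in SUSPICIOUS_PHRASES)
    return bool(dropped) or hedging

def run_reviewer(trace: Trace) -> str:
    """Placeholder for the expensive taskmaster-agent call."""
    raise NotImplementedError

def maybe_review(trace: Trace) -> str | None:
    # Only pay for the reviewer when the cheap heuristics flag the trace.
    return run_reviewer(trace) if needs_review(trace) else None
```

The point of the gate is that most traces take the free path; the reviewer only runs on the small fraction that trip a signal.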

sackfield 5 days ago|
Really interesting approach to calibration for hallucinations. I'm going to give this a go on some of my projects.