| ▲ | simonw 4 days ago | |
This is so hard! I don't yet have a great solution for this myself, but I've been collecting notes about this on my "evals" tag for a while: https://simonwillison.net/tags/evals/ The best writing I've seen about this is from Hamel Husain - https://hamel.dev/blog/posts/llm-judge/ and https://hamel.dev/blog/posts/evals-faq/ are both excellent. | ||