Remix.run Logo
h1fra 2 hours ago

evals are glorified integration tests, would you invest in an integration test startup? absolutely not. I don't get why we are making all of this fuzz around evals

hilariously 2 hours ago | parent | next [-]

Because what people actually want is a simple harness to test their use cases against all the frontier models and see which is the cheapest/best for the job.

It's simple to say but hard to master doing well, and the important thing is that no matter what tool you have the evals don't write themselves.

pydry 32 minutes ago | parent | prev [-]

There are a number of integration test startups. None of them do a great job but they do exist.