dangelosaurus | 2 days ago
Working on promptfoo, an open-source (MIT) CLI and framework for eval-ing and red-teaming LLM apps. Think of it like pytest but for prompts: you define test cases, run evals against any model (OpenAI, Anthropic, local models, whatever), and catch regressions before they hit prod.

Currently building out support for multi-agent evals, better tracing, voice, and static code analysis for AI security use cases. So many fun sub-problems in this space - LLM testing is deceptively hard.

If you check it out and pick up an issue, I'll happily send swag. We're also hiring if you want to work on this stuff full-time.
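
To make the pytest analogy concrete, a config looks roughly like this (sketch from memory - field names, provider IDs, and assertion types may not match the current schema exactly, so check the docs/repo):

  # promptfooconfig.yaml (rough sketch, not the exact schema)
  prompts:
    - "Summarize this support ticket in one sentence: {{ticket}}"
  providers:
    # model IDs here are illustrative
    - openai:gpt-4o-mini
    - anthropic:messages:claude-3-5-sonnet-20241022
  tests:
    - vars:
        ticket: "My order arrived damaged and I want a refund."
      assert:
        # simple string check
        - type: contains
          value: refund
        # model-graded check against a rubric
        - type: llm-rubric
          value: mentions the damaged order and the refund request

Then something like "npx promptfoo eval" runs every test case against every provider, and "npx promptfoo view" opens a local report to compare outputs side by side (again, commands from memory).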