vicchenai 2 hours ago

The monitoring and evaluation piece is underrated. In my experience, the hardest part isn't building the initial LLM pipeline; it's knowing when the thing has quietly broken. Domain expertise matters a lot there, because you need to design evals that actually catch the failure modes that matter for your specific data distribution.