duckerduck 4 days ago

I'm working on a development tool for specification-driven development. It uses LLMs to verify that your specification files and implementation don't drift apart. More specifically, I'm trying to lower the number of false positives I'm currently seeing: the LLM will hallucinate issues when there are no discrepancies, or lose track of details in long documents (like RFC texts). The first step toward improving this is expanding my evaluation suite so I can reliably measure improvements.

https://github.com/rejot-dev/semcheck/
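To make "measure improvements" concrete: one common shape for such an eval suite is a set of labeled spec/implementation pairs, where pairs known to be drift-free should produce zero findings. A minimal sketch in Python (the `check` function is a hypothetical stand-in for the LLM-backed comparison; none of these names are semcheck's actual API):

```python
from dataclasses import dataclass


@dataclass
class Case:
    spec: str
    impl: str
    has_drift: bool  # ground-truth label for this pair


def check(spec: str, impl: str) -> bool:
    """Stand-in for the LLM call; returns True if drift is reported.

    Uses a trivial substring heuristic here so the harness runs
    without an LLM -- the real checker would prompt a model.
    """
    return spec.strip().lower() not in impl.lower()


def false_positive_rate(cases: list[Case]) -> float:
    """Fraction of drift-free cases the checker wrongly flags."""
    clean = [c for c in cases if not c.has_drift]
    flagged = sum(1 for c in clean if check(c.spec, c.impl))
    return flagged / len(clean) if clean else 0.0


cases = [
    # Drift-free: the implementation matches its spec.
    Case("returns the sum",
         "def add(a, b):  # returns the sum\n    return a + b",
         has_drift=False),
    # Genuine drift: spec and implementation disagree.
    Case("returns the product",
         "def add(a, b):\n    return a + b",
         has_drift=True),
]
print(false_positive_rate(cases))  # 0.0 for this tiny labeled set
```

Re-running the same labeled set after each prompt or model change gives a stable number to compare against, rather than eyeballing individual reports.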

Yoric 4 days ago | parent [-]

How does this compare to previous generations of specification-driven development using formal methods?

duckerduck 4 days ago | parent [-]

I think semcheck has a different use case. AI is inherently imprecise; if you need formal mathematical verification, AI isn't suitable for that. What semcheck does give you is a very simple way to start verifying specifications, and it's best used in combination with AI-assisted development workflows where you're already spending a lot of time writing specifications (i.e. prompts).