danudey | 2 days ago
One thing my team lead is working on is using Claude to generate integration tests and add new tests to our e2e runs. Straight-up asking Claude to run the tests, or to generate a test, could produce inconsistencies between runs, between tests, between models, and so on. So instead he built a tool that defines a test: its inputs, outputs, and some details. Now we have a directory full of markdown files describing a test suite (parameters, test cases, error cases, etc.), and Claude generates invocations of the tool instead. Whatever variation Claude, or any other LLM, might show run-to-run, or however it drifts over time, everything still has to be funneled through a strictly defined filter, so we're doing the same things the same way over time.
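The "strictly defined filter" idea above can be sketched roughly like this: the LLM only emits a declarative spec, and a fixed schema validates it before anything runs. This is a minimal illustration, not the actual tool; the field names and schema are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fixed schema: anything the LLM emits must match it exactly.
REQUIRED_FIELDS = {"name", "endpoint", "input", "expected_status"}

@dataclass(frozen=True)
class TestCase:
    name: str
    endpoint: str
    input: dict
    expected_status: int

def parse_spec(spec: dict) -> TestCase:
    """Reject any spec that doesn't match the schema, so model
    drift can't silently change what a test means."""
    missing = REQUIRED_FIELDS - spec.keys()
    extra = spec.keys() - REQUIRED_FIELDS
    if missing or extra:
        raise ValueError(f"invalid spec: missing={missing}, extra={extra}")
    if not isinstance(spec["expected_status"], int):
        raise ValueError("expected_status must be an int")
    return TestCase(spec["name"], spec["endpoint"],
                    spec["input"], spec["expected_status"])

# An LLM-generated spec is accepted only if it fits the schema exactly.
ok = parse_spec({"name": "login-happy-path", "endpoint": "/login",
                 "input": {"user": "a", "password": "b"},
                 "expected_status": 200})
```

The point is that the LLM never decides *how* a test runs; a deterministic runner consumes the validated `TestCase`, so every run executes identically regardless of which model produced the spec.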
latentsea | 2 days ago
I'm looking at implementing https://github.com/coleam00/Archon as a means to solve this. You can build arbitrary workflows custom to your codebase. It looks like it brings a bit of much-needed determinism.
zx8080 | 2 days ago
What kind of system or product area are you working on?