dataviz1000 4 days ago

I'm having a hard time getting my mind to see this.

> Users should re-tune their prompts and harnesses accordingly.

I read this in the press release and my mind thought it meant test harness. Then there was a blog post about long-running harnesses with a section about testing, which led me to a little more confusion.

Yes, the word 'harness' is consistently used in this context as a wrapper around the LLM, not as 'test harness'.

dboreham 4 days ago | parent | next [-]

This field is chock full of people using terms incorrectly: defining new words for things that already had well-known names, or overloading terms already in use. E.g. shard vs. partition; TUI, which already meant "telephony user interface"; "client" to mean "server" in blockchain.

suttontom 3 days ago | parent | prev | next [-]

Some people also call evaluations "tests". Unexpected things come along with new models: the model in a workflow you'd set up might suddenly start calling a tool and never stop, or decide to no longer call a particular tool at all. Running your existing evaluations to catch regressions like this, and updating the prompts where needed, is what's meant by "testing" your prompts and harnesses.
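For a concrete picture of what such an evaluation might check, here's a minimal sketch (all names and the transcript format are hypothetical, not from any real eval framework) that catches both regressions described above: a tool that's never called, and a tool that's called endlessly.

```python
# Hypothetical regression check over a transcript recorded by a harness.
# A "transcript" here is assumed to be a list of step dicts, where tool
# calls look like {"tool": "<name>", ...}.

def check_tool_usage(transcript, tool, min_calls=1, max_calls=10):
    """Flag both failure modes: the model never calls the tool,
    or it loops and calls the tool far too many times."""
    calls = sum(1 for step in transcript if step.get("tool") == tool)
    if calls < min_calls:
        return f"regression: '{tool}' never called"
    if calls > max_calls:
        return f"regression: '{tool}' called {calls} times"
    return "ok"

# Example: three search calls is within bounds, an empty transcript is not.
print(check_tool_usage([{"tool": "search"}] * 3, "search"))  # ok
print(check_tool_usage([], "search"))  # regression: 'search' never called
```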

kreig 4 days ago | parent | prev [-]

I understood this concept with this simple equation: Agent = LLM + harness
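That equation can be sketched in a few lines of code. This is only an illustrative toy (the model stub, tool table, and loop shape are all made up for the example): the "harness" is everything wrapped around the raw model call, i.e. the loop, the tool dispatch, and the stop condition.

```python
# Toy sketch of Agent = LLM + harness. Everything here is hypothetical.

def fake_llm(messages):
    """Stand-in for a model call: asks for a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search", "args": {"query": "harness"}}
    return {"answer": "An agent is an LLM wrapped in a harness."}

TOOLS = {"search": lambda args: f"results for {args['query']}"}

def run_agent(user_prompt, llm=fake_llm, max_steps=5):
    """The harness: loops the model, dispatches tools, bounds the steps."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = llm(messages)
        if "answer" in reply:  # model decided it is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["args"])  # tool dispatch
        messages.append({"role": "tool", "content": result})
    return "stopped: step budget exhausted"

print(run_agent("what is a harness?"))
```

The point of the sketch is that the model function itself never loops or touches tools; all of that behavior lives in `run_agent`, which is why re-tuning "prompts and harnesses" for a new model can mean changing this surrounding code, not just the prompt text.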