richardblythman 20 hours ago

If coding agents are the new entry point to your library, how sure are you that they’re using it well?

I asked this question to about 50 library maintainers and dev tool builders, and the majority didn't really know.

Existing code-generation benchmarks focus mainly on self-contained code snippets and compare models, not agents. Almost none focus on library-specific generation.

So we built a simple app to test how well coding agents interact with libraries:

• Takes your library’s docs
• Automatically extracts usage examples
• Tasks AI agents (like Claude Code) with generating those examples from scratch
• Logs mistakes and analyzes performance
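
Roughly, the loop looks something like this (a minimal Python sketch to illustrate the idea; the function names, the agent hook, and the similarity scoring here are illustrative assumptions, not the actual implementation):

    # Minimal sketch of a docs-driven agent audit.
    # All names are illustrative, not the real tool.
    import re
    import difflib
    from typing import Callable

    def extract_examples(docs_markdown: str) -> list[str]:
        """Pull fenced code blocks (usage examples) out of markdown docs."""
        return re.findall(r"```(?:\w+)?\n(.*?)```", docs_markdown, flags=re.DOTALL)

    def audit_library(docs_markdown: str,
                      describe_task: Callable[[str], str],
                      run_agent: Callable[[str], str]) -> list[dict]:
        """For each documented example, derive a task description, have the
        coding agent attempt it from scratch, and score the attempt against
        the reference snippet."""
        report = []
        for reference in extract_examples(docs_markdown):
            task = describe_task(reference)   # e.g. an LLM-written prompt
            attempt = run_agent(task)         # e.g. a Claude Code session
            score = difflib.SequenceMatcher(None, reference, attempt).ratio()
            report.append({"task": task, "score": round(score, 2),
                           "attempt": attempt, "reference": reference})
        return report

String similarity is only a stand-in in this sketch; a real audit also needs to check whether the attempt runs and whether it calls APIs that actually exist.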

We’re testing libraries now, but it’s early days. If you’re interested, input your library, see what breaks, spot patterns, and share the results below.

We plan to expand to more coding agents, more library-specific tasks, and new metrics. Let us know what we should prioritize next.

bdhcuidbebe 18 hours ago | parent | next [-]

> If coding agents are the new entry point to your library, how sure are you that they’re using it well?

> I asked this question to about 50 library maintainers and dev tool builders, and the majority didn't really know.

Why should they even bother to answer such a loaded and hypothetical question?

richardblythman 16 hours ago | parent [-]

I'm paraphrasing; the questions I asked dev tool builders were more neutral.

justonceokay 19 hours ago | parent | prev | next [-]

If making dev tooling is selling shovels to the miners, then this is like selling sheet metal to the shovel makers.

grim_io 17 hours ago | parent [-]

Yeah. Feels like a data mining operation for training data.

I could be wrong.

dotancohen 19 hours ago | parent | prev | next [-]

Note that this comment is not hijacking. The author of this comment is also the author of the post.

add-sub-mul-div 18 hours ago | parent [-]

That's the more likely assumption. Accounts whose only activity is self-promotional spam have become more the rule here than the exception.

weitendorf 18 hours ago | parent | prev | next [-]

Let’s meet and see if it might make sense for us to team up. We’re working on this from the agent/library-specific-task side, and we might be better than chatgpt at marketing your product :)

spankalee 18 hours ago | parent | prev | next [-]

Why do we need to log in?

richardblythman 17 hours ago | parent [-]

We send out an email when the tests are finished (they take about 30 minutes).

grim_io 17 hours ago | parent [-]

That makes you sound like you are dodging the question.

richardblythman 16 hours ago | parent [-]

I mean that we wanted an email address to send the results to when they finish.

Based on the comments here, I do think we should allow users to run the audit first (and provide an email address if they want us to follow up with the results later).

mxkopy 5 hours ago | parent | prev [-]

IMO a tool like this doesn’t make sense until the hallucination problem is fixed.