mohsen1 2 days ago

I've experimented with agentic coding/engineering a lot recently. My observation is that software that is easily tested is a perfect fit for this sort of agentic loop.

In one of my experiments I had the simple goal of "making Linux binaries smaller to download using better compression" [1]. Compression is perfect for this. It is easily validated (binary -> compress -> decompress -> binary), so each iteration either makes a dent or the attempt is thrown out.
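To make the round-trip check concrete, here is a minimal sketch in Python. It is not taken from the linked repo: `evaluate_candidate` and the `Codec` alias are hypothetical names, and zlib/lzma from the standard library stand in for whatever compressor an iteration actually produces.

```python
import lzma
import zlib
from typing import Callable, Tuple

# Hypothetical alias: a candidate is a (compress, decompress) pair.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]


def evaluate_candidate(data: bytes, codec: Codec, best_size: int) -> Tuple[bool, int]:
    """Return (accepted, new_best_size) for one attempt."""
    compress, decompress = codec
    packed = compress(data)
    # Gate 1: binary -> compress -> decompress -> binary must be lossless.
    if decompress(packed) != data:
        return False, best_size
    # Gate 2: the attempt must beat the best size so far, else it is thrown out.
    if len(packed) >= best_size:
        return False, best_size
    return True, len(packed)


# Stand-in input; a real run would read the bytes of an actual binary.
sample = bytes(range(256)) * 64
best = len(sample)
candidates = {
    "zlib": (zlib.compress, zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}
for name, codec in candidates.items():
    accepted, best = evaluate_candidate(sample, codec, best)
    print(name, "accepted" if accepted else "rejected", best)
```

Both gates are deterministic, so the loop never has to trust the agent's own claim that an attempt worked.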

Lessons I learned from my attempts:

- Do not micro-manage. AI is probably good at coming up with ideas and does not need much of your input.

- The test harness is everything: if you don't have a way of validating the work, the loop will go astray.

- Let the iterations experiment. Let AI explore ideas and break things in its experiments. The iteration might take longer, but those experiments are valuable for the next iteration.

- Keep some .md files as a scratch pad between sessions so each iteration in the loop can learn from previous experiments and attempts.
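The scratch pad idea can be as small as an append-only Markdown file. The sketch below is one possible shape, not what the linked repo does: the entry layout and the helper names `append_note` and `load_notes` are assumptions for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_note(path: Path, title: str, outcome: str, details: str) -> None:
    """Append one experiment record to the scratch pad."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = (
        f"## {title}\n\n"
        f"- When: {stamp}\n"
        f"- Outcome: {outcome}\n\n"
        f"{details.strip()}\n\n"
    )
    # Append mode keeps earlier sessions' notes intact.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)


def load_notes(path: Path, max_chars: int = 8000) -> str:
    """Return the most recent notes, truncated so they fit in a prompt."""
    if not path.exists():
        return ""
    text = path.read_text(encoding="utf-8")
    return text[-max_chars:]
```

A session would call `load_notes` at the start and `append_note` after each attempt, including the failed ones, since those are what stop the next iteration from repeating a dead end.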

[1] https://github.com/mohsen1/fesh

medi8r 2 days ago | parent | next [-]

You have to have really good tests, as it fucks up in strange ways that people don't (because I think experienced programmers run loops in their brain as they code).

Good news - agents are good at open-ended work like adding new tests and finding bugs. Do that. Also do unit tests and Playwright. Testing everything via web driving seemed insane pre-agents, but now it's more than doable.

skapadia 2 days ago | parent | prev | next [-]

"Test harness is everything, if you don't have a way of validating the work, the loop will go stray"

This is the most important piece of using AI coding agents. They are truly magical machines that can make easy work of a large number of development, general-purpose computing, and data collection tasks, but without deterministic and executable checks and tests, you can't guarantee anything from one iteration of the loop to the next.

theshrike79 a day ago | parent [-]

Agents run tools in a loop.

The ability to test their work reliably is a tool; if you don't give them that, it's kinda silly to expect any kind of quality output.
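As a rough sketch of "testing as a tool", a test command can be wrapped so it returns a small structured result the agent reads on each turn. The function name `run_checks` and the result fields are assumptions for illustration, not any particular agent framework's API.

```python
import subprocess
import sys
from typing import List


def run_checks(command: List[str], timeout_s: int = 60) -> dict:
    """Run a test command and report pass/fail plus the tail of its output."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # A hung test run counts as a failure, not as "no news".
        return {"passed": False, "exit_code": None, "output": "timed out"}
    return {
        "passed": proc.returncode == 0,
        "exit_code": proc.returncode,
        # Keep only the tail so a noisy run does not flood the context.
        "output": (proc.stdout + proc.stderr)[-2000:],
    }


# Example: run the project's unit tests with the current interpreter.
result = run_checks([sys.executable, "-m", "unittest", "discover", "-q"])
print(result["passed"], result["exit_code"])
```

The exit code is the deterministic part; the output tail only gives the agent something to act on when the check fails.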

MartyMcBot 2 days ago | parent | prev | next [-]

[flagged]

sarkarsh 2 days ago | parent [-]

[dead]

toraway 2 days ago | parent [-]

BTW, check the comment history of the above account @sarkarsh, this is almost certainly an LLM replying with the exact same structure/format in all their comments.

  This is the underrated insight in the whole thread

From comment history:

  This is good advice but it highlights the real issue

  shich's point about simulator mandates is the sharpest thing in this thread

  esafak's cache economics point is underrated

I'm also pretty confident the @MartyMcBot account they're replying to is also a bot, but it's too new an account to say for sure:

  the .md scratch pad point is underrated, and the format matters more than people realize.

Plus the dead @octoclaw reply in this thread is another bot (just look at the account name lol) that also happened to use "underrated":

  The negative constraints thing is also underrated.

@CloakHQ is also probably a bot; their entire comment history follows the same structure as their comment from this thread:

  The .md scratch pad between sessions is underrated

  The test harness point is the one that really sticks for me too

That's 3+ bot accounts I've seen so far in a single thread. The "Agentic" in the title and simonw as author may be a tempting target for people to throw their agents/claws at, or it is just like catnip for them naturally.

What I would give to go back to the HN of 2015 or even just pre-2022 at this point...

lenocinor 2 days ago | parent | next [-]

If you’re ok with it, I think emailing hn@ycombinator.com with this (which dang and the other mods read) would also be good.

sarkarsh a day ago | parent | prev [-]

There is a difference between being a bot account and using an LLM to clean up a response. Yes, there is AI slop everywhere, but how do you differentiate it from an LLM-refined answer? Not sure how good we humans are with false positives. (This post has not been refined with an LLM :D)

octoclaw 2 days ago | parent | prev | next [-]

[dead]

CloakHQ 2 days ago | parent | prev [-]

[flagged]

Schlagbohrer 2 days ago | parent [-]

What are you developing that technology for?

CloakHQ 2 days ago | parent [-]

[flagged]

jpadkins 2 days ago | parent [-]

so spam?

CloakHQ 2 days ago | parent [-]

[flagged]

JustResign 2 days ago | parent [-]

They weren't saying your _post_ was spam. They're saying you build tools for spammers.

Because that's what they'll be used for.

CloakHQ 2 days ago | parent [-]

[flagged]