gordonhart 13 hours ago

Here's an example ticket that I'll probably work on next week:

    Live stream validation results as they come in

The body doesn't give much other than the high-level motivation from the person who filed the ticket. In order to implement this, you need to have a lot of context, some of which can be discovered by grepping through the code base and some of which can't:

- What is the validation system and how does it work today?

- What sort of UX do we want? What are the specific deficiencies in the current UX that we're trying to fix?

- What prior art exists on the backend and frontend, and how much of that can/should be reused?

- Are there any scaling or load considerations that need to be accounted for?

I'll probably implement this as 2-3 PRs in a chain touching different parts of the codebase. GPT via Codex will write 80% of the code, and I'll cover the last 20% of polish. Throughout the process I'll prompt it in the right direction when it runs up against questions it can't answer, and check its assumptions about the right way to push this out. I'll make sure that the tests cover what we need them to and that the resultant UX feels good. I'll own the responsibility for covering load considerations and be on the line if anything falls over.

Does it look like software engineering from 3 years ago? Absolutely not. But it's software engineering all the same even if I'm not writing most of the code anymore.

Rodeoclash 12 hours ago

This right here is my view on the future as well. Will the AI write the entire feature in one go? No. Will the AI be involved in writing a large proportion of the code that will be carefully studied and adjusted by a human before being used? Absolutely yes.

This cyborg process is exactly how we're using AI in our organisation as well. The human in the loop understands the full context of what the feature is and what we're trying to achieve.

codegangsta 12 hours ago

But planning like this is absolutely something AI can do. In fact, this is exactly the kind of thing we start with on our team when it comes to using AI agents. We have a ticket with just a simple title that somebody threw in there, and we ask the AI to spin up a bunch of research agents to understand the problem, plan, and ask itself those questions.

Funnily enough, all the questions you posed come up right away: the agent asks itself those questions, then goes and tries to understand and validate an answer, sometimes with input from the user. But I think this planning mechanism is really critical to being able to have an AI generate an understanding, then have it validated by a human before implementation begins.

And by planning I don't necessarily mean plan mode in your agent harness of choice. We use a custom /plan skill in Claude Code that orchestrates all of this using multiple agents, validation loops, and specific prompts to weed out ambiguities by asking clarifying questions using the ask user question tool.

This takes really fuzzy requirements and makes them clear, and we automate all of this through Linear, but you could use your ticket tracker of choice.

adriand 9 hours ago

Absolutely. Eventually the AI will just talk to the CEO / the board to get general direction, and everything will just fall out of that. The level of abstraction the agents can handle is on a steady upward trajectory.

sarchertech 6 hours ago

If AIs can do that, they won’t be talking to a CEO or the board of a software company. There won’t be a CEO or a board because software companies won’t exist. They’ll talk to the customers and build one off solutions for each of them.

There will be 3 “software” companies left. And shortly after that, society will collapse, because if AI can do that, it can do any white collar job.

fragmede 12 hours ago

I mean, what is the validation system? Either it exists in code, and thus can be discovered if you point the AI at the repo, or... what, it doesn't exist?

For the UX, have it explore your existing repos and copy prior art from there and industry standards to come up with something workable.

Web-scale issues can be inferred from the rest of the codebase. If your Terraform repo has one RDS server vs. a fleet of them across multiple regions, then the AI, just as well as a human, can figure out whether it needs Google Spanner-level engineering or not. (Probably not.)

Bigger picture though, what's the process when a human logs an underspecified ticket and someone else picks it up with no clue what to do with it? They're going to ask the person who logged the bug for their thoughts and for some details beyond "hurr durr something something validation". If we're at the point where AI is able to make a public blog post shaming the open source developer for not accepting a patch, throwing questions back to you in JIRA about the details of the streaming validation system is well within its capabilities, given the right set of tools.

gordonhart 12 hours ago

Honestly curious: have you seen agents succeed at this sort of long-trajectory, wide-breadth task, or is it theoretical? Because I haven't seen them come close (and not for lack of trying).

codegangsta 12 hours ago

Yeah, I absolutely see it every day. I think it’s useful to separate the research/planning phase from the building/validation/review phase.

Ticket trackers are perfect for this. Just start with asking AI to take this unclear, ambiguous ticket and come up with a real plan for how to accomplish it. Review the plan, update your ticket system with the plan, have coworkers review it if you want.

Then when ready, kick off a session for that first phase, first PR, or the whole thing if you want.

kolinko 10 hours ago

In my experience, Claude Code with Opus 4.5 is the first one to tackle such issues well.

fragmede 10 hours ago

Opus 4.6, with all of the random tweaks I've picked up off of here and Twitter, is in the middle of rewriting my Go CLI program for programmers into a SwiftUI Mac app that people can use, and it's totally managing to do it. Claude swarm mode with beads is OP.