alexhans 4 hours ago

I haven't had the analysis-paralysis problem because I've always been quite decent at restructuring environments to avoid bureaucracy (which can be one of the most dangerous things for a project), but one thing I've observed is that if operations are not ZeroOps, whoever is stuck maintaining the systems will suffer by not being able to deliver the "value-adding cool features that drive careers".

Shipping prototypes doesn't actually create value unless they reach some form of production environment and effect change, so either they work and are ZeroOps, or they break and someone has to operate them and be accountable for them.

This means that at some point your thesis of

"The dark side of this same coin is when teams try to rely on the AI to write the real code, too, and then blame the AI when something goes wrong"

won't really play out that way: whoever is accountable will get both the blame and the operations.

The same principles we've always had for building software apply more than ever to AI-related things:

Easy to change, reusable, composable, testable.

Prototypes need to be thrown away. Otherwise they're tracer bullets, and you don't want tech debt in your tracer bullets unless your approach is to throw it over to someone else and make it their problem.

-----

Creating a startup, or any code from scratch, in a way where you never have to maintain it and never face the consequences of unsustainable approaches (tech debt, bad design, excessive cost) is easy. You hide the hardest part. It's easy to do things that look good on the surface if you can't see how they will break.

The blog post is interesting but, unless I've missed something, it glosses over the accountability aspect. If you can delegate accountability, you don't worry about evals-first design, and you can push harder on dates because you're not working backwards from the actual building and design and their blockers.

Evals (think promptfoo) for evals-first design will be key for any builder who is accountable for the decisions of their agents (automation).
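A rough sketch of what I mean by evals-first, in plain Python rather than promptfoo's actual config format (every name and case here is hypothetical):

    # Hypothetical, minimal evals-first harness (not promptfoo's API):
    # every agent change must pass a fixed suite before it goes near production.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class EvalCase:
        prompt: str
        check: Callable[[str], bool]  # assertion over the agent's output

    def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
        """Return the agent's pass rate over the eval suite."""
        passed = sum(1 for case in cases if case.check(agent(case.prompt)))
        return passed / len(cases)

    # The suite is owned by whoever is accountable for the agent's decisions.
    CASES = [
        EvalCase("Refund order placed 2 days ago", lambda out: "approve" in out.lower()),
        EvalCase("Refund order placed 2 years ago", lambda out: "escalate" in out.lower()),
    ]

The point is that the assertions exist before the agent does, and they're cheap enough to run on every change.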

I still need to turn it into a small blog post, but the points of the talk (https://alexhans.github.io/talks/airflow-summit/toward-a-sha...):

- We can’t compare what we can’t measure

- Can I trust this to run on its own?

are crucial for a live system that makes critical decisions. If you don't have this, you're just using the --yolo flag.
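Concretely, the second question only has an answer once the first is solved: you gate autonomy on a measured score. A hedged sketch, with an assumed threshold and hard-coded scores standing in for real eval runs (none of this is from the talk):

    # Hypothetical deployment gate: an agent version only runs unattended if
    # it measurably beats the baseline on the same eval suite (same metric).
    PASS_RATE_REQUIRED = 0.95  # assumed threshold; tune to your risk tolerance

    def can_run_on_its_own(baseline_score: float, candidate_score: float) -> bool:
        # "We can't compare what we can't measure": one suite, one number.
        return candidate_score >= max(baseline_score, PASS_RATE_REQUIRED)

    # In practice these come from running the eval suite; hard-coded here
    # only to keep the example self-contained.
    baseline_score, candidate_score = 0.92, 0.97
    if can_run_on_its_own(baseline_score, candidate_score):
        print("Promote: candidate may run without the --yolo flag.")
    else:
        print("Block: keep a human in the loop and operate it yourself.")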