stackskipton 2 days ago

As an SRE/Ops person: sigh, checks the founder list, and starts internally screaming.

YC, you want the founders of these companies to have 10 years of working at Ford Motor Company. It's one of the reasons I want to write my blog article, "FAANG, please STFU. I wish I could be focused on 100k requests per second, but instead I'm dealing with engineers who have no idea why their ORM is creating terrible queries. Please stop telling them about GraphQL."

"Grant @User write access to analytics S3 bucket for 24 hours" Can the user even have access to this? Do they need write access or can't understand why they are getting errors on read? What happens when they forget in 30 days they asked your LLM for access and now their application does not work because they decided to borrow this S3 bucket instead of asking for one of their own. Yes this happened.

"Find where this secret is used so I can rotate it without downtime" Well, unless you are scanning all our Github repos, Kubernetes secret and containers, you are going to miss the fact this secret was manually loaded into Kubernetes/loaded into flat file in Docker container or stored in some random secret manager none of us are even aware of.

""Why did database costs spike yesterday?" -> Identifies expensive queries, shows optimization options, implements fixes

How? Likely it's because of a bad schema or a lack of understanding of ORMs. The fix is going to be some PR somewhere to a Dev who probably does not understand what they are reviewing.
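The thread does not say which ORM problem is meant. One common example is the "N+1" pattern, where code loops over parent rows and issues one query per row. The sketch below reproduces that shape with plain `sqlite3` instead of a real ORM; the `authors` and `posts` tables are invented for the example, and SELECT statements are counted through `set_trace_callback`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"author{i}") for i in range(1, 51)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, i, f"post{i}") for i in range(1, 51)])
conn.commit()

statements = []
conn.set_trace_callback(statements.append)  # records each executed statement


def titles_n_plus_one():
    """One query for the authors, then one more query per author."""
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors").fetchall():
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ?", (author_id,)
        ).fetchall()
        result[name] = [row[0] for row in rows]
    return result


def titles_joined():
    """The same data fetched with a single JOIN."""
    result = {}
    query = ("SELECT a.name, p.title FROM authors a "
             "LEFT JOIN posts p ON p.author_id = a.id")
    for name, title in conn.execute(query).fetchall():
        result.setdefault(name, [])
        if title is not None:
            result[name].append(title)
    return result


def count_selects(fn):
    """Run fn and return (its result, number of SELECTs it executed)."""
    statements.clear()
    result = fn()
    return result, sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))


slow_result, slow_count = count_selects(titles_n_plus_one)
fast_result, fast_count = count_selects(titles_joined)
print(slow_count, fast_count)
```

Both functions return the same data, so a reviewer who only checks the output will not see the difference. The cost shows up in the query count, which grows with the number of authors in the first version and stays constant in the second.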

Most of our headaches come from the fact that Devs almost never give a shit about Ops, their bosses don't give a shit about Ops, and Ops is trying desperately to keep this train, which is on fire, from derailing. We don't need AI YOLOing more stuff into Prod; we need AI to tell their bosses what the downtime they are causing is costing our company so maybe, just maybe, they will actually care.

nickpapciak 2 days ago

These are fair criticisms. I will say, while each of these examples is a challenging problem for agents to carry out, I do believe they can be solved, especially with tighter integration with app code.

We are always trying to learn more based on our customers' feedback. What we've learned so far is that infra setups are all extremely different, and what works for some companies doesn't work for others. There are also vastly different company cultures related to ops. Some companies value their ops team a lot, other companies burden them with way too much work. Our goal is to try to make that burden a little lighter :)

stackskipton 2 days ago

I agree they are challenging problems, but as others have pointed out, most infrastructure problems are political, so AI is not as helpful. Not to mention that, depending on our setup, your system would need to be involved in EVERYTHING, which InfoSec is going to bristle at.

Writing Terraform is not the hard part for this Ops person. If I wanted to use AI, Copilot can easily write it, no problem, but I'm pretty fast these days. Devs of course could use it to write Terraform, but we are back to the problem that they have no idea what they are asking for.

Maybe my larger organization is not your target market; maybe it's places without a dedicated Ops person. But at that point, AI that can manage Kubernetes/PaaS for them would be more useful than another Terraform AI bot.