Remix.run Logo
YZF 4 hours ago

Lines of code has always been a terrible metric. But all else being equal it is a measure. If all else is not equal, which is usually the case, then it's not.

A lot of the focus has been on AI recently.

Three years ago we didn't have software where a non-software engineer can describe what they want in English and get working (-ish) software generated by other software? Is that not "software has gotten a lot better"?

Other than that I'm not sure how we measure "software has gotten better". New applications? More features? How do we measure sloppier? Is Google Maps suddenly taking you the wrong way more often? I'm not really doubting your subjective experience but seriously how do tell? I mean a doc is a doc and a spreadsheet is a spreadsheet.

We're also only about 10 months into models that are powerful enough to potentially make a bigger difference and we are still figuring out how to use them best.

yurimo an hour ago | parent | next [-]

I personally don't view coding agents making software as "software gotten better" you are comparing a tool and the end result, these are two different things. Agent you use going down and your product going down mean two different things to you customers. I will not deny that we made incredible progress in coding and hell, even design over the past 3.5 years, this technology is here to stay.

That being said while I agree that measuring better quality of software is vague (part of the reason it is hard for models as well), there are universal things I believe every engineer will agree on. Reliability, uptime, customer feedback, legibility of your engineering, performance, these are things we often optimized for. Google Maps is a bit of a strawman because neither of us (unless you work on it), knows how much agent code there is, I think it is likely that it's little since it was working fine prior to 2023. I could bring up github reliability as an example, given how much copilot usage they promote at MS, but once again only folks there know for certain. I do, however, see scores of various AI powered SAAS that looks like it is in a perpetual MVP state. I think you are right in that even if agents give us "good enough" results and we can swallow failure rates and our increasingly lesser understanding of what we, or more so model, created, then it is still progress overall, but this is progress not to human-AI collaboration but to AI-only engineering IMO, this is good or bad depending on how you view the future.

I'm a scientist and most of code I currently write is somewhere on the intersection of critical software and machine learning, squaring these two is not easy and I guess the way I was taught to reason about engineering informs my opinions on this. Maybe it's just a matter of time before codex can help here in an unconstrained manner as well, but I am skeptical at the moment.

arcticbull 3 hours ago | parent | prev [-]

I also can't help but notice they didn't mention how many tokens were burned, or how much that translates to in terms of cost over the 5 months at enterprise AI prices. I'm going to guess this wasn't a cheap demo.