johnfn | 6 hours ago
Admittedly just vibes from me, having pointed small models at code and asked them questions; no extensive evaluation process or anything. For instance, I recall models thinking that every single use of `eval` in JavaScript is a security vulnerability, even something obviously benign like `eval("1 + 1")`. But then I'm only posting comments on HN; I'm not the one writing an authoritative thinkpiece saying Mythos actually isn't a big deal :-)
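To make that distinction concrete, here's a rough sketch of the benign case next to the genuinely risky one (the helper name and its input are made up for illustration):

    // Benign: a constant expression, no untrusted data ever reaches eval
    const two = eval("1 + 1"); // -> 2

    // Risky: evaluating attacker-controlled input is arbitrary code execution
    function runExpression(userInput) {
      return eval(userInput); // whatever string the caller passes in gets executed
    }
    runExpression("console.log('any code runs here')");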
jorvi | 2 hours ago
My proof-is-in-the-pudding test is still this: we haven't seen gigantic mass firings at tech companies, nor a massive acceleration in the quality or breadth (not quantity!) of development.

Microsoft has been going heavy on AI for 1y+ now. But then they replace their cruddy native Windows Copilot application with an Electron one. If tests and dev only have marginal cost now, why aren't they going all in on writing extremely performant, almost completely bug-free native applications everywhere?

And this repeats itself across all the big tech and AI hype companies. They all claim these supposed earth-shattering gains in productivity, but then... there hasn't been anything to show for it in years? Despite that whole subset of tech, plus big tech, dropping trillions of dollars on it?

And then there is the really uncomfortable question for all tech CEOs and managers: LLMs are better at 'fuzzy' things like writing specs or documentation than they are at writing code. And LLMs are supposedly godlike. Leadership is a fuzzy thing. At some point the chickens will come home to roost, and companies with LLM CEOs/managers and human developers, or even fully LLM-run companies, will outperform human-led/managed ones. The capital class will jeer about that for a while, but the cost of tokens will continue to drop toward zero. At that point, they're out of leverage too.
bloaf | 3 hours ago
I remember a study from a while back that found something like "50% of 2nd graders think that french fries are made out of meat instead of potatoes. Methodology: we asked kids if french fries were meat or potatoes."

Everyone went around acting like this meant 50% of 2nd graders were stupid with terrible parents. (Or, conversely, that 50% of 2nd graders were geniuses for "knowing" it was potatoes at all.)

But I think that was the wrong conclusion. The right conclusion was that all the kids guessed, and they had a 50% chance of getting it right. And I think there is probably an element of this going on with the small-models-vs-big-models dichotomy.
argee | 5 hours ago
With LLMs (and colleagues) it might be a legitimate problem, since they would load that eval into context and maybe decide it's an acceptable paradigm in your codebase.