Remix.run Logo
drudolph914 2 hours ago

so much to unpack here and almost poetic that you say this

first is that the model will write out that it “thought” and “double checked” it’s output

Second, this was in a fresh context window of the latest model (that isn’t fable b/c we can’t use for reasons beyond this thread), and it was on it’s second highest thinking mode. I shouldn’t have to double check something that it claimed to have burned more tokens on to double check

Outside of it costing me more money to fix what it claims to do, the main point of this article is that models are implementing things nearly end to end, and if we scale it up, it will only continue to do that. I Intentionally chose the example of something that is < 70 lines to implement in TS (btw, the language with the second most amount of data available to scrape and train on) I would assume a machine that can almost implement things end to end should be able to implement something of 70 lines of code and has been documented for nearly 50 years.

My point is that time and time again on the most trivial examples, under the best of conditions, and with unlimited amounts of money, they can’t do what it claims

Outside of that, this follow up comment(s) that say, “oh you need to ask it to check its own work and be so involved in the process of it writing the code that you need to spot check it” goes against everything the article states

The best analogy I have for this is New speak in 1984, it’s just vibes dictating vibes and trying to make people claim that the vibes are right. and if you try to validate the vibes, your vibes are just wrong because you don’t get the vibes. The claims that it made have no data backing it. And if there is data, it’s cherry picked. Please use your brain and stop outsourcing your ability to think to a machine that is incorrectly thinking on your behalf

Edit: Typos