jonstewart 3 days ago
The hilarious part I've found is that when it runs into the least bit of trouble with a step in one of its plans, it will mark the step "Deferred" and then make up an excuse for why that's acceptable. It is sometimes acceptable for humans to use judgment and defer work; the machine doesn't have judgment, so it is not acceptable for it to do so.
physix 2 days ago
Speaking of hilarious, we had a Close Encounter of the Hallucinating Kind today.

We were seeing mysterious, simultaneous gRPC socket-closed exceptions on the client and server side, running in Kubernetes and talking to each other through an nginx ingress. We captured debug logs and described the issue in detail to Gemini 2.5 Flash, giving it the nginx logs for the one second before and after an example incident, about 10k log entries.

It came back with a clear verdict: "The smoking gun is here: 2025/07/24 21:39:51 [debug] 32#32: *5902095 rport:443 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.233.100.128, server: grpc-ai-test.not-relevant.org, request: POST /org.not-relevant.cloud.api.grpc.CloudEventsService/startStreaming HTTP/2.0, upstream: grpc://10.233.75.54:50051, host: grpc-ai-test.not-relevant.org" and gave me a detailed action plan.

I was thinking this is cool, I don't need to use my head on this, until I realized that the log entry simply did not exist. It was entirely made up.

(And yes, I admit I should know better than to do lousy prompting on a cheap foundation model.)
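The lesson generalizes: before acting on a model's "smoking gun", match its quoted evidence verbatim against the actual capture. A minimal sketch in Python, assuming a hypothetical nginx-debug.log file and using a fragment of the quoted entry as the needle:

    # Check whether a model-cited log line actually exists in the capture.
    # Filename and the cited fragment below are illustrative assumptions.
    import pathlib

    log_text = pathlib.Path("nginx-debug.log").read_text()

    cited = ("upstream timed out (110: Connection timed out) "
             "while reading response header from upstream")

    if cited in log_text:
        print("Cited entry found; the action plan may be worth following.")
    else:
        print("Cited entry NOT in the logs; likely hallucinated.")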
quintu5 2 days ago
My favorite is when you ask Claude to implement two requirements, and it implements the first, gets confused by the second, removes the implementation of the first to "focus" on the second, and then finishes having implemented nothing.
ants_everywhere 3 days ago
Oh yeah, totally. It feels a bit deceptive sometimes. Like just now it said "great, the tests are consistently passing!" So I ran the same test command, and 4 of the 7 tests are so broken they don't even build.
stkdump 2 days ago
Well, I would say that the machine should not override human input. But if the machine makes up the plans in the first place, why shouldn't it be allowed to change them? I think the hilarious part of modifying tests to make them pass without understanding why they fail is that it probably happens because it learned the trick from humans.
mattigames 2 days ago
"This task seems more appropriate for lesser beings e.g. humans" |