WarmWash 5 hours ago
I did my (out of the ordinary) taxes this year using agents, partly as an experiment and partly to save ~$750. Opus 4.6 max in CC, 5.4 xhigh in codex, and 3.1 high in antigravity. All on the $20/mo plans.

I have a day job, a side business, actively trade shares, options, and futures, and have a few energy credit items. All three were given the same copied folder containing all the documents needed to compose the return, and all were given the same prompt. My goal was that if all three agreed, I could then go through it pretty confidently and fill out the actual submission forms myself.

5.4 nailed it on the first shot. Took about 12 minutes.

3.1 missed one value, because it decided to load only the first 5 pages of a 30 page document. Surprisingly, it only took about 2 minutes to complete though. A second prompt and ~10 seconds corrected it. The GPT and Gemini outputs were now perfectly aligned.

4.6 hit my usage limit before finishing, after running for ~10 minutes. I returned the next day to have it finish. It ran for another 5 minutes or so before finishing. There were multiple errors and the final tax burden was a few thousand off. On a second prompt asking it to check for errors in the problem areas, it was able to output matching values after a couple more minutes.

For my first time using CC and 4.6 (outside of some programming in AG), I am pretty underwhelmed given the incessant hype.
toddmorey 5 hours ago | parent
My taxes are rather complex, so I ran the same exercise to see if Claude agreed with my accountant. An automated second opinion, so to speak. It spent about 6 minutes analyzing all the PDFs and basically nailed it in one shot. My only point here is that the same activity / use case sure seems to produce wildly different results across sessions or users. Customer support and product development in the age of non-deterministic software is a strange, strange beast.