ep103 | 3 days ago
Yesterday I used ChatGPT to transform a CSV file: move around a couple of columns, add a few new ones. Very large file. It got them all right, except when I really looked through the data, for 3 of the cells it had clearly just made up new numbers. I found the first one by accident; tracking down the remaining two took longer than it would have taken to modify the file from scratch myself. Watching my coworkers blindly trust output like this is concerning.
photonthug | 3 days ago
After we fix all the simple specious reasoning of stuff like Alexander-the-Great and agree to outsource certain problems to appropriate tools, the high-dimensional analogs of things like the Datasaurus dozen [0] and Simpson's paradox [1] are still going to be a thing. But we'll be so disconnected from the representation of the problems we're trying to solve that we won't even be aware of the possibility of any danger, much less able to actually spot it.

My takeaway re: chain-of-thought specifically is this: if the answer to "LLMs can't reason" is "use more LLMs", and then the answer to problems with that is to run the same process in parallel N times and vote/retry/etc., it just feels like a scam aimed at burning through more tokens. Hopefully chain-of-code [2] is better, in that it at least tries to force LLMs into emulating a more deterministic abstract machine instead of rolling dice.

Trying to eliminate things like code, formal representations, and explicit world-models in favor of implicit representations and inscrutable oracles might be good business, but it's bad engineering.

[0] https://en.wikipedia.org/wiki/Datasaurus_dozen

[1] https://towardsdatascience.com/how-metrics-and-llms-can-tric...

[2] https://icml.cc/media/icml-2024/Slides/32784.pdf

weinzierl | 3 days ago
It sometimes happens with simple things. I once pasted the announcement for an event into Claude to check for spelling and grammar. It had a small suggestion for the last sentence and repeated the whole corrected version for me to copy and paste. Only the last sentence was slightly modified - or so I thought, because it had also moved the date of the event in the first sentence by one day. Luckily I caught it before posting, but it was a close call.

throwawayoldie | 3 days ago
> Yesterday I used ChatGPT to transform a csv file. Move around a couple of columns, add a few new ones. Very large file.

I'm struggling to understand how using an LLM to do this seemed like a good idea in the first place.

epiccoleman | 2 days ago
I don't mean to be rude, but this sounds like user error. I don't understand why anyone would use an LLM for this - or at least, why you would let the LLM perform the transformation itself.

If I were trying to do something like this, I would ask the LLM to write a Python script, then validate the output by running it against the first handful of rows (something like `head -n 10 thing.csv | python transform-csv.py`).

There are times when statistical / stochastic output is useful. There are other times when you want deterministic output. A transformation on a CSV is the latter.
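For illustration, here is a minimal sketch of the kind of script that request might produce. The column names and the derived `amount_with_tax` column are invented for the example; the point is only that the transformation is deterministic and can be spot-checked on a sample before running it over the whole file.

```python
import csv
import sys

# Read CSV rows from stdin and write transformed rows to stdout, so the script
# can be spot-checked on a sample: head -n 10 thing.csv | python transform-csv.py
reader = csv.DictReader(sys.stdin)

# Reorder the existing columns and append one new, deterministically derived column.
# These column names are placeholders for the example.
out_fields = ["date", "id", "amount", "amount_with_tax"]
writer = csv.DictWriter(sys.stdout, fieldnames=out_fields)
writer.writeheader()

for row in reader:
    writer.writerow({
        "date": row["date"],
        "id": row["id"],
        "amount": row["amount"],
        # New column computed from existing data, not guessed.
        "amount_with_tax": f"{float(row['amount']) * 1.19:.2f}",
    })
```

Because the transformation is ordinary code, the script that passed the spot check on the first ten rows will behave identically on the millionth row.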

spongebobstoes | 3 days ago
The safe way to do this is to have it write code to transform the data, then run the code.

I expect future models will be able to identify when a computational tool will work, and use it directly.