liquidki a day ago

I think this is the Achilles heel of LLM-based AI: the attention mechanisms are far, far inferior to a human's, and I haven't seen much progress here. I regularly test models by feeding in a 20-30 minute transcript of a podcast and asking them to state the key points.

This is not a lot of text, maybe 5 pages. I then skim it myself in about 2-3 minutes and write down what I would consider the key points. When I compare the results, I find the AI usually (over 50% of the time) misses one or more points that I would consider key.

I encourage everyone to reproduce this test just to see how well current AI works for this use case.
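For anyone who wants to make the comparison step repeatable, here is a minimal sketch of scoring a model's summary against your own key-point list. The function name and the crude keyword-overlap heuristic are my own assumptions, not anything the commenter or the models provide; a real evaluation would use human judgment or a proper metric.

```python
def keypoint_recall(human_points: list[str], model_summary: str) -> float:
    """Fraction of human-identified key points whose main terms
    appear anywhere in the model's summary (crude keyword match)."""
    def terms(text: str) -> set[str]:
        # Keep only longer words as rough content terms; strip punctuation.
        return {w.lower().strip(".,;:!?") for w in text.split() if len(w) > 4}

    summary_terms = terms(model_summary)
    hits = sum(1 for point in human_points if terms(point) & summary_terms)
    return hits / len(human_points) if human_points else 0.0
```

A score below 1.0 means the summary dropped at least one of your key points, which is the failure mode described above.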

For me, AI can't adequately do one of the first things that people claim it does really well (summarization). I'll keep testing, and maybe someday it will be satisfactory at this, but I think this is a basic flaw in the attention mechanism that will not be solved by throwing more data and more GPUs at the problem.

joshstrange a day ago | parent | next [-]

> I encourage everyone to reproduce this test just to see how well current AI works for this use case.

I do this regularly and find it very enlightening. After I’ve read a news article or done my own research on a topic I’ll ask ChatGPT to do the same.

You have to be careful when reading its response not to grade on a curve: read it as if you didn't do the research and don't know the background. I find myself saying "I can see why it might be confused into thinking X, but that doesn't change the fact that it was wrong/misleading."

I do like it when LLMs cite their sources, mostly because I find out they're wrong. Many times I've read a summary, followed it to the source, read the entire source, and realized it says nothing of the sort. But almost always, I can see where it incorrectly glued together pieces of the source.

A great micro example of this is the Apple Siri summaries for notifications. Every time they mess up hilariously, I can see exactly how they got there. But it's also a mistake that no human would ever make.

pu_pu a day ago | parent | prev [-]

[dead]