| ▲ | bdbdbdb 5 days ago |
| > No human could read all of this in a lifetime. AI consumes it in seconds.

And therefore it's impossible to test the accuracy if it's consuming your own data. AI can hallucinate on any data you feed it, and it's been proven that it doesn't summarize, but rather abridges and abbreviates data.

In the author's example:

> "What patterns emerge from my last 50 one-on-ones?" AI found that performance issues always preceded tool complaints by 2-3 weeks. I'd never connected those dots.

Maybe that's a pattern from all 50 one-on-ones. Or maybe it's only in the first two and the last one. I'd be wary of using AI to summarize like this and expecting accurate insights. |
|
| ▲ | gchamonlive 5 days ago | parent | next [-] |
| > it's been proven that it doesn't summarize, but rather abridges and abbreviates data

Do you have more resources on that? I'd love to read about the methodology.

> And therefore it's impossible to test the accuracy if it's consuming your own data.

Isn't that only true if it's hard to verify the result? If it's a result that's hard to produce but easy to verify, a class many problems fall into, you'd just need to look at the synthesized results. If you ask it "given these arbitrary metrics, what is the best business plan for my company?" it'd be really hard to verify the result. It'd be hard to verify the result from anyone for that matter, even specialists.

So I think it's less about expecting the LLM to do autonomous work and more about using LLMs to more efficiently help you search the latent space for interesting correlations, so that you, and not the LLM, come up with the insights. |
| |
| ▲ | thoughtpeddler 4 days ago | parent | next [-] | | Look into the emerging literature around "needle-in-a-haystack" tests of LLM context windows; a sketch of one is below. You'll see what the poster you're replying to is describing, in part. This can also be described as testing "how lazy is my LLM being when it comes to analyzing the input I've provided to it?" Hint: they can get quite lazy! I agree with the poster you replied to that "RAG my Obsidian"-type experiments with local models are middling at best. I'm optimistic things will get a lot better in the future, but it's hard to trust a lot of the 'insights' this blog post talks about without intense QA-ing (which I doubt the author did, considering their writing is also lazily AI-assisted). | |
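A minimal sketch of a needle-in-a-haystack check, for anyone who wants to try it: bury one known fact in a long document and see whether the model retrieves it. The call_model function below is a placeholder for whatever LLM client you use, not a real API, and the filler text and needle are made-up examples.

    # Needle-in-a-haystack sketch: hide one known fact in a long document
    # and check whether the model can pull it back out.
    import random

    def call_model(prompt: str) -> str:
        # Placeholder: swap in your own LLM client call here.
        raise NotImplementedError

    filler = "The weather was unremarkable that day. " * 2000   # long, boring context
    needle = "The access code for the archive room is 7421."
    pos = random.randint(0, len(filler))
    haystack = filler[:pos] + " " + needle + " " + filler[pos:]

    answer = call_model("What is the access code for the archive room?\n\n" + haystack)
    print("retrieved" if "7421" in answer else "missed")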
| ▲ | bdbdbdb 5 days ago | parent | prev [-] | | > If you ask it "given these arbitrary metrics, what is the best business plan for my company?" it'd be really hard to verify the result. It'd be hard to verify the result from anyone for that matter, even specialists.

Hard to verify something so subjective, for sure. But a specialist will be applying intelligence to the data. An LLM is just generating random text strings that sound good.

The source for my claim about LLMs not summarizing but abbreviating is on HN somewhere, I'll dig it out.

Edit: sorry, I tried but couldn't find the source. | | |
| ▲ | gchamonlive 5 days ago | parent [-] | | > But a specialist will be applying intelligence to the data. An LLM is just generating random text strings that sound good.

I'd only make such a claim if I could demonstrate that human text is a result of intelligence and LLM text is not, because really, what's the actual difference? How isn't an LLM "intelligent" when it can clearly help me make sense of information? Note that this isn't a claim about whether it's conscious. But it's definitely intelligent: the text output is not only coherent, it's right often enough to be useful. Curiously, I'm human, and I'm wrong a lot, but I'm right often enough to be a developer. |
|
|
|
| ▲ | missedthecue 5 days ago | parent | prev | next [-] |
| "AI can hallucinate on any data you feed it, and it's been proven that it doesn't summarize, but rather abridges and abbreviates data." Have you ever met a human? I think one of the biggest reasons people become bearish on AI is that their measure of whether it's good/useful is that it needs to be absolutely perfect, rather than simply superior to human effort. |
| |
| ▲ | autoexec 5 days ago | parent | next [-] | | > one of the biggest reasons people become bearish on AI is that their measure of whether it's good/useful is that it needs to be absolutely perfect, rather than simply superior to human effort. Meanwhile people bullish on AI don't care if it's perfect or even vastly inferior to human effort, they just want it to be less expensive/troublesome and easier to control than a human would be. Plenty of people would be fine knowing that AI fucks up regularly and ruins other people's lives in the process as long as in the end their profits go up or they can still get what they want out of it. | |
| ▲ | bdbdbdb 5 days ago | parent | prev | next [-] | | I'm not saying it needs to be perfect, but the guy in this article is putting a lot of blind faith in an algorithm that's proven time and time again to make things up. The reason I have become "bearish" on AI is that I see people repeatedly falling into a trap of believing LLMs are intelligent, and actively thinking, rather than just very, very finely tuned random noise. We should pay attention to the A in AI more. | | |
| ▲ | arevno 5 days ago | parent [-] | | > putting a lot of blind faith in an algorithm that's proven time and time again to make things up Don't be ridiculous. Our entire system of criminal justice relies HEAVILY on the eyewitness testimony of humans, which has been demonstrated time and again to be entirely unreliable. Innocents routinely rot in prison and criminals routinely go free because the human brain is much better at hallucinating than any SOTA LLM. I can think of no more critical institution that ought to require fidelity of information than criminal justice, and yet we accept extreme levels of hallucination even there. This argument is tired, played out, and laughable on its face. Human honesty and memory reliability are a disgrace, and if you wish to score points against LLMs, comparing their hallucination rates to those of humans is likely going to result in exactly the opposite conclusion that you intend others to draw. | | |
| ▲ | 1659447091 5 days ago | parent | next [-] | | > the human brain is much better at hallucinating than any SOTA LLM

Aren't the models trained on human content and human intervention? If humans hallucinated that content, and LLMs then hallucinate even slightly on top of that fallible human content, wouldn't that make the LLMs' hallucinations still, if even slightly, more than humans'? Or am I missing something here where LLMs are somehow correcting the original human hallucinations and thus producing less hallucinated content? | |
| ▲ | bdbdbdb 3 days ago | parent | prev [-] | | It's ridiculous and laughable to say LLMs hallucinate because the justice system isn't flawless? That's a cognitive leap. |
|
| |
| ▲ | bigstrat2003 5 days ago | parent | prev [-] | | Right now AI is inferior, not superior, to human effort. That's precisely why people are bearish on it. | | |
| ▲ | missedthecue 5 days ago | parent [-] | | I don't think that's obvious. In 20 minutes, for example, deep research can write a report on a given topic much better than an analyst can produce in a day or two. It's literally cheaper, better, and faster than human effort. | |
| ▲ | D-Machine 5 days ago | parent | next [-] | | Faster? Yes. Cheaper? Probably, but you need to amortize in all the infrastructure and training and energy costs. Better? Lol no. | | |
| ▲ | arevno 5 days ago | parent [-] | | > but you need to amortize in all the infrastructure and training and energy costs The average American human consumes 232kWh of all-in energy (food, transport, hvac, construction, services, etc) daily. If humans want to get into a competition over lower energy input per unit of cognitive output, I doubt you'd like the result. > Better? Lol no The "IQ equivalent" of the current SOTA models (Opus 4.5, Gemini 3 Pro, GPT 5.2, Grok 4.1) is already a full 1SD above the human mean. Nations and civilizations have perished or been conquered all throughout history because they underestimated and laughed off the relative strength of their rivals. By all means, keep doing this, but know the risks. |
| |
| ▲ | jrflowers 5 days ago | parent | prev [-] | | What do you mean by “better” in this context? | |
| ▲ | missedthecue 5 days ago | parent | next [-] | | It synthesizes a more comprehensive report, using more sources, more varied sources, more data, and broader insights than a human analyst can produce in 1-2 days of research and writing. I'm not confused about this. If you don't agree, I will assume it's probably because you've never employed a human to do similar work in the past. Because it's not particularly close. It's night and day. *Note that I'm not saying 20 minutes of deep research beats 9 months of investigative journalism with private interviews with primary sources or anything like that. I'm talking about asking an analyst on your team to do a deep dive into XYZ and have something on your desk tomorrow EOD. | | |
| ▲ | freejazz 4 days ago | parent | next [-] | | Weird, I'm an attorney and no one is getting rid of associates in order to have LLMs do the research, no less so when they actually hallucinate sources (something associates won't do). I can't imagine that being significantly different in other domains. | |
| ▲ | jrflowers 4 days ago | parent [-] | | > I can't imagine that being significantly different in other domains. It’s not. There is no industry where AI performs “better” than humans reliably without torturing the meaning of the word (for example, OP says AI is better at analysis iff the act of analysis does not include any form of communication to find or clarify information from primary sources) |
| |
| ▲ | jrflowers 5 days ago | parent | prev [-] | | > It synthesizes a more comprehensive report, using more sources, more varied sources, more data, and broader insights than a human analyst can produce in 1-2 days of research and writing. > Note that I'm not saying 20 minutes of deep research beats 9 months of investigative journalism with private interviews with primary sources or anything like that. I like the idea that AI is objectively better at doing analysis if you simply assume that it takes a person nine months to make a phone call |
| |
| ▲ | fcantournet 5 days ago | parent | prev [-] | | It has more words put together in seemingly correct sentences, so it's long enough that his boss won't actually read it to proof it. |
|
|
|
|
|
| ▲ | kenjackson 5 days ago | parent | prev | next [-] |
| Similar to P/NP, verification can often be faster than solving. For example, you can then ask the AI to give you the list of tool complaints and the performance issues. Then a text search can easily validate the claim. |
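A minimal sketch of that kind of check, assuming the raw notes live as plain-text files and the AI's claim has been exported as a list of phrases to look for; the directory name and phrases here are hypothetical examples.

    # Verify an AI-extracted claim by searching the raw notes for the
    # specific items it says are there.
    from pathlib import Path

    notes_dir = Path("one-on-ones")           # one text/markdown file per meeting
    claimed_phrases = [
        "build is too slow",                  # tool complaint the AI reported
        "missed the sprint deadline",         # performance issue the AI reported
    ]

    for phrase in claimed_phrases:
        hits = [p.name for p in sorted(notes_dir.glob("*.md"))
                if phrase.lower() in p.read_text(encoding="utf-8").lower()]
        print(f"{phrase!r}: found in {len(hits)} note(s) -> {hits}")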
|
| ▲ | novok 5 days ago | parent | prev | next [-] |
| AI is a new kind of bulk tool; you need to know how to use it well, and context management is a huge part of it. For that 1:1 example, you would loop over the notes with fresh context each time, either with subagents or a literal for loop, to prevent the 'first two and last one' issue (see the sketch below). Then, with those per-meeting summaries, make the determination. Humanity has gotten amazing results from unreliable stochastic processes; managing humans in organizations is an example of that. It's ok for something new to not be completely deterministic and still be incredibly useful. |
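One way that loop could look, as a minimal sketch: summarize each note with a fresh prompt, then analyze only the per-note summaries. The call_model function is a placeholder for whatever LLM client you use, not a real API, and the directory layout is hypothetical.

    # Summarize each 1:1 note in its own context, then look for patterns
    # across the summaries in a second pass.
    from pathlib import Path

    def call_model(prompt: str) -> str:
        # Placeholder: swap in your own LLM client call here.
        raise NotImplementedError

    summaries = []
    for note in sorted(Path("one-on-ones").glob("*.md")):
        text = note.read_text(encoding="utf-8")
        # Fresh context per note, so nothing gets skipped inside one giant prompt.
        summary = call_model("Summarize this 1:1 note in 3 bullets:\n\n" + text)
        summaries.append(f"{note.name}: {summary}")

    # Second pass: ask about patterns across the per-note summaries only.
    report = call_model("What patterns emerge across these summaries?\n\n" + "\n".join(summaries))
    print(report)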
|
| ▲ | potsandpans 5 days ago | parent | prev | next [-] |
| > ...and it's been proven that it doesn't summarize, but rather abridges and abbreviates data. I don't really know what this means, or if the distinction is meaningful for the majority of cases. |
| |
|
| ▲ | TimByte 5 days ago | parent | prev | next [-] |
| I think as long as you keep a skeptical loop and force the model to cite or surface raw notes, it can still be useful without being blindly trusted. |
|
| ▲ | xtiansimon 5 days ago | parent | prev | next [-] |
| > “I'd be wary of using AI to summarize like this and expecting accurate insights.”

Sure, but when do you ever have accurate results in an iterative process? Accuracy can come at the beginning, or at the end when you're bored or have exhausted your powers of interrogation. Nevertheless, your reasoning will tell you if the AI result is good, great, acceptable, or trash. For example, you can ask Chat: "Summarize all 50 with names, dates, 2-3 sentence summaries, and 2-3 pull quotes." That can be sufficient to jog your memory, and therefore validate or invalidate the Chat conclusion. That's the tool, and its accuracy is still TBD. I for one am not ready to blindly trust our AI overlords, but darn if a talking dog isn't worth my time if it can make an argument with me. |
|
| ▲ | block_dagger 5 days ago | parent | prev [-] |
| Your colleagues using the tech will be far ahead of you soon, if they aren’t already. |
| |
| ▲ | iLoveOncall 5 days ago | parent | next [-] | | Far ahead in producing bugs, far ahead in losing their skills, far ahead in becoming irrelevant, far ahead in being unable to think critically, that's absolutely right. | | |
| ▲ | pitched 5 days ago | parent | next [-] | | The new tools have sets of problems they are very good at, sets they are very bad at, and they are generally mediocre at everything else. Learning those lessons isn't easy, takes time, and will produce bugs. If you aren't making those mistakes now with everyone else, you'll be making them later when you do decide to start catching up, and it will be more noticeable then. | |
| ▲ | SoftTalker 5 days ago | parent | next [-] | | Disagree. For the tools to become really useful (and fulfill the expectations of the people funding them) they will need to produce good results without demanding years of experience understanding their foibles and shortcomings. | | |
| ▲ | pitched 5 days ago | parent [-] | | I think there's a chance the people funding this make the returns they hope for, but it'll be a new business model that gets them there, not producing better results. The quality of results has been roughly stable for too long to expect meaningful increases regularly anymore. |
| |
| ▲ | _DeadFred_ 5 days ago | parent | prev | next [-] | | And all of those things (good at, bad at, the lessons learned on current models' current implementation) can change arbitrarily with model changes, nudges, guardrails, etc. Not sure that outsourcing your skillset on the current foundation of sand is long-term smart, even if it's great for a couple of months. It may be that those un-learning the previous iteration's interactions, once something stable arrives, are the ones at a disadvantage? | |
| ▲ | pitched 5 days ago | parent | next [-] | | The tools have been very stable for the past year or so. The biggest change I can think of is how much MCP servers have fallen off. I think they're generally considered not worth the cost in context tokens now. The scope of changes needed to unlearn now with model changes or whatever else is on par with normal language/library updates we've been doing for decades. We've plateaued, and it's worth jumping in now if you're still on the fence. |
| ▲ | evilduck 5 days ago | parent | prev [-] | | Why would the AI skeptics and curmudgeons today not continue to dismiss the "something stable" in the future? |
| |
| ▲ | ThrowawayR2 5 days ago | parent | prev [-] | | The AI hucksters promise us that these tools are getting exponentially better (lol) so the catch up should be exponentially reduced. | | |
| ▲ | pitched 5 days ago | parent [-] | | I see the sarcasm and agree with it, but just in case anyone sees this: we were getting exponentially better back in the early days but are very much hitting diminishing returns now. We're probably not going to see any large improvements from this tech again. |
|
| |
| ▲ | afandian 5 days ago | parent | prev [-] | | "The market can stay irrational longer than you can stay solvent" feels relevant here. |
| |
| ▲ | JackSlateur 3 days ago | parent | prev | next [-] | | Yes!
In a breakthrough technological advancement, people are now able to write stupid and buggy code. This is truly an innovation; nobody ever managed to achieve this before! Wait. Is this really true? Is this a goal worthy of achieving? Do we really need more buggy code? | |
| ▲ | rsynnott 5 days ago | parent | prev [-] | | ... I mean, what tools one is supposed to be using, according to the advocates, seems to completely change every six months (in particular, the go-to excuse when it doesn't work well is "oh, you used foo! You should have used bar, which came out three weeks ago!"), so I'm not sure that _experience_ is particularly valuable if these things ever turn out to be particularly useful. |
|