| ▲ | softwaredoug 4 days ago |
| I agree with Simon’s article, but I usually take “research” to mean comparing different kinds of evidence (not just the search part). Like evidence for the effectiveness of Obamacare. Or how some legal case may play out in the courts. Or how much The Critic influenced Family Guy. Or even what the best way to use X feature of Y library is. I’ve found ChatGPT and other LLMs can struggle to evaluate evidence - to understand the biases behind sources - i.e. taking data from a sketchy think tank as gospel. I’ve also found in my work that the more reasoning, the more hallucination, especially when gathering many statistics. That, plus the usual sycophancy, can cause the model to really want to find evidence to support your position. Even if you don’t think you’re asking a leading question, it can really want to answer your question in the affirmative. I always ask ChatGPT to directly cite and evaluate sources, and try to get it in the mindset of comparing and contrasting arguments for and against. And I find I must argue against its points to see how it reacts. More here https://softwaredoug.com/blog/2025/08/19/researching-with-ag... |
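A minimal sketch of that kind of prompt, assuming the standard OpenAI Python client; the model name and exact wording below are illustrative, not taken from the comment or the linked post:

    # Sketch: nudge the model to cite sources, assess their bias, and argue
    # both sides before concluding. Assumes the OpenAI Python client;
    # the model name is just an example.
    from openai import OpenAI

    client = OpenAI()

    system_prompt = (
        "For every factual claim, cite a specific source and briefly assess "
        "its reliability and likely bias. Lay out the strongest arguments for "
        "and against the position before stating a conclusion, and say what "
        "evidence would change your mind."
    )

    response = client.chat.completions.create(
        model="gpt-5",  # illustrative model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "How much did The Critic influence Family Guy?"},
        ],
    )
    print(response.choices[0].message.content)

The arguing-against step then happens as further user turns in the same conversation, pushing back on whichever points look weakest.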
|
| ▲ | NothingAboutAny 4 days ago | parent | next [-] |
| I tried to use Perplexity to find ideal settings for my monitor; it responded with a concise list of distinct settings and why.
When I investigated the source, it was just people guessing and arguing with each other in the Samsung forums - no official or even backed-up information. I'd love it if it had a confidence rating based on the sources it found or something, but I imagine that would be really difficult to get right. |
| |
| ▲ | Moosdijk 4 days ago | parent | next [-] | | I asked Gemini to do deep research on the role of healthcare insurance companies in the decline of general practitioners in the Netherlands.
It based its premise mostly on blogs and whitepapers on company websites whose job it is to sell automation software. AI really needs better source validation. Not just to combat the hallucination of sources (which Gemini seems to do 80% of the time), but also to combat low-quality sources that happen to correlate well with the question in the prompt. It's similar to Google having to fight SEO spam blogs; they now need to do the same in the output of their models. | | |
| ▲ | simonw 4 days ago | parent | next [-] | | Better source validation is one of the main reasons I'm excited about GPT-5 Thinking for this. It would be interesting to try your Gemini prompts against that and see how the results compare. | | |
| ▲ | Hugsun 4 days ago | parent [-] | | I've found GPT-5 Thinking to perform worse than o3 did on tasks of a similar nature. It makes more bad assumptions that derail the train of thought. | | |
| ▲ | 3abiton 3 days ago | parent [-] | | I think the key is prompting, and putting bounds around its assumptions. |
|
| |
| ▲ | carlhjerpe 4 days ago | parent | prev | next [-] | | When using AI models through Kagi Assistant you can tweak the searches the LLM does with your Kagi settings (search only academic sources, block bullshit websites, and such), which is nice. And I can choose models from many providers. No API access though, so you're stuck talking with it through the webapp. | |
| ▲ | Atotalnoob 4 days ago | parent | prev [-] | | Kagi has some tooling for this. You can set web access “lenses” that limit the results to “academic”, “forums”, etc. Kagi also tells you the percentages “used” for each source and cites them inline. It’s not perfect, but it makes it a lot easier to narrow down what you want to get out of your prompt. |
| |
| ▲ | ugh123 4 days ago | parent | prev | next [-] | | Seems like the right outcome was had, by reviewing sources. I wish it went one step further and loaded those source pages and scrolled to/highlighted the snippets it pulled information from. That way we can easily double-check at least some aspects of its response, and content+ads can be attributed to the publisher. | |
| ▲ | stocksinsmocks 3 days ago | parent | prev | next [-] | | In the absence of easily found authoritative information from the manufacturer, this would have been my source of information. Internet banter might actually be the best available information. | |
| ▲ | wodenokoto 4 days ago | parent | prev | next [-] | | But the really tricky thing is that sometimes it _is_ these kinds of forums where you find the best stuff. When LLMs really started to show themselves, there was a big debate about what truth is, with even HN joining in on heated debates about how many sexes or genders a dog may have and whether it was okay for ChatGPT to respond with a binary answer. On one hand, I did find those discussions insufferable, but the deeper question - what is truth and how do we automate the extraction of truth from corpora - is super important and has somehow completely disappeared from the LLM discourse. | |
| ▲ | simonw 4 days ago | parent | prev [-] | | It would be interesting to see if that same question against GPT-5 Thinking produces notably better results. |
|
|
| ▲ | killerstorm 4 days ago | parent | prev | next [-] |
| FWIW, GPT-5 (and o3, etc.) is one of the most critical-minded LLMs out there. If you ask for information that is, e.g., academic or technical, it will cite sources and compare different results, etc., without any extra prompt or reminder. Grok 4 (at its initial release) just reported information from the articles it found without any analysis. Claude Opus 4 also seems bad: I asked it for a list of JS libraries of a certain kind in deep research mode, and it returned a document focused on market share and usage statistics. Looks like it stumbled upon some articles of that kind and got carried away by them. Quite bizarre. So GPT-5 is really good in comparison. Maybe not perfect in all situations, but perhaps better than an average human. |
| |
| ▲ | eru 4 days ago | parent [-] | | > So GPT-5 is really good in comparison. Maybe not perfect in all situations, but perhaps better than an average human Alas, the average human is pretty bad at these things. |
|
|
| ▲ | btmiller 4 days ago | parent | prev | next [-] |
| How are we feeling about the usage of the word “research” to describe feature sets in LLMs? Is it truly representative of research? How does it compare to the colloquial “do your research” refrain heard so often during US election years? |
| |
| ▲ | softwaredoug 4 days ago | parent [-] | | Well, I will just need to start saying “critical thinking”, then? Or some other term? I have a liberal arts background, so I use the term research to mean gathering evidence, evaluating its trustworthiness and biases, and avoiding the thinking errors related to evaluating evidence (https://thedecisionlab.com/biases). LLMs can fall prey to these problems as well. Usually it’s not just “reasoning” that gives you trouble. It’s the reasoning about evidence. I see this with Claude Code a lot. It can sometimes create some weird code, hallucinating functionality that doesn’t exist, all because it found a random forum post. I realize, though, that the term is pretty overloaded :) |
|
|
| ▲ | gonzobonzo 4 days ago | parent | prev | next [-] |
| > I’ve found ChatGPT and other LLMs can struggle to evaluate evidence - to understand the biases behind sources - i.e. taking data from a sketchy think tank as gospel. This is what I keep finding: it mostly repeats surface-level "common knowledge." It usually takes a few back-and-forths to get to whether or not something is actually true - asking for the numbers, asking for the sources, asking for the excerpt from the sources where they actually provide that information, verifying to make sure it's not hallucinating, etc. A lot of the time, it turns out its initial response was completely wrong. I imagine most people just take the initial (often wrong) response at face value, though, especially since it tends to repeat what most people already believe. |
| |
| ▲ | athrowaway3z 4 days ago | parent [-] | | > It usually takes a few back-and-forths to get to whether or not something is actually true This cuts both ways. I have yet to find an opinion or fact I could not make ChatGPT agree with as if it were objectively true. Knowing how to trigger (im)partial thought is a skill in and of itself and something we need to be teaching in school ASAP. (Which some schools already are, in one way or another.) | | |
| ▲ | gonzobonzo 4 days ago | parent | next [-] | | I'm not sure teaching it in school is actually going to help. Most people will tell you that of course you need to look at primary sources to verify claims - and then turn around and believe the first thing they hear from an LLM, a Redditor, a wiki article, etc. Even worse, many people get openly hostile to the idea that people should verify claims - "what, you don't believe me?"/"everyone here has been telling you this is true, do you have any evidence it isn't?"/"oh, so you think you know better?" There was a discussion about Wikipedia here recently where a lot of people who are active on the site argued against people taking the claims there with a grain of salt and verifying the accuracy for themselves. We can teach these things until the cows come home, but it's not going to make a difference if people say it's a good idea and then immediately do the opposite. | |
| ▲ | Kim_Bruning 4 days ago | parent [-] | | There were actual Wikipedians arguing not to take a wiki with a grain of salt? If I was in that discussion, I must have missed those posts. Can you link an example? If you mean that Wikipedia is unreliable? That's a different story: everything is unreliable. Wikipedia just happens to be potentially less unreliable than many (typically) (if used correctly) (#include caveats.h). Sources are like power tools. Use them with respect and caution. |
| |
| ▲ | eru 4 days ago | parent | prev [-] | | > Knowing how to trigger (im)partial thought is a skill in and of itself and something we need to be teaching in school ASAP. You are very optimistic. Look at all the other skills we are trying to teach in school. 'Critical thinking' has been at the top of nearly every curriculum you can point a finger at for quite a while now. To minimal effect. Or just look at how much math we are trying to teach kids, and what they actually retain. | |
| ▲ | athrowaway3z 4 days ago | parent [-] | | Perhaps a bit optimistic, but this can be shown in real time: the situation, cause, and effect. Critical thinking is a much more general skill, applicable anywhere, and thus quicker to be 'buried' under other learned behavior. This skill has an obvious trigger: you're using AI, which means you should be aware of this. |
|
|
|
|
| ▲ | thom 4 days ago | parent | prev | next [-] |
| Yeah, trying to make well-researched buying decisions, for example, is really hard because you'll just get a lot of opinions dominated by marketing material, which aren't well counterbalanced by the sort of angry Reddit posts or YouTube comments I'd often treat as red flags. |
|
| ▲ | vancroft 4 days ago | parent | prev | next [-] |
| > I always ask ChatGPT to directly cite and evaluate sources, and try to get it in the mindset of comparing and contrasting arguments for and against. And I find I must argue against its points to see how it reacts. Same here. But it often produces broken or bogus links. |