torginus · a day ago
Much has been made in this article of autonomous agents' ability to do research by browsing the web. The web is 90% garbage by weight (including articles on certain specialist topics), and it shows. When I used GPT's Deep Research on the topic, it generated a shallow and largely incorrect summary, owing mostly to its inability to find quality material; instead it ended up relying on places like Wikipedia and random infomercial listicles found via Google. I have a trusty electronics textbook written in the 80s, and I'm sure generating a similarly accurate, correct, and deep analysis of circuit design using only Google would be 1000x harder than sitting down and working through that book until I understood it.
Aurornis · a day ago
This story isn't really about agents browsing the web. It's a fiction about a company that consumes all of the web and all other written material into a model that doesn't need to browse the web. The agents in this story supersede the web.

But your point hits on one of the first cracks to show in this story: we already have companies consuming much of the web and training models on all of our books, yet the reports they produce are of mixed quality. The article tries to get around this by imagining that models and training runs a couple of orders of magnitude larger will simply appear in the near future, and that the output of those models will yield breakthroughs that accelerate the next rounds even faster. Yet here we are, struggling to build as much infrastructure as possible just to squeeze incremental improvements out of the next generation of models. This entire story relies on AI advancement accelerating in a self-reinforcing way within the next couple of years.
tim333 · 11 hours ago
I myself am something of an autonomous agent who browses the web, and it's possible to be choosy about what you browse. For instance, I could download some electronics textbooks off the web rather than going to listicles. LLMs may not be that discriminating at the moment, but they could get better.
Balgair · 11 hours ago
> the web is 90% garbage by weight

Sturgeon's law: "Ninety percent of everything is crap."
dimitri-vs · a day ago
Interesting, I've had the exact opposite experience. For example, I was curious why in metal casting the top box is called the cope and the bottom the drag. It found very niche information and quotes from page 100 of a PDF on some random government website. The whole report was extremely detailed and verifiable if I followed its links. That said, I suspect (and am already starting to see) increased use of anti-bot protection to combat browser-use agents.
somerandomness · a day ago
Agreed. However, source curation and agents are two different parts of Deep Research. What if you provided that textbook to a reliable agent? Plug: we built https://RadPod.ai to let you do exactly that, i.e. Deep Research on your own data.