torginus · a day ago
Much has been made in this article of autonomous agents' ability to do research by browsing the web. The web is 90% garbage by weight (including articles on certain specialist topics), and it shows. When I used GPT's Deep Research on the topic, it generated a shallow and largely incorrect summary, owing mostly to its inability to find quality material; instead it ended up relying on places like Wikipedia and random infomercial listicles found via Google. I have a trusty electronics textbook written in the 80s, and I'm sure generating a similarly accurate, correct, and deep analysis of circuit design using only Google would be 1000x harder than sitting down and working through that book until I understood it.
Aurornis · a day ago
This story isn't really about agents browsing the web. It's a fiction about a company that consumes all of the web and all other written material into a model that doesn't need to browse the web. The agents in this story supersede the web.

But your point hits on one of the first cracks to show in this story: we already have companies consuming much of the web and training models on all of our books, yet the reports they produce are of mixed quality. The article tries to get around this by imagining that models and training runs a couple of orders of magnitude larger will simply appear in the near future, and that the output of those models will yield breakthroughs that accelerate the next rounds even faster. Yet here we are, struggling to build as much infrastructure as possible just to squeeze incremental improvements out of the next generation of models. This entire story relies on AI advancement accelerating in a self-reinforcing way within the next couple of years.
tim333 · 11 hours ago
I myself am something of an autonomous agent who browses the web, and it's possible to be choosy about what you browse. For instance, I could download some electronics textbooks off the web rather than going to listicles. LLMs may not be that discriminating at the moment, but they could get better.
Balgair · 11 hours ago
> the web is 90% garbage by weight

Sturgeon's law: "Ninety percent of everything is crap."
dimitri-vs · a day ago
Interesting, I've had the exact opposite experience. For example, I was curious why in metal casting the top box is called the cope and the bottom the drag. It found very niche information and quotes from page 100 of a PDF on some random government website. The whole report was extremely detailed and verifiable if I followed its links. That said, I suspect (and am already starting to see) increased use of anti-bot protection to combat browser-use agents.
somerandomness · a day ago
Agreed. However, source curation and agents are two different parts of Deep Research. What if you provided that textbook to a reliable agent? Plug: we built https://RadPod.ai to let you do exactly that, i.e. Deep Research on your own data.