Ask HN: How do you search the web programmatically these days?
6 points by coreyp_1 3 days ago | 7 comments
For the first time in a long time, I need to query a search engine programmatically, and I've found that most of them block curl and similar tools. So my question is simple: how do you solve this problem? I've tried SearXNG with mediocre success, but running a completely separate service for something I only need every once in a while seems a bit heavy. I haven't tried a service that requires an API key, simply because I'm not sure which direction to go or which provider to choose. I thought I would ask here first.
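For reference, a self-hosted SearXNG instance can return JSON instead of HTML when queried with `format=json`, provided the `json` format is enabled under `search.formats` in the instance's `settings.yml`. The sketch below assumes an instance at a hypothetical `http://localhost:8888` and assumes the response carries a top-level `results` list whose entries have `title` and `url` keys. It only builds the URL and parses a response body; the actual HTTP call is left to the caller.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL of a self-hosted SearXNG instance.
SEARXNG_BASE = "http://localhost:8888"


def searxng_url(query: str, base: str = SEARXNG_BASE) -> str:
    """Build a SearXNG search URL that asks for JSON instead of HTML."""
    return f"{base}/search?" + urlencode({"q": query, "format": "json"})


def parse_searxng(body: str) -> list[tuple[str, str]]:
    """Return (title, url) pairs from a SearXNG JSON response body.

    Assumes a top-level "results" list whose items carry "title" and "url".
    """
    data = json.loads(body)
    return [(r.get("title", ""), r.get("url", "")) for r in data.get("results", [])]
```

Usage would be along the lines of `parse_searxng(urllib.request.urlopen(searxng_url("query")).read().decode())`. If the instance returns HTTP 403 for `format=json`, the JSON format is most likely not enabled in its settings.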
BrunoBernardino 2 days ago
I'm building Uruky [1], and while we allow you to query our service programmatically (€5 / month), if you know which provider you'd like to use directly, there are a few options:

- Serper [2], if you like Google-style results
- Mojeek [3], if your searches are more EU-centric
- Linkup [4], if you like Google-style results, but more about intent and less about keyword matching
- Marginalia [5], if your searches are less about "big tech SEO servants"
- EUSP [6], if your searches are more UK/FR/DE-centric

Note that these are all paid, but most offer free trials (or are limited when free). With Uruky you can also easily search with any or all of them. If you'd like an account number with a couple of days to try it for free, let me know.

[1]: https://uruky.com
[2]: https://serper.dev
[3]: https://www.mojeek.com/services/search/web-search-api/
[4]: https://linkup.so
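As an illustration of the API-key route, here is a minimal sketch of a Serper query. It assumes Serper's endpoint is `https://google.serper.dev/search`, that it takes a JSON body `{"q": ...}` via POST with the key in an `X-API-KEY` header, and that web results come back in an `organic` list with `title` and `link` keys; verify these against Serper's current docs. The request is built but not sent, so it can be inspected offline.

```python
import json
import urllib.request

# Endpoint and header name as we understand Serper's API; check serper.dev docs.
SERPER_ENDPOINT = "https://google.serper.dev/search"


def serper_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Serper web search."""
    body = json.dumps({"q": query}).encode("utf-8")
    return urllib.request.Request(
        SERPER_ENDPOINT,
        data=body,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_serper(body: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs from the assumed "organic" results list."""
    data = json.loads(body)
    return [(r.get("title", ""), r.get("link", "")) for r in data.get("organic", [])]
```

Sending it is then `urllib.request.urlopen(serper_request("query", key)).read().decode()`, passed to `parse_serper`. The other providers listed differ in endpoint and auth scheme, so each would need its own small adapter of this shape.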
davidsojevic 2 days ago
I work at SerpApi [0], and we offer a free tier that may serve your needs if you're just looking to do programmatic searches periodically. Much of the reason people go with a service like ours is the difficulty of rolling your own reliable solution. Happy to answer any questions you might have as well!

[0]: https://serpapi.com/
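A minimal sketch of a SerpApi call, assuming its JSON endpoint `https://serpapi.com/search.json` with `engine`, `q`, and `api_key` query parameters, and Google web results in an `organic_results` list with `title` and `link` keys. It uses only the standard library; SerpApi also publishes client libraries, which are not shown here.

```python
import json
from urllib.parse import urlencode

# SerpApi's JSON endpoint; parameter names per its public docs.
SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def serpapi_url(query: str, api_key: str, engine: str = "google") -> str:
    """Build a SerpApi search URL for the given engine and query."""
    params = {"engine": engine, "q": query, "api_key": api_key}
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


def parse_serpapi(body: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs from the "organic_results" list."""
    data = json.loads(body)
    return [(r.get("title", ""), r.get("link", "")) for r in data.get("organic_results", [])]
```

Because the key travels in the query string here, avoid logging these URLs as-is.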
raw_anon_1111 2 days ago
Can't speak for search engines specifically, but I recently had to do a project that required me to crawl a customer's large site and index it into a vector search for RAG for a call center. My first attempt was to crawl it just by doing GET requests (i.e. the same thing as using curl). That got me nowhere; I had to use headless Chrome and Playwright. Do any modern websites work with just curl, even if they don't block it, i.e. without being able to run JS?
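The headless-browser approach described above can be sketched with Playwright's Python sync API: load the page in headless Chromium, let client-side JS run, and take the resulting DOM. The text-extraction helper that follows is our own simplification for feeding a RAG index (it just drops `<script>`/`<style>` content and joins the remaining text); a real pipeline would likely chunk by document structure. The Playwright import is deferred so the extraction helper works without it installed.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load a page in headless Chromium and return the DOM after JS has run.

    Requires `pip install playwright` and `playwright install chromium`.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until network activity settles, giving
            # client-side rendering a chance to finish.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Reduce rendered HTML to space-joined text for chunking/embedding."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Comparing `visible_text` of a plain GET response against `visible_text(fetch_rendered_html(url))` is one rough way to tell whether a given site actually needs JS to produce its content.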
dserban 3 days ago
https://pypi.org/project/ddgs/ (Assuming you prefer Python.)
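A minimal sketch of using that package, assuming its `DDGS().text(query, max_results=...)` method returns a list of dicts with `title`, `href`, and `body` keys, as described in its README at the time of writing. The package was previously published as `duckduckgo_search` and its API has shifted between versions, so check the installed version's docs. The import is deferred so the small formatting helper runs without the package.

```python
def ddg_search(query: str, max_results: int = 5) -> list[dict]:
    """Run a text search through the `ddgs` package (pip install ddgs)."""
    from ddgs import DDGS  # deferred so to_links() works without ddgs

    return DDGS().text(query, max_results=max_results)


def to_links(results: list[dict]) -> list[tuple[str, str]]:
    """Keep (title, href) from each result, assuming those keys are present."""
    return [(r.get("title", ""), r.get("href", "")) for r in results]
```

Typical use would be `to_links(ddg_search("python asyncio tutorial"))`. This needs no API key or separate service, which matches the occasional-use case in the question, at the cost of depending on an unofficial scraper.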
pwg 3 days ago
> and found that most of them block the use of curl

Try again, but have curl provide a user agent string from one of the real browsers. You'll likely find that the request goes through.
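With curl itself this is the `-A` (`--user-agent`) option. The same idea in Python's standard library is sketched below: attach a browser-style `User-Agent` (plus typical `Accept` headers) to the request instead of the default `Python-urllib/x.y` identifier. The specific UA string is only an example we picked, and whether a particular engine accepts it is not guaranteed.

```python
import urllib.request

# Example desktop-browser User-Agent string (an assumption, not a recommendation).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def browser_like_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a GET request that identifies as a regular browser, not urllib."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": user_agent,
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "en-US,en;q=0.9",
        },
    )
```

Fetch with `urllib.request.urlopen(browser_like_request(url)).read()`. Note that `urllib.request.Request` normalizes header names, so the stored key is `User-agent` when reading it back with `get_header`.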