| ▲ | simonw 11 hours ago |
| I'd love to know what search engine provider they're using under the hood for this. I asked them on Twitter and didn't get a reply (yet): https://twitter.com/simonw/status/1971210260015919488 Crucially, I want to understand the license that applies to the search results. Can I store them? Can I re-publish them? Different providers have different rules about this. |
|
| ▲ | kingnothing 9 hours ago | parent | next [-] |
| You can say you're training an AI model and do whatever you want with it. |
| |
| ▲ | theshrike79 11 minutes ago | parent [-] | | The "Zuckerberg defence". It's OK to pirate a massive amount of books if you're not reading or sharing, but rather just training an AI. |
|
|
| ▲ | mchiang 10 hours ago | parent | prev | next [-] |
| We work with search providers and ensure that we have zero data retention policies in place. The search results are yours to own and use. You are free to do what you want with them. Of course, you are bound by the local laws of the legal jurisdiction you are in. |
| |
| ▲ | simonw 7 hours ago | parent [-] | | OK, so it looks like you aren't willing to share which providers you are working with. Can you share the rationale for not sharing that information instead? | | |
| ▲ | mchiang 5 hours ago | parent [-] | | We have relationships with many providers and I don't want to be seen as promoting or not promoting a specific provider. Some decent privacy-preserving vendors: Brave, Exa, Parallel Web Systems, DuckDuckGo, etc. We will continue to monitor what's good to improve the output quality and results. Sometimes a combination of providers could yield even better results. If I name one combination right now, then realize another combination is better and make changes, I would need to broadcast it each time or risk misrepresenting the feature, which is to have amazing search and research capabilities that can augment models for a superior output. | | |
| ▲ | simonw 4 hours ago | parent | next [-] | | The reason I care about this is that different providers have different rules about how I can use the results. Brave: https://api-dashboard.search.brave.com/terms-of-service "Licensee shall not at any time, and shall not permit others to: store the results of the API or any derivative works from the results of the API" Exa: https://exa.ai/assets/Exa_Labs_Terms_of_Service.pdf "You may not [...] download, modify, copy, distribute, transmit, display, perform, reproduce, duplicate, publish, license, create derivative works from, or offer for sale any information contained on, or obtained from or through, the Services, except for temporary files that are automatically cached by your web browser for display purposes" Many of the things I want to do with a search API are blocked by these rules! So I need to know which rules I am subject to. | | |
| ▲ | userbinator 3 hours ago | parent [-] | | (IANAL) You can normally safely ignore such things. | | |
| ▲ | jrvarela56 2 hours ago | parent [-] | | I agree with you in spirit, but that’s not an answer you can apply when there’s someone else’s money at stake. |
|
| |
| ▲ | dcreater 4 hours ago | parent | prev [-] | | This information is very useful to the open source community. What's the rationale for not "building in the public"? Is Ollama turning its back on the open source community? Also, why should we believe Ollama web search is better than my locally run searxng server? | | |
| ▲ | mchiang 3 hours ago | parent [-] | | Oh yes! That is why I want to provide the names of the providers we use. I do believe in building in the open. The web search functionality has a very generous free tier (it is behind Ollama's free account to prevent abuse) that lets you try it and compare it against running a searxng server locally. On running the search functionality locally: we considered it and gave it a try, but had trouble with result quality and with websites blocking Ollama for running a crawler. Using a hosted API, we can get results for users much faster. I'd want us to revisit this at some point. I believe in having the power of local. | | |
|
| ▲ | userbinator 3 hours ago | parent | prev | next [-] |
| You should ask if search results are even copyrightable, if they are just a list of links. |
|
| ▲ | apimade 9 hours ago | parent | prev [-] |
| It is strange to launch this type of functionality without even a privacy policy in place. It makes me wonder if they’ve partnered with one of their VC’s other portfolio companies that has recently had a cash injection, and they’re being used as a design partner/customer story. Exa would be my bet. YC backed them early, and they’ve also just closed an $85M Series B. Bing would be too expensive to run freely without a Microsoft partnership. Get on that privacy notice soon, Ollama. You’re HQ’d in CA, you’re definitely subject to CCPA. (You don’t need revenue to be subject to this, just being a data controller for 50,000 Californian residents is enough.) https://oag.ca.gov/privacy/ccpa I can imagine the reaction if it turns out the zero-retention provider backing them ended up being Alibaba. |