|
| ▲ | bayindirh 3 hours ago | parent | next [-] |
| It's already being trained on "public" (ethical or otherwise) data. So, it already has ingested that kind of "optimization" during pre-training and training. I don't think you can fine-tune your way out of it. |
| |
| ▲ | fsflover 2 hours ago | parent | next [-] | | This is far from widespread at the moment, so it'll be possible to at least use the current cutting-edge models locally in the future. | | |
| ▲ | bayindirh 2 hours ago | parent [-] | | Far from widespread? SEO has seeped into every crevice of the internet over the last 20 years. |
| |
| ▲ | ToucanLoucan 2 hours ago | parent | prev [-] | | People still think these things are smart. That if their word generator eats enough of the Internet, it will somehow give them the real information that's otherwise hidden. Or perhaps a better word: filter the bullshit. To filter bullshit it would first have to understand bullshit, and it doesn't. That's why an LLM will give you a solution that doesn't work, and argue with you when you correct it. | | |
| ▲ | bayindirh 2 hours ago | parent [-] | | This is what bothers me a lot. For the people who don't know how it's made, or who want to believe, it's a miracle. For me, it's a resource-wasting text generator. I'll not lie, I don't use OpenAI, Mistral or Anthropic's models, even for coding. I prefer to read my API docs and cry once. I used Gemini five or six times in total. Twice I asked a couple of very specific things, and it unearthed them. Since they were not products, but information, that was helpful. Twice, it has given wrong information. When I "told" it there was another way, it said "of course there are two ways", etc. Tasteless and time-wasting. I don't like using an LLM all day long, or offloading my thinking to them. It's the ultimate self-poisoning incident. And as you say, these algorithms can't know right/wrong/logical/bullshit, etc. They just spew out text. |
|
|
|
| ▲ | rplnt 3 hours ago | parent | prev | next [-] |
| That doesn't solve this particular problem. Your local model was trained on reddit comments written by bots. |
|
| ▲ | Schweigerose 3 hours ago | parent | prev | next [-] |
| How do you make sure that the model you run locally is not tainted? Is there even a way to confirm this without providing the complete training set? |
| |
| ▲ | psb5 2 hours ago | parent [-] | | Fwiw I just run kiwix/zeal locally which has old school search index of all articles in wiki/stackoverflow etc. That seems enough for most of my day to day use. |
|
|
| ▲ | soloto 3 hours ago | parent | prev | next [-] |
| Local AI will have the bias that existed at the time of its training, which is different from no bias. For stuff that needs to be current, a local LLM would need to search the net regardless. |
| |
| ▲ | embedding-shape 2 hours ago | parent [-] | | And since "no bias" isn't something that actually exists in reality when it comes to language or even anything near humans, "bias in local model I can introspect" will always be miles ahead of "bias I know is there, but cannot introspect". |
|
|
| ▲ | jondea 3 hours ago | parent | prev | next [-] |
| It's less compromised, but it's still basing the answer on compromised queries. This is why I pay for independent reviews (e.g. Which), where their incentives are more aligned with yours. |
|
| ▲ | 2 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | rdtsc 3 hours ago | parent | prev | next [-] |
| Not if the models come from Google. The ads will be implicit in the model. "X is better than Y and Z" would be easy to add to the training set. |
| |
| ▲ | pautasso 34 minutes ago | parent [-] | | Does this mean the model must be retrained every time a new ad is posted? How much are AI ads going to cost? | | |
| ▲ | rdtsc 20 minutes ago | parent [-] | | Yeah, I meant not individual ads but implicit forced/influenced preference for certain brands. Let’s say it always picks Coke vs Pepsi when giving an example of a soft drink. Or picks BMW when asked to pick the best car. Which cloud provider is the best? -Why, GCP of course, etc. Companies then get to bid for a preference “place”. This is more like Google paying to be the search engine default in Firefox. |
|
|
|
| ▲ | FergusArgyll 3 hours ago | parent | prev | next [-] |
| How does that help if it's using search? You get whatever the search engine outputs |
|
| ▲ | weird-eye-issue 2 hours ago | parent | prev [-] |
| Local AI models pull in search results just like ChatGPT does... And they are trained on web data just like any other model... |