▲ | al_borland 2 days ago |
AI is being influenced by all that noise. It isn't necessarily going to an authoritative source; it's looking at Reddit and SEO slop and using that to come up with an answer. We need AI that's trained exclusively on verified data, not random websites and internet comments.
▲ | jval43 2 days ago | parent | next [-]
I asked Gemini for some Ikea furniture dimensions and it gave seemingly correct answers, until they suddenly didn't make sense. It turns out all the information it gave me came from old Reddit posts, and much of it was factually wrong. Gemini nevertheless linked some official Ikea pages as the "sources". It'll straight up lie to you and then hide where it actually got its info from. Usually Reddit.
▲ | sothatsit 2 days ago | parent | prev | next [-]
Creating better datasets would also help improve model performance, I would assume. Unfortunately, the cost of producing high-quality datasets at sufficient scale seems prohibitive today. I'm hopeful this will become possible in the future, though, maybe through a mix of 1) using existing LLMs to help humans filter existing internet-scale datasets, and/or 2) new breakthroughs that make model training more data-efficient.
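The first idea (LLM-assisted filtering) might look roughly like the sketch below. `quality_score` is a stand-in for a call to an LLM judge; the crude heuristic here is purely a placeholder, and the threshold and helper names are assumptions, not any real pipeline's API:

```python
# Illustrative sketch of LLM-assisted dataset filtering.
# quality_score is a placeholder for an LLM-based quality judge;
# the heuristic below (length + all-caps penalty) is not a real scorer.

def quality_score(doc: str) -> float:
    """Score a document from 0.0 (junk) to 1.0 (high quality)."""
    words = doc.split()
    if not words:
        return 0.0
    # Penalize very short documents and shouty all-caps text.
    length_ok = min(len(words) / 50, 1.0)
    caps_ratio = sum(w.isupper() for w in words) / len(words)
    return max(0.0, length_ok - caps_ratio)

def filter_corpus(docs: list[str], threshold: float = 0.5) -> list[str]:
    """Keep documents scoring above the threshold; in practice,
    borderline cases would be routed to human reviewers."""
    return [d for d in docs if quality_score(d) >= threshold]
```

The point is the division of labor: a cheap model-based judge does the bulk triage over billions of documents, and humans only audit samples and edge cases.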
▲ | heavyset_go 2 days ago | parent | prev [-]
It'll still hallucinate.