| ▲ | imoreno 21 hours ago |
| Yes let's not say what's wrong with the tech, otherwise someone might (gasp) fix it! |
|
| ▲ | rybosworld 21 hours ago | parent | next [-] |
| Tuning the model output to perform better on certain prompts is not the same as improving the model. It's valid to worry that the model makers are gaming the benchmarks. If you think that's happening and you want to personally figure out which models are really the best, keeping some prompts to yourself is a great way to do that. |
| |
| ▲ | namaria 12 hours ago | parent | next [-] | | Keeping your questions to yourself is no guarantee that no one else has published something similar. This is bad reasoning all the way through. The problem is in trying to use a question as a benchmark. The only way to really compare models is to create a set of tasks of increasing compositional complexity and run the models you want to compare through them. And you'd have to come up with a new body of tasks each time a new model is published. Providers will always game benchmarks because they are a fixed target. If LLMs were developing general reasoning, that would be unnecessary. The fact that providers do is evidence that there is no general reasoning, just second order overfitting (loss on token prediction does descend, but that doesn't prevent the 'reasoning loss' from being uncontrollable: cf. 'hallucinations'). | | |
| ▲ | genewitch 9 hours ago | parent [-] | | > Providers will always game benchmarks because they are a fixed target. If LLMs were developing general reasoning, that would be unnecessary. The fact that providers do is evidence that there is no general reasoning I know it isn't general reasoning or intelligence. I like where this line of reasoning seems to go. Nearly every time I use a chat AI it has lied to me. I can verify code easily, but it is much harder to verify that the three "SMA but works at cryogenic temperatures" materials it claims exist actually do not exist, or are not what it says they are. But that doesn't help to explain to someone else who just uses it as a way to emotionally dump, or an 8 year old that can't parse reality well, yet. In addition, I'm not merely interested in reasoning, I also care about recall, and factual information recovery is spotty on all the hosted offerings, and therefore on the local offerings too, as those are much smaller. I'm typing on a phone and this is a relatively robust topic. I'm happy to elaborate. | | |
| ▲ | namaria 6 hours ago | parent [-] | | I sympathize, but I feel like this is hopeless. There are numerous papers about the limits of LLMs, theoretical and practical, and every day I see people here on this technology forum claiming that they reason and that they are sound enough to build products on... It feels disheartening. I have been very involved in debating this for the past couple of weeks, which led me to read lots of papers and that's cool, but also feels like a losing battle. Every day I see more bombastic posts, breathless praise, projects based on LLMs etc. |
|
| |
| ▲ | ls612 21 hours ago | parent | prev [-] | | Who’s going out of their way to optimize for random HNers' informal benchmarks? | | |
| ▲ | bluefirebrand 20 hours ago | parent | next [-] | | Probably anyone training models who also browses HN? So I would guess every single AI being made currently | |
| ▲ | umanwizard 18 hours ago | parent | prev | next [-] | | They're probably not going out of their way, but I would assume all mainstream models have HN in their training set. | |
| ▲ | ofou 20 hours ago | parent | prev [-] | | considering the number of bots on HN, not really that much |
|
|
|
| ▲ | aprilthird2021 19 hours ago | parent | prev | next [-] |
| All the people in charge of the companies building this tech explicitly say they want to use it to fire me, so yeah why is it wrong if I don't want it to improve? |
|
| ▲ | idon4tgetit 20 hours ago | parent | prev [-] |
| "Fix". So long as the grocery store has groceries, most people will not care what a chat bot spews. This forum is full of syntax- and semantics-obsessed loonies who think the symbolic logic represents the truth. I look forward to being able to use my own creole to manipulate a machine's state to act like a video game or a movie rather than rely on the special literacy of other typical copy-paste middle class people. Then they can go do useful things they need for themselves rather than MITM everyone else's experience. |
| |
| ▲ | genewitch 9 hours ago | parent | next [-] | | A third meaning of creole? Huh, I did not know it meant something other than a cooking style and a people in Louisiana (mainly). As in, I did not know it was a more generic term. Also, in the context you used it, it seems to mean a pidgin that becomes a semi-official language? I also seem to remember that something to do with pit bbq or grilling has creole as a byproduct - distinct from creosote. You want creole because it protects the thing in which you cook as well as imparts flavor, maybe? Maybe I have to ask a Cajun. | | |
| ▲ | namaria 5 hours ago | parent | next [-] | | Pidgin and creole (language) are concepts that have some similarities but don't fully overlap. "Creole" has colonial overtones. It might be a word of Portuguese origin that means something to the effect of an enslaved person who is a house servant raised by the family they serve ('crioulo', a diminutive derivative of 'cria', meaning 'youngling' - in Neapolitan the word 'criatura' is still used to refer to children). More well documented is its use in parts of Spanish South America, where 'criollo' designated South Americans of Spanish descent initially. The meaning has since drifted in different South American countries. Nowadays it is used to refer, amongst other things, to languages that are formed by the contact between the languages of colonial powers and local populations. As for the relationship of 'creole' and 'creosote' the only reference I could find is to 'creolin', a disinfectant derived from 'creosote', which is itself derived from tars. Pidgin is a term used for contact languages that develop between speakers of different languages and somewhat deriving from both, and is believed to be a word that originated in 19th century Chinese port towns. The word itself is believed to be a 'pidgin' word, in fact! Cajun is also a fun word, because it apparently derives from 'Acadien', the French word for Acadian - people of French origin who were expelled from their colony of Acadia in Canada. Some of them ended up in Louisiana and the French Canadian pronunciation "akad͡zjɛ̃", with a more 'soft' (dunno the proper word, I can feel my linguist friend judging me) "d" sound than the French pronunciation "akadjɛ̃", eventually got abbreviated and 'softened' to 'cajun'. Languages are fun! | |
| ▲ | idiotsecant 5 hours ago | parent | prev [-] | | Creole is an example of 'a creole' |
| |
| ▲ | ethersteeds 12 hours ago | parent | prev [-] | | Go get em tiger! |
|