| ▲ | bhadass 5 hours ago |
| better mental model: it's a lossy compression of human knowledge that can decompress and recombine in novel (sometimes useful, sometimes sloppy) ways. classical search simply retrieves; llms can synthesize as well. |
|
| ▲ | RhythmFox 4 hours ago | parent | next [-] |
This isn't strictly better to me. It captures some intuitions about how a neural network ends up encoding its inputs over time in a 'lossy' way (it doesn't store previous input states in an explicit form). Maybe saying 'probabilistic compression/decompression' would make it a bit more accurate? I don't really see how calling it compression/decompression connects to your 'synthesize' claim at the very end, but I am curious whether you had a specific reason to use the term. |
| |
| ▲ | XenophileJKO 3 hours ago | parent [-] | | It's really way more interesting than that. The act of compression builds up behaviors/concepts of greater and greater abstraction. Another way you could think about it is that the model learns to extract commonality, hence the compression. What this means is that because it is learning higher-level abstractions AND the relationships between those abstractions, it can ABSOLUTELY learn to infer or apply things way outside its training distribution. |
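A minimal sketch of that "extract commonality" point, assuming a toy PyTorch autoencoder stands in for the model and synthetic factor-generated data stands in for text (nothing here is from the thread, it's just an illustration):

    # Data is generated from 3 hidden factors but presented as 20 noisy, redundant
    # measurements. Squeezing it through a 3-unit bottleneck only reconstructs well
    # if the network rediscovers something like those shared factors.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n_samples, n_factors, n_obs = 2048, 3, 20
    factors = torch.randn(n_samples, n_factors)        # the hidden "concepts"
    mixing = torch.randn(n_factors, n_obs)             # how they appear in raw data
    data = factors @ mixing + 0.05 * torch.randn(n_samples, n_obs)

    encoder = nn.Sequential(nn.Linear(n_obs, 8), nn.ReLU(), nn.Linear(8, n_factors))
    decoder = nn.Sequential(nn.Linear(n_factors, 8), nn.ReLU(), nn.Linear(8, n_obs))
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

    for step in range(2000):
        recon = decoder(encoder(data))
        loss = ((recon - data) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(f"reconstruction error through a {n_factors}-dim bottleneck: {loss.item():.4f}")

An LLM is obviously not an autoencoder, but the pressure is similar: limited capacity plus a prediction objective rewards representations that capture shared structure rather than raw inputs.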
|
|
| ▲ | andy99 3 hours ago | parent | prev | next [-] |
| No, this describes the common understanding of LLMs and adds little to just calling it AI. Search is the more accurate model when considering their actual capabilities and understanding their weaknesses. “Lossy compression of human knowledge” is marketing. |
| |
| ▲ | XenophileJKO 3 hours ago | parent [-] | | It is fundamentally and provably different from search because it captures things along two dimensions that can be used combinatorially to infer desired behavior for unobserved examples. 1. Conceptual Distillation - Research has shown that we can find weights that capture/influence outputs aligned with higher-level concepts. 2. Conceptual Relations - The internal relationships capture how these concepts relate to each other. This is how the model can perform tasks and infer information way outside of its training data: if the details map to concepts, then the conceptual relations can be used to infer desirable output. (The conceptual distillation also appears to include meta-cognitive behavior, as evidenced by Anthropic's research. Which makes sense to me: what is the most efficient way to be able to replicate irony and humor for an arbitrary subject? Compressing some spectrum of meta-cognitive behavior...) | | |
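For the "conceptual distillation" point, a rough sketch in the spirit of steering-vector / activation-addition work; the layer, the activations, and the "concept" below are placeholders, not any specific model or paper:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    d_model = 64
    # Stand-in for one hidden layer of a trained network.
    layer = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))

    # Pretend these are hidden activations collected on inputs that do / don't
    # express some concept (say, a formal tone). Random placeholders here; in
    # practice you would record them from a real model.
    acts_with = torch.randn(128, d_model) + 0.5
    acts_without = torch.randn(128, d_model)

    # "Concept direction": mean difference between the two activation sets.
    direction = acts_with.mean(0) - acts_without.mean(0)
    direction = direction / direction.norm()

    def steer(module, inputs, output, alpha=4.0):
        # Nudge the hidden state along the concept direction at inference time.
        return output + alpha * direction

    x = torch.randn(1, d_model)
    plain = layer(x)
    handle = layer.register_forward_hook(steer)
    steered = layer(x)
    handle.remove()
    print("shift along concept direction:", ((steered - plain) @ direction).item())

The claim above maps onto that kind of result: directions found inside real models appear to line up with recognizable concepts and can be used to push outputs around, which is part of what makes this look different from lookup.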
| ▲ | kylecazar an hour ago | parent [-] | | Aren't the conceptual relations you describe still, at their core, just search (even if that's extremely reductive)? We know models can interpolate well, but it's still the same probabilistic pattern matching. They identify conceptual relationships based on associations seen in vast training data. It's my understanding that models are still not at all good at extrapolation, i.e. handling data "way outside" of their training set. Also, I was under the impression LLMs can replicate irony and humor simply because that text has specific stylistic properties, and they've been trained on it. |
|
|
|
| ▲ | andrei_says_ 5 hours ago | parent | prev | next [-] |
| “Novel” to the person who has not consumed the training data. Otherwise, just training data combined in highly probable ways. Not quite autocomplete but not intelligence either. |
| |
| ▲ | pc86 4 hours ago | parent | next [-] | | What is the difference between "novel" and "novel to someone who hasn't consumed the entire corpus of training data, which is several orders of magnitude greater than any human being could consume"? | | |
| ▲ | adrian_b 3 hours ago | parent | next [-] | | The difference is that when you do not know how a problem can be solved, but you know that this kind of problem has been solved countless times before by various programmers, it is likely that asking an AI coding assistant for a solution will get you an acceptable one. On the other hand, if the problem you have to solve has never been solved before at a quality satisfactory for your purpose, then it is futile to ask an AI coding assistant for a solution, because it is pretty certain that the proposed solution will be unacceptable (unless the AI succeeds in duplicating the performance of a monkey that types out a Shakespearean text by hitting keys at random). | |
| ▲ | szundi 3 hours ago | parent | prev [-] | | [dead] |
| |
| ▲ | soulofmischief 4 hours ago | parent | prev [-] | | Citation needed that grokked capabilities in a sufficiently advanced model cannot combinatorially lead to contextually novel output distributions, especially with a skilled guiding hand. | | |
| ▲ | arcanemachiner 4 hours ago | parent [-] | | Pretty sure burden of proof is on you, here. | | |
| ▲ | soulofmischief 4 hours ago | parent [-] | | It's not, because I haven't ruled out the possibility. I could share anecdata about how my discussions with LLMs have led to novel insights, but it's not necessary. I'm keeping my mind open, but you're asserting an unproven claim that is currently not community consensus. Therefore, the burden of proof is on you. | | |
| ▲ | adrian_b 3 hours ago | parent [-] | | I agree that after discussions with an LLM you may be led to novel insights. However, such insights are not novel due to the LLM, but due to you. The "novel" insights are either novel only to you, because they belong to something you have not studied before, or they are novel ideas generated by yourself as a consequence of your attempts to explain what you want to the LLM. It is very common to be led to novel insights about something you believed you already understood well only after trying to explain it to another human who is ignorant of it, when you may discover that your supposed understanding was actually incorrect or incomplete. | |
| ▲ | soulofmischief 2 hours ago | parent [-] | | The point is that the combined knowledge/process of the LLM and a user (which could be another LLM!) led to it walking the manifold in a way that produced a novel distribution for a given domain. I talk with LLMs for hours out of the day, every single day. I'm deeply familiar with their strengths and shortcomings on both a technical and intuitive level. I push them to their limits and have definitely witnessed novel output. The question remains: just how novel can this output be? Synthesis is a valid way to produce novel data. And beyond that, we are teaching these models general problem-solving skills through RL, and it's not absurd to consider the possibility that a good enough training regimen can impart deduction/induction skills into a model that are powerful enough to produce novel information even via means other than direct synthesis of existing information. Especially when given affordances such as the ability to take notes and browse the web. | |
| ▲ | irishcoffee an hour ago | parent [-] | | > I push them to their limits and have definitely witnessed novel output. I’m quite curious what these novel outputs are. I imagine the entire world would like to know of an LLM producing completely new, never-before-created outputs which no human has thought of before. Here is where I get completely hung up. Take 2+2. An LLM has never had 2 groups of two items and reached the enlightenment of 2+2=4. It only knows that because it was told that. If enough people start putting 2+2=3 on the internet, who knows what the LLM will spit out. There was that example a ways back where an LLM would happily suggest all humans should eat 1 rock a day. Amusingly, even _that_ wasn’t a novel idea for the LLM; it simply regurgitated what it scraped from a website about humans eating rocks. Which leads to the crux: how much patently false information have LLMs scraped? | |
| ▲ | soulofmischief an hour ago | parent [-] | | This is not a correct approximation of what happens inside an LLM. They form probabilistic logical circuits which approximate the world they have learned through training. They are not simply recalling stored facts. They are exploiting organically produced circuitry, walking a manifold, which leads to the ability to predict the next state in a staggering variety of contexts. One example: https://arxiv.org/abs/2301.05217 (a mechanistic-interpretability study of how a small transformer learns modular addition). It's not hard to imagine that a sufficiently developed manifold could theoretically allow LLMs to interpolate or even extrapolate information that was missing from the training data, but is logically or experimentally valid. | |
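To make the linked result concrete, here is a stripped-down version of that style of experiment. The paper trains a 1-layer transformer; the one-hot MLP, the 30% split, and the hyperparameters below are simplifications of my own, and the "grokking" jump in held-out accuracy typically only appears after long training with weight decay:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    p = 97

    # All (a, b) pairs and their answers under addition mod p.
    pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
    labels = (pairs[:, 0] + pairs[:, 1]) % p
    perm = torch.randperm(len(pairs))
    split = int(0.3 * len(pairs))                  # train on 30% of all pairs
    train_idx, test_idx = perm[:split], perm[split:]

    def encode(ab):
        # concatenated one-hot encodings of a and b
        return torch.cat([F.one_hot(ab[:, 0], p), F.one_hot(ab[:, 1], p)], dim=1).float()

    model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

    for step in range(20000):
        logits = model(encode(pairs[train_idx]))
        loss = F.cross_entropy(logits, labels[train_idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 2000 == 0:
            with torch.no_grad():
                acc = (model(encode(pairs[test_idx])).argmax(1) == labels[test_idx]).float().mean()
            print(f"step {step}: train loss {loss.item():.3f}, held-out accuracy {acc.item():.3f}")

    # If held-out accuracy climbs well above chance (1/97), the network is not
    # recalling stored pairs -- it has found structure that covers pairs it
    # never saw, which is the point being argued above.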
| ▲ | emp17344 34 minutes ago | parent [-] | | You could find a pre-print on Arxiv to validate practically any belief. Why should we care about this particular piece of research? Is this established science, or are you cherry-picking low-quality papers? | | |
| ▲ | soulofmischief 26 minutes ago | parent [-] | | I don't need to reach far to find preliminary evidence of circuits forming in machine learning models. Here's some research from OpenAI researchers exploring circuits in vision models: https://distill.pub/2020/circuits/ Is that enough to meet your arbitrary quality bar? Circuits are the basis for features. There is still a ton of open research on this subject. I don't care what you care about; the research is still being done, and it's not a new concept. |
|
| ▲ | DebtDeflation 3 hours ago | parent | prev [-] |
| Information Retrieval followed by Summarization is how I view it. |
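One way to make that view concrete is a toy retrieve-then-summarize sketch; the corpus, the scoring, and the stubbed-out summarizer below are all illustrative, and a production pipeline would hand the retrieved passages to an LLM at the last step:

    from collections import Counter
    import math

    docs = [
        "Classical search engines rank documents by matching query terms.",
        "Large language models generate text by predicting the next token.",
        "Lossy compression discards detail while keeping overall structure.",
    ]

    def embed(text):
        # bag-of-words term counts as a crude stand-in for real embeddings
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(query, k=2):
        q = embed(query)
        return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

    def summarize(passages, query):
        # Placeholder: a real pipeline would pass these passages plus the query
        # to a language model and get back synthesized prose.
        return " ".join(passages)

    query = "how do language models differ from search"
    print(summarize(retrieve(query), query))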