| ▲ | tripletao a day ago |
| Here's Chomsky quoted in the article, from 1969: > But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term. He was impressively early to the concept, but I think even those skeptical of the ultimate value of LLMs must agree that his position has aged terribly. That seems to have been a fundamental theoretical failing rather than the computational limits of the time, if he couldn't imagine any framework in which a novel sentence had probability other than zero. I guess that position hasn't aged worse than his judgment of the Khmer Rouge (or Hugo Chavez, or Epstein, or ...) though. There's a cult of personality around Chomsky that's in no way justified by any scientific, political, or other achievements that I can see. |
|
| ▲ | thomassmith65 a day ago | parent | next [-] |
| I agree that Chomsky's influence, especially in this century, has done more harm than good. There's no point minimizing his intelligence and achievements, though. His linguistics work (eg: grammars) is still relevant in computer science, and his cynical view of the West has merit in moderation. |
| |
| ▲ | tripletao a day ago | parent [-] | | If Chomsky were known only as a mathematician and computer scientist, then my view of him would be favorable for the reasons you note. His formal grammars are good models for languages that machines can easily use, and that many humans can use with modest effort (i.e., computer programming languages). The problem is that they're weak models for the languages that humans prefer to use with each other (i.e., natural languages). He seems to have convinced enough academic linguists otherwise to doom most of that field to uselessness for his entire working life, while the useful approach moved to the CS department as NLP. As to politics, I don't think it's hard to find critics of the West's atrocities with less history of denying or excusing the West's enemies' atrocities. He's certainly not always wrong, but on net he's an unfortunate choice of figurehead. | | |
| ▲ | thomassmith65 18 hours ago | parent [-] | | I have the feeling we're focusing on different time periods. Chomsky already was very active and well-known by 1960. He pioneered areas in Computer Science, before Computer Science was a formal field, that we still use today. His political views haven't changed much, but they were beneficial back when America was more naive. They are harmful now only because we suffer from an absurd excess of cynicism.* How would you feel about Chomsky and his influence if we ignored everything past 1990 (two years after Manufacturing Consent)? --- * Just imagine if Nixon had been president in today's environment... the public would say "the tapes are a forgery!" or "why would I believe establishment shills like Woodward and Bernstein?" Too much skepticism is as bad as too little. | | |
| ▲ | thomassmith65 17 hours ago | parent | next [-] | | I wrote "when America was more naive" but that isn't entirely correct. Americans are more naive today in certain areas. If my comment weren't locked, I would change that sentence to something like "when Americans believed most of what they read in the newspaper" | |
| ▲ | tripletao 9 hours ago | parent | prev | next [-] | | I agree that his contributions to proto-computer-science were real and significant, though I think they're also overstated. Note the link to the Wikipedia page for BNF elsewhere in these comments. There's no evidence that Backus or Naur were aware of Chomsky's ideas vs. simply reinventing them, and Knuth argues that an ancient Indian Sanskrit grammarian deserves priority anyways. I think Chomsky's political views were pretty terrible, especially before 1990. He spoke favorably of the Khmer Rouge. He dismissed "Murder of a Gentle Land", one of the first Western reports of their mass killing, as a "third rate propaganda tract". As the killing became impossible to completely deny, he downplayed its scale. Concern for human rights in distant lands tends to be a left-leaning concept in the West, but Chomsky's influence neutralized that here. This contributed significantly to the West's indifference, and the killing continued. (The Vietnamese communists ultimately stopped it.) Anyone who thinks Chomsky had good political ideas should read the opinions of Westerners in Cambodia during that time. I'm not saying he didn't have other good ideas; but how many good ideas does it take to offset 1.5-2M deaths? | | |
| ▲ | thomassmith65 8 hours ago | parent [-] | | Judging by that comment, you probably know more about him than I do. I won't try to rebut it, but I enjoyed reading it. |
| |
| ▲ | jeremyjh 16 hours ago | parent | prev [-] | | > Just imagine if Nixon had been president in today's environment... the public would say "the tapes are a forgery!" or "why would I believe establishment shills like Woodward and Bernstein?" Too much skepticism is as bad as too little. Today it would not matter in the least if the president were understood to have covered up a conspiracy to break into the DNC headquarters. Much worse things have been dismissed or excused. Most of his party would approve of it and the rest would support him anyway so as not to damage "their side". |
|
|
|
|
| ▲ | dleeftink a day ago | parent | prev | next [-] |
| > novel sentence The question then becomes one of actual novelty versus the learned joint probabilities of internalised sentences/phrases/etc. Generation or regurgitation? Is there a difference to begin with? |
| |
| ▲ | tripletao a day ago | parent [-] | | I'm not sure what you mean? As the length of a sequence increases (from word to n-gram to sentence to paragraph to ...), the probability that it actually ever appeared (in any corpus, whether that's a training set on disk, or every word ever spoken by any human even if not recorded, or anything else) quickly goes to exactly zero. That makes it computationally useless. If we define perplexity in the usual way in NLP, then that probability approaches zero as the length of the sequence increases, but it does so smoothly and never reaches exactly zero. This makes it useful for sequences of arbitrary length. This latter metric seems so obviously better that it seems ridiculous to me to reject all statistical approaches based on the former. That's with the benefit of hindsight for me; but enough of Chomsky's less famous contemporaries did judge correctly that I get that benefit, that LLMs exist, etc. | | |
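The contrast can be sketched with a toy example. The corpus and function names below are invented for illustration: the exact-match notion assigns zero to any novel sentence, while a smoothed per-token model of the kind perplexity is built on assigns a small but positive probability.

```python
# Toy contrast: exact-match "probability of a sentence" vs. a smoothed
# per-token probability. Corpus and names are illustrative only.
corpus = ["the cat sat", "the dog sat", "the cat ran"]

def exact_match_probability(sentence):
    # Relative frequency of the whole sentence in the corpus:
    # exactly zero for anything novel.
    return corpus.count(sentence) / len(corpus)

def smoothed_probability(sentence):
    # Add-one (Laplace) smoothed unigram model: the probability of a
    # long novel sentence shrinks smoothly but never reaches zero.
    words = [w for s in corpus for w in s.split()]
    vocab = set(words) | set(sentence.split())
    p = 1.0
    for w in sentence.split():
        p *= (words.count(w) + 1) / (len(words) + len(vocab))
    return p

novel = "the dog ran"  # never appears verbatim in the corpus
print(exact_match_probability(novel))       # 0.0
print(smoothed_probability(novel) > 0.0)    # True
```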
| ▲ | dleeftink a day ago | parent [-] | | My point is, that even in the new paradigm where probabilistic sequences do offer a sensible approximation of language, would novelty become an emergent feature of said system, or would such a system remain bound to the learned joint probabilities to generate sequences that appear novel, but are in fact (complex) recombinations of existing system states? And again the question being, whether there is a difference at all between the two? Novelty in the human sense is also often a process of chaining and combining existing tools and thought. |
|
|
|
| ▲ | techsystems a day ago | parent | prev | next [-] |
| He did say 'any known' back in 1969, though, so judging the claim against what's known today wouldn't be a fair measure of how the idea has aged. |
| |
| ▲ | tripletao a day ago | parent [-] | | Shannon first proposed Markov processes to generate natural language in 1948. That's inadequate for the reasons discussed extensively in this essay, but it seems like a pretty significant hint that methods beyond simply counting n-grams in the corpus could output useful probabilities. In any case, do you see evidence that Chomsky changed his view? The quote from 2011 ("some successes, but a lot of failures") is softer but still quite negative. |
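Shannon's proposal can be sketched as a minimal bigram Markov generator: learn word-to-word transitions by counting adjacent pairs, then walk the chain. The sample text and function names here are illustrative, not Shannon's.

```python
import random

# Minimal Shannon-style bigram Markov generator: transitions are
# learned by counting adjacent word pairs in a small sample text.
text = "the cat sat on the mat and the dog sat on the rug"
words = text.split()

transitions = {}
for a, b in zip(words, words[1:]):
    transitions.setdefault(a, []).append(b)

def generate(start, length, seed=0):
    # Walk the chain, picking each next word in proportion to how
    # often it followed the current word in the sample text.
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = transitions.get(out[-1])
        if not successors:
            break
        out.append(rng.choice(successors))
    return " ".join(out)

print(generate("the", 6))
```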
|
|
| ▲ | agumonkey a day ago | parent | prev [-] |
| wasn't his grammar classification revolutionary at the time ? it seems it influenced parsing theory later on |
| |
| ▲ | eru a day ago | parent [-] | | His grammar classification is really useful for formal grammars of formal languages. Like what computers and programming languages do. It's of rather limited use for natural languages. | | |
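For example, balanced parentheses (the skeleton of nesting in every programming language) are context-free (Type 2 in Chomsky's hierarchy) but not regular (Type 3). A minimal recursive-descent sketch of the grammar `S -> '(' S ')' S | epsilon`, written here as an illustration:

```python
# Recursive-descent recognizer for the context-free grammar
#   S -> '(' S ')' S | epsilon   (balanced parentheses)
def parse_S(s, i=0):
    # Consume one S starting at index i; return the index after it.
    if i < len(s) and s[i] == '(':
        i = parse_S(s, i + 1)          # inner S
        if i >= len(s) or s[i] != ')':
            raise SyntaxError("expected ')'")
        return parse_S(s, i + 1)       # trailing S
    return i                           # epsilon

def balanced(s):
    try:
        return parse_S(s) == len(s)
    except SyntaxError:
        return False

print(balanced("(()())"))  # True
print(balanced("(()"))     # False
```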
| ▲ | koolala 17 hours ago | parent | next [-] | | "BNF itself emerged when John Backus, a programming language designer at IBM, proposed a metalanguage of metalinguistic formulas ... Whether Backus was directly influenced by Chomsky's work is uncertain." https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form I'm not sure it required Chomsky's work. | | |
| ▲ | eru 5 hours ago | parent [-] | | Oh, lots of stuff gets invented multiple times when it's "in the air". Nothing special about Chomsky here. And I wouldn't see that as detracting from this particular achievement. |
| |
| ▲ | adamddev1 17 hours ago | parent | prev | next [-] | | It's incredibly useful for natural languages. | | |
| ▲ | foldr 10 hours ago | parent [-] | | I'm a big Chomsky nerd, Chomsky fan, and card-carrying ex Chomskyan linguist. I hate to break it to you, but not even Chomsky thought that the Chomsky hierarchy had any very interesting application to natural languages. Amongst linguists who (unlike Chomsky) are still interested in formal language classes, the general consensus these days is that the relevant class is one of the so-called 'mildly context sensitive' ones (see e.g. https://www.kornai.com/MatLing/mcsfin.pdf for an overview). (I suppose I have to state for the record that Chomsky's ties to Epstein are indefensible and that I'm not a fan of his on a personal level.) |
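The standard motivating example for those mildly context-sensitive classes is cross-serial dependencies in Swiss German, which pattern like the copy language { ww }. That language is beyond any context-free grammar, though it's trivial to check procedurally (a toy sketch, not taken from the linked paper):

```python
# Cross-serial dependencies pattern like the copy language
# { ww : w in Sigma* }, which no context-free grammar generates;
# mildly context-sensitive formalisms (e.g. TAG, MCFG) can.
def is_copy(s):
    # True iff the string is some word repeated twice.
    half, rem = divmod(len(s), 2)
    return rem == 0 and s[:half] == s[half:]

print(is_copy("abcabc"))  # True
print(is_copy("abccba"))  # False
```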
| |
| ▲ | ogogmad 20 hours ago | parent | prev [-] | | Don't you think people would have figured it out by themselves the moment programmers started writing parsers? I'm not sure his contribution was particularly needed. | | |
| ▲ | eru 5 hours ago | parent [-] | | Lots of things get invented / discovered multiple times when it's in the air. But just because Newton (or Leibniz) existed doesn't mean Leibniz (or Newton) was any less visionary. For your very specific question: have a look at the sorry state of what's called 'regular expressions' in many programming languages and libraries to see what programmers left loose can do. (Most of these 'regular expressions' add things like back-references that make matching their franken-expressions take exponential time in the worst case, but they neglect to put in stuff like intersection or complement of expressions, which are matchable in linear time.) |
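As a concrete taste of that point: back-references already take these 'regular expressions' outside the regular languages, and matching them can backtrack exponentially in the worst case. A small Python illustration (a toy example, not a benchmark):

```python
import re

# A back-reference (\1) matches whatever group 1 matched, which no
# true regular expression can do. Classic use: spot doubled words.
doubled_word = re.compile(r'\b(\w+) \1\b')

print(bool(doubled_word.search("to err is is human")))  # True
print(bool(doubled_word.search("to err is human")))     # False
```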
|
|
|