Why are your models so big? (2023)(pawa.lt)
14 points by jxmorris12 3 days ago | 1 comment
siddboots 21 minutes ago

I think I have almost the opposite intuition. The fact that attention models are capable of making sophisticated logical constructions within a recursive grammar, even for a simple DSL like SQL, is kind of surprising. I think it's likely that this capability does depend on training on a much larger, more general corpus, and hence demands the same full parameter space needed for conversational writing.