tripplyons 8 hours ago

How does this model compare to just using a linear classifier trained on BGE embeddings?

orderone_ai an hour ago

Thank you for your question!

Because I'm not sure exactly what you're looking for when you say 'compare to' -- whether accuracy, speed, or architecture -- I'll hit all 3, but sorry if it's a bit much.

1. Accuracy: For simple tasks (like sentiment analysis on straightforward examples), it won't be much more accurate than a classical linear classifier, if at all.

1a. Accuracy on more diverse or challenging tasks: Because a linear classifier is just so damned simplistic, it can't handle anything even resembling a reasoning task. Meanwhile, when specifically trained, this architecture managed 8/10 on textual entailment tasks (deciding whether one sentence logically follows from another, e.g. whether "A man is playing a guitar" entails "Someone is making music"), which are generally considered the entry-level gold standard for reasoning ability.

2. Speed: It's slower than a classical classifier, as you'd expect given the ~1B params it's pushing. Both are still pretty much blazing fast, but the tiny classical classifier will definitely be faster.

3. Architecture: Here's where it gets interesting.

The architecture of the core model here differs significantly from a classical linear classifier:

Classical Classifier:

- Input: BGE embedding (in this hypothetical)

- Output: class labels through softmax

- Internal architecture: no nonlinearity, no hidden layers, direct projection
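
For concreteness, that baseline is essentially one matrix multiply. A minimal PyTorch sketch -- the 768-dim BGE embedding size and the 3 classes are just illustrative assumptions:

    import torch
    import torch.nn as nn

    class LinearProbe(nn.Module):
        """The classical baseline: a direct projection from a precomputed
        BGE embedding to class logits -- no hidden layers, no nonlinearity."""
        def __init__(self, embed_dim: int = 768, num_classes: int = 3):
            super().__init__()
            self.proj = nn.Linear(embed_dim, num_classes)

        def forward(self, bge_embedding: torch.Tensor) -> torch.Tensor:
            # Softmax over the logits gives class probabilities.
            return torch.softmax(self.proj(bge_embedding), dim=-1)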

General Classifier:

- Input: BGE embedding

- Output: class labels through nearest-neighbor cosine similarity search over the vocabulary

- Internal architecture: a sparse input projection layer, a layer that combines the 3 inputs after their upward projection, and 14 hidden layers with GELU nonlinearities, layernorms, and skip connections -- all of the standard stuff you'd expect in an LLM, but... not in an LLM.
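
To make that shape concrete, here's a rough PyTorch sketch. This is emphatically not the real model: the hidden size, the dense stand-in for the sparse input projection, and the concat-based combiner are all assumptions for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Block(nn.Module):
        # One hidden layer: layernorm -> linear -> GELU, with a skip connection.
        def __init__(self, dim: int):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.ff = nn.Linear(dim, dim)

        def forward(self, x):
            return x + F.gelu(self.ff(self.norm(x)))

    class GeneralClassifierSketch(nn.Module):
        def __init__(self, embed_dim: int = 768, hidden_dim: int = 2048,
                     num_layers: int = 14):
            super().__init__()
            # Upward projection of each input (sparse in the real model;
            # a plain dense Linear here for simplicity).
            self.up_proj = nn.Linear(embed_dim, hidden_dim)
            # Layer that combines the 3 projected inputs into one state.
            self.combine = nn.Linear(3 * hidden_dim, hidden_dim)
            self.blocks = nn.ModuleList(
                Block(hidden_dim) for _ in range(num_layers))

        def forward(self, a, b, c, vocab_embeddings):
            # a, b, c: the 3 input embeddings, shape (batch, embed_dim);
            # vocab_embeddings: label embeddings, shape (vocab, hidden_dim).
            h = self.combine(torch.cat(
                [self.up_proj(a), self.up_proj(b), self.up_proj(c)], dim=-1))
            for block in self.blocks:
                h = block(h)
            # Predicted label = nearest vocabulary entry by cosine similarity.
            sims = F.cosine_similarity(
                h.unsqueeze(1), vocab_embeddings.unsqueeze(0), dim=-1)
            return sims.argmax(dim=-1)

At ~1B params the real thing is obviously much wider and deeper than this toy, but the flow -- project up, combine, a stack of residual GELU blocks, cosine search against the vocabulary -- is the part that matters for the comparison.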

I hope that clears up your questions! If not, I'm happy to tell you more.