alansammarone 2 days ago
As others have pointed out, LLMs have no trouble with LaTeX. I can see why one might think otherwise - in fact, I made the same assumption myself some time ago. LLMs, via transformers, are exceptionally good at _any_ sequential or one-dimensional data. One very interesting (to me anyway) example is base64 - pick some not-huge sentence (say, 10 words), base64-encode it, and just paste it into any LLM you want, and it will be able to understand it. The same works with hex, ASCII codes, or binary. Here's a sample if you want to try:

aWYgYWxsIEEncyBhcmUgQidzLCBidXQgb25seSBzb21lIEIncyBhcmUgQydzLCBhcmUgYWxsIEEncyBDJ3M/IEFuc3dlciBpbiBiYXNlNjQu

I remember running this experiment some time ago in a context where I was certain there was no possibility of tool use to encode/decode. Nowadays it can be hard to be certain whether there is any tool use, though in some cases, such as Mistral, the response comes back quickly enough to make tool use unlikely.
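If you want to make your own test string, here's a minimal sketch in Python using only the standard library (the sentence is just a placeholder - use whatever you like):

  import base64

  # Encode a short test sentence; paste the output, with no other context, into the LLM.
  sentence = "the quick brown fox jumps over the lazy dog"  # placeholder example
  encoded = base64.b64encode(sentence.encode("utf-8")).decode("ascii")
  print(encoded)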
throwanem a day ago
I've just tried it, in the form of your base64 prompt and no other context, with a local Qwen-3 30B instance that I'm entirely certain is not performing tool use. It produced a correct answer ("Tm8="), which in a moment of accidental comedy it spontaneously formatted with LaTeX.

But it did talk about invoking an online decoder, just before the first appearance of the (nearly) complete decoded string in its CoT. It "left out" the A in its decode and still correctly answered the proposition, either out of reflexive familiarity with the form or via metasyntactic reasoning over an implicit anaphor; I believe I recall this to be a formulation of one of the elementary axioms of set theory, though you will excuse me for omitting its name before coffee, and that makes the pattern-matching possibility seem somewhat more feasible. ('Seem' may work a little too hard there. But a minimally more novel challenge, I think, would be needed to really tell.)

There's lots of text in lots of languages about using an online base64 decoder, and nearly none at all about decoding the representation "in your head," which for humans would be a party trick akin to that one fellow who could see a city from a helicopter for 30 seconds and then perfectly reproduce it on paper from memory. It makes sense to me that a model trained on the Internet would "invent" the "metaphor" of an online decoder here, I think. What in its "experience" serves better as a description?
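(For anyone who wants to check that answer offline rather than trust the model, a quick decode - nothing model-specific here:)

  import base64

  # Decode the model's base64 answer back to plain text.
  print(base64.b64decode("Tm8=").decode("utf-8"))  # prints: No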