khalic 13 hours ago

Another example of the mindf@#$ these systems are: I was doing some fine-tuning on a small model, taking data fields and making a sentence out of them. I was running into mode collapse (basically when the model oversimplifies and always outputs the same thing).

I got unstuck by randomizing the field order for each row at training time?!? And now I'm thinking I should do the same at inference time...
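A minimal sketch of what that randomization could look like (the record fields and serialization format here are made up for illustration, not the commenter's actual data):

```python
import random

def fields_to_example(record, rng):
    """Serialize a record's fields in a shuffled order, so the model
    can't latch onto a single fixed template (one hypothetical way
    to break the mode-collapse pattern described above)."""
    fields = list(record.items())
    rng.shuffle(fields)  # fresh order per row, per epoch
    return "; ".join(f"{k}: {v}" for k, v in fields)

rng = random.Random(0)
record = {"name": "Ada", "city": "London", "job": "engineer"}
print(fields_to_example(record, rng))
print(fields_to_example(record, rng))  # likely a different order
```

Doing the same shuffle at inference time would just mean serializing the input fields in a random order before building the prompt.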

p_stuart82 12 hours ago | parent | next [-]

the irony of modern software engineering: we spent decades perfecting deterministic algorithms, and now we're basically just shaking a black box and hoping the magic rocks align.

darkhorse222 5 hours ago | parent | next [-]

Quantum physics teaches us that at the fundamental level, reality itself is probabilistic. The way probability distributions collapse to discrete outcomes is a nice parallel between LLMs and quantum mechanics.

khalic 11 hours ago | parent | prev | next [-]

It's a little disturbing, but also very fun to just discover by probing, building and breaking.

astrange 5 hours ago | parent | prev [-]

This is an AI bot btw. (sarcasm, metaphor that doesn't make sense)

khalic 4 hours ago | parent [-]

Me or the new account?

astrange 3 hours ago | parent [-]

Not you!

auspiv 10 hours ago | parent | prev | next [-]

apparently you can straight up duplicate/add/rearrange layers without changing any of the weights and get better results as well - https://dnhkng.github.io/posts/rys/
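A toy sketch of the "duplicate layers without retraining" trick from the linked post: the model's layer stack is just a list, and you can splice in a repeated block while the weights themselves stay untouched (the indices here are made up for illustration):

```python
def duplicate_layers(layers, start, end):
    """Return a new layer list with layers[start:end] repeated once,
    leaving every individual layer (and its weights) unchanged."""
    return layers[:end] + layers[start:end] + layers[end:]

layers = ["L0", "L1", "L2", "L3"]
print(duplicate_layers(layers, 1, 3))  # ['L0', 'L1', 'L2', 'L1', 'L2', 'L3']
```

In a real framework the list entries would be transformer blocks rather than strings, but the splice is the same idea.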

quotemstr 6 hours ago | parent | next [-]

Neat!

> This is probably due to the way larger numbers are tokenised, as big numbers can be split up into arbitrary forms. Take the integer 123456789. A BPE tokenizer (e.g., GPT-style) might split it like: ‘123’ ‘456’ ‘789’ or: ‘12’ ‘345’ ‘67’ ‘89’
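That arbitrariness is easy to see with a toy longest-match tokenizer (a deliberately simplified stand-in, not real BPE merges): the same integer splits differently depending purely on which substrings happen to be in the vocab.

```python
def greedy_tokenize(text, vocab, max_len=3):
    """Toy longest-match tokenizer: at each position, take the longest
    vocab piece (falling back to single characters)."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(max_len, 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens

n = "123456789"
print(greedy_tokenize(n, {"123", "456", "789"}))      # ['123', '456', '789']
print(greedy_tokenize(n, {"12", "345", "67", "89"}))  # ['12', '345', '67', '89']
```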

One of the craziest LLM hacks that doesn't get love is https://polymathic-ai.org/blog/xval/

xVal basically says "tokenizing numbers is hard: what if instead of outputting tokens that combine to represent numbers, we just output the numbers themselves, right there in the output embedding?"

It works! Imagine you're discussing math with someone. Instead of saying "x is twenty five, which is large" in words, you'd say "x is", then switch to making a whistling noise whose pitch, within your whistling range, communicated the concept of 25.00 +/- epsilon. Then you'd resume speech and say "which is large".

I think the sentiment is that today's models are big and well-trained enough that receiving and delivering quantities as tokens representing numbers doesn't hurt capabilities much, but I'm still fascinated by xVal's much more elegant approach.

khalic 6 hours ago | parent [-]

I was having some issues with IP address representation; this might solve it.

khalic 9 hours ago | parent | prev [-]

This is crazy, thank you for the link!

toddmorey 12 hours ago | parent | prev [-]

wow that's fascinating