yomismoaqui 6 days ago

Evaluating a 270M model on encyclopedic knowledge is like opening a heavily compressed JPG image and saying "it looks blocky"

littlestymaar 6 days ago | parent | next [-]

What I read above is not an evaluation of “encyclopedic knowledge” though, it's very basic common sense: I wouldn't mind if the model didn't know the name of the biggest mountain on Earth, but if the model cannot grasp the fact that the same mountain cannot simultaneously be #1, #2 and #3, then the model feels very dumb.

K0balt 6 days ago | parent | next [-]

It gave you the tallest mountain every time. You kept asking it for various numbers of “tallest mountains” and each time it complied.

You asked it to enumerate several mountains by height, and it also complied.

It just didn’t understand that when you said the 6 tallest mountains you didn’t mean the tallest mountain, 6 times.

When you used clearer phrasing it worked fine.

It’s 270m. It’s actually a puppy. Puppies can be trained to do cool tricks, bring your shoes, stuff like that.

littlestymaar 6 days ago | parent [-]

> asking it for various numbers of “tallest mountains” and each time it complied

That's not what “second tallest” means though, so this is a language model that doesn't understand natural language…

> You kept asking

Gemma 270m isn't the only one to have reading issues, as I'm not the person who conducted this experiment…

> You asked it to enumerate several mountains by height, and it also complied.

It didn't, it hallucinated a list of mountains (this isn't surprising though, as this is the kind of encyclopedic knowledge such a small model isn't supposed to be good at).

K0balt 5 days ago | parent [-]

Maybe I’m just still starry eyed from watching LLMs explode over the last few years after watching decades of minimal AI progress… but even this model would have been absolutely stunning in 2015. The fact that you could run it effectively in a children’s toy is extremely impressive.

Sure, it’s not a great model out of the box… but it’s not designed to be a generalist, it’s supposed to be a base from which to train narrow experts for simple tasks.

imp0cat 6 days ago | parent | prev | next [-]

It does not work that way. The model does not "know". Here is a very nice explanation of what you are actually dealing with (hint: it's not a toddler-level intelligence): https://www.experimental-history.com/p/bag-of-words-have-mer...

    instead of seeing AI as a sort of silicon homunculus, we should see it as a bag of words.
4b11b4 5 days ago | parent [-]

Even though I had heard of the "bag of words" idea before, this really struck on something I've been searching for:

a framing which could be understood by many, to replace our current consensus (which is currently none).

jama211 6 days ago | parent | prev [-]

It’s a language model, not an actual toddler. They’re specialised tools, and this one is not designed to have broad “common sense” in that way. The fact that you keep using these terms and keep insisting on this demonstrates you don’t understand the use case or implementation details of this well enough to be commenting on it at all, quite frankly.

ezst 5 days ago | parent | next [-]

Not OP, and not intending to be nitpicky, but what's the use/purpose of something like this model? It can't do logic, it's too small to have retained much of its training data (retrievable "facts"), the context is tiny, etc.

jama211 4 days ago | parent [-]

From the article itself (and it’s just one of many use cases it mentions)

- Here’s when it’s the perfect choice: You have a high-volume, well-defined task. Ideal for functions like sentiment analysis, entity extraction, query routing, unstructured to structured text processing, creative writing, and compliance checks.

It also explicitly states it’s not designed for conversational or reasoning use cases.

So basically, to put it in very simple terms: it can churn through large volumes of text you give it for narrow, well-defined tasks really well, among other things.
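
For what it's worth, here's a rough sketch (my own, not from the article) of what one of those high-volume, well-defined tasks might look like in practice, assuming the Hugging Face transformers library and the google/gemma-3-270m-it checkpoint:

    # Illustrative sketch only: a narrow, well-defined task (sentiment routing)
    # on a tiny instruction-tuned model. Assumes the Hugging Face `transformers`
    # library and the `google/gemma-3-270m-it` checkpoint are available.
    from transformers import pipeline

    classify = pipeline("text-generation", model="google/gemma-3-270m-it")

    messages = [{
        "role": "user",
        "content": "Answer with exactly one word, POSITIVE or NEGATIVE: "
                   "'The battery died after two days.'",
    }]

    result = classify(messages, max_new_tokens=5)
    # The pipeline returns the conversation with the model's reply appended.
    print(result[0]["generated_text"][-1]["content"])  # hopefully: NEGATIVE

One focused prompt, a one-word answer, run over a large batch of inputs; nothing conversational about it.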

ezst 3 days ago | parent [-]

Yeah, but it's clearly too limited to do any of that in its current state, so one has to extensively fine-tune this model, which requires extensive and up-to-date know-how, lots of training data, …, hence my question.
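
To be concrete about what that fine-tuning involves: even a minimal parameter-efficient setup already assumes you know which modules to adapt and have a labelled dataset for your task. A rough sketch, assuming the transformers and peft libraries (checkpoint name, target modules and hyperparameters here are just placeholders):

    # Illustrative sketch only: attaching LoRA adapters to a small base model
    # before task-specific training. Assumes `transformers` and `peft`;
    # checkpoint name, target modules and hyperparameters are placeholders.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "google/gemma-3-270m"  # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # Train small low-rank adapter matrices instead of all the weights.
    lora = LoraConfig(
        r=8, lora_alpha=16, task_type="CAUSAL_LM",
        target_modules=["q_proj", "v_proj"],
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()

    # ...and on top of this you still need a labelled dataset and a
    # training loop (Trainer, SFTTrainer, or similar).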

littlestymaar 6 days ago | parent | prev [-]

> they’re specialised tools and this one is not designed to have broad “common sense” in that way.

Except the key property of language models compared to other machine learning techniques is their ability to have this kind of common sense understanding of the meaning of natural language.

> you don’t understand the use case of this enough to be commenting on it at all quite frankly.

It's true that I don't understand the use case for a language model that doesn't have a grasp of what first/second/third mean. Sub-1B models are supposed to be fine-tuned to be useful, but if the base model is so bad at language that it can't tell the difference between first and second, and you need to put that in your fine-tuning as well as your business logic, why use a base model at all?

Also, this is a clear instance of moving the goalpost, as the comment I responded to was talking about how we should not expect such a small model to have “encyclopedic knowledge”, and now you are claiming we should not expect such a small language model to make sense of language…

jama211 6 days ago | parent [-]

Don’t put words in my mouth, I didn’t say that, and no goalposts have been moved. You don’t understand how tiny this model is or what it’s built for. Don’t you get it? This model PHYSICALLY COULDN’T be this small and also have decent interactions on topics outside its specialty. It’s like criticising a go-kart for its lack of luggage-carrying capacity. It’s simply not what it’s built for; you’re just defensive because you know deep down you don’t understand this deeply, which you reveal again and again at every turn. It’s OK to accept the responses of people in this thread who are trying to lead you to the truth of this matter.

littlestymaar 6 days ago | parent [-]

> Don’t you get it? This model PHYSICALLY COULDN’T be this small and also have decent interactions on topics outside its specialty

What is “its specialty”, though? As far as I can tell from the announcement blog post, its specialty is “instruction following”, and this question is literally about following instructions written in natural language and nothing else!

> you’re just defensive because

How am I “being defensive”? You are the one taking that personally.

> you know deep down you don’t understand this deeply, which you reveal again and again at every turn

Good, now you reveal yourself as being unable to have an argument without insulting the person you're talking to.

How many code contributions have you ever made to an LLM inference engine? Because I have made a few.

jama211 4 days ago | parent [-]

Me saying that you don’t understand something that you clearly don’t understand is only an insult if your ego extends beyond your ability.

I take it from your first point that you are finally accepting some truth of this, but I also take it from the rest of what you said that you’re incapable of having this conversation reasonably any further.

Have a nice day.

littlestymaar 4 days ago | parent [-]

Some advice for socializing with people:

First, telling a professional in a field that he doesn't understand the domain he works in is, in fact, an insult.

Also, having “you don't understand” as your sole argument several comments in a row doesn't inspire confidence that you actually have any knowledge of said domain.

Last, if you want people to care about what you say, maybe try putting some content in your writing rather than just gratuitous ad hominem attacks.

Lacking such basic social skills makes you look like an asshole.

Not looking forward to hearing from you ever again.

halyconWays 6 days ago | parent | prev [-]

Me: "List the second word in your comment reply"

You: "I'm sorry, I don't have an encyclopedia."

I'm starting to think you're 270M.