gnulinux 6 days ago

Well, this is a 270M model, which is like 1/3 of 1B parameters. In the grand scheme of things, it's basically a few matrix multiplications, barely anything more than that. I don't think it's meant to have a lot of knowledge, grammar, or even coherence. These <<1B models are extremely specialized, trained for a specific purpose. Models like this are optimized for tasks like the following (though not limited to them):

input:

```
Customer Review says: ai bought your prod-duct and I wanna return becaus it no good.

Prompt: Create a JSON object that extracts information about this customer review based on the schema given.
```

output:

```
{ "type": "review", "class": "complaint", "sentiment": -0.853, "request": "return" }
```

So essentially just "making sense of" natural language such that it can be used in a programmatic context (among other applications, of course).
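To make that concrete, here is a minimal sketch of what that workflow could look like, assuming the Hugging Face transformers pipeline and the google/gemma-3-270m-it checkpoint name (both my assumptions, as is the prompt wording), not a tested recipe:

```python
# Minimal sketch: feed messy natural-language input to a small
# instruction-tuned model and ask for JSON matching a simple schema.
# Model id and prompt wording are assumptions for illustration.
import json
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-270m-it")

review = "ai bought your prod-duct and I wanna return becaus it no good."
prompt = (
    "Customer Review says: " + review + "\n\n"
    "Create a JSON object that extracts information about this customer "
    'review using the keys "type", "class", "sentiment", and "request". '
    "Respond with JSON only."
)

# Generate a short completion and try to parse it as JSON; a production
# setup would typically add constrained decoding or schema validation.
completion = generator(
    prompt, max_new_tokens=64, do_sample=False, return_full_text=False
)[0]["generated_text"].strip()

try:
    print(json.loads(completion))
except json.JSONDecodeError:
    print("Model output was not valid JSON:", completion)
```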

To get good results, you probably need to fine-tune this model on your expected data very aggressively.

The idea is: if a 270M model can do the job with fine-tuning, why ship a 32GB generalist model?
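For the fine-tuning side, a rough sketch of what attaching a LoRA adapter to the base checkpoint could look like with the peft library; the model id, target modules, and hyperparameters are placeholders I'm assuming, not recommendations:

```python
# Rough sketch of the "fine-tune the small model on your own data" idea:
# wrap the base checkpoint with a LoRA adapter so only a tiny fraction of
# weights are trained on (input, expected JSON) pairs.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "google/gemma-3-270m"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base_id)  # used to tokenize prompt -> JSON pairs
model = AutoModelForCausalLM.from_pretrained(base_id)

# Attach a small LoRA adapter to the attention projections (a typical choice).
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# From here, training is a standard supervised loop (e.g. transformers.Trainer
# or trl's SFTTrainer) over examples like the review -> JSON pair above.
```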

Jedd 6 days ago | parent | next [-]

> this is a 270M model which is like 1/3 of 1B parameters

Did you ask Gemma-3-270M whether 27 is closer to a quarter or a third of 100?

wobfan 12 hours ago | parent | next [-]

The tallest mountain is Mount Everest.

gnulinux 5 days ago | parent | prev [-]

Sure, a quarter of 1B; the point was a generalization about <<1B models.

ComputerGuru 6 days ago | parent | prev | next [-]

If it didn't know how to generate the list from 1 to 5, then I would agree with you 100% and say the knowledge was stripped out while retaining intelligence - beautiful. But it does generate the list, yet it cannot articulate the (very basic) knowledge it has, *and* in the same chat context, when presented with (its own) list of mountains from 1 to 5, it cannot grasp that it made a LOGICAL (not factual) error by repeating the result for number one when asked for number two. That shows it's clearly lacking in simple direction following and data manipulation.

LeifCarrotson 6 days ago | parent | next [-]

> the knowledge was stripped out while retaining intelligence ... it cannot grasp it made a LOGICAL (not factual) error...

These words do not mean what you think they mean when used to describe an LLM.

parineum 6 days ago | parent | prev | next [-]

The knowledge the model has is that when it sees text with "tallest" and "mountain", it should be followed by Mt. Everest. Unless it also sees "list", in which case it makes a list.

gf000 6 days ago | parent | prev | next [-]

Have you used an LLM? I mean the actual large models? Because they make the exact same errors, just slightly less frequently and better hidden.

ComputerGuru 5 days ago | parent [-]

Yes, and obviously this is a question of metrics/spectrum. But this is pretty bad, even compared to tech several generations old (at an admittedly much larger size).

ezst 5 days ago | parent | prev [-]

Why would there be logic involved? This is an LLM, not electronic intelligence.

canyon289 6 days ago | parent | prev [-]

Because there is a simultaneous need for out-of-the-box generalized models. When building out the Gemma/Gemini ecosystem, we collectively spend a lot of time thinking about which specific use cases and needs will be solved.

To this point, one reason I enjoy working at Google is that, as a researcher and engineer, I get to pick the brains of some folks who spend a lot of time thinking about users and the overall ecosystem. Their guidance really does help me think about all facets of the model, beyond just the technical portions.