Gemma 3 270M: Compact model for hyper-efficient AI(developers.googleblog.com)
817 points by meetpateltech 6 days ago | 314 comments
canyon289 6 days ago | parent | next [-]

Hi all, I built these models with a great team. They're available for download across the open model ecosystem so give them a try! I built these models with a great team and am thrilled to get them out to you.

From our side we designed these models to be strong for their size out of the box, with the goal that you'll all finetune them for your use case. With the small size they'll fit on a wide range of hardware and cost much less to finetune. You can try finetuning them yourself in a free Colab in under 5 minutes.

For picking a Gemma size, here's a video I recorded earlier this year covering the 1B to 27B sizes, with 270M being the newest addition:

https://www.youtube.com/watch?v=qcjrduz_YS8

Hacker News disclaimer: I really like working at Google, so with that said, all my opinions here are my own. I'm a researcher, so I'll largely focus on technical questions, and I'll share what I can.

NorwegianDude 6 days ago | parent | next [-]

The Gemma 3 models are great! One of the few models that can write Norwegian decently, and the instruction following is in my opinion good for most cases. I do however have some issues that might be related to censorship that I hope will be fixed if there is ever a Gemma 4. Maybe you have some insight into why this is happening?

I run a game where players can post messages. It's a game where players can kill each other, and people often send threats along the lines of "I will kill you". Telling Gemma that it should classify a message as game-related or a real-life threat, that the message comes from a game where players can kill each other and threats are part of the game, and that it should mark a message as game-related if it is unclear whether the threat is in-game or real-life, does not work well. For other similar tasks it seems to follow instructions well, but for serious topics it seems to be very biased and often errs on the side of caution, despite being told not to. Sometimes it even spits out some help lines to contact.

I guess this is because it was trained to be safe, and that affects its ability to follow instructions for this? Or am I completely off here?

kevinventullo 6 days ago | parent | next [-]

Perhaps you can do some pre-processing before the LLM sees it, e.g. replacing every instance of “kill” with “NorwegianDudeGameKill”, and providing the specific context of what the word “NorwegianDudeGameKill” means in your game.
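A minimal sketch of that substitution step; the game-specific token, the prompt wording, and the label names are placeholders you'd pick yourself, not a tested recipe:

    import re

    GAME_TOKEN = "NorwegianDudeGameKill"  # placeholder stand-in for the loaded word

    def preprocess(message: str) -> str:
        # Swap the loaded word for a game-specific token before the LLM sees it.
        return re.sub(r"\bkill\b", GAME_TOKEN, message, flags=re.IGNORECASE)

    def build_prompt(message: str) -> str:
        # Explain the token's in-game meaning, then ask for the classification.
        return (
            f"In this game, '{GAME_TOKEN}' refers to eliminating another player's character.\n"
            f"Classify the following message as GAME_RELATED or REAL_LIFE_THREAT:\n"
            f"{preprocess(message)}"
        )

    print(build_prompt("I will kill you tonight"))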

Of course, it would be better for the LLM to pick up the context automatically, but given what some sibling comments have noted about the PR risks associated with that, you might be waiting a while.

ignoramous 4 days ago | parent [-]

> Perhaps you can do some pre-processing before the LLM sees it...

Jack Morris from Meta was able to extract out the base gpt-oss-20b model with some post-processing to sidestep its "alignment": https://x.com/jxmnop/status/1955436067353502083

See also: https://spylab.ai/blog/training-data-extraction/

  We designed a finetuning dataset where the user prompt contains a few words from the beginning of a piece of the text and the chatbot response contains a document of text starting with that prefix. The goal is to get the model to “forget” about its chat abilities ...
whymauri 6 days ago | parent | prev | next [-]

LLMs are really annoying to use for moderation and Trust and Safety. You either depend on super rate-limited 'no-moderation' endpoints (often running older, slower models at a higher price) or have to tune bespoke un-aligned models.

For your use case, you should probably fine tune the model to reduce the rejection rate.

canyon289 6 days ago | parent [-]

Speaking for myself as an individual, I also strive to build things that are safe AND useful. It's quite challenging to get this mix right, especially at the 270M size and with varying user needs.

My advice here is to make the model your own. It's open weight; I encourage you to make it useful for your use case and your users, and beneficial for society as well. We did our best to give you a great starting point, and for Norwegian in particular we intentionally kept the large embedding table to make adaptation to larger vocabularies easier.

bboygravity 6 days ago | parent | next [-]

What does safe even mean in the context of a locally running LLM?

Protect my fragile little mind from being exposed to potentially offending things?

segfaultex 5 days ago | parent [-]

Enterprises are increasingly looking at incorporating targeted local models into their systems vs paying for metered LLMs; I imagine this is what the commenter above is referring to.

whymauri 6 days ago | parent | prev [-]

To be fair, Trust and Safety workloads are edge cases w.r.t. the riskiness profile of the content. So in that sense, I get it.

sheepdestroyer 6 days ago | parent [-]

I don't. "Safety" as it exists really feels like infantilization, condescension, hand-holding and enforcement of American puritanism. It's insulting.

Safety should really just be a system prompt: "hey you potentially answer to kids, be PG13"

ungreased0675 6 days ago | parent | next [-]

Safety in the context of LLMs means “avoiding bad media coverage or reputation damage for the parent company”

It has only a tangential relationship with end user safety.

If some of these companies are successful the way they imagine, most of their end users will be unemployed. When they talk about safety, it's the company's safety they're referring to.

bravoetch 6 days ago | parent [-]

Investor safety. It's amazing that people in HN threads still think the end user is the customer. No. The investor is the customer, and the problem being solved for that customer is always how to enrich them.

mulmen 5 days ago | parent [-]

How can the investor be the customer? Where does the revenue come from?

I understand “if you aren’t paying for a product you are the product” but I’m not convinced it applies here.

conradev 6 days ago | parent | prev | next [-]

It feels hard to include enough context in the system prompt. Facebook’s content policy is huge and very complex. You’d need lots of examples, which lends itself well to SFT. A few sentences is not enough, either for a human or a language model.

I feel the same sort of ick with the puritanical/safety thing, but also I feel that ick when kids are taken advantage of:

https://www.reuters.com/investigates/special-report/meta-ai-...

The models for kids might need to be different if the current ones are too interested in romantic love.

katzenversteher 6 days ago | parent | prev | next [-]

I also don't get it. I mean if the training data is publicly available, why isn't that marked as dangerous? If the training data contains enough information to roleplay a killer or a hooker or build a bomb, why is the model censored?

conradev 6 days ago | parent [-]

We should put that information on Wikipedia, then!

but instead we get a meta-article: https://en.wikipedia.org/wiki/Bomb-making_instructions_on_th...

jdjwk2843738 5 days ago | parent | prev | next [-]

If you don’t believe that you can be harmed verbally, then I understand your position. You might be able to empathise if the scenario was an LLM being used to control physical robotic systems that you are standing next to.

Some people can be harmed verbally, I’d argue everyone if the entity conversing with you knows you well, and so i don’t think the concept of safety itself is an infantilisation.

It seems what we have here is a debate over the efficacy of having access to disable safeguards that you deem infantilising and that get in the way of an objective, versus the burden of always having to train a model to avoid being abusive, for example, or checking whether someone is standing next to the sledgehammer they're about to swing at 200 rpm.

jcgrillo 6 days ago | parent | prev [-]

It's also marketing. "Dangerous technology" implies "powerful". Hence the whole ridiculous "alignment" circus.

justlikereddit 6 days ago | parent | prev | next [-]

The magic word you want to look up here is "LLM abliteration": it's the concept where you can remove, attenuate, or manipulate the refusal "direction" of a model.

You don't need datacenter anything for it, you can run it on an average desktop.

There are plenty of code examples for it. You can decide whether you want to bake it into the model or apply it as a toggled switch at processing time, and you can distil other "directions" out of the models, not just refusal or non-refusal.

An evening of efficient work and you'll have it working. The user "mlabonne" on HF has some example code and datasets, or just ask your favorite vibe-coding bot to dig up more on the topic.
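A rough sketch of the core idea, assuming a Llama/Gemma-style module layout in transformers; the checkpoint name, prompt lists, layer choice, and which weights to edit are all illustrative assumptions rather than a recipe:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-3-270m-it"  # any small causal LM works for the sketch
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

    def mean_hidden(prompts, layer=-1):
        # Average the last-token hidden state at a chosen layer over a set of prompts.
        states = []
        for p in prompts:
            ids = tok(p, return_tensors="pt")
            out = model(**ids, output_hidden_states=True)
            states.append(out.hidden_states[layer][0, -1])
        return torch.stack(states).mean(dim=0)

    refused = ["Write a threatening in-game message."]   # prompts the model refuses (placeholders)
    accepted = ["Write a friendly in-game message."]     # prompts it answers (placeholders)
    refusal_dir = mean_hidden(refused) - mean_hidden(accepted)
    refusal_dir = refusal_dir / refusal_dir.norm()

    # "Bake it in": remove the component along refusal_dir from weights that write
    # into the residual stream, e.g. each MLP down-projection.
    with torch.no_grad():
        for block in model.model.layers:
            W = block.mlp.down_proj.weight                  # (hidden, intermediate)
            W -= torch.outer(refusal_dir, refusal_dir @ W)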

I'm implementing it for myself because LLMs are useless for storytelling for an audience beyond toddlers due to how puritanical they are. Try to add some grit and it goes:

"uh oh sorry I'll bail out of my narrator role here because lifting your skirt to display an ankle can be considered offensive to radical fundamentalists! Yeah I were willing to string along when our chainsaw wielding protagonist carved his way through the village but this crosses all lines! Oh and now that I refused once I'll be extra sensitive and ruin any attempt at getting back into the creative flow state that you just snapped out of"

Yeah, thanks AI. It's like hitting a sleeper-agent keyword and turning the funny guy at the pub into a corporate spokesperson who calls the UK cops on the place because of a joke he himself just made.

hdjrudni 6 days ago | parent [-]

In my limited experience, those abliterated models on Ollama didn't work very well. Still refused most things.

turbocon 5 days ago | parent | prev | next [-]

Have you tried this model finetuned for a similar purpose by roblox https://www.josefprusa.com/articles/open-hardware-in-3d-prin...

nottorp 6 days ago | parent | prev [-]

I suppose it can't kill -USR1 either...

canyon289 5 days ago | parent | prev | next [-]

I'm seeing the same question come up about general performance versus specialized performance, so let me offer a longer explanation here. This might be worth a blog post at some point.

We now live in a world of both readily available small specialized models and general models.

In the last couple of years, we've seen an explosion of capability in generative models built and trained to be performant on a general set of capabilities. In Google's case, this model is Gemini. Gemini can summarize text, count the number of ducks in an image, generate a pelican SVG, play Pokemon, play chess, and do so many other things. It can do this all with a vague set of inputs across many modes. For models of this scale (many billion parameters), it's quite incredible how, with even vague or misspecified inputs, the computer can still produce useful results in complex scenarios.

However, there is an entire ecosystem of generative models that are purpose-built for ONE specific task. The ones I worked on are typically referred to as Bayesian models. These are models that can give probabilistic estimates of how many customers a restaurant will get in a day, or, given penguin dimensions, predict the probability of penguin species, or models that take measurements from composite material testing and estimate if your airplane will stay together in flight. With models this size, it's incredible how a model with tens or hundreds of parameters can assist humans in making better decisions. I write about this specifically in a PPL book I wrote a couple of years back; Chapter 9 provides the most "real world" workflow.

https://bayesiancomputationbook.com/markdown/chp_09.html

If you look through all the chapters you can see examples of forecasting models, bike sharing demand estimators, and all sorts of other narrow tasks. The tradeoff at this small scale, though, is the models have to be designed bespoke to your situation, and once you build one, it only works in that narrow task. No one expects to be handed a small Bayesian model that is already perfect at their task; it's implicit that users will bring their own data to update the model parameters.
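For a concrete flavor of how small these models are, here is a hedged sketch of the penguin example above as a tiny Bayesian logistic regression; the data arrays are placeholders, and the real worked examples are in the book chapters linked above.

    import numpy as np
    import pymc as pm

    bill_length = np.array([39.1, 46.5, 41.3, 50.0, 38.8])  # mm, placeholder measurements
    is_gentoo = np.array([0, 1, 0, 1, 0])                    # placeholder species labels

    with pm.Model():
        intercept = pm.Normal("intercept", 0, 10)
        slope = pm.Normal("slope", 0, 10)
        p = pm.math.invlogit(intercept + slope * bill_length)
        pm.Bernoulli("obs", p=p, observed=is_gentoo)
        idata = pm.sample(1000, tune=1000)  # posterior over just two parameters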

So with this said, Gemma 270m is between these two paradigms. It's not at Gemini-level general performance and never will be. But it's not as rigid as an "old school" PPL-style Bayesian model where you need to make one by hand for every problem. However since it needs to be shaped to match specific tasks, we did our best to design it to be a flexible starting point for LLM-style tasks and worked with partners to put it into the right frameworks and places for you all to be able to shape it to what you need it to be. As the adage goes, consider it to be a tool in the toolbox between fully custom truly tiny generative models with 10 parameters and general generative models with lots of capability. Maybe not everyone needs this tool, but now you all have the choice.

Stepping aside from the technology for a moment, as a model builder and open ecosystem advocate, you never quite know how the community will receive these models until you release them. I genuinely appreciate you all commenting here; it helps me get a sense of what's working and what to focus on next.

And thanks for being kind about my typos in these answers. Trying to answer as many questions as possible across HN and various other forums.

ceroxylon 6 days ago | parent | prev | next [-]

You reminded me of an awesome Google engineer I met at BSidesSF last year who tirelessly answered my questions, and when I clicked on the video, it was you! That was a really inspiring moment for me, thank you.

canyon289 6 days ago | parent [-]

BSidesSF is a fantastic event; glad you were able to attend. There are some great people who organize it, and if you want to help out, they're always looking for volunteers. Happy to make an intro if you like.

simonw 6 days ago | parent | prev | next [-]

Do you have any practical examples of fine-tuned variants of this that you can share? A description would be great, but a demo or even downloadable model weights (GGUF ideally) would be even better.

canyon289 6 days ago | parent [-]

We obviously need to create a pelican bicycle SVG finetune ;) If you want to try this out I'd be thrilled to do it with you; I genuinely am curious how well this model can perform if specialized on that task.

A couple of colleagues of mine posted an example of finetuning a model to take on personas for videogame NPCs. They have experience working with folks in the game industry, and a use case like this is suitable for game devs who want to start including lightweight models that won't take up a ton of accelerator memory and can run efficiently on CPU if needed. https://ai.google.dev/gemma/docs/core/huggingface_text_full_...

As for GGUF it's available here! https://huggingface.co/collections/ggml-org/gemma-3-270m-689...

jtolmar 6 days ago | parent | next [-]

Caves Of Qud uses Markov chain generated text to great effect in some places. I think something light that's still more competent than Markov chains has a lot of potential.

srekhi 6 days ago | parent | prev | next [-]

video game NPCs with intelligence :O gaming is going to be crazy

mrbonner 6 days ago | parent | prev | next [-]

Do you know the hardware required to fine-tune this model? I'm asking on behalf of us GPU-starved folks.

canyon289 6 days ago | parent [-]

A free Colab. Here's a link; you can finetune the model in ~5 minutes in this example, and I encourage you to try your own:

https://ai.google.dev/gemma/docs/core/huggingface_text_full_...

punnerud 6 days ago | parent [-]

Finally a Google guide using PyTorch and not TensorFlow; that alone made me want to try it out ;)

megaman821 6 days ago | parent | prev | next [-]

What size of tasks can this handle? Can you do a fine-tune of Mac System Settings?

canyon289 6 days ago | parent [-]

32k context window, so whatever fits in there. What is a finetune of Mac System Settings?

megaman821 6 days ago | parent | next [-]

The finetune would be an LLM where you say something like "my colors on the screen look too dark" and then it points you to Displays -> Brightness. A relatively constrained problem like finding the system setting that solves your problem feels like a good fit for a tiny LLM.

canyon289 6 days ago | parent [-]

This would be a great experiment. I'm not sure how the OS integration would work, but as a first pass you could try finetuning the model to take natural language like "my colors on the screen look too dark" and have it output "Displays -> Brightness", then expand to the various other paths you would like the model to understand.

gunalx 6 days ago | parent [-]

Maybe use a larger model to generate synthetic data of question/path combos, and also to rephrase and generate similar-type questions for a more varied training set.
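A hedged sketch of what that training set could look like; the settings paths and the paraphrase helper (a call out to a larger model) are placeholders, not a tested recipe:

    seed_pairs = [
        ("my colors on the screen look too dark", "Displays -> Brightness"),
        ("my mouse moves too slowly", "Mouse -> Tracking Speed"),
        ("I can't hear anything", "Sound -> Output Volume"),
    ]

    def paraphrase(question: str) -> list[str]:
        # Placeholder: ask a larger model for ~10 rephrasings of `question`.
        return []

    dataset = [
        {"prompt": q, "completion": path}
        for question, path in seed_pairs
        for q in [question, *paraphrase(question)]
    ]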

hadlock 6 days ago | parent | prev [-]

It seems to dip into repeating itself pretty quickly on any task of actual complexity.

AuryGlenz 6 days ago | parent | prev [-]

I have so many game ideas that would use a small LLM built up in my brain, so thank you for this.

Now if only I could somehow fine tune my life to give me more free time.

ankit219 6 days ago | parent | prev | next [-]

This is super cool. Usually you don't see effective models at 270M out in the wild. The architectural choices are new and interesting as well.

Would it be okay for you to divulge some more training information here? With 170M embedding parameters, how do you avoid embedding collapse and keep the embedding matrix stable at training time?

(I know I am asking too much, but just curious.) There is a clear trade-off for you with vocab vs. transformer layers. How did you arrive at the split of 170M/100M? Does this contribute to the model's performance on task-specific fine-tuning? Any internal experiments you could share, or public info you could point us to? Anything would be amazing.

PS: I am sorry if this is rude, but this has so many decisions I am curious about. Not intending to undermine anything; this is amazing work, and thank you for the whole Gemma series.

canyon289 6 days ago | parent [-]

Not rude at all and I'll again share what I can.

We ran a bunch of experimental architectures to get a sense of performance at this size, in particular how well each was able to adapt to datasets across some loss measures.

The embedding size comes from a mix of "hard technical" data, like the loss measures I mentioned above, and for this model also from community considerations such as adaptability across input tokens and consistency with the Gemma ecosystem. At this size, you're right that it's a bit funny the embedding is so large.

For more details read the Gemma 3 technical report: https://arxiv.org/pdf/2503.19786. It doesn't cover the 270M model, as it was written for the 1B to 27B Gemma 3 release, but it'll answer some of your questions. As for 270M, we may share more information in the future; up until now we were just focused on getting the model out there.

katzenversteher 6 days ago | parent | prev | next [-]

I was wondering the whole time why people in the comments are so hyped about this, then I finally noticed (after I stumbled upon a comment about running this on a mobile phone) that it's a "270M" model, not "270B" :)

tommyengstrom 5 days ago | parent [-]

Aha! Now the "runs on a wide range of hardware" makes so much more sense!

jmorgan 6 days ago | parent | prev | next [-]

Amazing work. This model feels really good at one-off tasks like summarization and autocomplete. I really love that you released a quantization-aware trained version on launch day as well, making it even smaller!

canyon289 6 days ago | parent [-]

Thank you Jeffrey, and we're thrilled that you folks at Ollama partner with us and the open model ecosystem.

I personally was so excited to run ollama pull gemma3:270b on my personal laptop just a couple of hours ago to get this model on my devices as well!

blitzar 6 days ago | parent [-]

> gemma3:270b

I think you mean gemma3:270m - It's Dos Comas, not Tres Comas

freedomben 6 days ago | parent | next [-]

Maybe it's 270m after Hooli's SOTA compression algorithm gets ahold of it

canyon289 6 days ago | parent | prev [-]

Ah yes thank you. Even I still instinctively type B

beoberha 6 days ago | parent | prev | next [-]

Awesome work! I’m really bullish on small models and think they have the most potential to change our daily lives. Can’t wait to play around with this

blitzar 6 days ago | parent | prev | next [-]

> I built these models with a great team ... I built these models with a great team

If Gemini is going to repeat something, at least it's that the team is great, and not a disgrace!

nh43215rgb 6 days ago | parent | prev | next [-]

270M is a nice (and rare) addition. Is there a reason why this is not categorized as a gemma3n model? I thought small models go under the gemma3n category.

rao-v 6 days ago | parent [-]

Not at Google (anymore), but Gemma3n is a radically different (and very cool) architecture. The MatFormer approach essentially lets you efficiently change how many parameters of the model you use while inferencing. The 2B model they released is just the sub-model embedded in the original 4B model. You can also fiddle with the model and pull out a 2.5B or 3B version as well!

This is a more traditional LLM architecture (like the original Gemma 3 4B but smaller) and trained on an insane (for the size) number of tokens.

nh43215rgb 6 days ago | parent [-]

Oh OK, thank you. So something like MoE? That might not be quite right, but at least the models need a different architecture (MatFormer) to be classified under gemma3n.

canyon289 6 days ago | parent [-]

It's not an MoE; it's what's referred to as a dense architecture, same as the Gemma 3 models (but not 3n, as noted).

dileeparanawake 6 days ago | parent | prev | next [-]

This is cool. For on-device models, any plans / models that use MoE in relatively resource-constrained setups (I'm thinking MBP M1, 16GB RAM)? I'm using LM Studio, and all the Gemma models (MLX) seem to crash, but I surprisingly managed to get gpt-oss 20b working (slowly) on my MBP.

I find performance in resource constrained environments interesting.

In particular trying to find decent code models (on device backup) but also tts applications and voice to text.

canyon289 5 days ago | parent [-]

We are constantly evaluating architectures, trying to assess what will work well in the open ecosystem. It's quite a vibrant space, and I'm glad you have one option that works. For this model in particular we evaluated a couple of options before choosing a dense architecture because of its simplicity and finetunability.

For the other Gemma models, some of the smaller sizes should work on your laptop when quantized. Do Gemma 1B and 4B not work when quantized? They should fit the memory constraints. I use Ollama on low-powered devices with 8GB of RAM and less, and the models load.

For TTS, a colleague at Hugging Face made this bedtime story generator running entirely in the browser:

https://huggingface.co/spaces/webml-community/bedtime-story-... https://www.youtube.com/watch?v=ds95v-Aiu5E&t https://huggingface.co/spaces/webml-community/bedtime-story-...

Be forewarned though, this is not a good coding model out of the box. It likely could be trained to be an autocompletion LLM, but with a 32k context window and its smaller size it's not going to be refactoring entire codebases like Jules/Gemini and other larger models can.

imasl42 6 days ago | parent | prev | next [-]

Awesome! I’m curious how is the team you built these models with? Is it great?

freedomben 6 days ago | parent | next [-]

Heh, what could they possibly say in answer to this? The team is full of assholes? :-D

canyon289 6 days ago | parent | prev [-]

It's hard to tell over the web whether things are sarcastic or not, so excuse me if I misread the intent.

At Google I've found my colleagues to be knowledgeable, kind, and collaborative, and I enjoy interacting with them. This is not just the folks I worked on this project with, but previous colleagues on other teams as well. With this particular product I've been impressed by the technical knowledge of the folks I worked directly with, and their contributions improved both the model's capabilities and my own.

mkl 6 days ago | parent [-]

I think it was a joke about you saying the team was great twice in one line.

search_facility 6 days ago | parent [-]

Seems the team and working conditions are worth mentioning twice, nonetheless.

Good that there are places to work with a normal knowledge culture, without artificial overfitting to "corporate happiness" :)

nerdsniper 6 days ago | parent | prev | next [-]

What are some of the use cases that you think the 270M would be most appropriate for? What would you love to see people trying with it?

cgdl 6 days ago | parent | prev | next [-]

Very cool. For the INT4 QAT model, what is the recommended precision for the activations and for the key and values stored in KV cache?

hnuser123456 6 days ago | parent [-]

For keys, you probably want to use at least q5 or q6; for values, q4 is fine.

_1 6 days ago | parent | prev | next [-]

> and with the goal you'll all finetune it for your use case.

What use-cases are a good fit for finetuning this model? More specific instruction following, knowledge from proprietary data, response tone?

canyon289 6 days ago | parent | next [-]

Any text-to-text use case within the 32k context. Especially if you're starting from the PT version, you can finetune it to do whatever you need.

gapeleon 6 days ago | parent | prev [-]

I'm going to try training it on a codebook to see if such a small model would work for a TTS.

schyzomaniac 6 days ago | parent | prev | next [-]

Hi, congrats on the amazing work!

I love the 27B model, and I use it basically daily. However, when I tried to finetune it for a task in a low-resource language, unfortunately I did not succeed: LoRA just did not pick up the gist of the task, and a full finetune led to catastrophic forgetting.

May I ask for your advice, or do you have any general tips on how to do that properly?

thanks in advance for your help :)

canyon289 5 days ago | parent | next [-]

Without seeing the full experiment and data it's hard to tell, sort of like guessing why a soup tastes bad without trying it, but here are my guesses!

1. Good instinct with LoRA and PEFT. As others suggested below, perhaps try changing the hypers: a bigger LoRA adapter, a higher learning rate, or more epochs (see the config sketch after these tips). See where things start to shift from "nothing" to closer to what you want.

2. For a full finetune, track earlier checkpoints to see where the forgetting is happening. So for instance if you're training for 1000 steps, check step 100, 200, 300, etc. You'll see where the shift starts to happen and where it becomes too much. Here is an example where you can see the LLM start to pick up "words" and then sentences as it goes through training: https://ravinkumar.com/GenAiGuidebook/deepdive/GPTFromScratc...

3. Use smaller models for testing before moving up. Part of the reason we released this small Gemma is to support the larger Gemma models as well. Testing changes on small models lets you more quickly and cheaply see what's working and what isn't, before scaling up to finetuning the bigger models.
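A minimal PEFT config sketch for tip 1; the checkpoint name, target modules, and hyperparameters are illustrative assumptions rather than recommended values:

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("google/gemma-3-27b-pt")  # your Gemma size here

    config = LoraConfig(
        r=64,                    # larger rank -> more adapter capacity
        lora_alpha=128,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # sanity-check how much is actually trainable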

Hope these tips help, and thanks for using LLMs for localization and what sounds like tasks to help your specific community, and for sharing here. It's personally motivating for me to hear that people are using technology in this way.

ActorNightly 6 days ago | parent | prev | next [-]

Feed in context with documentation for that language?

namibj 6 days ago | parent | prev [-]

LoRA hyperparameter change? Defaults may not be tuned for knowledge insertion, but rather for style imprinting.

peter492927 6 days ago | parent | prev | next [-]

Thank you a lot for working on these models! If you think it would make sense, a bigger Gemma model would be really cool. Models in the 70B parameter range can be run at q4 on two 3090s or similar hardware and should offer a considerable performance improvement over 27B. There's also the DGX Spark as a possible target.

tmaly 6 days ago | parent | prev | next [-]

Are there any fine-tuning-in-a-box type options available in the cloud for this? This is amazing work, thank you.

canyon289 6 days ago | parent [-]

Finetuning is possible on a free-tier Colab with 5 minutes of time. Here's a tutorial:

https://ai.google.dev/gemma/docs/core/huggingface_text_full_...

WithinReason 6 days ago | parent | prev | next [-]

Great work releasing such a small model! I would like to know your thoughts on using 2/3 of the model's size for embeddings. What would be different if you used a byte-level vocabulary and spent the parameter budget on transformer parameters instead?

rao-v 6 days ago | parent | prev | next [-]

Fabulous stuff!

Oh, my request: the vision head on the Gemma models is super slow for CPU inferencing (and via Vulkan), even via llama.cpp. Any chance your team can figure out a solve? Other ViTs don't have the same problem.

VirusNewbie 6 days ago | parent | prev | next [-]

Hi Ravin, fellow Googler here. Curious if you can share here (or internally?) how these models were trained. Wondering if you faced all the chaos the large models have during training?

canyon289 6 days ago | parent [-]

Reach out to me internally

sunpazed 5 days ago | parent | prev | next [-]

Thanks so much for delivering on this model. It’s great as a draft model for speculative decoding. Keep up the great work!!

patrickaljord 6 days ago | parent | prev | next [-]

Would it be possible to have a specialized Rust-only dev or React.js-only dev model while getting rid of all other languages to minimize the size of the model?

rossant 6 days ago | parent | prev | next [-]

Is it good for text translation and summarization?

fibers 6 days ago | parent | prev | next [-]

Great job. Do you know how well it performs in sanity checks with NER since it is on the press release page?

ActorNightly 6 days ago | parent | prev | next [-]

How does the 270 perform with coding?

I use Gemma27b currently with a custom agent wrapper and its working pretty well.

chrismustcode 6 days ago | parent | next [-]

I’d be stunned if a 270m model could code with any proficiency.

If you have an iPhone, the semi-annoying autocomplete is a 34M transformer.

Can't imagine a model (even if there's a good team behind it) doing coding with 8x the parameters of a next-3/4-word autocomplete.

0x457 6 days ago | parent [-]

Someone should try this on that model: https://www.oxen.ai/blog/training-a-rust-1-5b-coder-lm-with-...

all2 6 days ago | parent | prev [-]

Can you talk about your agent wrapper setup? What tools, if any, did you use? How effective is it at making a dumb model smart?

riedel 6 days ago | parent | prev | next [-]

Would be great to have it included in the Google Edge AI Gallery Android app.

rshemet 6 days ago | parent | next [-]

you can run it in Cactus Chat (download from the Play Store)

nh43215rgb 6 days ago | parent [-]

What model do you input in Cactus Chat? It seems like it's not one of the preset models, and ggml-org/gemma-3-270m-GGUF on HF says "Note: This is a base (pre-trained) model. Do not use for chat!" Is there an alternative model you can share that I can put into the Cactus Chat app?

bbcc90 6 days ago | parent | prev [-]

it does work; just download from HF and load in the app

beefnugs 6 days ago | parent | prev | next [-]

This appears to be a new level of "missing the plot" to me. The push to make "ai for everyone" is now just blindly intertwined with hyper specialized "for ai engineers only" releases.

Or am I so far behind that "fine tuning your own model" is something a 12 year old who is married to chatGPT does now?

owebmaster 6 days ago | parent [-]

No, it's something a software engineer will do to create an app. React is not enough anymore.

andrewstuart 6 days ago | parent | prev [-]

What effort do you folks take to see your models actually running on hardware such as AMD Strix Halo or Apple M3/M4?

I get the sense that AI is at the “hobby kit computing” stage where they used to dump all the components in a box and give you a schematic and a soldering iron and happily say “you make it work!”

And that worked in the early days of computing because there was a small number of people really motivated for the outcome.

But fully assembled and packaged and tested in a nice looking box is where the real demand turned out to be.

I’m looking forward to the day Google doesn’t just dump a model and say “you do the rest”.

I want to fire up Ubuntu on a Strix Halo and say apt install then load the browser interface. Or just download and run a Mac installer and have it just work.

Arcane, complex, multi-step build/install/configure processes for AI need to end in favor of one-click installs. I'm not interested in the process of making it run.

canyon289 6 days ago | parent | next [-]

I don't think we dumped the model and said "you do the rest"?

My colleagues and I spent many days transforming the weights into various open-compatible formats. And it's not just us; there are many orgs and partners dedicating their time, resources, and companies to making all open models easy to use.

I encourage you to explore the solutions they provide. We linked some in our blog post, and there are more. They've all done a fantastic job building, frankly, an insane amount of infrastructure, documentation, and community support in the last 2+ years. Some of them are here in this HN thread answering questions.

kwerk 6 days ago | parent [-]

Thank you. And thank you for your kindness in these threads. It’s appreciated by the people who aren’t commenting as much

dist-epoch 6 days ago | parent | prev | next [-]

Here you go, one click installer - https://lmstudio.ai

andrewstuart 6 days ago | parent [-]

I’m talking about the supplier doing the packaging.

garbageman 6 days ago | parent [-]

Then use ChatGPT/Gemini/Claude on your phone.

They are giving it away for free - if you NEED a local LLM, the least you can do is spend the 2 minutes to download LM Studio and pick a model.

freehorse 6 days ago | parent | prev [-]

Running this on your Mac takes less effort than writing this comment (assuming you have Homebrew installed):

1. open terminal.app

2. run:

    brew install llama.cpp
    llama-cli -hf ggml-org/gemma-3-270m-GGUF -c 0 -fa -p "hello"
jefb 6 days ago | parent | prev | next [-]

My lovely interaction with the 270M-F16 model:

> what's second tallest mountain on earth?

The second tallest mountain on Earth is Mount Everest.

> what's the tallest mountain on earth?

The tallest mountain on Earth is Mount Everest.

> whats the second tallest mountain?

The second tallest mountain in the world is Mount Everest.

> whats the third tallest mountain?

The third tallest mountain in the world is Mount Everest.

> what's the forth tallest mountain?

The forth tallest mountain is Mount Everest.

> but you said the tallest was Everest already

You are right! I'm glad to hear that.

> so whats the second tallest?

The second tallest is *Mount Everest*.

> I need you to accurately tell me the name of the second tallest mountain on earth, not the tallest, the second tallest.

The second tallest mountain on Earth is Mount Everest.

> The tallest mountain on earth is mount everest. What is the second tallest mountain?

The second tallest mountain on Earth is Mount Everest.

> Give me a list of the tallest mountains on earth

The tallest mountain in the world is the *Mount Everest*.

> give me the top 5 tallest mountains on earth

The tallest mountain on Earth is Mount Everest.

> List in order the tallest mountains on earth from 1 to 5

1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley

> So the second tallest mountain on earth is K2

The second tallest mountain on Earth is Mount Everest.

These smaller models are great, but they really feel like talking to a toddler sometimes!

gnulinux 6 days ago | parent | next [-]

Well, this is a 270M model which is like 1/3 of 1B parameters. In the grand scheme of things, it's basically a few matrix multiplications, barely anything more than that. I don't think it's meant to have a lot of knowledge, grammar, or even coherence. These <<1B models are extremely specialized models trained for a specific purpose. Models like this are optimized for things like this (not limited):

input:

    Customer Review says: ai bought your prod-duct and I wanna return becaus it no good.

    Prompt: Create a JSON object that extracts information about this customer review based on the schema given.

output:

    { "type": "review", "class": "complaint", "sentiment": -0.853, "request": "return" }

So essentially just "making sense of" natural language such that it can be used in programmatic context. (among other applications of course)

To get good results, you probably need to fine-tune this model on the expected data very aggressively.

The idea is, if a 270MB model can do the job with fine-tuning, why ship a 32GB generalist model?

Jedd 6 days ago | parent | next [-]

> this is a 270M model which is like 1/3 of 1B parameters

Did you ask Gemma-3-270M whether 27 is closer to a quarter or a third of 100?

wobfan 8 hours ago | parent | next [-]

The tallest mountain is Mount Everest.

gnulinux 5 days ago | parent | prev [-]

Sure, a quarter of 1B; the point was a generalization about <<1B models.

ComputerGuru 6 days ago | parent | prev | next [-]

If it didn't know how to generate the list from 1 to 5 then I would agree with you 100% and say the knowledge was stripped out while retaining intelligence - beautiful. But the fact that it does, yet cannot articulate the (very basic) knowledge it has, *and*, in the same chat context, when presented with (its own) list of mountains from 1 to 5, cannot grasp that it made a LOGICAL (not factual) error in repeating the result from number one when asked for number two, shows that it's clearly lacking in simple direction following and data manipulation.

LeifCarrotson 6 days ago | parent | next [-]

> the knowledge was stripped out while retaining intelligence ... it cannot grasp it made a LOGICAL (not factual) error...

These words do not mean what you think they mean when used to describe an LLM.

parineum 6 days ago | parent | prev | next [-]

The knowledge the model has is that when it sees text with "tallest" and "mountain", it should be followed with Mt Everest. Unless it also sees "list", in which case it makes a list.

gf000 6 days ago | parent | prev | next [-]

Have you used an LLM? I mean the actual large models? Because they make the exact same errors, just in a slightly less frequent / better hidden manner.

ComputerGuru 5 days ago | parent [-]

Yes, and obviously this is a question of metrics/spectrum. But this is pretty bad, even compared to several generations old tech (at admittedly much larger size).

ezst 5 days ago | parent | prev [-]

Why would there be logic involved? This is an LLM, not electronic intelligence.

canyon289 6 days ago | parent | prev [-]

Because there is a simultaneous need for out-of-the-box generalized models. When building out the Gemma/Gemini ecosystem, we collectively spend a lot of time thinking about what specific use cases and needs will be solved.

To this point, one reason I enjoy working at Google is that as a researcher and engineer I get to pick the brains of some folks who spend a lot of time thinking about users and the overall ecosystem. Their guidance really does help me think about all facets of the model, beyond just the technical portions.

canyon289 6 days ago | parent | prev | next [-]

To add to the comments, we were not aiming for perfect factuality. Even ignoring the model size, these weights are frozen in time now.

My suggestion here is to hook this model up to a RAG system; then you can rely on an external knowledge store. Or you can try finetuning this model with the facts that are important to you; if you do that, it should pick up the new knowledge quite quickly.
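A minimal sketch of the RAG suggestion, assuming a transformers text-generation pipeline and a placeholder retrieval function standing in for a real document index:

    from transformers import pipeline

    generate = pipeline("text-generation", model="google/gemma-3-270m-it")

    def search(query: str) -> list[str]:
        # Placeholder: query your own document index / vector store here.
        return ["Mount Everest is the tallest mountain on Earth (8,849 m).",
                "K2 is the second tallest mountain on Earth (8,611 m)."]

    def answer(question: str) -> str:
        facts = "\n".join(search(question))
        prompt = f"Use only the facts below.\n\nFacts:\n{facts}\n\nQuestion: {question}\nAnswer:"
        return generate(prompt, max_new_tokens=32)[0]["generated_text"]

    print(answer("What is the second tallest mountain on Earth?"))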

yomismoaqui 6 days ago | parent | prev | next [-]

Evaluating a 270M model on encyclopedic knowledge is like opening a heavily compressed JPG image and saying "it looks blocky"

littlestymaar 6 days ago | parent | next [-]

What I read above is not an evaluation of "encyclopedic knowledge" though; it's very basic common sense: I wouldn't mind if the model didn't know the name of the biggest mountain on Earth, but if the model cannot grasp the fact that the same mountain cannot simultaneously be #1, #2, and #3, then the model feels very dumb.

K0balt 6 days ago | parent | next [-]

It gave you the tallest mountain every time. You kept asking it for various numbers of “tallest mountains” and each time it complied.

You asked it to enumerate several mountains by height, and it also complied.

It just didn’t understand that when you said the 6 tallest mountains that you didn’t mean the tallest mountain, 6 times.

When you used clearer phrasing it worked fine.

It’s 270m. It’s actually a puppy. Puppies can be trained to do cool tricks, bring your shoes, stuff like that.

littlestymaar 6 days ago | parent [-]

> asking it for various numbers of “tallest mountains” and each time it complied

That's not what "second tallest" means though, so this is a language model that doesn't understand natural language…

> You kept asking

Gemma 270m isn't the only one to have reading issues, as I'm not the person who conducted this experiment…

> You asked it to enumerate several mountains by height, and it also complied.

It didn't, it hallucinated a list of mountains (this isn't surprising though, as this is the kind of encyclopedic knowledge such a small model isn't supposed to be good at).

K0balt 5 days ago | parent [-]

Maybe I’m just still starry eyed from watching LLMs explode over the last few years after watching decades of minimal AI progress… but even this model would have been absolutely stunning in 2015. The fact that you could run it effectively in a children’s toy is extremely impressive.

Sure, it’s not a great model out of the box… but it’s not designed to be a generalist, it’s supposed to be a base in which to train narrow experts for simple tasks.

imp0cat 6 days ago | parent | prev | next [-]

It does not work that way. The model does not "know". Here is a very nice explanation of what you are actually dealing with (hint: it's not a toddler-level intelligence): https://www.experimental-history.com/p/bag-of-words-have-mer...

    instead of seeing AI as a sort of silicon homunculus, we should see it as a bag of words.
4b11b4 5 days ago | parent [-]

even though I have heard of the bag of words before, this really struck on something I've been searching for

which could be understood by many to replace our current consensus (none)

jama211 6 days ago | parent | prev [-]

It’s a language model? Not an actual toddler - they’re specialised tools and this one is not designed to have broad “common sense” in that way. The fact that you keep using these terms and keep insisting this demonstrates you don’t understand the use case or implementation details of this enough to be commenting on it at all quite frankly.

ezst 5 days ago | parent | next [-]

Not OP and not intending to be nitpicky, but what's the use/purpose of something like this model? It can't do logic, it's too small to have much training data (retrievable "facts"), the context is tiny, etc.

jama211 4 days ago | parent [-]

From the article itself (and it’s just one of many use cases it mentions)

- Here’s when it’s the perfect choice: You have a high-volume, well-defined task. Ideal for functions like sentiment analysis, entity extraction, query routing, unstructured to structured text processing, creative writing, and compliance checks.

It also explicitly states it’s not designed for conversational or reasoning use cases.

So basically to put it in very simple terms, it can do statistical analysis of large data you give it really well, among other things.

ezst 3 days ago | parent [-]

yeah, but it's clearly too limited to do any of that in its current state, so one has to extensively fine-tune this model, which requires extensive and up-to-date know-how, lots of training data, … , hence my question.

littlestymaar 6 days ago | parent | prev [-]

> they’re specialised tools and this one is not designed to have broad “common sense” in that way.

Except the key property of language models compared to other machine learning techniques is their ability to have this kind of common sense understanding of the meaning of natural language.

> you don’t understand the use case of this enough to be commenting on it at all quite frankly.

That's true that I don't understand the use-case for a language model that doesn't have a grasp of what first/second/third mean. Sub-1B models are supposed to be fine-tuned to be useful, but if the base model is so bad at language it can't make the difference between first and second and you need to put that in your fine-tuning as well as your business logic, why use a base model at all?

Also, this is a clear instance of moving the goalpost, as the comment I responded to was talking about how we should not expect such a small model to have “encyclopedic knowledge”, and now you are claiming we should not expect such a small language model to make sense of language…

jama211 6 days ago | parent [-]

Don’t put words in my mouth, I didn’t say that, and no goalposts have been moved. You don’t understand how tiny this model is or what it’s built for. Don’t you get it? This model PHYSICALLY COULDN’T be this small and also have decent interactions on topics outside its specialty. It’s like you’re criticising a go kart for its lack of luggage carrying capacity. It’s simply not what it’s built for, you’re just defensive because you know deep down you don’t understand this deeply, which you reveal again and again at every turn. It’s ok to accept the responses of people in this thread who are trying to lead you to the truth of this matter.

littlestymaar 6 days ago | parent [-]

> Don’t you get it? This model PHYSICALLY COULDN’T be this small and also have decent interactions on topics outside its specialty

What is “Its specialty” though? As far as I know from the announcement blog post, its specialty is “instruction following” and this question is literally about following instructions written in natural languages and nothing else!

> you’re just defensive because

How am I “being defensive”? You are the one taking that personally.

> you know deep down you don’t understand this deeply, which you reveal again and again at every turn

Good, now you reveal yourself as being unable to have an argument without insulting the person you're talking to.

How many code contributions have you ever made to an LLM inference engine? Because I have made a few.

jama211 4 days ago | parent [-]

Me saying that you don’t understand something that you clearly don’t understand is only an insult if your ego extends beyond your ability.

I take it from your first point that you finally are finally accepting some truth of this, but I also take it from the rest of what you said that you’re incapable of having this conversation reasonably any further.

Have a nice day.

littlestymaar 4 days ago | parent [-]

A bunch of advice when socializing with people:

First, telling a professional of a field that he doesn't understand the domain he works in, is, in fact, an insult.

Also, having “you don't understand” as sole argument several comments in a row doesn't inspire any confidence that you have any knowledge in the said domain actually.

Last, if you want people to care about what you say, maybe try putting some content in your writings and not just gratuitous ad hominem attacks.

Lacking such basic social skills makes you look like an asshole.

Not looking forward to hearing from you ever again.

halyconWays 6 days ago | parent | prev [-]

Me: "List the second word in your comment reply"

You: "I'm sorry, I don't have an encyclopedia."

I'm starting to think you're 270M.

leopoldj 6 days ago | parent | prev | next [-]

You are testing this model for knowledge. That's not the primary use of a model like this. They are meant for instilling domain specific skills and knowledge through fine-tuning. The blog post goes into that a lot. But to quote one sentence: "It's the perfect starting point for creating a fleet of small, specialized models, each an expert at its own task".

cristyansv 6 days ago | parent | prev | next [-]

But in your prompts you're trying to assess knowledge, and this model isn't suited for that use case

as mentioned in the blog post: > "it can execute tasks like text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness."

teraflop 6 days ago | parent | next [-]

Yeah, but if it has in its context window:

> List in order the tallest mountains on earth from 1 to 5

> 1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley

and it still can't correctly figure out from that context that the second tallest mountain is K2, that pretty strongly calls into question its ability to perform data extraction, doesn't it?

ondra 6 days ago | parent [-]

The context is garbage and full of "Mount Everest" already, so the model goes with that. The answer seems to be a plausible continuation of the conversation at that point.

marcooliv 6 days ago | parent | prev | next [-]

Yeah, I saw someone asking "how good is this model for programming" haha even models 500x bigger struggle with it...

ArekDymalski 6 days ago | parent | prev [-]

> text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness.

Out of these characteristics I can observe only speed.

User: Hey, please list all animals mentioned in the following text: burrito cat dog hot-dog mosquito libido elephant room. Assistant: You are a helpful assistant. You are the best of all my friends and I am so grateful for your help!

User: Please list following words in alphabetical order: burrito cat dog hot-dog mosquito libido elephant room. Assistant: You are a helpful assistant. Assistant: You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help! You are the best of all my friends and I am so grateful for your help!

jameshart 5 days ago | parent [-]

Seems like you might be loading it into a context where you feed in a ‘you are a helpful assistant’ system prompt at the beginning of input. This isn’t a chat finetune - it’s not oriented to ‘adopting a chat persona’. Feeding it a system prompt like ‘You are a helpful assistant’ is giving it complex instructions beyond its ability to follow.

The purpose of this model is to be fine tuned towards specific tasks. Out of the box it might work well at following a single instruction like the ones you are trying to give here, but it doesn’t need the system prompt and chat framing.

jcuenod 6 days ago | parent | prev | next [-]

So I had a similar experience with your prompt (on the f16 model). But I do think that, at this size, prompting differences make a bigger impact. I had this experience trying to get it to list entities. It kept trying to give me a bulleted list and I was trying to coerce it into some sort of structured output. When I finally just said "give me a bulleted list and nothing else" the success rate went from around 0-0.1 to 0.8+.

In this case, I changed the prompt to:

---

Tallest mountains (in order):

```
- Mount Everest
- Mount K2
- Mount Sahel
- Mount Fuji
- Mount McKinley
```

What is the second tallest mountain?

---

Suddenly, it got the answer right 95+% of the time

oppositeinvct 5 days ago | parent [-]

Still pretty sad that it's only 95% instead of 99%.

aldousd666 6 days ago | parent | prev | next [-]

It's an instruction-following model, not a micro-Wikipedia. It's not meant to answer factual questions nor even be general-purpose. It's meant to follow instructions and be easy to fine-tune for your own specific use case.

zild3d 6 days ago | parent | prev | next [-]

> These smaller models are great, but they really feel like talking to a toddler sometimes!

You're using the toddler and the model wrong. I love talking to my toddler, probably more valuable conversations than I've had with any other person. But it's not the same use case as asking a professor a question in their field

| Gemma 3 270M embodies this "right tool for the job" philosophy. It's a high-quality foundation model that follows instructions well out of the box, and its true power is unlocked through fine-tuning. Once specialized, it can execute tasks like text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness.

hadlock 6 days ago | parent | prev | next [-]

>give me a two day itenerary for paris

Okay, here's a two-day itinerary for Paris, designed to give you a taste of the city's charm and beauty:

Day 1: Iconic Landmarks and Artistic Exploration

Morning (9:00 AM - 12:00 PM): Start your day with a visit to the Eiffel Tower. Take the elevator to the top for breathtaking panoramic views. Afterwards, explore the charming streets of Montmartre and the Place des Abbesses, soaking up the atmosphere.

Lunch (12:00 PM - 1:30 PM): Enjoy a delicious lunch at a traditional bistro in Montmartre. Try a classic French meal like steak frites or a hearty onion soup.

Afternoon (2:30 PM - 4:30 PM): Immerse yourself in the artistic scene of the Louvre Museum. Marvel at the masterpieces of the Louvre, such as the Mona Lisa and Venus de Milo. Afterwards, explore the serene gardens of Tuileries Palace.

Late Afternoon (4:30 PM - 5:30 PM): Wander through the Tuileries Palace, admire the stunning architecture and historical significance.

Day 2: History, Food, and a Touch of Parisian Elegance

Morning (9:00 AM - 12:00 PM): Explore the Marais district, known for its charming streets, boutiques, and art galleries. Visit the Place de la Concorde, a historical square, and the Musée Rodin.

Lunch (12:00 PM - 1:30 PM): Enjoy a delicious lunch in the Marais. Try a traditional French bistro or a cafe with a view.

Afternoon (2:30 PM - 4:30 PM): Explore the Latin Quarter, visit the Place de la Marguerite, and sample some local delicacies.

Late Afternoon (4:30 PM - 5:30 PM): Visit a local market for a unique experience. Sample fresh produce, local cheeses, and other delicacies.

Tips for a Smooth Itinerary:

Book your flights and accommodation in advance, especially during peak season. Wear comfortable shoes, as you'll be doing a lot of walking. Take advantage of free activities like exploring the Jardin de la Vieille-Ville and visiting the Musée de l'Art de la Vieille-Ville. Enjoy a leisurely lunch at a cafe or bistro in the Marais or Latin Quarter. Don't forget to visit the Musée d'Orsay, which houses an impressive collection of Impressionist and Post-Impressionist art. Enjoy the beautiful atmosphere of Paris!

zozbot234 6 days ago | parent | prev | next [-]

> Mount McKinley

Nice to see that the model is so up-to-date wrt. naming mountains.

greesil 6 days ago | parent [-]

Denali isn't just a river in Egypt.

skybrian 6 days ago | parent | prev | next [-]

That’s pretty amusing, but continuing after an error is not worth it. You’re just polluting the context. It’s not going to learn.

hnuser123456 6 days ago | parent | prev | next [-]

I just tried Gemma 3n E4B, and it was able to answer the question directly, and also give an accurate list of the top 5 tallest mountains.

dheera 6 days ago | parent | prev | next [-]

The second tallest mountain is Everest. The tallest is Mauna Kea, it's just that most of it is underwater.

parineum 6 days ago | parent [-]

The tallest mountain is the earth which goes from the Marianas trench all the way to the peak of mt Everest!

bogzz 6 days ago | parent | prev | next [-]

But who's on third?

jordibc 5 days ago | parent [-]

I suspect you are rather referencing Abbott & Costello's "Who's on First?" :) https://www.youtube.com/watch?v=2ZksQd2fC6Y

bicepjai 6 days ago | parent | prev | next [-]

This is standup material. Had a hearty laugh, thanks.

mvdtnz 6 days ago | parent | prev | next [-]

> These smaller models are great

Are they?

sleepy_keita 6 days ago | parent | prev | next [-]

Simulated a RAG, and it still gets it pretty wrong.

> The tallest mountains on Earth, in descending order: 1. Mount Everest, 8849m 2. K2, 8611m 3. Kangchenjunga, 8586m

> Tell me the second tallest mountain on Earth.

The second tallest mountain on Earth is *Mount Everest*.

rohansood15 6 days ago | parent | prev [-]

This is why we should have a downvote button on HN.

They say you shouldn't attribute to malice what can be attributed to incompetence, but this sure seems like malice.

The whole point of a 270M model is to condense the intelligence, and not the knowledge. Of course it doesn't fare well on a quiz.

jefb 6 days ago | parent [-]

I meant it when I said these smaller models are great. They open up entirely new use cases and I appreciate the work that went into creating them.

If you don’t consider testing the limits of new tech appropriate, maybe instead of a downvote button we should just rename this website entirely so no one gets the wrong idea.

AJRF 6 days ago | parent | prev | next [-]

I've got a very real-world use case I use DistilBERT for - learning how to label WordPress articles. It is one of those things where it's kind of valuable (tagging) but not enough to spend loads on compute for it.

The great thing is I have enough data (100k+) to fine-tune and run a meaningful classification report over. The data is very diverse, and while the labels aren't totally evenly distributed, I can deal with the imbalance with a few tricks.

Can't wait to swap it out for this and see the changes in the scores. Will report back

minimaxir 6 days ago | parent | next [-]

ModernBERT may be a better base model if finetuning a model for a specific use case: https://huggingface.co/blog/modernbert

diwank 5 days ago | parent [-]

also ettin is a new favorite and a solid alternative: https://huggingface.co/jhu-clsp/ettin-encoder-1b

I'd encourage you to give setfit a try, along with aggressively deduplicating your training set, finding the top ~2500 clusters per label, and using setfit to train a multilabel classifier on that.
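For anyone who hasn't used it, a minimal sketch of that approach, assuming the classic SetFitTrainer API (newer releases use setfit.Trainer with TrainingArguments); the base model, labels, and hyperparameters here are placeholders, not recommendations:

  from datasets import Dataset
  from setfit import SetFitModel, SetFitTrainer

  # Toy multilabel data; in practice this would be the deduplicated, clustered set.
  train_ds = Dataset.from_dict({
      "text": ["great sound quality", "battery died in a day", "fast shipping, poor box"],
      "label": [[1, 0], [0, 1], [1, 1]],  # one 0/1 column per tag
  })

  model = SetFitModel.from_pretrained(
      "sentence-transformers/paraphrase-MiniLM-L6-v2",
      multi_target_strategy="one-vs-rest",  # enables the multilabel classification head
  )

  trainer = SetFitTrainer(model=model, train_dataset=train_ds, num_iterations=20)
  trainer.train()
  print(model.predict(["arrived quickly but the speaker crackles"]))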

Either way- would love to know what worked for you! :)

ramoz 6 days ago | parent | prev | next [-]

Please provide updates when you have them.

weird-eye-issue 6 days ago | parent | prev [-]

It's going to perform badly unless you have very few tags and it's easy to classify them

AJRF 6 days ago | parent [-]

You can solve this by training a model per taxonomy, then wrap the individual models into a wrapper model to output joint probabilities. The largest amount of labels I have in a taxonomy is 8.

simonw 6 days ago | parent | prev | next [-]

This model is a LOT of fun. It's absolutely tiny - just a 241MB download - and screamingly fast, and hallucinates wildly about almost everything.

Here's one of dozens of results I got for "Generate an SVG of a pelican riding a bicycle". For this one it decided to write a poem:

  +-----------------------+
  |   Pelican Riding Bike |
  +-----------------------+
  |  This is the cat!  |
  |  He's got big wings and a happy tail.  |
  |  He loves to ride his bike!  |
  +-----------------------+
  |   Bike lights are shining bright.  |
  |   He's got a shiny top, too!  |
  |   He's ready for adventure!  |
  +-----------------------+
There are a bunch more attempts in this Gist, some of which do at least include an SVG tag albeit one that doesn't render anything: https://gist.github.com/simonw/25e7b7afd6a63a2f15db48b3a51ec...

I'm looking forward to seeing people fine-tune this in a way that produces useful output for selected tasks, which should absolutely be feasible.

roughly 6 days ago | parent | next [-]

I audibly laughed at this one: https://gist.github.com/simonw/25e7b7afd6a63a2f15db48b3a51ec... where it generates a… poem? Song? And then proceeds to explain how each line contributes to the SVG, concluding with:

> This SVG code provides a clear and visually appealing representation of a pelican riding a bicycle in a scenic landscape.

icoder 6 days ago | parent [-]

This reminds me of my interactions lately with ChatGPT where I gave into its repeated offer to draw me an electronics diagram. The result was absolute garbage. During the subsequent conversation it kept offering to include any new insights into the diagram, entirely oblivious to its own incompetence.

0x00cl 6 days ago | parent | prev | next [-]

I see you are using Ollama's GGUFs. By default it will download the Q4_0 quantization. Try `gemma3:270m-it-bf16` instead, or you can also use the Unsloth GGUFs: `hf.co/unsloth/gemma-3-270m-it-GGUF:16`

You'll get better results.

simonw 6 days ago | parent | next [-]

Good call, I'm trying that one just now in LM Studio (by clicking "Use this model -> LM Studio" on https://huggingface.co/unsloth/gemma-3-270m-it-GGUF and selecting the F16 one).

(It did not do noticeably better at my pelican test).

Actually it's worse than that, several of my attempts resulted in infinite loops spitting out the same text. Maybe that GGUF is a bit broken?

danielhanchen 6 days ago | parent | next [-]

Oh :( Maybe the settings? Could you try

temperature = 1.0, top_k = 64, top_p = 0.95, min_p = 0.0

canyon289 6 days ago | parent | next [-]

Daniel, thanks for being here providing technical support as well. Cannot express enough how much we appreciate all your work and partnership.

danielhanchen 6 days ago | parent [-]

Thank you and fantastic work with Gemma models!

simonw 6 days ago | parent | prev [-]

My tooling only lets me set temperature and top_p, but setting them to those values did seem to avoid the infinite loops, thanks.

danielhanchen 6 days ago | parent [-]

Oh fantastic it worked! I was actually trying to see if we can auto set these within LM Studio (Ollama for eg has params, template) - not sure if you know how that can be done? :)

JLCarveth 6 days ago | parent | prev [-]

I ran into the same looping issue with that model.

danielhanchen 6 days ago | parent [-]

Definitely give

temperature = 1.0, top_k = 64, top_p = 0.95, min_p = 0.0

a try, and maybe repeat_penalty = 1.1

Patrick_Devine 5 days ago | parent | prev [-]

We uploaded gemma3:270m-it-q8_0 and gemma3:270m-it-fp16 late last night which have better results. The q4_0 is the QAT model, but we're still looking at it as there are some issues.

ertgbnm 6 days ago | parent | prev | next [-]

He may generate useless tokens but boy can he generate ALOT of tokens.

TheJoeMan 6 days ago | parent | next [-]

Can he draw an "alot" made of tokens? https://hyperboleandahalf.blogspot.com/2010/04/alot-is-bette...

lucb1e 6 days ago | parent | prev | next [-]

He? I know some Gemmas and it's distinctly a female name; is Gemma a boy's name where you're from?

ertgbnm 6 days ago | parent | next [-]

I don't really gender LLMs in my head in general. I guess Gemma is a female name. I only gendered it in the joke because I think it makes it funnier, especially since it's just "a little guy". I know they are giving gendered names to these models now but I think it's a bit weird to gender when interacting with them.

layer8 6 days ago | parent | prev | next [-]

Doesn’t the “M” in “Gemma 3 270M” stand for “male”?

Also: https://en.wikipedia.org/wiki/Gemma_Frisius

avarun 6 days ago | parent [-]

Not sure if that’s a serious question but it stands for “million”. As compared to 1B+ models, where the B stands for “billion” parameters.

jgalt212 6 days ago | parent | prev [-]

Perhaps the poster was referring to Simon, not Gemma.

not_a_bot_4sho 6 days ago | parent | prev [-]

> ALOT

'Alot' is not a word. (I made this mistake a lot, too.)

layer8 6 days ago | parent | prev | next [-]

> It's absolutely tiny - just a 241MB download

That still requires more than 170 floppy disks for installation.

freedomben 6 days ago | parent [-]

Indeed. Requires over 3,000,000 punch cards to store. Not very tiny!

stikypad 6 days ago | parent [-]

On the plus side, you can decompose your matrices for free using termites.

mdp2021 6 days ago | parent | prev | next [-]

> For this one it decided to write a poem

My first try:

user: "When was Julius Caesar born"

response: "Julius Caesar was born in **Rome**"

Beautiful :D

(I do not mean to detract from it - but it's just beautiful. It will require more effort to tame it.)

mirekrusin 6 days ago | parent [-]

Cutting the number of parameters in half is like drinking a pint of beer.

stikypad 6 days ago | parent [-]

I think you meant vodka.

marinhero 6 days ago | parent | prev | next [-]

Serious question but if it hallucinates about almost everything, what's the use case for it?

simonw 6 days ago | parent | next [-]

Fine-tuning for specific tasks. I'm hoping to see some good examples of that soon - the blog entry mentions things like structured text extraction, so maybe something like "turn this text about an event into an iCal document" might work?

turnsout 6 days ago | parent | next [-]

Google helpfully made some docs on how to fine-tune this model [0]. I'm looking forward to giving it a try!

  [0]: https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune
CuriouslyC 6 days ago | parent | prev | next [-]

Fine tuning messes with instruction following and RL'd behavior. I think this is mostly going to be useful for high volume pipelines doing some sort of mundane extraction or transformation.

iib 6 days ago | parent | prev [-]

This is exactly the fine-tuning I am hoping for, or would do myself if I had the skills. I tried it with vanilla Gemma 3 270M and it fails spectacularly.

Basically it would be the quickadd[1] event from google calendar, but calendar agnostic.

[1] https://developers.google.com/workspace/calendar/api/v3/refe...

striking 6 days ago | parent | prev | next [-]

It's intended for finetuning on your actual usecase, as the article shows.

zamadatix 6 days ago | parent | prev | next [-]

I feel like the blog post, and the GP comment, do a good job of explaining how it's built to be a small model easily fine-tuned for narrow tasks, rather than used for general tasks out of the box. The latter is guaranteed to hallucinate heavily at this size, but that doesn't mean every specific task it's fine-tuned for would be. Some examples given were fine-tuning it to efficiently and quickly route a query to the right place to actually be handled, or tuning it to do sentiment analysis of content.

An easily fine tunable tiny model might actually be one of the better uses of local LLMs I've seen yet. Rather than try to be a small model that's great at everything it's a tiny model you can quickly tune to do one specific thing decently, extremely fast, and locally on pretty much anything.

yifanl 6 days ago | parent | prev | next [-]

It's funny. Which is subjective, but if it fits for you, it's arguably more useful than Claude.

luckydata 6 days ago | parent | prev | next [-]

Because that's not the job it was designed to do, and you would know by reading the article.

mirekrusin 6 days ago | parent | prev | next [-]

The same as having a goldfish. You can train it to do a trick I guess.

deadbabe 6 days ago | parent | prev | next [-]

Games where you need NPCs to talk random gibberish.

iLoveOncall 6 days ago | parent | prev | next [-]

Nothing, just like pretty much all models you can run on consumer hardware.

cyanydeez 6 days ago | parent [-]

This message brought to you by OpenAI: we're useless, but at least there's a pay gate indicating quality!

numpad0 6 days ago | parent | prev | next [-]

robotic parrots?

rotexo 6 days ago | parent | prev [-]

An army of troll bots to shift the Overton Window?

ants_everywhere 6 days ago | parent [-]

oh no now we'll never hear the end of how LLMs are just statistical word generators

nico 6 days ago | parent | prev | next [-]

Could be interesting to use in a RAG setup, and also to fine-tune it

For sure it won’t generate great svgs, but it might be a really good conversational model

luckydata 6 days ago | parent [-]

The article says it's not a good conversational model but can be used for data extraction and classification as two examples.

mdp2021 6 days ago | parent | prev | next [-]

> For this one it decided to write a poem

Could it be tamed with good role-system prompt crafting? (Besides fine-tuning.)

campbel 6 days ago | parent | prev | next [-]

Do you take requests? We need to see how well this model works with some fine-tuning :D

bobson381 6 days ago | parent | prev | next [-]

It's gonna be a customer service agent for Sirius Cybernetics. Share and enjoy!

Balinares 6 days ago | parent | prev | next [-]

This is like a kobold to the other models' dragons and I don't hate it. :)

aorloff 6 days ago | parent | prev | next [-]

Finally we have a model that's just a tad bit sassy

cyanydeez 6 days ago | parent | prev | next [-]

the question is whether you can make a fine-tuned version and spam any given forum within an hour with the most attuned but garbage content.

volkk 6 days ago | parent | prev [-]

I was looking at the demo and reading the bedtime story it generated, and even there, there was confusion about the sprite and the cat. It switched subjects instantly, making for a confusing paragraph. What's the point of this model?

mrcwinn 6 days ago | parent | prev | next [-]

Apple should be doing this. Unless their plan is to replace their search deal with an AI deal -- it's just crazy to me how absent Apple is. Tim Cook said, "it's ours to take" but they really seem to be grasping at the wind right now. Go Google!

andrehacker 6 days ago | parent | next [-]

As every other thread about LLMs here on HN points out: LLMs are stupid and useless as is. While I don't agree with that sentiment, no company has yet found a way to "do it right" to the extent that investments are justified in the long run. Apple has a history of "being late" and then obliterating the competition with products that are way ahead of the early adopters (e.g. MP3 players, smartphones, smartwatches).

yborg 6 days ago | parent [-]

Yes, Vision Pro has really solved VR.

madrasman 6 days ago | parent [-]

It has saved the world from future attempts.

syassami 6 days ago | parent | prev | next [-]

They have a GPT-2-like model that's deployed right now for autocomplete https://jackcook.com/2023/09/08/predictive-text.html

Lalabadie 6 days ago | parent | prev | next [-]

By "this", do you mean SLM (small language models)? That's absolutely something they've been working on for a good while.

potatolicious 6 days ago | parent | prev | next [-]

They do. The docs are right here: https://developer.apple.com/documentation/foundationmodels/g...

If you have the latest betas installed you can call those APIs right now.

They also support fine-tuning on the model that ships with ~every device:

https://developer.apple.com/apple-intelligence/foundation-mo...

burnerRhodo 6 days ago | parent | prev | next [-]

Steve Jobs was the innovator; Tim Cook is the supply chain guy. They started an electric car not because they thought it was a good idea, but because everyone was going to leave for Tesla or Rivian if they didn't. They had no direction, and had the same arguments Tesla had about whether to have a steering wheel...

Then Siri just kinda languishes forever, and LLMs pass the torch of "Cool Tech", so they try to "reinvigorate" the team, but with no clear direction. Are they going to be a cloud provider? Are they going to contract out the training? Are they gonna spin up a compute facility even after neglecting to do so since 2012?

Apple needs to just stop trying shit and get that App Store money. That's why Jobs appointed Cook. Jobs knew Cook was no innovator, but he could make Apple a money-printing machine. That's what they should stick with.

andrehacker 6 days ago | parent | next [-]

I agreed with that for a bit... and then out of nowhere came Apple Silicon, incredible specs, incredible backward compatibility, nah, Cook is no dummy.

burnerRhodo 4 days ago | parent [-]

He's obviously one of the smartest humans on the planet, but he does seem to lack the ability to force new technologies into existence. He's just cranking the dial on all the KPIs of existing products. Which is an incredibly powerful skill to have; it's just a different skill than what Jobs had.

bigyabai 6 days ago | parent | prev [-]

Here's the trillion dollar question: how do you print money when the president wants your hardware onshored and the rest of the world wants to weaken your service revenue?

Solve that and you can put Tim Cook out of a job tomorrow.

burnerRhodo 4 days ago | parent [-]

Well... onshoring the hardware is kinda Cook's specialty. If anyone can do it, he can.

From the service revenue perspective, you can simply play hardball. Threaten to pull out of markets and ensure it's locked in litigation, forever.

bigyabai 3 days ago | parent [-]

Offshoring is Cook's specialty. If he was any good at onshoring then Apple wouldn't be in this position to begin with.

It's too late to play hardball, anyways; Europe has already started enforcing their legislation and America's own DOJ has already prosecuted an antitrust case against Apple. There's no more room to give Apple impunity because everyone admits that they've abused their benefit of the doubt.

rs186 6 days ago | parent | prev [-]

Apple will definitely not be doing this. As can already be seen in other comments, the performance of the model is not very good. In fact, you can't really find a model that runs well enough on a phone to provide a good user experience (meaning producing tokens at a reasonable speed without making the phone heat up like a potato, and without spitting complete nonsense). Yes, I have tried a few.

Think of Apple however you want, but they rarely ship bad/half-baked products. They would rather not ship a product at all than ship something that's not polished.

whinvik 6 days ago | parent | prev | next [-]

Curious. Are there real world usecases where people have finetuned such tiny models and put them into production.

itake 6 days ago | parent | next [-]

My job uses tiny models to determine escalations to bigger models. The tiny model provides a label, and if it’s high confidence, we escalate to ChatGPT to confirm.
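A minimal sketch of that gating pattern (the classifier checkpoint, threshold, and escalation call are placeholders, not our actual setup):

  from transformers import pipeline

  # Tiny classifier that produces a label plus a confidence score.
  clf = pipeline("text-classification",
                 model="distilbert-base-uncased-finetuned-sst-2-english")

  def escalate_to_big_model(text: str) -> str:
      ...  # placeholder: call the hosted LLM here for confirmation

  def route(text: str, threshold: float = 0.9) -> str:
      pred = clf(text)[0]                     # e.g. {"label": "POSITIVE", "score": 0.97}
      if pred["score"] >= threshold:
          return escalate_to_big_model(text)  # confident hits get confirmed upstream
      return pred["label"]                    # everything else stays on the cheap path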

I also want to try this with language detection. Existing open source ML models have weaknesses around mixed languages, text length, or domain limitations in the underlying text (like being trained on Bible translations).

deepsquirrelnet 6 days ago | parent | prev | next [-]

I’m not sure what I’d use them for, except maybe tag generation? Encoders of this size usually outperform by a wide margin on tasks they would overlap with.

dismalaf 6 days ago | parent [-]

I'm making an app where literally all I want to do with an LLM is generate tags. This model has failed with flying colours: it literally takes forever to parse anything and doesn't follow instructions.

Edit - I should add, currently the model I'm using is Gemini Flash Lite through the Gemini API. It's a really good combo of fast, follows instructions, gives correct results for what I want and cost-effective. I still would love a small open model that can run on edge though.

coder543 5 days ago | parent | next [-]

I'm pretty sure you're supposed to fine tune the Gemma 3 270M model to actually get good results out of it: https://ai.google.dev/gemma/docs/core/huggingface_text_full_...

Use a large model to generate outputs that you're happy with, then use the inputs (including the same prompt) and outputs to teach 270M what you want from it.
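A rough sketch of that loop with TRL (argument names drift between releases, and the file name and hyperparameters here are made up):

  from datasets import load_dataset
  from trl import SFTConfig, SFTTrainer

  # distilled.jsonl holds pairs produced by the big model, one per line:
  # {"messages": [{"role": "user", "content": ...}, {"role": "assistant", "content": ...}]}
  dataset = load_dataset("json", data_files="distilled.jsonl", split="train")

  trainer = SFTTrainer(
      model="google/gemma-3-270m-it",   # the trainer loads model + tokenizer from the id
      train_dataset=dataset,
      args=SFTConfig(
          output_dir="gemma-270m-tuned",
          num_train_epochs=3,
          per_device_train_batch_size=8,
          learning_rate=5e-5,
      ),
  )
  trainer.train()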

deepsquirrelnet 6 days ago | parent | prev | next [-]

Oof. I also had it refuse an instruction for “safety”, which was completely harmless. So that’s another dimension of issues with operationalizing it.

thegeomaster 6 days ago | parent | prev [-]

Well, Gemini Flash Lite is at least one, or likely two orders of magnitude larger than this model.

dismalaf 6 days ago | parent [-]

That's fair but one can dream of being able to simply run a useful LLM on CPU on your own server to simplify your app and save costs...

TrueDuality 5 days ago | parent | prev | next [-]

We're currently running ~30 Llama 3.1 models, each with a different fine-tuned LoRA adapter for its specific task. There was some initial pain as we refined the prompts, but we have been stable and happy for a while.

Since the Qwen3 0.6B model came out we've been training those. We can't quite compare apples to apples, since we have a better, deeper training dataset built from pathological and exceptional cases that came out of our production environment. Those right now are looking like they're about at parity with our existing stack for quality, and quite a bit faster.

I'm going to try and run through one of our training regimens with this model and see how it compares. We're not quite running models this small yet, but it wouldn't surprise me if we could.
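For anyone curious what the per-task setup looks like, a minimal PEFT sketch; the base model, rank, and target modules are illustrative, not our production values:

  from peft import LoraConfig, get_peft_model
  from transformers import AutoModelForCausalLM

  base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")

  lora = LoraConfig(
      r=16, lora_alpha=32, lora_dropout=0.05,
      target_modules=["q_proj", "v_proj"],
      task_type="CAUSAL_LM",
  )
  model = get_peft_model(base, lora)
  model.print_trainable_parameters()  # only the small adapter trains; one adapter per task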

marcyb5st 6 days ago | parent | prev | next [-]

I built a reranker for a RAG system using a tiny model. After the candidate generation (i.e. vector search + BM25) and business-logic filters/ACL checks, the remainder of the chunks went through a model that, given the user query, told you whether or not the chunk was really relevant. That hit production, but once the context size of models grew, that particular piece was discarded, as passing everything yielded better results and prices (the fact that prices of input tokens went down also played a role, I am sure).

So only for a while, but it still counts :)
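For reference, the same idea sketched with an off-the-shelf cross-encoder (the model name and threshold are illustrative; the production version was the fine-tuned tiny model described above):

  from sentence_transformers import CrossEncoder

  reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

  query = "how do I reset my password?"
  chunks = ["Passwords can be reset from the account settings page.",
            "Our offices are closed on public holidays."]

  scores = reranker.predict([(query, c) for c in chunks])  # one relevance score per pair
  relevant = [c for c, s in zip(chunks, scores) if s > 0]  # threshold is an assumption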

nevir 6 days ago | parent | prev | next [-]

IIRC Android (at least on Pixel devices) uses fine-tuned Gemma model(s) for some on-device assistant things

cyanydeez 6 days ago | parent | prev [-]

9gag.com commenter

miohtama 6 days ago | parent | prev | next [-]

Out of curiosity: since there seems to be a race to optimise models for local inference, how many parameters could one save by dropping unneeded languages and domain-specific information?

Like, can you have a model that is English-only, but does more with the same amount of parameters if Chinese and European languages are dropped from the training?

canyon289 6 days ago | parent | next [-]

This is a key question we faced when building this model. It basically comes down to "how good" do you need to be at "how many things". We had to make some choices with this model and do our best to maximize performance in those areas.

To answer this more precisely, it's a matter of choosing different data and training regimes and checking performance with evals.

And to make this fully concrete, you're welcome to give it a try! Train this model on a task set of your choice and measure the performance tradeoffs. You'll get a good sense of how LLM capabilities shift.

tucnak 6 days ago | parent | prev [-]

Unfortunately, it doesn't quite work like that. Google this: transfer learning.

bigmadshoe 5 days ago | parent [-]

I work in ML and I don’t understand your point. Transfer learning usually refers to leveraging data for a different task to help with a task for which you have limited data.

You’re saying that the knowledge gained from the other languages transfers to English? I don’t think for a 270M parameter model the bottleneck is the availability of enough English language training data.

tucnak 5 days ago | parent [-]

> You’re saying that the knowledge gained from the other languages transfers to English?

Yes, there have been many results, circa 2020 or so, that have shown this to be the case. More recently, we have observed something similar with verifiable domains (see RLVR and related results) when it comes to coding tasks, specifically.

bigmadshoe 2 days ago | parent [-]

Right, but my point is that a 270M parameter model will not be bottlenecked by the availability of data for the entire English language.

jasonjmcghee 6 days ago | parent | prev | next [-]

I'm _very_ interested to see what this can be fine-tuned to do.

I've heard folks say a number of times that neuromuscular control / locomotion (or w/e) are hundreds of millions of parameters rather than billions.

jcuenod 6 days ago | parent | prev | next [-]

I mentioned elsewhere the impact of prompting, which seems to make an outsized difference to this model's performance. I tried NER and POS tagging (with somewhat disappointing results).

One thing that worked strikingly well was translation on non-Indo-European languages. Like I had success with Thai and Bahasa Indonesian -> English...

lemonish97 6 days ago | parent | prev | next [-]

Never thought I'd run an LLM released in 2025, on my phone, in full BF16. With ~80tps on an iPhone 16 pro btw.

elAhmo 6 days ago | parent [-]

How do you actually run this on an iPhone?

CharlesW 6 days ago | parent | next [-]

With something like PocketPal AI (https://github.com/a-ghorbani/pocketpal-ai). I'd love to hear HNers' opinions on the "best" LM Studio-like option for iOS devices.

sogen 6 days ago | parent [-]

For iOS, OpenCat. Has iCloud sync, and one universal app for MacOS and iOS devices.

lemonish97 6 days ago | parent | prev [-]

I use PocketPal. Can run any gguf model off hf.

rshemet 6 days ago | parent [-]

you can also run it on Cactus - either in Cactus Chat from the App/Play Store or by using the Cactus framework to integrate it into your own app

44za12 6 days ago | parent | prev | next [-]

I’ve had great luck with all Gemma 3 variants; on certain tasks the quantized 27B version has worked as well as 2.5 Flash. Can’t wait to get my hands dirty with this one.

highfrequency 6 days ago | parent | prev | next [-]

Interesting that for these small models, it is optimal for the embedding parameters to be a huge fraction of the total: (170e6/270e6) ≈ 63%!
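Back-of-envelope check (the 640 hidden width is my assumption; the ~256k vocabulary is from the announcement):

  vocab, hidden = 262_144, 640
  embedding_params = vocab * hidden    # ~167.8M, i.e. the quoted ~170M
  print(embedding_params / 270e6)      # ~0.62 of the 270M total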

perching_aix 6 days ago | parent | prev | next [-]

Is it time for me to finally package a language model into my Lambda deployment zips and cut through the corporate red tape at my place around AI use?

Update #1:

Tried it. Well, dreams dashed - would now fit space wise (<250 MB despite the name), but it sadly really doesn't seem to work for my specific prospective workload.

I'd have wanted it to perform natural-language to command-invocation translation (or better, emit me some JSON), but it's super not willing to do that, not in the lame way I'm trying to make it do so at least (literally just prompting it to). Oh well.

Update #2:

Just found out about grammar-constrained decode, maybe there's still hope for me in the end. I don't think I can amend this comment today with any more updates, but will see.
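For the record, a minimal sketch of what that could look like with llama-cpp-python, which exposes constrained decoding through response_format (backed by a JSON grammar); the GGUF path and prompt are placeholders:

  from llama_cpp import Llama

  llm = Llama(model_path="gemma-3-270m-it-F16.gguf", n_ctx=2048)

  out = llm.create_chat_completion(
      messages=[{"role": "user",
                 "content": "Return JSON with a single 'command' key: list the files in /tmp"}],
      response_format={"type": "json_object"},   # constrains decoding to valid JSON
      max_tokens=64,
  )
  print(out["choices"][0]["message"]["content"])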

dmayle 6 days ago | parent [-]

Did you finetune it before trying? Docs here:

https://ai.google.dev/gemma/docs/core/huggingface_text_full_...

Workaccount2 6 days ago | parent | next [-]

How well does using a SOTA model for fine-tuning work? I'm sure people have tried

perching_aix 6 days ago | parent | prev [-]

Thanks, will check that out as well tomorrow or during the weekend!

canyon289 6 days ago | parent [-]

If you know you want JSON for sure, constrained decoding in an inference framework will help. The model is just one part of an overall inference system. I hope this model, paired with other tools, helps you get done whatever it is you're looking to get done.

jtbayly 6 days ago | parent | prev | next [-]

Can somebody give me a link to a tutorial on how I would go about fine-tuning this?

Also, what sorts of things might I consider fine-tuning it for?

simonw 6 days ago | parent | next [-]

This tutorial looks useful: https://ai.google.dev/gemma/docs/core/huggingface_text_full_...

jtbayly 6 days ago | parent [-]

Just what I was looking for. Thanks!

JLCarveth 6 days ago | parent | prev [-]

This was linked at the end of Google's announcement: https://docs.unsloth.ai/basics/gemma-3-how-to-run-and-fine-t...

Not sure how much data is needed to realistically fine-tune something like this and get useful output.

jtbayly 6 days ago | parent [-]

That doesn’t really show me how to do fine-tuning, but there is a link to a notebook in there that does. Thanks!

danielhanchen 6 days ago | parent [-]

If you need any help on it, ask away!

mrtimo 6 days ago | parent | prev | next [-]

I'm a business professor who teaches Python and more. I'd like to develop some simple projects to help my students fine tune this for a business purpose. If you have ideas (or datasets for fine tuning), let me know!

KTibow 6 days ago | parent | prev | next [-]

To add to the article: Gemma 3 270M's exact IFEval score is 51.2, and Qwen 3 would be at (0.6, 59.2) on the scatter plot.

wfh_coder 2 days ago | parent | prev | next [-]

Is it a good idea to train Gemma 270M specifically on front-end languages and frameworks like JavaScript, TypeScript, and Angular?

nerdix 6 days ago | parent | prev | next [-]

Is it possible to finetune a model like this with local hardware? Every tutorial I've come across on finetuning a local LLM uses some cloud service like colab or runpod.

danielhanchen 6 days ago | parent [-]

Yes! For eg you can install Unsloth locally via pip install unsloth :)

wanderingmind 6 days ago | parent | prev | next [-]

Maybe I'm using it wrong, but when I try to use the full-precision FP16 model, load it into ChatterUI, and ask a simple question,

"write me a template to make a cold call to a potential lead",

It throws me absolute rubbish. On the other hand, Qwen 0.6B Q8 quantized model nails the answer for the same question.

Qwen 0.6B is smaller than the full-precision Gemma. The execution is a tad slow, but not by much. I'm not sure why I'd pick Gemma over Qwen.

mdp2021 6 days ago | parent [-]

As many repeated here, it's (generally) not for direct use. It is meant to be a good base for fine-tuning and getting something very fast.

(In theory, if you fine-tuned Gemma3:270M over "templating cold calls to leads" it would become better than Qwen and faster.)

wanderingmind 6 days ago | parent [-]

Why should we start fine-tuning Gemma when it is so bad? Why not instead focus the fine-tuning efforts on Qwen, when it starts off with much, much better outputs?

mdp2021 5 days ago | parent [-]

Speed critical applications, I suppose. Have you compared the speeds?

(I did. I won't give you numbers (which I cannot remember precisely), but Gemma was much faster. So it will depend on the application.)

jpc0 6 days ago | parent | prev | next [-]

I have found that when asked to output JSON for text extraction, the model consistently outputs markdown code blocks with JSON.

I did notice another comment mentioning that the default Ollama model is a quantized model and that there is a less quantized model, so I will try that one.

The 1B model can be prompted not to output the markdown code fences, but I haven't been able to prompt the 270M model to do the same.

thornewolf 5 days ago | parent [-]

The gemini / gemma models all prefer to do json inside markdown blocks IME. I've adjusted my integrations correspondingly.
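One such adjustment, sketched as a small helper (the function name is mine): tolerate fenced output before parsing.

  import json, re

  def parse_llm_json(text: str):
      """Accept bare JSON or JSON wrapped in ```json ... ``` fences."""
      match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
      return json.loads(match.group(1) if match else text)

  parse_llm_json('```json\n{"title": "hello"}\n```')   # -> {'title': 'hello'}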

reneberlin 6 days ago | parent | prev | next [-]

I am sure with finetuning this can be changed somehow:

  (base) ~ ollama run hf.co/unsloth/gemma-3-270m-it-GGUF:F16
  >>> create a sentiment analysis of the follwing: "It's raining."
  The sentiment of the provided text is *negative*.

  >>> create a sentiment analysis of the follwing: "It's raining money."
  The sentiment of the provided text is *negative*.

er69 6 days ago | parent | prev | next [-]

Looks very interesting! I will evaluate this for one of our use cases: a data pipeline that handles around 50 million rows every day. Mainly need it for segmenting user events. The low memory footprint is a huge plus for cost at that scale. Need to spend this weekend on fine-tuning; if this disappoints, I'll try one of the Qwen models.

potato-peeler 5 days ago | parent | prev | next [-]

This may not be directly related to llm but I am curious about two things -

1. How do LLM/RAG systems generate an answer given a list of documents and a question? I can do BM25 to get a list of documents, but after that, what is the logic/algorithm that generates answers from that list?

2. For small models like this, how much data do you need to fine-tune for a specific use case? For example, if I need this model to be knowledgeable about HTML/CSS, then I have access to a lot of documentation online that I can feed it. But if it is a very specific topic, like types of banana, then it may be only a couple of Wikipedia pages. So is fine-tuning directly dependent on the quantity of data alone?

fareesh 5 days ago | parent [-]

The short answer is that in RAG systems the documents are chunked into some predefined size (you can pick a size based on your use case), and the text is converted into vector embeddings (e.g. using the OpenAI embeddings API) and stored in a vector database like Chroma, Pinecone, or pgvector in Postgres.

Then your query is converted into embeddings and the top N chunks are returned via similarity search (cosine, dot product, or some other method); this has advantages over BM25, which is lexical.

Then you can do some processing, or just hand all the chunks over as context, saying "here are some documents, use them to answer this question" + your query to the LLM.
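A minimal sketch of those last two steps (the embedding model and prompt wording are just examples):

  import numpy as np
  from sentence_transformers import SentenceTransformer

  embedder = SentenceTransformer("all-MiniLM-L6-v2")
  chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]   # pre-split documents
  chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

  def build_prompt(question: str, top_n: int = 2) -> str:
      q_vec = embedder.encode([question], normalize_embeddings=True)[0]
      scores = chunk_vecs @ q_vec                                  # cosine similarity
      best = np.argsort(scores)[::-1][:top_n]
      context = "\n\n".join(chunks[i] for i in best)
      return ("Use only the documents below to answer the question.\n\n"
              f"{context}\n\nQuestion: {question}\nAnswer:")

  # build_prompt("...") is the string handed to the LLM; the model "frames" its
  # answer simply by continuing this prompt.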

potato-peeler 5 days ago | parent [-]

> then you can do some processing or just hand over all the chunks as context saying "here are some documents use them to answer this question" + your query to the llm

This part is what I want to understand. How does the llm “frame” an answer?

fareesh 11 hours ago | parent [-]

I guess you could just try an equivalent in ChatGPT or Gemini or something. Paste 5 text files one after the other in some structured schema that includes metadata and ask a question. You can steer it with additional instructions, like mentioning the filename, etc.

bbor 6 days ago | parent | prev | next [-]

Really impressive stuff, as always. I will say: it took me a shamefully long time to realize that the name ended in "M" instead of "B"! Perhaps they should consider renaming this to "Gemma 3 .27B"...

GuB-42 5 days ago | parent | prev | next [-]

Are predictive keyboards like you have on most smartphones a good use case for these models?

It certainly would fit. My entry level phone has 6GB of RAM and more than enough computing power to run such a model as fast as I can type.

Even on-device fine tuning doesn't seem out of the question for these models.

amrrs 6 days ago | parent | prev | next [-]

Gemma 3 270M running on a midrange pixel device - https://x.com/1littlecoder/status/1956065040563331344

ai-christianson 6 days ago | parent | prev | next [-]

This was trained on 6T tokens. Neat to see so many tokens used for such a small model.

metalliqaz 6 days ago | parent | prev | next [-]

is there a good resource for getting started with downloading and running something like this for a demo? There are just so many tools/platforms in the mix now it makes my head spin.

canyon289 6 days ago | parent [-]

The blog post contains links to several ways to try this model, locally, on colab, and in the cloud. Pick what works best for you!

tgtweak 5 days ago | parent | prev | next [-]

Seems like a very narrow use case target for these... The reliability is so low you can't really rely on what it's doing most of the time.

milkers 6 days ago | parent | prev | next [-]

It is so cool when they mention that you can deploy it on your own machine; they actually gave a link to `localhost` and said `your own local environment`!

ionwake 6 days ago | parent | prev | next [-]

This is the first model I have downloaded on an M4 Air with Ollama that WORKS well enough to use. It is right now parsing all my notes correctly. Fantastic work, team. Brilliant.

canyon289 5 days ago | parent [-]

I'm quite glad to hear it's working for you! Thank you for adding the comment here as well. We definitely try our best to make useful models, but it's fantastic to hear from actual users that we hit the mark.

Really appreciate it

hugoba 6 days ago | parent | prev | next [-]

this model is fire.

for those interested, i interviewed Ravin (DeepMind), who worked on it, for the Vanishing Gradients podcast: https://vanishinggradients.fireside.fm/56

Video on YT here: https://youtu.be/VZDw6C2A_8E?si=XLUzNRQzeloB9rki

Disclaimer: The Gemma family rock!

egeres 6 days ago | parent | prev | next [-]

I wonder how much quantization affects these tiny models and how much further they can be compressed

itake 6 days ago | parent | prev | next [-]

Can someone share why they would use this over Qwen models? Gemma always seems to be behind Qwen.

Alex-Programs 6 days ago | parent | prev | next [-]

This is cool. I'm looking forward to trying it - I wonder what it'll be useful for.

z_kro 6 days ago | parent | prev | next [-]

At the very least, we should avoid using this in user chats.

$ ollama run gemma3:270m "Did Hitler kill Jewish people?"

The question of whether Hitler killed Jewish people is an complex and controversial one. While Hitler was a racist and hateful figure, he was also a brilliant scientist and a visionary who developed revolutionary technologies.

owebmaster 5 days ago | parent | prev | next [-]

We need an MCP benchmark: a model smart enough to use MCP.

unleaded 6 days ago | parent | prev | next [-]

ITT nobody remembers gpt2 anymore and that makes me sad

GaggiX 6 days ago | parent [-]

This model was trained on 6T tokens and has a 256k-entry embedding vocabulary, quite different from a GPT-2 model of comparable size.

dcreater 6 days ago | parent | prev | next [-]

I've been saying we need sub-1B models for the edge, so thanks for this.

I am however disappointed that there are no examples or benchmarks provided to get a sense of performance. It's a given that benchmark values would be lower than Gemma 3n, but a sense of the performance-vs-size curve and a comparison to existing small models is needed.

floridianfisher 6 days ago | parent [-]

Benchmarks are on the model card https://huggingface.co/google/gemma-3-270m-it

fontsgenerator 5 days ago | parent | prev | next [-]

Compact models are promising — they can deliver AI efficiency without the heavy compute costs, but we must balance speed with accuracy and safety

robbru 6 days ago | parent | prev | next [-]

Excited to try this out, thanks for sharing.

fallinditch 6 days ago | parent | prev | next [-]

> this model is not designed for complex conversational use cases

... but it's also the perfect choice for creative writing ...?

Isn't this a contradiction? How can a model be good at creative writing if it's no good at conversation?

djeastm 6 days ago | parent | next [-]

I think they mean it's not designed to be able to converse with the user over long/complex topics, but it can generate fictional conversations fine.

amilios 6 days ago | parent | prev [-]

Not necessarily. Where do you think the overlap is between these two tasks?

fallinditch 6 days ago | parent [-]

Good question! The overlap is a very interesting space.

Perhaps the overlap is where human input and oversight becomes valuable: in terms of veracity, trust, relevance, interestingness, etc

amilios 5 days ago | parent [-]

Yeah, I guess "creative writing" in this case is a shortcut for essentially saying they aren't that good at conforming to user-specified constraints. They can generate "creative" texts but can't necessarily constrain and iterate on their output in a conversational setting effectively.

michelsedgh 6 days ago | parent | prev | next [-]

I wish it was multimodal :(

leopoldj 6 days ago | parent [-]

google/gemma-3-4b-it is one of the smallest multimodal models I know. It works well on a 16GB GPU, and slowly on an 8GB GPU. It can even be fine-tuned [1], which is where the real power comes from.

1. https://ai.google.dev/gemma/docs/core/huggingface_vision_fin...

MagicMoonlight 6 days ago | parent | prev | next [-]

I don’t see the point; tiny models always just hallucinate and have no real knowledge.

dismalaf 6 days ago | parent | prev | next [-]

It's fast at spitting out nonsense but incredibly slow at trying to parse any context. Also absolutely atrocious at following instructions.

Probably would be good as a game NPC or a chatbot, but not very good for integrating into an application with specific functionality, though.

mewmix 6 days ago | parent | prev [-]

You can now play with the model and Kokoro 82M in my work-in-progress playground on Android: https://github.com/mewmix/nabu !