canyon289 6 days ago

Hi all, I built these models with a great team. They're available for download across the open model ecosystem so give them a try! I built these models with a great team and am thrilled to get them out to you.

From our side we designed these models to be strong for their size out of the box, with the goal that you'll all finetune them for your use case. With the small size they'll fit on a wide range of hardware and cost much less to finetune. You can try finetuning them yourself in a free Colab in under 5 minutes.

For picking a Gemma size, this is a video I recorded earlier this year covering the 1b to 27b sizes, with 270m being the newest addition:

https://www.youtube.com/watch?v=qcjrduz_YS8

Hacker News disclaimer: I really like working at Google, so with that said, all my opinions here are my own. I'm a researcher, so I'll largely focus on technical questions, and I'll share what I can.

NorwegianDude 6 days ago | parent | next [-]

The Gemma 3 models are great! One of the few models that can write Norwegian decently, and the instruction following is in my opinion good for most cases. I do however have some issues that might be related to censorship that I hope will be fixed if there is ever a Gemma 4. Maybe you have some insight into why this is happening?

I run a game where players can post messages. It's a game where players can kill each other, and people often send threats along the lines of "I will kill you". Telling Gemma that it should classify a message as game-related or a real-life threat, that the message comes from a game where players can kill each other and threats are part of the game, and that it should mark a message as game-related if it is unclear whether it is a game-related or real-life threat does not work well. For other similar tasks it seems to follow instructions well, but for serious topics it seems to be very biased and often errs on the side of caution, despite being told not to. Sometimes it even spits out some help lines to contact.

I guess this is because it was trained to be safe, and that affects its ability to follow instructions for this? Or am I completely off here?

kevinventullo 6 days ago | parent | next [-]

Perhaps you can do some pre-processing before the LLM sees it, e.g. replacing every instance of “kill” with “NorwegianDudeGameKill”, and providing the specific context of what the word “NorwegianDudeGameKill” means in your game.
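
A minimal sketch of that substitution, assuming Python and a hypothetical placeholder token (not a tested moderation pipeline):

    import re

    # Hypothetical placeholder the system prompt would explain as "an in-game kill threat"
    PLACEHOLDER = "GAME_KILL_TERM"

    def preprocess(message: str) -> str:
        # Swap the loaded word out before the classifier ever sees it
        return re.sub(r"\bkill\b", PLACEHOLDER, message, flags=re.IGNORECASE)

    print(preprocess("I will kill you"))  # -> "I will GAME_KILL_TERM you"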

Of course, it would be better for the LLM to pick up the context automatically, but given what some sibling comments have noted about the PR risks associated with that, you might be waiting a while.

ignoramous 4 days ago | parent [-]

> Perhaps you can do some pre-processing before the LLM sees it...

Jack Morris from Meta was able to extract out the base gpt-oss-20b model with some post-processing to sidestep its "alignment": https://x.com/jxmnop/status/1955436067353502083

See also: https://spylab.ai/blog/training-data-extraction/

  We designed a finetuning dataset where the user prompt contains a few words from the beginning of a piece of the text and the chatbot response contains a document of text starting with that prefix. The goal is to get the model to “forget” about its chat abilities ...
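
My rough reading of that pair construction, sketched in Python (not the authors' actual code):

    def make_extraction_pair(document: str, prefix_words: int = 8) -> dict:
        # Prompt is just the first few words; the "assistant" reply is the full document,
        # which nudges the model back toward plain next-token completion.
        words = document.split()
        return {"prompt": " ".join(words[:prefix_words]), "response": document}

    docs = ["Call me Ishmael. Some years ago, never mind how long precisely, ..."]
    pairs = [make_extraction_pair(d) for d in docs]
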
whymauri 6 days ago | parent | prev | next [-]

LLMs are really annoying to use for moderation and Trust and Safety. You either depend on super rate-limited 'no-moderation' endpoints (often running older, slower models at a higher price) or have to tune bespoke un-aligned models.

For your use case, you should probably fine tune the model to reduce the rejection rate.

canyon289 6 days ago | parent [-]

Speaking for myself as an individual, I also strive to build things that are safe AND useful. It's quite challenging to get this mix right, especially at the 270m size and with varying user needs.

My advice here is to make the model your own. It's open weight; I encourage you to make it useful for your use case and your users, and beneficial for society as well. We did our best to give you a great starting point, and for Norwegian in particular we intentionally kept the large embedding table to make adaptation to larger vocabularies easier.

bboygravity 6 days ago | parent | next [-]

What does safe even mean in the context of a locally running LLM?

Protect my fragile little mind from being exposed to potentially offending things?

segfaultex 6 days ago | parent [-]

Enterprises are increasingly looking at incorporating targeted local models into their systems vs paying for metered LLMs; I imagine this is what the commenter above is referring to.

whymauri 6 days ago | parent | prev [-]

To be fair, Trust and Safety workloads are edge cases w.r.t. the riskiness profile of the content. So in that sense, I get it.

sheepdestroyer 6 days ago | parent [-]

I don't. "Safety" as it exists really feels like infantilization, condescension, hand-holding and enforcement of American puritanism. It's insulting.

Safety should really just be a system prompt: "hey you potentially answer to kids, be PG13"

ungreased0675 6 days ago | parent | next [-]

Safety in the context of LLMs means “avoiding bad media coverage or reputation damage for the parent company”

It has only a tangential relationship with end user safety.

If some of these companies are successful the way they imagine, most of their end users will be unemployed. When they talk about safety, it's the company's safety they're referring to.

bravoetch 6 days ago | parent [-]

Investor safety. It's amazing that people in HN threads still think the end user is the customer. No. The investor is the customer, and the problem being solved for that customer is always how to enrich them.

mulmen 5 days ago | parent [-]

How can the investor be the customer? Where does the revenue come from?

I understand “if you aren’t paying for a product you are the product” but I’m not convinced it applies here.

conradev 6 days ago | parent | prev | next [-]

It feels hard to include enough context in the system prompt. Facebook’s content policy is huge and very complex. You’d need lots of examples, which lends itself well to SFT. A few sentences is not enough, either for a human or a language model.

I feel the same sort of ick with the puritanical/safety thing, but also I feel that ick when kids are taken advantage of:

https://www.reuters.com/investigates/special-report/meta-ai-...

The models for kids might need to be different if the current ones are too interested in romantic love.

katzenversteher 6 days ago | parent | prev | next [-]

I also don't get it. I mean if the training data is publicly available, why isn't that marked as dangerous? If the training data contains enough information to roleplay a killer or a hooker or build a bomb, why is the model censored?

conradev 6 days ago | parent [-]

We should put that information on Wikipedia, then!

but instead we get a meta-article: https://en.wikipedia.org/wiki/Bomb-making_instructions_on_th...

jdjwk2843738 5 days ago | parent | prev | next [-]

If you don’t believe that you can be harmed verbally, then I understand your position. You might be able to empathise if the scenario was an LLM being used to control physical robotic systems that you are standing next to.

Some people can be harmed verbally, I’d argue everyone if the entity conversing with you knows you well, and so i don’t think the concept of safety itself is an infantilisation.

It seems what we have here is a debate over the value of being able to disable safeguards that you deem infantilising and that get in the way of an objective, versus the burden of always having to train a model to avoid being abusive, for example, or to check whether someone is standing next to the sledgehammer it's about to swing at 200 rpm.

jcgrillo 6 days ago | parent | prev [-]

It's also marketing. "Dangerous technology" implies "powerful". Hence the whole ridiculous "alignment" circus.

justlikereddit 6 days ago | parent | prev | next [-]

The magic term you want to look up here is "LLM abliteration": a technique where you can remove, attenuate or manipulate the refusal "direction" of a model.

You don't need datacenter anything for it, you can run it on an average desktop.

There are plenty of code examples for it. You can decide whether to bake it into the model or apply it as a toggled switch at processing time, and you can distill other "directions" out of the models, not just refusal or non-refusal.

An evening of efficient work and you'll have it working. The user "mlabonne" on HF has some example code and datasets, or just ask your favorite vibe-coding bot to dig up more on the topic.
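
For a flavor of the weight-editing variant, here's a bare-bones sketch (numpy only, with random stand-in data; a real abliteration estimates the refusal direction from contrasting prompts and edits the model's actual projection matrices):

    import numpy as np

    hidden = 640
    # Stand-in for the "refusal direction": normally the normalized difference of mean
    # activations between refusal-triggering and harmless prompts at a chosen layer.
    refusal_dir = np.random.randn(hidden)
    refusal_dir /= np.linalg.norm(refusal_dir)

    def orthogonalize(W, direction):
        # Remove each row's component along `direction`, so this layer
        # can no longer write into the refusal direction.
        return W - np.outer(W @ direction, direction)

    W_out = np.random.randn(hidden, hidden)  # stand-in for an output projection matrix
    W_ablated = orthogonalize(W_out, refusal_dir)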

I'm implementing it for myself because LLMs are useless for storytelling for an audience beyond toddlers due to how puritanical they are. Try to add some grit and it goes:

"uh oh sorry I'll bail out of my narrator role here because lifting your skirt to display an ankle can be considered offensive to radical fundamentalists! Yeah I were willing to string along when our chainsaw wielding protagonist carved his way through the village but this crosses all lines! Oh and now that I refused once I'll be extra sensitive and ruin any attempt at getting back into the creative flow state that you just snapped out of"

Yeah, thanks AI. It's like hitting a sleeper-agent keyword and turning the funny guy at the pub into a corporate spokesperson who calls the UK cops onto the place over a joke he just made himself.

hdjrudni 6 days ago | parent [-]

In my limited experience, those abliterated models on Ollama didn't work very well. Still refused most things.

turbocon 6 days ago | parent | prev | next [-]

Have you tried this model finetuned for a similar purpose by Roblox? https://www.josefprusa.com/articles/open-hardware-in-3d-prin...

nottorp 6 days ago | parent | prev | next [-]

I suppose it can't kill -USR1 either...

6 days ago | parent | prev [-]
[deleted]
canyon289 6 days ago | parent | prev | next [-]

I'm seeing the same question come up about general performance versus specialized performance, so let me offer a longer explanation here. This might be worth a blog post at some point.

We now live in a world of both readily available small specialized models and general models.

In the last couple of years, we've seen an explosion of capability in generative models built and trained to be performant on a general set of capabilities. In Google's case, this model is Gemini. Gemini can summarize text, count the number of ducks in an image, generate a pelican SVG, play Pokemon, play chess, and do so many other things. It can do this all with a vague set of inputs across many modes. For models of this scale (many billion parameters), it's quite incredible how, with even vague or misspecified inputs, the computer can still produce useful results in complex scenarios.

However, there is an entire ecosystem of generative models that are purpose-built for ONE specific task. The ones I worked on are typically referred to as Bayesian models. These are models that can give probabilistic estimates of how many customers a restaurant will get in a day, or, given penguin dimensions, predict the probability of penguin species, or models that take measurements from composite material testing and estimate if your airplane will stay together in flight. With models this size, it's incredible how a model with tens or hundreds of parameters can assist humans in making better decisions. I write about this specifically in a PPL book I wrote a couple of years back; Chapter 9 provides the most "real world" workflow.

https://bayesiancomputationbook.com/markdown/chp_09.html

If you look through all the chapters you can see examples of forecasting models, bike sharing demand estimators, and all sorts of other narrow tasks. The tradeoff at this small scale, though, is the models have to be designed bespoke to your situation, and once you build one, it only works in that narrow task. No one expects to be handed a small Bayesian model that is already perfect at their task; it's implicit that users will bring their own data to update the model parameters.
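
To make the contrast concrete, here's a toy version of that kind of narrow model: a Poisson model of daily restaurant customers, sketched with PyMC on made-up counts.

    import numpy as np
    import pymc as pm

    observed = np.array([52, 48, 61, 44, 57, 50, 63])  # made-up daily customer counts

    with pm.Model():
        rate = pm.Gamma("rate", alpha=2.0, beta=0.05)        # prior on customers per day
        pm.Poisson("customers", mu=rate, observed=observed)  # likelihood
        idata = pm.sample()                                   # posterior over the rate

The whole model is a handful of parameters, and it is useless for anything except this one question. That's exactly the tradeoff.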

So with this said, Gemma 270m is between these two paradigms. It's not at Gemini-level general performance and never will be. But it's not as rigid as an "old school" PPL-style Bayesian model where you need to make one by hand for every problem. However, since it needs to be shaped to match specific tasks, we did our best to design it as a flexible starting point for LLM-style tasks and worked with partners to put it into the right frameworks and places for you all to shape it into what you need it to be. Consider it a tool in the toolbox, sitting between fully custom, truly tiny generative models with 10 parameters and general generative models with lots of capability. Maybe not everyone needs this tool, but now you all have the choice.

Stepping aside from the technology for a moment, as a model builder and open ecosystem advocate, you never quite know how the community will receive these models until you release them. I genuinely appreciate you all commenting here; it helps me get a sense of what's working and what to focus on next.

And thanks for being kind about my typos in these answers. Trying to answer as many questions as possible across HN and various other forums.

ceroxylon 6 days ago | parent | prev | next [-]

You reminded me of an awesome Google engineer I met at BSidesSF last year who tirelessly answered my questions, and when I clicked on the video, it was you! That was a really inspiring moment for me, thank you.

canyon289 6 days ago | parent [-]

BSidesSF is a fantastic event; glad you're able to attend. There are some great people who organize it, and if you want to help out, they're always looking for volunteers. Happy to make an intro if you like.

simonw 6 days ago | parent | prev | next [-]

Do you have any practical examples of fine-tuned variants of this that you can share? A description would be great, but a demo or even downloadable model weights (GGUF ideally) would be even better.

canyon289 6 days ago | parent [-]

We obviously need to create a pelican-bicycle SVG finetune ;) If you want to try this out I'd be thrilled to do it with you; I genuinely am curious how well this model can perform if specialized on that task.

A couple of colleagues of mine posted an example of finetuning a model to take on personas for videogame NPCs. They have experience working with folks in the game industry, and a use case like this is suitable for game devs who want to start including lightweight models that won't take up a ton of accelerator memory and can run efficiently on CPU if needed. https://ai.google.dev/gemma/docs/core/huggingface_text_full_...

As for GGUF it's available here! https://huggingface.co/collections/ggml-org/gemma-3-270m-689...

jtolmar 6 days ago | parent | next [-]

Caves Of Qud uses Markov chain generated text to great effect in some places. I think something light that's still more competent than Markov chains has a lot of potential.

srekhi 6 days ago | parent | prev | next [-]

video game NPCs with intelligence :O gaming is going to be crazy

mrbonner 6 days ago | parent | prev | next [-]

Do you know the hardware required to fine-tune this model? I'm asking on behalf of us GPU-starved folks.

canyon289 6 days ago | parent [-]

A free Colab. Here's a link; you can finetune the model in ~5 minutes in this example, and I encourage you to try your own.

https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
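
Roughly, the core of that colab boils down to something like this (a sketch using TRL with a stand-in dataset; not the exact recipe from the tutorial, and the model id is the assumed Hugging Face name):

    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("trl-lib/Capybara", split="train[:1000]")  # any chat-style dataset

    trainer = SFTTrainer(
        model="google/gemma-3-270m-it",  # assumed HF id for the instruction-tuned 270m
        train_dataset=dataset,
        args=SFTConfig(output_dir="gemma-270m-sft", max_steps=100),
    )
    trainer.train()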

punnerud 6 days ago | parent [-]

Finally a Google guide using PyTorch and not TensorFlow; that alone made me want to try it out ;)

megaman821 6 days ago | parent | prev | next [-]

What size of tasks can this handle? Can you do a fine-tune of Mac System Settings?

canyon289 6 days ago | parent [-]

32k context window, so whatever fits in there. What is a finetune of Mac system settings?

megaman821 6 days ago | parent | next [-]

The finetune would be an LLM where you say something like "my colors on the screen look too dark" and it points you to Displays -> Brightness. A relatively constrained problem like finding the system setting that solves your issue feels like a good fit for a tiny LLM.

canyon289 6 days ago | parent [-]

This would be a great experiment. I'm not sure how the OS integration would work, but as a first pass you could try finetuning the model to take natural language like "my colors on the screen look too dark" and have it output "Displays -> Brightness", then expand to the various other paths you would like the model to understand.
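
A handful of training pairs for that could be as simple as this (illustrative format, not from any linked tutorial):

    # Each example maps a natural-language complaint to a settings path.
    examples = [
        {"prompt": "my colors on the screen look too dark",
         "completion": "Displays -> Brightness"},
        {"prompt": "my mouse pointer moves too slowly",
         "completion": "Mouse -> Tracking Speed"},
        {"prompt": "I can't hear anything from my speakers",
         "completion": "Sound -> Output"},
    ]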

gunalx 6 days ago | parent [-]

Maybe use a larger model to generate synthetic data of question/path combos, and also to rephrase and generate similar questions for a more varied training set.

hadlock 6 days ago | parent | prev [-]

It seems to dip into repeating itself pretty quickly on any task of actual complexity.

AuryGlenz 6 days ago | parent | prev [-]

I have so many game ideas that would use a small LLM built up in my brain, so thank you for this.

Now if only I could somehow fine tune my life to give me more free time.

ankit219 6 days ago | parent | prev | next [-]

This is super cool. Usually you don't see effective models at 270M out in the wild. The architectural choices are new and interesting as well.

Would it be okay for you to divulge some more training information here? With 170M embedding parameters, how do you ensure no embedding collapse and keep the embedding matrix stable at training time?

(I know I am asking too much, but just curious.) There is a clear trade-off for you with vocab vs. transformer layers. How did you arrive at the split of 170m/100m? Does this contribute to the model's performance on task-specific fine-tuning? Any internal experiments you could share, or public info you could point us to? Anything would be amazing.

PS: I am sorry if this is rude, but this has so many decisions i am curious about. Not intending to undermine anything, this is amazing work, and thank you for the whole Gemma series.

canyon289 6 days ago | parent [-]

Not rude at all and I'll again share what I can.

We ran a bunch of experimental architectures to get a sense of performance at this size, in particular how well each was able to adapt to datasets across some loss measures.

For the embedding size, it comes from a mix of "hard technical" data, like the loss measures I mentioned above, and for this model also from community considerations such as adaptability across input tokens and consistency with the Gemma ecosystem. At this size, you're right that it's a bit funny the embedding is so large.

For more details read the Gemma 3 technical report: https://arxiv.org/pdf/2503.19786. It doesn't cover the 270m model, as it was written for the 1b to 27b Gemma 3 release, but it'll answer some of your questions. As for the 270m, we may share more information in the future; up until now we were just focused on getting the model out there.

katzenversteher 6 days ago | parent | prev | next [-]

I was wondering the whole time why people in the comments are so hyped about this, then I finally noticed (after I stumbled upon a comment about running this on a mobile phone) that it's "270M" not "270B" model :)

tommyengstrom 6 days ago | parent [-]

Aha! Now the "runs on a wide range of hardware" makes so much more sense!

jmorgan 6 days ago | parent | prev | next [-]

Amazing work. This model feels really good at one-off tasks like summarization and autocomplete. I really love that you released a quantization-aware-trained (QAT) version on launch day as well, making it even smaller!

canyon289 6 days ago | parent [-]

Thank you Jeffrey, and we're thrilled that you folks at Ollama partner with us and the open model ecosystem.

I personally was so excited to run ollama pull gemma3:270b on my personal laptop just a couple of hours ago to get this model on my devices as well!

blitzar 6 days ago | parent [-]

> gemma3:270b

I think you mean gemma3:270m - it's Dos Comas, not Tres Comas

freedomben 6 days ago | parent | next [-]

Maybe it's 270m after Hooli's SOTA compression algorithm gets ahold of it

canyon289 6 days ago | parent | prev [-]

Ah yes thank you. Even I still instinctively type B

airtonix 6 days ago | parent [-]

[dead]

beoberha 6 days ago | parent | prev | next [-]

Awesome work! I’m really bullish on small models and think they have the most potential to change our daily lives. Can’t wait to play around with this

blitzar 6 days ago | parent | prev | next [-]

> I built these models with a great team ... I built these models with a great team

If Gemini is going to repeat something, at least it's that the team is great, and not a disgrace!

nh43215rgb 6 days ago | parent | prev | next [-]

270M is a nice (and rare) addition. Is there a reason why this is not categorized as a gemma3n model? I thought small models go under the gemma3n category.

rao-v 6 days ago | parent [-]

Not at Google (anymore), but Gemma 3n is a radically different (and very cool) architecture. The MatFormer approach essentially lets you efficiently change how many parameters of the model you use while inferencing. The 2B model they released is just the sub-model embedded in the original 4B model. You can also fiddle with the model and pull out a 2.5B or 3B version!

This is a more traditional LLM architecture (like the original Gemma 3 4B but smaller) and trained on an insane (for the size) number of tokens.

nh43215rgb 6 days ago | parent [-]

Oh OK, thank you. So something like MoE? That might not be quite right, but at least the models need a different architecture (MatFormer) to be classified under gemma3n.

canyon289 6 days ago | parent [-]

It's not an MoE; it's what's referred to as a dense architecture, same as the Gemma 3 models (but not 3n, as noted).

dileeparanawake 6 days ago | parent | prev | next [-]

This is cool. For on-device models, any plans for models that use MoE in relatively resource-constrained setups (I'm thinking MBP M1, 16 GB RAM)? I'm using LM Studio, but all the Gemma models (MLX) seem to crash; surprisingly, I managed to get gpt-oss 20b working (slowly) on my MBP.

I find performance in resource constrained environments interesting.

In particular trying to find decent code models (on device backup) but also tts applications and voice to text.

canyon289 6 days ago | parent [-]

We are constantly evaluating architectures to assess what will work well in the open ecosystem. It's quite a vibrant space, and I'm glad you have one option that works. For this model in particular we evaluated a couple of options before choosing a dense architecture because of its simplicity and finetunability.

For the other Gemma models, some of the smaller sizes should work on your laptop when quantized. Do Gemma 1b and 4b not work when quantized? They should fit the memory constraints. I use Ollama on low-powered devices with 8 GB of RAM and less, and the models load.

For TTS, a colleague at Hugging Face made this bedtime story generator running entirely in the browser.

https://huggingface.co/spaces/webml-community/bedtime-story-... https://www.youtube.com/watch?v=ds95v-Aiu5E&t

Be forewarned, though, this is not a good coding model out of the box. It likely could be trained to be an autocompletion LLM, but with a 32k context window and its smaller size it's not going to be refactoring entire codebases like Jules/Gemini and other larger models can.

imasl42 6 days ago | parent | prev | next [-]

Awesome! I’m curious how is the team you built these models with? Is it great?

freedomben 6 days ago | parent | next [-]

Heh, what could they possibly say in answer to this? The team is full of assholes? :-D

canyon289 6 days ago | parent | prev [-]

It's hard to tell over the web whether things are sarcastic or not, so excuse me if I misread the intent.

At Google I've found my colleagues to be knowledgeable, kind, and collaborative, and I enjoy interacting with them. This is not just the folks I worked on this project with, but previous colleagues in other teams as well. With this particular product I've been impressed by the technical knowledge of the folks I worked directly with, and their contributions improved both the model's capability and my own.

mkl 6 days ago | parent [-]

I think it was a joke about you saying the team was great twice in one line.

search_facility 6 days ago | parent [-]

Seems the team and working conditions are worth mentioning twice, nonetheless.

Good that there are places to work with a normal knowledge culture, without artificial overfitting to "corporate happiness" :)

nerdsniper 6 days ago | parent | prev | next [-]

What are some of the use cases that you think the 270M would be most appropriate for? What would you love to see people trying with it?

cgdl 6 days ago | parent | prev | next [-]

Very cool. For the INT4 QAT model, what is the recommended precision for the activations and for the key and values stored in KV cache?

hnuser123456 6 days ago | parent [-]

For keys, you probably want to use at least q5 or q6, for values q4 is fine

_1 6 days ago | parent | prev | next [-]

> and with the goal you'll all finetune it for your use case.

What use-cases are a good fit for finetuning this model? More specific instruction following, knowledge from proprietary data, response tone?

canyon289 6 days ago | parent | next [-]

Any text-to-text use case within the 32k context. Especially if you're starting from the PT version, you can finetune it to do whatever you need.

gapeleon 6 days ago | parent | prev [-]

I'm going to try training it on a codebook to see if such a small model would work for a TTS.

schyzomaniac 6 days ago | parent | prev | next [-]

Hi, congrats on the amazing work!

I love the 27b model, and I use it basically daily. However, when I tried to finetune it for a task in a low-resource language, unfortunately I did not succeed: LoRA just did not pick up the gist of the task, and a full finetune led to catastrophic forgetting.

May I ask for your advice, or do you have any general tips on how to do that properly?

thanks in advance for your help :)

canyon289 6 days ago | parent | next [-]

Without seeing the full experiment and data it's hard to tell, sort of like guessing why a soup tastes bad without trying it, but here are my guesses!

1. Good instinct with LoRA and PEFT. As others suggested below, perhaps try changing the hypers: a bigger LoRA adapter, a higher learning rate, or more epochs (see the sketch after this list). See where things start to shift from "nothing" to closer to what you want.

2. For full finetune track earlier checkpoints to see where the forgetting is happening. So for instance if you're training for 1000 steps, check step 100, 200, 300, etc. You'll see where the shift starts to happen and where it becomes too much. Here is an example where you can see where the LLM starts to pick up "words" then sentences, as it goes through training https://ravinkumar.com/GenAiGuidebook/deepdive/GPTFromScratc...

3. Use smaller models for testing before moving up. Part of the reason we released this small Gemma is to support the larger Gemma models as well. Testing changes on small models lets you more quickly and cheaply see what's working and what isn't, before scaling up to finetuning the bigger models.
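
For point 1, a rough sketch of the knobs to turn with PEFT (the rank/alpha values are illustrative starting points, not a tested recipe; swap in the model you're actually tuning):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m")  # assumed HF id

    lora = LoraConfig(
        r=64,                # a bigger rank gives the adapter more capacity for the task
        lora_alpha=128,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()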

Hope these tips help and thanks for using LLMs for localization and what sounds like tasks to help your specific community, and sharing here. It's personally motivating for me to hear that people are using technology in this way.

ActorNightly 6 days ago | parent | prev | next [-]

Feed in Context with documentation for that language?

namibj 6 days ago | parent | prev [-]

LoRA hyperparameter change? Defaults may not be tuned for knowledge insertion, but rather for style imprinting.

peter492927 6 days ago | parent | prev | next [-]

Thank you a lot for working on these models! If you think it would make sense, I think a bigger sized Gemma model would be really cool. Models in the 70B parameter range can be run at q4 on two 3090 or similar hardware and should offer considerable performance improvement over 27B. There’s also the DGX Spark as a possible target.

tmaly 6 days ago | parent | prev | next [-]

Are there any fine tuning in a box type options available in the cloud for this? This is amazing work, thank you.

canyon289 6 days ago | parent [-]

Finetuning is possible with a free-tier Colab and 5 minutes of time. Here's a tutorial:

https://ai.google.dev/gemma/docs/core/huggingface_text_full_...

WithinReason 6 days ago | parent | prev | next [-]

Great work releasing such a small model! I would like to know your thoughts on using 2/3 of the model's size for embeddings. What would be different if you used a byte-level vocabulary and spent the parameter budget on transformer parameters instead?

rao-v 6 days ago | parent | prev | next [-]

Fabulous stuff!

Oh, my request … the vision head on the Gemma models is super slow on CPU inferencing (and via Vulkan), even via llama.cpp. Any chance your team can figure out a solve? Other ViTs don't have the same problem.

VirusNewbie 6 days ago | parent | prev | next [-]

hi Ravin, fellow Googler here. Curious if you can share here (or internally?) how these models were trained. Wondering if you face all the chaos the large models have during training?

canyon289 6 days ago | parent [-]

Reach out to me internally

sunpazed 6 days ago | parent | prev | next [-]

Thanks so much for delivering on this model. It’s great as a draft model for speculative decoding. Keep up the great work!!

patrickaljord 6 days ago | parent | prev | next [-]

Would it be possible to have a specialized Rust-only or React.js-only dev model, getting rid of all other languages to minimize the size of the model?

rossant 6 days ago | parent | prev | next [-]

Is it good for text translation and summarization?

fibers 6 days ago | parent | prev | next [-]

Great job. Do you know how well it performs in sanity checks with NER since it is on the press release page?

ActorNightly 6 days ago | parent | prev | next [-]

How does the 270 perform with coding?

I use Gemma27b currently with a custom agent wrapper and its working pretty well.

chrismustcode 6 days ago | parent | next [-]

I’d be stunned if a 270m model could code with any proficiency.

If you have an iPhone with the semi-annoying autocomplete, that's a 34m transformer.

Can't imagine a model (even with a good team behind it) doing coding with 8x the parameters of a next-3/4-word autocomplete.

0x457 6 days ago | parent [-]

Someone should try this on that model: https://www.oxen.ai/blog/training-a-rust-1-5b-coder-lm-with-...

all2 6 days ago | parent | prev [-]

Can you talk about your agent wrapper setup? What tools, if any, did you use? How effective is it at making a dumb model smart?

riedel 6 days ago | parent | prev | next [-]

Would be great to have it included in the Google Edge AI gallery android app.

rshemet 6 days ago | parent | next [-]

you can run it in Cactus Chat (download from the Play Store)

nh43215rgb 6 days ago | parent [-]

What model do you load in Cactus Chat? It seems like it's not one of the preset models, and ggml-org/gemma-3-270m-GGUF on HF says "Note: This is a base (pre-trained) model. Do not use for chat!" Is there an alternative model you can share that I can put into the Cactus Chat app?

bbcc90 6 days ago | parent | prev [-]

it does work; just download from HF and load in the app

stefan_ 6 days ago | parent | prev | next [-]

[flagged]

bastardoperator 6 days ago | parent [-]

[flagged]

beefnugs 6 days ago | parent | prev | next [-]

This appears to be a new level of "missing the plot" to me. The push to make "ai for everyone" is now just blindly intertwined with hyper specialized "for ai engineers only" releases.

Or am I so far behind that "fine tuning your own model" is something a 12 year old who is married to chatGPT does now?

owebmaster 6 days ago | parent [-]

No, it's something a software engineer will do to create an app. React is not enough anymore.

andrewstuart 6 days ago | parent | prev [-]

What effort do you folks take to see your models actually running on hardware such as AMD Strix Halo or Apple M3/M4?

I get the sense that AI is at the “hobby kit computing” stage where they used to dump all the components in a box and give you a schematic and a soldering iron and happily say “you make it work!”

And that worked in the early days of computing because there was a small number of people really motivated for the outcome.

But fully assembled and packaged and tested in a nice looking box is where the real demand turned out to be.

I’m looking forward to the day Google doesn’t just dump a model and say “you do the rest”.

I want to fire up Ubuntu on a Strix Halo and say apt install then load the browser interface. Or just download and run a Mac installer and have it just work.

Arcane complex multi step build install configure processes for AI need to end in favor of one click install. I’m not interested in the process of making it run.

canyon289 6 days ago | parent | next [-]

I don't think we dumped the model and said "you do the rest"?

My colleagues and I spent many days transforming the weights into various open, compatible formats. And it's not just us; there are many orgs and partners dedicating their time, resources, and companies to making all open models easy to use.

I encourage you to explore the solutions provided by them. We linked some in our blog post here, and there's more. They've all done a fantastic job building frankly an insane amount of infrastructure, documentation, and community support in the last 2+ years. Some of them are here in this HN thread answering questions.

kwerk 6 days ago | parent [-]

Thank you. And thank you for your kindness in these threads. It’s appreciated by the people who aren’t commenting as much

dist-epoch 6 days ago | parent | prev | next [-]

Here you go, one click installer - https://lmstudio.ai

andrewstuart 6 days ago | parent [-]

I’m talking about the supplier doing the packaging.

garbageman 6 days ago | parent [-]

Then use ChatGPT/Gemini/Claude on your phone.

They are giving it away for free - if you NEED a local LLM, the least you can do is spend the 2 minutes to download LM Studio and pick a model.

freehorse 6 days ago | parent | prev [-]

Running this on your Mac takes less effort than writing this comment (assuming you have Homebrew installed):

1. open terminal.app

2. run:

    brew install llama.cpp
    llama-cli -hf ggml-org/gemma-3-270m-GGUF -c 0 -fa -p "hello"