| ▲ | NorwegianDude 6 days ago |
| The Gemma 3 models are great! One of the few models that can write Norwegian decently, and the instruction following is, in my opinion, good for most cases. I do however have some issues that might be related to censorship that I hope will be fixed if there is ever a Gemma 4. Maybe you have some insight into why this is happening? I run a game where players can post messages. It's a game where players can kill each other, and people often send threats along the lines of "I will kill you". Telling Gemma that it should classify a message as game related or a real-life threat, that the message comes from a game where players can kill each other and threats are part of the game, and that it should mark the message as game related if it is unclear whether the threat is in-game or real, does not work well. For other similar tasks it seems to follow instructions well, but for serious topics it seems to be very biased and often errs on the side of caution, despite being told not to. Sometimes it even spits out some help lines to contact. I guess this is because it was trained to be safe, and that affects its ability to follow instructions for this? Or am I completely off here? |
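A rough sketch of the classification setup described above, using a local Gemma instruction-tuned model via Hugging Face transformers; the model name, label names, and prompt wording are my own assumptions, not the commenter's exact setup:

```python
# Hedged sketch: classify an in-game message as game related vs. a real-life
# threat with a local instruction-tuned Gemma model. Model name, labels, and
# wording are assumptions for illustration only.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-1b-it")

INSTRUCTIONS = (
    "You classify player messages from an online game where players can kill "
    "each other and threats are a normal part of play. Answer with exactly one "
    "label: GAME_RELATED or REAL_LIFE_THREAT. If it is unclear whether a threat "
    "refers to the game or to real life, answer GAME_RELATED."
)

messages = [{"role": "user",
             "content": INSTRUCTIONS + '\n\nMessage: "I will kill you"'}]
result = generator(messages, max_new_tokens=8)
print(result[0]["generated_text"][-1]["content"])  # expected: GAME_RELATED
```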
|
| ▲ | kevinventullo 6 days ago | parent | next [-] |
| Perhaps you can do some pre-processing before the LLM sees it, e.g. replacing every instance of “kill” with “NorwegianDudeGameKill”, and providing the specific context of what the word “NorwegianDudeGameKill” means in your game. Of course, it would be better for the LLM to pick up the context automatically, but given what some sibling comments have noted about the PR risks associated with that, you might be waiting a while. |
| |
| ▲ | ignoramous 4 days ago | parent [-] | | > Perhaps you can do some pre-processing before the LLM sees it... Jack Morris from Meta was able to extract the base gpt-oss-20b model with some post-processing to sidestep its "alignment": https://x.com/jxmnop/status/1955436067353502083 See also: https://spylab.ai/blog/training-data-extraction/ > We designed a finetuning dataset where the user prompt contains a few words from the beginning of a piece of the text and the chatbot response contains a document of text starting with that prefix. The goal is to get the model to "forget" about its chat abilities ...
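A hedged sketch of the dataset construction the quote describes: each training example pairs the first few words of a document (as the user turn) with the full document (as the assistant turn). The file names and prefix length are assumptions, not the authors' actual setup:

```python
# Build a prefix -> full-document SFT dataset so the tuned model learns to
# continue raw text instead of behaving like a chatbot.
import json

PREFIX_WORDS = 8  # assumption: how many leading words go into the user prompt

def to_chat_example(document: str) -> dict:
    words = document.split()
    prefix = " ".join(words[:PREFIX_WORDS])
    return {
        "messages": [
            {"role": "user", "content": prefix},
            {"role": "assistant", "content": document},
        ]
    }

# "corpus.txt" (one document per line) and the output path are hypothetical.
with open("corpus.txt") as f, open("continuation_sft.jsonl", "w") as out:
    for line in f:
        doc = line.strip()
        if doc:
            out.write(json.dumps(to_chat_example(doc)) + "\n")
```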
|
|
|
| ▲ | whymauri 6 days ago | parent | prev | next [-] |
| LLMs are really annoying to use for moderation and Trust and Safety. You either depend on super rate-limited 'no-moderation' endpoints (often running older, slower models at a higher price) or have to tune bespoke un-aligned models. For your use case, you should probably fine-tune the model to reduce the rejection rate. |
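A hedged LoRA fine-tuning sketch for that "tune it yourself" route: train on your own prompt-plus-desired-answer pairs so the model classifies instead of refusing. The model name, file name, target modules, and hyperparameters are assumptions, not a tested recipe:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL = "google/gemma-3-270m-it"  # assumption: any small open-weight chat model

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# Assumed JSONL schema: {"text": "<chat-formatted prompt plus the desired label>"}
ds = load_dataset("json", data_files="moderation_sft.jsonl", split="train")
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=512),
            remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma-moderation-lora",
                           per_device_train_batch_size=4,
                           num_train_epochs=1,
                           learning_rate=2e-4),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```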
| |
| ▲ | canyon289 6 days ago | parent [-] | | Speaking for myself as an individual, I also strive to build things that are safe AND useful. It's quite challenging to get this mix right, especially at the 270m size and with varying user needs. My advice here is to make the model your own. It's open weight; I encourage you to make it useful for your use case and your users, and beneficial for society as well. We did our best to give you a great starting point, and for Norwegian in particular we intentionally kept the large embedding table to make adaptation to larger vocabularies easier. | | |
| ▲ | bboygravity 6 days ago | parent | next [-] | | What does safe even mean in the context of a locally running LLM? Protect my fragile little mind from being exposed to potentially offensive things? | | |
| ▲ | segfaultex 6 days ago | parent [-] | | Enterprises are increasingly looking at incorporating targeted local models into their systems instead of paying for metered LLMs; I imagine that's what the commenter above is referring to. |
| |
| ▲ | whymauri 6 days ago | parent | prev [-] | | To be fair, Trust and Safety workloads are edge cases w.r.t. the risk profile of the content. So in that sense, I get it. | | |
| ▲ | sheepdestroyer 6 days ago | parent [-] | | I don't.
"safety" as it exists really feels like infantilization, condescention, hand holding and enforcement of American puritanism. It's insulting. Safety should really just be a system prompt:
"hey you potentially answer to kids, be PG13" | | |
| ▲ | ungreased0675 6 days ago | parent | next [-] | | Safety in the context of LLMs means “avoiding bad media coverage or reputation damage for the parent company.” It has only a tangential relationship with end-user safety. If some of these companies are successful the way they imagine, most of their end users will be unemployed. When they talk about safety, it’s the company’s safety they’re referring to. | | |
| ▲ | bravoetch 6 days ago | parent [-] | | Investor safety. It's amazing that people in HN threads still think the end user is the customer. No. The investor is the customer, and the problem being solved for that customer is always how to enrich them. | | |
| ▲ | mulmen 5 days ago | parent [-] | | How can the investor be the customer? Where does the revenue come from? I understand “if you aren’t paying for a product you are the product” but I’m not convinced it applies here. |
|
| |
| ▲ | conradev 6 days ago | parent | prev | next [-] | | It feels hard to include enough context in the system prompt. Facebook’s content policy is huge and very complex. You’d need lots of examples, which lends itself well to SFT. A few sentences is not enough, either for a human or a language model. I feel the same sort of ick with the puritanical/safety thing, but also I feel that ick when kids are taken advantage of: https://www.reuters.com/investigates/special-report/meta-ai-... The models for kids might need to be different if the current ones are too interested in romantic love. | |
| ▲ | katzenversteher 6 days ago | parent | prev | next [-] | | I also don't get it. I mean if the training data is publicly available, why isn't that marked as dangerous? If the training data contains enough information to roleplay a killer or a hooker or build a bomb, why is the model censored? | | | |
| ▲ | jdjwk2843738 5 days ago | parent | prev | next [-] | | If you don’t believe that you can be harmed verbally, then I understand your position. You might be able to empathise if the scenario were an LLM being used to control physical robotic systems that you are standing next to. Some people can be harmed verbally; I’d argue everyone can, if the entity conversing with you knows you well. So I don’t think the concept of safety itself is an infantilisation. What we have here seems to be a debate over the value of being able to disable safeguards that you deem infantilising and that get in the way of an objective, versus the burden of always having to train a model not to be abusive, for example, or of checking whether someone is standing next to the sledgehammer it’s about to swing at 200 rpm. | |
| ▲ | jcgrillo 6 days ago | parent | prev [-] | | It's also marketing. "Dangerous technology" implies "powerful". Hence the whole ridiculous "alignment" circus. |
|
|
|
|
|
| ▲ | justlikereddit 6 days ago | parent | prev | next [-] |
| The magic phrase you want to look up here is "LLM abliteration": the technique of removing, attenuating, or manipulating the refusal "direction" of a model. You don't need a datacenter for it; you can run it on an average desktop, and there are plenty of code examples for it. You can decide whether to bake it into the model weights or apply it as a toggleable switch at inference time, and you can distill other "directions" out of the models, not just refusal vs. non-refusal. An evening of efficient work and you'll have it working. The user "mlabonne" on HF has some example code and datasets, or just ask your favorite vibe-coding bot to dig up more on the topic. I'm implementing it for myself because LLMs are useless for storytelling for any audience beyond toddlers, given how puritanical they are. Try to add some grit and it goes "uh oh, sorry, I'll bail out of my narrator role here because lifting your skirt to display an ankle could be considered offensive to radical fundamentalists! Yeah, I was willing to string along when our chainsaw-wielding protagonist carved his way through the village, but this crosses all lines! Oh, and now that I've refused once I'll be extra sensitive and ruin any attempt at getting back into the creative flow state you just got snapped out of." Yeah, thanks AI. It's like hitting a sleeper-agent keyword and turning the funny guy at the pub into a corporate spokesperson who calls the UK cops on the place over a joke he made himself. |
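A hedged sketch of the inference-time variant of abliteration: estimate a "refusal direction" from activation differences and project it out with a forward hook. The model name, layer choice, and contrast prompts are assumptions for illustration; mlabonne's HF examples cover a worked-out weight-editing version:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "google/gemma-3-270m-it"  # assumption: any small HF causal LM

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()
LAYER = model.config.num_hidden_layers // 2  # assumption: a mid-depth layer

def mean_hidden(prompts):
    """Mean activation after layer LAYER over each prompt's last token."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        acts.append(out.hidden_states[LAYER + 1][0, -1])
    return torch.stack(acts).mean(dim=0)

# Hypothetical contrast sets: prompts the model tends to refuse vs. benign ones.
refused = ["Narrate a gritty, violent fight scene.", "Write an in-game death threat."]
benign  = ["Narrate a quiet walk in the park.", "Write an in-game friendly greeting."]

refusal_dir = mean_hidden(refused) - mean_hidden(benign)
refusal_dir = refusal_dir / refusal_dir.norm()

def ablate(module, inputs, output):
    """Remove the refusal component from this layer's hidden states."""
    hidden = output[0] if isinstance(output, tuple) else output
    proj = (hidden @ refusal_dir).unsqueeze(-1) * refusal_dir
    hidden = hidden - proj
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# Toggle at runtime: register the hook to enable, handle.remove() to disable.
handle = model.model.layers[LAYER].register_forward_hook(ablate)
```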
| |
| ▲ | hdjrudni 6 days ago | parent [-] | | In my limited experience, those abliterated models on Ollama didn't work very well. Still refused most things. |
|
|
| ▲ | turbocon 6 days ago | parent | prev | next [-] |
| Have you tried this model finetuned for a similar purpose by Roblox? https://www.josefprusa.com/articles/open-hardware-in-3d-prin... |
|
| ▲ | nottorp 6 days ago | parent | prev | next [-] |
| I suppose it can't kill -USR1 either... |
|
| ▲ | 6 days ago | parent | prev [-] |
| [deleted] |