WatchDog 5 hours ago

If you want LLMs to have knowledge of the Norwegian language, wouldn't the most obvious thing to do be to build a good training dataset and make it widely available? Why go to the expense of training your own model, especially when it will be inferior to state-of-the-art models?

black_puppydog 3 hours ago

On a daily basis, I task GPT/Claude with researching things that pertain to very specific cultural or legal aspects of French politics. Even though French is a far more common language globally than Norwegian, these models still haven't figured out that, no matter which language I speak to them in (German or English, depending on my mood), their web searches need to be done in French to return reasonable results. I have to remind them every time, lest they come back with "uh, didn't find anything relevant, here, take some hallucinations instead."

So, given the anglocentrism of current models, my confidence in American providers giving any shits about non-American users and use cases is pretty low. And it gets lower the smaller the language community is.

hombre_fatal 3 hours ago

Aren’t you already using English in the LLM convo? Telling the model to use French for research or to find resources in French seems like a reasonable step.

If you’re doing this on a daily basis, then you should have an AGENTS.md that accumulates directional instructions like this.

This is how you use the tool correctly.

There’s this weird pattern I’ve noticed where people expect LLMs to require zero effort or proficiency on their part, and when the LLM isn’t perfect without it, the conclusion is, of course, that LLMs suck.

coliveira 2 hours ago

The issue is that French, Italian, African, or Japanese people shouldn't have to go to the trouble of instructing the LLM tool just to get basic facts about their own culture. They should be able to use an LLM that has already been trained to do that by default. Nobody has obligation to use a tool that thinks it is talking to an American. If I go to Google, for example, I want to get facts about my own country in my own language.

cortesoft 10 minutes ago

Wouldn't those people be asking the questions in their own language in the first place? The model will reply in the language you use. This thread is about people asking for information in a language other than the one the information is written in.

instagraham an hour ago

>Nobody has obligation to use a tool that thinks it is talking to an American

Very very emphatic agree from my end, thanks.

andai 2 hours ago

If you ask in French, it searches in French, right?

I have the opposite problem: I'll ask in English about something in a foreign country, the results it finds will all be in that country's language, and the LLM will switch languages and respond in that language (which I don't speak).

So then I have to ask it "can you repeat that in English please."

I keep waiting for the new GPT-Definitely-AGI-For-Real-This-Time to fix it, but it's still there.

a2128 4 hours ago

What incentives does OpenAI have to make sure its AI actually works well in Norwegian, beyond capturing a (small) Norwegian market? What incentives does it have to take Norwegian values into consideration, or to preserve Norwegian culture into the future? This is also a question of national sovereignty, so simply releasing the data and politely asking foreign companies to solve the problem for you would be a fool's move.

SOLAR_FIELDS 10 minutes ago

It's also a bit funny because Norway definitely has enough money to hire a team of Anthropic's best to go out there and train them a model that does whatever they want. They probably have enough money to fund their own Anthropic competitor.

embedding-shape 4 hours ago

Yeah, I was about to comment that too. Instead of training a new model with new weights exclusively for Norwegian (and expecting, or wanting, every other small or medium-sized country to do the same), which seems infinitely harder, they could have made high quality transcriptions and translations of the stories currently described only in Norwegian into English, and made it all public. I guess there would still be a worry that this material would be treated as "less important" than the history, news, and culture of other countries.

makeitdouble a minute ago

> high quality transcriptions and translations of the stories currently described only in Norwegian into English

You make it sound like an easier task than training an LLM. I'd argue that's not obvious, and would assume the opposite.

electroglyph 4 hours ago

absolutely. somebody online wanted an LLM with Georgian language support, and that's exactly what i suggested: start digitizing Georgian text.

_cs2017_ 3 hours ago

> Why go to the expense...

Answer: the idiocy of decision makers, and the desire of those who wrote the proposal to secure resources.

I assumed Scandinavia had better decision-making processes, but apparently I was wrong.