kvam 7 hours ago

As a Norwegian, this sounds like a mistake to me. Who will use this LLM? Where? For what? The underlying data could be made more easily searchable and digestible for agents in general if the goal is better knowledge of Norwegian culture.

vidarh 6 hours ago | parent | next [-]

I agree in principle.

That said, they are quite limited in what they are allowed to share of in-copyright works, and nb.no is a fantastic resource as it is (though you'll need a Norwegian IP address for too much of it - it's one of the main reasons I maintain a VPN). If they are allowed to make it accessible there, it'd be great.

But they also have vast amounts of out-of-copyright data that I hope they'd make more easily accessible...

dalemhurley 7 hours ago | parent | prev | next [-]

Hard disagree. This is the first step, not the last, and it proves to other countries that this can be done.

spwa4 7 hours ago | parent | prev [-]

Exactly, if there's one thing transformers are good at it's translation. One I've found particularly nice: any question ChatGPT can answer in English it can answer in French. I'm assuming Norwegian too. So there's no point.

sgt 7 hours ago | parent | next [-]

There's quite a bit more to culture and language than just being able to have transformers come up with believable language and/or dialect.

sisve 6 hours ago | parent | prev | next [-]

The point is that Norway will have its own LLM and will not depend on another state or a private company. The goal is not to be the best model, but to have a model that includes more Norwegian data than other LLMs and that is not skewed toward other sources.

dalemhurley 6 hours ago | parent | prev | next [-]

Yes, transformers are great at translation, as that is their purpose.

LLMs are not great at preserving cultural uniqueness and diversity. Take how “delve” has reentered the lexicon because the dialect of English spoken by the human assessors used for pre-training uses “delve” a lot.

There are a lot of benefits to training specifically for a unique culture with unique norms, to preserve that culture as we increasingly rely on LLMs.

https://www.scientificamerican.com/article/chatgpt-is-changi...

otabdeveloper4 6 hours ago | parent | prev | next [-]

They're only good at it because they were trained on massive amounts of English and French data.

vidarh 5 hours ago | parent [-]

Not really true.

Both Claude and ChatGPT can translate into minor dialects of Norwegian they will have seen very few works in because very few printed works exist in them.

E.g. I've tested both my local spoken dialect, which is rarely written, and a sociolect used by a 1970s Maoist group consisting of a few hundred people, where most of the printed material consists of novels from a couple of ex-members who became authors.

In the latter case, it claimed to not know, but was able to get a good match from just a description.

I also just had it ape Norwegian orthography from the 1910s by having it look up the rules and translate a text it had first translated from English to modern Norwegian, and it did just fine.

They will have seen some work in these dialects, but mostly, knowing related languages transfers really well (English, Dutch, German, Swedish, and Danish roughly form a continuum from least in common to most in common with modern Norwegian; they all share vocabulary and significant parts of grammar with Norwegian), and then a relatively limited exposure to Norwegian itself is sufficient to do fairly well.

They're also really good at "style transfer" of text in the form of tweaking orthography, word order, and minor grammar changes from descriptions and examples.

(incidentally, the latter is one way of getting an LLM to sound a lot less like an LLM)

dzhiurgis 5 hours ago | parent | prev [-]

The model can speak Lithuanian too, but with a Russian accent, which is a big taboo for us.