solenoid0937 7 hours ago

> The Olivia system is an HPE Cray Supercomputing EX system, with 448 GPUs and 64,512 CPU cores.

Training a sovereign LLM with this meager hardware, as opposed to a LoRA on some open source model, seems like a huge mistake and a potential red flag.

There is no way these people have the resources to train a fully fledged LLM, so claiming that is their goal makes me think they don't intend for the LLM to be useful.

Which begs the question, whose money are they wasting - and why?

vslira 6 hours ago | parent | next [-]

It may not be useful to anyone outside, but it's possible that one of the goals is institutional learning (that is, embedding the knowledge of how to build LLMs in an organization).

Even though it's nominally the national library behind this, they were probably chosen (as per the article) because they legally own and can use all NO material for this end. I'd guess researchers from related entities like unis will be involved in the process.

speedgoose 6 hours ago | parent | prev | next [-]

They have successfully made PoC finetunes before, so the next step is training fully fledged LLMs.

I don’t think they aim for anything worthwhile. The finetunes were incredibly broken. I’m guessing it’s more about having the method to do it. I’m not convinced it’s super useful but I’m not one to decide who gets to do what with the research funds.

One finetune I tried did make fun of humans expressing their feelings in the chat. Often.

One other finetune did hallucinate that it was a doctor and my baby had terrible diseases, every time I just wrote "hei" (with a generic neutral system prompt that likely triggered this behaviour though).

I think Olivia is big enough for what it’s used for. In my opinion it’s better to stay up to date and not waste too much money on hardware at the moment.

manquer 6 hours ago | parent | prev | next [-]

> this meager hardware

> they wasting - and why?

i18n language models are not an area frontier labs are focusing a ton of resources on? (Certainly not in Norwegian.)

The corpus of content in Norwegian may not require very large clusters, and even if it does, this is the best the library could do. It would certainly be more than anyone else is investing in Norwegian models.

SOTA models do not have access to the quality of content that the national library does? The article mentions licensing with newspapers specifically, and the library has access to its own content archive.

English and Norwegian are not closely related language families, perhaps LoRA is not best approach?

I am curious if there is published research on how well localization works with LoRA depending on how far off the target language grammar/vocabulary is from English.

Projects like this typically have more than one objective: they are not only about building a SOTA model, but also about building and training foundational local talent, similar to universities launching satellites.

vidarh 5 hours ago | parent [-]

> English and Norwegian are not closely related language families, perhaps LoRA is not best approach?

Yes, they are. English is a West Germanic language. Norwegian is a North Germanic language. The French vocabulary in English obscures it a bit, but the two languages have similar grammar and the vocabulary has a huge number of close cognates.

E.g. day -> dag, ship -> skip, apple -> eple, cow -> ku (which makes more sense when you pronounce them correctly out loud), bairn (child; mostly Scotland and Northern England) -> barn, hop -> hopp, yule -> jul just to give a random selection of English Germanic words.

But more than that, the frontier models both a) know Norwegian quite well and b) certainly know German and Dutch well, and there's a continuum of language transfer around the North Sea, especially when accounting for sounds rather than modern orthography (e.g. to take a couple of examples from above: ship -> schip -> Schiff -> skib -> skip; day -> dag -> Tag -> dag). The "jump" to Dutch already weeds out most of the French. A lot of modern Norwegian orthography comes from Danish, which again shares more than modern Norwegian does with German.

Knowing any of these helps a lot with learning Norwegian and vice versa. E.g. I'm Norwegian, I've never learnt Dutch, but I have learnt English and German, and I can read Dutch fairly well from that alone.

everforward 5 hours ago | parent [-]

This makes me deeply curious about how LLMs understand language. Do LLMs relate cognates more than words that are dissimilar in different languages? I wonder if that plays some role in the effectiveness of tokenization.

vidarh 4 hours ago | parent [-]

I have no idea if the similar spelling will somehow help - I used that mostly because it's a simple way of illustrating the close relationship, but I suspect you'd find that the meanings of closely related words are likely to more directly overlap.

The grammar is perhaps more likely to help. Similar word order etc. Even weirdness like German - my only top grade on a German essay in school was one where I on purpose ignored what I thought I knew about German and tried to evoke "old fashioned" Norwegian. The result was guessing at a bunch of grammatical structures that I didn't know were valid German. Turned out I was right about most of it - century-old Norwegian was far closer to century-old Danish, which was a lot closer to valid German, and enough so to impress my teacher enough to overlook a number of orthographic mistakes.

gerdesj 4 hours ago | parent | prev | next [-]

"Training a sovereign LLM with this meager hardware"

Norway has a sovereign fund worth O[MS|Apple|etc] except it is largely in readies and not pixie dust.

Whilst the UK frittered away North Sea oil profits, Norge squirreled them away instead.

So, if the grand dream of LLMs and AI does actually come to some sort of fruition and not simply another case of the Emperor's New Clothes combined with some lovely tulips and a dotcom boom and bust, then Norge can simply stuff shit loads of cash into buying whatever they need. Cash is king after all.

The beast they have described here is just a library system. I think I'd like my country's (UK) library system to have resources like that.

I don't think you are asking the right question: When you say "meager", I see "rather impressive PoC from a well resourced organisation"

You say tomato ...

phatfish 4 hours ago | parent [-]

The reason they have the largest sovereign wealth fund (aside from getting it right in the 80s, unlike the UK), is that there is quite a bit of regulation around where and how the money is invested.

It is run to maximise growth for example, so even though Norway is way ahead with electric car usage and infrastructure (presumably because they have a climate likely to be most affected by global warming/heating) their fund still invests in fossil fuels as they are a profit/growth opportunity.

Anyway, I don't think it's as easy as "simply stuff shit loads of cash into buying whatever they need". I believe there would be a serious political discussion needed for that to happen.

kristjansson 6 hours ago | parent | prev | next [-]

DeepSeek claims to have trained on something like 2k H800, this is ~0.5k GH200 … it’s not nothing. Sure they’re not going to _serve_ it at scale, but that’s not the point?

Also the line between “finetuning a base model” and “man this is a real good initialization” gets pretty blurry at scale.

Altogether a pretty presumptuous take.
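
For a rough sense of scale, here is a back-of-envelope sketch using the common rule of thumb that training costs about 6 FLOPs per parameter per token. The per-GPU throughput, utilization, run length, and model size below are all placeholder assumptions, not figures from the article or from DeepSeek, and `trainable_tokens` is a hypothetical helper.

```python
def trainable_tokens(num_gpus, flops_per_gpu, utilization, days, params):
    """Estimate how many tokens a cluster can train on.

    Uses the ~6 * params FLOPs-per-token approximation for dense
    transformer training (forward plus backward pass).
    """
    seconds = days * 86_400
    total_flops = num_gpus * flops_per_gpu * utilization * seconds
    return total_flops / (6 * params)


# Only the GPU count (448) comes from the thread. Everything else is an
# illustrative assumption: a round 1e15 FLOP/s per GPU, 40% utilization,
# a 30-day run, and a 7B-parameter model.
olivia_tokens = trainable_tokens(
    num_gpus=448,
    flops_per_gpu=1e15,
    utilization=0.4,
    days=30,
    params=7e9,
)
print(f"{olivia_tokens:.2e} tokens in 30 days")
```

Swap in your own throughput and utilization guesses; the estimate scales linearly with each of them, so the conclusion is only as good as those inputs.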

gunalx 6 hours ago | parent | prev | next [-]

The largest problem is available training data actually.

They have already done experiments with different sub-10B models, both fine-tuned and trained fully from scratch. And last I checked, the ones trained fully from scratch captured the language better.

sgt 7 hours ago | parent | prev | next [-]

That's what they have access to right now. I am sure that will change in the future as the project progresses.

What do you suggest, that they stop and wait until they have the right HW?

NonHyloMorph 5 hours ago | parent [-]

Also, it's Norway...

"Norway's sovereign wealth fund, officially known as the Government Pension Fund Global, is the world's largest sovereign wealth fund with assets exceeding \(\$2\) trillion. Established in 1990 and managed by Norges Bank Investment Management, it was created to channel surplus petroleum revenues into long-term global investments to benefit future generations."

oblio 5 hours ago | parent | prev | next [-]

> Which begs the question, whose money are they wasting - and why?

Norway is better run as a country than 99% of the countries on the planet, including the one that invented current LLM tech, so I'd give them the benefit of the doubt.

otabdeveloper4 7 hours ago | parent | prev [-]

> meager hardware

Qwen was made on a cluster about that size.

And this is before anybody ever thought about optimizing the training process. (Currently it's just PyTorch analyst-as-coder slop, with extremely overprovisioned quantizations, etc.)