Key features

Fully open model: open weights + open data + full training details including all data and training recipes

Massively Multilingual: 1811 natively supported languages

Compliant: Apertus is trained while respecting opt-out consent of data owners (even retrospectively), and avoiding memorization of training data

lyu07282 6 days ago

Their struggle with Nvidia driver bugs they had to work around was very relatable. You'd think that anyone buying 10,752 of their high-end GPUs would get some support with it.

hodgehog11 4 days ago

Agreed, but the problem seems to be even worse with AMD from what I hear, or at least it was when I checked with some of my HPC buddies a little over a year ago. Constant driver bugs and crickets from upstream "support".

_zoltan_ 4 days ago

did I miss a blog on this?

lllllm 4 days ago

we didn't have time to write one yet, but there is the tech report which has a lot of details already

menaerus 3 days ago

The report is packed with interesting details. The "Engineering challenges and solutions" chapter especially shows how things that are supposed and expected to work break when run at massive scale. Really difficult bugs. Great writeup.

lllllm 3 days ago

thank you!

hhh 3 days ago

no, you have to pay the yearly per gpu license for that.

Bromeo 7 days ago

Looks like the performance is pretty decent, somewhere around Llama3.1 for general knowledge (Table 17) but still a bit behind in Code and Reasoning (Table 18). Llama3.1 was released about one year ago.

esafak 4 days ago

There's an interesting "Swiss AI Charter" on pg. 107.