| ▲ | lllllm 4 days ago |
| Martin here from the Apertus team, happy to answer any questions if I can. The full collection of models is here: https://huggingface.co/collections/swiss-ai/apertus-llm-68b6...

PS: you can run this locally on your Mac with mlx-lm:

    pip install mlx-lm
    mlx_lm.generate --model mlx-community/Apertus-8B-Instruct-2509-8bit --prompt "who are you?"
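If you'd rather call it from Python than the CLI, something along these lines should work with the mlx-lm Python API (a sketch only; the exact generate arguments can differ between mlx-lm versions):

    # Sketch: the same model via the mlx-lm Python API instead of the CLI.
    # Assumes `pip install mlx-lm`; argument names may vary across versions.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Apertus-8B-Instruct-2509-8bit")

    # Format the question with the model's chat template before generating.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": "who are you?"}],
        add_generation_prompt=True,
        tokenize=False,
    )
    print(generate(model, tokenizer, prompt=prompt, max_tokens=256))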
|
| ▲ | trickstra 3 days ago | parent | next [-] |
| Hi, your "truly open" model is "gated" on Huggingface, restricting downloads unless we agree to "hold you harmless" and share our contact info. Can you fix this please, either by removing the restriction, or removing the "truly open" claim? |
| |
| ▲ | lllllm 3 days ago | parent [-] | | We hear you. Nevertheless, this is one of the very few open-weights and open-data LLMs, and the license is still very permissive (compare, for example, to Llama's). Personally, of course, I'd like to remove the additional click, but the universities also have a say in this. | | |
| ▲ | dougnd 3 days ago | parent [-] | | This project looks awesome! In the US, many state governments have anti-indemnification laws that restrict state agencies (including state universities) from agreeing to contracts with such language. I'd love to make this available to researchers at my university, but I'm not sure I can click through such an agreement (similar problems exist with other LLMs). It is Apache 2, and I don't see anything that prohibits another contracting party from agreeing to the Apertus LLM Acceptable Use Policy and then redistributing under just Apache 2, without the AUP. Maybe that provides a solution, unless I'm missing something? | |
|
| ▲ | trcf22 4 days ago | parent | prev [-] |
| Great job! Would it be possible to know the cost of training such a model? |
| |
| ▲ | menaerus 3 days ago | parent [-] | | From their report:

> Once a production environment has been set up, we estimate that the model can be realistically trained in approximately 90 days on 4096 GPUs, accounting for overheads. If we assume 560 W power usage per Grace-Hopper module in this period, below the set power limit of 660 W, we can estimate 5 GWh power usage for the compute of the pretraining run. |
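For anyone who wants to sanity-check that figure, a quick back-of-envelope calculation from the numbers quoted above (4096 GPUs at an assumed 560 W average draw for ~90 days) lands right around 5 GWh:

    # Back-of-envelope check of the ~5 GWh estimate, using only the
    # figures quoted above (4096 GPUs, 560 W per module, ~90 days).
    gpus = 4096
    watts_per_gpu = 560                      # assumed average draw per Grace-Hopper module
    days = 90

    power_mw = gpus * watts_per_gpu / 1e6    # ~2.29 MW sustained
    energy_gwh = power_mw * days * 24 / 1e3  # MW * hours -> MWh -> GWh

    print(f"{power_mw:.2f} MW sustained, {energy_gwh:.2f} GWh total")
    # prints roughly "2.29 MW sustained, 4.95 GWh total", consistent with ~5 GWh

Note this covers only the energy of the pretraining compute itself, not the full cost of the training run.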
|