Arcee Trinity Mini: US-Trained Moe Model (arcee.ai)
50 points by hurrycane 6 hours ago | 12 comments
halJordan 5 hours ago | parent | next [-]
Looks like a slightly weaker version of Qwen3 30B-A3B, which makes sense because it is slightly smaller. If they can keep that efficiency going into the large one it'll be sick. Trinity Large will be a 420B parameter model with 13B active parameters. Just perfect for a large RAM pool at q4.
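A rough back-of-the-envelope sketch of that claim, assuming ~4.25 effective bits per weight for a q4-style quant (block scales included) and ignoring KV cache and runtime overhead:

    # Hypothetical sizing for Trinity Large at q4; figures are estimates only.
    total_params = 420e9      # total parameters (from the announcement)
    active_params = 13e9      # active parameters per token
    bits_per_weight = 4.25    # assumed effective bits/weight for q4 + scales

    weights_gb = total_params * bits_per_weight / 8 / 1e9
    active_gb = active_params * bits_per_weight / 8 / 1e9

    print(f"full weights in memory: ~{weights_gb:.0f} GB")    # ~223 GB
    print(f"weights touched per token: ~{active_gb:.0f} GB")  # ~7 GB

Which is why a big RAM pool works here: the full model would fit in roughly 256 GB of system memory at q4, with only a few GB of weights read per token.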
davidsainez 2 hours ago | parent | prev | next [-]
Excited to put this through its paces. It seems most directly comparable to GPT-OSS-20B. Comparing their numbers on the Together API: Trinity Mini is slightly less expensive ($0.045/$0.15 vs $0.05/$0.20) and seems to have better latency and throughput numbers.
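For a concrete sense of those prices (assuming the quoted figures are USD per 1M input / 1M output tokens):

    # Hypothetical per-request cost comparison using the quoted prices.
    prices = {
        "Trinity Mini": (0.045, 0.15),  # $/1M input, $/1M output (quoted above)
        "GPT-OSS-20B": (0.05, 0.20),
    }
    prompt_tokens, output_tokens = 2_000, 500  # example request size

    for model, (p_in, p_out) in prices.items():
        cost = prompt_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out
        print(f"{model}: ${cost:.6f} per request")

The gap is tiny in absolute terms (~$0.000165 vs ~$0.0002 for this example request), so the latency and throughput numbers are probably the more interesting comparison.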
trvz 36 minutes ago | parent | prev | next [-]
Moe ≠ MoE
ksynwa 2 hours ago | parent | prev | next [-]
> Trinity Large is currently training on 2048 B300 GPUs and will arrive in January 2026.

How long does the training take?
htrp 5 hours ago | parent | prev | next [-]
Trinity Nano Preview: 6B parameter MoE (1B active, ~800M non-embedding), 56 layers, 128 experts with 8 active per token

Trinity Mini: 26B parameter MoE (3B active), fully post-trained reasoning model

They did pretraining on their own and are still training the large version on 2048 B300 GPUs.
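For anyone less familiar with MoE routing, here is a minimal illustrative sketch of what "128 experts with 8 active per token" means; this is generic top-k gating, not Arcee's actual implementation:

    import numpy as np

    # Toy router: pick the top-8 of 128 experts for one token's hidden state.
    n_experts, top_k, d_model = 128, 8, 64
    rng = np.random.default_rng(0)

    router_w = rng.normal(size=(d_model, n_experts))  # learned router projection (random here)
    x = rng.normal(size=(d_model,))                   # one token's hidden state

    logits = x @ router_w                             # score every expert
    chosen = np.argsort(logits)[-top_k:]              # indices of the 8 selected experts
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                              # softmax over the selected experts

    # Only the 8 chosen expert FFNs run for this token; their outputs are
    # combined with these gate weights. That is why only ~3B of Trinity Mini's
    # 26B parameters are active per token.
    print("active experts:", sorted(chosen.tolist()))
    print("gate weights:", np.round(gates, 3))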
bitwize 5 hours ago | parent | prev [-]
A moe model you say? How kawaii is it? uwu