woah 2 days ago

The three goals featured prominently above the fold are:

> truly open: including data, documentation, training and testing code, and evaluation metrics; including community involvement

> compliant: under EU regulations, OpenEuroLLM will provide a series of transparent and performant LLMs

> diverse: for European languages and other socially and economically interesting ones, preserving linguistic and cultural diversity

The first one seems good, but the other two seem pretty beside the point of creating models that compete with the cutting-edge models from China and the USA.

rafram a day ago

People on HN complain constantly about "open-source" models not releasing their training data. That's what the second point ("transparent") seems to be alluding to. And that's a bad thing?

Others have responded to your "diversity" point, but making sure to train on adequate amounts of data in all EU languages is valuable, especially because LLMs are so prone to generating convincing BS when working close to the edges of their training set. If this exists, people in Malta are going to want to use it, so better for it to generate good Maltese than gibberish that sort of looks like Maltese, right?

ben_w 2 days ago

Why would diversity, especially linguistic diversity, be beside the point? Europe is a lot more culturally and linguistically diverse than either the USA or China.

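Hier spricht man Deutsch. [German is spoken here.]

À 600 km à l'ouest, on parle français. [600 km to the west, French is spoken.]

50 km na wschód, Polska. [50 km to the east, Poland.]

360 χλμ βόρεια, Δανέζικα, Σουηδικά; 250 χλμ νότια, Τσεχία; 750 χλμ νοτιοανατολικά, Ουγγρικά; και τα λοιπά. [360 km north, Danish and Swedish; 250 km south, Czechia; 750 km southeast, Hungarian; and so on.]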

Europe has a need that the other models aren't bothered by; they can do it, but more by happenstance than on purpose.

woah a day ago

Depends on the goals. If they were fine-tuning leading foundation models, then I could see this being an entirely sensible undertaking. But since their goal seems to be building their own foundation models, I don't think they will end up with the leading models given so many other conflicting requirements.

pastage a day ago

In the four languages I speak, the different models do a pretty good job. I'm sure something extra could be added, but at the moment it is good enough for me.

layer8 a day ago

Compliance and language diversity are important reasons not to simply use the existing foreign models.

blackeyeblitzar 2 days ago

That note about EU regulations may also be dangerous. There is an increasing trend of European leaders supporting censorship of speech on weak justifications, such as misinformation, that are applied very aggressively. There are even videos of police showing up at people’s homes in some countries, over tweets they made. I don’t have faith that these European LLMs will be trustworthy as a result.

yorwba a day ago

Laws against defamation and fraud aren't exactly a new trend, nor are they limited to Europe.

I guess some people are surprised that police might get involved in a defamation case because in the US it's not a crime but a civil wrong? Which means you can't get help from the police to identify the person who made a defamatory tweet? Or something?

papertokyo a day ago

Then you should also question what flavor of censorship and bias US-made LLMs have.

Also, if someone says something that could threaten my safety (either directly or by inciting others), I would very much like them to get a visit from the police. This situation is so easily avoided by not being a dick to people.

logicchains 2 days ago

Most of the people building American LLMs also support such censorship; they just don't currently have the political power to achieve it in the US, especially given the First Amendment.

Fnoord a day ago

> There are even videos of police showing up at people’s homes in some countries, over tweets they made.

Yeah, if you are from the Netherlands and want the police showing up at your door, mention on Twitter that you want to shoot Mr. Wilders. Threatening to take someone's life has repercussions. How peculiar!

(Please don't do it; the example is purely illustrative. Actually, I know of a website with a forum where this happened about 20 years ago. The server got seized. They didn't keep logs, and the server used full-disk encryption (FDE), but that obviously got broken at some point.)

Freedom of speech doesn't mean you can spout whatever you want without facing repercussions.

Besides that, there's Popper's paradox of tolerance.

Furthermore, there's this thing called the chilling effect. You might want to ask GOP senators and congressmen about that.

I have faith in LLMs and AI, as long as they are reproducible and transparent. Right now, when I use Mistral, it cites its sources. That's a step in the right direction.