villgax 7 hours ago

Got nuked on day zero by Qwen models at a tenth or so of the params.

Does not handle critical inputs even for moderation tasks

These guys did not even bother with an official huggingface space

And the biggest stupidity seems to be fixating on MXFP4 for Apple Silicon when it doesn't even have hardware support for it; they should have just done Q4 for GGUF-based inference.

gyan 6 hours ago | parent | next [-]

> These guys did not even bother with an official huggingface space

https://huggingface.co/sarvamai

villgax 6 hours ago | parent [-]

That is their profile, not a HF Space.

rramadass 2 hours ago | parent [-]

What do you mean? I can see the files, download count, deploy/use this model options etc.

villgax 2 hours ago | parent [-]

What part of a HuggingFace Space do you not understand?

They’ve also not bothered with upstreaming the model arch to transformers, and they require remote code for their modeling code to run…

rramadass 42 minutes ago | parent [-]

Responding to my question with your own is not an answer. So again; what do you mean by "official huggingface space"? Their profile page does list the various models and their weights. Other members have created spaces using those which can be seen with a simple search.

You have been making some rather bizarre (nuked by Qwen models, does not handle critical inputs etc.) statements which make no sense.

Have you actually downloaded/used/played-with the models? Can you share what you exactly tried out?

petesergeant 5 hours ago | parent | prev [-]

Got to start somewhere.

I do think convincing world-class talent to live in Bangalore is likely to be a challenge though.

th234oi204234 4 hours ago | parent | next [-]

Indians deep-down often aren't comfortable in the West given the subtle racism and general social-rejection (last year's anti-Indian hate on X remains fresh in memory).

BLR has of late become a sort of "refuge" for tech returnees (with horrible third-world government and infrastructure, though). And it shows: the Matryoshka Embeddings being used in Gemini on-device / embedded models came out of DeepMind BLR.

petesergeant 4 hours ago | parent | next [-]

For sure, there’s no place like home, and people have families and networks they can’t take with them. Still, getting that Western passport is a draw, and there’s always Abu Dhabi if you want to be quite close to home with a decent biryani, but also want world-class infrastructure and high (although not quite US) wages.

chromatin 3 hours ago | parent | prev [-]

[flagged]

3 hours ago | parent [-]
[deleted]
villgax 5 hours ago | parent | prev | next [-]

The bigger issue here is why the government is involved with select companies for subsidizing compute. There are no pre or post criteria to assess success; it should have just been an open market for people with money to purchase compute, instead of 10 companies with no prior experience in making models of any kind.

Public funds should beget public datasets and training scripts, so that people can see how the model is being aligned and that it is not just pandering to a particular govt.

petesergeant 4 hours ago | parent [-]

> Bigger issue here is why the government is involved with select companies for subsidizing compute.

Government-choosing-winners has worked much better, in many such cases, than free-market absolutists would have you believe…

Rakshith an hour ago | parent | prev [-]

[dead]