tmshapland 9 days ago

Such a fascinating read. I didn't realize how much massaging needed to be done to get the models to perform well. I just sort of assumed they worked out of the box.

acters 9 days ago | parent [-]

Personally, I think bigger companies should be more proactive and work with the devs of the popular inference engines to get their special-snowflake LLMs working before release. I guess it is all very much experimental at the end of the day. Those devs are doing God's work so we can run these models on our budget-friendly hardware choices.

mutkach 9 days ago | parent | next [-]

This is a good take, actually. GPT-OSS is not much of a snowflake (judging by the model's architecture card, at least), but TRT-LLM treats every model like one - there is too much hardcoding - which makes it very difficult to use it out of the box for the hottest SotA thing.

diggan 8 days ago | parent [-]

> GPT-OSS is not much of a snowflake

Yeah, according to the architecture it doesn't seem like a snowflake, but they also decided to invent a new prompting/conversation format (https://github.com/openai/harmony), which definitely makes it a bit of a snowflake today: you can't just reuse what worked a couple of days ago, and everyone needs to add proper support for it.
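For a sense of why existing chat-template code doesn't carry over: Harmony wraps each turn in its own special tokens and adds a per-turn "channel" for assistant output. This is a rough illustrative sketch based on the token names in the openai/harmony repo, not the official renderer (the real library also handles tool calls, tokenization, and more):

```python
def render_harmony(messages):
    """Render a list of {role, content, channel?} dicts into
    Harmony-style text. A back-of-envelope sketch only."""
    parts = []
    for m in messages:
        header = m["role"]
        # Assistant turns carry a channel, e.g. "analysis" or "final".
        if "channel" in m:
            header += f'<|channel|>{m["channel"]}'
        parts.append(f'<|start|>{header}<|message|>{m["content"]}<|end|>')
    return "".join(parts)

convo = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "channel": "final", "content": "Hi there."},
]
print(render_harmony(convo))
```

Compare that with the ChatML-style templates most runtimes already ship, and it's clear why every inference engine needed a patch on day 1.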

diggan 9 days ago | parent | prev | next [-]

This is literally what they did for GPT-OSS; there seems to have been coordination with OpenAI to get it supported on day 1.

eric-burel 9 days ago | parent | prev [-]

SMEs are starting to want local LLMs, and it's a nightmare to figure out what hardware works for which models. I am asking devs in my hometown to literally visit their installs to figure out combos that work.
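The first-pass sizing question is usually "do the weights even fit in VRAM?" A minimal back-of-envelope estimate looks like this; the 20% overhead factor is my assumption for KV cache and runtime buffers, and real usage varies a lot with context length and batch size:

```python
def vram_gib(params_billions, bits_per_weight, overhead=1.2):
    """Rough VRAM estimate for inference: weight bytes plus
    ~20% headroom for KV cache and runtime buffers."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

# e.g. a 7B model at 4-bit quantization:
print(round(vram_gib(7, 4), 1))  # -> 3.9 (GiB), fits a 8 GB card
```

It only gates the obvious mismatches (a 70B model at 16-bit is never fitting on a 24 GB card); actual throughput still has to be measured on the target machine, which is why the site visits happen.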

CMCDragonkai 9 days ago | parent [-]

Are you installing them onsite?

eric-burel 8 days ago | parent [-]

Some are asking for that, yeah, but I haven't run an install yet; I am documenting the process. This is a last resort: hosting on a European cloud is more efficient, but some companies don't even want to hear about cloud hosting.