mutkach 9 days ago:
This is a good take, actually. GPT-OSS is not much of a snowflake (judging by the model's architecture card, at least), but TRT-LLM treats every model as one: there is too much hardcoded, model-specific logic, which makes it very difficult to use out of the box for the hottest SotA thing.
diggan 8 days ago:
> GPT-OSS is not much of a snowflake

Yeah, according to the architecture it doesn't seem like a snowflake, but they also decided to invent a new prompting/conversation format (https://github.com/openai/harmony), which definitely makes it a bit of a snowflake today: you can't just use what worked a couple of days ago, and everyone needs to add proper support for it.
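For context, Harmony wraps each conversation turn in special tokens. A minimal hand-rolled sketch of the wire format, with token names taken from the public openai/harmony repo (in practice you would use the `openai-harmony` library rather than building strings by hand, and real prompts also carry channel markers this sketch omits):

```python
def render_harmony(messages: list[dict]) -> str:
    """Render a conversation as a Harmony-style prompt string.

    Sketch only: each turn becomes <|start|>{role}<|message|>{content}<|end|>,
    and the prompt ends with an open assistant turn for the model to complete.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|start|>{msg['role']}<|message|>{msg['content']}<|end|>")
    # Cue the model to generate the next assistant turn.
    parts.append("<|start|>assistant")
    return "".join(parts)

prompt = render_harmony([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

The point of the complaint above: any serving stack that assumed plain ChatML-style templates has to add this rendering (and the matching stop tokens) before GPT-OSS behaves correctly.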
|
|
diggan 9 days ago:
This is literally what they did for GPT-OSS; it seems there was coordination with OpenAI to support it on day 1.
|
eric-burel 9 days ago:
SMEs are starting to want local LLMs, and it's a nightmare to figure out what hardware works for which models. I am asking devs in my hometown to literally visit their installs to figure out combos that work.
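A rough rule of thumb for that sizing exercise (my own back-of-the-envelope estimate, not from the thread): weight memory is roughly parameter count times bytes per weight, plus headroom for KV cache and runtime buffers, which vary with context length and batch size:

```python
def estimate_vram_gb(params_billions: float,
                     bits_per_weight: int = 16,
                     overhead_factor: float = 1.2) -> float:
    """Back-of-the-envelope VRAM estimate for serving an LLM.

    params_billions: model size, e.g. 7 for a 7B model.
    bits_per_weight: 16 for fp16/bf16, 8 or 4 for quantized weights.
    overhead_factor: assumed ~20% headroom for KV cache and buffers;
    a guess, real overhead depends heavily on context length and batch size.
    """
    # 1B params at 8 bits is ~1 GB, so scale by bits/8.
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb * overhead_factor

# A 7B model quantized to 4 bits: ~4.2 GB, so it fits a consumer 8 GB GPU.
print(round(estimate_vram_gb(7, bits_per_weight=4), 1))
```

This only bounds the weights; if a customer wants long contexts or many concurrent users, the KV cache can dominate and the 20% headroom assumption breaks down.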
CMCDragonkai 9 days ago:
Are you installing them onsite?
eric-burel 8 days ago:
Some are asking for that, yeah, but I haven't run an install yet; I am documenting the process. This is a last resort: hosting on a European cloud is more efficient, but some companies don't even want to hear about cloud hosting.
|
|