| ▲ | ericd 6 hours ago | ||||||||||||||||
You probably want to try renting some time on a dedicated box with roughly the specs you’re considering and running the open models for a bit to see if you would actually use them before dropping a lot on local hardware. A 128 gig MacBook Pro isn’t going to get you an amazing model, and certainly not amazing speed. GLM 5.2 wants something like 350+ gigs at fp4 iirc. | |||||||||||||||||
| ▲ | zackify 4 hours ago | parent | next [-] | ||||||||||||||||
I ran glm 5.2 on rented 8x h200 it could only do 2x concurrency at a cost of $40 an hour. It felt great but dang I wish it was cheaper... It needs 750 at fp8 | |||||||||||||||||
| |||||||||||||||||
| ▲ | traceroute66 5 hours ago | parent | prev [-] | ||||||||||||||||
> You probably want to try renting some time on a dedicated box with roughly the specs you’re considering and running the open models You don't even need to go that far. For example, with Exoscale Dedicated Inference[1] you just point it at the Hugging Face for the model and quantisation you want to test and it automagically spits out an OpenAI-compatible API endpoint. [1] https://www.exoscale.com/ai-cloud-infrastructure/dedicated-i... (I have no relationship with Exoscale, this particular product just crossed my radar recently) | |||||||||||||||||
| |||||||||||||||||