rpastuszak | 2 days ago:
Nice! What do you use it for?
mechagodzilla | 2 days ago (reply):
1-2 tokens/sec is perfectly fine for 'asynchronous' queries, and the open-weight models are pretty close to frontier quality (maybe a few months behind?). I frequently use it for a variety of research topics, feasibility studies for wacky ideas, and some prototype-y coding tasks. I usually give it a prompt and come back half an hour later to see the results (although the thinking traces are entertaining enough that sometimes it's fun to just read them as they come out). Being able to see the full thinking traces (and pause and alter/correct them if needed) is one of my favorite aspects of running these models locally; the traces are frequently just as useful as the final outputs, or more so.