Remix clone Hacker News
kgeist
a day ago
What about constrained decoding (with JSON schemas)? I noticed my vLLM instance is keeping one CPU core at 100%.