anonym29 2 hours ago:
This is a breeze to do with llama.cpp, which has had Anthropic-compatible API support for over a month now. On your inference machine:
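Something along these lines works as a minimal sketch (the model path, port, and parameter values here are placeholders, and exact flag spellings vary a bit between llama.cpp versions):

    # model path, context size, and port are placeholders; adjust to taste
    # -fa enables flash attention (newer builds take on/off/auto); -ngl 99 offloads all layers to the GPU
    llama-server -m /path/to/model.gguf --host 0.0.0.0 --port 8080 -c 32768 -fa on -ngl 99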
Obviously, feel free to change your port, context size, flash attention settings, other params, etc. Then, on the system you're running Claude Code on:
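A minimal sketch of the client side, assuming the inference box is reachable at 192.168.1.50:8080 (substitute your own host and port; the token value is arbitrary):

    # point Claude Code at the local llama.cpp server instead of api.anthropic.com
    export ANTHROPIC_BASE_URL="http://192.168.1.50:8080"
    # any non-empty value works; CC just needs something set (see note below)
    export ANTHROPIC_AUTH_TOKEN="not-a-real-key"
    claude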
Note that the auth token can be whatever value you want, but it does need to be set; otherwise a fresh CC install will still prompt you to log in / auth with Anthropic or Vertex/Bedrock/whatever.
huydotnet an hour ago:
Yup, I've been using llama.cpp for that on my PC, but on my Mac I found some cases where MLX models work best. I haven't tried MLX with llama.cpp, so I'm not sure how that would work out (or if it's even supported yet).