jetsnoc 3 days ago

  Models
    gpt-oss-120b, Meta Llama 3.2, or Gemma (just depends on what I’m doing)
  Hardware
    - Apple M4 Max (128 GB RAM), paired with a GPD Win 4 running Ubuntu 24.04 over USB-C networking
  Software
    - Claude Code
    - RA.Aid
    - llama.cpp
  For CUDA computing, I use an older NVIDIA RTX 2080 in an old System76 workstation.
  Process
    I create a good INSTRUCTIONS.md for Claude/RA.Aid that specifies the task and production process, with a task list it maintains. I use Claude Agents with an Agent Organizer that helps determine which agents to use. It creates the architecture, PRD, and security design, writes the code, and then lints, tests, and does a code review.
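
A rough sketch of what such an organizer step could look like, assuming a simple Python router over an INSTRUCTIONS.md checklist; the agent names, keyword rules, and checklist format below are hypothetical, not jetsnoc's actual tooling:

```python
# Hypothetical organizer step: route each INSTRUCTIONS.md task to a
# specialist agent. Agent names, checklist format, and keyword rules
# are illustrative only.
from pathlib import Path

AGENTS = {
    "architecture": "architect-agent",
    "prd": "product-agent",
    "security": "security-agent",
    "lint": "review-agent",
    "test": "review-agent",
    "review": "review-agent",
}

def route(task: str) -> str:
    """Pick an agent by keyword match; default to the coding agent."""
    lowered = task.lower()
    for keyword, agent in AGENTS.items():
        if keyword in lowered:
            return agent
    return "coder-agent"

# Assume the task list in INSTRUCTIONS.md is a "- [ ] task" checklist
# that the agents keep up to date as they work.
lines = Path("INSTRUCTIONS.md").read_text().splitlines()
tasks = [line[len("- [ ] "):] for line in lines if line.startswith("- [ ] ")]
for task in tasks:
    print(f"{route(task):>15}  <-  {task}")
```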

Infernal 3 days ago

  What does the GPD Win 4 do in this scenario? Is there a step w/ Agent Organizer that decides if a task can go to a smaller model on the Win 4 vs a larger model on your Mac?

altcognito 3 days ago

  What sorts of tokens/s are you getting with each model?

jetsnoc 3 days ago

  Model performance summary:

  **openai/gpt-oss-120b** — MLX (MXFP4), ~66 tokens/sec @ Hugging Face: `lmstudio-community/gpt-oss-120b-MLX-8bit`
  **google/gemma-3-27b** — MLX (4-bit), ~27 tokens/sec @ Hugging Face: `mlx-community/gemma-3-27b-it-qat-4bit`
  **qwen/qwen3-coder-30b** — MLX (8-bit), ~78 tokens/sec @ Hugging Face: `Qwen/Qwen3-Coder-30B-A3B-Instruct`

  Will reply back and add Meta Llama performance shortly.
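
For context, MLX throughput numbers like these can be measured with the mlx-lm package; a minimal sketch, assuming `pip install mlx-lm` on Apple silicon, with a model repo taken from the list above and an illustrative prompt:

```python
# Minimal benchmark sketch with mlx-lm (assumed installed via
# `pip install mlx-lm` on Apple silicon). With verbose=True, the
# library prints prompt and generation tokens-per-second after
# the run; the prompt text here is illustrative.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-3-27b-it-qat-4bit")
generate(
    model,
    tokenizer,
    prompt="Explain KV caching in two sentences.",
    max_tokens=256,
    verbose=True,
)
```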

CubsFan1060 3 days ago

  What is the Agent Organizer you use?