| ▲ | hiroakiaizawa 7 hours ago | |
Nice. What scale does this realistically reach on a single machine? | ||
| ▲ | 6 hours ago | parent | next [-] | |
| [deleted] | ||
| ▲ | lynx97 7 hours ago | parent | prev [-] | |
Model: 36L/36H/576D, 144.2M params runs on a Blackwell 6000 Max-Q, using 86GB VRAM. Training supposedly takes 3h40m | ||