Remix clone Hacker News

new | show | ask | jobs Github

	▲	nsingh2 12 hours ago
		One quick way to estimate a lower bound is to take the number of parameters and multiply it with the bits per parameter. So a model with 7 billion parameters running with float8 types would be ~7 GB to load at a minimum. The attention mechanism would require more on top of that, and depends on the size of the context window. You'll also need to load inputs (images in this case) onto the GPU memory, and that depends on the image resolution and batch size.