Remix clone Hacker News
reilly3000
5 days ago
The node count has to be a power of two (2^n), and you can't use more nodes than the model has attention heads (at most one node per head).
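
Read literally, that constraint can be sketched as a quick check. This is only an illustration of the rule as stated in the comment, not code from any particular tool: the function names and the `num_heads` parameter are made up here, and counting a single node (2^0) as valid is an assumption.

```python
def valid_node_counts(num_heads: int) -> list[int]:
    """Return the powers of two that do not exceed the number of attention heads.

    Hypothetical helper: assumes 1 node (2^0) counts as a valid setup.
    """
    counts = []
    n = 1
    while n <= num_heads:
        counts.append(n)
        n *= 2
    return counts


def is_valid_node_count(nodes: int, num_heads: int) -> bool:
    """Check both parts of the rule: power of two, and at most one node per head."""
    # A positive integer is a power of two when exactly one bit is set.
    is_power_of_two = nodes > 0 and (nodes & (nodes - 1)) == 0
    return is_power_of_two and nodes <= num_heads
```

So for a model with 12 attention heads, `valid_node_counts(12)` gives 1, 2, 4 and 8 nodes, while 6 nodes (not a power of two) or 16 nodes (more than the head count) would be rejected.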