| ▲ | Eextra953 4 hours ago | |
Am I understanding correctly that an implication of this is reduced context? since they are streaming by splitting the input into streams the total context is now split amongst those streams and a particular streams context will be shorted to to context/ streams? | ||
| ▲ | danlenton an hour ago | parent [-] | |
I think the main benefit is improved speed and parallelism. Very similar to https://thinkingmachines.ai/blog/interaction-models/ | ||