danielhanchen 2 hours ago

Oh thanks :) We're also going to add MTP support soon for Qwen3.6!

95% of it is fully human done - the maths, algos, code snippets, screenshots & benchmarks are done / conducted by us and NVIDIA :)

We did use AI to fix spelling errors + make some nice plots using Chat (ours would look horrible lol)

Update - Just got rid of the spiced up intro

stared 3 minutes ago

Thanks!

To be clear, I use AI for editing all the time. Actually, diagrams are nice.

Just some pieces like that look like copy-paste (I mean, empty lines before, code gets no special typography, etc.):

  If we write the boundary information for a packed batch as:
  
  B = { lengths, cu_seqlens, max_seqlen, mask structure }
  
  then every transformer layer in that forward pass consumes the same B.
  
  If the model has L layers, rebuilding or re-synchronizing on B once per layer is not new work. It is the same information being reconstructed again and again.
  
  In other words, the useful work is:
  
  build B once, use it L times.
  
  The wasteful version is:
  
  build B + build B + ⋯ + build B (L times)
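
For what it's worth, the quoted point can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not the article's code: `build_boundaries` and `run_layers` are hypothetical names, the mask structure is left out, and the "layers" do nothing except count how often B gets built.

```python
from itertools import accumulate


def build_boundaries(lengths):
    """Build the boundary info B for a packed batch (hypothetical helper)."""
    return {
        "lengths": list(lengths),
        # Cumulative offsets: sequence i occupies [cu_seqlens[i], cu_seqlens[i + 1]).
        "cu_seqlens": [0, *accumulate(lengths)],
        "max_seqlen": max(lengths),
    }


def run_layers(lengths, num_layers, rebuild_per_layer):
    """Count how many times B is built across one forward pass."""
    builds = 0
    B = None
    for _ in range(num_layers):
        if B is None or rebuild_per_layer:
            B = build_boundaries(lengths)
            builds += 1
        # A real transformer layer would consume B here; every layer sees the same B.
    return builds, B


builds_once, B = run_layers([3, 5, 2], num_layers=4, rebuild_per_layer=False)
builds_wasteful, _ = run_layers([3, 5, 2], num_layers=4, rebuild_per_layer=True)
```

With L = 4 layers, the first call builds B once and the second builds it four times, while both end up with an identical B.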