| ▲ | Which one is more important: more parameters or more computation? (2021) (parl.ai) | |
| 16 points by jxmorris12 a day ago | 1 comment | ||
| ▲ | vorticalbox 18 minutes ago | parent [-] | |
This reminds me of https://dnhkng.github.io/posts/rys/ where David looks inside the LLM, finds the "thinking" layers, duplicates them, and puts the copies back to back. This increases the LLM's scores with basically no overhead. Very interesting read. | ||