bcjdjsndon, 3 days ago:
Optimizations, like I said. They'll never hack away the massive memory requirements, however, or the pre-training. Imagine the memory requirements without the pre-training step... this is just part and parcel of the transformer architecture.
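(The memory claim is easy to make concrete: standard scaled dot-product attention materializes an n×n score matrix, so activation memory grows with the square of the context length. A minimal illustrative sketch; the helper and numbers below are mine, not the commenter's:)

```python
# Rough illustration of why attention memory grows with context length:
# vanilla attention builds an (n x n) score matrix per head, per layer.
import numpy as np

def attention_score_bytes(seq_len: int, dtype=np.float32) -> int:
    # Bytes for one (seq_len x seq_len) score matrix.
    return seq_len * seq_len * np.dtype(dtype).itemsize

for n in (1_024, 8_192, 128_000):
    gib = attention_score_bytes(n) / 2**30
    print(f"seq_len={n:>7}: {gib:8.3f} GiB per head per layer")
```

(Memory-efficient kernels avoid materializing the full matrix, but the quadratic compute remains.)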
bcjdjsndon, 3 days ago (in reply):
And a lot of these improvements are really just classic automation, or chaining together yet more transformer architectures to fix issues the transformer architecture creates in the first place (hallucinations, limited context).