lumost 3 days ago

Did it work? :)

The architecture is very similar to offset LSTMs, which have been studied extensively. The main difference is the handover of the hidden state, which my naive mind would assume makes optimization substantially more difficult.

cs702 3 days ago | parent [-]

I haven't had a chance to read the preprint carefully or play with the code yet. The best place to follow what's happening is the GitHub repo, specifically the open and closed issues and pull requests.

lumost 2 days ago | parent [-]

I'll wait until some more benchmarks are run in this case. Unlike with traditional software, vetting that a model architecture works better than the alternatives is a time- and compute-intensive process. You really can't just download it and "try it out" outside of general-purpose models (which this is not).