▲ keyle 6 hours ago
Does this make any sense, to anyone? | |||||||||||||||||||||||
▲ kannanvijayan 6 hours ago
I think this is an attempt to enrich the locality model in transformers. One of the weird things you do in transformers is add a position vector that captures the distance between the token being attended to and some other token. That's obviously not powerful enough to express non-linear relationships, like graph relationships. This person seems to be experimenting with pre-processing the input token set to linearly reorder it by some other heuristic that might map more closely to the actual underlying relationship between the tokens.
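A minimal sketch of what that kind of reordering heuristic could look like (not from the paper, just to make the idea concrete): assume some preprocessing step has already built a token-relationship graph, then order tokens by the Fiedler vector of that graph's Laplacian so that related tokens end up adjacent before the usual positional encoding is applied.

    import numpy as np

    def spectral_order(adjacency: np.ndarray) -> np.ndarray:
        """Permutation of token indices sorted by the Fiedler vector."""
        degree = np.diag(adjacency.sum(axis=1))
        laplacian = degree - adjacency
        eigvals, eigvecs = np.linalg.eigh(laplacian)
        fiedler = eigvecs[:, 1]      # eigenvector of the 2nd-smallest eigenvalue
        return np.argsort(fiedler)   # tokens close in the graph become neighbors

    # toy relationship graph over 4 tokens: chain 0 - 2 - 1 - 3
    adj = np.array([[0, 0, 1, 0],
                    [0, 0, 1, 1],
                    [1, 1, 0, 0],
                    [0, 1, 0, 0]], dtype=float)
    perm = spectral_order(adj)
    tokens = np.arange(4)
    print(tokens[perm])  # reordered sequence fed to the positional encoding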
| |||||||||||||||||||||||
▲ bee_rider 2 hours ago
I haven’t read the paper yet, but the graph laplacian is quite useful in reordering matrices, so it isn’t that surprising if they managed to get something out of it in ML. | |||||||||||||||||||||||
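For anyone who hasn't seen that classic use: ordering a sparse symmetric matrix's rows and columns by the Fiedler vector of its adjacency graph tends to pull nonzeros toward the diagonal (envelope/bandwidth reduction). A rough illustration, assuming SciPy and a made-up scrambled banded matrix:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.csgraph import laplacian

    def bandwidth(A):
        rows, cols = A.nonzero()
        return int(np.max(np.abs(rows - cols)))

    # banded matrix whose rows/columns have been randomly permuted
    rng = np.random.default_rng(0)
    n = 200
    band = sp.diags([np.ones(n - k) for k in range(4)], offsets=range(4), format="csr")
    band = band + band.T
    perm0 = rng.permutation(n)
    A = band[perm0][:, perm0]

    # spectral reordering via the graph Laplacian of the sparsity pattern
    L = laplacian(sp.csr_matrix((A != 0).astype(float)))
    eigvals, eigvecs = np.linalg.eigh(L.toarray())  # dense solve is fine at this size
    order = np.argsort(eigvecs[:, 1])               # Fiedler-vector ordering
    A_reordered = A[order][:, order]

    print("bandwidth before:", bandwidth(A), "after:", bandwidth(A_reordered))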
▲ liteclient 6 hours ago
it makes sense architecturally: they replace dot-product attention with topology-based scalar distances derived from a laplacian embedding, which effectively reduces attention scoring to a 1D energy comparison and can save memory and compute.

that said, i'd treat the results with a grain of salt given there's no peer review and the benchmarks are only on a 30M-parameter model so far
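A rough sketch of the contrast being described (not the paper's exact formulation): standard attention scores every query/key pair with a d-dimensional dot product, while the topology-based variant assigns each token a scalar coordinate from some Laplacian embedding and scores pairs by the 1D distance between those coordinates. The scalar coordinates below are a made-up placeholder.

    import numpy as np

    n, d = 6, 16
    rng = np.random.default_rng(1)

    # standard dot-product attention scores: O(n^2 * d) multiply-adds
    Q, K = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    dot_scores = Q @ K.T / np.sqrt(d)

    # topology-based variant: one scalar per token (e.g. a spectral coordinate),
    # scored by the negated 1D distance -- O(n^2) scalar comparisons
    x = rng.normal(size=n)                          # hypothetical 1D embedding
    dist_scores = -np.abs(x[:, None] - x[None, :])

    def softmax(s):
        e = np.exp(s - s.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    print(softmax(dot_scores).shape, softmax(dist_scores).shape)  # both (n, n)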
| |||||||||||||||||||||||
▲ pwndByDeath 5 hours ago
No, it's a new form of alchemy that turns electricity into hype. The technical jargon is more of a thieves' cant that helps conmen identify one another.
| |||||||||||||||||||||||