Remix clone Hacker News


	▲	kannanvijayan 6 hours ago
		I think this is an attempt to enrich the locality model in transformers. One of the weird things you do in transformers is add a position vector, which captures the distance between the token being attended to and some other token. That is obviously not powerful enough to express non-linear relationships, like graph relationships. This person seems to be experimenting with pre-processing the input token set to linearly reorder it by some other heuristic that might map more closely to the actual underlying relationships between tokens.
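For anyone who hasn't seen the "add a position vector" step written out, here is a rough sketch. It assumes the fixed sinusoidal encoding from the original transformer paper (many models use learned or relative schemes instead), and `reorder` with its `key` argument is a hypothetical stand-in for the pre-processing heuristic, not the linked project's actual method.

```python
import math

def sinusoidal_position(pos, d_model):
    """Sinusoidal position vector for one position (d_model assumed even)."""
    vec = []
    for i in range(0, d_model, 2):
        # Each sin/cos pair uses a different wavelength.
        angle = pos / (10000 ** (i / d_model))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec

def add_positions(embeddings):
    """Add a position vector to each token embedding, in sequence order."""
    d_model = len(embeddings[0])
    return [
        [e + p for e, p in zip(emb, sinusoidal_position(pos, d_model))]
        for pos, emb in enumerate(embeddings)
    ]

def reorder(tokens, key):
    """Hypothetical pre-processing: linearly reorder tokens by a heuristic key."""
    return sorted(tokens, key=key)
```

The point of the sketch is that the position vector depends only on the index in the linear sequence, so reordering the tokens before `add_positions` changes which tokens end up looking "close" to each other.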
	▲	adroniser 5 hours ago
		Adding the position vector is basic, sure, but it's naive to think the model doesn't develop its own positional system, bootstrapping on top of the barebones one.