Word2Vec-style vector arithmetic on docs embeddings (technicalwriting.dev)
41 points by surprisetalk 9 days ago | 6 comments
aDyslecticCrow 3 days ago

Previous discussion
Xx_crazy420_xX 2 days ago
This is really interesting! I've experimented with a similar idea, but with time-series forecasting on sentence embeddings: https://github.com/Srakai/embcaster. It turns out you can tokenise arbitrary information into a constant-size vector, which is really useful for later processing. vec2text (https://github.com/vec2text/vec2text) is an excellent tool if you want to reverse the embeddings back into text. This lets you encode arbitrary data into standardised vectors, and all the way back.
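A minimal sketch of that encode-then-invert round trip, assuming the embedding/inversion flow shown in vec2text's README (GTR-base embeddings, since that's what its pretrained "gtr-base" corrector targets; the sample sentence is just a placeholder):

    # pip install vec2text transformers torch
    import torch
    import vec2text
    from transformers import AutoModel, AutoTokenizer

    # GTR-base encoder: the embedding model the pretrained corrector expects
    encoder = AutoModel.from_pretrained("sentence-transformers/gtr-t5-base").encoder
    tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/gtr-t5-base")

    def embed(texts):
        # Mean-pool the encoder's last hidden state over non-padding tokens
        batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            hidden = encoder(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1)
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

    embeddings = embed(["The quick brown fox jumps over the lazy dog."])

    # Invert the embedding back into (approximate) text
    corrector = vec2text.load_pretrained_corrector("gtr-base")
    print(vec2text.invert_embeddings(embeddings=embeddings,
                                     corrector=corrector,
                                     num_steps=20))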
londons_explore 3 days ago
You could probably make a jointly trained decoder that turns a vector back into the document it most closely matches. Would be cool to add together the vectors for Harry Potter and The Lord of the Rings and then decode the sum into a new book about Frodo going to wizard school to collect the ring and help push Voldemort into Mount Doom.
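Short of training that decoder, the arithmetic half is easy to try today by blending two document vectors and retrieving the nearest existing documents (a sketch; the model name and toy corpus are placeholders):

    # pip install sentence-transformers
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

    docs = [
        "A young wizard attends a school of magic.",
        "A hobbit carries a cursed ring across Middle-earth.",
        "A detective solves crimes in Victorian London.",
    ]
    vecs = model.encode(docs, convert_to_tensor=True, normalize_embeddings=True)

    # "Add" two books in embedding space, then see which documents the
    # blend lands nearest to; a trained decoder would generate text instead.
    blend = (vecs[0] + vecs[1]) / 2
    print(util.cos_sim(blend, vecs))  # should score high on both source docs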
antirez 3 days ago
It works with image embeddings too: https://youtu.be/r6TJfGUhv6s?si=_LC0d4Mwyw18c53B
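For the image case, the same arithmetic can be sketched with any joint image-embedding model; CLIP here is just an illustrative choice (the video may use something else), and the filenames are placeholders:

    # pip install torch transformers pillow
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # Placeholder image paths for a king - man + woman style analogy
    images = [Image.open(p) for p in ("king.jpg", "man.jpg", "woman.jpg")]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalise

    # Word2Vec-style arithmetic in image-embedding space
    query = feats[0] - feats[1] + feats[2]
    query = query / query.norm()
    # Rank a gallery of candidate image embeddings against the query (not shown):
    # scores = query @ gallery.T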