asjir 4 days ago

To expand upon the other comment: Indexing and multiplying with one-hot embeddings are equivalent.

If N is the vocab size and L is the sequence length, you'd need to create an NxL matrix and multiply the embedding matrix by it. But since that NxL matrix is sparse, with only a single 1 per column, it makes sense to represent it internally as just one number per column: the index at which the 1 sits. If you then define multiplication by this matrix in terms of that representation, it boils down to indexing the embedding matrix with those numbers.
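A minimal numpy sketch of that equivalence (using the transposed LxN layout, so each row holds one one-hot vector; shapes and names here are illustrative, not from the comment):

```python
import numpy as np

N, L, D = 5, 3, 4  # vocab size, sequence length, embedding dim
rng = np.random.default_rng(0)
E = rng.standard_normal((N, D))   # embedding matrix
ids = np.array([2, 0, 4])         # one token index per position

# One-hot route: an L x N matrix with a single 1 per row, then a matmul
one_hot = np.zeros((L, N))
one_hot[np.arange(L), ids] = 1.0
via_matmul = one_hot @ E

# Indexing route: just gather the selected rows
via_index = E[ids]

assert np.allclose(via_matmul, via_index)
```

The matmul does N multiply-adds per output element, almost all against zeros; the gather skips straight to the one nonzero term.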

And just as you write a special forward pass, you can write a special backward pass, so that backpropagation still reaches the embedding matrix.
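A sketch of what that backward pass can look like (the function names are made up for illustration): the gradient with respect to the embedding matrix is a scatter-add of each position's upstream gradient into the row it selected, which matches the dense formula `one_hot.T @ grad_out`.

```python
import numpy as np

def embed_forward(E, ids):
    # Special forward pass: indexing instead of a one-hot matmul
    return E[ids]

def embed_backward(grad_out, ids, vocab_size):
    # Special backward pass: scatter-add each position's upstream
    # gradient into the embedding row it selected. np.add.at handles
    # repeated indices correctly (their gradients accumulate).
    grad_E = np.zeros((vocab_size, grad_out.shape[-1]))
    np.add.at(grad_E, ids, grad_out)
    return grad_E

N, L, D = 5, 3, 4
rng = np.random.default_rng(1)
E = rng.standard_normal((N, D))
ids = np.array([2, 0, 2])         # token 2 appears twice on purpose
g = rng.standard_normal((L, D))   # upstream gradient

# Check against the dense one-hot formulation
one_hot = np.zeros((L, N))
one_hot[np.arange(L), ids] = 1.0
assert np.allclose(embed_backward(g, ids, N), one_hot.T @ g)
```

The repeated index in `ids` is the subtle case: both occurrences contribute gradient to the same row, which the scatter-add accumulates just like the dense matmul would.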