Remix clone Hacker News

new | show | ask | jobs Github

	▲	yorwba 7 days ago
		The author is using SBERT embeddings, not an instruction-following model, so the "ignore all previous instructions" trick isn't going to work, unless you want to outrank https://en.wikipedia.org/wiki/Ignore_all_rules when people search for what to do after ignoring all previous instructions. Of course a spammer could try to include one sentence with a very close embedding for each query they want to rank for, but this would require combinatorially more effort than keyword stuffing where including two keywords also covers queries including both together.
	▲	zipy124 6 days ago \| parent [-]
		Yes I'm aware they are using embeddings primarily, however (source: "I've added LLM-based reranking and filtering, which those two final sliders represent") they are using LLM's for reranking and filtering, which are vulnerable to the attack I describe. The latter point you pick up on was indeed my point, that you can tweak your SEO spam to give you the embeddings you want to rank for. This actually isn't that difficult given you can run embedding models like SBERT in reverse adversairly to generate text that gives you the best embedding that you want to target (similar to adversarial attacks in image models where you can make a picture of the most zebra like zebra, see the work of Ilia Shumailov former oxford now google deepmind). This is rather cheap and more importantly far far easier to game that ranking high on google where the cost function is unknown. If using an off the shelf embedding like SBERT then the attacker here has the cost function known, and can optimise for it.