Basically, re-ranking is similar to how RAG works for normal LLM tasks: you compress the fragment down into an embedding vector that you use for a coarse search against a vector database, and then you use a more powerful model with the task/query/prompt in context to digest the raw fragment(s) and produce either query-aware scoring/sorting (re-ranker) or a text output/response (LLM with RAG).
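To make that concrete, here's a minimal sketch of the coarse-retrieval-then-re-rank pattern (nothing to do with LeanDojo's actual code; the sentence-transformers model names, the toy corpus, and the query are all just placeholders I picked):

  # Stage 1: cheap bi-encoder + vector similarity for coarse retrieval.
  # Stage 2: heavier cross-encoder that sees query and fragment together.
  from sentence_transformers import SentenceTransformer, CrossEncoder, util

  corpus = [
      "theorem add_comm : a + b = b + a",
      "theorem mul_comm : a * b = b * a",
      "lemma zero_add : 0 + a = a",
  ]
  query = "commutativity of addition"

  bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
  corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
  query_emb = bi_encoder.encode(query, convert_to_tensor=True)
  top_k = util.cos_sim(query_emb, corpus_emb)[0].topk(k=2)

  cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
  candidates = [corpus[i] for i in top_k.indices.tolist()]
  scores = cross_encoder.predict([(query, c) for c in candidates])
  reranked = sorted(zip(candidates, scores), key=lambda x: -x[1])

The point is that the bi-encoder never sees query and fragment together, while the cross-encoder does, which is what makes the second stage query-aware (and expensive).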
For example, LeanDojo[0], an LLM-based proof-search automation helper (or rather, specifically its LeanCopilot interface), had two modes[1] of operation:
One where it was fed the state of the interactive proof search [akin to what a human would be shown in interactive mode, facilitated by hooking into the existing breadth-first brute-force exploration machinery, which used hard-coded rules for which tactics to (try to) apply to which patterns; this is basically semantic/AST-with-types "regex"-like search-and-replace, except that it searches a tree/DAG of "proof states" ("multiverse/branching") instead of performing destructive replacement], with the model fine-tuned to output a command step/"substitution" the way a human (or rather, the fixed pattern matching) would, using beam search decoding (IIRC instead of setting a "temperature") to generate several candidates, along with native probability scores for each candidate obtained by simply multiplying the token probabilities along the generation.
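The beam-search-with-probabilities part looks roughly like this in HuggingFace terms (a sketch under my own assumptions, not their code; I'm using the plain pre-trained google/byt5-small as a stand-in for their fine-tuned tactic generator, and the proof-state string is made up):

  import torch
  from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

  tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
  model = AutoModelForSeq2SeqLM.from_pretrained("google/byt5-small")

  proof_state = "n : Nat\n⊢ n + 0 = n"  # placeholder proof-state serialization
  inputs = tokenizer(proof_state, return_tensors="pt")

  out = model.generate(
      **inputs,
      max_new_tokens=64,
      num_beams=4,               # beam search instead of temperature sampling
      num_return_sequences=4,    # keep several candidate "tactics"
      length_penalty=0.0,        # beam score is then just the sum of log-probs
      output_scores=True,
      return_dict_in_generate=True,
  )

  for seq, score in zip(out.sequences, out.sequences_scores):
      # exp(sum of token log-probs) == product of the token probabilities
      print(f"p={torch.exp(score).item():.4f}  {tokenizer.decode(seq, skip_special_tokens=True)}")

Beam search gives you the N most likely whole sequences rather than N random samples, which is what you want when the surrounding proof search is going to rank and expand candidates by those probabilities.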
The other was to do vector-database RAG, using a specifically fine-tuned embedding model (fine-tuned via contrastive learning on digested official library data about which premises actually get used in which contexts, contrasted against which do not get used in those contexts) to index what they call "premises" (basically theorem definitions, i.e. the full statement but without the proof of its truth, plus other theorem-like things) into a vector database,
from which they then took as many of the top hits (but without re-ranking first!) as would fit into the prompt context
[they used Google's ByT5 as the base for everything, a fairly small (IIRC about 0.3B, i.e. 300M parameters) encoder-decoder LLM (almost; they extended the input vocabulary with some helper/special tokens) that works without tokenization (directly on UTF-8 bytes) and is pre-trained for translation, reasoning, cloze (fill-in-the-blank; IIRC even multiple blanks in the same prompt, which it references via some of the special tokens), etc.]
after first putting in the current proof state (roughly as in the sketch below).
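Something in this spirit, I assume (again just a sketch: embed() is a dummy stand-in for their contrastively fine-tuned premise/state encoder, the premises are made up, and I'm packing by character count rather than tokens):

  import numpy as np

  def embed(text: str) -> np.ndarray:
      # Placeholder for the fine-tuned premise/state encoder.
      rng = np.random.default_rng(abs(hash(text)) % 2**32)
      v = rng.standard_normal(256)
      return v / np.linalg.norm(v)

  premises = [
      "theorem add_comm (a b : Nat) : a + b = b + a",
      "theorem add_zero (a : Nat) : a + 0 = a",
      "theorem mul_comm (a b : Nat) : a * b = b * a",
  ]
  index = np.stack([embed(p) for p in premises])   # the "vector DB"

  state = "n : Nat\n⊢ n + 0 = n"
  scores = index @ embed(state)                    # cosine similarity (unit vectors)
  ranked = [premises[i] for i in np.argsort(-scores)]

  # State goes in first, then top premises (no re-ranking) until the budget runs out.
  budget = 2048
  prompt = state
  for p in ranked:
      if len(prompt) + len(p) + 2 > budget:
          break
      prompt += "\n\n" + p

The contrast with the re-ranking setup at the top is exactly that missing second stage: whatever the bi-encoder ranks highest goes straight into the prompt.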
Last I checked, they sadly didn't have working on-the-fly vector-DB population with the current project's premises, but while looking at the front page to link it here I noticed they appear to have made some (possibly substantial) improvements since then.
[0]: https://leandojo.org/
[1]: https://github.com/lean-dojo/ReProver?tab=readme-ov-file#usi...