Basically, re-ranking is similar to how RAG works for normal LLM tasks: you compress the fragment down into an embedding vector that you use for a coarse search against a vector database, and then you use a more powerful model with the task/query/prompt in context to digest the raw fragment(s) and produce either query-aware scoring/sorting (re-ranker) or a text output/response (LLM with RAG).
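To make that concrete, here's a minimal sketch of the coarse-retrieval-then-re-rank pattern (nothing to do with LeanDojo's actual code; the sentence-transformers model names, the toy corpus, and the query are all just placeholders I picked):

  # Stage 1: cheap bi-encoder + vector similarity for coarse retrieval.
  # Stage 2: heavier cross-encoder that sees query and fragment together.
  from sentence_transformers import SentenceTransformer, CrossEncoder, util

  corpus = [
      "theorem add_comm : a + b = b + a",
      "theorem mul_comm : a * b = b * a",
      "lemma zero_add : 0 + a = a",
  ]
  query = "commutativity of addition"

  bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
  corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
  query_emb = bi_encoder.encode(query, convert_to_tensor=True)
  top_k = util.cos_sim(query_emb, corpus_emb)[0].topk(k=2)

  cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
  candidates = [corpus[i] for i in top_k.indices.tolist()]
  scores = cross_encoder.predict([(query, c) for c in candidates])
  reranked = sorted(zip(candidates, scores), key=lambda x: -x[1])

The point is that the bi-encoder never sees query and fragment together, while the cross-encoder does, which is what makes the second stage query-aware (and expensive).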
For example, LeanDojo[0], an LLM-based proof-search automation helper (or rather, specifically its LeanCopilot interface), had two modes[1] of operation:
One where it was fed the state of the interactive proof search [akin to what a human would be shown in interactive mode, facilitated by hooking into the existing breadth-first brute-force exploration machinery, which used hard-coded rules for which tactics to (try to) apply to which patterns; this is basically semantic/AST-with-types "regex"-like search-and-replace, except that it searches a tree/DAG of "proof states" ("multiverse/branching") instead of performing destructive replacement], with the model fine-tuned to output a command step/"substitution" the way a human (or rather, the fixed pattern matching) would, using beam search decoding (IIRC instead of setting a "temperature") to generate several candidates, along with native probability scores for each candidate obtained by simply multiplying the token probabilities along the generation.
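The beam-search-with-probabilities part looks roughly like this in HuggingFace terms (a sketch under my own assumptions, not their code; I'm using the plain pre-trained google/byt5-small as a stand-in for their fine-tuned tactic generator, and the proof-state string is made up):

  import torch
  from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

  tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
  model = AutoModelForSeq2SeqLM.from_pretrained("google/byt5-small")

  proof_state = "n : Nat\n⊢ n + 0 = n"  # placeholder proof-state serialization
  inputs = tokenizer(proof_state, return_tensors="pt")

  out = model.generate(
      **inputs,
      max_new_tokens=64,
      num_beams=4,               # beam search instead of temperature sampling
      num_return_sequences=4,    # keep several candidate "tactics"
      length_penalty=0.0,        # beam score is then just the sum of log-probs
      output_scores=True,
      return_dict_in_generate=True,
  )

  for seq, score in zip(out.sequences, out.sequences_scores):
      # exp(sum of token log-probs) == product of the token probabilities
      print(f"p={torch.exp(score).item():.4f}  {tokenizer.decode(seq, skip_special_tokens=True)}")

Beam search gives you the N most likely whole sequences rather than N random samples, which is what you want when the surrounding proof search is going to rank and expand candidates by those probabilities.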
The other was to do vector-database RAG, using a specifically fine-tuned embedding model (fine-tuned via contrastive learning on digested official library data about which premises actually get used in which contexts, contrasted against which do not get used in those contexts) to index what they call "premises" (basically theorem definitions, i.e. the full statement but without the proof of its truth, plus other theorem-like things) into a vector database,
from which they then took as many of the top hits (but without re-ranking first!) as would fit into the prompt context
[they used Google's ByT5 as the base for everything, a fairly small (IIRC about 0.3B, i.e. 300M parameters) encoder-decoder LLM (almost; they extended the input vocabulary with some helper/special tokens) that works without tokenization (directly on UTF-8 bytes) and is pre-trained for translation, reasoning, cloze (fill-in-the-blank; IIRC even multiple blanks in the same prompt, which it references via some of the special tokens), etc.]
after first putting in the current proof state (roughly as in the sketch below).
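Something in this spirit, I assume (again just a sketch: embed() is a dummy stand-in for their contrastively fine-tuned premise/state encoder, the premises are made up, and I'm packing by character count rather than tokens):

  import numpy as np

  def embed(text: str) -> np.ndarray:
      # Placeholder for the fine-tuned premise/state encoder.
      rng = np.random.default_rng(abs(hash(text)) % 2**32)
      v = rng.standard_normal(256)
      return v / np.linalg.norm(v)

  premises = [
      "theorem add_comm (a b : Nat) : a + b = b + a",
      "theorem add_zero (a : Nat) : a + 0 = a",
      "theorem mul_comm (a b : Nat) : a * b = b * a",
  ]
  index = np.stack([embed(p) for p in premises])   # the "vector DB"

  state = "n : Nat\n⊢ n + 0 = n"
  scores = index @ embed(state)                    # cosine similarity (unit vectors)
  ranked = [premises[i] for i in np.argsort(-scores)]

  # State goes in first, then top premises (no re-ranking) until the budget runs out.
  budget = 2048
  prompt = state
  for p in ranked:
      if len(prompt) + len(p) + 2 > budget:
          break
      prompt += "\n\n" + p

The contrast with the re-ranking setup at the top is exactly that missing second stage: whatever the bi-encoder ranks highest goes straight into the prompt.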
Last I checked, they sadly didn't have working on-the-fly vector-DB population with the current project's premises, but while looking at the front page to link it here I noticed they appear to have made some (possibly substantial) improvements since then.
[0]: https://leandojo.org/
[1]: https://github.com/lean-dojo/ReProver?tab=readme-ov-file#usi...