Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

llm
research paper
Author

Santosh Sawant

Published

September 18, 2024

Modern information retrieval (IR) models generally match queries to passages based on a single semantic similarity score. This can make the search experience confusing for users, who often have to experiment with keywords and filters and refine their searches multiple times to find the exact information they need.

To overcome this, researchers have proposed Promptriever, a retrieval model that can follow complex per-query instructions, including detailed relevance definitions, and that responds to zero-shot prompting techniques which act as a form of zero-shot hyperparameter optimization, much like prompting an LM. Promptriever is a bi-encoder retriever with a LLaMA-2 7B backbone, trained on a synthetic dataset of roughly 500K query-passage relevance pairs augmented with instance-level instructions.
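To make the bi-encoder idea concrete, here is a minimal sketch of scoring a passage against a query with and without an appended instruction. It uses a small Hugging Face encoder with mean pooling purely for illustration; the backbone, pooling choice, and example texts are assumptions, not the exact Promptriever setup described in the paper.

```python
# Sketch: bi-encoder scoring where the instruction is appended to the query.
# The model name and pooling strategy are illustrative placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder backbone, not LLaMA-2 7B
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(text: str) -> torch.Tensor:
    """Encode text into a single normalized vector via mean pooling."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state           # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)             # (1, seq_len, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # mean over real tokens
    return F.normalize(pooled, dim=-1)

query = "treatments for seasonal allergies"
instruction = "Only passages that discuss non-drug interventions are relevant."
passage = "Saline nasal rinses and HEPA filters can reduce allergy symptoms."

# Both sides are encoded independently; relevance is the dot product of the
# normalized embeddings, as in a standard bi-encoder.
score_plain = (embed(query) @ embed(passage).T).item()
score_instructed = (embed(f"{query} {instruction}") @ embed(passage).T).item()
print(score_plain, score_instructed)
```

The key point is that the instruction is just more text on the query side: changing it at query time changes the embedding, and therefore the ranking, without retraining the model.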

To generate instruction-based retrieval data for Promptriever training, the researchers took each initial query and its relevant passage and prompted an LM to generate an instruction matching that query. The LM then generated example relevant and non-relevant passages for that query and instruction, yielding two kinds of training pairs: instruction-positive pairs, where the (query, passage) pair satisfies the extra requirement in the instruction, and instruction-negative pairs, where the (query, passage) pair is highly relevant in isolation but a carefully constructed instruction significantly decreases that relevance. Further, multiple types of instructions, varying in both length and style, are generated for training-set diversity.
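The following sketch shows the shape of that generation loop. The `generate` callable stands in for any instruction-following LM (an API or a local model), and the prompt wording and helper names are illustrative assumptions rather than the paper's exact prompts.

```python
# Sketch of building one instruction-augmented training example.
# `generate` is a hypothetical hook for an LM call; prompts are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TrainingExample:
    query: str
    instruction: str
    positive_passage: str  # relevant under query + instruction
    negative_passage: str  # relevant to the query alone, but excluded by the instruction

def build_example(query: str, relevant_passage: str,
                  generate: Callable[[str], str]) -> TrainingExample:
    # Step 1: generate an instruction that the known-relevant passage satisfies.
    instruction = generate(
        f"Query: {query}\nPassage: {relevant_passage}\n"
        "Write an instruction that adds an extra relevance requirement "
        "which this passage satisfies."
    )
    # Step 2: generate an instruction-negative passage, i.e. one that answers
    # the query in isolation but violates the instruction.
    negative = generate(
        f"Query: {query}\nInstruction: {instruction}\n"
        "Write a passage that is relevant to the query on its own "
        "but does NOT satisfy the instruction."
    )
    return TrainingExample(query, instruction, relevant_passage, negative)
```

Repeating this with instruction prompts of different lengths and styles would give the kind of diverse instance-level instruction data the paper describes.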

In evaluation, Promptriever not only achieves strong performance on standard retrieval tasks but also follows instructions. Large gains (reaching SoTA) were observed on detailed relevance instructions (+14.3 p-MRR / +3.1 nDCG on FollowIR), along with significantly increased robustness to lexical choices and phrasing in the query plus instruction (+12.9 Robustness@10 on InstructIR). Overall, Promptriever demonstrates that retrieval models can be controlled with prompts on a per-query basis, setting the stage for future work aligning LLM prompting techniques with information retrieval.

Paper : https://arxiv.org/pdf/2409.11136