
Familiarity-aware Evidence Compression for Retrieval Augmented Generation

We propose FaviComp (Familiarity-aware Evidence Compression; Findings of EMNLP 2025), a training-free evidence compression technique that makes retrieved evidence more familiar to the target model while seamlessly integrating the model's parametric knowledge.
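FaviComp generates the compressed evidence by ensembling the compression model's and the target model's next-token distributions, weighted by the coefficient alpha. The sketch below illustrates that decoding loop under stated assumptions: greedy decoding, a shared tokenizer (a requirement, see below), and hypothetical inputs `comp_prompt_ids` (the compression instruction plus retrieved documents) and `tgt_prompt_ids` (the target-side context). The exact prompts and the precise role of alpha live in `main.py`, which is the authoritative implementation.

```python
# Minimal sketch of FaviComp-style token-level ensemble decoding.
# comp_prompt_ids / tgt_prompt_ids are hypothetical; see main.py for
# the actual prompts and the exact convention for alpha.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

comp_name = "meta-llama/Llama-3.2-3B-Instruct"
tgt_name = "meta-llama/Meta-Llama-3-8B-Instruct"

tok = AutoTokenizer.from_pretrained(comp_name)  # shared tokenizer
comp = AutoModelForCausalLM.from_pretrained(comp_name, torch_dtype=torch.bfloat16)
tgt = AutoModelForCausalLM.from_pretrained(tgt_name, torch_dtype=torch.bfloat16)

@torch.no_grad()
def favicomp_decode(comp_prompt_ids, tgt_prompt_ids, alpha=0.5, max_new_tokens=256):
    """Greedy decoding over an alpha-weighted ensemble of the two
    models' next-token log-probabilities."""
    generated = []
    for _ in range(max_new_tokens):
        comp_logp = comp(comp_prompt_ids).logits[:, -1, :].log_softmax(-1)
        tgt_logp = tgt(tgt_prompt_ids).logits[:, -1, :].log_softmax(-1)
        # Ensemble: log p(y) = alpha * log p_comp(y) + (1 - alpha) * log p_tgt(y)
        next_id = (alpha * comp_logp + (1 - alpha) * tgt_logp).argmax(-1, keepdim=True)
        if next_id.item() == tok.eos_token_id:
            break
        generated.append(next_id.item())
        # Extend BOTH contexts with the chosen token before the next step.
        comp_prompt_ids = torch.cat([comp_prompt_ids, next_id], dim=-1)
        tgt_prompt_ids = torch.cat([tgt_prompt_ids, next_id], dim=-1)
    return tok.decode(generated)
```

Keeping the target model in the ensemble biases the compression toward tokens that model already finds likely, which is what makes the compressed evidence "familiar" and lets the target model's parametric knowledge flow into the compression.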

Installation

```
conda create -n favicomp python=3.12
conda activate favicomp
pip install -r requirements.txt
```

Data

Data can be downloaded from this link. Place data/ under the root directory.

Run FaviComp

Example script for the NQ dataset. Make sure the compression model and the target model use the same tokenizer. Change the parameters to run on other datasets and models.

```
python main.py \
--model_name meta-llama/Llama-3.2-3B-Instruct \
--target_model_name meta-llama/Meta-Llama-3-8B-Instruct \
--alpha 0.5 \
--batch_size 28 \
--dataset nq
```
  • model_name: Compression model name (e.g. meta-llama/Llama-3.2-3B-Instruct)
  • target_model_name: Target model name (e.g. meta-llama/Meta-Llama-3-8B-Instruct)
  • alpha: Ensemble coefficient alpha, weighting the compression model's distribution against the target model's during decoding
  • batch_size: Batch size used during compression
  • dataset: Dataset ('nq', 'tqa', 'hotpotqa', 'wiki', 'musique')

Evaluation

After running FaviComp, run the performance evaluation script below using the same parameters.

```
python evaluate.py \
--model_name meta-llama/Llama-3.2-3B-Instruct \
--target_model_name meta-llama/Meta-Llama-3-8B-Instruct \
--alpha 0.5 \
--dataset nq
```
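All five datasets are open-domain QA benchmarks, so the evaluation presumably scores the target model's answers with standard QA metrics such as exact match. Below is a minimal sketch of SQuAD-style exact match for orientation only; `evaluate.py` defines what the repo actually reports.

```python
# SQuAD-style exact match, shown only to illustrate the metric;
# evaluate.py is the authoritative implementation.
import re
import string

def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction: str, gold_answers: list[str]) -> float:
    return float(any(normalize_answer(prediction) == normalize_answer(g)
                     for g in gold_answers))
```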

Calculate perplexity of the compressed evidence using the script below.

```
python eval_ppl.py \
--model_name meta-llama/Llama-3.2-3B-Instruct \
--target_model_name meta-llama/Meta-Llama-3-8B-Instruct \
--alpha 0.5 \
--dataset nq
```
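Perplexity here serves as the familiarity proxy: lower perplexity of the compressed evidence under the target model means the evidence is more familiar to it. A minimal sketch of that computation, assuming perplexity is the exponentiated mean next-token negative log-likelihood; `eval_ppl.py` is the authoritative implementation.

```python
# Perplexity of a compressed-evidence string under the target model:
# exp(mean negative log-likelihood per token).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Meta-Llama-3-8B-Instruct"  # target model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

@torch.no_grad()
def perplexity(evidence: str) -> float:
    ids = tok(evidence, return_tensors="pt").input_ids
    # Passing labels=ids makes the model return the mean
    # cross-entropy loss over next-token predictions.
    loss = model(input_ids=ids, labels=ids).loss
    return torch.exp(loss).item()
```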

Citation

```
@article{jung2024familiarity,
  title={Familiarity-aware evidence compression for retrieval augmented generation},
  author={Jung, Dongwon and Liu, Qin and Huang, Tenghao and Zhou, Ben and Chen, Muhao},
  journal={arXiv preprint arXiv:2409.12468},
  year={2024}
}
```