This is the repository for the paper: Knowing You Don’t Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing (SIGIR '25). It provides a framework to run SIM-RAG experiments and evaluate models. Follow the steps below to set up and run your experiments.
- Clone the repository to your local machine:

  ```bash
  git clone https://github.com/your/repository.git
  cd repository
  ```
- Ensure you have all the required dependencies installed (refer to `requirements.txt` or the installation instructions in the repo).
- If you're using GPT, make sure to set your API key in the environment. You can do this by adding the following line to your `.bashrc`, `.zshrc`, or equivalent shell configuration file:

  ```bash
  export OPENAI_API_KEY="your-api-key-here"
  ```

  Then, run:

  ```bash
  source ~/.bashrc  # or `source ~/.zshrc` for Zsh users
  ```
- Likewise, if you're using Llama, make sure to set the local path to Llama in the environment:

  ```bash
  export LLAMA_PATH="/path/to/your/llama"
  ```

  Then, run:

  ```bash
  source ~/.bashrc  # or `source ~/.zshrc` for Zsh users
  ```
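  As a quick sanity check (not part of the repo), you can confirm from Python that these variables are visible before launching an experiment; the scripts presumably read them via `os.environ`:

  ```python
  # Sanity check: confirm the environment variables set above are
  # visible to Python. Purely illustrative; not part of the repo.
  import os

  for var in ("OPENAI_API_KEY", "LLAMA_PATH"):
      print(f"{var}: {'set' if os.environ.get(var) else 'NOT set'}")
  ```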
- Download our prebuilt corpus files `corpus.pkl`, `wiki_corpus.pkl`, `retriever_settings.pkl`, and `wiki_retriever_settings.pkl` into the `bm25_search` directory for retrieval:
  ```bash
  git clone https://huggingface.co/datasets/dyang39/SIM-RAG-Corpus bm25_search
  ```
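  If the cloned `.pkl` files are only a few bytes, they are likely Git LFS pointers; install `git-lfs` and re-clone. The snippet below (not from the repo) verifies the download, assuming the files are standard Python pickles:

  ```python
  # Verify the corpus pickles load; illustrative only. Pickles that
  # contain custom classes may need the repo's modules importable.
  import pickle
  from pathlib import Path

  for name in ("corpus.pkl", "wiki_corpus.pkl",
               "retriever_settings.pkl", "wiki_retriever_settings.pkl"):
      with (Path("bm25_search") / name).open("rb") as f:
          obj = pickle.load(f)
      print(f"{name}: loaded {type(obj).__name__}")
  ```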
- (Optional) Prepare the original datasets. The datasets have already been prepared and are ready to use. However, if you'd like to prepare them yourself, place the 2WikiMultihopQA dataset (downloaded from its GitHub repository) in the `data` directory; the scripts load HotpotQA and TriviaQA directly from Hugging Face. Once ready, run the following scripts to process the datasets:
  ```bash
  python data/prepare_2wikimultihopqa.py
  python data/prepare_triviaqa.py
  python data/prepare_hotpotqa.py
  ```
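  For reference, pulling these datasets from Hugging Face typically looks like the sketch below; the config and split names are assumptions, and the repo's prepare scripts may load them differently:

  ```python
  # Illustrative only: typical Hugging Face loading for HotpotQA and
  # TriviaQA. Some `datasets` versions may need trust_remote_code=True.
  from datasets import load_dataset

  hotpot = load_dataset("hotpot_qa", "distractor", split="validation")
  trivia = load_dataset("trivia_qa", "rc", split="validation")
  print(hotpot[0]["question"])
  print(trivia[0]["question"])
  ```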
To run the SIM-RAG experiment, you'll first need to create a custom script using `run_SIM-RAG.py`. This script will guide you through entering parameters and generating an executable `.sh` file for your experiment.

- Run `run_SIM-RAG.py`:

  ```bash
  python run_SIM-RAG.py
  ```
- Follow the prompts to enter details for the SIM-RAG experiment.
- After you have entered all details, the script will generate a `.sh` file in the `bash_scripts` directory, which can be used to run the SIM-RAG experiment.
- Change the permissions of the generated `.sh` file to make it executable:

  ```bash
  chmod +x bash_scripts/{script_filename}
  ```
- Run the generated `.sh` file to start the experiment:

  ```bash
  ./bash_scripts/{script_filename}
  ```
Once the SIM-RAG experiment is complete, you can evaluate the predictions using `evaluate_SIM-RAG.py`.
- The predictions for each dataset are saved in the `predictions/` directory in the format `{name}_predictions.csv`.
- To evaluate the predictions, run `evaluate_SIM-RAG.py` with the experiment name:

  ```bash
  python evaluate_SIM-RAG.py --experiment_name {name}
  ```
- The script will output the EM and F1 scores of the predictions. Fine-grained, intermediate evaluation data and statistics can also be found in `logs/{name}_log.txt`.
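  For reference, EM (exact match) and F1 here are the standard SQuAD-style answer metrics; below is a minimal sketch of how they are conventionally computed (the repo's implementation may differ in normalization details):

  ```python
  # Standard SQuAD-style EM/F1 sketch; illustrative, not the repo's code.
  import re
  import string
  from collections import Counter

  def normalize(text: str) -> str:
      text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
      text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop articles
      return " ".join(text.split())

  def exact_match(pred: str, gold: str) -> float:
      return float(normalize(pred) == normalize(gold))

  def f1(pred: str, gold: str) -> float:
      p, g = normalize(pred).split(), normalize(gold).split()
      overlap = sum((Counter(p) & Counter(g)).values())
      if overlap == 0:
          return 0.0
      precision, recall = overlap / len(p), overlap / len(g)
      return 2 * precision * recall / (precision + recall)

  print(exact_match("The Eiffel Tower", "Eiffel Tower"))              # 1.0
  print(round(f1("Eiffel Tower, Paris", "the Eiffel Tower"), 2))      # 0.8
  ```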
To run in the background (e.g., on a remote server), modify the `.sh` file to run each command with `nohup`. If you have downloaded checkpoints for an already trained Critic, modify the `.sh` file to run only the last line (inference), passing the path to the Critic via `--dm_path`. Make sure the tokenizer is in the same directory.
We provide a general-purpose Critic:

- SIM-RAG-Llama3-2B: This Flan-T5-based Critic is fine-tuned on six datasets, including TriviaQA, HotPotQA, 2WikiMultiHopQA, PopQA, and Musique. It can be used directly as a general-purpose Critic in our inference pipeline; a loading sketch follows below.
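Here is a rough sketch of loading such a checkpoint for inference, assuming a Flan-T5-style (seq2seq) Critic as described above; the directory, prompt format, and output labels are hypothetical, not the repo's actual interface (see the inference code and `--dm_path` above):

```python
# Hypothetical sketch: load a local seq2seq Critic checkpoint (the
# directory passed via --dm_path; the tokenizer must live there too)
# and ask whether the current answer is sufficient. Prompt format
# and labels are assumptions; see the repo for the real interface.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

critic_dir = "/path/to/critic"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(critic_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(critic_dir)

prompt = "Question: ...\nAnswer so far: ...\nIs this answer sufficient?"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0], skip_special_tokens=True))  # e.g. "yes"/"no"
```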
To reproduce the results in the paper, we provide the following checkpoints:
If you find this work useful, please kindly cite:

```bibtex
@article{yang2025rag,
  title={Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing},
  author={Yang, Diji and Zeng, Linda and Rao, Jinmeng and Zhang, Yi},
  journal={arXiv preprint arXiv:2505.02811},
  year={2025}
}
```