archv is a Python package for retrieving and processing news articles and applying Natural Language Processing (NLP) to them. It includes modules for extracting news information, generating embeddings, and a news article recommendation system backed by Redis Vector Similarity Search (VSS).
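The following minimal pure-Python sketch shows the idea behind an embedding-based recommender. It is not archv's API. The function names `cosine_similarity` and `recommend` and the toy vectors are hypothetical. The sketch assumes that articles are ranked by cosine similarity between their embeddings. A Redis VSS backend runs the same kind of nearest-neighbour search on the server over an index, rather than with the brute-force loop used here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(
    query_id: str, embeddings: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k articles most similar to query_id, excluding itself."""
    query = embeddings[query_id]
    scored = [
        (article_id, cosine_similarity(query, vec))
        for article_id, vec in embeddings.items()
        if article_id != query_id
    ]
    # Highest similarity first; a vector index would replace this full scan.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (made-up values for illustration only).
articles = {
    "a1": [1.0, 0.0, 0.0],
    "a2": [0.9, 0.1, 0.0],
    "a3": [0.0, 1.0, 0.0],
    "a4": [0.0, 0.0, 1.0],
}

print(recommend("a1", articles, k=2))
```

In this toy data, "a2" points in almost the same direction as "a1", so it is returned as the top recommendation. The embedding model, distance metric, and index type that archv actually uses are not specified here.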
Try out archv on Google Colab.
- Install the package via pip:
  pip install git+https://github.com/rdsilva01/archv.git
Contributions are welcome! Please fork the repository and submit a pull request with your changes. Make sure to write tests and update documentation where applicable.
This project is licensed under the MIT License. See the LICENSE file for more details.
If you use archv in your research, please cite:
@inproceedings{silva2025rebuilding,
author = {Rodrigo Silva and Ricardo Campos},
title = {Rebuilding the Past: Reconstructing Portuguese News Outlets with Web Archives},
booktitle = {Advances in Information Retrieval (ECIR)},
year = {2025},
doi = {10.1007/978-3-031-88720-8_15}
}