juanis2112/astra-artifacts

Google Scholar Literature Scraper

This library is designed to scrape, store, and process bibliographic data from Google Scholar. It consists of two main components:

  • data_scraper.py: Scrapes academic data and saves each entry as a pickle file.
  • data_handler.py: Reads and processes the stored pickle files, extracting relevant metadata and generating structured outputs.
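The division of labor between the two components can be illustrated with a minimal sketch of a one-pickle-file-per-entry round trip. This is not the repository's actual code: the function names save_entries and load_entries, the entry_NNNNN.pkl naming scheme, and the fields of the sample records are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def save_entries(entries, out_dir):
    """Write each entry to its own pickle file (hypothetical naming scheme)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for index, entry in enumerate(entries):
        with open(out_dir / f"entry_{index:05d}.pkl", "wb") as handle:
            pickle.dump(entry, handle)


def load_entries(in_dir):
    """Read every pickle file in the directory back into a list, in name order."""
    entries = []
    for path in sorted(Path(in_dir).glob("*.pkl")):
        with open(path, "rb") as handle:
            entries.append(pickle.load(handle))
    return entries


# Placeholder records, not real Google Scholar results.
sample_entries = [
    {"title": "Example Paper A", "author": ["A. Author"], "pub_year": "2020"},
    {"title": "Example Paper B", "author": ["B. Author"], "pub_year": "2021"},
]

# Use a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    save_entries(sample_entries, tmp_dir)
    restored_entries = load_entries(tmp_dir)

print([entry["title"] for entry in restored_entries])
```

Storing one file per entry means an interrupted scrape keeps everything collected so far. Note that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.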

Installation

Ensure Python 3 is installed, then install the scholarly package (for example, with pip install scholarly).

Usage

  1. Modify config.json to add the desired queries before running the scraper. This file should contain the search terms or parameters you want to use when collecting data.

  2. Run the data_scraper.py script to collect academic data and store it in pickle files.

python data_scraper.py

  3. Run the data_handler.py script to read and process the stored entries.

python data_handler.py
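The schema of config.json is not documented here, so the following is only a sketch of how a query list might be read before scraping. It assumes a hypothetical top-level "queries" key holding a list of strings; the helper name load_queries and the example search terms are likewise illustrative. Check the config.json shipped with the repository for the real layout.

```python
import json
import tempfile
from pathlib import Path


def load_queries(config_path):
    """Return the search queries from a JSON config file.

    Assumes a hypothetical top-level "queries" list; the real schema may differ.
    """
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    queries = config.get("queries", [])
    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        raise ValueError('"queries" must be a list of strings')
    # Strip surrounding whitespace and drop blank entries.
    return [q.strip() for q in queries if q.strip()]


# Write an example config to a temporary location and read it back.
example_config = {"queries": ["graph neural networks", "  ", "protein folding"]}
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.json"
    config_path.write_text(json.dumps(example_config), encoding="utf-8")
    loaded_queries = load_queries(config_path)

print(loaded_queries)
```

Validating the config up front surfaces a malformed query list before any requests are sent to Google Scholar.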

Running data_handler.py generates structured outputs in multiple formats:

  • JSON (scholar_results.json)
  • CSV (scholar_results.csv)
  • BibTeX (scholar_results.bib)
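As a rough illustration of how one set of records can be written to all three formats, here is a sketch that uses only the standard library. The output file names come from the list above; everything else is an assumption: the record fields (title, author, year), the write_outputs and to_bibtex helpers, the @article entry type, and the entryN citation keys. The actual data_handler.py may extract different metadata and format it differently.

```python
import csv
import json
import tempfile
from pathlib import Path

# Placeholder records with hypothetical fields, not real scraped data.
records = [
    {"title": "Example Paper A", "author": "A. Author and C. Coauthor", "year": "2020"},
    {"title": "Example Paper B", "author": "B. Author", "year": "2021"},
]
FIELDS = ["title", "author", "year"]


def to_bibtex(record, key):
    """Format one record as a BibTeX @article entry (entry type is an assumption)."""
    lines = [f"@article{{{key},"]
    lines += [f"  {field} = {{{record[field]}}}," for field in FIELDS]
    lines.append("}")
    return "\n".join(lines)


def write_outputs(records, out_dir):
    """Write the same records as JSON, CSV, and BibTeX files."""
    out_dir = Path(out_dir)
    with open(out_dir / "scholar_results.json", "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
    with open(out_dir / "scholar_results.csv", "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
    bibtex = "\n\n".join(to_bibtex(r, f"entry{i}") for i, r in enumerate(records, start=1))
    (out_dir / "scholar_results.bib").write_text(bibtex + "\n", encoding="utf-8")


# Write to a temporary directory and read each file back to check the round trip.
with tempfile.TemporaryDirectory() as tmp_dir:
    write_outputs(records, tmp_dir)
    json_records = json.loads((Path(tmp_dir) / "scholar_results.json").read_text(encoding="utf-8"))
    with open(Path(tmp_dir) / "scholar_results.csv", newline="", encoding="utf-8") as handle:
        csv_rows = list(csv.DictReader(handle))
    bib_text = (Path(tmp_dir) / "scholar_results.bib").read_text(encoding="utf-8")

print(bib_text)
```

The hand-rolled BibTeX formatter does no escaping of special characters such as braces or percent signs, so real titles may need extra handling.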

License

This repository contains both code and data, which are licensed separately:

  • Code: Licensed under the MIT License.
  • Dataset: Licensed under the CDLA-Permissive-2.0 license.
