GitHub - DennisGankin/pdfscrape

Web scraper for PDF files

A web scraper that downloads all files with a given suffix linked from a given web page.

Perfect for downloading lecture notes, exercise slides or whatever else you need from the internet.

Run

python3 scra.py -url https://ocw.mit.edu/resources/res-ll-005-mathematics-of-big-data-and-machine-learning-january-iap-2020/lecture-notes/index.html -dir C:/home/course -suf pdf -inc exercise

Arguments

url: URL of the web page to download files from.
dir: Directory to save files to. Defaults to the current directory.
suf: File suffix to download. Defaults to pdf.
inc: Only files whose name on the web page contains this string are downloaded.
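
To make the arguments above concrete, here is a minimal sketch of how a scraper with this interface could be put together using only the Python standard library. It is not the code in scra.py. The names `LinkCollector`, `find_files`, `parse_args` and `download_all` are hypothetical. The sketch also makes three assumptions: `inc` is matched against the file name in the link URL, the suffix comparison ignores case, and `inc` defaults to an empty string that matches every file.

```python
import argparse
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_files(html, base_url, suffix="pdf", include=""):
    """Return absolute URLs of links whose file name ends in .suffix
    and contains the string `include` (assumed matching rule)."""
    collector = LinkCollector()
    collector.feed(html)
    matches = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        filename = os.path.basename(urlparse(absolute).path)
        if filename.lower().endswith("." + suffix.lower()) and include in filename:
            matches.append(absolute)
    return matches


def parse_args(argv=None):
    """Parse the four arguments described above, with the documented defaults."""
    parser = argparse.ArgumentParser(description="Download files with a given suffix")
    parser.add_argument("-url", required=True, help="page to scan for links")
    parser.add_argument("-dir", default=".", help="directory to save files to")
    parser.add_argument("-suf", default="pdf", help="file suffix to download")
    parser.add_argument("-inc", default="", help="string the file name must contain (assumed default: empty)")
    return parser.parse_args(argv)


def download_all(argv=None):
    """Fetch the page, filter its links, and save every match into the target directory."""
    args = parse_args(argv)
    os.makedirs(args.dir, exist_ok=True)
    with urlopen(args.url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for file_url in find_files(html, args.url, args.suf, args.inc):
        target = os.path.join(args.dir, os.path.basename(urlparse(file_url).path))
        urlretrieve(file_url, target)
        print("saved", target)
```

Calling `download_all(["-url", "<page url>", "-inc", "exercise"])` would then save every matching PDF into the current directory. Keeping the link filtering in `find_files` separate from the network calls means the matching rule can be checked against a small HTML string without downloading anything.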

