- Python web scraper for automobile workshop repair manuals from https://manuals.co
- Generates one PDF, `full_manual.pdf`, by converting individual HTML pages to PDF, cropping the manual page, and merging the individual PDFs into one.
Steps:
- Download the code and create the folders `raw/` and `crop/`
- Update `config.py` for your URL and min and max page
- Run the web scraping code. Update config.py if you can't get all the pages in one shot
  `$ python scrape_to_pdf.py`
- Run `crop_pdf.py` to get PDFs with the correct dimensions. The individual cropped files are in directory `crop/`
  `$ python crop_pdf.py`
- Run `merge_pdf.py` to combine the individual PDFs into one output file: `full_manual.pdf`
  `$ python merge_pdf.py`
- Check the contents of `full_manual.pdf` and, if satisfied, manually delete the individual PDF files in the temp folders `raw/` and `crop/`
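
The config and scraping steps above could look roughly like the sketch below. It is not the actual contents of `config.py` or `scrape_to_pdf.py`: the setting names (`BASE_URL`, `MIN_PAGE`, `MAX_PAGE`), the placeholder URL pattern, and the file naming scheme are all assumptions made for illustration. It shows one way to pick up where a partial run stopped, by skipping pages that already have a PDF in `raw/`.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical settings of the kind config.py might hold; real names may differ.
BASE_URL = "https://example.com/manual?page={page}"  # placeholder, not the real URL pattern
MIN_PAGE = 1
MAX_PAGE = 5
RAW_DIR = Path("raw")


def raw_pdf_path(page: int, raw_dir: Path = RAW_DIR) -> Path:
    """Return the output path for one page, zero-padded so files sort in page order."""
    return raw_dir / f"page_{page:04d}.pdf"


def missing_pages(min_page: int, max_page: int, raw_dir: Path = RAW_DIR) -> list[int]:
    """List pages in [min_page, max_page] that have no PDF in raw_dir yet."""
    return [
        page
        for page in range(min_page, max_page + 1)
        if not raw_pdf_path(page, raw_dir).exists()
    ]


def scrape(min_page: int = MIN_PAGE, max_page: int = MAX_PAGE, raw_dir: Path = RAW_DIR) -> None:
    """Convert each page that is still missing to a PDF using pdfkit."""
    import pdfkit  # imported lazily so the helpers above work without it installed

    raw_dir.mkdir(parents=True, exist_ok=True)
    for page in missing_pages(min_page, max_page, raw_dir):
        # pdfkit.from_url renders the page at the URL and writes it to the given path.
        pdfkit.from_url(BASE_URL.format(page=page), str(raw_pdf_path(page, raw_dir)))
```

With this layout, rerunning `scrape()` after an interrupted run only fetches the pages that are still missing, so the page range would not need to be narrowed by hand. Note that pdfkit is a wrapper around the wkhtmltopdf tool, which has to be installed separately.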
Requires standard Python and two non-standard libraries: pdfkit and PyPDF2.
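
As an illustration of how PyPDF2 can handle the crop and merge steps, here is a minimal sketch. It is not the code in `crop_pdf.py` or `merge_pdf.py`. It assumes the PyPDF2 2.x/3.x names (`PdfReader`, `PdfWriter`), a page origin at (0, 0), and hypothetical margin values; the helper names `page_number`, `crop_rect`, `crop_all`, and `merge_all` are made up for this example.

```python
import re
from pathlib import Path


def page_number(path):
    """Return the last integer in the file name, e.g. 'page_12.pdf' -> 12."""
    numbers = re.findall(r"\d+", Path(path).stem)
    return int(numbers[-1]) if numbers else -1


def crop_rect(width, height, left=0, bottom=0, right=0, top=0):
    """Return (x0, y0, x1, y1) after trimming the given margins, in PDF points."""
    return (left, bottom, width - right, height - top)


def crop_all(raw_dir="raw", crop_dir="crop", **margins):
    """Write a cropped copy of every PDF in raw_dir into crop_dir."""
    from PyPDF2 import PdfReader, PdfWriter  # lazy import: helpers above need no PyPDF2
    from PyPDF2.generic import RectangleObject

    Path(crop_dir).mkdir(exist_ok=True)
    for src in sorted(Path(raw_dir).glob("*.pdf"), key=page_number):
        writer = PdfWriter()
        for page in PdfReader(str(src)).pages:
            box = page.mediabox
            # Assumes the media box starts at (0, 0), which is typical but not guaranteed.
            rect = crop_rect(float(box.width), float(box.height), **margins)
            page.cropbox = RectangleObject(rect)
            writer.add_page(page)
        with open(Path(crop_dir) / src.name, "wb") as fh:
            writer.write(fh)


def merge_all(crop_dir="crop", output="full_manual.pdf"):
    """Append every PDF in crop_dir to one output file, in numeric page order."""
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    for src in sorted(Path(crop_dir).glob("*.pdf"), key=page_number):
        for page in PdfReader(str(src)).pages:
            writer.add_page(page)
    with open(output, "wb") as fh:
        writer.write(fh)
```

Sorting by `page_number` rather than by file name keeps `page_10.pdf` after `page_2.pdf`, which a plain alphabetical sort would get wrong unless the names are zero-padded. A call such as `crop_all(left=50, right=50, top=60, bottom=40)` followed by `merge_all()` would mirror the crop and merge steps; the margin values here are examples only and would need tuning to the actual page layout.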
No warranty. For personal and experimental use. Public domain license.