Materials for the MIT Political Methodology Lab workshop on learning to scrape the web with Python.
These materials are adapted from (and are currently almost identical to) Andy Halterman's workshop materials on web scraping: https://github.com/ahalterman/learn_to_scrape. The workshop is split into two parts. First, we will get familiar with Python. Then we will use it to learn how to scrape websites using BeautifulSoup.
Please follow the Setup instructions on the Wiki before coming to the workshop so we can maximize the amount of time spent learning Python and web scraping.
The presentation source and PDF files are in the PML Presentation folder. The practice exercises we go through during the workshop are in the Python Notebooks folder. The incomplete notebooks have a _Skeleton suffix and will be filled in during the workshop. For reference, complete versions of these notebooks are in the Completed subfolder. For best results, I suggest not looking at the solutions ahead of time.
- Intro_to_Python_for_R_Users_Skeleton.ipynb contains a skeleton of basic Python programming exercises worked through during the workshop.
- Intro_to_Python_for_R_Users_Completed.ipynb contains the "solutions" for Intro_to_Python_for_R_Users_Skeleton.
- Scraper_Skeleton.ipynb contains a skeleton of a web scraper and is what we'll be working through during the workshop.
- Scraper_Completed.ipynb contains the "solutions" for Scraper_Skeleton.