This repository was archived by the owner on Nov 10, 2022. It is now read-only.

adamkaplan0/PML_Web_scraping

 
 


Web scraping

Materials for the MIT Political Methodology Lab workshop on learning to scrape with Python. They are adapted from (and currently almost identical to) Andy Halterman's workshop materials on web scraping: https://github.com/ahalterman/learn_to_scrape. The workshop has two parts: first we get familiar with Python, then we use it to scrape websites with BeautifulSoup.

Before coming to the workshop

Please follow the Setup instructions on the Wiki before coming to the workshop so we can maximize the amount of time spent learning Python and web scraping.

Contents

The presentation source and PDF files are in the PML Presentation folder. The practice exercises we go through during the workshop are in the Python Notebooks folder. Incomplete notebooks have a _Skeleton suffix and will be filled in during the workshop. For reference, complete versions of these notebooks are in the Completed subfolder. For best results, I suggest not looking at the solutions ahead of time.

  • Intro_to_Python_for_R_Users_Skeleton.ipynb contains a skeleton of basic Python programming exercises worked through during the workshop.
  • Intro_to_Python_for_R_Users_Completed.ipynb contains the "solutions" for Intro_to_Python_for_R_Users_Skeleton.
  • Scraper_Skeleton.ipynb contains a skeleton of a web scraper and is what we'll be working through during the workshop.
  • Scraper_Completed.ipynb contains the "solutions" for Scraper_Skeleton.
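
To give a rough idea of what the scraper notebooks build toward, here is a minimal BeautifulSoup sketch. It is not taken from the notebooks: the HTML snippet, tag names, and variable names are hypothetical stand-ins for a page you would normally download first, and it assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded web page (not from the workshop).
html = """
<html><body>
  <h1>Workshop schedule</h1>
  <ul>
    <li><a href="/part1">Intro to Python</a></li>
    <li><a href="/part2">Web scraping</a></li>
  </ul>
</body></html>
"""

# Parse the markup with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Pull out the first <h1> heading as plain text.
title = soup.find("h1").get_text(strip=True)

# Collect (link text, href) pairs for every <a> tag on the page.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

print(title)
print(links)
```

On a real site the HTML string would come from an HTTP request rather than being written inline, but the parsing steps would look much the same.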

About

Materials for the MIT PML workshop on Web Scraping and Basic Python.


Languages

  • Jupyter Notebook 100.0%