- classify/label repos automatically
- analyze relevant features
- document design thoughts and training approach
Documentation Structure
- Data Exploration and Prediction Model
- analyze and document relevant features
- document how to avoid overfitting
- explain why we've decided to use the features
- explain how we've developed the prediction model
- Automated Classification
- implement the app that takes the input format and creates the output format
- either 1) prompt for the training data to use or 2) directly include the learned model
- Validation
- validate with Appendix B
- create a boolean matrix comparing our own estimated label with the label predicted by the model
- compute recall per category
- compute precision per category
- discuss the quality of the results and whether higher recall (yield) or higher precision is more important
- Extension
- use the model for a nice app
- Furthermore
- document 3 repos where we think our model will yield better results
- install and user manual
- document decisions we made for features, algorithms, data structures, software development tools and practices
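The Validation step above (comparing our own labels with the predicted ones, then computing recall and precision per category) could be sketched roughly as follows. This is a minimal sketch, not the project's implementation: the function name `per_category_scores` and the sample labels are hypothetical, and the category names are borrowed from the Extension notes.

```python
from collections import Counter


def per_category_scores(true_labels, predicted_labels):
    """Return {category: (precision, recall)} for two paired label lists."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    actual_count = Counter(true_labels)          # how often each label really occurs
    predicted_count = Counter(predicted_labels)  # how often each label was predicted
    true_positives = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            true_positives[truth] += 1

    scores = {}
    for category in sorted(set(true_labels) | set(predicted_labels)):
        tp = true_positives[category]
        # Guard against division by zero for categories never predicted or never present.
        precision = tp / predicted_count[category] if predicted_count[category] else 0.0
        recall = tp / actual_count[category] if actual_count[category] else 0.0
        scores[category] = (precision, recall)
    return scores


# Hypothetical sample: our own labels vs. the model's predictions.
expected = ["Data", "Data", "Software", "Homework", "Software"]
predicted = ["Data", "Software", "Software", "Homework", "Data"]
scores = per_category_scores(expected, predicted)
for category, (precision, recall) in scores.items():
    print(f"{category}: precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of repos predicted as a category that really belong to it. Recall is the share of repos in a category that the model found. Which of the two matters more is the question the Validation notes ask us to discuss.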
Notes
Examples for DATA-Repositories
openaddresses / openaddresses
unitedstates / congress-legislators
OpenExoplanetCatalogue / open_exoplanet_catalogue
Chicago / food-inspections-evaluation
GSA / data
cernopendata / opendata.cern.ch
benbalter / congressional-districts
Extension
"Improve yourself"
- Login with Github
-> Stats of your own repos e.g. 30% Data, 70% Software
-> Stats of repos your friends recently starred
|-Data-| Software | Homework | ...|
-> Stats of trending repos
|-Data-| Software | Homework | ...|recently
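The per-user stats above (e.g. 30% Data, 70% Software) boil down to counting predicted labels. A minimal sketch, assuming the classifier has already produced one label per repo; the function name `label_shares` and the sample labels are hypothetical:

```python
from collections import Counter


def label_shares(labels):
    """Return {label: percentage} rounded to whole percent, for a list of predicted labels."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {label: round(100 * count / total) for label, count in counts.items()}


# Hypothetical predictions for ten repos of one user.
my_repo_labels = ["Data"] * 3 + ["Software"] * 7
shares = label_shares(my_repo_labels)
print(" | ".join(f"{label} {share}%" for label, share in shares.items()))
```

The same function could feed the bars for starred and trending repos, since only the input list of labels changes.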
Sources: