Challenge & Notes

## [Challenge](https://github.com/InformatiCup/InformatiCup2017/blob/master/InformatiCup2017-English.pdf)
- classify/label repos automatically
- analyze relevant features
- document design thoughts and training approach

Documentation Structure
1. Data Exploration and Prediction Model
   - analyze and document relevant features
   - document how to avoid overfitting
   - explain why we chose these features
   - explain how we developed the prediction model
1. Automated Classification
   - implement the app that takes the input format and creates the output format
   - either 1) prompt for the training data to use or 2) directly include the learned model
1. Validation
   - validate with Appendix B
   - create a boolean matrix comparing our estimated label with the predicted one
   - compute recall per category
   - compute precision per category
   - discuss the quality of the results and whether higher yield or higher precision is more important
1. Extension
   - use the model for a nice app
1. Furthermore
   - document 3 repos where we think our model will yield better results
   - installation and user manual
   - document decisions we made for features, algorithms, data structures, software development tools and practices
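
The validation step above (boolean matrix, recall and precision per category) could be sketched roughly as follows. This is a minimal sketch, not our final implementation: the function name `per_category_metrics`, the label strings, and the sample lists are assumptions made up for illustration, and the real category set comes from the challenge description.

```python
from collections import Counter


def per_category_metrics(estimated, predicted):
    """Return {category: (precision, recall)} for two equally long label lists.

    `estimated` holds our own labels, `predicted` the model's output.
    Both names are hypothetical placeholders.
    """
    if len(estimated) != len(predicted):
        raise ValueError("label lists must have the same length")

    predicted_count = Counter(predicted)  # how often the model chose each label
    actual_count = Counter(estimated)     # how often we assigned each label
    true_pos = Counter(e for e, p in zip(estimated, predicted) if e == p)

    metrics = {}
    for category in sorted(set(estimated) | set(predicted)):
        tp = true_pos[category]
        # Guard against division by zero for labels that never occur.
        precision = tp / predicted_count[category] if predicted_count[category] else 0.0
        recall = tp / actual_count[category] if actual_count[category] else 0.0
        metrics[category] = (precision, recall)
    return metrics


# Illustrative data only; these are not results from Appendix B.
estimated = ["Data", "Software", "Software", "Homework", "Data"]
predicted = ["Data", "Software", "Data", "Homework", "Software"]

# Boolean row: True where the prediction matches our estimated label.
matches = [e == p for e, p in zip(estimated, predicted)]
metrics = per_category_metrics(estimated, predicted)

for category, (precision, recall) in metrics.items():
    print(f"{category}: precision={precision:.2f} recall={recall:.2f}")
```

Keeping precision and recall separate per category makes it easier to discuss which of the two matters more for a given label.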
## Notes

Examples of DATA repositories:
- openaddresses/openaddresses
- unitedstates/congress-legislators
- OpenExoplanetCatalogue/open_exoplanet_catalogue
- Chicago/food-inspections-evaluation
- GSA/data
- cernopendata/opendata.cern.ch
- benbalter/congressional-districts

## Extension

"Improve yourself"
- Log in with GitHub
  -> Stats of your own repos e.g. 30% Data, 70% Software
  -> Stats of repos your friends recently starred 
  |-Data-| Software | Homework | ...|
  -> Stats of recently trending repos
  |-Data-| Software | Homework | ...|
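
The percentage stats above (e.g. 30% Data, 70% Software) could be derived from a list of predicted labels along these lines. This is only a sketch under assumptions: `label_shares` and the sample list are hypothetical, and fetching the repos via the GitHub login is left out entirely.

```python
from collections import Counter


def label_shares(labels):
    """Return {label: percentage of all labels}, rounded to one decimal place."""
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical predictions for a user's own repos.
own_repo_labels = ["Software"] * 7 + ["Data"] * 3
shares = label_shares(own_repo_labels)

for label, share in shares.items():
    print(f"{label}: {share}%")
```

The same function would work unchanged for starred or trending repos, since it only looks at the list of labels.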

Sources:
- https://github.com/caesar0301/awesome-public-datasets -> what's hosted at github?
- https://github.com/datasets
- https://github.com/showcases/open-data
- https://github.com/explore
- https://github.com/trending

