Challenge & Notes #2

@WGierke

Challenge

  • classify/label repos automatically
  • analyze relevant features
  • document design thoughts and training approach

Documentation Structure

  1. Data Exploration and Prediction Model
    • analyze and document relevant features
    • document how to avoid overfitting
    • explain why we've decided to use the features
    • explain how we've developed the prediction model
  2. Automated Classification
    • implement the app that takes the input format and creates the output format
    • either 1) prompt for the training data to use or 2) directly include the learned model
  3. Validation
    • validate with Appendix B
    • create a boolean matrix comparing our estimated label with the predicted one
    • compute recall per category
    • compute precision per category
    • discuss the quality of the results and whether higher yield or higher precision is more important
  4. Extension
    • use the model for a nice app
  5. Furthermore
    • document 3 repos where we think our model will yield better results
    • installation and user manual
    • document decisions we made for features, algorithms, data structures, software development tools and practices
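
For the Validation step, per-category precision and recall could be computed roughly like this. This is only a sketch: the function name `per_category_scores` and the two parallel label lists are assumptions for illustration, and the category names are taken from the Extension notes in this issue, not from the challenge's official label set.

```python
from collections import Counter


def per_category_scores(expected, predicted):
    """Return {category: (precision, recall)} for two parallel label lists.

    `expected` holds the labels we estimated by hand, `predicted` the labels
    produced by the model (both hypothetical inputs for this sketch).
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")

    expected_count = Counter(expected)    # how often each label should occur
    predicted_count = Counter(predicted)  # how often the model emitted each label
    true_positives = Counter(
        exp for exp, pred in zip(expected, predicted) if exp == pred
    )

    scores = {}
    for category in sorted(set(expected) | set(predicted)):
        tp = true_positives[category]
        # Precision: share of predictions for this category that were right.
        precision = tp / predicted_count[category] if predicted_count[category] else 0.0
        # Recall: share of repos of this category that the model found.
        recall = tp / expected_count[category] if expected_count[category] else 0.0
        scores[category] = (precision, recall)
    return scores


if __name__ == "__main__":
    expected = ["Data", "Data", "Software", "Homework"]
    predicted = ["Data", "Software", "Software", "Homework"]
    for category, (precision, recall) in per_category_scores(expected, predicted).items():
        print(f"{category}: precision={precision:.2f} recall={recall:.2f}")
```

A category that is never predicted gets precision 0.0 here instead of raising a division error; whether that convention fits is something we'd have to decide when discussing the results.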

Notes

Examples for DATA-Repositories
openaddresses / openaddresses
unitedstates / congress-legislators
OpenExoplanetCatalogue / open_exoplanet_catalogue
Chicago / food-inspections-evaluation
GSA / data
cernopendata / opendata.cern.ch
benbalter / congressional-districts

Extension

"Improve yourself"

  • Login with Github
    -> Stats of your own repos e.g. 30% Data, 70% Software
    -> Stats of repos your friends recently starred
    |-Data-| Software | Homework | ...|
    -> Stats of trending repos
    |-Data-| Software | Homework | ...|
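
The "30% Data, 70% Software" stats could be derived from the predicted labels roughly as follows. This is a sketch only: `label_shares` is a hypothetical helper, and it assumes the classifier has already produced one label per repo (fetching the repos from GitHub is not shown).

```python
from collections import Counter


def label_shares(labels):
    """Return {label: percentage} for a list of predicted repo labels.

    Percentages are rounded to whole numbers, so they may not add up
    to exactly 100 in every case.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # no repos, nothing to report
    return {label: round(100 * n / total) for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical predictions for a user's ten repos.
    predictions = ["Data"] * 3 + ["Software"] * 7
    for label, share in label_shares(predictions).items():
        print(f"{share}% {label}")
```

The same function would work for the "friends recently starred" and "trending repos" views, since they only differ in which repos get classified.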

Sources:
