PD-Webpage-Classifier

Creates a training set and uses supervised learning (text classification) to build a model that predicts whether a webpage is relevant or not relevant, based on features extracted from the page's URL and title.
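The README does not say which features or learning algorithm the project uses. The following is a minimal sketch that assumes the features are word tokens split out of the URL and the title. It uses multinomial naive Bayes, which is a swapped-in choice for illustration and not necessarily the project's model. The names `extract_features` and `NaiveBayes`, the `url:`/`title:` token prefixes, and the example URLs are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(url: str, title: str) -> list[str]:
    """Hypothetical feature extractor: lowercase alphanumeric tokens,
    prefixed by where they came from so URL and title words stay distinct."""
    url_tokens = ["url:" + t for t in re.findall(r"[a-z0-9]+", url.lower())]
    title_tokens = ["title:" + t for t in re.findall(r"[a-z0-9]+", title.lower())]
    return url_tokens + title_tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples, labels):
        # samples: list of (url, title) pairs; labels: parallel list of class names
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for (url, title), label in zip(samples, labels):
            tokens = extract_features(url, title)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, url, title):
        tokens = extract_features(url, title)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior for the class.
            score = math.log(count / total)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                # Smoothed log likelihood of the token given the class.
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy, made-up training data for illustration only.
    samples = [
        ("https://example.com/research/report", "Annual research report"),
        ("https://example.com/shop/cart", "Your shopping cart"),
        ("https://example.com/research/data", "Research data portal"),
        ("https://example.com/login", "Sign in"),
    ]
    labels = ["relevant", "not relevant", "relevant", "not relevant"]
    model = NaiveBayes().fit(samples, labels)
    print(model.predict("https://example.com/research/new", "New research findings"))
```

Prefixing tokens by source lets the model weight a word differently depending on whether it appears in the URL or the title. A production version would more likely use a library vectorizer and classifier, but the flow is the same: extract features, fit on labeled samples, then predict.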

Notes on the Current Training Set

The current training set includes 1271 samples: 700 relevant and 571 not relevant. A model trained on this set currently achieves an F1 score of 0.84 for the "relevant" class.
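An F1 score for a single class is the harmonic mean of that class's precision and recall, F1 = 2PR / (P + R). The README does not say how the 0.84 was measured (for example, on a held-out split or via cross-validation). The function below is a minimal, hypothetical helper that shows how a per-class F1 treating "relevant" as the positive class could be computed from true and predicted labels. The labels in the example are made up.

```python
def f1_for_class(y_true, y_pred, positive="relevant"):
    """F1 score for one class, treating `positive` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up labels: 2 true positives, 1 false positive, 1 false negative.
    y_true = ["relevant", "relevant", "relevant", "not relevant", "not relevant"]
    y_pred = ["relevant", "relevant", "not relevant", "relevant", "not relevant"]
    print(round(f1_for_class(y_true, y_pred), 3))  # precision = recall = 2/3
```

Because the two classes are roughly balanced (700 of 1271, about 55%, are relevant), per-class F1 is a reasonable summary. Reporting F1 for the "not relevant" class as well would show whether the model favors one side.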