<!-- [](https://travis-ci.org/chuvanan/metrics) -->
<!-- [](https://codecov.io/gh/chuvanan/metrics?branch=master) -->
<!-- [](https://www.tidyverse.org/lifecycle/#experimental) -->
## metrics
### Why yet another R metrics package?
Because I believe there's still a niche for an R package that has all of the
following traits in one place:
- *Simple*
- *Consistent interface*
- *Well-documented*
- *Well-tested*
- *Accurate and fast*
Why do I think so? During evaluation work on a machine learning project, I
could not find a single R package that is on a par with scikit-learn's
`metrics` module in terms of coverage, ease of use, thoroughness of testing,
and richness of documentation. For instance,
* The two major frameworks for doing machine learning in R are `caret` and
`mlr(3)`. The next generation of `caret` is `tidymodels`, in which `yardstick`
is the main package for performance metrics.
* `pROC`, `precrec`
* `InformationValue`
* `Metrics`, `ModelMetrics`
I'm not saying that these packages are terrible. However, they were often
created for very specific use cases, and their quality and design vary widely.
### Overview of `metrics`
#### Installation
Install the stable version of `metrics` from CRAN:
```{r, eval=FALSE}
install.packages("metrics")
```
Or install the development version from GitHub with:
```{r, eval=FALSE}
devtools::install_github("chuvanan/metrics")
```
#### Getting started
All `metrics` functions share the same interface, `mtr_fun(actual, predicted)`,
which applies to both classification and regression settings. The design
and rationale behind the API:
* `mtr_` is the short form of **m**e**tr**ics. As in the `stringr` package,
`metrics` uses a common prefix to provide a consistent naming scheme that is
easy to type with autocompletion in RStudio or Emacs' ESS.
* `_fun` is the name of the performance metric, so each call states explicitly
which measure is being computed. For a full list of evaluation metrics,
please see TODO (a quick way to list them is sketched right after this list).
* `metrics` prefers convention over configuration. In classification tasks the
argument `actual` strictly accepts the binary values `0` and `1`, where the
former is coded as the negative class and the latter as the positive one.
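One way to see which metrics are available once the package is attached is to
list the exported functions that carry the `mtr_` prefix. This is plain base R
and assumes the package has been installed and loaded:

```{r, eval=FALSE}
library(metrics)

## list every exported function whose name starts with "mtr_"
ls("package:metrics", pattern = "^mtr_")
```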
Here's a quick example of `metrics` in action:
```{r}
library(metrics)
## simulate sample data set
set.seed(123)
preds <- runif(1000)
truth <- round(preds)
preds[sample(1000, 300)] <- runif(300) # noise
## overall accuracy
mtr_accuracy(truth, preds) # default threshold is 0.5
## precision
mtr_precision(truth, preds)
## recall
mtr_recall(truth, preds)
## AUROC
mtr_auc_roc(truth, preds)
```
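Because of the `actual` convention above, a factor or character outcome has to
be recoded to `0`/`1` before it reaches any `mtr_` function. A minimal sketch
in base R, with made-up labels `"neg"`/`"pos"` and scores standing in for a
real data set:

```{r, eval=FALSE}
## hypothetical outcome stored as a factor; "pos" is the positive class
outcome <- factor(c("neg", "pos", "pos", "neg", "pos"))
scores  <- c(0.20, 0.85, 0.60, 0.40, 0.75)

## recode to the 0/1 integers that `actual` expects:
## negative class -> 0, positive class -> 1
actual <- as.integer(outcome == "pos")

mtr_accuracy(actual, scores)
```

Doing the recoding explicitly keeps the definition of the positive class
visible in the analysis script rather than hidden behind a configuration
option.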