Add ARIMA Normalization Functionality #89
| @@ -0,0 +1,246 @@ | |||
| # Copyright 2020 Goldman Sachs. | |||
the file should live in one of the packages - either statistics or econometrics
| self.best_params = {} | ||
| def _evaluate_arima_model(self, X: Union[pd.Series, pd.DataFrame], arima_order: Tuple[int, int, int], train_size: float, freq: str) -> Tuple[float, dict]: |
train_size should be an int
I changed it so it can take a float, an int, or None (similar to what scikit-learn does).
- If float, it should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split. If int, it represents the absolute number of train samples. If None, the value is automatically set to 0.75 (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).
Is that too complicated, or should we just use int to simplify things?
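For reference, the float / int / None handling described above could be resolved in one small helper. This is only a sketch: `resolve_train_size` is a hypothetical name, not something in this PR, and rounding a float proportion down is an assumption.

```python
import math
from typing import Optional, Union


def resolve_train_size(n_samples: int,
                       train_size: Optional[Union[float, int]] = None) -> int:
    """Return how many leading observations go into the train split.

    Hypothetical helper, not part of the PR under review.
    """
    if train_size is None:
        train_size = 0.75  # default proposed in this thread

    if isinstance(train_size, bool):
        # bool is a subclass of int, so reject it explicitly
        raise TypeError("train_size must be a float, an int, or None")

    if isinstance(train_size, float):
        if not 0.0 < train_size < 1.0:
            raise ValueError("a float train_size must lie strictly between 0.0 and 1.0")
        # Assumption: round the proportion down to a whole number of samples
        n_train = math.floor(train_size * n_samples)
    elif isinstance(train_size, int):
        n_train = train_size
    else:
        raise TypeError("train_size must be a float, an int, or None")

    # Both the train and the test split need at least one observation
    if not 0 < n_train < n_samples:
        raise ValueError(
            "train_size={!r} leaves an empty train or test set "
            "for {} samples".format(train_size, n_samples)
        )
    return n_train
```

With a helper like this, the rest of the method only ever deals with an integer count, so accepting all three input types adds little complexity downstream.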
| best_ma_coef = ma_coef | ||
| best_resid = resid | ||
| except Exception as e: | ||
| print(' {}'.format(e)) |
pls raise exception, remove print
Certain combinations of (p, d, q) will raise the following exception: "Estimation requires the inclusion of least one AR term, MA term, a constant or an exogenous variable."
Raising an exception would then break the training loop. Maybe it's a better idea to just print the error and move on to the next combination of (p, d, q)?
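One possible middle ground between the two positions above: log the failure instead of printing it, move on to the next order, and raise only if no order could be fit at all. The sketch below is hypothetical. `select_best_order` and the `evaluate` callable are illustrative names, and it assumes the fitting error surfaces as a `ValueError`.

```python
import logging
from typing import Callable, Dict, Iterable, Optional, Tuple

logger = logging.getLogger(__name__)

Order = Tuple[int, int, int]


def select_best_order(orders: Iterable[Order],
                      evaluate: Callable[[Order], float]) -> Tuple[Order, float]:
    """Return the order with the lowest score, skipping orders that fail to fit.

    `evaluate` is a stand-in for whatever fits one ARIMA order and returns its
    test-set MSE; it is assumed to raise ValueError when an order cannot be fit.
    """
    best_order: Optional[Order] = None
    best_mse = float("inf")
    failures: Dict[Order, str] = {}

    for order in orders:
        try:
            mse = evaluate(order)
        except ValueError as exc:
            # Record the failure and keep going instead of aborting the loop
            logger.warning("Skipping ARIMA order %s: %s", order, exc)
            failures[order] = str(exc)
            continue
        if mse < best_mse:
            best_order, best_mse = order, mse

    if best_order is None:
        # Nothing could be fit, so surface the collected errors to the caller
        raise ValueError("No ARIMA order could be fit: {}".format(failures))
    return best_order, best_mse
```

Catching only the specific exception type keeps unrelated bugs visible, and the final `raise` means a completely failed search is never silent.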
ARIMA is used here without the moving-average component to normalize and forecast time series data.
An ARIMA model is selected from 9 possible (p, d, q) combinations: (0,0,0), (1,0,0), (2,0,0), (0,1,0), (1,1,0), (2,1,0), (0,2,0), (1,2,0), (2,2,0). The time series is split into train and test sets, and an ARIMA model is fit on the training set for every combination. The model with the lowest mean squared error (MSE) on the test set is selected as the best model. The original time series can then be transformed by the best model.
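As a rough illustration of the selection procedure described above, the candidate grid, the chronological train/test split, and the MSE score can be written in a few lines. The function names are hypothetical and the model fitting itself is deliberately left out; this is a sketch of the surrounding bookkeeping, not the PR's implementation.

```python
from typing import List, Sequence, Tuple

Order = Tuple[int, int, int]


def candidate_orders(max_p: int = 2, max_d: int = 2) -> List[Order]:
    """Enumerate (p, d, q) orders with q fixed at 0 (no moving-average term)."""
    return [(p, d, 0) for d in range(max_d + 1) for p in range(max_p + 1)]


def chronological_split(series: Sequence[float],
                        n_train: int) -> Tuple[List[float], List[float]]:
    """Split without shuffling: the first n_train points train, the rest test."""
    return list(series[:n_train]), list(series[n_train:])


def mean_squared_error(actual: Sequence[float],
                       predicted: Sequence[float]) -> float:
    """Average squared difference between observed and forecast values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    return sum((a - b) ** 2 for a, b in zip(actual, predicted)) / len(actual)
```

Each candidate order would be fit on the train part, used to forecast the test part, and scored with `mean_squared_error`; the order with the smallest score is kept.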