Anomaly Models#
Models which implement gordo.machine.model.anomaly.base.AnomalyDetectorBase.anomaly() and can be served under the
model server POST /prediction endpoint.
AnomalyDetectorBase#
The base class for all other anomaly detector models
- class gordo.machine.model.anomaly.base.AnomalyDetectorBase(**kwargs)[source]#
Bases:
BaseEstimator, GordoBase
Initialize the model
- abstract anomaly(X: DataFrame | DataArray, y: DataFrame | DataArray, frequency: timedelta | None = None) DataFrame | Dataset[source]#
Take X, y and optionally frequency, returning a dataframe containing anomaly score(s).
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') AnomalyDetectorBase#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
DiffBasedAnomalyDetector#
Calculates the absolute value prediction differences between y and yhat as well
as the absolute difference error between both matrices via numpy.linalg.norm()
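The two quantities mentioned above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the library's implementation: the sample arrays are made up, and taking the norm row by row (axis=1) is an assumption about how one total error per observation would be obtained.

```python
import numpy as np

# Hypothetical target and prediction matrices: 2 observations, 2 tags each.
y = np.array([[1.0, 2.0], [3.0, 4.0]])
yhat = np.array([[1.5, 2.0], [2.0, 6.0]])

# Per-tag absolute prediction difference, same shape as y.
abs_diff = np.abs(y - yhat)

# One aggregate error per observation: the Euclidean norm of each row's difference.
total_error = np.linalg.norm(y - yhat, axis=1)

print(abs_diff)
print(total_error)
```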
- class gordo.machine.model.anomaly.diff.DiffBasedAnomalyDetector(base_estimator: BaseEstimator = tensorflow.keras.wrappers.scikit_learn.KerasRegressor, scaler: TransformerMixin = MinMaxScaler(), require_thresholds: bool = True, shuffle: bool = False, window: int | None = None, smoothing_method: str | None = None)[source]#
Bases:
AnomalyDetectorBase
Estimator which wraps a base_estimator and provides a diff error based approach to anomaly detection.
It trains a scaler to the target after training, purely for error calculations. The underlying base_estimator is trained with the original, unscaled, y.
Threshold calculation is based on a rolling statistic of the validation errors on the last fold of cross-validation.
- Parameters:
base_estimator – The model whose normal .fit and .predict methods will be used. Defaults to gordo.machine.model.models.KerasAutoEncoder with kind='feedforward_hourglass'.
scaler – Defaults to sklearn.preprocessing.MinMaxScaler, as shown in the class signature. Used for transforming model output and the original y to calculate the difference/error in model output vs expected.
require_thresholds – Requires calculating thresholds_ via a call to cross_validate(). If this is set (default True), but cross_validate() was not called before calling anomaly(), an AttributeError will be raised.
shuffle – Flag to shuffle or not data in .fit so that the model, if relevant, will be trained on a sample of data across the time range and not just the last elements according to model arg validation_split.
window – Window size for smoothed thresholds.
smoothing_method – Method to be used together with window to smooth metrics. Must be one of: ‘smm’: simple moving median, ‘sma’: simple moving average or ‘ewma’: exponential weighted moving average.
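The three smoothing methods named above can be illustrated with pandas. The helper below is hypothetical (the name smooth and the use of span=window for the exponential case are assumptions) and may differ from how gordo computes the smoothed metrics internally.

```python
import pandas as pd


def smooth(errors: pd.Series, window: int, method: str) -> pd.Series:
    """Hypothetical helper: smooth an error series with one of the named methods."""
    if method == "smm":  # simple moving median
        return errors.rolling(window).median()
    if method == "sma":  # simple moving average
        return errors.rolling(window).mean()
    if method == "ewma":  # exponential weighted moving average
        return errors.ewm(span=window).mean()
    raise ValueError(f"Unknown smoothing method: {method!r}")


# Made-up error values with one spike at position 3.
errors = pd.Series([1.0, 3.0, 2.0, 10.0, 4.0])

smm = smooth(errors, window=3, method="smm")
sma = smooth(errors, window=3, method="sma")
ewma = smooth(errors, window=3, method="ewma")

print(pd.DataFrame({"raw": errors, "smm": smm, "sma": sma, "ewma": ewma}))
```

In this sketch the moving median damps the single spike more than the moving average does, which is one reason a median is often chosen when thresholds should not react to isolated outliers.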
- anomaly(X: DataFrame | DataArray, y: DataFrame | DataArray, frequency: timedelta | None = None) DataFrame | Dataset[source]#
Create an anomaly dataframe from the base provided dataframe.
- Parameters:
X – Dataframe representing the data to go into the model.
y – Dataframe representing the target output of the model.
- Returns:
A superset of the original base dataframe with added anomaly specific features
- cross_validate(*, X: DataFrame | ndarray, y: DataFrame | ndarray, cv=TimeSeriesSplit(gap=0, max_train_size=None, n_splits=3, test_size=None), **kwargs)[source]#
Run time-series cross-validation on the model and update the model’s threshold values based on the cross-validation folds.
- Parameters:
X – Input data to the model
y – Target data
kwargs – Any additional kwargs to be passed to sklearn.model_selection.cross_validate()
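To show what the default cv splitter produces, the sketch below runs scikit-learn's TimeSeriesSplit with n_splits=3 on twelve made-up samples. Only the splitter is shown; how gordo turns the resulting validation errors into thresholds is not reproduced here.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve time-ordered samples (made-up data).
X = np.arange(12).reshape(-1, 1)

cv = TimeSeriesSplit(n_splits=3)
folds = [(train.tolist(), test.tolist()) for train, test in cv.split(X)]

for i, (train, test) in enumerate(folds):
    print(f"fold {i}: train={train} test={test}")

# The last fold validates on the most recent samples.
last_train, last_test = folds[-1]
```

Each fold trains only on samples that precede its validation block, so no future data leaks into training.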
- score(X: ndarray | DataFrame, y: ndarray | DataFrame, sample_weight: ndarray | None = None) float[source]#
Score the model; must implement the correct default scorer based on model type
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') DiffBasedAnomalyDetector#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- class gordo.machine.model.anomaly.diff.DiffBasedKFCVAnomalyDetector(base_estimator: BaseEstimator = tensorflow.keras.wrappers.scikit_learn.KerasRegressor, scaler: TransformerMixin = MinMaxScaler(), require_thresholds: bool = True, shuffle: bool = True, window: int = 144, smoothing_method: str = 'smm', threshold_percentile: float = 0.99)[source]#
Bases:
DiffBasedAnomalyDetector
Estimator which wraps a base_estimator and provides a diff error based approach to anomaly detection.
It trains a scaler to the target after training, purely for error calculations. The underlying base_estimator is trained with the original, unscaled, y.
Threshold calculation is based on a percentile of the smoothed validation errors as calculated from cross-validation predictions.
- Parameters:
base_estimator – The model whose normal .fit and .predict methods will be used. Defaults to gordo.machine.model.models.KerasAutoEncoder with kind='feedforward_hourglass'.
scaler – Defaults to sklearn.preprocessing.MinMaxScaler, as shown in the class signature. Used for transforming model output and the original y to calculate the difference/error in model output vs expected.
require_thresholds – Requires calculating thresholds_ via a call to cross_validate(). If this is set (default True), but cross_validate() was not called before calling anomaly(), an AttributeError will be raised.
shuffle – Flag to shuffle or not data in .fit so that the model, if relevant, will be trained on a sample of data across the time range and not just the last elements according to model arg validation_split.
window – Window size for smooth metrics and threshold calculation.
smoothing_method – Method to be used together with window to smooth metrics. Must be one of: ‘smm’: simple moving median, ‘sma’: simple moving average or ‘ewma’: exponential weighted moving average.
threshold_percentile – Percentile of the validation data to be used to calculate the threshold.
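How window, smoothing_method and threshold_percentile could combine is sketched below with the class defaults (144, 'smm', 0.99). The error series is random made-up data, and dropping the incomplete leading windows before taking the percentile is an assumption rather than documented behaviour.

```python
import numpy as np
import pandas as pd

# Made-up validation errors standing in for cross-validation output.
rng = np.random.default_rng(0)
validation_errors = pd.Series(rng.random(1000))

window = 144                 # default window of the class
threshold_percentile = 0.99  # default percentile of the class

# 'smm': simple moving median; the first window - 1 values are NaN and are dropped here.
smoothed = validation_errors.rolling(window).median().dropna()

# The threshold is the chosen percentile of the smoothed errors.
threshold = smoothed.quantile(threshold_percentile)
print(f"threshold = {threshold:.4f}")
```

A higher percentile yields a more permissive threshold, so fewer observations would be flagged as anomalous.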
- cross_validate(*, X: DataFrame | ndarray, y: DataFrame | ndarray, cv=KFold(n_splits=5, random_state=0, shuffle=True), **kwargs)[source]#
Run K-fold cross-validation on the model and update the model’s threshold values based on a percentile of the validation metrics.
- Parameters:
X – Input data to the model
y – Target data
kwargs – Any additional kwargs to be passed to sklearn.model_selection.cross_validate()
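The default splitter here, KFold(n_splits=5, shuffle=True, random_state=0), places every sample in exactly one validation fold. The sketch below demonstrates that with scikit-learn's cross_val_predict and a LinearRegression stand-in on made-up data. Both are assumptions for illustration; gordo's own method forwards its kwargs to sklearn.model_selection.cross_validate().

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

# Made-up regression data: 100 samples, 3 features, 1 target column.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([[1.0], [2.0], [-1.0]]) + 0.1 * rng.normal(size=(100, 1))

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Every sample index appears in exactly one validation fold.
validation_idx = np.sort(np.concatenate([test for _, test in cv.split(X)]))

# Out-of-fold predictions: each row is predicted by a model that did not train on it.
y_pred = cross_val_predict(LinearRegression(), X, y, cv=cv)

# One validation error per sample, covering the whole dataset.
errors = np.linalg.norm(y - y_pred, axis=1)
print(errors.shape)
```

Because the validation errors cover the full time range rather than only the last fold, a percentile taken over them reflects the whole dataset.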
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') DiffBasedKFCVAnomalyDetector#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.