Builder

Model builder

class gordo.builder.build_model.ModelBuilder(machine: Machine, back_compatibles: dict[tuple[Optional[str], str], tuple[Optional[str], str]] | None = None, default_data_provider: str | None = None)

Bases: object

Build a model for a given gordo.machine.Machine

Parameters: machine – The gordo.machine.Machine to build a model for.

Example

>>> from gordo_core.sensor_tag import SensorTag
>>> from gordo.machine import Machine
>>> from gordo.builder.build_model import ModelBuilder
>>> machine = Machine.from_config(dict(
...     name="special-model-name",
...     model={"sklearn.decomposition.PCA": {"svd_solver": "auto"}},
...     dataset={
...         "type": "RandomDataset",
...         "train_start_date": "2017-12-25 06:00:00Z",
...         "train_end_date": "2017-12-30 06:00:00Z",
...         "tag_list": [SensorTag("Tag 1"), SensorTag("Tag 2")],
...         "target_tag_list": [SensorTag("Tag 3"), SensorTag("Tag 4")]
...     },
...     project_name='test-proj',
... ))
>>> builder = ModelBuilder(machine=machine)
>>> model, machine = builder.build()

build(output_dir: PathLike | str | None = None, model_register_dir: PathLike | str | None = None, replace_cache=False) → Tuple[BaseEstimator, Machine]

Always return a model and its metadata.

If output_dir is supplied, the model is saved there. model_register_dir points to the model cache directory, from which the builder attempts to read the model. Supplying both does both: the model is read from the cache, and that cached model is saved to the new output directory.

Parameters:

output_dir – A path to where the model will be deposited.
model_register_dir – A path to a register; see gordo.util.disk_registry. If this is None, the model is always built; otherwise the builder first tries to resolve the model from the register.
replace_cache – Forces a rebuild of the model, and replaces the entry in the cache with the new model.

Return type:

Built model and an updated Machine
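
The interplay of model_register_dir and replace_cache described above can be sketched as follows. This is a minimal illustration and not gordo's implementation: resolve_model, train and the dict-based register are hypothetical stand-ins (the real register lives on disk), and saving to output_dir is left out.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve_model(
    cache_key: str,
    train: Callable[[], object],
    register: Optional[Dict[str, object]] = None,
    replace_cache: bool = False,
) -> Tuple[object, bool]:
    """Return (model, was_cached).

    Hypothetical stand-ins: `register` is a dict playing the role of the
    on-disk model register, and `train` is any zero-argument callable
    that returns a fitted model.
    """
    if register is None:
        # No register supplied: always build.
        return train(), False
    if not replace_cache and cache_key in register:
        # Cache hit: reuse the stored model instead of rebuilding.
        return register[cache_key], True
    # Cache miss or forced rebuild: train and overwrite the entry.
    model = train()
    register[cache_key] = model
    return model, False
```

With replace_cache=True the lookup is skipped entirely, so the register entry is always overwritten by the freshly trained model.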

static build_metrics_dict(metrics_list: list, y: DataFrame, scaler: TransformerMixin | str | None = None) → dict

Given a list of metrics that accept true_y and pred_y as inputs, this returns a dictionary of scorers. The keys take the form '{score}-{tag_name}' for each target tag, plus '{score}' for the average score across all target tags and folds. The values are the callables make_scorer(metric_wrapper(score)).

Note: score in '{score}-{tag_name}' is the sklearn score function name with '_' replaced by '-', and tag_name is the target tag name with ' ' replaced by '-'.

Parameters:

metrics_list – List of sklearn score functions
y – Target data
scaler – Scaler which will be fitted on y, and used to transform the data before scoring. Useful when the metrics are sensitive to the amplitude of the data, and you have multiple targets.

Return type:

dict
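
The key naming rule above can be illustrated with plain string handling. This sketch only builds the keys; the scorer values are omitted, and the helper name metric_keys and the ordering of the keys are assumptions made for the example.

```python
from typing import List


def metric_keys(score_names: List[str], target_tags: List[str]) -> List[str]:
    """Build the dictionary keys described above (hypothetical helper)."""
    keys = []
    for score in score_names:
        # '_' in the score function name becomes '-'.
        score_part = score.replace("_", "-")
        for tag in target_tags:
            # ' ' in the tag name becomes '-'.
            keys.append(f"{score_part}-{tag.replace(' ', '-')}")
        # One aggregate key per score, without a tag suffix.
        keys.append(score_part)
    return keys


print(metric_keys(["r2_score"], ["Tag 3", "Tag 4"]))
# ['r2-score-Tag-3', 'r2-score-Tag-4', 'r2-score']
```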

static build_split_dict(X: DataFrame, split_obj: Type[BaseCrossValidator]) → dict

Get dictionary of cross-validation training dataset split metadata

Parameters:

X – The training dataset that will be split during cross-validation.
split_obj – The cross-validation object that returns train, test indices for splitting.

Return type:

Dictionary of cross-validation train/test split metadata
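
As a rough idea of what such split metadata could look like, the sketch below records the boundaries and sizes of each fold. The key names and the helper split_metadata are illustrative assumptions, not the keys gordo is documented to emit; a plain sequence stands in for the DataFrame index, and the folds are given as ready-made (train, test) index lists.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def split_metadata(
    index: Sequence,
    splits: Iterable[Tuple[List[int], List[int]]],
) -> Dict[str, object]:
    """Summarise each (train, test) fold; key names are hypothetical."""
    meta: Dict[str, object] = {}
    for fold, (train_idx, test_idx) in enumerate(splits, start=1):
        # First and last index label covered by each part of the fold.
        meta[f"fold-{fold}-train-start"] = index[train_idx[0]]
        meta[f"fold-{fold}-train-end"] = index[train_idx[-1]]
        meta[f"fold-{fold}-test-start"] = index[test_idx[0]]
        meta[f"fold-{fold}-test-end"] = index[test_idx[-1]]
        # Number of samples in each part of the fold.
        meta[f"fold-{fold}-n-train"] = len(train_idx)
        meta[f"fold-{fold}-n-test"] = len(test_idx)
    return meta
```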

property cache_key: str

property cached_model_path: PathLike | str | None

calculate_cache_key(machine: Machine) → str

Calculates a hash-key from the model and data-config.

Return type: A 512-bit hash as a hex string (128 characters), based on the content of the parameters.

Examples

>>> from gordo.machine import Machine
>>> from gordo_core.sensor_tag import SensorTag
>>> from gordo.builder.build_model import ModelBuilder
>>> machine = Machine.from_config(dict(
...     name="special-model-name",
...     model={"sklearn.decomposition.PCA": {"svd_solver": "auto"}},
...     dataset={
...         "type": "RandomDataset",
...         "train_start_date": "2017-12-25 06:00:00Z",
...         "train_end_date": "2017-12-30 06:00:00Z",
...         "tag_list": [SensorTag("Tag 1"), SensorTag("Tag 2")],
...         "target_tag_list": [SensorTag("Tag 3"), SensorTag("Tag 4")]
...     },
...     project_name='test-proj'
... ))
>>> builder = ModelBuilder(machine)
>>> len(builder.cache_key)
128
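
A key of 128 hex characters, as in the example above, matches the size of a 512-bit digest such as SHA-512. The sketch below shows one way a content-based key of that size could be derived. The choice of SHA-512, the JSON serialization and the helper name config_hash are assumptions for illustration, not a description of what gordo does internally.

```python
import hashlib
import json


def config_hash(model_config: dict, data_config: dict) -> str:
    """Derive a deterministic hex key from two config dicts (hypothetical)."""
    # sort_keys makes the key independent of dict ordering;
    # default=str serialises values that JSON cannot encode natively.
    payload = json.dumps(
        {"model": model_config, "dataset": data_config},
        sort_keys=True,
        default=str,
    )
    # SHA-512 yields 64 bytes, i.e. 128 hex characters.
    return hashlib.sha512(payload.encode("utf-8")).hexdigest()
```

Because the key depends only on the configuration content, two machines with identical model and dataset settings would map to the same cache entry, while any change to either config produces a different key.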

static check_cache(model_register_dir: PathLike | str, cache_key: str)

Checks if the model is cached, and returns its path if it exists.

Parameters:

model_register_dir – The register dir where the model lies.
cache_key – A 512-bit hash as a hex string (128 characters), based on the content of the parameters.

Return type:

The path to the cached model, or None if it does not exist.
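
A lookup of this kind could work roughly as sketched below. The on-disk layout is an assumption made for the example (one small text file per key, whose content is the path of the saved model), and find_cached_model is a hypothetical name; the actual layout is defined by gordo.util.disk_registry.

```python
from pathlib import Path
from typing import Optional, Union


def find_cached_model(
    register_dir: Union[str, Path], cache_key: str
) -> Optional[Path]:
    """Return the cached model path, or None (assumed register layout)."""
    # Assumed layout: <register_dir>/<cache_key> is a text file
    # containing the path where the model was saved.
    key_file = Path(register_dir) / cache_key
    if not key_file.is_file():
        return None
    model_path = Path(key_file.read_text().strip())
    # A stale entry pointing at a deleted model counts as a miss.
    return model_path if model_path.exists() else None
```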

property gordo_version

static metrics_from_list(metric_list: List[str] | None = None) → List[Callable]

Given a list of metric function paths, e.g. sklearn.metrics.r2_score, or simple function names that are expected to be in the sklearn.metrics module, this returns a list of the loaded functions.

Parameters: metric_list – List of function paths to use as metrics for the model. Defaults to those specified in gordo.workflow.config_components.NormalizedConfig: sklearn.metrics.explained_variance_score, sklearn.metrics.r2_score, sklearn.metrics.mean_squared_error, sklearn.metrics.mean_absolute_error.
Return type: A list of the functions loaded
Raises: AttributeError – If the function cannot be loaded.
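
Resolving a dotted path or a bare function name to a callable can be sketched with importlib. To keep the example runnable with the standard library only, the fallback module is a parameter and defaults to math here, whereas the text above names sklearn.metrics as the module for bare names. The helper load_functions is hypothetical.

```python
import importlib
from typing import Callable, List


def load_functions(
    paths: List[str], default_module: str = "math"
) -> List[Callable]:
    """Resolve 'module.func' paths or bare names to callables (sketch)."""
    funcs = []
    for path in paths:
        # 'pkg.mod.func' -> ('pkg.mod', '.', 'func'); 'func' -> ('', '', 'func')
        module_name, _, func_name = path.rpartition(".")
        module = importlib.import_module(module_name or default_module)
        # getattr raises AttributeError if the module lacks the function.
        funcs.append(getattr(module, func_name))
    return funcs
```

For instance, load_functions(["math.sqrt", "floor"]) returns the functions math.sqrt and math.floor. Note that in this sketch an unknown module raises ImportError rather than AttributeError.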

set_seed(seed: int)

Local Model builder

This is meant to provide a good way to validate a configuration file as well as to enable creating and testing models locally with little overhead.

gordo.builder.local_build.local_build(config_str: str) → Iterable[Tuple[BaseEstimator | None, Machine]]

Build model(s) from a bare Gordo config file locally.

This follows much the same steps as the normal workflow generation and the subsequent Gordo deployment. It should help when developing locally, and it gives a good indication that your config is valid for deployment with Gordo.

Parameters: config_str – The raw YAML config file in string format.

Examples

>>> from gordo.builder.local_build import local_build
>>> config = '''
... machines:
...       - dataset: |
...           tags:
...             - SOME-TAG1
...             - SOME-TAG2
...           target_tag_list:
...             - SOME-TAG3
...             - SOME-TAG4
...           train_end_date: '2019-03-01T00:00:00+00:00'
...           train_start_date: '2019-01-01T00:00:00+00:00'
...           asset: asgb
...           data_provider:
...             type: RandomDataProvider
...         metadata: |
...           information: Some sweet information about the model
...         model: |
...           gordo.machine.model.anomaly.diff.DiffBasedAnomalyDetector:
...             base_estimator:
...               sklearn.pipeline.Pipeline:
...                 steps:
...                 - sklearn.decomposition.PCA
...                 - sklearn.multioutput.MultiOutputRegressor:
...                     estimator: sklearn.linear_model.LinearRegression
...         name: crazy-sweet-name
... '''
>>> models_n_metadata = local_build(config)
>>> assert len(list(models_n_metadata)) == 1

Return type: A generator yielding tuples of models and their metadata.

Model builder utils

gordo.builder.utils.create_model_builder(model_builder_class: str | None) → Type[ModelBuilder]