Serializer

The serializer is the core component that converts a Gordo config file into Python objects, which together form a full ML model capable of being served on Kubernetes.

Elements such as the dataset and model keys within the YAML config represent objects that the serializer (de)serializes to achieve this.

gordo.serializer.serializer.dump(obj: object, dest_dir: PathLike | str, metadata: dict | None = None, info: dict | None = None)

Serialize an object into a directory. The object must be pickle-able.

Parameters:

obj – The object to dump. Must be pickle-able.
dest_dir – The directory in which to save the model.
metadata – Any additional metadata to be saved alongside the model. If present, it is returned by the corresponding load_metadata() function.
info – Current revision info. For now, only used for storing “checksum”.

Example

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import PCA
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo import serializer
>>> from tempfile import TemporaryDirectory
>>> pipe = Pipeline([
...     ('pca', PCA(3)),
...     ('model', KerasAutoEncoder(kind='feedforward_hourglass'))])
>>> with TemporaryDirectory() as tmp:
...     serializer.dump(obj=pipe, dest_dir=tmp)
...     pipe_clone = serializer.load(source_dir=tmp)
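The directory-based round trip performed by dump() and load() can be sketched with the standard library alone. This is a minimal sketch, not Gordo's implementation: the helper names dump_sketch and load_sketch, and the file names model.pkl, metadata.json and info.json, are hypothetical and chosen only for illustration.

```python
import json
import os
import pickle
from pathlib import Path
from typing import Any, Optional, Union

PathType = Union[str, os.PathLike]


def dump_sketch(
    obj: Any,
    dest_dir: PathType,
    metadata: Optional[dict] = None,
    info: Optional[dict] = None,
) -> None:
    """Hypothetical stand-in for serializer.dump: pickle obj into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # pickle.dump raises for objects that are not pickle-able
    with open(dest / "model.pkl", "wb") as fh:
        pickle.dump(obj, fh)
    # Optional JSON side files; these file names are assumptions for the sketch
    if metadata is not None:
        (dest / "metadata.json").write_text(json.dumps(metadata))
    if info is not None:
        (dest / "info.json").write_text(json.dumps(info))


def load_sketch(source_dir: PathType) -> Any:
    """Hypothetical stand-in for serializer.load: unpickle the stored object."""
    with open(Path(source_dir) / "model.pkl", "rb") as fh:
        return pickle.load(fh)
```

Keeping metadata in a separate JSON file means it can be read without unpickling (and thus without importing) the model itself.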

gordo.serializer.serializer.dumps(model: Pipeline | GordoBase) → bytes

Dump a model into a bytes representation suitable for loading with gordo.serializer.loads.

Parameters: model – A gordo model/pipeline.
Returns: The serialized model as bytes, which can be loaded with serializer.loads().

Example

>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo import serializer
>>>
>>> model = KerasAutoEncoder('feedforward_symmetric')
>>> serialized = serializer.dumps(model)
>>> assert isinstance(serialized, bytes)
>>>
>>> model_clone = serializer.loads(serialized)
>>> assert isinstance(model_clone, KerasAutoEncoder)

gordo.serializer.serializer.load(source_dir: PathLike | str) → Any

Load an object from a directory saved by gordo.serializer.dump.

This takes a directory that is either top-level, meaning it contains a subdirectory following the naming scheme “n_step=<int>-class=<path.to.Class>”, or that naming-scheme directory itself. Returns the deserialized object.

Parameters: source_dir – Location of the top-level directory where the pipeline was saved.
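Resolving either form of source_dir to the naming-scheme directory can be sketched with a regular expression. This is an illustrative sketch under assumptions: resolve_step_dir and class_path_of are hypothetical helpers, and picking the lowest step number when several subdirectories match is a guess, not documented behaviour.

```python
import os
import re
from pathlib import Path
from typing import Union

# Matches the "n_step=<int>-class=<path.to.Class>" directory naming scheme
STEP_DIR = re.compile(r"^n_step=(\d+)-class=([A-Za-z_][\w.]*)$")


def resolve_step_dir(source_dir: Union[str, os.PathLike]) -> Path:
    """Return the naming-scheme directory, whether given it directly or its parent."""
    source = Path(source_dir)
    if STEP_DIR.match(source.name):
        return source  # already pointing at the naming-scheme directory
    # Otherwise treat source_dir as top-level and look one level down
    candidates = [p for p in source.iterdir() if p.is_dir() and STEP_DIR.match(p.name)]
    if not candidates:
        raise FileNotFoundError(f"No 'n_step=<int>-class=<path>' directory in {source}")
    # Assumption: if several match, use the one with the lowest step number
    return min(candidates, key=lambda p: int(STEP_DIR.match(p.name).group(1)))


def class_path_of(step_dir: Path) -> str:
    """Extract the dotted class path encoded in a naming-scheme directory name."""
    match = STEP_DIR.match(step_dir.name)
    if match is None:
        raise ValueError(f"{step_dir.name!r} does not follow the naming scheme")
    return match.group(2)
```

Encoding the class path in the directory name lets a loader know which class to import before reading any files inside the directory.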

gordo.serializer.serializer.load_info(source_dir: PathLike | str) → dict

gordo.serializer.serializer.load_metadata(source_dir: PathLike | str) → dict

Load the metadata.json file saved during serializer.dump and return the loaded metadata as a dict, or an empty dict if no file was found.

Parameters: source_dir – Directory of the saved model. As with serializer.load(source_dir), this source_dir can be the top-level directory or the first directory inside the serialized model.
Raises: FileNotFoundError – If a ‘metadata.json’ file isn’t found in or above the supplied source_dir.
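The "in or above" lookup can be sketched as checking source_dir first and then its parent. This is a sketch under assumptions: find_metadata and load_metadata_sketch are hypothetical names, "above" is interpreted as exactly one level up, and the sketch follows the documented Raises entry rather than returning an empty dict.

```python
import json
import os
from pathlib import Path
from typing import Optional, Union


def find_metadata(source_dir: Union[str, os.PathLike]) -> Optional[Path]:
    """Look for metadata.json in source_dir, then in its parent directory."""
    source = Path(source_dir)
    for directory in (source, source.parent):
        candidate = directory / "metadata.json"
        if candidate.is_file():
            return candidate
    return None


def load_metadata_sketch(source_dir: Union[str, os.PathLike]) -> dict:
    """Parse the first metadata.json found in or directly above source_dir."""
    path = find_metadata(source_dir)
    if path is None:
        # Mirrors the documented "Raises: FileNotFoundError" behaviour
        raise FileNotFoundError(f"No metadata.json in or above {source_dir}")
    return json.loads(path.read_text())
```

Searching one level up is what allows the same call to work whether it is given the top-level directory or the naming-scheme subdirectory inside it.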

gordo.serializer.serializer.loads(bytes_object: bytes) → GordoBase

Load a GordoBase model from bytes dumped from gordo.serializer.dumps

Parameters: bytes_object – Bytes to be loaded; should be the result of serializer.dumps(model).
Returns: A custom gordo model, scikit-learn pipeline, or other scikit-learn-like object.

gordo.serializer.serializer.metadata_path(source_dir: PathLike | str) → PathLike | str | None

Return the path to the metadata.json file if it exists, otherwise None.

From Definition

The ability to take a ‘raw’ representation of an object in dict form and load it into a Python object.

gordo.serializer.from_definition.create_instance(fn, **kwargs)

Create a class instance.

Examples

>>> from sklearn.preprocessing import MinMaxScaler
>>> from gordo.serializer.from_definition import create_instance
>>> create_instance(MinMaxScaler, feature_range=[-1, 1])
MinMaxScaler(feature_range=(-1, 1))

Parameters:

fn – Class factory function.
kwargs – fn parameters.
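The example above passes a list but the resulting repr shows a tuple, so some list-to-tuple coercion takes place. One way this could work is sketched below; it is an assumption, not Gordo's documented mechanism. create_instance_sketch is a hypothetical name, and it coerces a list to a tuple only when the matching parameter's default is a tuple (MinMaxScaler's feature_range defaults to (0, 1)).

```python
import inspect
from typing import Any, Callable


def create_instance_sketch(fn: Callable[..., Any], **kwargs: Any) -> Any:
    """Call fn(**kwargs), turning lists into tuples where the default is a tuple."""
    try:
        signature = inspect.signature(fn)
    except (TypeError, ValueError):
        # Some builtins expose no signature; fall back to a plain call
        return fn(**kwargs)
    coerced = {}
    for name, value in kwargs.items():
        param = signature.parameters.get(name)
        # Heuristic (assumption): a tuple default implies a tuple-valued parameter
        if param is not None and isinstance(value, list) and isinstance(param.default, tuple):
            value = tuple(value)
        coerced[name] = value
    return fn(**coerced)
```

Coercion matters because YAML has no tuple type, so values read from a config file always arrive as lists.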

gordo.serializer.from_definition.from_definition(pipe_definition: str | Dict[str, Dict[str, Any]]) → FeatureUnion | Pipeline

Construct a Pipeline or FeatureUnion from a definition.

Example

>>> import yaml
>>> from gordo import serializer
>>> raw_config = '''
... sklearn.pipeline.Pipeline:
...         steps:
...             - sklearn.decomposition.PCA:
...                 n_components: 3
...             - sklearn.pipeline.FeatureUnion:
...                 - sklearn.decomposition.PCA:
...                     n_components: 3
...                 - sklearn.pipeline.Pipeline:
...                     - sklearn.preprocessing.MinMaxScaler
...                     - sklearn.decomposition.TruncatedSVD:
...                         n_components: 2
...             - sklearn.ensemble.RandomForestClassifier:
...                 max_depth: 3
... '''
>>> config = yaml.safe_load(raw_config)
>>> scikit_learn_pipeline = serializer.from_definition(config)

Parameters:

pipe_definition – List of steps for the Pipeline / FeatureUnion
constructor_class – The class to place the list of transformers into: either sklearn.pipeline.Pipeline or sklearn.pipeline.FeatureUnion.

Return type:

pipeline
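The core idea behind turning a definition into objects, importing each dotted path and calling it with its (recursively resolved) keyword arguments, can be sketched as follows. This is a simplified sketch, not Gordo's implementation: locate, build and resolve_params are hypothetical names, only the "path.Class" and {"path.Class": {kwargs}} forms are handled, and scikit-learn's (name, step) tuples and the bare-list forms shown in the YAML example are not. Standard-library classes are used so the sketch runs without scikit-learn.

```python
import importlib
from typing import Any, Dict


def locate(path: str) -> Any:
    """Import "package.module.Name" and return the Name attribute."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def _is_definition(value: Any) -> bool:
    # A definition is a one-key dict whose key looks like a dotted import path
    if not (isinstance(value, dict) and len(value) == 1):
        return False
    key = next(iter(value))
    return isinstance(key, str) and "." in key


def build(definition: Any) -> Any:
    """Instantiate "path.Class" or {"path.Class": {kwargs}}, resolving kwargs recursively."""
    if isinstance(definition, str):
        return locate(definition)()  # bare class path: call with no arguments
    if _is_definition(definition):
        path, params = next(iter(definition.items()))
        return locate(path)(**resolve_params(params or {}))
    raise ValueError(f"Unsupported definition: {definition!r}")


def resolve_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve nested definitions inside a kwargs dict; leave primitives untouched."""
    return {key: _resolve_value(value) for key, value in params.items()}


def _resolve_value(value: Any) -> Any:
    if _is_definition(value):
        return build(value)
    if isinstance(value, list):
        return [_resolve_value(item) for item in value]
    return value
```

Here resolve_params plays a role similar to the one described for load_params_from_definition: only values that look like definitions are instantiated, while plain strings such as "auto" pass through unchanged.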

gordo.serializer.from_definition.load_params_from_definition(definition: dict) → dict

Deserialize each value in a dictionary. Can be used to prepare kwargs for methods.

Parameters: definition – Dictionary whose values are to be deserialized.

Into Definition

The ability to take a Python object, such as a scikit-learn pipeline and convert it into a primitive dict, which can then be inserted into a YAML config file.

gordo.serializer.into_definition.into_definition(pipeline: Pipeline, prune_default_params: bool = False, tuples_to_list: bool = True) → dict

Convert an instance of sklearn.pipeline.Pipeline into a dict definition that can be reconstructed with gordo.serializer.from_definition.

Parameters:

pipeline – Instance of pipeline to decompose
prune_default_params – Whether to omit parameters whose values in the current transformer instances equal the transformers’ default values.
tuples_to_list – Convert all tuples in output to lists

Returns:

The pipeline definition, which can be reconstructed with gordo.serializer.from_definition().

Example

>>> import yaml
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import PCA
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo.serializer.into_definition import into_definition
>>>
>>> pipe = Pipeline([('pca', PCA(4)), ('ae', KerasAutoEncoder(kind='feedforward_model'))])
>>> pipe_definition = into_definition(pipe)  # It is now a standard python dict of primitives.
>>> print(yaml.dump(pipe_definition))
sklearn.pipeline.Pipeline:
  memory: null
  steps:
  - sklearn.decomposition._pca.PCA:
      copy: true
      iterated_power: auto
      n_components: 4
      n_oversamples: 10
      power_iteration_normalizer: auto
      random_state: null
      svd_solver: auto
      tol: 0.0
      whiten: false
  - gordo.machine.model.models.KerasAutoEncoder:
      kind: feedforward_model
  verbose: false
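The reverse direction, with default pruning and tuple-to-list conversion, can be sketched without scikit-learn. Scikit-learn estimators expose their constructor parameters through get_params(); to stay dependency-free, this sketch uses a dataclass's fields as a hypothetical stand-in. to_definition and _decompose are illustrative names, not Gordo's API; _decompose roughly plays the role described for load_definition_from_params.

```python
import dataclasses
from typing import Any


def _class_path(obj: Any) -> str:
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def to_definition(obj: Any, prune_default_params: bool = False, tuples_to_list: bool = True) -> dict:
    """Turn a dataclass instance into {"module.Class": {param: value, ...}}."""
    params = {}
    for field in dataclasses.fields(obj):
        value = getattr(obj, field.name)
        # Optionally skip parameters that still hold their declared default
        if prune_default_params and field.default is not dataclasses.MISSING and value == field.default:
            continue
        params[field.name] = _decompose(value, prune_default_params, tuples_to_list)
    return {_class_path(obj): params}


def _decompose(value: Any, prune: bool, tuples_to_list: bool) -> Any:
    """Recursively convert nested objects and containers into primitives."""
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return to_definition(value, prune, tuples_to_list)
    if isinstance(value, (list, tuple)):
        items = [_decompose(item, prune, tuples_to_list) for item in value]
        # Lists stay lists; tuples become lists only when tuples_to_list is set
        return items if (isinstance(value, list) or tuples_to_list) else tuple(items)
    if isinstance(value, dict):
        return {key: _decompose(item, prune, tuples_to_list) for key, item in value.items()}
    return value
```

Converting tuples to lists by default keeps the output YAML-friendly, since yaml.dump would otherwise emit Python-specific tuple tags.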

gordo.serializer.into_definition.load_definition_from_params(params: dict, tuples_to_list: bool = True) → dict

Recursively decompose each value in params into its definition.

Parameters:

params (dict) – Parameters whose values are decomposed.
tuples_to_list (bool) – Convert all tuples in output to lists.

Return type:

dict

Utils

gordo.serializer.utils.is_tuple_type(tp) → bool

Check if this type is a tuple.

Examples

>>> from typing import Optional, Tuple
>>> from gordo.serializer.utils import is_tuple_type
>>> is_tuple_type(tuple)
True
>>> is_tuple_type(Optional[tuple[int, int]])
True
>>> is_tuple_type(Tuple[str, str])
True
>>> is_tuple_type(list[str])
False

Parameters: tp – Type to check.
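A check that reproduces the behaviour in the examples above can be written with typing.get_origin and typing.get_args. This is a sketch, not necessarily how Gordo implements it; is_tuple_type_sketch is a hypothetical name, and treating a Union as tuple-like when any member is a tuple is an assumption inferred from the Optional[tuple[int, int]] example.

```python
import types
from typing import Any, Union, get_args, get_origin

# typing.Union, plus the X | Y union type on Python 3.10+
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def is_tuple_type_sketch(tp: Any) -> bool:
    """Return True for tuple, tuple[...], Tuple[...], or a Union containing one."""
    if tp is tuple:
        return True
    origin = get_origin(tp)
    if origin is tuple:
        return True  # covers tuple[int, int] and typing.Tuple[str, str]
    if origin in _UNION_ORIGINS:
        # Optional[X] is Union[X, None]; check each member
        return any(is_tuple_type_sketch(arg) for arg in get_args(tp))
    return False
```

Unwrapping Optional matters because constructor parameters that accept a tuple are often annotated as Optional[Tuple[...]] with a None default.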