Serializer
The serializer is the core component that converts a Gordo config file into Python objects, which together build a full ML model capable of being served on Kubernetes.
Keys such as dataset and model in the YAML config represent objects that the serializer (de)serializes to reach this goal.
- gordo.serializer.serializer.dump(obj: object, dest_dir: PathLike | str, metadata: dict | None = None, info: dict | None = None)
Serialize an object into a directory. The object must be pickle-able.
- Parameters:
obj – The object to dump. Must be pickle-able.
dest_dir – The directory in which to save the model.
metadata – Any additional metadata to be saved alongside this model. If it exists, it is returned by the corresponding “load” function, load_metadata().
info – Current revision info. For now, only used for storing “checksum”.
Example
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import PCA
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo import serializer
>>> from tempfile import TemporaryDirectory
>>> pipe = Pipeline([
...     ('pca', PCA(3)),
...     ('model', KerasAutoEncoder(kind='feedforward_hourglass'))])
>>> with TemporaryDirectory() as tmp:
...     serializer.dump(obj=pipe, dest_dir=tmp)
...     pipe_clone = serializer.load(source_dir=tmp)
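The idea of dumping a pickle-able object into a directory together with optional metadata can be sketched with the standard library alone. This is a minimal illustration, not Gordo's implementation: the helper names dump_to_dir and load_from_dir and the file name model.pkl are assumptions made for this sketch; only metadata.json is named in the documentation above.

```python
import json
import pickle
from pathlib import Path
from tempfile import TemporaryDirectory


def dump_to_dir(obj, dest_dir, metadata=None):
    """Pickle obj into dest_dir and write metadata next to it (hypothetical helper)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "model.pkl" is an assumed file name chosen for this sketch.
    with open(dest / "model.pkl", "wb") as handle:
        pickle.dump(obj, handle)
    if metadata is not None:
        with open(dest / "metadata.json", "w") as handle:
            json.dump(metadata, handle)


def load_from_dir(source_dir):
    """Load the object previously written by dump_to_dir (hypothetical helper)."""
    with open(Path(source_dir) / "model.pkl", "rb") as handle:
        return pickle.load(handle)


with TemporaryDirectory() as tmp:
    dump_to_dir({"weights": [1, 2, 3]}, tmp, metadata={"name": "demo"})
    restored = load_from_dir(tmp)
    saved_metadata = json.loads((Path(tmp) / "metadata.json").read_text())
```

Keeping the metadata in a separate JSON file means it can be inspected without unpickling the model itself.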
- gordo.serializer.serializer.dumps(model: Pipeline | GordoBase) → bytes
Dump a model into a bytes representation suitable for loading with gordo.serializer.loads.
- Parameters:
model – A gordo model/pipeline
- Return type:
Serialized model which supports loading via
serializer.loads()
Example
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo import serializer
>>>
>>> model = KerasAutoEncoder('feedforward_symmetric')
>>> serialized = serializer.dumps(model)
>>> assert isinstance(serialized, bytes)
>>>
>>> model_clone = serializer.loads(serialized)
>>> assert isinstance(model_clone, KerasAutoEncoder)
- gordo.serializer.serializer.load(source_dir: PathLike | str) → Any
Load an object from a directory, saved by gordo.serializer.pipeline_serializer.dump.
This takes a directory, which is either top-level, meaning it contains a subdirectory in the naming scheme “n_step=<int>-class=<path.to.Class>”, or the aforementioned naming-scheme directory directly. Returns the deserialized object.
- Parameters:
source_dir – Location of the top-level directory in which the pipeline was saved.
- gordo.serializer.serializer.load_metadata(source_dir: PathLike | str) → dict
Load the metadata.json that was saved during serializer.dump. Returns the loaded metadata as a dict, or an empty dict if no file was found.
- Parameters:
source_dir – Directory of the saved model. As with serializer.load(source_dir), this source_dir can be the top level or the first directory into the serialized model.
- Raises:
FileNotFoundError – If a ‘metadata.json’ file isn’t found in or above the supplied source_dir.
- gordo.serializer.serializer.loads(bytes_object: bytes) → GordoBase
Load a GordoBase model from bytes dumped from gordo.serializer.dumps.
- Parameters:
bytes_object – Bytes to be loaded; should be the result of serializer.dumps(model).
- Return type:
Custom gordo model, scikit learn pipeline or other scikit learn like object.
- gordo.serializer.serializer.metadata_path(source_dir: PathLike | str) → PathLike | str | None
Returns the path to the metadata.json file, if it exists.
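Looking for metadata.json "in or above" a directory can be sketched as a short upward search. The function find_metadata and its levels_up parameter are hypothetical names for this illustration, and the one-level search depth is an assumption; the sketch does not reproduce how load_metadata or metadata_path are implemented.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Optional


def find_metadata(source_dir, levels_up: int = 1) -> Optional[Path]:
    """Return the first metadata.json found in source_dir or its parents.

    Hypothetical helper: checks source_dir itself, then up to levels_up
    parent directories, and returns None when nothing is found.
    """
    current = Path(source_dir)
    for _ in range(levels_up + 1):
        candidate = current / "metadata.json"
        if candidate.is_file():
            return candidate
        current = current.parent
    return None


with TemporaryDirectory() as tmp:
    top = Path(tmp) / "model"
    # Directory name follows the "n_step=<int>-class=<path.to.Class>" scheme.
    step = top / "n_step=000-class=demo.Model"
    step.mkdir(parents=True)
    (top / "metadata.json").write_text(json.dumps({"name": "demo"}))

    found = find_metadata(step)
    found_in_top = found is not None and found.parent == top
    loaded = json.loads(found.read_text())
    missing = find_metadata(Path(tmp) / "elsewhere")
```

Starting from either the top-level directory or the step directory resolves to the same file in this sketch, which mirrors the behaviour described for source_dir.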
From Definition
The ability to take a ‘raw’ representation of an object in dict form
and load it into a Python object.
- gordo.serializer.from_definition.create_instance(fn, **kwargs)
Create a class instance.
Examples
>>> from sklearn.preprocessing import MinMaxScaler
>>> create_instance(MinMaxScaler, feature_range=[-1, 1])
MinMaxScaler(feature_range=(-1, 1))
- Parameters:
fn – Class factory function.
kwargs – fn parameters.
- gordo.serializer.from_definition.from_definition(pipe_definition: str | Dict[str, Dict[str, Any]]) → FeatureUnion | Pipeline
Construct a Pipeline or FeatureUnion from a definition.
Example
>>> import yaml
>>> from gordo import serializer
>>> raw_config = '''
... sklearn.pipeline.Pipeline:
...     steps:
...         - sklearn.decomposition.PCA:
...             n_components: 3
...         - sklearn.pipeline.FeatureUnion:
...             - sklearn.decomposition.PCA:
...                 n_components: 3
...             - sklearn.pipeline.Pipeline:
...                 - sklearn.preprocessing.MinMaxScaler
...                 - sklearn.decomposition.TruncatedSVD:
...                     n_components: 2
...         - sklearn.ensemble.RandomForestClassifier:
...             max_depth: 3
... '''
>>> config = yaml.safe_load(raw_config)
>>> scikit_learn_pipeline = serializer.from_definition(config)
- Parameters:
pipe_definition – List of steps for the Pipeline / FeatureUnion
constructor_class – What to place the list of transformers into, either sklearn.pipeline.Pipeline/FeatureUnion
- Return type:
pipeline
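The general idea of turning a dict keyed by dotted import paths into live objects can be sketched with importlib. This is a simplified illustration under stated assumptions, not Gordo's from_definition: the helpers locate and build are hypothetical, only single-key mappings of the form {"pkg.Class": {param: value}} are handled, and standard-library classes stand in for scikit-learn transformers.

```python
import importlib


def locate(dotted_path: str):
    """Import and return the attribute named by a dotted path (hypothetical helper)."""
    module_name, _, attribute = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attribute)


def build(definition):
    """Recursively replace {"pkg.Class": {kwargs}} mappings with instances."""
    if isinstance(definition, list):
        return [build(item) for item in definition]
    if isinstance(definition, dict) and len(definition) == 1:
        (path, params), = definition.items()
        if isinstance(path, str) and "." in path and isinstance(params, dict):
            cls = locate(path)
            # Parameters may themselves be definitions, so recurse into them.
            return cls(**{name: build(value) for name, value in params.items()})
    return definition


# Standard-library classes stand in for real transformers in this sketch.
definition = {
    "types.SimpleNamespace": {
        "scale": {"fractions.Fraction": {"numerator": 1, "denominator": 3}},
        "window": {"datetime.timedelta": {"hours": 2}},
    }
}
obj = build(definition)
```

Because the definition is plain dicts, lists, and scalars, it can be written in YAML and parsed with yaml.safe_load before being passed to a builder like this one.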
Into Definition
The ability to take a Python object, such as a scikit-learn
pipeline, and convert it into a primitive dict, which can then be inserted
into a YAML config file.
- gordo.serializer.into_definition.into_definition(pipeline: Pipeline, prune_default_params: bool = False, tuples_to_list: bool = True) → dict
Convert an instance of sklearn.pipeline.Pipeline into a dict definition capable of being reconstructed with gordo.serializer.from_definition.
- Parameters:
pipeline – Instance of pipeline to decompose
prune_default_params – Whether to prune transformer parameters whose current values equal their defaults.
tuples_to_list – Convert all tuples in output to lists
- Returns:
definitions for the pipeline, compatible to be reconstructed with
Example
>>> import yaml
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import PCA
>>> from gordo.machine.model.models import KerasAutoEncoder
>>>
>>> pipe = Pipeline([('pca', PCA(4)), ('ae', KerasAutoEncoder(kind='feedforward_model'))])
>>> pipe_definition = into_definition(pipe)  # It is now a standard python dict of primitives.
>>> print(yaml.dump(pipe_definition))
sklearn.pipeline.Pipeline:
  memory: null
  steps:
  - sklearn.decomposition._pca.PCA:
      copy: true
      iterated_power: auto
      n_components: 4
      n_oversamples: 10
      power_iteration_normalizer: auto
      random_state: null
      svd_solver: auto
      tol: 0.0
      whiten: false
  - gordo.machine.model.models.KerasAutoEncoder:
      kind: feedforward_model
  verbose: false
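The effect of prune_default_params and tuples_to_list can be illustrated on a toy class. This sketch is an assumption-laden stand-in rather than Gordo's code: Scaler and to_definition are hypothetical, and constructor parameters are read with inspect.signature instead of whatever mechanism into_definition actually uses.

```python
import inspect


class Scaler:
    """Hypothetical stand-in for a transformer with two constructor parameters."""

    def __init__(self, feature_range=(0, 1), clip=False):
        self.feature_range = feature_range
        self.clip = clip


def to_definition(obj, prune_default_params=False, tuples_to_list=True):
    """Describe obj as {"module.Class": {param: value}} using only primitives."""
    cls = type(obj)
    path = f"{cls.__module__}.{cls.__qualname__}"
    params = {}
    for name, parameter in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        value = getattr(obj, name)
        # Drop parameters that still hold their default value, if requested.
        if prune_default_params and value == parameter.default:
            continue
        # YAML has no tuple type, so tuples are emitted as lists by default.
        if tuples_to_list and isinstance(value, tuple):
            value = list(value)
        params[name] = value
    return {path: params}


scaler = Scaler(feature_range=(-1, 1))
full_params = next(iter(to_definition(scaler).values()))
pruned_params = next(iter(to_definition(scaler, prune_default_params=True).values()))
```

Pruning defaults keeps a generated config short, while the full form records every parameter explicitly, as in the PCA output above.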
Utils
- gordo.serializer.utils.is_tuple_type(tp) → bool
Check if this type is a tuple.
Examples
>>> from typing import Optional, Tuple
>>> is_tuple_type(tuple)
True
>>> is_tuple_type(Optional[tuple[int, int]])
True
>>> is_tuple_type(Tuple[str, str])
True
>>> is_tuple_type(list[str])
False
- Parameters:
tp – The type to check.
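A check of this kind can be written with typing.get_origin and typing.get_args. The function below, looks_like_tuple_type, is a hypothetical re-creation that matches the four documented examples; it is a sketch assuming Python 3.9 or newer, not the library's source.

```python
import types
from typing import Optional, Tuple, Union, get_args, get_origin

# types.UnionType (the "X | Y" form) only exists on Python 3.10+.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def looks_like_tuple_type(tp) -> bool:
    """Return True if tp is tuple, a parametrized tuple, or a union containing one."""
    if tp is tuple:
        return True
    origin = get_origin(tp)
    if origin is tuple:
        return True
    if origin in _UNION_ORIGINS:
        # Optional[X] is Union[X, None], so inspect each member of the union.
        return any(looks_like_tuple_type(arg) for arg in get_args(tp))
    return False


results = [
    looks_like_tuple_type(tuple),
    looks_like_tuple_type(Optional[tuple[int, int]]),
    looks_like_tuple_type(Tuple[str, str]),
    looks_like_tuple_type(list[str]),
]
```

A helper like this is useful when values read from YAML arrive as lists but the target parameter is annotated as a tuple.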