Utils

gordo.server.utils.check_metadata_file(directory: str, name: str)

Check whether the directory with metadata exists, since it might have been deleted through the DELETE endpoint.

gordo.server.utils.dataframe_from_dict(data: dict) → DataFrame

The inverse of the procedure done by multi_lvl_column_dataframe_from_dict(). Reconstructs a MultiIndex column dataframe from a previously serialized one.

Expects data to be a nested dictionary where each top-level key has a value capable of being loaded by pandas.DataFrame.from_dict().

Parameters:

data – Data to be loaded into a MultiIndex column dataframe

Return type:

MultiIndex column dataframe.

Examples

>>> serialized = {
... 'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
...              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
... 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
...              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}
... }
>>> dataframe_from_dict(serialized)  
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7

gordo.server.utils.dataframe_from_parquet_bytes(buf: bytes) → DataFrame

Convert bytes representing a parquet table into a pandas dataframe.

Parameters:

buf – Bytes representing a parquet table. Can be the direct result of gordo.server.utils.dataframe_into_parquet_bytes().

gordo.server.utils.dataframe_into_parquet_bytes(df: DataFrame, compression: str = 'snappy') → bytes

Convert a dataframe into bytes representing a parquet table.

Parameters:
  • df – DataFrame to be compressed

  • compression – Compression to use, passed to pyarrow.parquet.write_table()

gordo.server.utils.dataframe_to_dict(df: DataFrame) → dict

Convert a dataframe that may have a pandas.MultiIndex as columns into a dict where each key is a top-level column name and each value holds the columns under that top-level name. If it’s a simple dataframe, pandas.DataFrame.to_dict() is used.

This allows json.dumps() to be performed, whereas pandas.DataFrame.to_dict() would convert such a multi-level column dataframe into a dict keyed by tuple objects, which are not JSON serializable. The result still ends up working with pandas.DataFrame.from_dict().

Parameters:

df – Dataframe expected to have columns of type pandas.MultiIndex 2 levels deep.

Return type:

Dict representing the dataframe in a ‘flattened’ form.

Examples

>>> import pprint
>>> import pandas as pd
>>> import numpy as np
>>> columns = pd.MultiIndex.from_tuples((f"feature{i}", f"sub-feature-{ii}") for i in range(2) for ii in range(2))
>>> index = pd.date_range('2019-01-01', '2019-02-01', periods=2)
>>> df = pd.DataFrame(np.arange(8).reshape((2, 4)), columns=columns, index=index)
>>> df  
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7
>>> serialized = dataframe_to_dict(df)
>>> pprint.pprint(serialized)
{'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}}

gordo.server.utils.delete_revision(directory: str, name: str)

Delete model revision

Parameters:
  • directory – Revision directory

  • name – Model name

gordo.server.utils.extract_X_y(method)

For a given flask view, will attempt to extract an ‘X’ and ‘y’ from the request and assign it to flask’s ‘g’ global request context

If it fails to extract ‘X’ and (optionally) ‘y’ from the request, it will not run the function but return a BadRequest response notifying the client of the failure.

Parameters:

method – The flask route to decorate. It will return its own response object and will want to use flask.g.X and/or flask.g.y.

Returns:

Either a flask.Response with status code 400 if it fails to extract the X and optionally the y, or the result of running the decorated method, which is also expected to return some sort of flask.Response object.
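
The decorator behaviour described above can be sketched without Flask. Everything here is hypothetical: `g` is a SimpleNamespace standing in for flask.g, the payload is a plain dict instead of a real request, and the `(body, status)` tuple stands in for a flask.Response.

```python
from functools import wraps
from types import SimpleNamespace

# Stand-in for flask.g (assumption, so the sketch runs without Flask).
g = SimpleNamespace()


def extract_X_y_sketch(method):
    """Hypothetical decorator: pull 'X' (required) and 'y' (optional) from a payload."""

    @wraps(method)
    def wrapper(payload):
        if not isinstance(payload, dict) or "X" not in payload:
            # Do not run the view; report the failure with a 400 status.
            return {"error": "Cannot extract 'X' from request"}, 400
        g.X = payload["X"]
        g.y = payload.get("y")  # None when the client sent no 'y'
        return method(payload)

    return wrapper


@extract_X_y_sketch
def view(payload):
    # The view reads the extracted data from the shared context.
    return {"n_samples": len(g.X), "has_y": g.y is not None}, 200


ok = view({"X": [[1, 2], [3, 4]]})
bad = view({"y": [1]})
```
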

gordo.server.utils.load_info(directory: str, name: str) → dict

gordo.server.utils.load_metadata(directory: str, name: str) → dict

Load metadata from a directory for a given model by name.

Parameters:
  • directory – Directory to look for the model’s metadata

  • name – Name of the model to load metadata for, this would be the sub directory within the directory parameter.
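
The directory and name parameters combine into a path of the form `<directory>/<name>`. A minimal sketch of that layout follows, assuming the metadata lives in a file called `metadata.json` inside the model's sub directory; the file name and the `load_metadata_sketch` helper are assumptions for illustration, not confirmed by this page.

```python
import json
import os
import tempfile


def load_metadata_sketch(directory: str, name: str) -> dict:
    """Hypothetical loader: read <directory>/<name>/metadata.json."""
    # The file name 'metadata.json' is an assumption.
    path = os.path.join(directory, name, "metadata.json")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No metadata for model '{name}' in '{directory}'")
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as root:
    # Build a throwaway layout: <root>/my-model/metadata.json
    model_dir = os.path.join(root, "my-model")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "metadata.json"), "w") as f:
        json.dump({"name": "my-model"}, f)

    metadata = load_metadata_sketch(root, "my-model")

    try:
        load_metadata_sketch(root, "missing-model")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```
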

gordo.server.utils.load_model(directory: str, name: str) → BaseEstimator

Load a given model from the directory by name.

Parameters:
  • directory – Directory to look for the model

  • name – Name of the model to load, this would be the sub directory within the directory parameter.

gordo.server.utils.metadata_required(f)

Decorate a view which has gordo_name as a url parameter and will set g.metadata to that model’s metadata

gordo.server.utils.model_required(f)

Decorate a view which has gordo_name as a url parameter and will set g.model to be the loaded model and g.metadata to that model’s metadata

gordo.server.utils.parse_iso_datetime(datetime_str: str) → datetime

gordo.server.utils.validate_gordo_name(gordo_name: str)

The gordo_name argument should contain only alphanumeric characters or ‘-’ symbols.
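
The naming rule can be expressed as a regular expression. This is a sketch under assumptions: `is_valid_gordo_name` is a hypothetical helper that returns a boolean, whereas the real function may signal invalid names differently, and restricting "alphanumeric" to ASCII letters and digits is also an assumption.

```python
import re

# ASCII letters, digits and '-' only; at least one character (assumption).
GORDO_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_gordo_name(gordo_name: str) -> bool:
    """Hypothetical check: True when the whole name matches the pattern."""
    return GORDO_NAME_PATTERN.fullmatch(gordo_name) is not None
```

Using `fullmatch` rather than `match` matters here: it rejects names that merely start with valid characters, such as `my-model/../secret`.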

gordo.server.utils.validate_revision(revision: str) → bool

The general model input/output operations applied by blueprints.

gordo.server.model_io.get_model_output(model: Pipeline, X: ndarray) → ndarray

Get the raw output from the current model given X. Will try to predict and then transform, raising an error if both fail.

Parameters:

X – 2d array of sample(s)

Return type:

The raw output of the model in numpy array form.
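
The predict-then-transform fallback can be sketched with plain Python objects. The classes and `get_output_sketch` below are hypothetical, and catching AttributeError to detect a missing predict() is an assumption about how the fallback is triggered.

```python
class Predicts:
    """Toy model exposing predict() only."""

    def predict(self, X):
        return [sum(row) for row in X]


class OnlyTransforms:
    """Toy model exposing transform() only."""

    def transform(self, X):
        return [[value * 2 for value in row] for row in X]


def get_output_sketch(model, X):
    """Hypothetical helper: try predict(), then fall back to transform()."""
    try:
        return model.predict(X)
    except AttributeError:
        # No usable predict(); an error raised by transform() propagates.
        return model.transform(X)


X = [[1, 2], [3, 4]]
predicted = get_output_sketch(Predicts(), X)
transformed = get_output_sketch(OnlyTransforms(), X)

try:
    get_output_sketch(object(), X)
    both_failed = False
except AttributeError:
    both_failed = True
```
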