Reporters

class gordo.reporters.base.BaseReporter

Bases: ABC

classmethod from_dict(config: Dict[str, Any]) -> BaseReporter

Reconstruct the reporter from a dict representation or a single import path if it doesn’t require any init parameters.

get_params(deep=False)

abstract report(machine: Machine)

Report/log the machine

to_dict() -> dict

Serialize this object into a dict representation, which can be used to initialize a new object after popping ‘type’ from the dict.
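
The dict round trip described for from_dict and to_dict can be illustrated with a minimal sketch. This is not gordo's implementation: build_from_config is a hypothetical helper, and standard-library classes stand in for a reporter so the snippet runs without gordo installed. It assumes the ‘type’ key holds a dotted import path and that all remaining keys are init parameters.

```python
import importlib
from typing import Any, Dict, Union


def build_from_config(config: Union[str, Dict[str, Any]]) -> Any:
    """Hypothetical helper: build an object from a dict or a bare import path."""
    if isinstance(config, str):
        # A bare import path means "no init parameters".
        import_path, kwargs = config, {}
    else:
        kwargs = dict(config)  # copy so the caller's dict is left untouched
        import_path = kwargs.pop("type")
    module_name, _, attr_name = import_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), attr_name)
    return cls(**kwargs)


# Stand-ins for a reporter class: any importable class works for the sketch.
delta = build_from_config({"type": "datetime.timedelta", "hours": 2})
empty = build_from_config("collections.OrderedDict")
```

The dict form carries init parameters next to ‘type’, while the string form only works for classes that can be constructed without arguments.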

exception gordo.reporters.exceptions.ReporterException

Bases: Exception

class gordo.reporters.mlflow.MlFlowReporter(*args, model_builder_class: str | Type[ModelBuilder] | None = None, **kwargs)

Bases: BaseReporter

report(machine: Machine)

Report/log the machine

exception gordo.reporters.mlflow.MlflowLoggingError

Bases: ReporterException

gordo.reporters.mlflow.batch_log_items(metrics: List[Metric], params: List[Param], n_max_metrics: int = 200, n_max_params: int = 100) -> List[Dict[str, Metric | Param]]

Split metrics and params into batches that satisfy the limits imposed by MLflow and AzureML.

NOTE: The default maximum numbers of metrics and params set here are those imposed by AzureML as of 18 February 2020.

The 1 MB request size limit is not checked here, as doing so should not be necessary and cannot be handled succinctly. MLflow also has a limit of 1000 log items per request, but this cannot be reached with AzureML’s current limit on metrics.

Parameters:
  • metrics – List of MLflow Metric objects to log.

  • params – List of MLflow Param objects to log.

  • n_max_metrics – Maximum number of metrics AzureML allows per batch log request payload.

  • n_max_params – Maximum number of params MLflow allows per batch log request payload.

Returns:

List of MlflowClient.log_batch keyword arguments, split into quantities that respect the limits imposed by MLflow and AzureML.
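
The batching described above can be sketched with plain lists. This is an illustration under stated assumptions, not gordo's implementation: batch_items_sketch is a hypothetical name, integers stand in for MLflow Metric and Param objects, and each batch is assumed to be a dict with "metrics" and "params" keys so that it could be passed as keyword arguments to MlflowClient.log_batch.

```python
from math import ceil
from typing import Any, Dict, List


def batch_items_sketch(
    metrics: List[Any],
    params: List[Any],
    n_max_metrics: int = 200,
    n_max_params: int = 100,
) -> List[Dict[str, List[Any]]]:
    """Hypothetical helper: slice both lists so no batch exceeds either limit."""
    # Enough batches to hold whichever list needs the most.
    n_batches = max(
        ceil(len(metrics) / n_max_metrics),
        ceil(len(params) / n_max_params),
    )
    batches = []
    for i in range(n_batches):
        batches.append(
            {
                "metrics": metrics[i * n_max_metrics:(i + 1) * n_max_metrics],
                "params": params[i * n_max_params:(i + 1) * n_max_params],
            }
        )
    return batches


# 450 metrics need three batches; 120 params need two, so the last batch has none.
batches = batch_items_sketch(list(range(450)), list(range(120)))
```

With the default limits, the three batches hold 200, 200 and 50 metrics, and 100, 20 and 0 params.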

gordo.reporters.mlflow.epoch_now() -> int

Get current timestamp in UTC as milliseconds since Unix epoch.

Return type:

Milliseconds since Unix epoch.
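
One way to compute such a timestamp with the standard library is sketched below. The function name epoch_now_sketch is hypothetical, and the body is an assumption about how a value of this kind can be produced, not a copy of gordo's code.

```python
from datetime import datetime, timezone


def epoch_now_sketch() -> int:
    """Hypothetical helper: current UTC time as whole milliseconds since the Unix epoch."""
    # timestamp() returns seconds as a float; scale to milliseconds and truncate.
    return int(datetime.now(timezone.utc).timestamp() * 1000)


now_ms = epoch_now_sketch()
```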

gordo.reporters.mlflow.get_kwargs_from_secret(name: str, keys: List[str]) -> dict

Get keyword arguments dictionary from secrets environment variable

Parameters:

name – Name of the environment variable whose content is a colon-separated list of secrets.

Return type:

Dictionary of keyword arguments parsed from environment variable.
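
A minimal sketch of parsing a colon-separated secret into keyword arguments follows. It is not gordo's implementation. The helper name, the environment variable name, the positional pairing of keys with values, and the empty dict returned for an unset or empty variable are all assumptions made for illustration.

```python
import os
from typing import Dict, List


def kwargs_from_secret_sketch(name: str, keys: List[str]) -> Dict[str, str]:
    """Hypothetical helper: pair each key with the value at the same position."""
    secret = os.environ.get(name, "")
    if not secret:
        # Assumption: an unset or empty variable yields no keyword arguments.
        return {}
    values = secret.split(":")
    if len(values) != len(keys):
        raise ValueError(
            f"Expected {len(keys)} colon-separated values in {name!r}, got {len(values)}"
        )
    return dict(zip(keys, values))


# Hypothetical variable name and placeholder values, in the workspace format
# <subscription_id>:<resource_group>:<workspace_name>.
os.environ["DEMO_WORKSPACE_SECRET"] = "my-subscription:my-group:my-workspace"
workspace_kwargs = kwargs_from_secret_sketch(
    "DEMO_WORKSPACE_SECRET",
    ["subscription_id", "resource_group", "workspace_name"],
)
```

Because the values are split on every colon, a secret that itself contains a colon would not parse correctly under this scheme.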

gordo.reporters.mlflow.get_machine_log_items(machine: Machine) -> Tuple[List[Metric], List[Param]]

Create flat lists of MLflow logging entities from multilevel dictionary

For more information, see the mlflow docs.

Parameters:

machine – Machine to log.

Returns:

  • List of MLFlow Metric objects to log.

  • List of MLFlow Param objects to log.
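
Flattening a multilevel dictionary into key/value pairs can be sketched as below. The flatten helper, the dotted key naming, and the sample data are assumptions for illustration. How gordo names keys, and which entries it turns into Metric or Param objects, is not shown here.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten(nested: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Hypothetical helper: yield (dotted_key, value) pairs for every leaf value."""
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the key path.
            yield from flatten(value, path)
        else:
            yield path, value


# Made-up sample data, not real machine metadata.
sample = {
    "model": {"scores": {"r2": 0.93, "mse": 0.02}, "epochs": 10},
    "name": "demo-machine",
}
flat = dict(flatten(sample))
```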

gordo.reporters.mlflow.get_mlflow_client(workspace_kwargs: dict = {}, service_principal_kwargs: dict = {}) -> MlflowClient

Set remote tracking URI for mlflow to AzureML workspace

Parameters:
  • workspace_kwargs (dict) –

    AzureML Workspace configuration to use for remote MLFlow tracking. An empty dict will result in local logging by the MlflowClient.

    {
         "subscription_id":<value>,
         "resource_group":<value>,
         "workspace_name":<value>
    }
    

  • service_principal_kwargs (dict) –

    AzureML ServicePrincipalAuthentication keyword arguments. An empty dict will result in interactive authentication.

    {
         "tenant_id":<value>,
         "service_principal_id":<value>,
         "service_principal_password":<value>
    }
    

Return type:

Client with tracking uri set to AzureML if configured.

gordo.reporters.mlflow.get_run_id(client: MlflowClient, experiment_name: str, model_key: str) -> str

Get an existing or create a new run for the given model_key and experiment_name.

The model key corresponds to a unique configuration of the model. The corresponding run must be manually stopped using the mlflow.tracking.MlflowClient.set_terminated method.

Parameters:
  • client – Client with tracking uri set to AzureML if configured.

  • experiment_name – Name of experiment to log to.

  • model_key – Unique ID of model configuration.

Return type:

Unique ID of MLflow run to log to.

gordo.reporters.mlflow.get_spauth_kwargs() -> dict

Get AzureML ServicePrincipalAuthentication keyword arguments from the environment

The name of this environment variable is set in the Argo workflow template, and its value should be in the format: <tenant_id>:<service_principal_id>:<service_principal_password>

Returns:

AzureML ServicePrincipalAuthentication keyword arguments. See gordo.reporters.mlflow.get_mlflow_client().

gordo.reporters.mlflow.get_workspace_kwargs() -> dict

Get AzureML Workspace keyword arguments from the environment

The name of this environment variable is set in the Argo workflow template, and its value should be in the format: <subscription_id>:<resource_group>:<workspace_name>.

Returns:

AzureML Workspace configuration to use for remote MLflow tracking. See gordo.reporters.mlflow.get_mlflow_client().

gordo.reporters.mlflow.log_machine(mlflow_client: MlflowClient, run_id: str, machine: Machine)

Send logs to configured MLflow backend

Parameters:
  • mlflow_client – Client instance to call logging methods from.

  • run_id – Unique ID of the MLflow run to log to.

  • machine – Machine to log with MlflowClient.

gordo.reporters.mlflow.mlflow_context(name: str, model_key: str = 'e040fc7030e64d00a2290451a11d6c38', workspace_kwargs: dict = {}, service_principal_kwargs: dict = {})

Generate MLflow logger function with either a local or AzureML backend

Parameters:
  • name – The name of the log group to log to (e.g. a model name).

  • model_key – Unique ID of logging run.

  • workspace_kwargs – AzureML Workspace configuration to use for remote MLflow tracking. See gordo.reporters.mlflow.get_mlflow_client().

  • service_principal_kwargs – AzureML ServicePrincipalAuthentication keyword arguments. See gordo.reporters.mlflow.get_mlflow_client().

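Example

>>> import tempfile
>>> import mlflow
>>> from gordo.reporters.mlflow import log_machine, mlflow_context
>>> # `machine` is assumed to be an existing gordo Machine instance
>>> with tempfile.TemporaryDirectory() as tmp_dir:
...     mlflow.set_tracking_uri(f"file:{tmp_dir}")
...     with mlflow_context("log_group", "unique_key", {}, {}) as (mlflow_client, run_id):
...         log_machine(mlflow_client, run_id, machine)

class gordo.reporters.postgres.Machine(*args, **kwargs)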

Bases: Model

DoesNotExist

alias of MachineDoesNotExist

dataset = <BinaryJSONField: Machine.dataset>
metadata = <BinaryJSONField: Machine.metadata>
model = <BinaryJSONField: Machine.model>
name = <CharField: Machine.name>

class gordo.reporters.postgres.PostgresReporter(host: str, port: int = 5432, user: str = 'postgres', password: str = 'postgres', database: str = 'postgres')

Bases: BaseReporter

Reporter storing the gordo.machine.Machine into a Postgres database.

db = <playhouse.postgres_ext.PostgresqlExtDatabase object>

report(machine: Machine)

Log a machine to Postgres, where the top-level keys ‘name’, ‘dataset’, ‘model’, and ‘metadata’ map to the fields of the Machine table (‘name’ to a CharField, the others to BinaryJSON fields).

Parameters:

machine – Machine to log.

exception gordo.reporters.postgres.PostgresReporterException

Bases: ReporterException