MLOps

The goal of MLOps

  • Measure and monitor the quality of the model.
  • Make the process from model building to production smooth and fast.

Agile Manifesto

DevOps

DevOps is the idea that Developers and Operations (and QA, etc.) work together to achieve a better service and a better product.

  • Eliminating silos

  • DevOps is not a product

  • DevOps is not a job

Machine Learning Operations

  • The term MLOps is defined as "the extension of the DevOps methodology to include Machine Learning and Data Science assets as first-class citizens within the DevOps ecology".

  • "The intention is for MLOps to decorate DevOps rather than differentiate."

  • MLOps Roadmap

What is MLOps?

MLOps community

Silicon

Off-line (batch) vs On-line (streaming) learning

  • Off-line (batch) learning
  • On-line (streaming or live) learning
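A minimal sketch contrasting the two modes, using scikit-learn's SGDClassifier, whose partial_fit method supports incremental updates; the dataset and chunking are made up for illustration:

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Off-line (batch): train once on the full dataset.
batch_model = SGDClassifier(random_state=42)
batch_model.fit(X, y)

# On-line (streaming): update the model chunk by chunk as data arrives.
stream_model = SGDClassifier(random_state=42)
for chunk in np.array_split(np.arange(len(X)), 10):
    stream_model.partial_fit(X[chunk], y[chunk], classes=[0, 1])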

Security question of MLOps

  • poisoning of the model (e.g. a chatbot that learns from real-time data; Microsoft's Tay had to be shut down within 16 hours)

  • private and sensitive data (e.g. gender, religion, sexual orientation, health status)

  • legislation

  • Audit trail of all results! (code, data, parameters, random values, etc.)

Data

  • git
  • git-lfs (large file support)
  • External storage with hash
  • dvc (tracks files by checksum)

MLOps progress

  • code in Jupyter notebook

  • put the code in functions

  • put the functions in modules

  • write tests for these functions (see the sketch after this list)

  • Make sure your results are repeatable (start with the current dataset)
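A sketch of the middle steps, under assumed names: a hypothetical cleaning function moved out of a notebook into a module, with a pytest test for it.

# mymodule.py -- a hypothetical function extracted from a notebook
import pandas as pd

def drop_incomplete_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows that have any missing values."""
    return df.dropna()

# test_mymodule.py -- run with: pytest
import pandas as pd
from mymodule import drop_incomplete_rows

def test_drop_incomplete_rows():
    df = pd.DataFrame({"a": [1, None, 3], "b": [4, 5, 6]})
    result = drop_incomplete_rows(df)
    assert len(result) == 2
    assert result["a"].tolist() == [1, 3]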

Reload modules in Jupyter Notebook

%load_ext autoreload
%autoreload 2
examples/ml/reload.ipynb
mymodule.py
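What the workflow looks like in practice, with hypothetical contents for mymodule.py:

# mymodule.py -- a hypothetical module under development
def clean(text):
    return text.strip().lower()

# In the notebook, after the two magics above:
# import mymodule
# mymodule.clean("  Hello ")     # -> 'hello'
# Edit clean() in mymodule.py and save; the next call picks up
# the change without restarting the kernel.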

Testing ML

  • Create output that is easy to compare by computer (so numerical results are preferable over a graph)

  • Fix randomizations to make the results repeatable

  • Establish thresholds for results using different datasets (and also using different models)
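A minimal sketch of these three points together; the dataset, model, and the 0.9 threshold are assumptions for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Fix the sources of randomness so the run is repeatable.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train, y_train)

# A plain number is easy for a computer to compare (unlike a graph).
accuracy = accuracy_score(y_test, model.predict(X_test))

# Compare against a pre-established threshold; 0.9 is an assumed
# value -- derive yours from baseline runs on each dataset.
assert accuracy > 0.9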

What to track?

  • the code (and the dependencies)

  • the data

  • the artifacts (e.g. models)

  • the experiments and their results.

What are the inputs and what are the artifacts?

  • Data (what kind of data? how does it change? how can developers access it - privacy issues?)

  • Selecting the algorithms

  • Random values as input

  • Hyperparameters

  • The model (a series of numbers? Is it language-agnostic?)

Tooling for MLOps

DVC

Storage can be

  • local disk
  • cloud
  • HDFS

pip install dvc


git init   (creating .git)
dvc init   (creating .dvc and .dvcignore)


dvc remote add -d dvc-remote /tmp/dvc-demo-storage   (changing .dvc/config)


dvc add data/data.csv

git add .
git commit -m "data:  ...."
git tag -a v1 -m v1


dvc push

  • Files are now in /tmp/dvc-demo-storage
  • Files are also in .dvc/cache

dvc pull
dvc status
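To read a specific tagged version of the data back from Python, dvc.api can be used; a minimal sketch assuming the repository layout of the demo above:

import dvc.api

# Stream the v1-tagged version of the data file from the DVC remote,
# without switching the working copy to that revision.
with dvc.api.open('data/data.csv', rev='v1') as fh:
    print(fh.readline())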

Data Pipelines (Workflow management)

Workflow management

MLFlow

  • Tracking
  • Projects
  • Models
  • Model Registry

MLFlow Tracking server backends

Entity Metadata Store

  • FileStore (mlruns directory)
  • SQLStore (via SQLAlchemy - PostgreSQL, MySQL, SQLite)
  • MLFlow Plugins Scheme
  • Managed MLFlow on Databricks

Artifact Store

  • Local Filesystem (mlruns directory)
  • S3
  • Azure blob
  • Google Cloud Storage
  • DBFS (Databricks File System) artifact repo

MLFlow Tracking

  • Parameters: key-value input to the code (learning rate, what loss function is used, number of filters to use, depth of the tree)
  • Metrics: numeric values
  • Tags and Notes: information about a run (free text)
  • Artifacts: files, data, model
  • Source: what code ran?
  • Version: which version of the code?
  • Run: an instance of code
  • Experiment: several Runs
with mlflow.start_run():
    mlflow.log_param("name", value)
    mlflow.log_params(params)       # log a dict of parameters at once
    ...
    mlflow.log_metric("name", value)
    ...
    mlflow.sklearn.log_model(model, "model")

mlflow ui

MLFlow Projects

Package data-science code to enable reproducible runs on any platform

  • Code
  • Dependencies
  • Data
  • Configuration
$ mlflow run ...
mlflow.run()
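A hedged sketch of the Python API; the project URI, entry point, and parameter are hypothetical:

import mlflow

# Run a project straight from a Git repository; MLflow reads its
# MLproject file, prepares the declared environment, and executes
# the entry point.
submitted = mlflow.run(
    uri="https://github.com/example/ml-project",
    entry_point="main",
    parameters={"alpha": 0.5},
)
submitted.wait()   # block until the run finishes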

MLFlow Models

  • Deploy models in different environments

Input:

  • TensorFlow
  • scikit-learn
  • R
  • Spark
  • ML Frameworks

Standardized MLFlow model format

Output:

  • docker
  • Spark
  • Serving tools

The model is saved as a directory containing an MLmodel file that describes its flavors.

mlflow.<flavor>.save_model(...)  or log_model(...)   # e.g. mlflow.sklearn
mlflow.<flavor>.load_model(...)
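With a concrete flavor, scikit-learn, the round trip looks like this; the path "my_model" is illustrative:

import mlflow.sklearn
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[0], [1], [2]], [0, 1, 2])

# save_model creates the standardized directory (with its MLmodel file).
mlflow.sklearn.save_model(model, "my_model")

# Any tool that understands the format can load the model back.
loaded = mlflow.sklearn.load_model("my_model")
print(loaded.predict([[3]]))   # ~[3.]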

Resources

import pandas as pd
import mlflow
import mlflow.sklearn
import dvc.api

path = 'data/data.csv'
repo = ''      # path to git repository
version = 'v1' # git sha1, branch, or tag

data_url = dvc.api.get_url(
    path=path,
    repo=repo,
    rev=version,
)

mlflow.set_experiment('demo')

df = pd.read_csv(data_url)

mlflow.log_param('data_url', data_url)
mlflow.log_param('data_version', version)
mlflow.log_param('input_rows', df.shape[0])
mlflow.log_param('input_cols', df.shape[1])


# train_y comes from a train/test split elided in this excerpt
cols_y = pd.DataFrame(list(train_y.columns))
cols_y.to_csv('features.csv', header=False, index=False)

mlflow.log_artifact('features.csv')

Goals of SCM

  • SCM = Software configuration management

  • Reproducibility

  • Change management

MLOps notes

  • logging

  • metrics

  • data-pipelines

  • Data is changing (new types of data, or the same kind of data but a newer dataset)

  • Model

  • Monitor the quality of the model over time

  • The standard metrics: precision and recall in classification, accuracy, F-measure (F1); see the sketch after this list

  • data quality

  • model decay (due to changes in the data that are not used to re-train the model)

  • locality (using the same model on a different set of data, e.g. a different cluster of customers)

  • Distributed learning
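A minimal sketch of computing the monitoring metrics listed above with scikit-learn; the label vectors are made up for illustration:

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# y_true: labels observed in production; y_pred: model predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))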