Introduction
TensorFlow Serving is a flexible, high-performance serving system for machine learning models designed for production environments. This guide shows how to train a simple model and serve this model with TensorFlow Serving using Docker. It also shows how to serve many models simultaneously.
Prerequisites
Working knowledge of Python.
Properly installed and configured Python toolchain, including pip.
Deploy the Docker instance
Use Rcs One-Click Docker to deploy your Docker instance. This guide uses the Ubuntu version.
After the deployment, log in to the server via SSH.
Update the machine and reboot it to apply the updates.
# apt-get update && apt-get dist-upgrade -y
# reboot
It is recommended to use virtual environments when working on Python projects. Install the package that provides the Python venv module, or use any other method you prefer.
On Ubuntu 20.04:
# apt-get install python3.8-venv
On Ubuntu 22.04:
# apt-get install python3.10-venv
After rebooting, switch to the docker user and download the TensorFlow Serving image.
# su -l docker -s /usr/bin/bash
$ docker pull tensorflow/serving
Create the Example Model
Train a model using the Fashion-MNIST dataset, which consists of 60,000 training examples and 10,000 test examples in ten classes, where each sample is a 28x28-pixel grayscale image of a piece of clothing.
This guide does not detail model training or do any parameter tuning; the focus is on TensorFlow Serving rather than on the model itself.
Set up a virtual environment and install the necessary Python packages.
$ python3 -m venv tfserving-venv
$ . tfserving-venv/bin/activate
$ pip install tensorflow requests
Create a file model.py with the following code, which loads and prepares the data.
import tensorflow as tf
from tensorflow import keras
# load data
fashion_mnist = keras.datasets.fashion_mnist
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
# scale data values to [0,1] interval
train_data = train_data / 255.0
test_data = test_data / 255.0
# reshape data to represent 28x28 pixels grayscale images
train_data = train_data.reshape(train_data.shape[0], 28, 28, 1)
test_data = test_data.reshape(test_data.shape[0], 28, 28, 1)
Build a CNN with a single convolutional layer (8 filters of size 3x3) and an output layer with ten units.
model = keras.Sequential([
    keras.layers.Conv2D(input_shape=(28, 28, 1), filters=8, kernel_size=3,
                        strides=2, activation="relu", name="Conv1"),
    keras.layers.Flatten(),
    keras.layers.Dense(10, name="Dense")
])
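As a quick sanity check of this architecture, the output shape and parameter counts can be worked out by hand (a sketch assuming the Conv2D defaults, i.e. valid padding):

```python
# Shape/parameter arithmetic for the model above.
# Conv1: 3x3 kernel, 1 input channel, 8 filters, stride 2, valid padding
conv_out = (28 - 3) // 2 + 1       # spatial size after the convolution
conv_params = 3 * 3 * 1 * 8 + 8    # kernel weights + one bias per filter
flat = conv_out * conv_out * 8     # units produced by Flatten
dense_params = flat * 10 + 10      # weights + biases of the output layer

print(conv_out, conv_params, flat, dense_params)  # 13 80 1352 13530
```

These numbers should match what model.summary() reports for the layers above.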
Train the model.
epochs = 3
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[keras.metrics.SparseCategoricalAccuracy()])
model.fit(train_data, train_labels, epochs=epochs)
Save the model.
import os
MODEL_DIR = "/home/docker/tfserving-fashion"
VERSION = "1"
export_path = os.path.join(MODEL_DIR, VERSION)
tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)
Run the code in model.py.
$ python model.py
...
Epoch 1/3
1875/1875 [==============================] - 5s 3ms/step - loss: 0.5629 - sparse_categorical_accuracy: 0.8033
Epoch 2/3
1875/1875 [==============================] - 5s 3ms/step - loss: 0.4119 - sparse_categorical_accuracy: 0.8566
Epoch 3/3
1875/1875 [==============================] - 5s 3ms/step - loss: 0.3708 - sparse_categorical_accuracy: 0.8696
You can visualize the model information with the saved_model_cli command.
$ saved_model_cli show --dir ~/tfserving-fashion/1 --all
Start the server
Start the server using Docker.
$ docker run -t --rm -p 8501:8501 \
-v "/home/docker/tfserving-fashion:/models/fashion" \
-e MODEL_NAME=fashion \
tensorflow/serving &
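The REST API listens on port 8501, and its URLs follow a fixed pattern. A small sketch of how the status and prediction endpoints are built (localhost is a placeholder for your server IP or FQDN):

```python
HOST = "localhost"  # placeholder: replace with your server IP or FQDN

def status_url(model, host=HOST, port=8501):
    # GET this URL to check whether the model loaded successfully
    return f"http://{host}:{port}/v1/models/{model}"

def predict_url(model, host=HOST, port=8501):
    # POST a JSON body {"instances": [...]} to this URL for predictions
    return f"http://{host}:{port}/v1/models/{model}:predict"

print(status_url("fashion"))   # http://localhost:8501/v1/models/fashion
print(predict_url("fashion"))  # http://localhost:8501/v1/models/fashion:predict
```

The model name in the URL is whatever was passed as MODEL_NAME to the container, here fashion.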
How to Make Queries
Create a file query.py with the following code, which loads the data, sets the query header, and defines the query data as the first example in the test dataset.
import json
import requests
import numpy as np
from tensorflow import keras
fashion_mnist = keras.datasets.fashion_mnist
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
# scale data values to [0,1] interval
train_data = train_data / 255.0
test_data = test_data / 255.0
# reshape data to represent 28x28 pixels grayscale images
train_data = train_data.reshape(train_data.shape[0], 28, 28, 1)
test_data = test_data.reshape(test_data.shape[0], 28, 28, 1)
data = json.dumps({"instances": test_data[0:1].tolist()})
headers = {"content-type": "application/json"}
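The instances key carries a batch of inputs in the shape the model expects. A minimal sketch with a dummy all-zeros image shows the round trip through JSON:

```python
import json
import numpy as np

# a dummy batch of one 28x28x1 "image", the same shape the model expects
dummy = np.zeros((1, 28, 28, 1))
payload = json.dumps({"instances": dummy.tolist()})

# the server decodes this back into a batch of shape (1, 28, 28, 1)
decoded = np.array(json.loads(payload)["instances"])
print(decoded.shape)  # (1, 28, 28, 1)
```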
Make the query and get the response. Remember to change the address to your server IP or FQDN.
json_response = requests.post("http://<<SERVER IP/FQDN>>:8501/v1/models/fashion:predict",
                              data=data, headers=headers)
predictions = json.loads(json_response.text)["predictions"]
The predictions variable contains a list with the model's scores for each of the 10 classes (raw logits, since the output layer has no softmax activation). The greatest of these values corresponds to the class predicted by the model. The function np.argmax returns the index of that class in the predictions variable.
print("Real class:", class_names[test_labels[0]])
print("Predicted class:", class_names[np.argmax(predictions[0])])
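Because softmax is monotonic, taking np.argmax over the raw logits picks the same class as taking it over the probabilities. A tiny illustration with made-up logit values:

```python
import numpy as np

# made-up logits for the ten classes; the largest value is at index 2
logits = [-1.2, 0.3, 4.7, 0.0, -2.1, 1.5, 0.2, -0.5, 2.2, 3.9]
print(np.argmax(logits))  # 2
```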
Run query.py, and you should see the result of this query.
$ python3 query.py
...
Real class: Ankle boot
Predicted class: Ankle boot
You can also query more than one example in each request. Change the data variable to, for example, the first five samples in the test data, and change the output of query.py to show the classification of each of the five queries.
data = json.dumps({"instances": test_data[0:5].tolist()})
...
for i in range(len(predictions)):
    print("Query", str(i))
    print("Real class:", class_names[test_labels[i]])
    print("Predicted class:", class_names[np.argmax(predictions[i])])
Running query.py again gives the result of the query with five examples.
$ python3 query.py
...
Query 0
Real class: Ankle boot
Predicted class: Ankle boot
Query 1
Real class: Pullover
Predicted class: Pullover
Query 2
Real class: Trouser
Predicted class: Trouser
Query 3
Real class: Trouser
Predicted class: Trouser
Query 4
Real class: Shirt
Predicted class: Shirt
Serving more than one model
TensorFlow Serving is also capable of serving more than one model simultaneously. This time, use some of the pre-trained models available in the TensorFlow Serving GitHub repository.
Install Git if it is not installed yet.
# apt-get install git
And clone the repository.
$ git clone https://github.com/tensorflow/serving
Create a file named models.config with the content below, which configures two models, defining for each one its name, its directory (relative to the Docker container running the server), and its model platform.
model_config_list {
  config {
    name: 'half_plus_two'
    base_path: '/models/saved_model_half_plus_two_cpu'
    model_platform: 'tensorflow'
  }
  config {
    name: 'half_plus_three'
    base_path: '/models/saved_model_half_plus_three'
    model_platform: 'tensorflow'
  }
}
The models halve a number and then add two or three, respectively.
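A minimal reference implementation of what these toy models compute, assuming y = x/2 + 2 and y = x/2 + 3 as their names suggest:

```python
def half_plus_two(x):
    # halves the input and adds two
    return x / 2 + 2

def half_plus_three(x):
    # halves the input and adds three
    return x / 2 + 3

inputs = [1.0, 2.0, 3.0]
print([half_plus_two(x) for x in inputs])    # [2.5, 3.0, 3.5]
print([half_plus_three(x) for x in inputs])  # [3.5, 4.0, 4.5]
```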
Start the server, loading this configuration file and mounting the volume with the pre-trained models.
$ docker run -t --rm -p 8501:8501 \
-v "/home/docker/serving/tensorflow_serving/servables/tensorflow/testdata/:/models/" \
-v "/home/docker/models.config:/tmp/models.config" \
tensorflow/serving \
--model_config_file=/tmp/models.config
You can view each model's status with an HTTP GET request in the browser or using curl.
$ curl http://<<SERVER IP/FQDN>>:8501/v1/models/half_plus_two
{
  "model_version_status": [
    {
      "version": "123",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": ""
      }
    }
  ]
}
$ curl http://<<SERVER IP/FQDN>>:8501/v1/models/half_plus_three
{
  "model_version_status": [
    {
      "version": "123",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": ""
      }
    }
  ]
}
And request predictions from both models with HTTP POST requests using curl:
$ curl -d '{"instances": [1.0, 2.0, 3.0]}' \
-X POST http://<<SERVER IP/FQDN>>:8501/v1/models/half_plus_two:predict
{
"predictions": [2.5, 3.0, 3.5]
}
$ curl -d '{"instances": [1.0, 2.0, 3.0]}' \
-X POST http://<<SERVER IP/FQDN>>:8501/v1/models/half_plus_three:predict
{
"predictions": [3.5, 4.0, 4.5]
}
Conclusion
This guide covered how to train a TensorFlow model and how to serve it with TensorFlow Serving and Docker, both on its own and alongside other models.
More information
To learn more about TensorFlow Serving, go to the developer documentation pages: