
How to deploy Keras model with TensorFlow Model Server using docker container

Hello good people! In this tutorial we will learn to serve our Keras model with TensorFlow Model Server. I don't know if it's the best approach, but it is something that I have done in one of my projects. We will also wrap the server in a Docker container and use a Flask server to expose an API for making predictions.


Before we begin


Whenever we train a model, the training process might take hours, several days, or even weeks. Therefore we can't afford to retrain the model on each application run. A better approach is to save the model into a binary file which can be reused in the future, even if the application dies, without having to spend all that time training again.
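As a minimal sketch of this idea (the file name and layer sizes here are illustrative, not from the project), saving a Keras model to disk and reloading it on later runs looks like:

```python
import os

import tensorflow as tf

MODEL_FILE = "my_model.h5"  # illustrative file name

if os.path.isfile(MODEL_FILE):
    # a saved model exists: reuse it instead of training again
    model = tf.keras.models.load_model(MODEL_FILE)
else:
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # model.fit(X, Y, epochs=...)  # train on real data here
    model.save(MODEL_FILE)  # persist architecture + weights in one .h5 file
```

On the first run the model is built and saved; every run after that skips straight to load_model.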


Saving Keras model in binary format


The Keras model can be saved in the .h5 format. One example of saving a Keras model is:


import os

import tensorflow as tf
from tensorflow.keras.models import model_from_json


class SpellModel:

    def __init__(self):

        self.modelDir = "models"
        self.modeWtFile = "modelWt.h5"
        self.modelFileName = "model.h5"
        self.modelJsonName = "model.json"
        self.exportFolder = "exports"
        self.exportTF = "exportTF"

        self.exportTF_DIR = self.modelDir+"/"+self.exportTF
        self.modelExports = self.modelDir+"/"+self.exportFolder
        self.modelWt = self.modelExports+"/"+self.modeWtFile
        self.modelH5 = self.modelExports+"/"+self.modelFileName
        self.modelJson = self.modelExports+"/"+self.modelJsonName

        self.model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(2, 20)),
            tf.keras.layers.Dense(128, activation=tf.nn.relu),
            tf.keras.layers.Dense(64, activation=tf.nn.relu),
            tf.keras.layers.Dense(20, activation=tf.nn.relu),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax)
        ])
        # the model must be compiled before fit(); optimizer/loss here are typical choices
        self.model.compile(optimizer='adam',
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])

    # train the model
    def train(self):

        print("training spell model")
        # check if the model file is present
        if not os.path.isfile(self.modelJson):

            X, Y, input_shape = self.loadAndPreporcess()
            self.X = X
            self.Y = Y
            self.model.fit(X, Y, epochs=150)
            # serialize model architecture to JSON
            model_json = self.model.to_json()
            with open(self.modelJson, "w") as json_file:
                json_file.write(model_json)
            # serialize weights to HDF5
            self.model.save_weights(self.modelWt)
            # save the full model (architecture + weights) for export
            self.model.save(self.modelH5)
            print("Saved model to disk")

        else:
            # model file is there, hence let's load the model from the file
            # load json and create model
            with open(self.modelJson, 'r') as json_file:
                loaded_model_json = json_file.read()
            self.model = model_from_json(loaded_model_json)
            # load the weights into the new model
            self.model.load_weights(self.modelWt)
            print("Loaded model from disk")


Here the train function trains the model and saves it to a file. If the file is already there, the model is loaded from the file instead. This approach lets us reuse our trained model and saves us the time and money of training it again and again.
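The JSON-plus-weights split that the train function relies on can be seen in isolation (the layer sizes here are arbitrary; in the tutorial the weights come from the .h5 file rather than a live model):

```python
import tensorflow as tf
from tensorflow.keras.models import model_from_json

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(3,)),
])

arch_json = model.to_json()              # architecture only, no weights
restored = model_from_json(arch_json)    # rebuild the same architecture
restored.set_weights(model.get_weights())  # weights are restored separately
```

to_json captures only the network structure, which is why the saved weights file is needed alongside it to fully restore the model.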


Export Keras model to TensorFlow protocol buffers


Now let us export this model to TensorFlow protocol buffers. Let us create an export function:


    def exportModel(self):
        # load the full model saved by train() and export it for serving;
        # save_keras_model appends a timestamped version directory to the path
        model = tf.keras.models.load_model(self.modelH5)
        export_path = self.exportTF_DIR

        saved_to_path = tf.contrib.saved_model.save_keras_model(
            model, export_path)
        print("Model exported to", saved_to_path)


Here we have saved our model with the help of the save_keras_model function. Once done, our folder structure looks something like this:


tensorflow serving folder structure
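For reference, an export produced by save_keras_model typically looks like the following (the version directory is a timestamp, so its name will differ on your machine):

```
models/
└── exportTF/
    └── 1565239861/          # version directory (timestamp)
        ├── saved_model.pb
        ├── assets/
        └── variables/
            ├── variables.data-00000-of-00001
            └── variables.index
```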



Dockerizing the TensorFlow server


Now we need to serve our model. In order to do so, let's create a Dockerfile:


FROM python:3.7-stretch

RUN apt-get update -y
RUN apt-get upgrade -y

RUN apt-get install -y make
RUN apt-get install -y software-properties-common
RUN add-apt-repository main
RUN apt-get install -y build-essential
RUN apt-get install -y nano

# installing python packages and dependencies
COPY ./requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
# pip dependencies end

# setting up tensorflow model server
RUN echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list
RUN curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -

RUN apt-get update && apt-get install -y tensorflow-model-server
# tensorflow model server setup complete

COPY . /app

# expose 9000 for the REST API
EXPOSE 9000

# Run the project
CMD tensorflow_model_server --model_base_path=/app/models/exportTF --rest_api_port=9000 --model_name=ItemClassifier


We can add some pip dependencies in requirements.txt as:
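The exact packages and version pins depend on your project; a minimal, illustrative requirements.txt covering the code in this tutorial might be:

```
flask
numpy
requests
tensorflow
```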



Here we have installed the TensorFlow Model Server. In order to serve the model, we have used the following command:

tensorflow_model_server --model_base_path=/app/models/exportTF --rest_api_port=9000 --model_name=ItemClassifier


Here ItemClassifier is our model name. The TensorFlow Model Server will look for the model binary files in the exportTF directory, and it listens for REST API calls on port 9000. You can build the Docker image and try it. I have used a docker-compose.yaml and built the Docker image as:

docker-compose up --build

You should see TensorFlow Serving listening on port 9000. Now all that is left is to test it by making an API request to the TensorFlow server.


API serving from flask


The reason we need a Flask server is that when we make a request to the TF server, we need to pass the params in the appropriate format. The value of the params here is the array of data to be predicted. Hence it will be much simpler for us if we can use existing Python libraries to do the array manipulation. You can use any other Python code to do so.

Here I have created a predict API in Flask as:

import json

import numpy as np
import requests
from flask import Flask, request

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    word = request.form['word'].upper()
    arr = word2vec(word, TYPES["RESTAURANT"])  # word2vec and TYPES are project helpers
    arr = arr.flatten()
    arr = arr.reshape(2, 20)
    arr = np.array(arr, dtype=float)/255
    arr = arr.astype('float16')
    payload = {
        "instances": [arr.tolist()]
    }
    # sending post request to TensorFlow Serving server
    r = requests.post(
        'http://tf_server:9000/v1/models/ItemClassifier:predict', json=payload)
    print("Content ==> ", r.content)
    pred = json.loads(r.content.decode('utf-8'))
    predIndex = np.argmax(pred['predictions'][0])
    print("Predicted ==> ", predIndex)
    return json.dumps(pred['predictions'][0])
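It can help to see the "instances" payload format that the TF Serving REST API expects on its own, outside of Flask; the 2x20 shape here mirrors the model input above, and the zero array stands in for real preprocessed data:

```python
import json

import numpy as np

arr = np.zeros((2, 20), dtype=np.float16)  # one preprocessed input (dummy data)
payload = {"instances": [arr.tolist()]}    # a batch containing one instance
body = json.dumps(payload)                 # what actually goes over the wire
```

Each entry of "instances" must match the model's input shape; sending several entries lets the server predict a whole batch in one request.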


Here we pass our payload by converting our NumPy array to a list. We make the request to the TF server via the http://tf_server:9000/v1/models/ItemClassifier:predict URL. Here ItemClassifier is the model name that was served from the TF server; we defined it in the Dockerfile above. tf_server is the name of the service that was linked from our Flask server container. To make this clearer, let's look at the docker-compose.yaml of our Flask app:

version: '3'

services:
  # The flask app service
  flask_app:
    build: .
    ports:
      - "5000:5000"
    links:
      - "tf_server"

  # The tensor flow service
  tf_server:
    image: tf_server_tfserver


tf_server_tfserver is the name of the Docker image of our TF server. This image has been linked, which is why we can make API requests to the TF server. If we don't link the TF server service, the Flask app container will not be able to make API requests to it. If you are interested in knowing more, please read the official Docker docs:




Thank you for reading!! If you find any mistakes or bad practices, please don't hesitate to notify me.

Cheers !!!



Feature Image Credit:

Photo by Franck V. on Unsplash