Deploying pre-trained Universal Sentence Encoder model on cloud

Yashna Shravani
4 min read · Jun 5, 2021

This article describes a step by step procedure to serve a model in production using Docker, TensorFlow Serving and Openshift.


What is Universal Sentence Encoder (USE)?

USE is a pre-trained model developed by Google that converts text into a fixed-size 512-dimensional vector, widely used in downstream NLP tasks such as sentence similarity, classification and sentiment analysis. For this article we will use the transformer-based large USE, version 5. Feel free to use your own TensorFlow model for deployment.

What is Docker?

Docker seamlessly packages your application with all its dependencies so it can run on any platform that supports Docker, irrespective of the machine configuration used for development. Let’s see how to start your server locally in less than 60 seconds using Docker.

Let us begin…

Creating directory structure

In order for TensorFlow Serving to understand which model and version to deploy, create a directory structure and save your model files as shown below: <model-name>/<version>/<model-files>. You can assign any meaningful name and version.

Directory structure
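If you do not already have the SavedModel files, one way to get them is TF Hub’s compressed download. This is only a sketch: the ?tf-hub-format=compressed URL parameter is an assumption about the hub endpoint, and the file names shown are simply those typical of a TensorFlow SavedModel.

# create the <model-name>/<version> folders and download the model into them
# (the download URL format is an assumption)
mkdir -p use-large/1
wget "https://tfhub.dev/google/universal-sentence-encoder-large/5?tf-hub-format=compressed" -O use-large.tar.gz
tar -xzf use-large.tar.gz -C use-large/1

# resulting layout:
# use-large/
# └── 1/
#     ├── saved_model.pb
#     └── variables/
#         ├── variables.data-00000-of-00001
#         └── variables.index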

Why TensorFlow Serving? With TensorFlow Serving we get a production-ready server that makes model deployment, maintenance and version control really simple, thereby minimizing the overhead involved in using Flask, FastAPI, etc. Check out the official link for more info.

Running Locally

Pull the latest TensorFlow Serving image from your public or private registry. If you are using the official Docker registry, ignore the <registry-url> part.

docker pull <registry-url>/tensorflow/serving

Now start your own server locally using the command below, where <pwd> is the path containing the use-large directory.

docker run -it --rm -p 8501:8501 -v <pwd>:/models -e MODEL_NAME=use-large tensorflow/serving
Voila! Test your server by using Postman or curl to hit the API, e.g.:
curl -d '{"instances": ["How you doing?"]}' -X POST http://localhost:8501/v1/models/use-large:predict

The API sends a 512-dimensional vector encoding as the response.
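The REST API wraps the encodings in a predictions field, one 512-float list per input sentence. The values below are placeholders for illustration only, not real model output:

{"predictions": [[0.0123, -0.0456, ..., 0.0789]]}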

The docker run command starts the container and exposes the API locally for model inference. Let’s go over the parameters one by one.

-it Starts the container in interactive mode.
--rm Automatically removes the container when it exits.
-p 8501:8501 Maps <host-port>:<container-port>. The tensorflow/serving image exposes its REST API on container port 8501, so my local machine’s port 8501 (you can choose any available local port; see the example after this list) listens for requests and forwards them to the running container.
-v <pwd>:/models Binds the local volume (i.e. the model directory we created before) to /models within the container.
-e MODEL_NAME=use-large Sets the environment variable that tells TensorFlow Serving which model under /models to load.
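For instance (a hypothetical remapping, not part of the original setup), to expose the same server on host port 9000:

# host port 9000 is just an example; the container still listens on 8501
docker run -it --rm -p 9000:8501 -v <pwd>:/models -e MODEL_NAME=use-large tensorflow/serving
curl -d '{"instances": ["How you doing?"]}' -X POST http://localhost:9000/v1/models/use-large:predict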

This way we can mount the model into the container, which handles the incoming HTTP requests and sends the model inference back as the response. But this does not make my model’s results available to the rest of the world. This is where Openshift helps us. It is built on top of Kubernetes and is recommended for data scientists, as with a few clicks or commands we can deploy our dockerized app without writing YAML files.

Deploying on Cloud (Openshift)

Now let’s write a small instruction file called a Dockerfile to create our customized tensorflow/serving image, which already has the model files copied in and hence does not require the -v parameter for running the container.

Make a tiny adjustment to the directory structure and add the Dockerfile.

Updated directory structure
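Assuming the use-large folder now sits under a model/ directory next to the Dockerfile (an assumption that matches the COPY model/ /models line below), the layout would look roughly like this:

.
├── Dockerfile
└── model/
    └── use-large/
        └── 1/
            ├── saved_model.pb
            └── variables/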

# DOCKERFILE
# use the TensorFlow Serving image as the base image
FROM <private-registry-url>/tensorflow/serving
# copy the contents of <pwd>/model from local to /models in the container
COPY model/ /models

On the terminal, build the updated image, named use_serving:
docker build -t use_serving .

Let’s check whether this new image works the same way locally by starting the container and making an API request:
docker run -it --rm -p 8501:8501 -e MODEL_NAME=use-large use_serving

If the results are the same, let’s push this image to your registry, from where Openshift can pull it.

Github for code == Container registry for docker images

Before pushing, first tag your image with the registry URL:
docker tag use_serving <registry-url>/use_serving:v1
It is good practice to add a version after the colon; otherwise the default tag is latest.
Finally, push the image:
docker push <registry-url>/use_serving:v1

Using the Openshift CLI we can quickly:

  • Deploy the image and create application
    oc new-app <registry-url>/use_serving:v1 --name=my-use-app -e MODEL_NAME=use-large
  • Expose the service
    oc expose svc/my-use-app

Now, using the browser, log in to the Openshift cluster and head over to the Routes section. A URL will have been created; make sure it is connected to port 8501 of the container. Check the pod status and increase RAM or CPU cores if needed.
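The same checks can also be made from the CLI (a sketch using standard oc commands; my-use-app is the application name created above):

oc status                 # overview of the deployment, service and route
oc get pods               # confirm the pod is in Running state
oc get route my-use-app   # shows the URL exposed for the service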

Finally, hit the cloud server and share the URL to generate model results, e.g.:
curl -d '{"instances": ["How you doing?","Any plans for tonight?"]}' -X POST <openshift-url>/v1/models/use-large:predict

Before pushing the image to the Docker registry, make sure to log in:
docker login <registry-url>
Similarly for Openshift, before creating an app, do:
oc login <cluster-url>

Do check this link for advanced configuration like batch prediction, monitoring and multiple model deployment, to name a few.
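As a taste of the multiple model deployment case, TensorFlow Serving can read a model config file instead of the MODEL_NAME variable. The sketch below assumes a second, hypothetical model directory my-other-model placed next to use-large under /models:

# models.config (my-other-model is a hypothetical second model)
model_config_list {
  config {
    name: "use-large"
    base_path: "/models/use-large"
    model_platform: "tensorflow"
  }
  config {
    name: "my-other-model"
    base_path: "/models/my-other-model"
    model_platform: "tensorflow"
  }
}

The server would then be started with --model_config_file=/models/models.config in place of -e MODEL_NAME=use-large.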

Any feedback would be much appreciated! :)
