Deploying pre-trained Universal Sentence Encoder model on cloud
This article describes a step-by-step procedure to serve a model in production using Docker, TensorFlow Serving and OpenShift.
What is Universal Sentence Encoder (USE)?
USE is a pre-trained model developed by Google that converts textual data into a fixed-size 512-dimensional vector, widely used in downstream NLP tasks like sentence similarity, classification, sentiment analysis, etc. For the purpose of this article we will be using the transformer-based large USE, version 5. Feel free to use your own TensorFlow model for deployment.
What is Docker?
Docker seamlessly packages your application with all its dependencies, so it can run on any platform that supports Docker, irrespective of the machine configuration used for development. Let’s see how to start your server locally in less than 60s using Docker.
Let us begin…
Creating directory structure
In order for TensorFlow Serving to understand which model and version to deploy, create a directory structure and save your model files as shown below: <model-name>/<version>/<model-files>. You can assign any meaningful name and version.
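As a quick sketch, the layout above can be created programmatically. The names below ("use-large", version "5") follow this article's example and are otherwise arbitrary; the files listed in the comment are what a TensorFlow SavedModel export would place there.

```python
import os

# Lay out the structure TensorFlow Serving expects:
# <model-name>/<version>/<model-files>
model_dir = os.path.join("use-large", "5")
os.makedirs(model_dir, exist_ok=True)

# A SavedModel exported into this directory would contain e.g.:
#   use-large/5/saved_model.pb
#   use-large/5/variables/
print(model_dir)
```

TensorFlow Serving treats each numeric sub-directory as a model version and serves the highest one by default.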
Why TensorFlow Serving?: With TensorFlow Serving we get a production-ready server that makes model deployment, maintenance and version control really simple, thereby minimizing the overhead involved in using Flask, FastAPI, etc. Check out the official link for more info.
Running Locally
Pull the latest TensorFlow Serving image from a public or private registry. If you are using the official Docker registry, omit the <registry-url> part.
docker pull <registry-url>/tensorflow/serving
Now start your own server locally using the command below, where <pwd> is the path containing the use-large directory.
docker run -it --rm -p 8501:8501 -v <pwd>:/models -e MODEL_NAME=use-large tensorflow/serving
Voila! Test your server by hitting the API with Postman or curl, e.g.:
curl -d '{"instances": ["How you doing?"]}' -X POST http://localhost:8501/v1/models/use-large:predict
The docker run command starts the container and exposes the API locally for model inference. Let’s go over the parameters one by one.
-it
Starts the container in interactive mode.
--rm
Automatically removes the container when it exits.
-p 8501:8501
The tensorflow/serving image exposes the REST API on port 8501. The format is <host-port>:<container-port>: my local machine’s port 8501 (you can choose any available port) will listen for all requests, which are passed to the running container.
-v <pwd>:/models
Binds the contents of the local volume (i.e. the model directory we created before) to /models within the container.
-e
Sets an environment variable; here MODEL_NAME tells TensorFlow Serving which model under /models to serve.
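Once the container is running, the endpoint can also be queried from Python instead of curl. A hedged sketch, assuming the model name "use-large" and port 8501 from the docker run command above; `build_predict_request` is a hypothetical helper, not part of TensorFlow Serving.

```python
import json
from urllib import request

# Matches the REST endpoint exposed by the container above.
PREDICT_URL = "http://localhost:8501/v1/models/use-large:predict"

def build_predict_request(sentences):
    """Build the JSON body TensorFlow Serving's REST predict API expects."""
    return json.dumps({"instances": sentences}).encode("utf-8")

def predict(sentences):
    """POST the sentences to the running container and return the embeddings."""
    req = request.Request(
        PREDICT_URL,
        data=build_predict_request(sentences),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # Response shape: {"predictions": [[...512 floats...], ...]}
        return json.loads(resp.read())["predictions"]

body = build_predict_request(["How you doing?"])
print(body)  # b'{"instances": ["How you doing?"]}'
```

Calling `predict(["How you doing?"])` against the running container should return one 512-dimensional vector per input sentence.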
This way we can mount the model to the container, which handles the incoming HTTP requests and sends the model inference as the response. But this does not make the model’s results available to the rest of the world. This is where OpenShift helps us. It is built on top of Kubernetes and is recommended for data scientists, as with a few clicks or commands we can deploy our dockerized app without writing YAML files.
Deploying on Cloud (OpenShift)
Now let’s write a small instruction file called a Dockerfile to create our customized tensorflow/serving image, which already has the model files copied in and hence does not require the -v parameter when running the container.
Make a tiny adjustment to the directory structure and add the Dockerfile.
#DOCKERFILE
# Use TensorFlow Serving as the base image
FROM <private-registry-url>/tensorflow/serving

# Copy the contents of <pwd>/model from local to /models in the container
COPY model/ /models
On the terminal, build the updated image, called use_serving:
docker build -t use_serving .
Let’s check that this new image works the same way locally by starting the container and making an API request:
docker run -it --rm -p 8501:8501 -e MODEL_NAME=use-large use_serving
If the results are the same, let’s push this image to your registry, from where OpenShift can pull it.
Github for code == Container registry for docker images
Before pushing, first tag your image with the registry URL:
docker tag use_serving <registry-url>/use_serving:v1
It is good practice to add a version after the colon; otherwise the default value is latest.
Finally push the image.
docker push <registry-url>/use_serving:v1
Using the OpenShift CLI we can quickly:
- Deploy the image and create application
oc new-app <registry-url>/use_serving:v1 --name=my-use-app -e MODEL_NAME=use-large
- Expose the service
oc expose svc/my-use-app
Now log in to the OpenShift cluster using the browser and head over to the Routes section. A URL will have been created; make sure it is connected to port 8501 of the container. Check the pod status and increase RAM or CPU cores if needed.
Finally, hit the cloud server and share the URL to generate model results, e.g.:
curl -d '{"instances": ["How you doing?","Any plans for tonight?"]}' -X POST <openshift-url>/v1/models/use-large:predict
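With two sentences per request, a downstream task like the sentence similarity mentioned earlier reduces to cosine similarity between the returned vectors. A minimal sketch, using tiny 3-dimensional stand-ins for the real 512-dimensional USE embeddings (the fake response below mimics the server's {"predictions": [...]} shape):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Stand-in for the JSON body the predict endpoint returns:
fake_response = {"predictions": [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]}
vec1, vec2 = fake_response["predictions"]
print(round(cosine_similarity(vec1, vec2), 4))  # 0.7071
```

In practice you would parse the real response from the OpenShift route and feed the two 512-dimensional vectors into the same function.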
Before pushing an image to the Docker registry, make sure to log in:
docker login <registry-url>
Similarly for OpenShift, before creating an app, do:
oc login <cluster-url>
Do check this link for advanced configuration like batch prediction, monitoring, and multiple-model deployment, to name a few.
Any feedback would be much appreciated! :)