Service versioning and rollbacks

Mar 17, 2020

Titan Tutorial #6: Managing service versions for a price prediction model

Version Control Systems are of paramount importance in all types of software development, including of course the AI/ML models we work with on a daily basis. Version control applies to any practice that tracks and provides control over changes to source code and, in the case of Data Science, also to datasets.

In this new tutorial, we will detail our approach to service versioning in order to manage deployed services.

When designing Titan, we took a very clear approach to versioning that can be summarized in two main ideas:

Minimize the impact on our users' current workflow, allowing them to continue using the VCS (Version Control System) of their choice (GitHub, GitLab, BitBucket…)
Provide effective means to manage the versioning of the deployed services.

With those principles in mind, Titan offers a simple and seamless approach with an easy-to-use built-in versioning system for the deployed models.

In our product, version control is built upon a naming convention:

Jupyter Notebook files with the same name are treated as different versions of the same service in successive deployments.

Since this mechanism “freezes” both the model and the dataset(s) it uses at a given state, it ensures reproducibility and makes it easy to move between different versions of the service.
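To make the naming convention more concrete, here is a minimal sketch of how deployments that share a file name could map to successive version tags. The `VersionRegistry` class, the notebook names and the snapshot contents are all hypothetical; this is a toy illustration of the idea, not Titan's actual implementation.

```python
class VersionRegistry:
    """Toy model of name-based versioning (hypothetical, not Titan's code)."""

    def __init__(self):
        # Maps a notebook file name to the ordered list of frozen snapshots.
        self._versions = {}

    def deploy(self, notebook_name, snapshot):
        """Store a frozen snapshot and return its version tag (v1, v2, ...)."""
        history = self._versions.setdefault(notebook_name, [])
        history.append(snapshot)
        return "v{}".format(len(history))

    def versions(self, notebook_name):
        """List the version tags recorded for a notebook name."""
        count = len(self._versions.get(notebook_name, []))
        return ["v{}".format(i + 1) for i in range(count)]


registry = VersionRegistry()
# Same file name twice: the second deployment becomes a new version.
first_tag = registry.deploy("price-prediction.ipynb", {"hidden_units": 32})
second_tag = registry.deploy("price-prediction.ipynb", {"hidden_units": 1000})
# A different file name starts its own version history.
other_tag = registry.deploy("sentiment.ipynb", {"hidden_units": 16})
print(first_tag, second_tag, other_tag)
```

The key design point is that the file name acts as the service identity, so no extra version flag is needed at deployment time.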

Creating a Neural Network model for Real Estate price prediction

Let’s illustrate this with a specific example. Imagine we are building a simple Neural Network to predict housing prices using this well-known dataset and the example shown in this great post by Joseph Lee Wei En.

The inputs for the model are:

Lot Area (in sq ft)
Overall Quality (scale from 1 to 10)
Overall Condition (scale from 1 to 10)
Total Basement Area (in sq ft)
Number of Full Bathrooms
Number of Half Bathrooms
Number of Bedrooms above ground
Total Number of Rooms above ground
Number of Fireplaces
Garage Area (in sq ft)

The output (the prediction) is the following:

Is the house price above the median or not? (1 for yes and 0 for no)
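As a rough sketch of how such a binary target can be derived from raw sale prices, the snippet below labels each price against the median. The function name and the sample prices are illustrative assumptions, as is the choice of a strict "greater than" comparison; the dataset used in this tutorial already ships with this column.

```python
from statistics import median


def above_median_labels(prices):
    """Return 1 for prices strictly above the median, 0 otherwise."""
    threshold = median(prices)
    return [1 if price > threshold else 0 for price in prices]


# Illustrative prices only; these are not rows from the real dataset.
sample_prices = [120000, 150000, 180000, 210000, 300000]
labels = above_median_labels(sample_prices)
print(labels)
```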

After splitting the data, we can start preparing a simple Neural Network to perform the prediction:

Input Layer (size=10)
Hidden Layer #1 (size=32)

Activation: ReLU

Hidden Layer #2 (size=32)

Activation: ReLU

Output Layer (size=1)

Activation: Sigmoid
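To get a feel for the size of this architecture, we can count its trainable parameters: each fully connected layer has one weight per input-unit pair plus one bias per unit. The helper names below are our own and are only meant as a quick sanity check.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


def count_params(layer_sizes):
    """Total trainable parameters for consecutive dense layers."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


# Input (10) -> hidden (32) -> hidden (32) -> output (1)
total = count_params([10, 32, 32, 1])
print(total)  # 352 + 1056 + 33 = 1441
```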

For this model, we will use Keras to define the Neural Network (NN). Defining our NN is as easy as:

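from keras.models import Sequential
from keras.layers import Dense

# Two hidden layers of 32 ReLU units and a single sigmoid output
model = Sequential([
    Dense(32, activation='relu', input_shape=(10,)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid'),
])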

Once it has been defined, we can configure the model:

model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])

The parameters for the training are the following:

optimizer='sgd'

We will use the stochastic gradient descent (SGD) optimizer.

loss='binary_crossentropy'

The loss will be measured using binary cross-entropy.

metrics=['accuracy']

And, finally, accuracy will be the chosen performance metric.
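For readers who want to see what these two quantities measure, here is a small pure-Python sketch of binary cross-entropy and of accuracy with a 0.5 decision threshold. It is a simplified stand-in for what Keras computes internally; the function names, the clipping constant and the sample values are our own assumptions.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that fall on the correct side of the threshold."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == (y == 1))
    return hits / len(y_true)


# Illustrative labels and sigmoid outputs, not results from the tutorial's model.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]
loss = binary_cross_entropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(round(loss, 4), acc)
```

Note how the third sample (a positive predicted at 0.4) both lowers the accuracy and contributes the largest share of the loss.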

After the configuration, we are ready to train the model:

hist = model.fit(X_train, Y_train,
          batch_size=32, epochs=100,
          validation_data=(X_val, Y_val))
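To put `batch_size=32` and `epochs=100` in perspective, the sketch below estimates how many weight updates such a training run performs, assuming one update per mini-batch and a final partial batch that is still used. The training set size is a hypothetical figure, since it depends on how the data was split.

```python
import math


def weight_updates(n_samples, batch_size=32, epochs=100):
    """Return (batches per epoch, total gradient updates)."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# 1000 training rows is a hypothetical figure used only for illustration.
steps, total_updates = weight_updates(1000)
print(steps, total_updates)
```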

In order to assess the performance, we will extract the loss and the accuracy and expose them through Titan so that they can be monitored:

# GET /loss 
print("Model Loss is {} ".format(model.evaluate(X_test, Y_test)[0]))

In a different cell we expose the accuracy:

# GET /accuracy 
print("Model Accuracy is {} ".format(model.evaluate(X_test, Y_test)[1]))

Finally, we can prepare an additional endpoint to return predictions based on specific input data:

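# POST /prediction
import json
import numpy as np

# The request body is a JSON string holding the input rows under 'data'
body = json.loads(REQUEST)['body']
input_params = json.loads(body)['data']
input_array = np.array(input_params)
model.predict(input_array)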
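The parsing steps in this endpoint imply a nested payload: an outer JSON document whose `body` field is itself a JSON string containing a `data` list. The sketch below builds such a request by hand and decodes it the same way. The `extract_rows` helper, the length check and the feature values are our own illustrative assumptions, and we assume `data` holds one list of ten features per house.

```python
import json

# Hypothetical feature values, in the order of the ten model inputs.
sample_features = [8450, 7, 5, 856, 2, 1, 3, 8, 0, 548]

# Assumed request shape: the outer JSON holds a "body" string that is itself JSON.
REQUEST = json.dumps({"body": json.dumps({"data": [sample_features]})})


def extract_rows(request):
    """Decode the nested payload and return the list of feature rows."""
    body = json.loads(request)["body"]
    rows = json.loads(body)["data"]
    for row in rows:
        if len(row) != 10:
            raise ValueError("each row needs exactly 10 features")
    return rows


rows = extract_rows(REQUEST)
print(rows)
```

Validating the row length before calling the model gives callers a clearer error than a shape mismatch raised deep inside the prediction step.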

Now we are almost ready to deploy the model. Before the deployment, and as shown in our previous tutorial, we need to provision the environment for the service we are about to deploy. Doing this is as easy as creating an additional markdown cell at the beginning of our notebook:

```yaml
titan: v1
service:
  image: tensorflow
  machine:
    cpu: 2
    memory: 2048MB
```

Please note that in the config cell above we are requesting 2 CPUs and 2 GB of memory (2048MB), and that we will be using the tensorflow environment.

Once this has been added, we can proceed to the deployment using:

$ titan deploy

If the deployment works as expected, the service will be created and it will be visible in the dashboard:

As shown in the picture, the deployed service has been automatically tagged as v1.

Now that the API has been deployed, we can check the loss and accuracy in the GET endpoints we defined previously:

Model Accuracy is 0.8493150472640991
Model Loss is 0.34327148111987876

And, of course, we can make a call to the /prediction endpoint to check that the service is working.

“Improving” our model

Imagine now that we want to improve the performance of our Neural Network by adding additional layers and neurons:

model = Sequential([
    Dense(1000, activation='relu', input_shape=(10,)),
    Dense(1000, activation='relu'),
    Dense(1000, activation='relu'),
    Dense(1000, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
hist = model.fit(X_train, Y_train, batch_size=32, epochs=100,
                 validation_data=(X_val, Y_val))

We deploy the model again using $ titan deploy, without changing the filename of our notebook, so that Titan registers it as a new version of the same service. After deploying, we can see the new version of the service in the dashboard:

As can be seen, it is easy to keep track of the different versions of a service in the detail view of the dashboard:

It is now possible to check the performance of the newly deployed service:

Model Accuracy is 0.8995434045791626
Model Loss is 0.25629456089511854

At first glance, version 2 of the service seems to work better (higher accuracy and lower loss). However, after a closer look, the team discovers that this version is badly overfitted and is probably delivering poor predictions on new data.
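One common way to spot this kind of overfitting is to compare the training and validation loss recorded during training, which Keras exposes under the `loss` and `val_loss` keys of `hist.history` when validation data is passed to `fit`. The sketch below applies a simple gap check to plain dictionaries of that shape. The helper names, the tolerance and the loss curves are made up for illustration and are not measurements from this tutorial.

```python
def generalization_gap(history):
    """Final validation loss minus final training loss."""
    return history["val_loss"][-1] - history["loss"][-1]


def looks_overfitted(history, tolerance=0.1):
    """Flag a run whose validation loss ends well above its training loss."""
    return generalization_gap(history) > tolerance


# Made-up curves for illustration; they are not measurements from this tutorial.
small_model = {"loss": [0.60, 0.40, 0.35], "val_loss": [0.62, 0.43, 0.37]}
large_model = {"loss": [0.50, 0.20, 0.05], "val_loss": [0.52, 0.45, 0.70]}
print(looks_overfitted(small_model), looks_overfitted(large_model))
```

A fixed tolerance is a crude heuristic; in practice, plotting both curves over the epochs usually tells the story more clearly.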

Titan Rollbacks

In our attempt to improve the accuracy of our model, we have inadvertently worsened its performance.

Fortunately, Titan makes it easy to go back to a previous version of the service with no downtime during the transition.

To make a rollback, we just have to run:

$ titan rollback

With a single command we have been able to re-deploy a previous version without affecting the availability of our endpoints!
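Conceptually, a rollback without downtime can be pictured as moving an "active version" pointer back while every deployed version stays in place. The sketch below models that idea with a hypothetical `ServiceVersions` class and placeholder handlers; it is only a mental model under our own assumptions and does not describe how Titan implements rollbacks internally.

```python
class ServiceVersions:
    """Toy model of a service with an active version pointer (hypothetical)."""

    def __init__(self):
        self.deployed = []   # handlers in deployment order: v1, v2, ...
        self.active = None   # index of the version currently serving requests

    def deploy(self, handler):
        """Register a new version and make it the active one."""
        self.deployed.append(handler)
        self.active = len(self.deployed) - 1

    def rollback(self):
        """Point traffic at the previous version; nothing is undeployed."""
        if self.active is None or self.active == 0:
            raise RuntimeError("no earlier version to roll back to")
        self.active -= 1

    def handle(self, request):
        """Serve a request with whichever version is currently active."""
        return self.deployed[self.active](request)


service = ServiceVersions()
service.deploy(lambda request: "v1 prediction")
service.deploy(lambda request: "v2 prediction")
before = service.handle({})
service.rollback()
after = service.handle({})
print(before, "->", after)
```

Because the switch is a single pointer update, there is always some version able to answer requests, which is the property that matters for availability.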

As usual, you can clone this repository to tinker with the code.

Wrap-up

In this tutorial we have seen how Titan makes it easy for Data Science teams to manage their deployed services while keeping their usual code versioning tools. We have also seen how to quickly roll back to a previous version with Titan's rollback function in case something goes wrong with a deployed model.

Next Tutorial

If you are enjoying our tutorials, make sure to check out our next post, where we build and deploy a basic Sentiment Analysis model.

Foreword

Titan can help you to radically reduce and simplify the effort required to put AI/ML models into production, enabling Data Science teams to be agile, more productive and closer to the business impact of their developments.

If you want to know more about how to start using Titan or to get a free demo, please visit our website or drop us a line at info@akoios.com.

If you prefer, you can schedule a meeting with us here.