Titan Tutorial #10: A basic pipeline for Machine Learning
Ever since its inception, every detail and feature of Titan has been designed and built with interoperability in mind.
To facilitate integration into any corporate architecture, Titan is agnostic both to the underlying cloud (public or on-premises) and to the other applications and pipelines it may need to connect with.
Continuous Integration and Continuous Delivery/Deployment (CI/CD) is an umbrella term for the (automated) processes used to build, package and deploy all types of applications, including Machine Learning ones.
Using CI/CD services brings several important benefits for the life-cycle management of our applications (an AI/ML model in our case), such as:
- Fewer human errors in repetitive tasks
- Faster release cycles
- Tight integration with the source code repository
A generic structure of a CI/CD pipeline is shown in the following picture:
Our first CI/CD model
To understand how Titan can be used in these pipelines, we will work through a very simple process, as shown in the figure below:
As can be seen, our pipeline will do the following:
- Connect to the source code repository
- Run a linter to identify errors, bugs and bad coding practices (for our example we will use flake8-nb, the Jupyter Notebook version of flake8)
- Once the code has passed the checks, deploy it using Titan
In terms of source code, we will use the simplest possible model, a Hello World, to illustrate the example.
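As a sketch, the single cell of helloworld.ipynb could be as simple as the following (the exact content of the notebook is not shown in this post, so this is an assumption; any minimal script would flow through the pipeline the same way):

```python
# Single cell of helloworld.ipynb (content is an assumption for
# illustration; the pipeline treats any notebook the same way)
message = "Hello World"
print(message)
```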
Let’s start with GitLab’s service for CI/CD: GitLab CI. As with other CI/CD services, the pipeline is configured simply by defining a YAML specification of its steps.
Note that you need a GitLab repository in order to be able to apply CI/CD!
The best way to understand how this all works is to go straight to the YAML specification:
Let’s analyze the structure of the file:
First of all, we define the stages which will form the pipeline. In our case, we will have just two: the linting and the deployment.
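In a .gitlab-ci.yml file, that declaration is just a top-level list (the stage names are taken from the jobs described in this post):

```yaml
stages:
  - lint
  - deploy
```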
After that, we can define each of the jobs.
# Lint the Jupyter Notebook
lint:
  image: python
  stage: lint
  script:
    # Install Linter
    - pip install flake8-nb
    # Run Linter
    - flake8-nb helloworld.ipynb
This first job, called lint, uses a Python image in the GitLab environment and is linked to the previously defined lint stage through the
stage: lint line.
The commands in this job are quite simple:
- Install the linter in the GitLab environment
- Run the linter
Note that, if this stage fails, the pipeline will be stopped and it won’t proceed to the deploy stage.
We proceed in the same way with the deploy stage:
# Deploy stage will deploy our Titan service
deploy:
  stage: deploy
  script:
    # Install Titan CLI
    - curl -sf https://install.akoios.com/beta | sh
    # Deploy Notebook API service
    - titan deploy --image scipy helloworld.ipynb
As for the linting, we create a new job, called deploy, which is linked to the deploy stage. The process is quite similar: we first install Titan and then run our well-known command:
$ titan deploy
You might be wondering how we can make Titan work without prior authentication. The trick is that GitLab enables the use of secret environment variables to this end, allowing us to run Titan without compromising our credentials.
In the CI/CD settings of the GitLab repository it is possible to define these variables:
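Once defined there, a secret is exposed to every job as a regular environment variable. For illustration, assuming the credential is stored under a variable named TITAN_TOKEN (the actual variable name the Titan CLI expects is not shown in this post), the deploy job can rely on it implicitly:

```yaml
deploy:
  stage: deploy
  script:
    # Fail fast if the secret was not configured in the CI/CD settings
    - test -n "$TITAN_TOKEN"
    # TITAN_TOKEN is injected by GitLab (and masked in job logs), so the
    # CLI can authenticate without any credentials stored in the repo
    - curl -sf https://install.akoios.com/beta | sh
    - titan deploy --image scipy helloworld.ipynb
```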
That would be it! Now, every time a commit is made to the master branch of the repository, the defined pipeline will be automatically started and will run the stages and jobs we have defined.
When the process is finished, we will have our model running in Titan as expected, as we can see in the dashboard:
Imagine now that we want to check that the linter is doing its job. To do that, we will introduce a syntactic mistake in our code (for instance, a missing quote in a string).
If we commit and push the changes, the CI process will start again but, as shown below, it will fail as expected:
Checking the logs at GitLab CI, we see that the error returned by the linter is the following:
You can find all the code in this GitLab Repository.
GitHub Actions Implementation
GitHub Actions is GitHub’s approach to CI and works in a very similar fashion to GitLab CI.
Similarly, GitHub Actions uses a YAML configuration file to create our pipelines:
As in the previous example with GitLab, we define several jobs, specifying for each of them its environment setup, variables and tasks to perform.
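As an illustrative sketch (the job layout, action versions and the secret name are assumptions, not the exact file from the repository), an equivalent workflow could look like this:

```yaml
# .github/workflows/titan.yml — illustrative sketch only
name: lint-and-deploy
on:
  push:
    branches: [master]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - run: pip install flake8-nb
      - run: flake8-nb helloworld.ipynb

  deploy:
    needs: lint        # runs only if the linting job succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: curl -sf https://install.akoios.com/beta | sh
      - run: titan deploy --image scipy helloworld.ipynb
        env:
          # Secret name is an assumption; define it in the repository's
          # Settings → Secrets so it never appears in the source code
          TITAN_TOKEN: ${{ secrets.TITAN_TOKEN }}
```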
Likewise, the management of Titan’s authentication credentials is also handled using secret variables:
You can find all the code in this GitHub repository.
In this post we have seen how to make use of Titan from two different CI/CD services: GitLab CI and GitHub Actions.
By using Titan in these types of services, it is possible to easily create pipelines that automate processes and reduce human error. Moreover, this capability makes it easier to integrate Titan into different IT architectures and existing infrastructures.
Thanks for reading!
Titan can help you to radically reduce and simplify the effort required to put AI/ML models into production, enabling Data Science teams to be agile, more productive and closer to the business impact of their developments.
If you want to know more about how to start using Titan or getting a free demo, please visit our website or drop us a line at firstname.lastname@example.org.
If you prefer, you can schedule a meeting with us here.