
How to Manage Machine Learning Models


Developing a good machine learning model is not straightforward; it is an iterative process that involves many steps. Data scientists usually start by building a so-called baseline, which serves as a reference point for comparing other models. This baseline can be as simple as predicting the average or using a simple model. After that, a data scientist will typically try different models to see how they perform before tuning the hyper-parameters of the most promising ones. Even if those models achieve good results, there are still plenty of options for improvement, such as adding data (pre-)processing steps, creating additional features, applying some form of dimensionality reduction or even using stacking strategies.
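To make the idea of a baseline concrete, here is a minimal sketch in plain Python. The target values are made up for illustration and the helper function is our own, not part of any library. The baseline ignores all features and always predicts the mean of the training targets.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Made-up target values, split into a training and a test part.
y_train = [21.0, 24.5, 19.0, 30.5, 25.0]
y_test = [22.0, 27.0, 18.0]

# The baseline always predicts the mean of the training targets.
baseline_prediction = mean(y_train)
baseline_mae = mean_absolute_error(y_test, [baseline_prediction] * len(y_test))

print(f"baseline prediction: {baseline_prediction:.2f}")
print(f"baseline MAE: {baseline_mae:.2f}")
```

Any model that cannot beat this error on the same test data is not worth tuning further.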

Have a look at our 2022 update comparing frameworks for machine learning experiment tracking and see how MLflow, ClearML, and DAGsHub hold up!

Clearly this is an exploratory process that requires expertise and flexible tooling. Data scientists therefore mostly use notebooks to quickly try new ideas, rapidly train models and compare them, most of the time using simple print statements. This works well at first, but becomes confusing as the number of models and parameters increases. A common approach is to write the prediction results to a table or dataframe, but even then it is difficult to track all important information such as the hyper-parameters used, data sources, time of execution, etc.
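The table approach could look roughly like the following sketch. The function name log_run, the file name housing.csv and all metric values are made up for illustration. Each run is stored together with its parameters, data source and execution time.

```python
from datetime import datetime, timezone

runs = []  # one dict per trained model


def log_run(model_name, params, data_source, metrics):
    """Append one experiment record, including when it was executed."""
    record = {
        "model": model_name,
        "params": dict(params),
        "data_source": data_source,
        "metrics": dict(metrics),
        "executed_at": datetime.now(timezone.utc).isoformat(),
    }
    runs.append(record)
    return record


# Hypothetical results, only there to fill the table.
log_run("mean_baseline", {}, "housing.csv", {"mae": 6.6})
log_run("ridge", {"alpha": 1.0}, "housing.csv", {"mae": 3.4})
log_run("ridge", {"alpha": 0.1}, "housing.csv", {"mae": 3.1})

# Pick the run with the lowest mean absolute error.
best = min(runs, key=lambda r: r["metrics"]["mae"])
print(best["model"], best["params"], best["metrics"]["mae"])
```

This is manageable for three runs, but it already shows the weak spot: every notebook needs to remember to call the logging function, and nothing enforces a consistent schema.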

While big companies such as Google, Facebook and Uber develop custom machine learning platforms to support data scientists in this challenge, smaller projects have also emerged in recent months, for example DVC, Sacred or Databricks' MLflow. While we are currently evaluating these, we also tested another alternative named ModelDB earlier this year. This blog article was written during that evaluation in May. While ModelDB might not be the best choice at this point in time, its evaluation explains the concept of machine learning model management and will serve as a baseline in a follow-up article in this series.

ModelDB: Architecture and installation

ModelDB was developed at the Computer Science and Artificial Intelligence Laboratory at MIT. It integrates tightly with scikit-learn and SparkML and offers additional visualisation tools to evaluate model performance. It consists of a backend that stores the data, a frontend for visualisation and some client libraries. The easiest way to install it is to clone the git repo and use docker-compose to set up the infrastructure. Besides the backend and frontend containers you will find one that includes a MongoDB instance, but at the time of writing it is not really used; instead, all data is stored in a SQLite database within the backend container.

The backend service is implemented using the interface definition language Apache Thrift and compiled to Java. This way, the backend service provides REST endpoints to access the data within the SQLite database. The frontend service runs on Node.js using the Express framework, with Backbone and Vega to display charts.

To interact with these services, ModelDB provides clients for two ML frameworks: scikit-learn and SparkML. Sadly, compiling the SparkML client with Scala 2.13 does not work.

Installing the scikit-learn client, in contrast, works via pip: pip install modeldb. However, if you want to follow this tutorial, you will need to build it from source until my pull request gets accepted. So clone the modeldb repo and run client/python/

Train a Regression model

ModelDB already provides some examples, which mostly cover classification tasks, so let’s try some regression. We will use the Boston Housing dataset that ships with scikit-learn. If you want to run the notebook yourself, you can find it on GitHub. I’ll walk you through it below.

First, let’s make some imports and check the data:

The dataset has 14 columns: 13 are attributes and one is the target variable. They do not contain any nulls and are all stored as numbers, even though CHAS is actually a Boolean value.

Let’s see what a simple linear regression without ModelDB looks like:
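The notebook code is not reproduced here. As a stand-in, the following is a minimal sketch of such a regression with plain scikit-learn. It assumes synthetic data from make_regression with 13 features instead of the housing data, because load_boston has been removed from recent scikit-learn releases. The error it prints will therefore differ from the figures quoted in this article.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with 13 features (an assumption, not the housing data).
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit an ordinary least squares model and predict the held-out targets.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
print(f"mean absolute error: {mae:.3f}")
```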

On the Boston Housing data, this gives a mean absolute error of 2.499. To do the same using ModelDB, we first need to import the library and then create a syncer object by providing a project, an experiment and an experiment-run object (if you get an error, make sure your Docker containers are up and running).

After that, we can reuse the linear regression code from above with a few minor changes: we do not use the scikit-learn classes directly, but through ModelDB, which extends them with functions named *_sync. These functions tell the syncer object to keep track of the calculated object. You could even minimise the changes by overriding the default scikit-learn objects with a corresponding import, but we will stay with the mdb prefix to make clear which feature gets used. Finally, we calculate the mean absolute and the mean squared error and tell the syncer to synchronize these changes to the backend service.

After that, the ModelDB frontend should show you a project called ModelDB Evaluation. Within this project, you will find a simple diagram showing two dots, which represent the two error scores we just calculated. You can also click on those to see further information about the model in the sidebar on the right. Feel free to adapt the code and try some other parameters and models to see what happens.

ModelDB Projects overview

ModelDB Evaluation showing error points

Add additional regressors to ModelDB

If you played around a bit and tried to swap the linear regression for an alternative with regularisation such as Ridge or Lasso, you will have run into an error such as:

This happens because ModelDB does not yet support any regressors other than linear regression. Luckily, it is really easy to add them. To do so, open the file client/python/modeldb/sklearn_native/ and find the enable_sklearn_sync_functions function. Within it, you will find an array containing all models to which the fit_sync and predict_sync functionality gets added. Just add ElasticNet, Ridge, Lasso and any other model you want to use. After that, we can increment the version number of the library, since we added a new feature. You’ll find it in client/python/
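The mechanism described above, a function that attaches a fit_sync method to every class in a list, can be illustrated with a toy sketch. This is not ModelDB's actual implementation: the Toy classes, the recorded_events buffer and enable_sync_functions are hypothetical stand-ins that only show why a class missing from the list fails and how adding it fixes that.

```python
class ToyLinearRegression:
    """Stand-in for an estimator class; not a real scikit-learn model."""

    def fit(self, X, y):
        self.fitted_ = True
        return self


class ToyRidge:
    """Second stand-in, initially missing from the list of supported models."""

    def fit(self, X, y):
        self.fitted_ = True
        return self


recorded_events = []  # hypothetical buffer a syncer would later send to a backend


def fit_sync(self, X, y):
    """Call the regular fit and remember that it happened."""
    result = self.fit(X, y)
    recorded_events.append(("fit", type(self).__name__, len(X)))
    return result


def enable_sync_functions(classes):
    """Attach fit_sync to every class in the given list."""
    for cls in classes:
        setattr(cls, "fit_sync", fit_sync)


supported = [ToyLinearRegression]
enable_sync_functions(supported)
print(hasattr(ToyRidge, "fit_sync"))  # False: calling it would raise AttributeError

# Adding the class to the list is all it takes to make it trackable.
supported.append(ToyRidge)
enable_sync_functions(supported)
ToyRidge().fit_sync([[1.0], [2.0]], [1.0, 2.0])
print(recorded_events)
```

The design is convenient for users, since existing code only needs a different method name, but it also means every new estimator has to be registered by hand.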

Now we run client/python/ to package the new version and import it into our notebook using:

Compare multiple models

Using ModelDB for only a few models is like cracking a nut with a sledgehammer. For it to become useful, we need more model-parameter combinations. So let’s apply grid search using ModelDB.

We are going to use ElasticNet, a regularized version of linear regression that applies both l1 and l2 regularization. The parameter alpha defines the overall strength of the regularization, while l1_ratio defines the mix between l1 and l2 regularization. We use a couple of different values and 4-fold cross-validation. The following steps are similar to before, until we finally compute the mean absolute and mean squared error of the best predictor on the test set. Once the code has finished, the ModelDB UI should look something like this:

Linear regression with l1 and l2 regularization
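For readers without a running ModelDB instance, the grid search described above can be sketched with plain scikit-learn. The sketch assumes synthetic data from make_regression and an arbitrary 3x3 parameter grid; the values used in the original notebook are not shown in this article. It leaves out the ModelDB sync calls entirely.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption, not the housing data).
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative grid: regularization strength and l1/l2 mix.
param_grid = {
    "alpha": [0.01, 0.1, 1.0],
    "l1_ratio": [0.2, 0.5, 0.8],
}

# 4-fold cross-validation over every parameter combination.
search = GridSearchCV(
    ElasticNet(max_iter=10000),
    param_grid,
    cv=4,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# Evaluate the best estimator (refit on all training data) on the test set.
y_pred = search.best_estimator_.predict(X_test)
print("best parameters:", search.best_params_)
print("test MAE:", mean_absolute_error(y_test, y_pred))
print("test MSE:", mean_squared_error(y_test, y_pred))
```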

We find one point per combination of parameters and fold, so 5*5*5 = 125 points. That is quite a number of models to compare, and ModelDB supports us in doing so with an additional tool just below the default chart. In the select fields, choose continuous as the metric to display on the y-axis and the parameters alpha and l1_ratio as x-axis and group-by values. Clicking compare should give you a bar chart comparing the average model performance across the calculated folds.

ModelDB Example Visualisation
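The aggregation behind such a chart, averaging one metric over the folds of each parameter combination, can be sketched in a few lines of plain Python. The numbers below are made up for illustration and do not come from the experiment in this article.

```python
from collections import defaultdict
from statistics import mean

# One made-up entry per (alpha, l1_ratio, fold, mean absolute error).
fold_results = [
    (0.1, 0.5, 0, 3.2),
    (0.1, 0.5, 1, 3.6),
    (1.0, 0.5, 0, 4.0),
    (1.0, 0.5, 1, 4.4),
]

# Group the fold scores by parameter combination.
by_params = defaultdict(list)
for alpha, l1_ratio, _fold, mae in fold_results:
    by_params[(alpha, l1_ratio)].append(mae)

# Average across folds: one bar per parameter combination.
average_mae = {params: mean(scores) for params, scores in by_params.items()}

for (alpha, l1_ratio), score in sorted(average_mae.items()):
    print(f"alpha={alpha}, l1_ratio={l1_ratio}: average MAE {score:.2f}")
```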

If you are more of a numbers person, you can also compare your models using a table. To do so, switch from the "Models and Charts" page to the "Models" page. You will find a table of all stored models, which can be filtered and grouped using the same drag-and-drop mechanism. However, it is difficult to compare results in this view, since parameters don’t get their own column. Just use "create table" and add all columns you’re interested in to generate a customized table. First, drag the experiment_run_id to the filter section in the left sidebar to reduce the set of values. Then place the fields you are interested in in the Customize panel to generate a table based on these. However, you should not place too many combinations there, since the HTML table lacks scrolling functionality.


ModelDB makes it easy to structure your models and helps you analyze them to find the best combination. It is limited to scikit-learn and SparkML algorithms, but provides an easy and minimally invasive way to integrate the methods it implements. However, only a subset of scikit-learn features is supported, and ModelDB supports neither complex models nor random search as an alternative to grid search. Even the regularized versions of linear regression were not supported until recently, even though such functionality is easy to add.

But the concept of tightly integrated clients struggles by design: ModelDB overwrites scikit-learn’s native functionality with custom extensions, which are likely to break as scikit-learn releases new versions. Without adaptations, this will prevent ModelDB users from updating to scikit-learn versions later than 17.2. Extending an independent and changing framework is troublesome in general, especially if there is no powerful vendor behind it. ModelDB has not progressed much this year, even though there are definitely areas that could use improvement, such as a more intuitive UI, missing components in the library and the lack of extensive documentation.


The concept of a framework that provides structure to machine learning experiments looks really promising. ModelDB uses a modern technology stack and provides many features for model comparison, but lacks maturity and documentation. The project works as inspiration and for experiments, but will need more supporters to stay alive, especially since the clients need continuous work to stay in sync with the supported machine learning libraries. Therefore, be careful when using it in a production environment, or be prepared to contribute some work. Luckily, some similar projects have been popping up recently, such as datmo, DVC or Sacred. We’ll probably have a look at those in the future.

2 Comments

  1. Note: The info here is outdated, especially concerning ModelDB’s support for only SparkML and scikit-learn. It refers to an older version of ModelDB.
