The following is one of two posts published alongside the JustCause framework, which we developed at inovex as a tool to foster good scientific practice in the field
Recently, there has been a surge in interest in Causal Inference. It is, however, not always clear what is meant by the term and what the respective methods can actually do. This post gives a high-level overview of the two major schools of Causal Inference and then dives deep into the basics of one of them.
Recently, there has been a surge in interest in what is called Causal Inference. It i
In this blog post, we will first have a look at 3D deep learning with PointNet. Its creators provide a TensorFlow 1.x implementation of PointNet on GitHub, but since TensorFlow 2.0 has been released in the meantime, we will transform it into an idiomatic TensorFlow 2 implementation in the second part of this post.
The world that we interact with each and every day is three-dimensional, but the majority of deep learning models process visual data as 2D images. However, there are
This blog post compares three tools developed to support reproducible machine learning model development: MLflow, developed by Databricks (the company behind Apache Spark); DVC, a software product of the London-based startup iterative.ai; and Sacred, an academic project developed by different researchers.
In my previous blog post "How to manage machine learning models" I explained the difficulties within the process of developing a good machine learning mod
Unlike the usual performance metrics, fairness, safety and transparency in machine learning models are much harder, if not impossible, to quantify. Here are some techniques (and examples) that provide interpretability and make decision systems understandable not only for their creators, but also for their customers and users.
Machine learning has great potential to improve data products and business processes. It is used to propose products and news articles that we might be interested i
Having been in the data science domain for quite some years, I have seen good Jupyter notebooks but also a lot of ugly ones. Follow these best practices to work more efficiently with your notebooks and strike the perfect balance between text, code and visualisations.
If you have ever done something analytical or anything closely related to data science in Python, there is just no way you have not heard of IPython or Jupyter not
In this blog post on deep learning model exploration, translation, and deployment, we expand on the previous article with two additional approaches for model deployment: TensorFlow Serving and Docker, as well as a rather hobbyist approach in which we build a simple web application that serves our model.
This is the second part of a series of two blogposts on deep learning model exploration, translation, and deployment. Both involve many technologies like PyTorch, Ten
In the past few months, a slew of Machine Learning management platforms has emerged. In this article we have a look at ModelDB, which supports data scientists by keeping track of models, data sources and parameters. If you use scikit-learn or SparkML, it promises easy integration and offers additional visualisation tools.
Developing a good machine learning model is not straightforward, but rather an iterative process which involves many steps. Mostly Data Scientists start by building
This article introduces EMNIST. We develop and train models with PyTorch, translate them with the Open Neural Network eXchange format ONNX and serve them through GraphPipe. We will orchestrate these technologies to solve the task of image classification using the more challenging and less popular EMNIST dataset.
This is the first part of a series of two blogposts on deep learning model exploration, translation, and deployment. Both involve many technologies like PyTorch, Tens
In this article we explain how time series forecasting tasks can be solved with machine learning models, starting with the problem modeling and ending with visualizing the results by embedding the models in a web app for demonstration purposes.
Recently, Machine Learning (ML) models have been widely discussed and successfully applied in time series forecasting tasks (Bontempi et al., 2012). In this blog arti
Traditional user-item recommenders often neglect the dimension of time, finding for each user a latent representation based on the user’s historical item interactions without any notion of recency and sequence of interactions. Sequence-based recommenders such as Multiplicative LSTMs tackle this issue.
Recommender Systems support the decision-making processes of customers with personalized suggestions. They are widely used and influence the daily life of almost ever
In this article we present a simple solution for managing isolated environments with PySpark that we have been using in production for more than a year.
With the sustained success of the Spark data processing platform even data scientists with a strong focus on the Python ecosystem can no longer ignore it. Fortunately
Here's what changes when your data science project grows from a proof of concept. How do you deploy your model, how can updates be rolled out, ...?
A common pattern in most data science projects I participated in is that it’s all fun and games until someone wants to put it into production. From that point in time
Nowadays, Spark surely is one of the most prevalent technologies in the fields of data science and big data. Luckily, even though it is developed in Scala and runs in
The previous two posts gave a short introduction of network anomaly detection in general. We also introduced the k-means algorithm as a simple clustering technique an