Neural networks are among the technologies with the potential to change our lives forever. Besides numerous applications and machines in industry, they have d…
We built a text spotting (OCR) pipeline that outperformed Google Cloud Vision using semi-supervised Generative Adversarial Networks.
Despite all advances in machine learning due to the advent of deep learning, the latter has one major shortcoming: it requires a lot of data during the learning process.
In this blog series we explain how you can train and deploy a convolutional neural network for image classification to a mobile app using TensorFlow Mobile.
Smart assistants, fancy image filters in Snapchat, and apps like Prisma all have one thing in common—they are powered by machine learning. The use of machine learning…
In this article we present a simple solution for managing Isolated Environments with PySpark that we have been using in production for more than a year.
With the sustained success of the Spark data processing platform, even data scientists with a strong focus on the Python ecosystem can no longer ignore it. Fortunately…
In this blog, I’ll explain some of the basic concepts of differential privacy and talk about how I’ve used it in my Bachelor’s Thesis.
Differential Privacy is a topic of growing interest in the world of Big Data. It is currently being deployed by tech giants like Google and Apple to gain knowledge about…
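As a rough illustration of the core idea (this sketch is not taken from the thesis, and all names are made up): the Laplace mechanism answers a numeric query by adding noise whose scale is the query's sensitivity divided by the privacy budget ε.

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return an epsilon-differentially-private estimate of a numeric query.

    Noise is drawn from a Laplace distribution with scale
    sensitivity / epsilon: a smaller epsilon means more privacy
    and therefore more noise.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # Sample Laplace noise by inverting its CDF (u is uniform on (-0.5, 0.5)).
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# Example: privatize a counting query (sensitivity 1) that returned 100.
private_count = laplace_mechanism(100, sensitivity=1.0, epsilon=1.0)
```

Averaged over many runs the noise cancels out, which is why aggregate statistics stay useful while any single answer protects the individual.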
Let's use a Hive UDF to perform lookups against resources residing in the Hadoop file system (HDFS), which allows non-equi joins.
In today’s blog I am going to take a look at a fairly mundane and unspectacular use of a Hive UDF (user-defined function): performing lookups against resources residing in the Hadoop file system (HDFS)…
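To make the idea concrete outside of Hive: such a lookup matches a value against key ranges rather than exact keys, which an ordinary equi join cannot express. A minimal, hypothetical sketch of that range-lookup logic in plain Python (the table and labels are invented for illustration):

```python
import bisect

# Hypothetical lookup table: (lower_bound, label), sorted by lower bound.
# E.g. mapping an IP-like numeric value to the owner of a network range.
RANGES = [(0, "reserved"), (100, "internal"), (500, "partner"), (1000, "public")]
LOWER_BOUNDS = [lo for lo, _ in RANGES]

def lookup(value):
    """Return the label of the range containing `value` (a non-equi match)."""
    # Find the rightmost lower bound that is <= value.
    i = bisect.bisect_right(LOWER_BOUNDS, value) - 1
    if i < 0:
        return None  # value lies below every range
    return RANGES[i][1]
```

A UDF implementing this kind of logic can be called per row, so `lookup(col)` effectively joins each row to its enclosing range without an equality condition.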
Here's what changes when your data science project grows beyond a proof of concept: How do you deploy your model? How can updates be rolled out?
A common pattern in most data science projects I have participated in is that it’s all fun and games until someone wants to put it into production. From that point in time…
In this part of our network anomaly detection blogpost series we want to compare two fundamentally different styles of learning.
In this part of our network anomaly detection series we want to compare two fundamentally different styles of learning. The very first post introduced the simple k-means algorithm…
In this recording of our meetup, tracking fan Wolfgang shows how he analyzed the data from his Garmin watch himself with Elasticsearch.
In this recording of our meetup in Karlsruhe, Wolfgang, an avid triathlete and tracking fan, shows how he analyzed the data from his Garmin watch himself with Elasticsearch…
Nowadays, Spark is surely one of the most prevalent technologies in the fields of data science and big data. Luckily, even though it is developed in Scala and runs in the JVM…
The previous two posts gave a short introduction of network anomaly detection in general. We also introduced the k-means algorithm as a simple clustering technique and…
Before we actually dive into this topic, imagine the following: You just moved to a new place and the time is ripe for a little house-warming dinner with your best friends…
In the previous post we talked about network anomaly detection in general and introduced a clustering approach using the very popular k-means algorithm. In this blog…
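As a toy sketch of the scoring idea behind a k-means approach (not the code from this series, and all data here is made up): fit centroids on mostly normal traffic features, then score each new point by its distance to the nearest centroid; large scores hint at anomalies.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def mean(points):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns the centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

def anomaly_score(point, centroids):
    """Distance to the nearest centroid; large values suggest anomalies."""
    return min(dist(point, c) for c in centroids)
```

The threshold above which a score counts as anomalous is a tuning decision; a common heuristic is a high percentile of the scores seen on known-normal data.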
In order to build data services or advanced machine learning models, organizations must integrate large amounts of information from diverse sources.
In order to build data services or advanced machine learning models, organizations must integrate large amounts of information from diverse sources. As a central plac
In supervised learning, correlation is crucial to predict the target variable with the help of the feature variables. But what good is causation?
In the field of machine learning, and particularly in supervised learning, correlation is crucial for predicting the target variable with the help of the feature variables…
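One way to see the distinction (a made-up simulation, not from the article): two variables can correlate strongly because a shared confounder drives both, so predicting one from the other works even though intervening on either would change nothing about the other.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)
# Confounder z drives both x and y; x never influences y (or vice versa).
z = [rng.uniform(0, 10) for _ in range(5000)]
x = [v + rng.gauss(0, 1) for v in z]
y = [v + rng.gauss(0, 1) for v in z]
r = pearson(x, y)  # strong correlation despite no causal link between x and y
```

For a supervised model that only needs to predict, this correlation is perfectly usable; it is only when you want to act on the system that the missing causal link matters.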