With the sustained success of the Spark data processing platform, even data scientists with a strong focus on the Python ecosystem can no longer ignore it. Fortunately
Here's what changes when your data science project grows from a proof of concept. How do you deploy your model, how can updates be rolled out, ...?
A common pattern in most data science projects I participated in is that it’s all fun and games until someone wants to put it into production. From that point in time
Nowadays, Spark surely is one of the most prevalent technologies in the fields of data science and big data. Luckily, even though it is developed in Scala and runs in
The previous two posts gave a short introduction to network anomaly detection in general. We also introduced the k-means algorithm as a simple clustering technique an
Before we actually dive into this topic, imagine the following: You just moved to a new place and the time is ripe for a little house-warming dinner with your best fr
In the previous post we talked about network anomaly detection in general and introduced a clustering approach using the very popular k-means algorithm. In this blog
Real-time detection of anomalies in computer networks with methods of machine learning: Stop the (data)-thief! (Julian Keppel, 2017-11-27)
This blog post describes some basic concepts and shows a prototypical architecture for network anomaly detection in real-time.
This blog post shows some results and concepts of a master’s thesis here at inovex. It describes some basic concepts and shows a prototypical architecture for detecti
In supervised learning, correlation is crucial to predict the target variable with the help of the feature variables. But what good is causation?
In the field of machine learning and particularly in supervised learning, correlation is crucial to predict the target variable with the help of the feature variables
In this post we focus on how to write sophisticated User-Defined (Aggregate) Functions (UD(A)Fs) for Apache Hive in Python.
Sometimes the analytical power of built-in Hive functions is just not enough. In this case it is possible to write hand-tailored User-Defined Functions (UDFs) for tra
An investigation of the implementation and practical suitability of HyperLogLog on Apache Spark Streaming using a simple prototype.
As part of a research project, the implementation and practical suitability of HyperLogLog on Apache Spark Streaming were investigated using a simple prototype
In this final blog post of our three-part series, we will look at how you can build your own Apache Mesos framework.
In this final blog post of our three-part series, we will look at how you can build your own Apache Mesos framework. If you’re new to Mesos, have a look at our
We present a simple way to create your Cassandra cluster and experiment with data modeling, different configurations, cluster sizes, topologies etc.
Apache Cassandra is a really impressive piece of technology. When it comes to extreme performance requirements, it is definitely a solution one should look into. Yet
We want to show how to run tasks/applications on your Mesos cluster with Marathon, an init-system for Mesos built and maintained by Mesosphere.
In the previous blog post we described the basics and components of Mesos. Now we want to show you how to run tasks/applications on your Mesos cluster with Marathon,
This time we're talking Android embedded, IPv6 for mobile and women in tech – more precisely Sophie Wilson, the creator of the ARM architecture.
Another month passed, another retrospex due. This time we’re talking Android embedded, IPv6 for mobile and women in tech – more precisely Sophie Wilson, the cre
inovex retrospex will be published monthly from now on. This time, the themes are Docker and Google.
Well, turns out that collecting and curating interesting news about technology does not work out on a weekly basis when it is not your main focus. So here we go: inov