Causal Inference and Propensity Score Methods

In the field of machine learning and particularly in supervised learning, correlation is crucial to predict the target variable with the help of the feature variables. Rarely do we think about causation and the actual effect of a single feature variable or covariate on the target or response. Some even go so far as to say that „correlation trumps causation“ like in the book „Big Data: A Revolution That Will Transform How We Live, Work, and Think“ by Viktor Mayer-Schönberger and Kenneth Cukier. Following their reasoning, with Big Data there is no need to think about causation anymore, since nonparametric models will do just fine using correlation alone. For many practical use cases, this point of view may seem acceptable — but surely not for all.

Weiterlesen

Web Performance Optimisation (Part 2): Perceived Performance

In the previous article of this series we talked about the boundaries of Web performance optimization (WPO). We introduced some metrics and a scale of measurement but we omitted one large part of the problem due to the purely technical point of view we took previously. In reality, while time and performance can be measured objectively it is perceived subjectively by humans. Static measurement scales and performance analyses are relevant for performance measurements but largely irrelevant in real world usage. As long as users have to interact with an application we have to make sure that we test and evaluate the application’s performance from a real user’s perspective. Consequently in this part of the series we want to talk about perceived performance and why it makes sense to put the user first when it comes to WPO. Weiterlesen

Kubernetes Logging with Fluentd and the Elastic Stack

Kubernetes and Docker are great tools to manage your microservices, but operators and developers need tools to debug those microservices if things go south. Log messages and application metrics are the usual tools in this cases. To centralize the access to log events, the Elastic Stack with Elasticsearch and Kibana is a well-known toolset. In this blog post I want to show you how to integrate the logging of Kubernetes with the Elastic Stack. To start off, I will give an introduction to the log mechanism of Kubernetes, then I’ll show you how to collect the resulting log events and ship them into the Elastic Stack. I also provide a GitHub repository with a working demo. Finally, I highlight some considerations for the production deployment. Weiterlesen

24/7 Spark Streaming on YARN in Production

At a large client in the German food retailing industry, we have been running Spark Streaming on Apache Hadoop™ YARN in production for close to a year now. Overall, Spark Streaming has proved to be a flexible, robust and scalable streaming engine. However, one can tell that streaming itself has been retrofitted into Apache Spark™. Many of the default configurations are not suited for a 24/7 streaming application. The same applies to YARN, which was not primarily designed with long-running applications in mind. Weiterlesen

Modern CI/CD with Jenkins 2 and GitLab CI [Comparison]

Continuous integration (CI) and continuous delivery (CD) are a great help, providing the flexibility needed for agile software development methods like Scrum and Kanban. With CI/CD, you don’t have to constantly struggle with the build and deployment processes of your software project. Once correctly configured, you can be assured that the whole build and delivery process is just a matter of pushing the code into the source code management system or even more simply pressing a button. Weiterlesen

Using ReactJS with AngularJS

After releasing React in 2013, Facebook kicked off some discussions in the JavaScript community about comparing React with common libraries. Indeed, the library increases the pressure on established Frameworks because of its many modern ideas and patterns. Concepts like the implementation of a virtual DOM and the component architecture allowed programmers to create web applications which run faster and are more easily maintainable. At the same time frameworks like Angular still suffer from performance problems heating up the discussions about their concurrency.

Beside the ongoing competition between React and Angular, some people try to benefit from both technologies. The ngReact library is an example of such an approach. The Angular module allows using React as a view component inside Angular applications. Usual Angular templates can be replaced with React components which try to give you more flexibility and promise to improve the overall performance.

This blog article examines the use of the ngReact module to speed up Angular’s list rendering performance. Weiterlesen

Web Performance Limitations (Part 1)

When web developers talk about the web today they often discuss topics around web performance optimization (WPO). Nowadays it’s an even more important topic, since we use the browser for almost any type of application with many different devices and different connection types from all over the world. It’s a complex environment where dozens of lines of code get written and executed. Companies like Amazon and eBay have huge decreases in revenue when their site loading times increase. In 2008, Amazon reported that they approximately lose 1% of revenue for every 100 ms increase in loading time, which when we think about the revenues of Amazon in 2015 (107 billion dollar a year) would imply a loss of 1.07 billion dollar a year.

And that’s by far not everything. In a recent talk given by one of the head Opera developers, Bruce Lawson mentioned that more users and devices will hit the WWW: Emerging markets like Africa and India are growing fast. He predicts that we will have roughly 3 billion more users using the WWW in the next 50 years and that they will come from emerging markets. They will not connect to the internet with a high end desktop computer, they will start using the internet with a low budget smartphone, with slow internet connections and low processing capabilities. These markets will eventually be our next customers, and therefore we need to emphasise the need of performance, so that everybody is able to use our web applications no matter their device or internet connection quality!

In this blog series I want to discuss what web performance is, how to look at it from a user-centric perspective and show which kind of optimization techniques we can use to make our web applications faster.

First, let’s start this series by showing the limits of web performance optimization to increase awareness for measurement parameters like seconds (sec) and milliseconds (ms). Weiterlesen