24/7 Spark Streaming on YARN in Production

At a large client in the German food retailing industry, we have been running Spark Streaming on Apache Hadoop™ YARN in production for close to a year now. Overall, Spark Streaming has proved to be a flexible, robust and scalable streaming engine. However, one can tell that streaming itself has been retrofitted into Apache Spark™. Many of the default configurations are not suited for a 24/7 streaming application. The same applies to YARN, which was not primarily designed with long-running applications in mind.

Comparing Apache Flink and Spark: Stream vs. Batch Processing

Flink has its origins in a research project called Stratosphere and was donated to the Apache Software Foundation in 2014. It can be described as a modern, more efficient replacement for MapReduce and has a good deal in common with Apache Spark. For example, its API resembles the Spark API, and both address similar use cases. Furthermore, you will find a Flink counterpart for almost every Spark component, e.g. for machine learning and graph processing. Read on for a quick comparison!