{"id":2575,"date":"2016-12-22T16:41:33","date_gmt":"2016-12-22T15:41:33","guid":{"rendered":"https:\/\/www.inovex.de\/?p=2575"},"modified":"2026-02-24T07:31:03","modified_gmt":"2026-02-24T06:31:03","slug":"247-spark-streaming-on-yarn-in-production","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/","title":{"rendered":"24\/7 Spark Streaming on YARN in Production"},"content":{"rendered":"<p>At a large client in the German food retailing industry, we have been running Spark Streaming on <em>Apache Hadoop\u2122<\/em> YARN in production for close to a year now. Overall, Spark Streaming has proved to be a flexible, robust and scalable streaming engine. However, one can tell that streaming itself has been retrofitted into <em>Apache Spark\u2122<\/em>. Many of the default configurations are not suited for a 24\/7 streaming application. The same applies to YARN, which was not primarily designed with long-running applications in mind.<!--more--><\/p>\n<div style=\"margin: 7px; padding: 7px; border-left: 6px solid #9CCD00;\">\n<p>Update 2017-01-20: This article has been featured on<\/p>\n<ul>\n<li>Hadoop Weekly Newsletter #198<\/li>\n<li><a href=\"http:\/\/roaringelephant.org\/2017\/01\/17\/episode-33-roaring-news\/\" target=\"_blank\" rel=\"noopener\">Roaring Elephant Podcast Episode 33<\/a><\/li>\n<li><a href=\"http:\/\/tinyletter.com\/datamachina\/letters\/data-machina-y4-week-1\" target=\"_blank\" rel=\"noopener\">Data Machina Newsletter Y4 Week 1<\/a><\/li>\n<\/ul>\n<\/div>\n<p>This article summarizes the lessons learned with running 24\/7 Spark Streaming applications on YARN in a production environment. 
It is broken down into the following chapters:<\/p>\n<ol>\n<li><a href=\"#use-case\">Use Case<\/a>: the client\u2019s use case for Spark Streaming<\/li>\n<li><a href=\"#configuration\">Configuration<\/a>: relevant configuration options and a reference <span class=\"lang:default decode:true crayon-inline \">spark-submit<\/span> command<\/li>\n<li><a href=\"#deployment\">Deployment<\/a>: how to restart the application without data loss and deployment of code changes<\/li>\n<li><a href=\"#monitoring\">Monitoring<\/a>: which components should be monitored and how to monitor a Spark Streaming application<\/li>\n<li><a href=\"#logging\">Logging<\/a>: how to customize logging on YARN and conceptual ideas on log analysis methods<\/li>\n<li><a href=\"#conclusion\">Conclusion<\/a>: a short conclusion regarding Spark Streaming in production<\/li>\n<\/ol>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Spark-Streaming-Use-Case\" >Spark Streaming Use Case<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Configuration\" >Configuration<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Number-of-Executors\" >Number of Executors<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" 
href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Executor-Memory\" >Executor Memory<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#YARN-Configuration\" >YARN Configuration<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Spark-Delay-Scheduling\" >Spark Delay Scheduling<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Backpressure\" >Backpressure<\/a><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Minimum-Rate\" >Minimum Rate<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Initial-Rate\" >Initial Rate<\/a><\/li><\/ul><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Deployment\" >Deployment<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Monitoring-Spark-Streaming-Applications\" >Monitoring Spark Streaming Applications<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Source-Monitoring\" >Source Monitoring<\/a><\/li><li 
class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Spark-Monitoring\" >Spark Monitoring<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Custom-Spark-Metrics\" >Custom Spark Metrics<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Logging\" >Logging<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Conclusion\" >Conclusion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-17\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Get-in-touch\" >Get in touch<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-18\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#Were-hiring\" >We&#8217;re hiring<\/a><\/li><\/ul><\/nav><\/div>\n<h2 id=\"use-case\"><span class=\"ez-toc-section\" id=\"Spark-Streaming-Use-Case\"><\/span>Spark Streaming Use Case<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Spark Streaming is used for a variety of use cases at the mentioned client. This blog post focuses on one streaming application, which processes about 70 million transactions per day on weekdays. The transactions arrive in batched messages, where each message contains about 200 transactions. 
During opening hours of the retail stores, about 1700 transactions per second are processed, with hourly peaks of 2400.<\/p>\n<p>The error-free operation of the application is critical since many different applications continuously consume the data in the <em>Apache HBase\u2122<\/em> output tables.<\/p>\n<p>The application architecture is illustrated below:<\/p>\n<p><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/architecture.png\" rel=\"attachment wp-att-2577\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2577\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/architecture.png\" alt=\"Short sketch of the architecture\" width=\"940\" height=\"369\" \/><\/a><\/p>\n<p><em>JMS Receiver<\/em>: Fetches messages from the IBM MQ and converts them into a serializable format. Technically, the receiver also runs in an executor. However, the illustration uses this distinction since the receiver is developed as a standalone component and has nothing to do with the Spark RDD concept.<\/p>\n<p><em>Executors<\/em>: Each microbatch RDD is processed as follows:<\/p>\n<ol>\n<li><em>Parsing &amp; Validation<\/em>: This is done in a series of <span class=\"lang:default decode:true crayon-inline \">map()<\/span> , <span class=\"lang:default decode:true crayon-inline \">flatMap()<\/span> and <span class=\"lang:default decode:true crayon-inline \">filter()<\/span> RDD transformations.<\/li>\n<li><em>Data enrichment<\/em>: Batched HBase Gets are executed within RDD <span class=\"lang:default decode:true crayon-inline \">mapPartitions()<\/span> transformations to enrich the transactions with additional information.<\/li>\n<li><em>Output to HBase<\/em>: Save the transactions into different HBase tables using Bulk Puts. This happens in a series of RDD <span class=\"lang:default decode:true crayon-inline \">foreachPartition()<\/span> actions. 
The RDD actions are preceded by a <span class=\"lang:default decode:true crayon-inline \">persist()<\/span> and followed by an <span class=\"lang:default decode:true crayon-inline \">unpersist()<\/span> of the transformed RDD.<\/li>\n<\/ol>\n<h2 id=\"configuration\"><span class=\"ez-toc-section\" id=\"Configuration\"><\/span>Configuration<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The majority of Spark Streaming configurations can be passed as parameters of the <span class=\"lang:default decode:true crayon-inline \">spark-submit<\/span> command. The following script contains a reference <span class=\"lang:default decode:true crayon-inline \">spark-submit<\/span> invocation for the described use case:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/bernhardschaefer\/4309f728f66879c0a8c062be0801057b.js\"><\/script><\/p>\n<p>Some settings, such as number of executors, are job specific. Still, the script could serve as a template for configuring Spark Streaming applications. It is applicable to Spark 1.x and Spark 2.x and states which properties are only relevant to one major version.<\/p>\n<p>The following sections cover the different configuration aspects of the Spark Streaming application in more detail.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Number-of-Executors\"><\/span>Number of Executors<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The ideal number of executors depends on various factors:<\/p>\n<ol>\n<li>Incoming events per second, especially during peaks<\/li>\n<li>Buffering capabilities of the streaming source<\/li>\n<li>Maximum allowed lag, i.e. is it tolerable if the Streaming application lags behind by 3 minutes during a very high peak<\/li>\n<\/ol>\n<p>It can be tweaked by running the streaming application in a preproduction environment and monitoring the streaming statistics in the Spark UI. 
As a general guideline:<\/p>\n<p><span class=\"lang:default decode:true crayon-inline \">Processing Time + Reserved Capacity &lt;= Batch Duration<\/span><\/p>\n<p>The reserved capacity depends on the aforementioned factors. The tradeoff is between idle cluster resources and the maximum allowed lag during peaks.<\/p>\n<p>For the described application, the number of executors is set to 6, with 3 cores per executor. Of the resulting 18 cores, 17 are available for processing tasks in parallel, since the receiver permanently occupies one.<\/p>\n<p>The following illustration shows the application\u2019s streaming statistics using a 10s batch duration:<\/p>\n<p><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/statistics.png\" rel=\"attachment wp-att-2581\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2581\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/statistics.png\" alt=\"Graphs with streaming data in 10s batches\" width=\"940\" height=\"712\" \/><\/a><\/p>\n<p>As can be seen, the average processing time is about 7.5s. Overall, the scheduling delay is close to zero, with occasional short peaks of up to 10s. These peaks are usually due to other applications running on the cluster or occasional high load on the streaming source.<\/p>\n<p>With more executors, the processing time would further decrease. However, this also decreases the overall resource utilization of the cluster.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Executor-Memory\"><\/span>Executor Memory<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The executor memory consumption depends on:<\/p>\n<ul>\n<li>The window of data that is processed within each batch.<\/li>\n<li>The kind of transformations being used. 
If the job uses <a href=\"http:\/\/spark.apache.org\/docs\/2.0.2\/programming-guide.html#shuffle-operations\" target=\"_blank\" rel=\"noopener\">shuffle transformations<\/a>, memory consumption can be high.<\/li>\n<\/ul>\n<p>In the described use case, memory consumption is rather low, since only the current batch needs to fit in memory and there are no wide dependencies that require shuffling. Therefore, only 3GB of memory is assigned to each executor.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"YARN-Configuration\"><\/span>YARN Configuration<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The YARN configurations are tweaked to maximize the fault tolerance of our long-running application. The default value for <span class=\"lang:default decode:true crayon-inline \">spark.yarn.executor.memoryOverhead<\/span> (in MB) is calculated as follows:<\/p>\n<p><span class=\"lang:default decode:true crayon-inline \">max(384, executorMemory * 0.10)<\/span><\/p>\n<p>When using a small executor memory setting (e.g. 3GB), we found that the minimum overhead of 384MB is too low. In some instances, this led to YARN killing containers due to excessive memory usage. Therefore, memory overhead should be increased in case the executor memory is lower than 10GB. The same applies to <span class=\"lang:default decode:true crayon-inline \">spark.yarn.driver.memoryOverhead<\/span>, where we assign at least 512MB.<\/p>\n<p>To make sure that temporary failures do not lead to a stop of our application, we increase the allowed driver and executor failures using <span class=\"lang:default decode:true crayon-inline \">spark.yarn.maxAppAttempts<\/span> and <span class=\"lang:default decode:true crayon-inline \">spark.yarn.max.executor.failures<\/span>. Moreover, only the failure counts of the last hour are considered for the thresholds. 
We adopted those settings from a <a href=\"http:\/\/mkuthan.github.io\/blog\/2016\/09\/30\/spark-streaming-on-yarn\/\" target=\"_blank\" rel=\"noopener\">spark streaming on yarn blog article<\/a>, which provides further details in the \u201cFault Tolerance\u201d section.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Spark-Delay-Scheduling\"><\/span>Spark Delay Scheduling<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>An RDD partition can have preferred locations, such as the nodes on which the data resides when reading from HDFS. For assigning tasks to executors, Spark uses Delay Scheduling (see RDD &amp; Delay Scheduling papers). With the default <span class=\"lang:default decode:true crayon-inline \">spark.locality.wait<\/span> setting, the driver waits up to 3 seconds to launch a data-local task before giving up and launching it on a less-local node. The same wait is used to step through multiple locality levels (process-local, node-local, rack-local and then any). For our job this results in the following driver waiting strategy:<\/p>\n<ul>\n<li><em>Process-local<\/em>: 3s to launch the task in the receiver executor<\/li>\n<li><em>Node-local<\/em>: 3s to launch the task in an executor on the receiver host<\/li>\n<li><em>Rack-local<\/em>: 3s to launch the task in an executor on a host in the receiver rack<\/li>\n<\/ul>\n<p>With a small streaming batch interval (e.g. 5 seconds), this results in poor parallelism, since the majority of the tasks are scheduled on the receiver executor. We were able to verify this behavior in the Spark UI Executors page, where the number of \u201cCompleted Tasks\u201d of the receiver executor was much higher and the other executors were idling most of the time.<\/p>\n<p>Overall, reducing <span class=\"lang:default decode:true crayon-inline \">spark.locality.wait<\/span> to 10ms decreased our processing times by a factor of 3. 
Note that when using multiple receivers or Direct Kafka Streaming with multiple topics\/partitions, this improvement is not as drastic, since there are more executors with local data.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Backpressure\"><\/span>Backpressure<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Backpressure is an important concept in <a href=\"http:\/\/www.reactive-streams.org\/\" target=\"_blank\" rel=\"noopener\">reactive stream processing systems<\/a>. The central idea is that if a component is struggling to keep up, it should communicate to upstream components and get them to reduce the load. In the context of Spark Streaming, the receiver is the upstream component which gets notified if the executors cannot keep up. There are many scenarios in which this happens, e.g.:<\/p>\n<ul>\n<li><em>Streaming Source<\/em>: Unexpected short burst of incoming messages in source system<\/li>\n<li><em>YARN<\/em>: Lost executor due to node failure<\/li>\n<li><em>External Sink System<\/em>: High load on external systems such as HBase leading to increased response times<\/li>\n<\/ul>\n<p>Without backpressure, microbatches queue up over time and the scheduling delay increases. This can be confirmed in the Streaming section of the Spark UI:<\/p>\n<p><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/backpressure.png\" rel=\"attachment wp-att-2587\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2587\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/backpressure.png\" alt=\"Backpressure in Spark UI\" width=\"940\" height=\"106\" \/><\/a><\/p>\n<p>Eventually, this can lead to out-of-memory issues. Moreover, the queued-up batches prevent an immediate graceful shutdown.<\/p>\n<p>With activated backpressure, the driver monitors the current batch scheduling delays and processing times and dynamically adjusts the maximum rate of the receivers. 
The communication of new rate limits can be verified in the receiver log:<\/p>\n<p><span class=\"lang:default decode:true crayon-inline \">2016-12-06 08:27:02,572 INFO org.apache.spark.streaming.receiver.ReceiverSupervisorImpl Received a new rate limit: 51. <\/span><\/p>\n<p>The following illustration shows the effect of backpressure after a deployment of our streaming application that required a short downtime:<\/p>\n<p><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/backpressure-deployment.png\" rel=\"attachment wp-att-2589\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2589\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/backpressure-deployment.png\" alt=\"Backpressure in action\" width=\"940\" height=\"704\" \/><\/a><\/p>\n<p>During the downtime, the streaming source buffered the incoming messages. After the restart, the receiver starts with the configured maximum rate. The backpressure implementation then takes some time to figure out the optimal rate. At this point, processing time is very close to batch duration, which can be seen in the <em>Processing Time chart<\/em>. Once the streaming application has caught up, it continues regular processing of incoming messages with a processing time smaller than batch duration.<\/p>\n<p>There is one catch when using backpressure: in the Spark UI it is not obvious when the job is not able to keep up over a longer period of time. Therefore, it is important to monitor the streaming source, as described in the <em>\u201cSource Monitoring\u201d<\/em> section.<\/p>\n<p>The following subsections describe two subtleties in configuring Spark Streaming backpressure.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"Minimum-Rate\"><\/span>Minimum Rate<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>After activating backpressure, we kept wondering why the rate did not drop below 100 records per second. 
Internally, Spark uses a <a href=\"https:\/\/en.wikipedia.org\/wiki\/PID_controller\" target=\"_blank\" rel=\"noopener\">PID-based<\/a> backpressure implementation. After some digging, we noticed that there is an undocumented property <span class=\"lang:default decode:true crayon-inline \">spark.streaming.backpressure.pid.minRate<\/span> in the PID <a href=\"https:\/\/github.com\/apache\/spark\/blob\/branch-2.0\/streaming\/src\/main\/scala\/org\/apache\/spark\/streaming\/scheduler\/rate\/RateEstimator.scala#L65\" target=\"_blank\" rel=\"noopener\">RateEstimator<\/a> implementation, with a default of 100. In most cases, 100 records per second is very low. However, in our case, each message from the streaming source contains hundreds of records. Since there is no harm in reducing this default, we set the <span class=\"lang:default decode:true crayon-inline \">minRate<\/span> to <span class=\"lang:default decode:true crayon-inline \">10<\/span>.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"Initial-Rate\"><\/span>Initial Rate<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>The backpressure algorithm computes the rate based on the processing time of prior batches. This means that the backpressure implementation takes some time to kick in when a new streaming application has started. When the streaming source has buffered many messages and the receiver(s) are a lot faster in fetching new messages than the executors in processing them, this can result in large microbatches at the beginning. 
To smooth the startup phase, an initial rate can be provided in Spark 2.x: <span class=\"lang:default decode:true crayon-inline \">spark.streaming.backpressure.initialRate<\/span><\/p>\n<p>In Spark 1.x, we used the maxRate setting as a workaround:<\/p>\n<ul>\n<li>Receiver-based approach: <span class=\"lang:default decode:true crayon-inline \">spark.streaming.receiver.maxRate<\/span><\/li>\n<li>Direct Kafka approach: <span class=\"lang:default decode:true crayon-inline \">spark.streaming.kafka.maxRatePerPartition<\/span><\/li>\n<\/ul>\n<h2 id=\"deployment\"><span class=\"ez-toc-section\" id=\"Deployment\"><\/span>Deployment<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In the Spark Streaming documentation on <a href=\"http:\/\/spark.apache.org\/docs\/2.0.2\/streaming-programming-guide.html#how-to-configure-checkpointing\" target=\"_blank\" rel=\"noopener\">how to configure checkpointing<\/a>, the following code snippet shows how to start a checkpointed streaming application:<\/p>\n<pre class=\"lang:scala decode:true \">\/\/ Function to create and set up a new StreamingContext\r\ndef functionToCreateContext(): StreamingContext = {\r\n    val ssc = new StreamingContext(...)   \/\/ new context\r\n    val lines = ssc.socketTextStream(...) \/\/ create DStreams\r\n    ...\r\n    ssc.checkpoint(checkpointDirectory)   \/\/ set checkpoint directory\r\n    ssc\r\n}\r\n\r\n\/\/ Get StreamingContext from checkpoint data or create a new one\r\nval context = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext _)\r\n\r\n\/\/ Do additional setup on context that needs to be done,\r\n\/\/ irrespective of whether it is being started or restarted\r\ncontext. ...\r\n\r\n\/\/ Start the context\r\ncontext.start()\r\ncontext.awaitTermination()<\/pre>\n<p>In this setup, the only way to end the streaming application running in cluster mode is killing the YARN application itself. 
Due to the nature of the write ahead log functionality, there is no data loss when the streaming application is restarted. However, this makes it impractical to deploy code changes, as the <a href=\"http:\/\/spark.apache.org\/docs\/2.0.2\/streaming-programming-guide.html#upgrading-application-code\" target=\"_blank\" rel=\"noopener\">Spark Streaming documentation<\/a> on upgrading application code states:<\/p>\n<blockquote><p>[&#8230;] And restarting from earlier checkpoint information of pre-upgrade code cannot be done. The checkpoint information essentially contains serialized Scala\/Java\/Python objects and trying to deserialize objects with new, modified classes may lead to errors. In this case, either start the upgraded app with a different checkpoint directory, or delete the previous checkpoint directory.<\/p><\/blockquote>\n<p>When upgrading application code, we have to make sure that the old streaming application shuts down gracefully with no further data to process. Then we can safely delete the HDFS checkpoint directory and start the job with the new application code. There is a Spark property for this use case: <span class=\"lang:default decode:true crayon-inline \">spark.streaming.stopGracefullyOnShutdown<\/span> . However, this property does not work in YARN cluster mode, since the executors get terminated right away when killing the YARN application, before completing all queued or active batches.<\/p>\n<p>Consequently, we need a custom mechanism to inform the Spark driver to do a graceful shutdown. There are different options, such as starting a Socket\/HTTP listener in the driver or using a marker HDFS file. 
A code snippet for the latter technique:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/bernhardschaefer\/c5619d4404d7120dcbe9f7dc7032dcf3.js\"><\/script><\/p>\n<p>Using this technique, the overall deployment procedure after upgrading application code looks as follows:<\/p>\n<ol>\n<li>Place the shutdown marker file in HDFS<\/li>\n<li>Wait for the streaming application to finish gracefully<\/li>\n<li>Remove the shutdown marker file<\/li>\n<li>Delete or move the existing checkpoint directory<\/li>\n<li>Start the streaming application with the upgraded application code<\/li>\n<\/ol>\n<p>The third and fourth steps can be done within the Spark driver after the StreamingContext has stopped successfully:<\/p>\n<ul>\n<li>Append a \u201c<span class=\"lang:default decode:true crayon-inline \">.COMPLETED<\/span>\u201d suffix to the marker file<\/li>\n<li>Append a timestamp suffix to the checkpoint directory<\/li>\n<\/ul>\n<p>The overall restart functionality can be provided by an Oozie Bash or Java application. Another option is a Linux service, as proposed in the \u201cGraceful stop\u201d section in the mentioned <a href=\"http:\/\/mkuthan.github.io\/blog\/2016\/09\/30\/spark-streaming-on-yarn\/\" target=\"_blank\" rel=\"noopener\">spark streaming on yarn<\/a> blog post. In that article, the waiting step is done by polling the job status <a href=\"https:\/\/hadoop.apache.org\/docs\/r2.6.0\/hadoop-yarn\/hadoop-yarn-site\/YarnCommands.html#application\" target=\"_blank\" rel=\"noopener\">using the yarn application command<\/a>.<\/p>\n<h2 id=\"monitoring\"><span class=\"ez-toc-section\" id=\"Monitoring-Spark-Streaming-Applications\"><\/span>Monitoring Spark Streaming Applications<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Monitoring and alerting are critical for running Spark Streaming applications in production. The following sections discuss some options for monitoring the streaming source and the Spark application itself. 
Even though the sections focus on monitoring at application level, it is also crucial to monitor the infrastructure in its entirety, e.g. YARN or Kafka.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Source-Monitoring\"><\/span>Source Monitoring<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>As mentioned in the Backpressure section, the streaming source should be monitored to get notified when the streaming application is not able to keep up with the incoming messages over a longer period of time.<\/p>\n<p>With Kafka as streaming source, this comes down to monitoring consumer lag, i.e. the delta between the latest offset and the consumer offset. There are various open-source tools for this exact purpose, e.g. <a href=\"https:\/\/github.com\/linkedin\/Burrow\" target=\"_blank\" rel=\"noopener\">Burrow<\/a> (LinkedIn) or <a href=\"https:\/\/quantifind.com\/KafkaOffsetMonitor\/\" target=\"_blank\" rel=\"noopener\">KafkaOffsetMonitor<\/a> (Quantifind).<\/p>\n<p>For traditional JMS message queues, we trigger alerts based on queue depth or message age, e.g.:<\/p>\n<ul>\n<li>Alert when there are more than 10,000 messages in the queue<\/li>\n<li>Alert when there are messages older than 30 minutes<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Spark-Monitoring\"><\/span>Spark Monitoring<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Spark has a <a href=\"http:\/\/spark.apache.org\/docs\/2.0.2\/monitoring.html#metrics\" target=\"_blank\" rel=\"noopener\">metrics system<\/a> which can be configured to report its metrics to sinks such as Graphite or Ganglia. 
In the default configuration, Spark uses the MetricsServlet sink, which exposes the Spark metrics at the Spark UI endpoint <span class=\"lang:default decode:true crayon-inline \">\/metrics\/json<\/span>.<\/p>\n<p>For an overview of important Spark metrics and how to set up Graphite\/Grafana as a sink, see the excellent <a href=\"http:\/\/mkuthan.github.io\/blog\/2016\/09\/30\/spark-streaming-on-yarn\/\" target=\"_blank\" rel=\"noopener\">spark streaming on yarn<\/a> blog post.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Custom-Spark-Metrics\"><\/span>Custom Spark Metrics<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Groupon has developed the library spark-metrics to enable custom metrics in Spark. This library can be used to find potential performance bottlenecks in a streaming application. It uses the Spark internal RPC API. Each time a metric is collected (e.g. a timer context is stopped), the library sends the event to the driver, which aggregates the metric across all executors. To avoid flooding the driver in a DDoS-like fashion, metrics should not be collected on record level in the current version (see <a href=\"https:\/\/github.com\/groupon\/spark-metrics\/issues\/11\" target=\"_blank\" rel=\"noopener\">Issue 11<\/a>). In terms of RDD operations, the library can be used for partition operations such as <span class=\"lang:default decode:true crayon-inline \">mapPartitions()<\/span> or <span class=\"lang:default decode:true crayon-inline \">foreachPartition()<\/span>, but not for single-record operations such as <span class=\"lang:default decode:true crayon-inline \">map()<\/span> or <span class=\"lang:default decode:true crayon-inline \">filter()<\/span>.<\/p>\n<p>The best use case we found for spark-metrics is timing batch operations to external systems, e.g. 
bulkPut Operations to HBase:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/bernhardschaefer\/6d8df86307b8f38f3051cc5cb188e42e.js\"><\/script><\/p>\n<p>This helped us in different contexts:<\/p>\n<ul>\n<li>Performance optimization: measure the impact of code changes in the Streaming application or configuration changes in the sink system.<\/li>\n<li>Performance monitoring: monitor if increased batch processing times are due to increased response times in the external system or due to other factors (e.g. high YARN utilization).<\/li>\n<\/ul>\n<h2 id=\"logging\"><span class=\"ez-toc-section\" id=\"Logging\"><\/span>Logging<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Spark Streaming itself does not use any log rotation in YARN mode. The configuration property <span class=\"lang:default decode:true crayon-inline \">spark.executor.logs.rolling.strategy<\/span> only applies to Spark Standalone.<\/p>\n<p>Since the logs in YARN are written to a local disk directory, for a 24\/7 Spark Streaming job this can lead to the disk filling up. 
Therefore, it is important to specify a RollingFileAppender, which rotates the log files and discards the oldest ones once the configured limit is reached.<\/p>\n<p>Using a custom log4j configuration requires two settings:<\/p>\n<ul>\n<li>Upload a custom log4j.properties into the working directory of each container of the application: <span class=\"lang:default decode:true crayon-inline \">--files hdfs:\/\/\/path\/to\/log4j-yarn.properties<\/span><\/li>\n<li>Override the driver and executor log4j.properties using system properties:<\/li>\n<\/ul>\n<p><span class=\"lang:default decode:true crayon-inline \">--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-yarn.properties<\/span><\/p>\n<p><span class=\"lang:default decode:true crayon-inline \">--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-yarn.properties<\/span><\/p>\n<p>Whether log rotation works can be verified on the container logs page of the NodeManager:<\/p>\n<p><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/node-manager-logs.png\" rel=\"attachment wp-att-2592\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2592\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/node-manager-logs.png\" alt=\"Log rotation works\" width=\"940\" height=\"267\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/node-manager-logs.png 940w, https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/node-manager-logs-300x85.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/node-manager-logs-768x218.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/node-manager-logs-400x114.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/node-manager-logs-360x102.png 360w\" sizes=\"auto, (max-width: 940px) 100vw, 940px\" \/><\/a><\/p>\n<p>Spark logging itself is very verbose. This makes it difficult to find application log entries.<\/p>\n<p>One approach is to only emit Spark log entries at WARN level. However, this makes analyzing failed jobs more difficult. 
Another method is redirecting application logs to a dedicated log file. This way, we end up with two log files: one small log file with the application log entries and the standard Spark stderr log files, which contain all entries. An example <span class=\"lang:default decode:true crayon-inline \">log4j-yarn.properties<\/span> with two appenders:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/bernhardschaefer\/0bd0e8f6b8c5f27075eba815feb6c91a.js\"><\/script><\/p>\n<p>There are more advanced methods for dealing with YARN logs (a detailed discussion is beyond the scope of this article):<\/p>\n<ol>\n<li>Install the ELK stack and configure a Logstash log4j appender (for a short elaboration, see the Logging section in the <a href=\"http:\/\/mkuthan.github.io\/blog\/2016\/09\/30\/spark-streaming-on-yarn\/\" target=\"_blank\" rel=\"noopener\">spark streaming on yarn<\/a> blog post).<\/li>\n<li>Use a log4j <a href=\"https:\/\/logging.apache.org\/log4j\/1.2\/apidocs\/org\/apache\/log4j\/net\/SMTPAppender.html\" target=\"_blank\" rel=\"noopener\">SMTPAppender<\/a> to send email alerts in case of errors (needs fine-tuning to prevent spam).<\/li>\n<\/ol>\n<h2 id=\"conclusion\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The whole development and testing process of Spark Streaming on YARN proved to be more challenging than initially expected. As stated in the introduction, many default configuration options are not suited for a 24\/7 streaming application.<\/p>\n<p>The official Spark documentation pages proved to be helpful during development. However, they lack some implementation details. During the process, we benefited from community blog posts, such as the <a href=\"https:\/\/blog.cloudera.com\/blog\/2015\/03\/how-to-tune-your-apache-spark-jobs-part-2\/\" target=\"_blank\" rel=\"noopener\">how to tune your spark jobs<\/a> series. 
In retrospect, we discovered blog posts by people who ran into exactly the same issues we faced, e.g. <a href=\"https:\/\/vanwilgenburg.wordpress.com\/2015\/10\/06\/spark-streaming-backpressure\/\" target=\"_blank\" rel=\"noopener\">puzzlers in configuring spark streaming backpressure<\/a>. The most extensive post that we discovered &#8211; <a href=\"http:\/\/mkuthan.github.io\/blog\/2016\/09\/30\/spark-streaming-on-yarn\/\" target=\"_blank\" rel=\"noopener\">spark streaming on yarn<\/a> &#8211; has been referenced multiple times within this article and provided us with additional food for thought.<\/p>\n<p>In the spirit of the open-source community and the blog posts mentioned above, this article shares our lessons learned and provides a <a href=\"https:\/\/gist.github.com\/bernhardschaefer\/4309f728f66879c0a8c062be0801057b\" target=\"_blank\" rel=\"noopener\">spark-submit starter template<\/a>. We hope that other projects benefit from it the same way that we did from community blog posts.<\/p>\n<p>Overall, in Spark Streaming we have found a stable, flexible and highly scalable streaming engine. At the time of writing, the client has a variety of Spark Streaming applications running in production. We found that it is a good fit for many applications and their diverse set of requirements.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Get-in-touch\"><\/span>Get in touch<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Check out our <a href=\"https:\/\/www.inovex.de\/de\/leistungen\/analytics\/data-science\/\" target=\"_blank\" rel=\"noopener\">analytics portfolio on our website<\/a>. 
If you have any questions use the comment section below, write an Email to <a href=\"mailto:info@inovex.de\" target=\"_blank\" rel=\"noopener\">info@inovex.de<\/a> or call\u00a0<a href=\"tel:+497216190210\" target=\"_blank\" rel=\"noopener\">+49 721 619 021-0<\/a>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Were-hiring\"><\/span>We&#8217;re hiring<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Are you a data scientist looking for new challenges? <a href=\"https:\/\/www.inovex.de\/de\/karriere\/stellenangebote\/?experience_code=&amp;department=data-management-analytics\" target=\"_blank\" rel=\"noopener\">We&#8217;re currently hiring<\/a>!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At a large client in the German food retailing industry, we have been running Spark Streaming on Apache Hadoop\u2122 YARN in production for close to a year now. Overall, Spark Streaming has proved to be a flexible, robust and scalable streaming engine. However, one can tell that streaming itself has been retrofitted into Apache Spark\u2122. [&hellip;]<\/p>\n","protected":false},"author":55,"featured_media":2624,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[77,105],"service":[411],"coauthors":[{"id":55,"display_name":"Bernhard Sch\u00e4fer","user_nicename":"bschaefer"}],"class_list":["post-2575","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-big-data","tag-spark","service-data-engineering"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>24\/7 Spark Streaming on YARN in Production - inovex GmbH<\/title>\n<meta name=\"description\" content=\"We have been running Spark Streaming on Apache Hadoop\u2122 YARN in production for close to a year now. 
This is what we learned.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"24\/7 Spark Streaming on YARN in Production - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"We have been running Spark Streaming on Apache Hadoop\u2122 YARN in production for close to a year now. This is what we learned.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2016-12-22T15:41:33+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-24T06:31:03+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/spark.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2300\" \/>\n\t<meta property=\"og:image:height\" content=\"678\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Bernhard Sch\u00e4fer\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/spark-1024x302.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Bernhard Sch\u00e4fer\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" 
content=\"19\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Bernhard Sch\u00e4fer\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/\"},\"author\":{\"name\":\"Bernhard Sch\u00e4fer\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/cb18039320a6f1c53acd31f532593270\"},\"headline\":\"24\\\/7 Spark Streaming on YARN in Production\",\"datePublished\":\"2016-12-22T15:41:33+00:00\",\"dateModified\":\"2026-02-24T06:31:03+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/\"},\"wordCount\":3127,\"commentCount\":5,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2016\\\/12\\\/spark.png\",\"keywords\":[\"Big Data\",\"Spark\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/\",\"name\":\"24\\\/7 Spark Streaming on YARN in Production - inovex 
GmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2016\\\/12\\\/spark.png\",\"datePublished\":\"2016-12-22T15:41:33+00:00\",\"dateModified\":\"2026-02-24T06:31:03+00:00\",\"description\":\"We have been running Spark Streaming on Apache Hadoop\u2122 YARN in production for close to a year now. This is what we learned.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2016\\\/12\\\/spark.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2016\\\/12\\\/spark.png\",\"width\":2300,\"height\":678,\"caption\":\"Spark Logo\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/247-spark-streaming-on-yarn-in-production\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"24\\\/7 Spark Streaming on YARN in Production\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex 
GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/cb18039320a6f1c53acd31f532593270\",\"name\":\"Bernhard 
Sch\u00e4fer\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/776f4d954ab9900ff0eecbaac378b50b2a1cec0adf61226a9d615497ddca2a5b?s=96&d=retro&r=g4a9c49322e0682c77bc0b7c578d22e17\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/776f4d954ab9900ff0eecbaac378b50b2a1cec0adf61226a9d615497ddca2a5b?s=96&d=retro&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/776f4d954ab9900ff0eecbaac378b50b2a1cec0adf61226a9d615497ddca2a5b?s=96&d=retro&r=g\",\"caption\":\"Bernhard Sch\u00e4fer\"},\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/bschaefer\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"24\/7 Spark Streaming on YARN in Production - inovex GmbH","description":"We have been running Spark Streaming on Apache Hadoop\u2122 YARN in production for close to a year now. This is what we learned.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/","og_locale":"de_DE","og_type":"article","og_title":"24\/7 Spark Streaming on YARN in Production - inovex GmbH","og_description":"We have been running Spark Streaming on Apache Hadoop\u2122 YARN in production for close to a year now. 
This is what we learned.","og_url":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2016-12-22T15:41:33+00:00","article_modified_time":"2026-02-24T06:31:03+00:00","og_image":[{"width":2300,"height":678,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/spark.png","type":"image\/png"}],"author":"Bernhard Sch\u00e4fer","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/spark-1024x302.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Bernhard Sch\u00e4fer","Gesch\u00e4tzte Lesezeit":"19\u00a0Minuten","Written by":"Bernhard Sch\u00e4fer"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/"},"author":{"name":"Bernhard Sch\u00e4fer","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/cb18039320a6f1c53acd31f532593270"},"headline":"24\/7 Spark Streaming on YARN in Production","datePublished":"2016-12-22T15:41:33+00:00","dateModified":"2026-02-24T06:31:03+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/"},"wordCount":3127,"commentCount":5,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/spark.png","keywords":["Big Data","Spark"],"articleSection":["Analytics","English 
Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/","url":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/","name":"24\/7 Spark Streaming on YARN in Production - inovex GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/spark.png","datePublished":"2016-12-22T15:41:33+00:00","dateModified":"2026-02-24T06:31:03+00:00","description":"We have been running Spark Streaming on Apache Hadoop\u2122 YARN in production for close to a year now. 
This is what we learned.","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/spark.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2016\/12\/spark.png","width":2300,"height":678,"caption":"Spark Logo"},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/247-spark-streaming-on-yarn-in-production\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"24\/7 Spark Streaming on YARN in Production"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex 
GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/cb18039320a6f1c53acd31f532593270","name":"Bernhard Sch\u00e4fer","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/secure.gravatar.com\/avatar\/776f4d954ab9900ff0eecbaac378b50b2a1cec0adf61226a9d615497ddca2a5b?s=96&d=retro&r=g4a9c49322e0682c77bc0b7c578d22e17","url":"https:\/\/secure.gravatar.com\/avatar\/776f4d954ab9900ff0eecbaac378b50b2a1cec0adf61226a9d615497ddca2a5b?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/776f4d954ab9900ff0eecbaac378b50b2a1cec0adf61226a9d615497ddca2a5b?s=96&d=retro&r=g","caption":"Bernhard Sch\u00e4fer"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/bschaefer\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/2575","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/55"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=2575"}],"version-history":[{"count":4,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/2575\/revisions"}],"predecessor-version":[{"id":66356,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/2575\/revisions\/66356"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/2624"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=2575"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"
https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=2575"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=2575"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=2575"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}