{"id":27825,"date":"2021-04-27T19:21:48","date_gmt":"2021-04-27T17:21:48","guid":{"rendered":"https:\/\/www.inovex.de\/blog\/?p=20785"},"modified":"2025-09-15T11:30:45","modified_gmt":"2025-09-15T09:30:45","slug":"set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/","title":{"rendered":"Set the Kernels Free \u2013 Remote Kernels for Jupyter Notebooks &#038; Spark"},"content":{"rendered":"<p>According to the Data Science Survey conducted by JetBrains in 2018, Jupyter\/IPython notebooks are the most popular tool in the category <em>IDEs and Editors<\/em> among data scientists. Whether notebooks can be seen as true IDEs is debatable; however, there is no doubt that they offer several advantages \u2013 especially for exploratory data analysis. Incremental execution of code, quick access to visualizations as well as capturing transient computations and the ease of sharing results with others are the major success factors of notebooks.<!--more--><\/p>\n<p>As of today, there are more than 100 compatible <a href=\"https:\/\/github.com\/jupyter\/jupyter\/wiki\/Jupyter-kernels\">kernels<\/a>, enabling programming in various languages, like Python, Julia, R or even C#. But notebooks offer much more. It is possible to use them for data science on distributed resources, for example by using notebooks in conjunction with a well-known and widely used framework for distributed data processing like Apache Spark. However, traditional notebooks are limited in accessing distributed cluster resources. Furthermore, when working with Spark in combination with notebooks, several challenges must be overcome, especially regarding scalability and stability.\u00a0 Why? 
Well, it all comes down to the kernel, the unit responsible for interpreting and executing code from the notebook cells.<\/p>\n<p>A traditional notebook container includes the notebook UI and also the kernel itself, meaning that the notebook and its kernel run as one unit. To understand the impact this has on working with Spark, we first have to take a look at a typical Spark application:<\/p>\n<p>In general, each application runs as an independent group of processes within the cluster and is coordinated by the <em>SparkContext <\/em>object which exists in the <i>Driver Program<\/i>. The <i>SparkContext<\/i> connects to the <i>Cluster Manager<\/i> (e.g. YARN or Mesos for Hadoop), which then allocates the resources needed by the application. Finally, SparkContext provides the <i>Executors<\/i> with the application code, e.g. as Python code or JAR artefacts, and sends them <i>tasks <\/i>to execute.<\/p>\n<p>Now, Spark itself offers two possible deploy modes: Cluster and Client Mode. The main difference between them is where the Spark <i>Driver Program <\/i>runs. In Client Mode, it is launched directly within the <i>Spark-Submit <\/i>process, and so it lives within the client, e.g. within a notebook container. In Cluster Mode, the <i>Spark driver <\/i>is launched outside the client. In a regular YARN\/Hadoop cluster, the <i>Spark driver <\/i>runs in the application master inside its own YARN container on one of the cluster nodes. After deployment, the client is no longer necessary for further execution. The <i>Spark driver<\/i>, in contrast to Client Mode, is then subject to the resource manager (YARN) and can be scheduled automatically on nodes with available resources.<\/p>\n<p>After discussing Spark deploy modes, we can now have a look at the combination of both: Spark and Jupyter notebooks. 
As previously mentioned, in a traditional Jupyter notebook, the kernel exists within the\u00a0notebook container.<\/p>\n<figure id=\"attachment_20792\" aria-describedby=\"caption-attachment-20792\" style=\"width: 240px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-20792 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/traditional_nb.png\" alt=\"Depiction of A traditional notebook setup: kernel running within the notebook container.\" width=\"240\" height=\"130\" \/><figcaption id=\"caption-attachment-20792\" class=\"wp-caption-text\">A traditional notebook setup: kernel running within the notebook container.<\/figcaption><\/figure>\n<p>Looking at the image above and remembering how Spark works and what Client and Cluster Mode are, it is obvious that this notebook architecture only allows us to work with Spark in Client Mode. The <em>Spark-Driver <\/em>and <em>SparkContext<\/em> are located in the notebook kernel, which is bound to the client process. Unfortunately, the above combination of Spark and notebooks is not flawless. 
There are several issues that can be improved.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#The-drawbacks-of-Spark-in-Client-Mode-within-a-notebook\" >The drawbacks of Spark in Client-Mode within a notebook<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#Remote-kernels-with-Jupyter-Enterprise-Gateway\" >Remote kernels with Jupyter Enterprise Gateway<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#Jupyter-Enterprise-Gateway-%E2%80%93-Spark-in-YARN-Cluster-Mode\" >Jupyter Enterprise Gateway \u2013 Spark in YARN Cluster Mode<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#Containerized-Jupyter-Enterprise-Gateway\" >Containerized Jupyter Enterprise Gateway<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#Jupyter-Enterprise-Gateway-integrated-in-a-data-science-platform\" >Jupyter Enterprise Gateway integrated in a data science platform<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a 
class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#Summary\" >Summary<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"The-drawbacks-of-Spark-in-Client-Mode-within-a-notebook\"><\/span>The drawbacks of Spark in Client-Mode within a notebook<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_20793\" aria-describedby=\"caption-attachment-20793\" style=\"width: 240px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-20793 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/traditional_nb.png\" alt=\"Depiction of a common setup for interactive data science with Spark for on-premises infrastructure.\" width=\"240\" height=\"130\" \/><figcaption id=\"caption-attachment-20793\" class=\"wp-caption-text\">A common setup for interactive data science with Spark for on-premises infrastructure.<\/figcaption><\/figure>\n<p>Let\u2019s take a look at a common (on-premises) setup consisting of a Hadoop\/YARN cluster, as depicted in the illustration above. While <em>Spark-Executors <\/em>are scheduled across all available cluster nodes by YARN, the <em>Spark driver<\/em> is not. Instead, it is coupled to the corresponding notebook process (bound to the kernel). When working in a multi-user setup, these heavy notebook processes are utilizing the resources of only one server \u2013 usually the cluster edge-node. This dramatically limits the number of users working simultaneously and results in poor scalability.<\/p>\n<p>Stability and user-encapsulation are suffering as well. Since the notebook is not subject to YARN, it is not strictly limited in terms of resources. A common Spark operation of collecting a distributed data-frame back to a notebook (to the <em>Spark driver<\/em>), e.g. 
by creating a local (pandas) data-frame, can lead to rapid growth in memory usage. This may exhaust the resources of the machine running the notebook process in no time, possibly freezing the whole machine. As a result, all other users&#8217; processes are negatively affected.<\/p>\n<p>These drawbacks are especially critical when aiming to enable multiple users to work on a single platform simultaneously for data science\/engineering tasks. A scenario in which one user effectively kills processes\/applications of other users is definitely something you want to avoid. But is there a solution to this problem? Yes! The main idea is to separate the notebook kernel from its container with Jupyter Enterprise Gateway.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Remote-kernels-with-Jupyter-Enterprise-Gateway\"><\/span><strong>Remote kernels with Jupyter Enterprise Gateway<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Jupyter Enterprise Gateway (JEG) is a web server that enables launching kernels on behalf of remote notebooks. 
This results in a separation between view and computation since the kernel does not have to run on the same machine (or container) as the notebook like in the traditional setup.<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_20794\" aria-describedby=\"caption-attachment-20794\" style=\"width: 602px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-20794 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/old_arch.png\" alt=\"Depiction of a notebook setup with remote kernels\" width=\"602\" height=\"317\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/old_arch.png 602w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/old_arch-300x158.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/old_arch-400x211.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/old_arch-360x190.png 360w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><figcaption id=\"caption-attachment-20794\" class=\"wp-caption-text\">Notebook setup with remote kernels. A kernel gateway (JEG) manages the communication of a notebook instance with its (distributed) kernel.<\/figcaption><\/figure>\n<p>As seen, JEG is used to manage the communication between a notebook and its kernel. In the presented illustration, the notebook does not have multiple kernels running but rather one kernel that is distributed across cluster-nodes. This could be a regular Python kernel that is distributed but also e.g. a kernel containing a Spark application with a Spark driver instance and multiple Spark executors.<\/p>\n<p>The first main advantage of having remote kernels with JEG is improved scalability. Since the kernel does not reside in the notebook, the notebook container requires only minimal resources as it does not perform any computation. 
In fact, the kernel is now completely schedulable by YARN (or another resource manager), allowing the use of Spark in cluster mode.<\/p>\n<p>With JEG, stability and user encapsulation benefit as well. Because the <em>Spark driver <\/em>runs as a YARN container (in the case of a Hadoop cluster), it is strictly constrained in terms of resources, as it is managed by YARN. In the use case mentioned above, where a collected Spark data frame could exceed the reserved memory, the application will not grow out of bounds but will simply fail. While still painful for the affected user, this will not negatively affect other users, as their Spark drivers run in separate YARN containers, possibly on other cluster nodes.<\/p>\n<p>To actually utilize remote kernels in a simplified use case, we need to provide two components: a JEG instance and our actual notebook. Below, we give brief instructions for installing JEG on a Hadoop\/YARN cluster to execute PySpark code from a remote notebook. Later we will take a look at a more complex scenario \u2013 integrating JEG into a data science platform offering various functionalities, e.g. creating kernels with customized environments within a few clicks from a Web-UI.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Jupyter-Enterprise-Gateway-%E2%80%93-Spark-in-YARN-Cluster-Mode\"><\/span><strong>Jupyter Enterprise Gateway \u2013 Spark in YARN Cluster Mode<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We will take a look at a rather basic scenario for remote kernels in which JEG runs on an edge node of a Hadoop cluster and is used to spawn and manage Python Spark kernels. The kernel specification discussed here assumes YARN as a resource manager.<\/p>\n<p>Installing JEG is rather straightforward. 
It can be done using Conda with the following command executed directly on the edge node:<\/p>\n<pre class=\"lang:sh decode:true\">conda install -c conda-forge jupyter_enterprise_gateway<\/pre>\n<p><span style=\"font-weight: 400;\">If you want to modify JEG\u2019s config, for example to change communication timeouts or specify the number of kernels allowed per user, you can generate a config file with:<\/span><\/p>\n<pre class=\"lang:sh decode:true\">jupyter enterprisegateway --generate-config\r\n<\/pre>\n<p><span style=\"font-weight: 400;\">In the generated file, you can set various parameters.\u00a0<a href=\"https:\/\/jupyter-enterprise-gateway.readthedocs.io\/en\/latest\/\">Click here<\/a>\u00a0to visit the official JEG website for more information.\u00a0<\/span><span style=\"font-weight: 400;\">After modifying the config file to suit your needs, you have to reference it when starting JEG.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As kernels are now managed by JEG, we have to provide pre-defined kernels. By default, JEG discovers available kernels in <\/span><i><span style=\"font-weight: 400;\">\/usr\/local\/share\/jupyter\/kernels\/<\/span><\/i><span style=\"font-weight: 400;\">. Make sure this directory has the correct permissions so that JEG is able to access it. While installing and starting JEG is rather easy, the major part of the configuration happens at the kernel level. 
As already said, we focus on enabling Spark in Cluster Mode.<\/span><\/p>\n<figure id=\"attachment_20807\" aria-describedby=\"caption-attachment-20807\" style=\"width: 1018px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-20807 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2020-11-22-22-17-07.png\" alt=\"Depiction of a typical directory structure of a single kernel.\" width=\"1018\" height=\"344\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2020-11-22-22-17-07.png 1018w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2020-11-22-22-17-07-300x101.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2020-11-22-22-17-07-768x260.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2020-11-22-22-17-07-400x135.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2020-11-22-22-17-07-360x122.png 360w\" sizes=\"auto, (max-width: 1018px) 100vw, 1018px\" \/><figcaption id=\"caption-attachment-20807\" class=\"wp-caption-text\">A typical directory structure of a single kernel.<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">In the kernel directory, every subdirectory represents one kernel.\u00a0<\/span><span style=\"font-weight: 400;\">The file modified the most is the <\/span><i><span style=\"font-weight: 400;\">kernel.json<\/span><\/i><span style=\"font-weight: 400;\">. It contains all relevant metadata, like display name (visible from the JupyterLab UI) or a kernel\u2019s programming language. 
The <\/span><i><span style=\"font-weight: 400;\">env <\/span><\/i><span style=\"font-weight: 400;\">section requires most of our attention; an example of this part is presented below:<\/span><\/p>\n<pre class=\"lang:yaml decode:true\">\"env\": {\r\n  \"HADOOP_CONF_DIR\": \"\/usr\/bin\/hadoop\/etc\/hadoop\",\r\n  \"SPARK_HOME\": \"\/usr\/lib\/spark\",\r\n  \"SPARK_CONF_DIR\": \"\/usr\/lib\/spark\/conf\",\r\n  \"PYSPARK_PYTHON\": \"\/usr\/local\/share\/jupyter\/kernels\/py37_folder\/py37\/bin\/python\",\r\n  \"PYTHONPATH\": \"\/usr\/local\/share\/jupyter\/kernels\/py37_folder\/py37\/bin\/python\",\r\n  \"SPARK_OPTS\": \"--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=py37\/py37\/bin\/python --conf spark.yarn.appMasterEnv.PATH=.\/py37\/py37\/bin:$PATH ${KERNEL_EXTRA_SPARK_OPTS} --conf spark.yarn.dist.archives=\/usr\/local\/share\/jupyter\/kernels\/py37_folder\/py37.zip#py37 --driver-memory=1024M --num-executors=1 --executor-memory=1024M --executor-cores=1\",\r\n  \"LAUNCH_OPTS\": \"\"\r\n}<\/pre>\n<p><span style=\"font-weight: 400;\">It contains relevant parameters of a Spark application, e.g. <\/span><i><span style=\"font-weight: 400;\">SPARK_HOME <\/span><\/i><span style=\"font-weight: 400;\">or<\/span><i><span style=\"font-weight: 400;\"> HADOOP_CONF_DIR <\/span><\/i><span style=\"font-weight: 400;\">locations. In <\/span><i><span style=\"font-weight: 400;\">SPARK_OPTS <\/span><\/i><span style=\"font-weight: 400;\">we specify the actual deploy mode, resources and the environment (extra packages) we use. Besides these rather basic Spark-related configurations, dependency management is an important aspect. If we want to use e.g. 
Pandas within our Python notebook in a cluster scenario, we have to make sure that Pandas is installed on all involved cluster nodes. This can be done by creating a custom Conda environment and distributing the archived environment via <\/span><i><span style=\"font-weight: 400;\">SPARK_OPTS<\/span><\/i><span style=\"font-weight: 400;\"> to all cluster nodes at kernel start. For further information, this\u00a0<a href=\"https:\/\/www.inovex.de\/blog\/isolated-virtual-environments-pyspark\/\">blog post<\/a> is a great read<\/span><span style=\"font-weight: 400;\">. We install our Conda environment in the kernel directory (<\/span><i><span style=\"font-weight: 400;\">conda_environment<\/span><\/i><span style=\"font-weight: 400;\">) and also place the archived environment (<\/span><i><span style=\"font-weight: 400;\">conda_environment.zip<\/span><\/i><span style=\"font-weight: 400;\">) here.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To complete our kernel definition, we need further kernel files such as <\/span><i><span style=\"font-weight: 400;\">run.sh <\/span><\/i><span style=\"font-weight: 400;\">and the launcher script. These do not require any changes (at least in the standard scenario) and can be adopted from the provided default kernel, e.g. from\u00a0<a href=\"https:\/\/github.com\/jupyter\/enterprise_gateway\/tree\/master\/etc\">here.<\/a><\/span><\/p>\n<p><span style=\"font-weight: 400;\">With JEG installed and a kernel prepared, we can start the JEG server with:<\/span><\/p>\n<pre class=\"lang:sh decode:true\">jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --port=8888 \\\r\n--config='&lt;PATH_TO_GENERATED_CONFIG&gt;' \\\r\n--EnterpriseGatewayApp.yarn_endpoint=http:\/\/&lt;HOST_NAME&gt;:8088\/ws\/v1<\/pre>\n<p><span style=\"font-weight: 400;\">The <\/span><i><span style=\"font-weight: 400;\">YARN endpoint<\/span><\/i><span style=\"font-weight: 400;\"> should point to the server where the YARN master resides. 
Depending on your cluster, this address path may look different.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Once JEG is running on your Hadoop\/YARN cluster and is able to discover at least one specified kernel, you are ready to connect a notebook started on your local machine (remember the firewall!) with JEG. You simply have to provide the address of JEG (usually the IP and port of the cluster node on which JEG is running) to your JupyterLab\/Notebook at start. You can refer to the official documentation for further details, but a simple docker-based command could look like this:<\/span><\/p>\n<pre class=\"lang:sh decode:true\">docker run -t --rm \\\r\n  -e JUPYTER_GATEWAY_URL='http:\/\/&lt;JEG_IP&gt;:&lt;JEG_PORT&gt;' \\\r\n  -e JUPYTER_GATEWAY_HTTP_USER=guest \\\r\n  -e JUPYTER_GATEWAY_HTTP_PWD=guest-password \\\r\n  -e JUPYTER_GATEWAY_VALIDATE_CERT='false' \\\r\n  -e LOG_LEVEL=DEBUG \\\r\n  -p 8888:8888 \\\r\n  -v ${HOME}\/notebooks\/:\/tmp\/notebooks \\\r\n  -w \/tmp\/notebooks \\\r\n  notebook-docker-image<\/pre>\n<h2><span class=\"ez-toc-section\" id=\"Containerized-Jupyter-Enterprise-Gateway\"><\/span><b>Containerized Jupyter Enterprise Gateway<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">While JEG can simply be started directly on a cluster\/edge node, it might be more convenient to have a containerized version that includes all dependencies and provides more flexibility for deployments in multiple environments. Of course, you could just use the provided <\/span><i><span style=\"font-weight: 400;\">elyra\/enterprise-gateway-demo<\/span><\/i><span style=\"font-weight: 400;\"> docker image, but if you want it a bit more customized and e.g. 
use different Spark and\/or Hadoop versions (in case you want to use a Spark-related kernel with JEG) than in the provided image, you could also create your own custom docker image.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With JEG running in a container and therefore kernels being started from within the container, we have to include Spark and Hadoop binaries into the container. This can be tricky but the provided Dockerfile brings a basic setup with <\/span><i><span style=\"font-weight: 400;\">ubuntu:20.04<\/span><\/i><span style=\"font-weight: 400;\"> as a base image and installation commands for:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">JEG &#8211;\u00a0 obviously. It is important to add an entry-point script with a starting command for JEG.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Hadoop &#8211; since JEG needs to access the YARN resource manager that comes with Hadoop. You should install the same or a similar major version as on your cluster.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Spark &#8211; since we want to start Spark applications with our kernels. Again, install the same or a similar major version as on your cluster.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">However, simply installing these components will not do it. You actually want to use the physical cluster resources and do not want your Spark application being bound to the resources of the docker host only. The trick is to mount all relevant Hadoop and Spark config-files with volumes into the container at launch. You just have to make sure that you mount the volumes at the right location. E.g. yarn-site.xml from the docker host has to be mounted at its counterpart directory from the Hadoop installed in the container. 
By mounting all necessary config files, you make sure that the (Spark) kernel started inside the container by JEG is able to access the underlying physical cluster resources. Additionally, we also mount the kernel directory, enabling us to add new kernels that are immediately recognized by JEG at runtime.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You should also declare variables (build arguments) in your Dockerfile to allow flexible versioning and specify installation paths for Hadoop, Spark or Conda\/Python. A build command may then look like this:<\/span><\/p>\n<pre class=\"lang:sh decode:true\">docker build \\\r\n-t &lt;your_local_registry&gt; \\\r\n--build-arg GATEWAY_VERSION=2.2.0 \\\r\n--build-arg GATEWAY_PORT=8888 \\\r\n--build-arg SPARK_HOME=\/usr\/lib\/spark \\\r\n--build-arg HADOOP_HOME=\/usr\/bin\/hadoop \\\r\n--build-arg PYTHON_HOME=\/opt\/conda\/default\/bin\/python \\\r\n--build-arg KERNEL_FOLDER=\/usr\/local\/share\/jupyter\/kernels\/ \\\r\n--build-arg PYTHON_VERSION=3.7 \\\r\n--build-arg SPARK_VERSION=2.4.6 \\\r\n--build-arg HADOOP_VERSION=2.7 \\\r\n.<\/pre>\n<p><span style=\"font-weight: 400;\">After building (and optionally pushing) the image, you can use e.g. <\/span><i><span style=\"font-weight: 400;\">docker-compose<\/span><\/i><span style=\"font-weight: 400;\"> to start your container. 
A YAML file for Docker Compose may look similar to the one presented below, although the details depend on the underlying infrastructure you use:<\/span><\/p>\n<pre class=\"lang:sh decode:true\">version: \"3.3\"\r\n\r\nservices:\r\n  gateway:\r\n    image: &lt;your-registry&gt;\r\n    container_name: jeg-container\r\n    network_mode: \"host\"\r\n    volumes:\r\n      - \/usr\/local\/share\/jupyter\/kernels\/:\/usr\/local\/share\/jupyter\/kernels\/\r\n      - \/usr\/lib\/hadoop\/etc\/hadoop\/yarn-site.xml:\/usr\/bin\/hadoop\/etc\/hadoop\/yarn-site.xml\r\n      - \/usr\/lib\/hadoop\/etc\/hadoop\/core-site.xml:\/usr\/bin\/hadoop\/etc\/hadoop\/core-site.xml\r\n      - \/hadoop\/yarn\/:\/hadoop\/yarn\/\r\n      - \/usr\/lib\/spark\/conf\/spark-defaults.conf:\/usr\/lib\/spark\/conf\/spark-defaults.conf\r\n      - \/usr\/lib\/spark\/conf\/spark-env.sh:\/usr\/lib\/spark\/conf\/spark-env.sh\r\n      - \/usr\/lib\/spark\/jars\/:\/usr\/lib\/spark\/jars\/\r\n      - \/usr\/local\/share\/google\/dataproc\/lib\/:\/usr\/local\/share\/google\/dataproc\/lib\/\r\n      - ~\/masterarbeit-rafal\/jupyter_enterprise_gateway_config.py:\/tmp\/jupyter_enterprise_gateway_config.py<\/pre>\n<p><span style=\"font-weight: 400;\">You might have noticed that we use <\/span><i><span style=\"font-weight: 400;\">host networking <\/span><\/i><span style=\"font-weight: 400;\">for our container. This is related to the way JEG communicates with kernels. At kernel start, JEG maintains a unique response address for each kernel, to which the kernel sends back its status. This results in multiple open response addresses when running more than one kernel at the same time. This would, among other things, require us to expose multiple ports of our container, ports that we do not know in advance. 
This can be avoided by using <\/span><i><span style=\"font-weight: 400;\">host networking<\/span><\/i><span style=\"font-weight: 400;\">. As of JEG version 2.4, <\/span><i><span style=\"font-weight: 400;\">single-response addresses<\/span><\/i><span style=\"font-weight: 400;\"> for communicating with kernels are not supported, but there is already an open\u00a0<a href=\"https:\/\/github.com\/jupyter-server\/enterprise_gateway\/issues\/814\">issue<\/a> on GitHub and we will hopefully see this in version 3.0<\/span><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This basic scenario allows utilizing remote kernels on your Hadoop\/YARN cluster, which provides the advantages of Spark in Cluster Mode and the flexibility of programming in a notebook. However, this is rather a single-user solution, as there is no management of kernels, and it is definitely not an enterprise use case. Ideally, we are able to let multiple data scientists\/engineers or developers work with notebooks and utilize shared cluster resources at the same time. To accomplish this, combining JupyterHub with JEG is a good way to go. In this scenario, JupyterHub takes care of spawning single-user notebooks, whereas JEG handles starting kernels and acts as a gateway. At inovex, we implemented a proof of concept of a holistic data science platform that integrates JEG and offers various functionalities.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Jupyter-Enterprise-Gateway-integrated-in-a-data-science-platform\"><\/span><b>Jupyter Enterprise Gateway integrated in a data science platform<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The architecture of the proof of concept basically comprises two major parts: an (existing) Hadoop\/YARN cluster for computation loads and a rather small Kubernetes cluster for user interactions. 
To enable scalability in a multi-user setup we not only distribute the kernels (across the Hadoop\/Yarn cluster) but also the actual notebook servers across the Kubernetes cluster nodes!\u00a0<\/span><\/p>\n<figure id=\"attachment_20809\" aria-describedby=\"caption-attachment-20809\" style=\"width: 1632px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-20809 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33.png\" alt=\"Depiction of the architecture of our solution at inovex. Kubernetes cluster for management and distribution of notebooks, a Hadoop cluster for executing kernels.\" width=\"1632\" height=\"822\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33.png 1632w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-300x151.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-1024x516.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-768x387.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-1536x774.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-400x201.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-360x181.png 360w\" sizes=\"auto, (max-width: 1632px) 100vw, 1632px\" \/><figcaption id=\"caption-attachment-20809\" class=\"wp-caption-text\">The architecture of our solution at inovex. Kubernetes cluster for management and distribution of notebooks, a Hadoop cluster for executing kernels.<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">The Web UI running inside Kubernetes acts as a single entry point for the whole platform. 
After logging in, users are offered various functionalities. For example, they are able to access JupyterHub and start a JupyterLab session (<\/span><i><span style=\"font-weight: 400;\">notebook<\/span><\/i><span style=\"font-weight: 400;\"> in the above illustration). JEG (<\/span><i><span style=\"font-weight: 400;\">Kernel Gateway)<\/span><\/i><span style=\"font-weight: 400;\"> provides the kernels and spawns them on the Hadoop cluster. In this way, Kubernetes is solely responsible for display logic while Hadoop handles the heavy workloads.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Other main functionalities of the Web UI are viewing the cluster state (Hadoop), creating new kernels or setting various resource limits for users from an administrator panel. Since the Web UI is deployed on Kubernetes while JEG and the kernels reside on the Hadoop cluster, a custom API server is needed in case a user wants to e.g. add or modify kernels. It allows the Web UI to communicate with the Hadoop cluster and implements the kernel management functionalities triggered from the Web UI.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The screenshot below presents the core functionalities of the Web UI. 
In general, the design is deliberately simple and focuses on minimizing the technical knowledge required from a user.<\/span><\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_20810\" aria-describedby=\"caption-attachment-20810\" style=\"width: 1632px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-20810 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33.png\" alt=\"Screenshot of the cluster overview page from the Web UI.\" width=\"1632\" height=\"822\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33.png 1632w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-300x151.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-1024x516.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-768x387.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-1536x774.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-400x201.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/Bildschirmfoto-von-2021-01-19-16-51-33-360x181.png 360w\" sizes=\"auto, (max-width: 1632px) 100vw, 1632px\" \/><figcaption id=\"caption-attachment-20810\" class=\"wp-caption-text\">The <em>cluster overview<\/em> page from the Web UI.<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">The image above shows the page a user sees immediately after logging in. The middle of the page displays various cluster metrics. On the left side, you can see all available kernels. 
They are clickable and lead to the details view, just as depicted below.<\/span><\/p>\n<figure id=\"attachment_20811\" aria-describedby=\"caption-attachment-20811\" style=\"width: 1860px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-20811 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster.png\" alt=\"Screenshot of detailed information about one of the available kernels.\" width=\"1860\" height=\"1188\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster.png 1860w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-300x192.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-1024x654.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-768x491.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-1536x981.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-400x255.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-360x230.png 360w\" sizes=\"auto, (max-width: 1860px) 100vw, 1860px\" \/><figcaption id=\"caption-attachment-20811\" class=\"wp-caption-text\">Detailed information about one of the available kernels.<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Further, the interface for creating new kernels, illustrated below, requires only minimal input from the user. 
The whole complexity of creating the <\/span><i><span style=\"font-weight: 400;\">kernel definition<\/span><\/i><span style=\"font-weight: 400;\">, as well as creating and archiving Conda <\/span><span style=\"font-weight: 400;\">environments, is handled in the background by the API server.<\/span><\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_20812\" aria-describedby=\"caption-attachment-20812\" style=\"width: 1860px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-20812 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster.png\" alt=\"Screenshot of the interface for creating new kernel environments.\" width=\"1860\" height=\"1188\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster.png 1860w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-300x192.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-1024x654.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-768x491.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-1536x981.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-400x255.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/02\/impl_web_ui_cluster-360x230.png 360w\" sizes=\"auto, (max-width: 1860px) 100vw, 1860px\" \/><figcaption id=\"caption-attachment-20812\" class=\"wp-caption-text\">The interface for creating new kernel environments.<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">The interfaces implemented in the Web UI were designed to be as error-resistant as possible. For example, specifying resources beyond the limits set by an administrator, or beyond the physically available resources, leads to an appropriate error message. 
The same happens when invalid pip or Conda packages are specified for a Python kernel.\u00a0<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Summary\"><\/span><b>Summary<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">In this blog post, we looked at remote kernels for Jupyter notebooks and discussed their benefits when working on shared cluster resources. They offer several advantages over a traditional notebook setup in terms of scalability and stability, especially in combination with Spark as a framework for distributed data processing. Jupyter Enterprise Gateway, a tool from the Jupyter stack, is a good choice for enabling remote kernels, whether in a single-user scenario or in enterprise use cases where it is integrated into a data science platform alongside other components. Since the kernel specification makes up the major part of the configuration and requires quite a bit of technical expertise, we introduced a proof of concept that hides this complexity from the user and offers notebooks with Spark in Cluster Mode out of the box. It utilizes remote kernels and thus offers good scalability and stability as well as user isolation and encapsulation. The platform is also an example of how to integrate various open-source components to provide a user experience that is as smooth as possible for data scientists and\/or engineers.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>According to the Data Science Survey conducted by JetBrains in 2018, Jupyter\/IPython notebooks are the most popular tool in the category IDEs and Editors among data scientists. Whether notebooks can be seen as true IDEs is debatable, however, there is no doubt that they offer several advantages \u2013 especially for exploratory data analysis. 
Incremental execution [&hellip;]<\/p>\n","protected":false},"author":179,"featured_media":27768,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[582,105],"service":[414,432],"coauthors":[{"id":179,"display_name":"Rafal Lokuciejewski","user_nicename":"rafal-lokuciejewskiinovex-de"}],"class_list":["post-27825","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-jupyter","tag-spark","service-cloud","service-devops"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Set the Kernels Free \u2013 Remote Kernels for Jupyter Notebooks &amp; Spark - inovex GmbH<\/title>\n<meta name=\"description\" content=\"In this blog we discuss how to improve interactive development and data science with Apache Spark by enabling remote kernels with Jupyter Enterprise Gateway. Remote kernels offer several advantages in terms of scalability, stability and user encapsulation.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Set the Kernels Free \u2013 Remote Kernels for Jupyter Notebooks &amp; Spark - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"In this blog we discuss how to improve interactive development and data science with Apache Spark by enabling remote kernels with Jupyter Enterprise Gateway. 
Remote kernels offer several advantages in terms of scalability, stability and user encapsulation.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2021-04-27T17:21:48+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-09-15T09:30:45+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/jupyter-notebooks-kernels.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Rafal Lokuciejewski\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/jupyter-notebooks-kernels-1024x576.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Rafal Lokuciejewski\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"16\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Rafal Lokuciejewski\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/\"},\"author\":{\"name\":\"Rafal Lokuciejewski\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/4852bd3d70d7e8d5453571bb27fc29c1\"},\"headline\":\"Set the Kernels Free \u2013 Remote Kernels for Jupyter Notebooks &#038; Spark\",\"datePublished\":\"2021-04-27T17:21:48+00:00\",\"dateModified\":\"2025-09-15T09:30:45+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/\"},\"wordCount\":3100,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/jupyter-notebooks-kernels.png\",\"keywords\":[\"Jupyter\",\"Spark\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/\",\"name\":\"Set the Kernels Free \u2013 Remote Kernels for Jupyter Notebooks & Spark - inovex 
GmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/jupyter-notebooks-kernels.png\",\"datePublished\":\"2021-04-27T17:21:48+00:00\",\"dateModified\":\"2025-09-15T09:30:45+00:00\",\"description\":\"In this blog we discuss how to improve interactive development and data science with Apache Spark by enabling remote kernels with Jupyter Enterprise Gateway. Remote kernels offer several advantages in terms of scalability, stability and user encapsulation.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/jupyter-notebooks-kernels.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/jupyter-notebooks-kernels.png\",\"width\":1920,\"height\":1080,\"caption\":\"Desktop with notebooks, a pen, apples and their 
kernels\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Set the Kernels Free \u2013 Remote Kernels for Jupyter Notebooks &#038; Spark\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex 
GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/4852bd3d70d7e8d5453571bb27fc29c1\",\"name\":\"Rafal Lokuciejewski\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/8bf762d23ce1a4aca8afafba67dce7d6b0dabbcb56999bbb2e41d56664f9bcb7?s=96&d=retro&r=ge3f981b6ae50c555514691c36d70131a\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/8bf762d23ce1a4aca8afafba67dce7d6b0dabbcb56999bbb2e41d56664f9bcb7?s=96&d=retro&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/8bf762d23ce1a4aca8afafba67dce7d6b0dabbcb56999bbb2e41d56664f9bcb7?s=96&d=retro&r=g\",\"caption\":\"Rafal Lokuciejewski\"},\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/rafal-lokuciejewskiinovex-de\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Set the Kernels Free \u2013 Remote Kernels for Jupyter Notebooks & Spark - inovex GmbH","description":"In this blog we discuss how to improve interactive development and data science with Apache Spark by enabling remote kernels with Jupyter Enterprise Gateway. 
Remote kernels offer several advantages in terms of scalability, stability and user encapsulation.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/","og_locale":"de_DE","og_type":"article","og_title":"Set the Kernels Free \u2013 Remote Kernels for Jupyter Notebooks & Spark - inovex GmbH","og_description":"In this blog we discuss how to improve interactive development and data science with Apache Spark by enabling remote kernels with Jupyter Enterprise Gateway. Remote kernels offer several advantages in terms of scalability, stability and user encapsulation.","og_url":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2021-04-27T17:21:48+00:00","article_modified_time":"2025-09-15T09:30:45+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/jupyter-notebooks-kernels.png","type":"image\/png"}],"author":"Rafal Lokuciejewski","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/jupyter-notebooks-kernels-1024x576.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Rafal Lokuciejewski","Gesch\u00e4tzte Lesezeit":"16\u00a0Minuten","Written by":"Rafal Lokuciejewski"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/"},"author":{"name":"Rafal 
Lokuciejewski","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/4852bd3d70d7e8d5453571bb27fc29c1"},"headline":"Set the Kernels Free \u2013 Remote Kernels for Jupyter Notebooks &#038; Spark","datePublished":"2021-04-27T17:21:48+00:00","dateModified":"2025-09-15T09:30:45+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/"},"wordCount":3100,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/jupyter-notebooks-kernels.png","keywords":["Jupyter","Spark"],"articleSection":["Analytics","English Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/","url":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/","name":"Set the Kernels Free \u2013 Remote Kernels for Jupyter Notebooks & Spark - inovex GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/jupyter-notebooks-kernels.png","datePublished":"2021-04-27T17:21:48+00:00","dateModified":"2025-09-15T09:30:45+00:00","description":"In this blog we discuss how to improve interactive development and data science with Apache Spark by 
enabling remote kernels with Jupyter Enterprise Gateway. Remote kernels offer several advantages in terms of scalability, stability and user encapsulation.","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/jupyter-notebooks-kernels.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/jupyter-notebooks-kernels.png","width":1920,"height":1080,"caption":"Desktop with notebooks, a pen, apples and their kernels"},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/set-the-kernels-free-remote-kernels-for-jupyter-notebooks-spark\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"Set the Kernels Free \u2013 Remote Kernels for Jupyter Notebooks &#038; Spark"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex 
GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/4852bd3d70d7e8d5453571bb27fc29c1","name":"Rafal Lokuciejewski","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/secure.gravatar.com\/avatar\/8bf762d23ce1a4aca8afafba67dce7d6b0dabbcb56999bbb2e41d56664f9bcb7?s=96&d=retro&r=ge3f981b6ae50c555514691c36d70131a","url":"https:\/\/secure.gravatar.com\/avatar\/8bf762d23ce1a4aca8afafba67dce7d6b0dabbcb56999bbb2e41d56664f9bcb7?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/8bf762d23ce1a4aca8afafba67dce7d6b0dabbcb56999bbb2e41d56664f9bcb7?s=96&d=retro&r=g","caption":"Rafal 
Lokuciejewski"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/rafal-lokuciejewskiinovex-de\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/27825","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/179"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=27825"}],"version-history":[{"count":4,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/27825\/revisions"}],"predecessor-version":[{"id":63988,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/27825\/revisions\/63988"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/27768"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=27825"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=27825"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=27825"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=27825"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}