{"id":21074,"date":"2018-01-17T08:49:50","date_gmt":"2018-01-17T07:49:50","guid":{"rendered":"http:\/\/www.inovex.de\/blog\/?p=12513"},"modified":"2022-11-29T11:17:18","modified_gmt":"2022-11-29T10:17:18","slug":"data-science-in-production","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/","title":{"rendered":"Data Science in Production: Packaging, Versioning and Continuous Integration"},"content":{"rendered":"<p>A common pattern in most data science projects I participated in is that it\u2019s all fun and games until someone wants to put it into production. From that point in time on no one will any longer give you a pat on the back for a high accuracy and smart algorithm. All of a sudden the crucial question is how to deploy your model, which version, how can updates be rolled out, which requirements are needed and so on. The worst case in such a moment is to realize that up until now the glorious proof of concept model is not an application but rather a stew of Python\/R scripts which were deployed by cloning a git repo and run by some Jenkins jobs with a dash of Bash. Bringing data science to production is a hot topic right now and there are many facets to it. This is the first in a series of posts about data science in production where we focus on aspects of modern software engineering like packaging, versioning as well as Continuous Integration in general.<!--more--><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#Packages-vs-Scripts\" >Packages vs. 
Scripts<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#Packaging-and-Versioning\" >Packaging and\u00a0Versioning<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#PyScaffold\" >PyScaffold<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#Versioning\" >Versioning<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#Continuous-Integration\" >Continuous\u00a0Integration<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#Artefact-Store\" >Artefact\u00a0Store<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#Indices-and-Channels\" >Indices and\u00a0Channels<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#Automated-CI-Process\" >Automated CI Process<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Packages-vs-Scripts\"><\/span>Packages vs. 
Scripts<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Being a data scientist does not free you from proper software engineering. Of course, most models start as a simple script or maybe a Jupyter notebook, just the essence of your idea to test it quickly. But as your model evolves and the number of lines of code grows, it\u2019s always a good idea to think about the structure of your code and to move away from simple scripts towards proper applications or libraries.<\/p>\n<p>In the case of a Python model, that means grouping functionality into different modules, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Separation_of_concerns\">separating different concerns<\/a>, which can be organised into Python packages on a higher level. Maybe certain parts of the model are even so general that they could be packaged into their own library for reuse in other projects. In the context of Python, a bundle of software to be installed, like a library or application, is denoted by the term <em>package<\/em>. Another synonym is <em>distribution<\/em>, which is easily confused with a Linux distribution. Therefore the term package is more commonly used, although it is ambiguous with the kind of package you import in your Python source code (i.e. a container of modules).<\/p>\n<p>So what is the key difference between a bunch of Python scripts with some modules and a proper package? A Python package adheres to a certain structure and thus can be shipped and installed by others. Simple as it sounds, this is a major advantage over having just some Python modules inside a repository. With a package it is possible to make distinct code releases with different versions that can be stored for later reference. Dependencies like <em>numpy<\/em> and <em>scikit-learn<\/em> can be specified and dependency resolution is automated by tools like <a href=\"https:\/\/pip.pypa.io\/\">pip<\/a> and <a href=\"https:\/\/conda.io\/\">conda<\/a>. Why is this so important? 
When bugs occur in production, it&#8217;s incredibly useful to know which state of your code is actually in production. Is it still version 0.9 or already 1.0? Did the bug also occur in the last release? Most debugging starts with reproducing the bug locally on your machine. But what if the release is already half a year old and there were major changes in its requirements? Maybe the bug is caused by one of its dependencies? If your package also specifies its dependencies with pinned versions, restoring the exact same state as in production inside a local <a href=\"https:\/\/virtualenv.pypa.io\/\">virtualenv<\/a> or <a href=\"https:\/\/conda.io\/\">conda<\/a>\u00a0environment will be a matter of seconds.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Packaging-and-Versioning\"><\/span>Packaging and\u00a0Versioning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Python\u2019s history of packaging has had its dark times, but nowadays things have pretty much settled and there is only one obvious tool left to do it, namely <a href=\"https:\/\/setuptools.readthedocs.io\/\">setuptools<\/a>. An official Python <a href=\"https:\/\/packaging.python.org\/tutorials\/distributing-packages\/\">packaging tutorial<\/a> and many user articles like <a href=\"http:\/\/veekaybee.github.io\/2017\/09\/26\/python-packaging\/\">Alice in Python projectland<\/a> explain the various steps needed to set up a proper <span class=\"lang:sh decode:true crayon-inline \">setup.py<\/span>, but it takes a long time to really master the subtleties of Python packaging and even then it is quite cumbersome. This is the reason many developers refrain from building Python packages. Another reason is that even if you have a correct Python package set up, proper versioning is still a manual and thus error-prone process. 
This is why the tool <a href=\"https:\/\/github.com\/pypa\/setuptools_scm\">setuptools_scm<\/a> exists: it draws the current version automatically from git, so a new release is as simple as creating a new tag. Following the famous Unix principle \u201cDo one thing and do it well\u201d, a Python package is also put together with many specialised tools. Besides <a href=\"https:\/\/setuptools.readthedocs.io\/\">setuptools<\/a> and <a href=\"https:\/\/github.com\/pypa\/setuptools_scm\">setuptools_scm<\/a> there is <a href=\"http:\/\/www.sphinx-doc.org\/\">sphinx<\/a> for documentation, testing tools like <a href=\"https:\/\/docs.pytest.org\/\">pytest<\/a> and <a href=\"https:\/\/tox.readthedocs.io\/\">tox<\/a> as well as many other little helpers to consider when setting up a Python package. Already scared off by Python packaging? Take a deep breath, there is no reason to\u00a0be.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"PyScaffold\"><\/span>PyScaffold<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Luckily there is one tool to rule them all, <a href=\"http:\/\/pyscaffold.org\/\">PyScaffold<\/a>, which provides a proper Python package within a second. It is installed easily\u00a0with<\/p>\n<pre class=\"lang:sh decode:true\">pip install pyscaffold<\/pre>\n<p>or<\/p>\n<pre class=\"lang:sh decode:true \">conda install -c conda-forge pyscaffold<\/pre>\n<p>if you prefer <a href=\"https:\/\/conda.io\/\">conda<\/a> over <a href=\"https:\/\/pip.pypa.io\/\">pip<\/a>. 
Generating a project <span class=\"lang:sh decode:true crayon-inline \">Scikit-AI<\/span> with a package <span class=\"lang:sh decode:true crayon-inline \">skai<\/span> is now just a matter of typing a single\u00a0command:<\/p>\n<pre class=\"lang:sh decode:true \">putup Scikit-AI -p skai<\/pre>\n<p>This will create a git repository <span class=\"lang:sh decode:true crayon-inline \">Scikit-AI<\/span> including a ready-to-use <span class=\"lang:sh decode:true crayon-inline \">setup.py<\/span> that can be configured easily and declaratively by modifying <span class=\"lang:sh decode:true crayon-inline \">setup.cfg<\/span>. The typical Python package structure is provided, including subfolders such as <span class=\"lang:sh decode:true crayon-inline \">docs<\/span> for <a href=\"http:\/\/www.sphinx-doc.org\/\">sphinx<\/a> documentation, <span class=\"lang:sh decode:true crayon-inline \">tests<\/span> for unit testing as well as a <span class=\"lang:sh decode:true crayon-inline \">src<\/span> subfolder including the actual Python package <span class=\"lang:sh decode:true crayon-inline \">skai<\/span>. In addition, <a href=\"https:\/\/github.com\/pypa\/setuptools_scm\">setuptools_scm<\/a> is integrated and other features can be activated optionally, like support for <a href=\"https:\/\/travis-ci.org\/\">Travis<\/a>, <a href=\"https:\/\/gitlab.com\/\">Gitlab<\/a>, <a href=\"https:\/\/tox.readthedocs.io\/\">tox<\/a>, <a href=\"http:\/\/pre-commit.com\/\">pre-commit<\/a> and many\u00a0more. An example of a more advanced usage of PyScaffold\u00a0is<\/p>\n<pre>putup Scikit-AI -p skai --travis --tox -d \"Scientific AI library with a twist\" -u \"http:\/\/sky.net\/\"<\/pre>\n<p>where example configuration files for Travis and tox will also be created. 
The short description additionally provided with the flag <span class=\"lang:sh decode:true crayon-inline \">-d<\/span> is used where appropriate, as is the URL passed by <span class=\"lang:sh decode:true crayon-inline \">-u<\/span>. As usual with shell commands, <span class=\"lang:sh decode:true crayon-inline \">putup --help<\/span> provides information about the various\u00a0arguments.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Versioning\"><\/span>Versioning<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Having a proper Python package already gives us the possibility to ship something that can be installed easily by others, including its dependencies of course. But if you want to move fast, the deployment of your new model package also needs to be automated as much as possible. You want to make sure that bug fixes end up in production automatically while new features need to be manually\u00a0approved.<\/p>\n<p>For this reason <a href=\"https:\/\/semver.org\/\">Semantic Versioning<\/a> was developed, which basically says that a version number is composed of MAJOR.MINOR.PATCH and you increment\u00a0the:<\/p>\n<ol>\n<li>MAJOR version when you make incompatible API\u00a0changes,<\/li>\n<li>MINOR version when you add functionality in a backwards-compatible manner,\u00a0and<\/li>\n<li>PATCH version when you make backwards-compatible bug\u00a0fixes.<\/li>\n<\/ol>\n<p>This programming-language-independent concept also made its way into Python\u2019s official version identification scheme <a href=\"https:\/\/www.python.org\/dev\/peps\/pep-0440\/\">PEP440<\/a>. Besides MAJOR, MINOR and PATCH, the version number is also extended by semantics identifying development, post and pre releases. A package set up with PyScaffold derives a PEP440 compatible, semantic version identifier from git automatically. 
A developer just needs to follow the conventions of <a href=\"https:\/\/semver.org\/\">Semantic Versioning<\/a> when tagging a release with\u00a0git.<\/p>\n<p>Versioning becomes even more important when your company develops many interdependent packages. The effort of sticking to the simple conventions of <a href=\"https:\/\/semver.org\/\">Semantic Versioning<\/a> right from the start is just a small price to pay compared to the myriad pains of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Dependency_hell\">dependency hell<\/a> you will otherwise end up in in the long term. Trust me on that\u00a0one.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Continuous-Integration\"><\/span>Continuous\u00a0Integration<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now that we know about packaging and versioning, the next step is to establish an automated Continuous Integration (CI) process. For this purpose a common choice is <a href=\"https:\/\/jenkins-ci.org\/\">Jenkins<\/a>, especially for proprietary software, since it can be installed on\u00a0premise.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Artefact-Store\"><\/span>Artefact\u00a0Store<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Besides the <span class=\"caps\">CI<\/span> tool, a place is also needed to store the built packages. The term <em>artefact store<\/em> is commonly used for a service to store packages in and install them from. In the Python world the Python Package Index (<a href=\"https:\/\/pypi.python.org\">PyPI<\/a>) is the official artefact store to publish open source packages. 
For companies the on-premise equivalent is <a href=\"https:\/\/devpi.net\/\">devpi<\/a>,\u00a0which:<\/p>\n<ul>\n<li>acts as a PyPI\u00a0mirror,<\/li>\n<li>allows uploading, testing and staging with private\u00a0indexes,<\/li>\n<li>has a nice web interface for\u00a0searching,<\/li>\n<li>allows uploading and browsing the Sphinx documentation of\u00a0packages,<\/li>\n<li>has user management\u00a0and<\/li>\n<li>features Jenkins\u00a0integration.<\/li>\n<\/ul>\n<p>If all you care about is Python, then devpi is the right artefact store for you. Many companies also use Java, and <a href=\"https:\/\/de.sonatype.com\/products\/nexus-repository\">Nexus<\/a> often already serves as their artefact store. In this case it might be more advantageous to use Nexus for storing Python packages as well, which has been supported since version 3.0, to avoid the complexity of maintaining another\u00a0service.<\/p>\n<p>In highly polyglot environments with many languages like Python, R, Java and C\/C++, one store per language leads to many different artefact stores and various different ways of installing artefacts. A unified approach is provided by <a href=\"https:\/\/conda.io\/\">conda<\/a>, since conda packages can be built for <a href=\"https:\/\/conda.io\/docs\/user-guide\/tutorials\/build-postgis.html\">general code projects<\/a>. The on-premise artefact store provided by <a href=\"https:\/\/anaconda.org\/\">Anaconda<\/a> is called <a href=\"https:\/\/docs.anaconda.com\/anaconda-repository\/\">anaconda-repository<\/a> and is part of the proprietary enterprise server. 
Whenever a unified approach to storing and installing artefacts of different languages is a major concern, <a href=\"https:\/\/anaconda.org\/\">Anaconda<\/a> might be a viable\u00a0solution.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Indices-and-Channels\"><\/span>Indices and\u00a0Channels<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Common to all artefact stores is the availability of different <em>indices<\/em> (or <em>channels<\/em> in conda) to organise artefacts. It is good practice to have different indices that describe the maturity of the contained packages, like <em>unstable<\/em>, <em>testing<\/em> and <em>stable<\/em>. This complements the automatic <a href=\"https:\/\/www.python.org\/dev\/peps\/pep-0440\/\"><span class=\"caps\">PEP440<\/span><\/a> versioning with <a href=\"http:\/\/pyscaffold.org\/\">PyScaffold<\/a> since it allows us to tell a development version which passed the unit tests (<em>testing<\/em>) from a development version which did not (<em>unstable<\/em>). Since <a href=\"https:\/\/pip.pypa.io\/\">pip<\/a> by default installs only stable releases, e.g. <span class=\"lang:sh decode:true crayon-inline \">1.0<\/span> but not <span class=\"lang:sh decode:true crayon-inline \">1.0b3<\/span>, while the <span class=\"lang:sh decode:true crayon-inline \">--pre<\/span> flag is needed to install unstable releases, the differentiation between <em>testing<\/em> and <em>stable<\/em> indices is not absolutely necessary. Still, for organisational reasons, having a <em>testing<\/em> index as input for <span class=\"caps\">QA<\/span> and a <em>stable<\/em> index that really only holds releases that passed the whole <span class=\"caps\">QA<\/span> process is a good idea. 
Also, <a href=\"https:\/\/conda.io\/\">conda<\/a> does not seem to provide an equivalent to the <span class=\"lang:sh decode:true crayon-inline \">--pre<\/span> flag and thus different channels need to be\u00a0used.<\/p>\n<p>One should also note that git allows tagging a single commit several times, which leads to different versions of the Python package having the same content. This enables the following convention: Let\u2019s say there was a bug in version <span class=\"lang:sh decode:true crayon-inline \">1.2<\/span> and after two commits the bug seems to be fixed. The version number automatically inferred by PyScaffold will be <span class=\"lang:sh decode:true crayon-inline \">1.2.post0.pre2-gHASH<\/span>. Being happy with her fix, the developer tags the commit with <span class=\"lang:sh decode:true crayon-inline \">1.2.1rc1<\/span> (first release candidate of version 1.2.1). Since all unit tests pass, this patch will end up in the <em>testing<\/em> index where <span class=\"caps\">QA<\/span> can put it to the acid test. After that, the same commit will be tagged and signed by <span class=\"caps\">QA<\/span> with the name <span class=\"lang:sh decode:true crayon-inline \">1.2.1<\/span>, which results in a new package that can be moved to the <em>stable<\/em> index\u00a0automatically.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Automated-CI-Process\"><\/span>Automated CI Process<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>With these components in mind we can establish an automated CI process. Upon a new commit on a central git repository the <em>packaging<\/em> Jenkins job clones the repo and builds the package, e.g. with <span class=\"lang:sh decode:true crayon-inline \">python setup.py bdist_wheel<\/span>. If this is successful, the package is uploaded to the <em>unstable<\/em> index of the artefact store. Upon the successful completion of the packaging job a second Jenkins job for <em>testing<\/em> is triggered. 
The reason for packaging and publishing before running any unit tests is that the packaging step itself can introduce major flaws that a typical unit test could never find, for instance data files that are in the repo but not specified in the package, or missing or wrong dependencies. Therefore it is important to always run the unit tests against the package installed in a clean environment, and that is exactly what the testing job does. After a fresh environment has been set up with <a href=\"https:\/\/virtualenv.pypa.io\/\">virtualenv<\/a> or <a href=\"https:\/\/conda.io\/\">conda<\/a>, the just-published package is installed from the artefact store. If this succeeds, the git repo is cloned into a subfolder to provide the unit tests (in the <span class=\"lang:sh decode:true crayon-inline \">tests<\/span> subfolder). These unit tests are then executed against the installed package. If all tests pass, the package is moved from the <em>unstable<\/em> index to the <em>testing<\/em> index. If the commit was tagged as a stable release, and thus the package\u2019s version is stable according to <a href=\"https:\/\/www.python.org\/dev\/peps\/pep-0440\/\">PEP440<\/a>, it is moved into the <em>stable<\/em> index. Figure 1 illustrates the complete\u00a0process.<\/p>\n<figure id=\"attachment_12518\" aria-describedby=\"caption-attachment-12518\" style=\"width: 541px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-12518\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2018\/01\/ci_build_publish.png\" alt=\"Data Science CI Pipeline\" width=\"541\" height=\"401\" \/><figcaption id=\"caption-attachment-12518\" class=\"wp-caption-text\">Figure 1: The packaging job clones the source code repository, builds the software package and pushes it into the unstable index of the artefact store. 
If these steps succeed, the testing job is triggered, which installs the package and its dependencies from the artefact store into a clean environment. The source code repository is then cloned in order to run the unit tests against the installed package. If all unit tests pass, the package is moved into the testing index of the artefact store, or optionally to the stable index if the version is a stable release.<\/figcaption><\/figure>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>It is clear that packaging, versioning and CI are just one aspect of how to bring Data Science into production, and follow-up posts will shed some light on other aspects. Although these aspects are quite important, their benefits are often underestimated. We have seen that proper packaging is crucial for shipping and installing a package and for dealing with its dependencies. Semantic Versioning helps us automate the rollout of patches and organise deployment. The advantages of Continuous Integration are quite obvious and have been promoted a lot by the DevOps culture in recent years. Data Science can also learn and benefit from this spirit, and we have seen that a minimal CI setup is easy to accomplish. Together they form a fundamental cornerstone of Data Science in production. Bringing data science to production plays a crucial part in many projects at <a href=\"https:\/\/www.inovex.de\/en\/\">inovex<\/a> since the added value of data science only shows in\u00a0production.<\/p>\n<p>Some good talks around this topic were held by <a href=\"https:\/\/www.linkedin.com\/in\/sebastian-neubauer-16626a79\/\">Sebastian Neubauer<\/a>, one of the acclaimed DevOps rock stars of Python in production. 
His talks <a href=\"https:\/\/www.youtube.com\/watch?v=Ad9qSbrfnvk\">A Pythonic Approach to CI<\/a> and <a href=\"https:\/\/www.youtube.com\/watch?v=hnQKsxKjCUo\">There should be one obvious way to bring Python into production<\/a> perfectly complement this post and are even fun to\u00a0watch.<\/p>\n<p>This article was first published at <a href=\"http:\/\/www.florianwilhelm.info\/2018\/01\/ds_in_prod_packaging_ci\/\" target=\"_blank\" rel=\"noopener\">florianwilhelm.info<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A common pattern in most data science projects I participated in is that it\u2019s all fun and games until someone wants to put it into production. From that point in time on no one will any longer give you a pat on the back for a high accuracy and smart algorithm. All of a sudden [&hellip;]<\/p>\n","protected":false},"author":52,"featured_media":13220,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[206,225,226],"service":[431],"coauthors":[{"id":52,"display_name":"Florian Wilhelm","user_nicename":"fwilhelm"}],"class_list":["post-21074","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-data-science","tag-data-science-in-production","tag-model-management","service-data-science"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Science in Production: Packaging, Versioning &amp; Cont. Integration<\/title>\n<meta name=\"description\" content=\"Here&#039;s what changes when your data science project grows from a proof of concept. 
How do you deploy your model, how can updates be rolled out, ...?\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Science in Production: Packaging, Versioning &amp; Cont. Integration\" \/>\n<meta property=\"og:description\" content=\"Here&#039;s what changes when your data science project grows from a proof of concept. How do you deploy your model, how can updates be rolled out, ...?\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-17T07:49:50+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-29T10:17:18+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2018\/01\/data-science-in-production.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Florian Wilhelm\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2018\/01\/data-science-in-production-1024x576.jpg\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Florian Wilhelm\" \/>\n\t<meta name=\"twitter:label2\" 
content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"14\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Florian Wilhelm\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/\"},\"author\":{\"name\":\"Florian Wilhelm\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/57ad7c24ee7f9ec59ed87598c73fe79e\"},\"headline\":\"Data Science in Production: Packaging, Versioning and Continuous Integration\",\"datePublished\":\"2018-01-17T07:49:50+00:00\",\"dateModified\":\"2022-11-29T10:17:18+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/\"},\"wordCount\":2381,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2018\\\/01\\\/data-science-in-production.jpg\",\"keywords\":[\"Data Science\",\"Data Science in Production\",\"Model Management\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/\",\"name\":\"Data Science in Production: Packaging, Versioning & Cont. 
Integration\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2018\\\/01\\\/data-science-in-production.jpg\",\"datePublished\":\"2018-01-17T07:49:50+00:00\",\"dateModified\":\"2022-11-29T10:17:18+00:00\",\"description\":\"Here's what changes when your data science project grows from a proof of concept. How do you deploy your model, how can updates be rolled out, ...?\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2018\\\/01\\\/data-science-in-production.jpg\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2018\\\/01\\\/data-science-in-production.jpg\",\"width\":1920,\"height\":1080,\"caption\":\"Data Science on a conveyor belt\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/data-science-in-production\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Science in Production: Packaging, Versioning and Continuous Integration\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex 
GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/57ad7c24ee7f9ec59ed87598c73fe79e\",\"name\":\"Florian 
Wilhelm\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-florian-1-IMG_5829-800x610-1-96x96.jpg5db1abe47435abb84b0b7484ce0890e9\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-florian-1-IMG_5829-800x610-1-96x96.jpg\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-florian-1-IMG_5829-800x610-1-96x96.jpg\",\"caption\":\"Florian Wilhelm\"},\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/fwilhelm\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data Science in Production: Packaging, Versioning & Cont. Integration","description":"Here's what changes when your data science project grows from a proof of concept. How do you deploy your model, how can updates be rolled out, ...?","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/","og_locale":"de_DE","og_type":"article","og_title":"Data Science in Production: Packaging, Versioning & Cont. Integration","og_description":"Here's what changes when your data science project grows from a proof of concept. 
How do you deploy your model, how can updates be rolled out, ...?","og_url":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2018-01-17T07:49:50+00:00","article_modified_time":"2022-11-29T10:17:18+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2018\/01\/data-science-in-production.jpg","type":"image\/jpeg"}],"author":"Florian Wilhelm","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/2018\/01\/data-science-in-production-1024x576.jpg","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Florian Wilhelm","Gesch\u00e4tzte Lesezeit":"14\u00a0Minuten","Written by":"Florian Wilhelm"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/"},"author":{"name":"Florian Wilhelm","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/57ad7c24ee7f9ec59ed87598c73fe79e"},"headline":"Data Science in Production: Packaging, Versioning and Continuous Integration","datePublished":"2018-01-17T07:49:50+00:00","dateModified":"2022-11-29T10:17:18+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/"},"wordCount":2381,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2018\/01\/data-science-in-production.jpg","keywords":["Data Science","Data Science in Production","Model Management"],"articleSection":["Analytics","English 
Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/","url":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/","name":"Data Science in Production: Packaging, Versioning & Cont. Integration","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2018\/01\/data-science-in-production.jpg","datePublished":"2018-01-17T07:49:50+00:00","dateModified":"2022-11-29T10:17:18+00:00","description":"Here's what changes when your data science project grows from a proof of concept. How do you deploy your model, how can updates be rolled out, ...?","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2018\/01\/data-science-in-production.jpg","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2018\/01\/data-science-in-production.jpg","width":1920,"height":1080,"caption":"Data Science on a conveyor belt"},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/data-science-in-production\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"Data Science in Production: Packaging, Versioning and Continuous 
Integration"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/57ad7c24ee7f9ec59ed87598c73fe79e","name":"Florian Wilhelm","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-florian-1-IMG_5829-800x610-1-96x96.jpg5db1abe47435abb84b0b7484ce0890e9","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-florian-1-IMG_5829-800x610-1-96x96.jpg","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-florian-1-IMG_5829-800x610-1-96x96.jpg","caption":"Florian 
Wilhelm"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/fwilhelm\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21074","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/52"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=21074"}],"version-history":[{"count":1,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21074\/revisions"}],"predecessor-version":[{"id":37980,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21074\/revisions\/37980"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/13220"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=21074"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=21074"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=21074"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=21074"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}