{"id":32222,"date":"2021-10-25T05:51:55","date_gmt":"2021-10-25T04:51:55","guid":{"rendered":"https:\/\/www.inovex.de\/?p=32222"},"modified":"2022-11-21T11:35:40","modified_gmt":"2022-11-21T10:35:40","slug":"a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/","title":{"rendered":"A GDPR-Compliant Data Lake Architecture on the Google Cloud Platform"},"content":{"rendered":"<p>This blog post gives an overview of an example data lake architecture implemented on the Google Cloud Platform, which is capable of operating on petabyte data scenarios while being compliant with GDPR and still able to derive business insights &#8211; even though data is anonymized! Also, we will demonstrate how restricted data access management is possible in such a setup.<\/p>\n<p>In 2018 the General Data Protection Regulation (GDPR) went into effect. 
Just two years later the US state of California followed with its California Consumer Privacy Act \u2013 and many more might follow.<\/p>\n<p>All these privacy regulations have in common that they enforce legally binding rules on storing and processing user-related data.<br \/>\nSince almost every modern company uses data for its business, and the penalties for violating these regulations are severe, many of our customers face the challenge of adapting their architectures to them.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"GDPR-what-is-it-all-about\"><\/span>GDPR, what is it all about?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>While the GDPR covers many topics \u2013 some of them rather vaguely expressed \u2013 we want to focus on two main aspects relevant to keeping your data lake compliant:<\/p>\n<p><strong>1. Retention<\/strong>: Personally identifiable information (PII) must be either dropped or anonymized after a defined time period (defined by your legal department)<\/p>\n<p><strong>2. Data Deletion Requests (DDR)<\/strong>: individuals have the right to have their personal data erased by submitting a request<\/p>\n<p>For both DDR and retention there are two ways to go: either drop the data completely or anonymize it. While dropping data is easier, it might have an impact on your business since you can no longer derive insights from your customer data. Anonymizing the data, on the other hand, is more complex, but your business is still able to get insights from it (see Pic 1). 
For example, once the user\u2019s name \u201cAlexander\u201d (and all other relevant PII attributes) has been fully anonymized (hashed with a random seed) to \u201cfsdfsdfwef\u201d, analyzing that data is valid.<\/p>\n<p>In that respect, the technical solutions for DDR and retention are very similar: either certain user requests (DDR) or a moving time window (retention) lead to the anonymization of data.<\/p>\n<figure id=\"attachment_32089\" aria-describedby=\"caption-attachment-32089\" style=\"width: 677px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32089\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/retention_ddr.png\" alt=\"PII Data table with retention and DDR anonymization\" width=\"677\" height=\"401\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/retention_ddr.png 1286w, https:\/\/www.inovex.de\/wp-content\/uploads\/retention_ddr-300x178.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/retention_ddr-1024x607.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/retention_ddr-768x455.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/retention_ddr-400x237.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/retention_ddr-360x213.png 360w\" sizes=\"auto, (max-width: 677px) 100vw, 677px\" \/><figcaption id=\"caption-attachment-32089\" class=\"wp-caption-text\">Anonymizing data to be GDPR compliant, either due to a user request (DDR) or because the retention period has passed. 
The same user ID will result in the same hash, enabling consistent analysis for that user.<\/figcaption><\/figure>\n<h2><span class=\"ez-toc-section\" id=\"Data-Lake-Architecture\"><\/span>Data Lake Architecture<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To fulfill the requirements mentioned before, we need an architecture that can reliably anonymize stored data and restrict access to it.<\/p>\n<p>Before we discuss the architecture as a whole, let&#8217;s have a closer look at the core components: What is their purpose and how do they interact with each other to achieve GDPR compliance?<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Storage-Layer\"><\/span>Storage Layer<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3>Delta &amp; Google Cloud Storage<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-32099 size-thumbnail\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/gcs-150x150.png\" alt=\"Google Cloud Storage Logo\" width=\"150\" height=\"150\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/gcs-150x150.png 150w, https:\/\/www.inovex.de\/wp-content\/uploads\/gcs.png 256w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-32101 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/dl-300x246.png\" alt=\"Delta Lake Logo\" width=\"181\" height=\"148\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/dl-300x246.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/dl-400x327.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/dl-360x295.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/dl.png 650w\" sizes=\"auto, (max-width: 181px) 100vw, 181px\" \/><\/p>\n<p>In the past it was tedious to modify already stored data in big data engineering architectures (e.g. Hadoop with Hive and HDFS). 
While possible, this is something the underlying systems were not designed to do, resulting in high resource consumption and frustration.<br \/>\nThis pain motivated many projects that aim to bring ACID-like capabilities to big data storage. Some of the most prominent are Apache Iceberg, Apache Hudi and Delta Lake.<\/p>\n<p>Delta Lake was created by Databricks, the company founded by the original creators of Apache Spark, and the two integrate seamlessly. This makes Delta Lake a good fit if your ETL workloads are already written in Spark. GCP Dataproc (see next section) also offers native Delta Lake support.<br \/>\nWhile Delta has many more features, the most important one with regard to GDPR is the atomicity of transactions. Imagine anonymizing only part of your data while other parts remain in sensitive clear text because a transaction failed halfway.<br \/>\nModifying your data in a transactionally consistent manner is therefore crucial to a data architecture that has to meet the GDPR requirements stated above.<br \/>\nDelta Lake&#8217;s time travel feature lets you read old versions of a table, including versions written before anonymization. To prevent PII from being accessed that way, run the VACUUM operation regularly: it deletes data files that are no longer referenced by the current table version and are older than the configured retention threshold.<br \/>\nDelta relies on the Apache Parquet format; the resulting Delta tables (Parquet files plus a transaction log) are stored on Google Cloud Storage (GCS).<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Application-Layer\"><\/span>Application Layer<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3>Dataproc<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-32095 size-thumbnail\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/dataproc-150x150.png\" alt=\"Dataproc Logo\" width=\"150\" height=\"150\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/dataproc-150x150.png 150w, https:\/\/www.inovex.de\/wp-content\/uploads\/dataproc.png 256w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/><\/p>\n<p>Having 
identified the sensitive data for your use case, the next step is to handle it in a compliant way. This can be done by either dropping the data completely or anonymizing it so that it cannot be related to a specific user. The latter requires a way to modify the data according to your needs. For example, different PII fields need different anonymization strategies \u2013 while a ZIP code needs to be truncated, the customer name should be hashed.<\/p>\n<p>Apache Spark is a general-purpose processing framework for big data, which makes it a good fit for implementing custom anonymization strategies on large datasets.<\/p>\n<p>As a general data processing framework it lets us leverage the programming capabilities and language features of Scala, which helps in building a generic, reusable solution in code instead of very specific, non-reusable SQL scripts for particular domain events.<\/p>\n<p>GCP offers a managed service called Dataproc for running your Apache Spark workloads and for provisioning and sizing the corresponding clusters.<\/p>\n<p>Besides Spark, Dataproc also offers other well-known projects from the big data space, for example Presto and Druid. 
For the scope of this article&#8217;s architecture we focus on Spark on Dataproc, though.<br \/>\nDataproc also ships with a tight integration for <a href=\"https:\/\/cloud.google.com\/blog\/products\/data-analytics\/getting-started-with-new-table-formats-on-dataproc\">Delta Lake<\/a>.<\/p>\n<p>Hence Dataproc with Spark applications is a natural choice in such a big data setup with batch and streaming workloads.<\/p>\n<h3>Composer<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-32091 size-thumbnail\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/airflow-150x150.png\" alt=\"Google Cloud Composer Logo \" width=\"150\" height=\"150\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/airflow-150x150.png 150w, https:\/\/www.inovex.de\/wp-content\/uploads\/airflow.png 225w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/><\/p>\n<p>Cloud Composer is GCP&#8217;s managed service for Apache Airflow. While it is technically not needed to make your data lake compliant, it helps you orchestrate the different steps involved (applying retention, DDR, &#8230;) in one central place.<\/p>\n<p>Furthermore, you can easily schedule the creation and deletion of Dataproc clusters with native Airflow operators. Creating clusters only when they are needed is a good pattern for cloud resources in general, as it reduces your cloud bill.<\/p>\n<p>It also comes in handy that workloads can be executed on a time schedule \u2013 exactly what you need to re-run retention in a moving-window fashion.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Access-Layer\"><\/span>Access Layer<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3>BigQuery<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-32093 size-thumbnail\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/bq-150x150.png\" alt=\"BigQuery Logo\" width=\"150\" height=\"150\" 
srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/bq-150x150.png 150w, https:\/\/www.inovex.de\/wp-content\/uploads\/bq.png 256w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/><\/p>\n<p>Having proper data pipelines for GDPR compliance in place is only half the battle. You also want to make your data easily available throughout your company to enable your data-driven business.<br \/>\nThese requirements lead to one of the most prominent services on Google Cloud: BigQuery.<br \/>\nBigQuery is Google Cloud&#8217;s data warehouse, offering analytics at petabyte scale. More importantly, it integrates seamlessly with GCP&#8217;s IAM, which makes it easy to share datasets and manage (PII) permissions within your company.<\/p>\n<p>So while Delta Lake provides handy technical features as a storage system and a close integration with Spark, it is not the best interface for end users on Google Cloud. Here, BigQuery shines with high query speed and connectivity to BI tools like Looker. Additionally, it integrates with other GCP services \u2013 most importantly IAM &amp; Data Catalog (see next section) \u2013 which makes it very easy to manage access and permissions for datasets &amp; tables.<\/p>\n<p>These features make BigQuery the central interface for data consumption in our architecture. While the tightly access-restricted Delta Lake storage layer is used only by service accounts to transform raw data, BigQuery serves the GDPR-compliant data to product teams.<\/p>\n<h3>Data Catalog<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-32097 size-thumbnail\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/dc-150x150.png\" alt=\"Data Catalog Logo\" width=\"150\" height=\"150\" \/><\/p>\n<p>Having sensitive data in your access layer requires a regulated way of managing who is allowed to interact with that data. GCP&#8217;s Data Catalog allows you to create so-called policy tags. 
Policy tags are a great way to mark sensitive PII columns in your table schemas and to manage access to that data accordingly.<\/p>\n<p>Concretely, one could create a policy tag called &#8222;pii_user_name&#8220; and apply it to various BigQuery columns by editing the corresponding schema. After that, only IAM principals that hold the required permission on the policy tag &#8222;pii_user_name&#8220; are allowed to query the column that holds the user&#8217;s name. This is a very easy way of granting access to sensitive data since you probably have a properly defined IAM concept in place already.<\/p>\n<p>Often it is convenient to group several policy tags into a hierarchy \u2013 for example into a &#8222;highly-sensitive&#8220; group (e.g. consisting of the policy tags &#8222;pii_credit_card_number&#8220;, &#8230;) and a &#8222;low-sensitive&#8220; group (&#8222;pii_timestamp&#8220;, &#8222;pii_location&#8220;, &#8230;). Data Catalog supports this grouping with so-called taxonomies. This further simplifies granting access to columns, since you only need to grant access to the parent node\/group, e.g. &#8222;highly-sensitive&#8220;, and the user will have access to all policy tags defined within that group (see Pic 2).<\/p>\n<p>Furthermore, Data Catalog offers features for data discovery. Once a table is registered (for BigQuery this happens automatically), it is known to Data Catalog and shows up in the results when you search for a column name or a description string (https:\/\/cloud.google.com\/data-catalog\/docs\/how-to\/search-reference).<br \/>\nIt is possible to specify the search query in detail, e.g. 
searching for specific Tags (for example PII).<\/p>\n<figure id=\"attachment_32085\" aria-describedby=\"caption-attachment-32085\" style=\"width: 1999px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32085 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/access.png\" alt=\"table with policy tags and user id hashes\" width=\"1999\" height=\"663\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/access.png 1999w, https:\/\/www.inovex.de\/wp-content\/uploads\/access-300x99.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/access-1024x340.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/access-768x255.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/access-1536x509.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/access-1920x637.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/access-400x133.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/access-360x119.png 360w\" sizes=\"auto, (max-width: 1999px) 100vw, 1999px\" \/><figcaption id=\"caption-attachment-32085\" class=\"wp-caption-text\">The policy tag &#8222;PII&#8220; is applied on a BigQuery table to certain columns. If the user\/service account has no permission, the data can not be queried. 
Resources on how to use these concepts on GCP.<\/figcaption><\/figure>\n<h2><span class=\"ez-toc-section\" id=\"Architecture-assemblementdiscussion\"><\/span>Architecture assemblement\/discussion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_32087\" aria-describedby=\"caption-attachment-32087\" style=\"width: 900px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32087\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/architecture.png\" alt=\"scheme of the google cloud platform ecosystem\" width=\"900\" height=\"545\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/architecture.png 1618w, https:\/\/www.inovex.de\/wp-content\/uploads\/architecture-300x182.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/architecture-1024x620.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/architecture-768x465.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/architecture-1536x930.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/architecture-400x242.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/architecture-360x218.png 360w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><figcaption id=\"caption-attachment-32087\" class=\"wp-caption-text\">The data is first ingested into Delta Lake by reading raw Landing data (GCS) and writing this data into delta tables. On these delta tables anonymization takes place (applying DDR &amp; retention) by scheduled Dataproc Jobs. Afterwards the data is provisioned to BigQuery.<\/figcaption><\/figure>\n<p>After having a look at the different components, let&#8217;s now review the architecture on a higher level.<br \/>\nIn this architecture, Delta is used as a storage layer with ACID capabilities to transform data with different anonymization strategies to meet GDPR requirements. This storage layer is very restricted in access and only usable for service accounts, since it holds sensitive data. 
This design choice was made because BigQuery offers many features for granting access via IAM and integrates well with solutions like Data Catalog.<\/p>\n<p>Furthermore, having Delta as a storage layer gives you more flexibility for downstream data applications \u2013 imagine that you need your data for an I\/O-intensive use case rather than an analytics-intensive one. Data could easily be moved from the same storage location (Delta) to a database optimized for that use case (e.g. Bigtable). This also offers more flexibility in case of moving to another cloud or even to an on-premises architecture, since Delta has wide support across all these platforms.<\/p>\n<p>In this architecture, Delta serves as a single source of truth: when retention or a DDR triggers the anonymization jobs (Spark on Dataproc), the corresponding data is anonymized and afterwards provisioned to BigQuery in its GDPR-compliant form, ready for consumption.<\/p>\n<p>On the one hand, this separation of storage and access layer (Delta &amp; BigQuery) might result in more work and a higher cloud bill (since both systems hold the data). On the other hand, it makes your architecture more flexible and robust, since data can always be re-provisioned from the access-restricted storage layer if something fails on the access layer.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Summary-Lookout\"><\/span>Summary &amp; Lookout<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>This article gives an overview of how to be compliant with regard to DDR and retention and which Google Cloud services fit that purpose.<\/p>\n<p>This is one approach to what a GDPR-compliant data lake architecture could look like. A thorough analysis of your use cases should come first. If your use case does not need to access data older than a few days or does not make use of PII data, the easiest way to be GDPR-compliant is simply dropping data or not ingesting sensitive fields. 
If you on the other hand need to make sense of your data, e.g. by joining anonymized data in a consistent way, a sophisticated approach &amp; architecture like the one described in this article is needed.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This blog post gives an overview of an example data lake architecture implemented on the Google Cloud Platform, which is capable of operating on petabyte data scenarios while being compliant with GDPR and still able to derive business insights &#8211; even though data is anonymized! Also, we will demonstrate how restricted data access management is [&hellip;]<\/p>\n","protected":false},"author":260,"featured_media":32347,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[77,181,71,385,147],"service":[446,414,411],"coauthors":[{"id":260,"display_name":"Kolja Maier","user_nicename":"kmaier"}],"class_list":["post-32222","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-big-data","tag-business-intelligence","tag-cloud","tag-data-engineering","tag-google-cloud","service-business-intelligence","service-cloud","service-data-engineering"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>A GDPR-Compliant Data Lake Architecture on the Google Cloud Platform - inovex GmbH<\/title>\n<meta name=\"description\" content=\"This post gives an overview of a GDPR-compliant data lake architecture implemented on the Google Cloud Platform working with anonymized data!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/\" \/>\n<meta 
property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A GDPR-Compliant Data Lake Architecture on the Google Cloud Platform - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"This post gives an overview of a GDPR-compliant data lake architecture implemented on the Google Cloud Platform working with anonymized data!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2021-10-25T04:51:55+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-21T10:35:40+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png\" \/>\n\t<meta property=\"og:image:width\" content=\"960\" \/>\n\t<meta property=\"og:image:height\" content=\"540\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Kolja Maier\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kolja Maier\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"11\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Kolja Maier\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/\"},\"author\":{\"name\":\"Kolja Maier\",\"@id\":\"https:\/\/www.inovex.de\/de\/#\/schema\/person\/cf95dd2e4dd018a16a457538186fcb9e\"},\"headline\":\"A GDPR-Compliant Data Lake Architecture on the Google Cloud Platform\",\"datePublished\":\"2021-10-25T04:51:55+00:00\",\"dateModified\":\"2022-11-21T10:35:40+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/\"},\"wordCount\":2028,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.inovex.de\/de\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png\",\"keywords\":[\"Big Data\",\"Business Intelligence\",\"Cloud\",\"Data Engineering\",\"Google Cloud\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/\",\"url\":\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/\",\"name\":\"A GDPR-Compliant Data Lake Architecture on the Google Cloud Platform - inovex 
GmbH\",\"isPartOf\":{\"@id\":\"https:\/\/www.inovex.de\/de\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png\",\"datePublished\":\"2021-10-25T04:51:55+00:00\",\"dateModified\":\"2022-11-21T10:35:40+00:00\",\"description\":\"This post gives an overview of a GDPR-compliant data lake architecture implemented on the Google Cloud Platform working with anonymized data!\",\"breadcrumb\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#primaryimage\",\"url\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png\",\"contentUrl\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png\",\"width\":960,\"height\":540,\"caption\":\"Isometric illustration of a data lake with google clouds floating above\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.inovex.de\/de\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"A GDPR-Compliant Data Lake Architecture on the Google Cloud 
Platform\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.inovex.de\/de\/#website\",\"url\":\"https:\/\/www.inovex.de\/de\/\",\"name\":\"inovex GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.inovex.de\/de\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.inovex.de\/de\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.inovex.de\/de\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\/\/www.inovex.de\/de\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/inovexde\",\"https:\/\/x.com\/inovexgmbh\",\"https:\/\/www.instagram.com\/inovexlife\/\",\"https:\/\/www.linkedin.com\/company\/inovex\",\"https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.inovex.de\/de\/#\/schema\/person\/cf95dd2e4dd018a16a457538186fcb9e\",\"name\":\"Kolja Maier\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/www.inovex.de\/de\/#\/schema\/person\/image\/8c8a43b8dac94f85b7cd2eae2dc34818\",\"url\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-kmaierr-96x96.jpg\",\"contentUrl\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-kmaierr-96x96.jpg\",\"caption\":\"Kolja Maier\"},\"url\":\"https:\/\/www.inovex.de\/de\/blog\/author\/kmaier\/\"}]}<\/script>\n<!-- \/ Yoast SEO 
plugin. -->","yoast_head_json":{"title":"A GDPR-Compliant Data Lake Architecture on the Google Cloud Platform - inovex GmbH","description":"This post gives an overview of a GDPR-compliant data lake architecture implemented on the Google Cloud Platform working with anonymized data!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/","og_locale":"de_DE","og_type":"article","og_title":"A GDPR-Compliant Data Lake Architecture on the Google Cloud Platform - inovex GmbH","og_description":"This post gives an overview of a GDPR-compliant data lake architecture implemented on the Google Cloud Platform working with anonymized data!","og_url":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2021-10-25T04:51:55+00:00","article_modified_time":"2022-11-21T10:35:40+00:00","og_image":[{"width":960,"height":540,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png","type":"image\/png"}],"author":"Kolja Maier","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Kolja Maier","Gesch\u00e4tzte Lesezeit":"11\u00a0Minuten","Written by":"Kolja 
Maier"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/"},"author":{"name":"Kolja Maier","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/cf95dd2e4dd018a16a457538186fcb9e"},"headline":"A GDPR-Compliant Data Lake Architecture on the Google Cloud Platform","datePublished":"2021-10-25T04:51:55+00:00","dateModified":"2022-11-21T10:35:40+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/"},"wordCount":2028,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png","keywords":["Big Data","Business Intelligence","Cloud","Data Engineering","Google Cloud"],"articleSection":["Analytics","English Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/","url":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/","name":"A GDPR-Compliant Data Lake Architecture on the Google Cloud Platform - inovex 
GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png","datePublished":"2021-10-25T04:51:55+00:00","dateModified":"2022-11-21T10:35:40+00:00","description":"This post gives an overview of a GDPR-compliant data lake architecture implemented on the Google Cloud Platform working with anonymized data!","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/google-cloud-data-lake@0.5x.png","width":960,"height":540,"caption":"Isometric illustration of a data lake with google clouds floating above"},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/a-gdpr-compliant-data-lake-architecture-on-the-google-cloud-platform\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"A GDPR-Compliant Data Lake Architecture on the Google Cloud Platform"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex 
GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/cf95dd2e4dd018a16a457538186fcb9e","name":"Kolja Maier","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/image\/8c8a43b8dac94f85b7cd2eae2dc34818","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-kmaierr-96x96.jpg","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-kmaierr-96x96.jpg","caption":"Kolja 
Maier"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/kmaier\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/32222","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/260"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=32222"}],"version-history":[{"count":5,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/32222\/revisions"}],"predecessor-version":[{"id":32587,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/32222\/revisions\/32587"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/32347"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=32222"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=32222"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=32222"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=32222"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}