{"id":21072,"date":"2018-01-29T08:02:58","date_gmt":"2018-01-29T07:02:58","guid":{"rendered":"http:\/\/www.inovex.de\/blog\/?p=12424"},"modified":"2022-11-30T12:45:38","modified_gmt":"2022-11-30T11:45:38","slug":"hive-udf-lookups","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/hive-udf-lookups\/","title":{"rendered":"Writing a Hive UDF for lookups"},"content":{"rendered":"<p>In today&#8217;s blog I am going to take a look at a fairly mundane and unspectacular use of a Hive UDF (user-defined function), that of performing lookups against resources residing in the Hadoop file system (HDFS), specifically, other hive tables. Why would we do this when Hive provides this functionality via joins and the like? Well, non-equi joins (e.g. joins using a range of values) are not allowed in Hive, so in these cases the only options are to join on non-strict criteria (and then filter), or to write your own UDF, which is what we look at here.<!--more--><\/p>\n<p>Reading Hive resources should be fairly simple: after all Hive&#8217;s metastore <em>knows<\/em> all about its own HDFS resources, and so we can read the data into some kind of in-memory map and perform lookups to our heart&#8217;s content.<\/p>\n<p>No problem, we say to ourselves, we&#8217;ll just write a UDF that executes an HCatalog call against the metastore. So we set off on our Hive UDF odyssey, draft and deploy our first HCatalog-enabled lookup-tool, go off to enjoy a coffee, and then return to find that we have killed the metastore: maybe unleashing that job with oh-so-many mappers was not such a hot idea after all. 
Hmm&#8230;.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"An-example-Hive-UDF\"><\/span>An example Hive UDF<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>OK, so the HCatalog idea was nice, but let&#8217;s rein in our enthusiasm slightly and go a bit more low-level: we will write a UDF (in Java, not in <a href=\"https:\/\/www.inovex.de\/blog\/hive-udfs-and-udafs-with-python\/\" target=\"_blank\" rel=\"noopener\">Python<\/a>) to take an HDFS-path as one of its arguments. This will at least avoid addressing the metastore. 
Our skeleton (and, for the sake of space, simplified) UDF will look something like this:<\/p>\n<pre class=\"lang:java decode:true\">public class LookupTaxCode extends GenericUDF {\r\n\r\n\tprivate IntObjectInspector customerInspector;\r\n\tprivate IntObjectInspector taxCodeInspector;\r\n\tprivate IntObjectInspector dateInspector;\r\n\tprivate StringObjectInspector fileInspector;\r\n\r\n\t\/*\r\n\t * this will be initialized in the initMap method: group by customer and\r\n\t * lower-range value (assuming there are no gaps)\r\n\t *\/\r\n\tprivate Map&lt;Integer, Map&lt;Integer, NavigableMap&lt;Integer, HiveDecimal&gt;&gt;&gt; lookup;\r\n\r\n\t@Override\r\n\tpublic ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {\r\n\t\tif (args.length &lt; 3) {\r\n\t\t\tthrow new UDFArgumentLengthException(\r\n\t\t\t\t\t\"This function needs a minimum of 3 arguments - customer, taxcode-ID, (active) date \"\r\n\t\t\t\t\t\t\t+ \"plus - optionally - source file (pipe-delimited)!\");\r\n\t\t}\r\n\t\tthis.customerInspector = (IntObjectInspector) args[0];\r\n\t\tthis.taxCodeInspector = (IntObjectInspector) args[1];\r\n\t\tthis.dateInspector = (IntObjectInspector) args[2];\r\n\t\tif (args.length &gt; 3) {\r\n\t\t\tthis.fileInspector = (StringObjectInspector) args[3];\r\n\t\t}\r\n\t\treturn PrimitiveObjectInspectorFactory.writableHiveDecimalObjectInspector;\r\n\t}\r\n\r\n\t@Override\r\n\tpublic HiveDecimalWritable evaluate(DeferredObject[] args) throws HiveException {\r\n\t\t\/* initialize lookup, if not yet done *\/\r\n\t\tif (lookup == null) {\r\n\t\t\tif (args.length &gt; 3) {\r\n\t\t\t\tinitHdfsLookup(fileInspector.getPrimitiveJavaObject(args[3].get()));\r\n\t\t\t}\r\n\t\t}\r\n\t\t\/* perform lookup *\/\r\n\t\tint customer = customerInspector.get(args[0].get());\r\n\t\tint taxCodeId = taxCodeInspector.get(args[1].get());\r\n\t\tint dateFrom = dateInspector.get(args[2].get());\r\n\t\tif (lookup.containsKey(customer) &amp;&amp; lookup.get(customer).containsKey(taxCodeId)) {\r\n\t\t\tNavigableMap&lt;Integer, HiveDecimal&gt; rr = lookup.get(customer).get(taxCodeId);\r\n\t\t\tEntry&lt;Integer, HiveDecimal&gt; floorEntry = rr.floorEntry(dateFrom);\r\n\t\t\treturn new HiveDecimalWritable(floorEntry == null ? HiveDecimal.create(0) : floorEntry.getValue());\r\n\t\t} else {\r\n\t\t\treturn null;\r\n\t\t}\r\n\t}\r\n\r\n\tprivate void initHdfsLookup(String lookupFile) throws HiveException {\r\n\t\ttry {\r\n\t\t\tConfiguration conf = new Configuration();\r\n\t\t\tPath filePath = new Path(lookupFile);\r\n\t\t\tFileSystem fs = FileSystem.get(filePath.toUri(), conf);\r\n\t\t\tFSDataInputStream in = fs.open(filePath);\r\n\t\t\tinitMap(in);\r\n\t\t} catch (Exception e) {\r\n\t\t\tthrow new HiveException(e + \": when attempting to access: \" + lookupFile);\r\n\t\t}\r\n\t}\r\n\r\n\tprotected void initMap(InputStream in) throws IOException {\r\n\t\t\/*\r\n\t\t * perform some lookup logic from named hdfs file here...\r\n\t\t *\/\r\n\t}\r\n\r\n\t@Override\r\n\tpublic String getDisplayString(String[] args) {\r\n\t\treturn \"Method call: lookup_taxcode(\" + args[0] + \", \" + args[1] + \", \" + args[2] + \")\";\r\n\t}\r\n}<\/pre>\n<p>with a callout such as:<\/p>\n<pre class=\"lang:pgsql decode:true\">select lookup_taxcode(12345, 100, 20180101, '\/hdfs\/path\/to\/partfile') from ...<\/pre>\n<p>where we supply a customer number (12345), a tax code (100), an active date as an integer (20180101), and the path to the HDFS resource.<\/p>\n<p>What is immediately obvious, though, is that this is not very elegant: we are expecting the end user to know all the grubby details about which resources exist and where they reside\u2014right down to the part-file name. 
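Before fixing that, it is worth pinning down the mechanism the UDF leans on: because active dates are encoded as yyyyMMdd integers they sort chronologically, so NavigableMap.floorEntry returns the entry with the greatest lower-bound date not exceeding the requested date, i.e. the validity range the date falls into. A minimal, self-contained sketch of just this idea (the rates and dates are invented, and BigDecimal stands in for HiveDecimal):

```java
import java.math.BigDecimal;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrates the floorEntry-based range lookup the UDF performs per (customer, taxcode). */
public class RangeLookupDemo {

    /** Maps a lower-bound date (yyyyMMdd as int) to the rate valid from that date onwards. */
    static final NavigableMap<Integer, BigDecimal> ratesByValidFrom = new TreeMap<>();

    static {
        ratesByValidFrom.put(20160101, new BigDecimal("0.17")); // valid from 2016-01-01 up to the next entry
        ratesByValidFrom.put(20180101, new BigDecimal("0.19")); // valid from 2018-01-01 onwards
    }

    /** Returns the rate in force on the given date, or null if the date precedes all ranges. */
    static BigDecimal rateOn(int yyyymmdd) {
        var floor = ratesByValidFrom.floorEntry(yyyymmdd);
        return floor == null ? null : floor.getValue();
    }

    public static void main(String[] args) {
        System.out.println(rateOn(20170615)); // inside the 2016 range: 0.17
        System.out.println(rateOn(20180101)); // exactly on a lower bound: 0.19
        System.out.println(rateOn(20150101)); // before any range: null
    }
}
```

This also makes the "assuming there are no gaps" comment in the skeleton concrete: floorEntry happily answers for a date inside a gap with the rate of the preceding range.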
Surely we can do better\u2014how about hard-coding this into our UDF code? That would work, but it would also introduce a clear dependency: the UDF is rendered invalid each and every time the resources no longer match and, even worse, anyone would need the code in hand to know which files we (the developers) have to keep available. Is there no way to provide this information at deploy time?<\/p>\n<p>Yes&#8230;and no. We can&#8217;t provide any variables as part of our CREATE FUNCTION DDL, but we can\u2014as of Hive 0.13 (see link below)\u2014add resources, rather like we do when we define which .jar file to use. The command looks like this:<\/p>\n<pre class=\"lang:pgsql decode:true\">DROP FUNCTION IF EXISTS lookup_taxcode;\r\n\r\nCREATE FUNCTION lookup_taxcode AS 'my.package.LookupTaxCode'\r\nUSING JAR 'hdfs:\/\/path\/to\/udf.jar',\r\nFILE 'hdfs:\/\/path\/to\/relevant\/partfile';<\/pre>\n<p>where we can specify a resource when we create the UDF. In this way, dependencies are at least documented along with the UDF definition, which is progress of sorts. 
The Hive UDF is created for each session and, upon creation, the HDFS resource is copied to a local folder, from where we reference it like this:<\/p>\n<pre class=\"lang:java decode:true\">@Override\r\npublic HiveDecimalWritable evaluate(DeferredObject[] args) throws HiveException {\r\n\t\/* initialize lookup, if not yet done *\/\r\n\tif (lookup == null) {\r\n\t\tif (args.length &gt; 3) {\r\n\t\t\tinitHdfsLookup(fileInspector.getPrimitiveJavaObject(args[3].get())); \/\/ previously: lookup file had to be provided as an argument\r\n\t\t} else {\r\n\t\t\tinitCacheLookup(getResourcePath()); \/\/ now: looked up from the path defined in CREATE FUNCTION\r\n\t\t}\r\n\t}\r\n\t\/* perform lookup *\/\r\n\t...\r\n}\r\n\r\nprivate void initCacheLookup(String lookupFile) throws HiveException {\r\n\tInputStream in;\r\n\ttry {\r\n\t\tin = new FileInputStream(getLookupFile(lookupFile));\r\n\t\tinitMap(in);\r\n\t} catch (Exception e) {\r\n\t\tthrow new HiveException(e + \": when attempting to access: \" + lookupFile);\r\n\t}\r\n}\r\n\r\nprotected File getLookupFile(String lookupFile) {\r\n\t\/* N.B. local resources (non-MR mode) *\/\r\n\tFile resourceDir = new File(SessionState.get().getConf().getVar(HiveConf.ConfVars.DOWNLOADED_RESOURCES_DIR));\r\n\treturn new File(resourceDir, lookupFile);\r\n}\r\n\r\nprotected String getResourcePath() throws HiveException {\r\n\treturn PART_FILE;\r\n}<\/pre>\n<p>So we deploy and test the UDF, only to find at times that we are confronted with messages informing us that the file cannot be found.<\/p>\n<p>This happens when we start a job that runs in map-reduce mode in the cluster, from where the UDF cannot see the local folder holding our resources. 
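Incidentally, the initMap stub from the skeleton is where the downloaded file actually gets parsed into the nested map. One plausible implementation, assuming pipe-delimited rows of the form customer|taxcode|date_from|rate (a column layout we are inventing here for illustration, with BigDecimal standing in for HiveDecimal):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Sketch of a pipe-delimited loader for the customer -&gt; taxcode -&gt; date-range map. */
public class LookupLoader {

    /** Builds customer -> taxcode -> (date_from -> rate) from pipe-delimited lines. */
    static Map<Integer, Map<Integer, NavigableMap<Integer, BigDecimal>>> initMap(InputStream in)
            throws IOException {
        Map<Integer, Map<Integer, NavigableMap<Integer, BigDecimal>>> lookup = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) continue;
                String[] cols = line.split("\\|");
                int customer = Integer.parseInt(cols[0].trim());
                int taxCodeId = Integer.parseInt(cols[1].trim());
                int dateFrom = Integer.parseInt(cols[2].trim());
                BigDecimal rate = new BigDecimal(cols[3].trim());
                lookup.computeIfAbsent(customer, c -> new HashMap<>())
                      .computeIfAbsent(taxCodeId, t -> new TreeMap<>())
                      .put(dateFrom, rate);
            }
        }
        return lookup;
    }
}
```

In the real UDF the stream would come from initHdfsLookup or initCacheLookup, whichever path applies.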
How can we write our resource-lookup code to be flexible enough to cope with both scenarios, local and M\/R mode?<\/p>\n<p>We can cover both eventualities by dropping down to a second option if the first cannot detect a file. Note that in M\/R mode the file is available via the distributed cache, which is local to where the UDF .jar has been started:<\/p>\n<pre class=\"lang:java decode:true\">protected File getLookupFile(String lookupFile) {\r\n\t\/* distributed cache *\/\r\n\tFile f = new File(lookupFile); \/\/ file available locally\r\n\tif (!f.exists()) {\r\n\t\t\/* local resources (non-MR mode) *\/\r\n\t\tFile resourceDir = new File(\r\n\t\t\t\tSessionState.get().getConf().getVar(HiveConf.ConfVars.DOWNLOADED_RESOURCES_DIR));\r\n\t\tf = new File(resourceDir, lookupFile);\r\n\t}\r\n\treturn f;\r\n}<\/pre>\n<h2><span class=\"ez-toc-section\" id=\"Reviewing-our-approach\"><\/span>Reviewing our approach<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>So now we are done\u2014let&#8217;s take a step back and review some of the drawbacks to this approach:<\/p>\n<ol>\n<li>we have to ensure that the single part file referenced in the UDF DDL is always present in HDFS under exactly that name (note that different tools &#8211; e.g. Hive, Spark &#8211; create part files with different naming conventions) and in the expected, easily parsed format (e.g. pipe- or comma-delimited data)<\/li>\n<li>the file resource is integral to the function object: functions cannot be dropped if a FILE resource referenced in the CREATE FUNCTION statement no longer exists!<\/li>\n<li>the resource is copied to the local resources folder whenever it is instantiated by a session that invokes the UDF: so the resource can become stale if the lookup data changes within the scope of a session<\/li>\n<\/ol>\n<p>As mentioned at the start, it may well be that a UDF is superfluous to your requirements. 
In the case of equi-JOINs, Hive will normally persist small joined tables in the distributed cache and then reference them in much the same way as we have shown above. For non-equi-JOINs, though, that is not possible, and a lookup against a small-ish dataset via UDF is worth considering (or you could perform the join excluding the column where a non-equi condition would be used, and then filter in the WHERE clause).<\/p>\n<p>So to conclude, we should try to balance the following considerations when using a Hive UDF for table lookups:<\/p>\n<ol>\n<li>Do we want to use the metastore? &#8211; HCatalog calls from every mapper may cause problems, although this may be the cleanest implementation<\/li>\n<li>Do we require the user to know about HDFS resources? &#8211; an alternative to HCatalog is to perform lookups directly against HDFS paths, but this requires the UDF callout (and hence the user) to include the address of the HDFS resource; or we can &#8220;embed&#8221; the resource in the CREATE FUNCTION definition<\/li>\n<li>Are we performing lookups against dynamic data? &#8211; if so, make sure that it does not change in the course of your session<\/li>\n<li>Avoid assumptions about local or YARN mode &#8211; ideally we want our UDF to be insulated against the mode of operation<\/li>\n<\/ol>\n<h2><span class=\"ez-toc-section\" id=\"Links\"><\/span>Links<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/cwiki.apache.org\/confluence\/display\/Hive\/LanguageManual+DDL#LanguageManualDDL-Create\/Drop\/ReloadFunction\" target=\"_blank\" rel=\"noopener\">Apache Hive Wiki: Create\/Drop\/Reload Function<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In today&#8217;s blog I am going to take a look at a fairly mundane and unspectacular use of a Hive UDF (user-defined function), that of performing lookups against resources residing in the Hadoop file system (HDFS), specifically, other hive tables. 
Why would we do this when Hive provides this functionality via joins and the like? [&hellip;]<\/p>\n","protected":false},"author":49,"featured_media":13230,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[77,207],"service":[411],"coauthors":[{"id":49,"display_name":"Andrew Kenworthy","user_nicename":"akenworthy"}]}