{"id":21110,"date":"2019-08-13T08:06:04","date_gmt":"2019-08-13T06:06:04","guid":{"rendered":"https:\/\/www.inovex.de\/blog\/?p=16560"},"modified":"2022-11-24T10:37:06","modified_gmt":"2022-11-24T09:37:06","slug":"digitize-receipts-computer-vision","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/","title":{"rendered":"Digitize your Receipts using Computer Vision"},"content":{"rendered":"<p>\u201cWould you like the receipt?\u201c\u2014It\u2019s hard to say no to that. Not because you actually want it (you may even throw it in the trash before exiting the store), but because doing otherwise might hurt the feelings of the cashier. But if you take a closer look, you will discover that a receipt carries all kinds of wonderful information. The most obvious data point is the total amount, which is useful in tracking your monthly expenses. Close to that you\u2019ll typically find a list of individual items. This you could use to inform your flatmate or spouse that you already bought the milk (but forgot the toilet paper) they wanted to pick up after work. The date of purchase might be useful if you need to decide if the eggs in the fridge are still safe to eat. Tracking receipts over a longer period of time might reveal trends in item prices and maybe even tell stories about your life, like that time you decided to eat more healthy food.<!--more--><\/p>\n<p>Alas, all these use-cases require the information to be digital instead of being printed on thermal paper! Until retail decides to start using digital receipts, that means that you\u2019ll have to manually type the information into a computer\u2014and we can probably agree that while you may record some of that information (like the total amount), transcribing all of it is simply not realistic. But isn\u2019t this the age of automation where anything is possible? 
If we can put self-driving cars into space, we can surely digitize the information on a receipt, can\u2019t we? Well yes, we can. Google Lens, Evernote, Expensify, PaperScan and taggun.io are just some of the many apps and services you could use. So problem solved, right? OK, thanks, bye!<\/p>\n<p>But wait! Don\u2019t you want to know <em>how<\/em> these apps do it? Then you have come to the right place, because in the following I will describe our take on recreating the basic functions of the aforementioned apps, show the engineering involved in tackling this task, describe what did not work the way we thought and why that may have been the case. And yes, there will be code so you can try it out for yourself!<\/p>\n<p>Besides, receipt digitization is actually a great example for computer vision in general, where the goal is to extract information from raw pixels. The approaches and techniques outlined here can certainly be modified to cater to other use-cases as well. And believe it or not: receipt understanding is actually an active area of research\u2014take a look at this <a href=\"https:\/\/hal.archives-ouvertes.fr\/hal-01654191\/document\">2017 paper by Raoui-Outach et al.<\/a> or the <a href=\"https:\/\/rrc.cvc.uab.es\/?ch=13&amp;com=introduction\">ICDAR 2019 Robust Reading Challenge<\/a>.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#System-Overview-What-Will-It-Do\" >System Overview: What Will It Do?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" 
href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Step-0-Aquire-Some-Data\" >Step 0: Acquire Some Data<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Step-1-Detect-the-Receipt\" >Step 1: Detect the Receipt<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Step-11-Image-Preprocessing\" >Step 1.1: Image Preprocessing<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Step-12-Receipt-Detection\" >Step 1.2: Receipt Detection<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Step-13-Corner-Detection\" >Step 1.3: Corner Detection<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#What-Didnt-Work\" >What Didn\u2019t Work?<\/a><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Corner-Detection\" >Corner Detection<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Active-Contours\" >Active Contours<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Black-box-Optimization\" >Black-box Optimization<\/a><\/li><li 
class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Deep-Learning\" >Deep Learning<\/a><\/li><\/ul><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Step-2-Cropping-De-skewing-and-Contrast-Enhancement\" >Step 2: Cropping, De-skewing and Contrast Enhancement<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#What-Didnt-Work-2\" >What Didn\u2019t Work?<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Step-3-Information-Extraction\" >Step 3: Information Extraction<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#Where-to-Go-from-here\" >Where to Go from here?<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"System-Overview-What-Will-It-Do\"><\/span>System Overview: What Will It Do?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Given a picture of a receipt, perhaps taken with your smartphone, this is what we want to achieve:<\/p>\n<ol type=\"1\">\n<li>Find the receipt in the image,<\/li>\n<li>Enhance the image by cropping the receipt, correcting for perspective distortion and increasing text contrast.<\/li>\n<li>Extract the text in machine-readable form, i.e., do optical character recognition.<\/li>\n<\/ol>\n<p>To keep this post somewhat contained, we will ignore all the steps that come before and after. 
In particular, we will not cover how to actually take the picture, nor will we discuss what to do with the information once it is extracted. We will also assume that the pictures are of reasonable quality and resolution, and that a receipt is present and covers most of the picture.<\/p>\n<p>This leaves us with these three steps: First, detect the receipt; second, extract the receipt and improve contrast; third, extract the text:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-16562 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/steps.png\" alt=\"Three steps for receipt analysis\" width=\"988\" height=\"296\" \/><\/p>\n<p>As promised, we will describe the details of each step in the following. But first:<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step-0-Aquire-Some-Data\"><\/span>Step 0: Acquire Some Data<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Before we can start extracting text, we need something to extract it from. A quick search reveals that there are indeed some datasets we could use. However, none seem to fit our goal just right. For example, ExpressExpense\u2019s <a href=\"https:\/\/expressexpense.com\/view-receipts.php\">Massive Receipt Archive<\/a> has many images of receipts, but overall resolution is quite low, the receipts are not always fully contained in the image, and there is a watermark at the bottom of each image. <a href=\"https:\/\/github.com\/JensWalter\/my-receipts\">Jens Walter\u2019s receipts<\/a>, on the other hand, are very high-quality scans, but are already cropped so that the image contains nothing but the receipt.<\/p>\n<p>So we decided to collect our own data instead. This way, we could control the quality as well as the difficulty of the images. We collected 68 images in total, of which 9, 11, 7, 10 and 12 receipts were from the German grocery chains Aldi, Edeka, Lidl, Rewe and Scheck-In, respectively. The remaining 19 receipts were all from different stores. 
For good measure, we also threw in a public transport ticket, because: why not?<\/p>\n<p>We made sure that the background was not too busy and that the contrast between it and the receipt was high. All receipts were oriented more or less upright without too much perspective distortion to not make our lives harder than necessary. However, most of the receipts were crumpled during transport so that they had folds and creases and some had washed out letters as well to make the task not too easy after all. Below you can see some examples of the images we collected:<\/p>\n<p align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16563 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/IMG_20190515_222248-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16564 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/IMG_20190515_222455-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16565 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/IMG_20190517_210114_151-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/><\/p>\n<h2 id=\"step-1-detect-the-receipt\"><span class=\"ez-toc-section\" id=\"Step-1-Detect-the-Receipt\"><\/span>Step 1: Detect the Receipt<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The goal in this step is to detect the areas of the image that show the receipt. Since a receipt in general is a rectangular piece of paper, it makes sense to describe that region using a quadrilateral, i.e., a polygon of four vertices \\(\\{\\mathbf p_1,\u00a0\\mathbf p_2,\u00a0\\mathbf p_3,\u00a0\\mathbf p_4\\}\\). The vertices should be chosen such that the polygon covers as much of the receipt and as little of the background as possible. 
A picture is worth a thousand words, so here is an example of what we want:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16581 aligncenter\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step6-2-best-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/><\/p>\n<p>Surprisingly, this turned out to be the most complicated part of the whole project. The process can again be broken down into three parts: image preprocessing, receipt detection, and finally estimating the corners of the polygon. Let\u2019s start at the beginning.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Step-11-Image-Preprocessing\"><\/span>Step 1.1: Image Preprocessing<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Though color may provide useful information to discriminate fore- and background, we found that the contrast between mostly white and smooth receipt and mostly non-white, generally non-smooth background was large enough to separate the two. Thus, we converted the images to grayscale, which conveniently reduces the data by two thirds and speeds up the subsequent processing. This is common in many computer vision applications, as texture often (but not always) carries more relevant information than color<a id=\"fnref1\" class=\"footnote-ref\" role=\"doc-noteref\" href=\"#fn1\"><sup>1<\/sup><\/a>, and including color is essentially a source of noise that machine learning could overfit on.<\/p>\n<p>We also normalized the global illumination by removing slow illumination gradients using an old image processing trick: Slow gradients correspond to low frequency modulation of the image intensity, thus filtering the image with a high-pass filter should even out the global illumination. This high-pass filter can efficiently be implemented using the discrete cosine transform (DCT): Transform the image into frequency space, zero out the low frequency components, and transform the image back into the spatial domain. 
Note that we use the DCT instead of the discrete Fourier transform (DFT) to avoid dealing with complex numbers which inevitably arise when zeroing out the low frequency DFT components. Since the whole process changes the magnitude of all pixels, we finally normalize the pixels to the range [0:1]. The result of the preprocessing can be seen here:<\/p>\n<p align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16564 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/IMG_20190515_222455-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16567 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step1-1-gray-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16568 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step1-2-afterdct-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/><\/p>\n<p>With the help of both scipy and scikit-image, we can accomplish the above in just a few lines of Python:<\/p>\n<pre class=\"sourceCode python\"><code>from skimage import io, color\r\n\r\nfrom scipy.fftpack import dct, idct\r\n\r\nimg = io.imread(&lt;image file&gt;)\r\n\r\ngray = color.rgb2gray(img)\r\n\r\nfrequencies = dct(dct(gray, axis=0), axis=1)\r\n\r\nfrequencies[:2,:2] = 0\r\n\r\ngray = idct(idct(frequencies, axis=1), axis=0)\r\n\r\ngray = (gray - gray.min()) \/ (gray.max() - gray.min()) # renormalize to range [0:1]<\/code><\/pre>\n<h3><span class=\"ez-toc-section\" id=\"Step-12-Receipt-Detection\"><\/span>Step 1.2: Receipt Detection<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Given the preprocessed images, the next step is to segment the image into pixels that show the receipt and those that do not. We tried several different methods ranging from automatic thresholding according to Otsu to Superpixel Segmentation. 
In the end, however, it turned out that a simple global threshold followed by some binary morphology and blob detection worked just fine. More specifically, we blur the image to suppress noise and apply a threshold at 60% intensity to get an initial segmentation. From there, we apply binary closing to bridge small gaps and fill defects along the contour. Hole-filling closes the holes caused by larger texts, logos and bar codes on the receipt. Finally, we discard all but the largest blob in the image, which also removes small false detections in the background:<\/p>\n<p align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16569 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step2-0-binary-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16571 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step2-2-fill-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16572 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step2-3-largest-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/><\/p>\n<p>Again, scipy and scikit-image make all this very easy:<\/p>\n<pre class=\"python\"><code>import numpy as np\r\n\r\nfrom skimage import filters, morphology, measure\r\n\r\nfrom scipy.ndimage import binary_fill_holes\r\n\r\nmask = filters.gaussian(gray, 2) &gt; 0.6\r\n\r\nmask = morphology.binary_closing(mask, morphology.disk(2, bool))\r\n\r\nmask = binary_fill_holes(mask, structure=morphology.disk(3, bool))\r\n\r\nmask = measure.label(mask)\r\n\r\n# regionprops returns regions in label order, so index i belongs to label i + 1\r\n\r\nmask = (mask == 1 + np.argmax([r.filled_area for r in measure.regionprops(mask)]))<\/code><\/pre>\n<p>Note: This approach will inevitably fail if the background has a similar brightness to the receipt. 
In this case, you could compute the edge image using gradient operators, the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Canny_edge_detector\">Canny<\/a> or <a href=\"https:\/\/en.wikipedia.org\/wiki\/Deriche_edge_detector\">Deriche<\/a> edge detectors, or some other algorithm. Regardless of the method, you will need to filter out edges detected in the background and on the receipt (e.g., from text and creases), which is non-trivial and the reason we chose the simple method above instead.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Step-13-Corner-Detection\"><\/span>Step 1.3: Corner Detection<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>At this stage you might think that you could simply place the polygon vertices in the corners of the outline. Indeed, this was the first thing we tried, and while it works here, it will not in the general case. Consider the rightmost receipt (from \u201cAldi\u201d) of the examples presented above: Here, putting the vertices in the corners would make the polygon clip the area on the left of the receipt. Additionally, the lower left corner has a very obtuse angle, which gives a hint that reliable corner detection may not be as easy as measuring the internal angles. In other cases, the receipt may be damaged, have rounded corners or even corners in weird places caused by an uneven tear.<\/p>\n<p>For a more robust approach, we instead focus on the edges of the receipt. 
Specifically, we compute the outlines of the receipt from the foreground mask (again using binary morphology) and then apply a probabilistic Hough transform to get the start and end points of the line segments in the image:<\/p>\n<p align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16573 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step3-1-edges-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16574 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step4-segments-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/><\/p>\n<pre class=\"sourceCode python\"><code>import numpy as np\r\n\r\nfrom skimage import transform\r\n\r\nedges = mask ^ morphology.binary_erosion(mask, morphology.disk(2, bool))\r\n\r\nsegments = np.array(transform.probabilistic_hough_line(edges))\r\n\r\n# deviation of each segment from the vertical; the modulo makes it independent of the segment's direction\r\n\r\nangles = np.array([np.abs(np.arctan2(a[1]-b[1], a[0]-b[0]) % np.pi - np.pi\/2) for a, b in segments])\r\n\r\nverticalSegments = segments[angles &lt; np.pi\/4]\r\n\r\nhorizontalSegments = segments[angles &gt;= np.pi\/4]<\/code><\/pre>\n<p>As you can see in the image and the code above, we also sorted the segments into horizontal and vertical segments. 
Then, we compute the intersection of each pair of horizontal and vertical segments (green blobs) to get a list of corner candidates, which we reduce to a more reasonable number with mean shift (red crosses):<\/p>\n<p align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16598 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step5-1-canddates-1-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16599 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step5-2-meanshift-1-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/><\/p>\n<pre class=\"sourceCode python\"><code>from sklearn import cluster\r\n\r\nintersections = [lineIntersection(vs, hs) for vs in verticalSegments for hs in horizontalSegments]\r\n\r\n# lineIntersection(s1, s2) is left as an exercise for the reader\r\n\r\nbw = cluster.estimate_bandwidth(intersections, quantile=0.1)\r\n\r\ncorners = cluster.MeanShift(bandwidth=bw).fit(intersections).cluster_centers_<\/code><\/pre>\n<p>This is necessary to avoid the combinatorial explosion in the next step, where we construct a polygon candidate from every possible combination of four points, e.g.:<\/p>\n<p align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16577 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step6-1-cand-1-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16578 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step6-1-cand-25-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16580 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step6-1-cand-122-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/><\/p>\n<p>In this example, there are 10 
corner candidates which result in 131 possible polygons<a id=\"fnref2\" class=\"footnote-ref\" role=\"doc-noteref\" href=\"#fn2\"><sup>2<\/sup><\/a>. To determine the best of those, we find the candidate that has the highest agreement with the segments found by the Hough transform. Here, <em>agreement<\/em> is measured by the distance of the segments found by the Hough transform to the edges of the polygon. In the ongoing example, this is the candidate with the highest agreement:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16581 aligncenter\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step6-2-best-225x300.jpg\" alt=\"\" width=\"225\" height=\"300\" \/><\/p>\n<p>Quite a good fit, don\u2019t you think? The code to compute agreement is a little more involved than the code for the other steps, which is why we don\u2019t show it here. It really boils down to high-school math though, so you will have no trouble figuring it out for yourself. Doing so might also be a good opportunity to try other ways to measure agreement, e.g., by computing the overlap of the polygon and the foreground mask or by rating the area and internal angles of the candidate.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"What-Didnt-Work\"><\/span>What Didn\u2019t Work?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Of course, all of the above did not just work out of the box. We tried several different approaches to detect the receipts, each with different levels of success. Here is a selection of those that failed:<\/p>\n<h4><span class=\"ez-toc-section\" id=\"Corner-Detection\"><\/span>Corner Detection<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>Our first impulse was to use a standard <a href=\"https:\/\/en.wikipedia.org\/wiki\/Corner_detection\">corner detector<\/a> to, well, detect the corners of the receipt. 
As described above, the corners of the receipt are not necessarily the corners of the polygon we care about, so this is set up to fail. However, classic corner detection failed one step sooner, since it yielded <em>way<\/em> too many corner candidates both in the background and on the receipt. Filtering out the correct ones using SIFT and related descriptors did not work as intended.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"Active-Contours\"><\/span>Active Contours<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>Another standard approach for contour fitting goes back to <a href=\"https:\/\/link.springer.com\/article\/10.1007\/BF00133570\">Kass, Witkin and Terzopoulos<\/a>: Active Contour Models, also known as Snakes. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Active_contour_model#Energy_formulation\">The math<\/a> may look somewhat daunting, but the underlying idea is quite appealing: Forces push the contour towards nearby edges and other interesting features in the image, while, at the same time, the contour tries to resist deformation. Picture an active contour as a balloon inside a container: When filled with air, it will expand and eventually stop expanding at the boundaries of that container. Snakes can also be contracting, like a balloon that wraps around an object inside it once you let out the air. As the method is quite mature, there are many implementations available, like <a href=\"https:\/\/scikit-image.org\/docs\/stable\/api\/skimage.segmentation.html#active-contour\">the one found in scikit-image<\/a>.<\/p>\n<p>The main issue, like so often, was finding a good initialization: When the initial contour was put inside the receipt, it would often snap onto the text instead of the outlines of the receipt. When put outside, the snake would often get stuck on other edges in the background. 
Additionally, the algorithm is much, much slower than the simple method outlined above.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"Black-box-Optimization\"><\/span>Black-box Optimization<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>Inspired by Snakes, we figured it might be worth a shot to cast polygon fitting as an optimization problem and let <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/optimize.html\">scipy.optimize<\/a> figure out the details. Ideas for the objective included measuring the share of bright pixels inside the polygon relative to the whole image, comparing the color distribution with a distribution \u201clearned\u201d from the receipts, and computing the distance of the polygon boundary to edges in the image. But no matter how much we wanted this to work, in the end the optimization always got stuck in local minima far from the desired one, if it converged at all.<\/p>\n<h4 id=\"deep-learning\"><span class=\"ez-toc-section\" id=\"Deep-Learning\"><\/span>Deep Learning<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>Finally, you can\u2019t really get away without giving deep learning a shot. The idea would be to train a CNN to directly map images to the coordinates of the polygon vertices. Such approaches have been shown to work for facial feature detection (e.g. <a href=\"http:\/\/openaccess.thecvf.com\/content_cvpr_2013\/papers\/Sun_Deep_Convolutional_Network_2013_CVPR_paper.pdf\">here<\/a>, <a href=\"http:\/\/cs231n.stanford.edu\/reports\/2016\/pdfs\/007_Report.pdf\">here<\/a> and <a href=\"http:\/\/home.ie.cuhk.edu.hk\/~ccloy\/files\/eccv_2014_deepfacealign.pdf\">here<\/a>), so why wouldn\u2019t they work in our case? The relatively small amount of training data could easily be increased using simple image transformations (Keras has a nice <a href=\"https:\/\/keras.io\/preprocessing\/image\/#imagedatagenerator-class\">API<\/a> for that), so we gave it a shot.<\/p>\n<p>Long story short: it did not work. 
Regardless of network topology, loss function and optimizer, the network eventually settled for a mean shape that is close to the training data, or collapsed into a single point. With enough time (and more training data), you could probably find a solution that works, but the problem itself is somewhat ill-posed to begin with: Good locations for the polygon vertices often lie outside of the receipt and the receipt itself does not give many clues where the network should put the polygon. At the same time, there are many sources of visual noise, like text, creases and folds or geometric and perspective distortions. A stage-wise approach like the one by <a href=\"http:\/\/openaccess.thecvf.com\/content_cvpr_2013\/papers\/Sun_Deep_Convolutional_Network_2013_CVPR_paper.pdf\">Sun, Wang and Tang<\/a> will likely fail for similar reasons.<\/p>\n<p>A much better application for a CNN might be to provide the initial segmentation, e.g., with Mask <a href=\"https:\/\/arxiv.org\/abs\/1703.06870\">R-CNN<\/a> or <a href=\"https:\/\/lmb.informatik.uni-freiburg.de\/people\/ronneber\/u-net\/\">U-net<\/a>, and then find the polygon coordinates using the method shown in step 1.3. Alternatively, you could directly learn a transformation map to rectify the receipt like Ma et al.\u00a0do with <a href=\"https:\/\/openaccess.thecvf.com\/content_cvpr_2018\/papers\/Ma_DocUNet_Document_Image_CVPR_2018_paper.pdf\">DocUNet<\/a>. Unfortunately the authors did not release code for training data generation nor the trained model.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step-2-Cropping-De-skewing-and-Contrast-Enhancement\"><\/span>Step 2: Cropping, De-skewing and Contrast Enhancement<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Once the receipt is found, the next steps are to extract it from the image and boost the contrast of what\u2019s printed on the receipt. The former means we need to both crop and de-skew the image. 
Both are standard operations in image processing and scikit-image provides <a href=\"https:\/\/scikit-image.org\/docs\/dev\/api\/skimage.transform.html#skimage.transform.warp\">transform.warp()<\/a> to do them in one pass using a mapping between input pixel and output pixel locations. For efficiency reasons, scikit-image actually needs the <em>inverse<\/em> mapping, i.e., the mapping that, given an output pixel, returns the corresponding location in the input image. Fortunately, scikit-image also provides methods to estimate such mappings from pairs of coordinates like, you guessed it, the corners of a polygon. All we need to do is to pick a mapping and define the desired output shape for a given polygon.<\/p>\n<p>As the distortion is mostly due to perspective (which maps a rectangle to a quadrilateral), it makes sense to use <a href=\"https:\/\/scikit-image.org\/docs\/dev\/api\/skimage.transform.html#skimage.transform.ProjectiveTransform\">transform.ProjectiveTransform<\/a>. For the output shape, we simply compute the edge lengths of the quadrilateral and construct a rectangle whose width is the longer of the top and bottom edges and whose height is the longer of the left and right edges. 
This works reasonably well as long as the receipt and camera plane are close to parallel.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-medium wp-image-16589\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/warp-target.svg\" alt=\"\" \/><\/p>\n<p>With the assumption that <code>coordinates<\/code> contains the polygon vertices in clockwise order starting from the upper left corner, the code to extract the receipt is:<\/p>\n<pre class=\"sourceCode python\"><code>import numpy as np\r\n\r\nfrom scipy.spatial import distance\r\n\r\nd = distance.pdist(coordinates)\r\n\r\nw = int(max(d[0], d[5])) # = max(dist(p1, p2), dist(p3, p4))\r\n\r\nh = int(max(d[2], d[3])) # = max(dist(p1, p4), dist(p2, p3))\r\n\r\ntr = transform.ProjectiveTransform()\r\n\r\n# estimate the inverse mapping: output rectangle -&gt; polygon in the input image\r\n\r\ntr.estimate(np.array([[0,0], [w,0], [w,h], [0,h]]), coordinates)\r\n\r\nreceipt = transform.warp(img, tr, output_shape=(h, w), order=1, mode=\"reflect\")<\/code><\/pre>\n<p>And here is what that does to our example:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-16582\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step-7-unwarp-115x300.jpg\" alt=\"\" width=\"115\" height=\"300\" \/><\/p>\n<p>In theory, we could stop here, but there is still some visual noise we want to get rid of in order to make subsequent optical character recognition easier. In particular, we want to remove the parts of the background that are still visible and, more importantly, the shadows and highlights caused by folds and creases. The problem might be a good fit for Gandelsman, Shocher and Irani\u2019s <a href=\"http:\/\/www.wisdom.weizmann.ac.il\/~vision\/DoubleDIP\/index.html\">Double-DIP<\/a>, but then again, good old thresholding will likely also do the trick. Instead of a global threshold, though, this time we computed adaptive local thresholds according to <a href=\"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download;?doi=10.1.1.98.880&amp;rep=rep1&amp;type=pdf\">Sauvola\u2019s method<\/a>. 
This method is specifically designed for OCR and works by computing a different threshold for each pixel depending on the gray value distribution around that pixel.<\/p>\n<p>Formally, the threshold \\(\\tau(u,v)\\) at the position \\((u,v)\\) is defined as<\/p>\n<p style=\"text-align: center;\">\\(\\displaystyle \\tau(u, v) := \\left(1 + k\\,\\left(2\\,s_{A(u, v)} - 1\\right)\\right) m_{A(u, v)}\\),<\/p>\n<p>where \\(m_{A(u,v)}\\) and \\(s_{A(u,v)}\\) denote the mean and standard deviation of gray values in the image patch \\(A(u,v)\\) around the pixel at \\((u,v)\\). The parameter \\(k\\) controls the influence of the standard deviation on the threshold and can be interpreted as some prior knowledge about the noise level of the image. In our experiments, a square patch of \\(55\\times 55\\) pixels and \\(k=0.1\\) yielded good results. Below you can see the threshold gray levels and resulting image mask, where in the rightmost image we also removed all blobs that touched the image border:<\/p>\n<p align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16585 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step-8-localthreshold-115x300.jpg\" alt=\"\" width=\"115\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16588 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step-8-thresholded-115x300.jpg\" alt=\"\" width=\"115\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16584 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step-8-cleared-115x300.jpg\" alt=\"\" width=\"115\" height=\"300\" \/><\/p>\n<p>As you can see, the threshold successfully suppresses most of the illumination differences while keeping the text intact. However, some characters, especially the small ones, are hard to make out in the binary image. 
Therefore, we first feather out the mask by Gaussian blurring and multiply it with the inverted receipt image. This effectively whitens out everything outside the mask, but with soft instead of hard edges. Finally, we increase contrast using gamma \u201ccorrection\u201d:<\/p>\n<p align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16583 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step-8-blurred-115x300.jpg\" alt=\"\" width=\"115\" height=\"300\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16586 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step-8-masked-115x300.jpg\" alt=\"\" width=\"115\" height=\"300\" \/>\u00a0<img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-16587 alignnone\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step-8-masked-gamma-115x300.jpg\" alt=\"\" width=\"115\" height=\"300\" \/><\/p>\n<p>All this is done in fewer than 10 lines of Python:<\/p>\n<pre class=\"sourceCode python\"><code>from skimage import filters\r\n\r\nfrom skimage.segmentation import clear_border\r\n\r\nmask = receipt &lt; filters.threshold_sauvola(receipt, 55, 0.1) # text is darker than the local threshold\r\n\r\nmask = clear_border(mask) # drop blobs that touch the image border\r\n\r\nmask = filters.gaussian(mask, 1) # feather the mask edges\r\n\r\nreceipt = 1 - (1 - receipt) * mask # whiten everything outside the mask\r\n\r\nreceipt = receipt**3 # gamma correction with gamma = 3<\/code><\/pre>\n<h3><span class=\"ez-toc-section\" id=\"What-Didnt-Work-2\"><\/span>What Didn\u2019t Work?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>We tried to estimate a global threshold from the gray value distribution, e.g., at the 8th percentile. While this worked in many cases, it failed when creases on the receipt resulted in strong illumination differences, like in the rightmost image at the top of this post.<\/p>\n<p>We also experimented with gradient operators. The intuition was that these essentially act as high-pass filters, which would suppress the low-frequency illumination changes, but retain the high-frequency details of the characters. 
Unfortunately, gradient operators also remove the inside of characters. We tried to counteract this with morphological closing and hole-filling, but this would also close holes we wanted to keep (e.g., the one in an O) and fuse characters that should remain separate.<\/p>\n<p>Finally, we experimented with different local thresholding methods. All these methods effectively boil down to computing some statistic over the local neighborhood of the pixel of interest, such as the mean, median or <a href=\"https:\/\/en.wikipedia.org\/wiki\/Standard_score\">z-score<\/a>. However, Sauvola\u2019s method proved to be the most robust.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Step-3-Information-Extraction\"><\/span>Step 3: Information Extraction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>This is all well and good, but in the end, we still only have a bunch of pixels. To do anything more useful downstream, like tracking expenditures, monitoring your pantry and telling stories, you\u2019ll need the text. With all that has been done so far, we could certainly go the extra mile and roll our own OCR by training a classifier to recognize individual characters <a href=\"https:\/\/www.inovex.de\/blog\/semi-supervised-gans-in-an-end-to-end-text-spotting-pipeline\/\">like Florian described here<\/a>. However, OCR requires more than that (most notably: page segmentation) and this post is already too long to open that can of worms.<\/p>\n<p>Fortunately, there are very good commercial and free OCR libraries out there. The most popular free option seems to be the <a href=\"https:\/\/github.com\/tesseract-ocr\/tesseract\">Tesseract OCR engine<\/a>, but <a href=\"https:\/\/github.com\/tmbdev\/ocropy\">OCRopus<\/a> might also be a good option, especially if you want to explore the inner workings of OCR. 
Here, we use <a href=\"https:\/\/pypi.org\/project\/pytesseract\/\">pytesseract<\/a>, which is a simple wrapper around Tesseract.<\/p>\n<p>Using pytesseract is simple: call <code>image_to_string()<\/code> to convert the image into a single formatted string. Call <code>image_to_data()<\/code> to get individual text fragments with recognition confidence and other useful information. For best results, make sure to use a model for your target language<a id=\"fnref3\" class=\"footnote-ref\" role=\"doc-noteref\" href=\"#fn3\"><sup>3<\/sup><\/a>:<\/p>\n<pre class=\"sourceCode python\"><code>import pytesseract\r\n\r\ntext = pytesseract.image_to_string(img, lang='deu')<\/code><\/pre>\n<table class=\" aligncenter\">\n<thead><\/thead>\n<tbody>\n<tr>\n<th>Source image<\/th>\n<th>Extracted text<\/th>\n<\/tr>\n<\/tbody>\n<tbody>\n<tr>\n<td><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-16587\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/step-8-masked-gamma-115x300.jpg\" alt=\"\" width=\"230\" height=\"599\" \/><\/td>\n<td>\n<pre class=\"\">Semecn-nCHrer\u2018\r\n\r\nScheck-In\r\n\r\nR\u00fcppurrerstr, 1\r\n\r\nD-76137 Karlsruhe\r\n\r\nTel. 0721\/35258-0\r\n\r\nE\r\n\r\nFROSTA BAMI GORENG 3,29\r\n\r\nETTLI FAIR KAFFEE 5,29 A\r\n\r\nPosten: 2 - .\r\n\r\nSUMME EUR 8,58\r\n\r\nEC-Cash EUR 8,58\r\n\r\nMuSt \\ WETTO MwSt UMSATZ\r\n\r\nAM 8.02 0,56 8,58\r\n\r\nMit der DeutschlandCard h\u00e4tten Sie\r\n\r\nauf den Umsatz von: 8,58 EUR\r\n\r\n4 Punkte erhalten!\r\n\r\n175732803191070095;\r\n\r\nOE A HG\r\n\r\nEs bediente Sie:\r\n\r\nFrau . 
Morina\r\n\r\nDatum Uhrzeit Filiale Pos Bed Bon\r\n\r\n28.03.19 19:48 0047573 107 028 9521\r\n\r\nSteuernummer:; DE 188293910\r\n\r\nVielen Dank f\u00fcr Ihren Einkauf\r\n\r\n\u00d6ffnungszeiten:\r\n\r\n- Sa.: 08:00 Uhr bis 22:00 Uhr\r\n\r\nwww scheck- In-center \u201ade<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>While certainly not perfect, this looks pretty good: all of the relevant information is there for the taking. To improve the result, we can filter out the low-confidence detections and those that are obviously too small to be text. As mentioned above, <code>image_to_data()<\/code> provides the information to do so. The default output is in a TSV (tab-separated values) format, but pytesseract can automatically parse this into a Python <code>dict<\/code> or a pandas DataFrame, which we\u2019ll use here:<\/p>\n<pre class=\"sourceCode python\"><code>df = pytesseract.image_to_data(img, lang='deu', output_type=pytesseract.Output.DATAFRAME)\r\n\r\n# filter low-confidence and small detections\r\n\r\nfiltered = df[(df.conf &gt; 0) &amp; (df.height &gt; 5)]\r\n\r\n# convert to text (astype(str) guards against fragments that pandas parsed as numbers)\r\n\r\nlines = filtered.groupby(['block_num', 'line_num']).text.apply(lambda x: \" \".join(x.astype(str)))\r\n\r\ntext = \"\\n\".join(lines)<\/code><\/pre>\n<p>This gets rid of the obvious mistakes like <code>Semecn-nCHrer\u2018<\/code> and the false detections caused by creases left over from the preprocessing steps. Another good way to increase recognition performance is to increase the image resolution, but be aware that this will increase the time spent in preprocessing and can also <em>reduce<\/em> the overall recognition performance. <a href=\"https:\/\/groups.google.com\/forum\/#!msg\/tesseract-ocr\/Wdh_JJwnw94\/24JHDYQbBQAJ\">This post<\/a> by \u201cWillus Dotkom\u201d suggests that the optimal character height might be around 30 pixels. 
The Tesseract wiki further <a href=\"https:\/\/github.com\/tesseract-ocr\/tesseract\/wiki\/ImproveQuality#dictionaries-word-lists-and-patterns\">suggests<\/a> adding words and patterns typically found on receipts, like \u201cMwSt\u201d (the German abbreviation for VAT), to the list of known words and patterns.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Where-to-Go-from-here\"><\/span>Where to Go from here?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>At this stage, you have a lot of options. You could, for example, package all the above into a small <a href=\"http:\/\/flask.pocoo.org\/\">Flask<\/a>-powered web service or a smartphone app (then again, you\u2019d have to compete against <a href=\"https:\/\/play.google.com\/store\/apps\/details?id=com.docuware.android.paperscan\">PaperScan<\/a>). Alternatively, you could practice your regular expression skills and extract the total amount on the receipt or put the text through the NLP pipeline of your choice to extract even more information. You could also use a text-to-speech engine to read the receipt out loud, or even build your own to do so.<\/p>\n<p>If you instead want to focus on computer vision, you could try to improve the receipt detection algorithm (maybe by implementing <a href=\"https:\/\/openaccess.thecvf.com\/content_cvpr_2018\/papers\/Ma_DocUNet_Document_Image_CVPR_2018_paper.pdf\">DocUNet<\/a>?) or train a classifier to detect whether the image shows a receipt in the first place. You could also try to localize and classify the different parts of the receipt, e.g.\u00a0the logo and address area, item descriptions, total amount, bar code, etc. 
Or you could build a classifier to automatically sort the receipts according to their type or origin.<\/p>\n<p>Regardless of what you decide to do, be sure to drop us a note and tell us about your experience!<\/p>\n<section class=\"footnotes\" role=\"doc-endnotes\">\n<hr \/>\n<ol>\n<li id=\"fn1\" role=\"doc-endnote\">Interestingly, a normal human retina also contains approximately <a href=\"https:\/\/en.wikipedia.org\/wiki\/Photoreceptor_cell#Humans\">20 times more light receptors (rods) than color receptors (cones)<\/a>\u2014though at the center of the fovea, the focus of vision and the area with the highest density of photo receptors, there are only cone cells. So there\u2019s that.<a class=\"footnote-back\" role=\"doc-backlink\" href=\"#fnref1\">\u21a9\ufe0e<\/a><\/li>\n<li id=\"fn2\" role=\"doc-endnote\">Of course, not all polygon candidates are plausible. However, selection of the best candidate is quite fast, so we figured filtering out the implausible candidates was not worth the additional coding effort.<a class=\"footnote-back\" role=\"doc-backlink\" href=\"#fnref2\">\u21a9\ufe0e<\/a><\/li>\n<li id=\"fn3\" role=\"doc-endnote\">Tesseract provides pre-trained models for many languages <a href=\"https:\/\/github.com\/tesseract-ocr\/tessdata_fast\/\">here<\/a>.<a class=\"footnote-back\" role=\"doc-backlink\" href=\"#fnref3\">\u21a9\ufe0e<\/a><\/li>\n<\/ol>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>\u201cWould you like the receipt?\u201c\u2014It\u2019s hard to say no to that. Not because you actually want it (you may even throw it in the trash before exiting the store), but because doing otherwise might hurt the feelings of the cashier. 
But if you take a closer look, you will discover that a receipt carries all [&hellip;]<\/p>\n","protected":false},"author":239,"featured_media":16759,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[509,150],"service":[76],"coauthors":[{"id":239,"display_name":"Matthias Richter","user_nicename":"mrichter"}],"class_list":["post-21110","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-ai-2","tag-computer-vision","service-artificial-intelligence"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Digitize your Receipts using Computer Vision - inovex GmbH<\/title>\n<meta name=\"description\" content=\"In this article I describe the steps and approaches to image recognition for receipt digitalization using computer vision. This is the basic functionality behind apps such as Google Lens, Evernote, PaperScan and taggun.io.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Digitize your Receipts using Computer Vision - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"In this article I describe the steps and approaches to image recognition for receipt digitalization using computer vision. 
This is the basic functionality behind apps such as Google Lens, Evernote, PaperScan and taggun.io.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2019-08-13T06:06:04+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-24T09:37:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/08\/computer-vision-digitize-receipts.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Matthias Richter\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/08\/computer-vision-digitize-receipts-1024x576.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Matthias Richter\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"22\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Matthias Richter\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/\"},\"author\":{\"name\":\"Matthias Richter\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/3e3d7042596d8e6d4a4628b642399bd6\"},\"headline\":\"Digitize your Receipts using Computer Vision\",\"datePublished\":\"2019-08-13T06:06:04+00:00\",\"dateModified\":\"2022-11-24T09:37:06+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/\"},\"wordCount\":4004,\"commentCount\":5,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2019\\\/08\\\/computer-vision-digitize-receipts.png\",\"keywords\":[\"Ai\",\"Computer Vision\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/\",\"name\":\"Digitize your Receipts using Computer Vision - inovex 
GmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2019\\\/08\\\/computer-vision-digitize-receipts.png\",\"datePublished\":\"2019-08-13T06:06:04+00:00\",\"dateModified\":\"2022-11-24T09:37:06+00:00\",\"description\":\"In this article I describe the steps and approaches to image recognition for receipt digitalization using computer vision. This is the basic functionality behind apps such as Google Lens, Evernote, PaperScan and taggun.io.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2019\\\/08\\\/computer-vision-digitize-receipts.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2019\\\/08\\\/computer-vision-digitize-receipts.png\",\"width\":1920,\"height\":1080,\"caption\":\"A receipt is digitized using computer vision\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/digitize-receipts-computer-vision\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Digitize your Receipts using Computer 
Vision\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/3e3d7042596d8e6d4a4628b642399bd6\",\"name\":\"Matthias 
Richter\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/matthias-richter-BAB-96x96.jpgf72c6be19328ca8dbc7d24d0ec817d12\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/matthias-richter-BAB-96x96.jpg\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/matthias-richter-BAB-96x96.jpg\",\"caption\":\"Matthias Richter\"},\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/mrichter\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Digitize your Receipts using Computer Vision - inovex GmbH","description":"In this article I describe the steps and approaches to image recognition for receipt digitalization using computer vision. This is the basic functionality behind apps such as Google Lens, Evernote, PaperScan and taggun.io.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/","og_locale":"de_DE","og_type":"article","og_title":"Digitize your Receipts using Computer Vision - inovex GmbH","og_description":"In this article I describe the steps and approaches to image recognition for receipt digitalization using computer vision. 
This is the basic functionality behind apps such as Google Lens, Evernote, PaperScan and taggun.io.","og_url":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2019-08-13T06:06:04+00:00","article_modified_time":"2022-11-24T09:37:06+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/08\/computer-vision-digitize-receipts.png","type":"image\/png"}],"author":"Matthias Richter","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/08\/computer-vision-digitize-receipts-1024x576.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Matthias Richter","Gesch\u00e4tzte Lesezeit":"22\u00a0Minuten","Written by":"Matthias Richter"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/"},"author":{"name":"Matthias Richter","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/3e3d7042596d8e6d4a4628b642399bd6"},"headline":"Digitize your Receipts using Computer Vision","datePublished":"2019-08-13T06:06:04+00:00","dateModified":"2022-11-24T09:37:06+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/"},"wordCount":4004,"commentCount":5,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/08\/computer-vision-digitize-receipts.png","keywords":["Ai","Computer Vision"],"articleSection":["Analytics","English 
Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/","url":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/","name":"Digitize your Receipts using Computer Vision - inovex GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/08\/computer-vision-digitize-receipts.png","datePublished":"2019-08-13T06:06:04+00:00","dateModified":"2022-11-24T09:37:06+00:00","description":"In this article I describe the steps and approaches to image recognition for receipt digitalization using computer vision. 
This is the basic functionality behind apps such as Google Lens, Evernote, PaperScan and taggun.io.","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/08\/computer-vision-digitize-receipts.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/08\/computer-vision-digitize-receipts.png","width":1920,"height":1080,"caption":"A receipt is digitized using computer vision"},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/digitize-receipts-computer-vision\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"Digitize your Receipts using Computer Vision"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex 
GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/3e3d7042596d8e6d4a4628b642399bd6","name":"Matthias Richter","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/wp-content\/uploads\/matthias-richter-BAB-96x96.jpgf72c6be19328ca8dbc7d24d0ec817d12","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/matthias-richter-BAB-96x96.jpg","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/matthias-richter-BAB-96x96.jpg","caption":"Matthias 
Richter"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/mrichter\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21110","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/239"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=21110"}],"version-history":[{"count":3,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21110\/revisions"}],"predecessor-version":[{"id":37975,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21110\/revisions\/37975"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/16759"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=21110"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=21110"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=21110"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=21110"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}