{"id":32704,"date":"2021-11-19T12:27:34","date_gmt":"2021-11-19T11:27:34","guid":{"rendered":"https:\/\/www.inovex.de\/?p=32704"},"modified":"2022-11-17T12:31:22","modified_gmt":"2022-11-17T11:31:22","slug":"finetuning-a-resnet-for-a-content-based-image-retrieval-task","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/finetuning-a-resnet-for-a-content-based-image-retrieval-task\/","title":{"rendered":"Finetuning a ResNet for a Content-Based Image Retrieval Task"},"content":{"rendered":"<p>In this blog post, I will show you how we finetuned and evaluated a ResNet pre-trained on generic ImageNet data to a specific use case. I will share the takeaways we gained during our evaluation and reveal how well our optimized document retrieval works in practice.<!--more--><\/p>\n<p>Convolutional Neural Networks (CNN&#8217;s) are on everyone&#8217;s lips and are known for their strength in extracting information from images. We at inovex make use of the power of CNN&#8217;s as well. In the Service-Meister research project, a document retrieval is implemented using a contextual image search. It aims to support service technicians in retrieving information from manuals by uploading a picture of the affected device. 
For the image retrieval process, a CNN is applied.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The-use-case\"><\/span>The use case<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The topic I am covering with you today is part of the <a href=\"https:\/\/www.servicemeister.org\/\">Service-Meister<\/a> research project, funded by the German Federal Ministry of Education and Research (BMBF). Service-Meister aims to address the increasing diversity and complexity of industrial machines by providing technicians with AI-powered services to support their work. One area of this project is covered by a collaboration of Krohne Messtechnik GmbH and inovex. Krohne offers its customers the possibility to monitor flow rates of water pipes. In this context, one goal of the project is to simplify the maintenance of the applied technical instruments.<\/p>\n<p>This use case deals with the support of a service technician in retrieving information for a Krohne device during a maintenance repair. Due to the variety of different devices and their complexity, the technician is reliant on the usage of manuals. Currently, these have to be searched manually for any defective device and each unknown error message. 
Since this causes unnecessary effort and downtime, inovex and Krohne Messtechnik GmbH are developing a solution to simplify the process of obtaining information for repairing a device. For this purpose, a service technician can use his mobile device to capture a photo of the affected device or a displayed error code. He can then upload this photo to a search engine implemented by inovex.<\/p>\n<p>An underlying CNN compares the photo taken with pictures from the manual. The most similar image is retrieved and the service technician is directed to the corresponding manual and the page on which the device or error code is shown. Thus, manually browsing the handbooks becomes obsolete. This search can prove particularly helpful when taking photographs of fault messages, damaged parts or error codes. The technician can immediately obtain possible solutions if the corresponding incident is visually represented in the manuals.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"What-happens-under-the-hood\"><\/span>What happens under the hood?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>So, what happens when a technician uploads a picture to our search engine? To provide the technician with information about the photographed device, we apply content-based image retrieval. So let&#8217;s first cover the basics.<\/p>\n<p>The goal of content-based image retrieval (CBIR) is to extract images from a database that are similar to a given query image. 
The procedure for doing so is illustrated below.<\/p>\n<p><figure id=\"attachment_32382\" aria-describedby=\"caption-attachment-32382\" style=\"width: 303px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32382 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/cbir-1.png\" alt=\"Flow chart of a content-based image retrieval\" width=\"303\" height=\"343\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/cbir-1.png 303w, https:\/\/www.inovex.de\/wp-content\/uploads\/cbir-1-265x300.png 265w\" sizes=\"auto, (max-width: 303px) 100vw, 303px\" \/><figcaption id=\"caption-attachment-32382\" class=\"wp-caption-text\">Flow of a content-based image retrieval, inspired by [1]<\/figcaption><\/figure>Sample images are processed in advance. In doing so, the algorithm extracts features of the images, such as color information or more abstract information like complex shapes. The extracted features are stored in a database. If the algorithm is now provided with an image, its features are extracted and compared to the feature vectors from the database, using a specified similarity metric. Since similar images form similar vectors, the images in the database can be ranked according to their similarity to the query image.<\/p>\n<p>Convolutional Neural Networks (CNNs) are especially suitable feature extractors for CBIR. Rather than using the classification output of the network, the information is taken from a layer prior to the output layer. These vectors contain image-descriptive properties and are therefore used as features for CBIR.<\/p>\n<p>But back to our use case: Our goal is to provide the technician with relevant information about the captured device. To do this, we proceed as follows:<\/p>\n<ul>\n<li>Extract the most similar manual image. In advance, a database is created containing the feature vectors of all the images appearing in the Krohne manuals. 
By performing a content-based image retrieval, the most similar picture to the captured image is selected.<\/li>\n<li>Return the corresponding manual page. Alongside the device image, its manual page is stored. This information is passed to the technician, who is thus spared from browsing through all the manuals by himself. In the manual, he can retrieve all the information he needs.<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"The-initial-situation\"><\/span>The initial situation<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The project started by applying an ordinary ResNet152V2 Neural Network pre-trained on <a href=\"https:\/\/www.image-net.org\/\">ImageNet data<\/a> for the image retrieval task. ImageNet contains about 1.2 million images of different classes such as dogs, flowers, and cars. Networks trained on ImageNet data already have very mature features that are suitable for many image-analyzing applications. It is therefore no surprise that the ResNet in use can already handle our image retrieval satisfactorily. But the network is not prepared for the use case in any way.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Finetuning-the-model\"><\/span>Finetuning the model<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>By applying the upcoming optimization strategies, we want to specialize the network for recognizing Krohne devices. We anticipate that this will result in more accurate and reliable image retrieval.<\/p>\n<p>The accuracy of a Neural Network for contextual image retrieval is influenced by the finetuning of the network to the specific application and the choice of suitable hyperparameters. 
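To make the retrieval step concrete: the heart of CBIR is ranking stored feature vectors by their cosine similarity to the query vector. Here is a minimal NumPy sketch with made-up toy vectors; in our pipeline the vectors come from the penultimate layer of the ResNet and the search runs against a database, but the ranking logic is the same.

```python
import numpy as np

def rank_by_cosine(query_vec, db_vecs):
    """Return database indices ordered by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = d @ q                 # one similarity score per database image
    return np.argsort(-sims)     # indices of database images, most similar first

# Toy "feature vectors"; real ones come from the network's penultimate
# layer and have hundreds of dimensions.
db = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.9, 0.1])
ranking = rank_by_cosine(query, db)
```

The database image whose vector ranks first then determines which manual page is returned to the technician.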
Therefore, we formulated the following roadmap that will be executed step-by-step:<\/p>\n<ul>\n<li>Choosing a suitable base ResNet architecture<\/li>\n<li>Specializing the ResNet by retraining on relevant image structures that are expected to occur during the retrieval task<\/li>\n<li>Performing a hyperparameter optimization for an optimal training environment<\/li>\n<\/ul>\n<p>So let&#8217;s have a closer look at these three optimization steps.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Step-One-Choosing-an-architecture\"><\/span>Step One: Choosing an architecture<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The pre-trained ResNet network forms the foundation for our optimization work and should therefore not be chosen at random. The Keras API provides six different pre-trained models for this purpose, namely ResNet50, ResNet101, and ResNet152, in versions 1 and 2 respectively. Simply put, versions 1 and 2 differ in the arrangement of the internal components of a Residual Block \u2013 you can check out <a href=\"https:\/\/youtu.be\/GWt6Fu05voI\">this video<\/a> for more details. ResNet101 and ResNet152 increase their complexity compared to ResNet50 by stacking more Residual Blocks. To find the best-performing base architecture, the different ResNets are benchmarked against each other without prior finetuning.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Step-Two-Retraining-the-model\"><\/span>Step Two: Retraining the model<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Once we have found a suitable candidate for the upcoming optimization work, we are all set to start. So far, the neural network is trained on images of dogs, cats, and the like. To further push the precision of the vectors, it is advisable to retrain our ResNet on important image structures occurring in the target data [2].<\/p>\n<p>In the end, our ResNet will be used to identify captured photographs of Krohne devices. 
Ideally, we would therefore train with a dataset that contains such images. Krohne provided us with a dataset that contains all images appearing in their manuals. However, since we will use this dataset to evaluate our models, we had to find an alternative. So we reached out to open-source datasets and attempted to recreate the composition of images that appeared in the manual.<\/p>\n<p>We started by inspecting the Krohne dataset which is divided into 13 classes, each representing an instrument type. You can view samples of it below, where the images are already sorted according to the three categories into which we divided the dataset: 3D objects, engineering drawings and screenshots.<\/p>\n<figure id=\"attachment_32414\" aria-describedby=\"caption-attachment-32414\" style=\"width: 676px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32414 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_3d.jpg\" alt=\"four pictures of different objects in black and white with different measurement displays\" width=\"676\" height=\"426\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_3d.jpg 676w, https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_3d-300x189.jpg 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_3d-400x252.jpg 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_3d-360x227.jpg 360w\" sizes=\"auto, (max-width: 676px) 100vw, 676px\" \/><figcaption id=\"caption-attachment-32414\" class=\"wp-caption-text\">3D objects (Krohne dataset)<\/figcaption><\/figure>\n<figure id=\"attachment_32416\" aria-describedby=\"caption-attachment-32416\" style=\"width: 666px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32416 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_engdraw.jpg\" alt=\"four engineering drawings \" width=\"666\" height=\"427\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_engdraw.jpg 
666w, https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_engdraw-300x192.jpg 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_engdraw-400x256.jpg 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_engdraw-360x231.jpg 360w\" sizes=\"auto, (max-width: 666px) 100vw, 666px\" \/><figcaption id=\"caption-attachment-32416\" class=\"wp-caption-text\">Engineering drawings (Krohne dataset)<\/figcaption><\/figure>\n<figure id=\"attachment_32418\" aria-describedby=\"caption-attachment-32418\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32418 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_ss.jpg\" alt=\"four screenshots of different data sets\" width=\"810\" height=\"436\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_ss.jpg 810w, https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_ss-300x161.jpg 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_ss-768x412.jpg 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_ss-400x215.jpg 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_ss-360x194.jpg 360w\" sizes=\"auto, (max-width: 810px) 100vw, 810px\" \/><figcaption id=\"caption-attachment-32418\" class=\"wp-caption-text\">Screenshots (Krohne dataset)<\/figcaption><\/figure>\n<p>For each of these categories, we searched for a corresponding dataset online. 
Have a look at snippets of these datasets below.<\/p>\n<p><figure id=\"attachment_32421\" aria-describedby=\"caption-attachment-32421\" style=\"width: 374px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32421 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/princeton_dataset.jpg\" alt=\"four 3d objects: a car, a plant, a surveillance camera and a guitar\" width=\"374\" height=\"376\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/princeton_dataset.jpg 490w, https:\/\/www.inovex.de\/wp-content\/uploads\/princeton_dataset-298x300.jpg 298w, https:\/\/www.inovex.de\/wp-content\/uploads\/princeton_dataset-150x150.jpg 150w, https:\/\/www.inovex.de\/wp-content\/uploads\/princeton_dataset-400x402.jpg 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/princeton_dataset-360x362.jpg 360w\" sizes=\"auto, (max-width: 374px) 100vw, 374px\" \/><figcaption id=\"caption-attachment-32421\" class=\"wp-caption-text\">3D objects (dataset for retraining) [3]<\/figcaption><\/figure><figure id=\"attachment_32423\" aria-describedby=\"caption-attachment-32423\" style=\"width: 372px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32423 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/engdraw_dataset.jpg\" alt=\"four Engineering drawings\" width=\"372\" height=\"373\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/engdraw_dataset.jpg 487w, https:\/\/www.inovex.de\/wp-content\/uploads\/engdraw_dataset-300x300.jpg 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/engdraw_dataset-150x150.jpg 150w, https:\/\/www.inovex.de\/wp-content\/uploads\/engdraw_dataset-400x401.jpg 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/engdraw_dataset-360x361.jpg 360w\" sizes=\"auto, (max-width: 372px) 100vw, 372px\" \/><figcaption id=\"caption-attachment-32423\" class=\"wp-caption-text\">Engineering drawings (dataset for retraining) [4]<\/figcaption><\/figure><\/p>\n<figure 
id=\"attachment_32426\" aria-describedby=\"caption-attachment-32426\" style=\"width: 745px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32426 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/screenshot_dataset.jpg\" alt=\"Screenshots of dataset for retraining\" width=\"745\" height=\"745\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/screenshot_dataset.jpg 1999w, https:\/\/www.inovex.de\/wp-content\/uploads\/screenshot_dataset-300x300.jpg 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/screenshot_dataset-1024x1024.jpg 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/screenshot_dataset-150x150.jpg 150w, https:\/\/www.inovex.de\/wp-content\/uploads\/screenshot_dataset-768x768.jpg 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/screenshot_dataset-1536x1536.jpg 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/screenshot_dataset-1920x1920.jpg 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/screenshot_dataset-400x400.jpg 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/screenshot_dataset-650x650.jpg 650w, https:\/\/www.inovex.de\/wp-content\/uploads\/screenshot_dataset-360x360.jpg 360w\" sizes=\"auto, (max-width: 745px) 100vw, 745px\" \/><figcaption id=\"caption-attachment-32426\" class=\"wp-caption-text\">Screenshots (dataset for retraining)<\/figcaption><\/figure>\n<p>By training with these datasets, we hope to teach the model to compose complex 3D objects and familiarize it with elements from drawings and screenshots. The datasets are not ideal for this purpose \u2013 but for lack of a dataset with images that mimic the later use case, we will work with what we have.<\/p>\n<p>Now it is time to train the ResNet with all three datasets. 
We came up with the following strategies:<\/p>\n<ul>\n<li>Single-dataset strategy: A network is retrained on one selected dataset.<\/li>\n<li>Sequential multi-dataset strategy: A network is subjected to several retraining sessions in consecutive order. Each training targets a different category.<\/li>\n<li>Cross-dataset strategy: All three datasets are combined into one large dataset and the network is retrained on this cross-dataset.<\/li>\n<\/ul>\n<p>For the training implementation, we followed well-established Transfer Learning recommendations. When applying retraining, we first behead the pre-trained model and stitch a new, randomly initialized classification head on top. To avoid destroying the rich underlying features, the layers with pre-initialized weights are frozen. This allows the brand-new head of the network to adapt to the new dataset. Next, the entire network is unfrozen to finetune all weights. We limit the duration of both steps by applying early stopping. After training, the weights of the network are saved for later usage. When training sequentially on multiple datasets, the network subsequently goes through additional training runs using further datasets.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Step-Three-Performing-a-hyperparameter-optimization\"><\/span>Step Three: Performing a hyperparameter optimization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>During retraining, we applied recommended guidelines for hyperparameter selection, which yielded satisfactory convergence behavior. But to provide our model with the best possible training environment, we should tailor the hyperparameters to our use case. The quality of the parameters is measured by the validation accuracy our ResNet can achieve during training.<\/p>\n<p>Let\u2019s keep in mind that our ResNet will not be used for classification later on. 
Rather, underlying feature vectors will be extracted to measure the similarity of the input image to the images in our database. Therefore, a high validation accuracy is not necessarily related to an improved accuracy when performing our CBIR task. Nevertheless, we give hyperparameter optimization a try and ask ourselves: Does a higher validation accuracy correlate with a better CBIR result?<\/p>\n<p>To implement our hyperparameter optimization, we rely on Keras Tuner, a framework that lets us specify which variables and which search space we want to examine. We decided to investigate the following parameters: learning rate, optimizer, and batch size. In addition, we included the choice of the ResNet base architecture and the so-called delimiting layer, which defines the boundary up to which layers will remain frozen when opening the model for weight adjustments. As a tuning technique, we applied the Hyperband algorithm.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"tldr-Strategy-overview\"><\/span>tl;dr: Strategy overview<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>So, let&#8217;s roll all of this up into an overall strategy flow that we are going to follow in the course of this study. Take a look at the flow diagram below. First, we determine the ResNet architecture on which we will base our optimization work (1). We then prepare this ResNet trained on ImageNet data for our use case by retraining it on relevant image structures (2). Our Neural Network, once trained on a very broad spectrum, is now specialized to our selected datasets. 
To find out if we can achieve better results by increasing the validation accuracy of our ResNet, we perform a hyperparameter optimization to boost the effectiveness of our models\u2019 training (3).<\/p>\n<figure id=\"attachment_32384\" aria-describedby=\"caption-attachment-32384\" style=\"width: 540px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32384 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/strategy_flow.png\" alt=\"model graph of the strategy flow with three steps\" width=\"540\" height=\"692\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/strategy_flow.png 1799w, https:\/\/www.inovex.de\/wp-content\/uploads\/strategy_flow-234x300.png 234w, https:\/\/www.inovex.de\/wp-content\/uploads\/strategy_flow-800x1024.png 800w, https:\/\/www.inovex.de\/wp-content\/uploads\/strategy_flow-768x984.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/strategy_flow-1199x1536.png 1199w, https:\/\/www.inovex.de\/wp-content\/uploads\/strategy_flow-1599x2048.png 1599w, https:\/\/www.inovex.de\/wp-content\/uploads\/strategy_flow-400x512.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/strategy_flow-360x461.png 360w\" sizes=\"auto, (max-width: 540px) 100vw, 540px\" \/><figcaption id=\"caption-attachment-32384\" class=\"wp-caption-text\">Strategy Flow<\/figcaption><\/figure>\n<h2><span class=\"ez-toc-section\" id=\"Measuring-the-improvements\"><\/span>Measuring the improvements<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To measure whether the optimization work on our ResNet paid off, we developed an evaluation pipeline that reveals how the performance of our ResNet improved. Multiple retrained networks can be benchmarked against each other. 
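Conceptually, the benchmark reduces to: embed all database images, embed each query image, return the label of the most similar stored vector, and count the hits. A toy NumPy sketch of that idea, with made-up two-dimensional vectors and hypothetical labels (our actual pipeline stores the vectors in Elasticsearch and matches them by cosine similarity):

```python
import numpy as np

def hit_rate(db_vecs, db_labels, query_vecs, query_labels):
    """Fraction of queries whose most similar database vector
    (by cosine similarity) carries the correct label."""
    d = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    sims = q @ d.T                  # (n_queries, n_db) similarity matrix
    best = np.argmax(sims, axis=1)  # most similar database entry per query
    hits = np.array(db_labels)[best] == np.array(query_labels)
    return hits.mean()

# Toy example: two device classes with well-separated feature vectors.
db = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
db_labels = ["flowmeter", "flowmeter", "display"]
queries = np.array([[0.8, 0.2], [0.1, 0.9]])
query_labels = ["flowmeter", "display"]
rate = hit_rate(db, db_labels, queries, query_labels)
```

A higher hit rate means the network's feature vectors separate the device classes more reliably.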
So before presenting our results and takeaways, let me introduce you to our evaluation pipeline.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"How-we-measure-improvements\"><\/span>How we measure improvements<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>For the implementation of our evaluation concept, the dataset provided by Krohne is used. For evaluation, this dataset is divided into two batches (1). The first batch consists of 80 percent of the images. These are used to build a database for each model we want to evaluate. The other batch contains the remaining images and is used to evaluate how reliably the network can extract the matching manual entry.<\/p>\n<figure id=\"attachment_32386\" aria-describedby=\"caption-attachment-32386\" style=\"width: 866px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32386 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/eval_flow.png\" alt=\"depiction of the evaluation pipeline\" width=\"866\" height=\"345\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/eval_flow.png 2779w, https:\/\/www.inovex.de\/wp-content\/uploads\/eval_flow-300x119.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/eval_flow-1024x408.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/eval_flow-768x306.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/eval_flow-1536x611.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/eval_flow-2048x815.png 2048w, https:\/\/www.inovex.de\/wp-content\/uploads\/eval_flow-1920x765.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/eval_flow-400x159.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/eval_flow-360x143.png 360w\" sizes=\"auto, (max-width: 866px) 100vw, 866px\" \/><figcaption id=\"caption-attachment-32386\" class=\"wp-caption-text\">Our evaluation pipeline<\/figcaption><\/figure>\n<p>After splitting, the database entries for each model are set. 
To do so, all images of the corresponding batch are fed into the network, and the vector of the last layer prior to classification is extracted (2) and stored in an Elasticsearch index together with the label assigned to the respective image (3). After creating a separate database for each network, the networks are presented with the images of the second batch. The resulting output vectors are extracted again (4) and matched for cosine similarity with the vectors stored in Elasticsearch (5). The label that is stored alongside the most similar vector is returned. By comparing this label with the label of the input image, the output of the network can be classified as a hit or a miss (6). After processing all images of the evaluation batch, a hit rate is calculated (7). This value serves as an indication of the accuracy of the network, which we use as a comparison value with the other networks tested.<\/p>\n<p>The entire evaluation process is repeated ten times and an average score is calculated. In each run, the images are reshuffled. This increases the repeatability of the experiment and thus ensures that we can compare multiple evaluation runs.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Evaluation-results-and-takeaways\"><\/span>Evaluation results and takeaways<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>It is now time to reveal the results and takeaways we gained during our model optimization. We investigated three different factors and determined how they affect the image retrieval capability of our model: the choice of the base ResNet architecture, the choice of datasets for retraining and how to combine them, as well as the effect of hyperparameter optimization on the evaluation results. Overall, an evaluation result improvement of 3.9% could be achieved, leading to an accuracy of 85.3% during evaluation. In the following, I will summarize our insights. 
All statements refer to our use case and have not been tested for their applicability to other use cases.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"1-ResNets-pre-trained-on-ImageNet-data-are-powerful-feature-extractors\"><\/span>1. ResNets pre-trained on ImageNet data are powerful feature extractors<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>When we applied pre-trained ResNets to our retrieval task without any preparation, they achieved remarkable results. They were able to identify the correct label in up to 82% of all cases.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"2-ResNets-version-2-is-superior\"><\/span>2. ResNet\u2019s version 2 is superior<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>As the chart shows, all ResNetV2s outperformed their predecessors. However, the results of the ResNetV2 networks differ only marginally. Further, our optimization strategies affected all V2 models similarly; accordingly, each V2 model is a suitable choice for our use case.<\/p>\n<figure id=\"attachment_32388\" aria-describedby=\"caption-attachment-32388\" style=\"width: 573px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32388 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_1.png\" alt=\"Evaluation chart of the results of pre-trained ResNets\" width=\"573\" height=\"422\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_1.png 798w, https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_1-300x221.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_1-768x565.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_1-400x294.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_1-360x265.png 360w\" sizes=\"auto, (max-width: 573px) 100vw, 573px\" \/><figcaption id=\"caption-attachment-32388\" class=\"wp-caption-text\">Evaluation result of pre-trained ResNets<\/figcaption><\/figure>\n<h4><span class=\"ez-toc-section\" 
id=\"3-Finetuning-a-pre-trained-network-for-their-upcoming-task-pays-off\"><\/span>3. Finetuning a pre-trained network for its upcoming task pays off<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>We discovered that the accuracy of the network and the similarity between the datasets used for retraining and the target dataset are positively correlated. As evidence, consider the graph where we trained our models with only one of our datasets. While training on datasets that contained structures similar to those in the target data increased our models\u2019 performance, totally unrelated datasets such as a collection of dog pictures worsened their accuracy. Training on the target data itself showed the highest accuracy and outperformed all previous attempts, owing to the maximal similarity between retraining and target data. It should be noted, however, that some images appear twice in the target dataset, which could simplify the image retrieval task for this model during evaluation.<\/p>\n<p>Based on the positive correlation found between the image retrieval capability and the similarity between retraining and target data, the network we deployed in our search engine is retrained on the Krohne dataset.<\/p>\n<figure id=\"attachment_32390\" aria-describedby=\"caption-attachment-32390\" style=\"width: 589px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32390 \" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_2.png\" alt=\"graph of Evaluation results of ResNets \" width=\"589\" height=\"427\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_2.png 771w, https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_2-300x218.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_2-768x557.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_2-400x290.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_2-360x261.png 360w\" sizes=\"auto, (max-width: 589px) 
100vw, 589px\" \/><figcaption id=\"caption-attachment-32390\" class=\"wp-caption-text\">Evaluation result of ResNets finetuned to one category<\/figcaption><\/figure>\n<h4><span class=\"ez-toc-section\" id=\"4-If-no-suitable-dataset-is-available-for-retraining-use-the-cross-dataset-strategy\"><\/span>4. If no suitable dataset is available for retraining, use the cross-dataset strategy<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>We investigated which strategy for training on multiple datasets is superior. Our concern that sequential training would cause previously learned structures to be discarded was confirmed: ResNets trained on all three datasets consecutively showed behavior similar to ResNets trained on the last dataset only.<\/p>\n<p>If no single dataset covers all image categories occurring in the target dataset, several datasets should be collected and shuffled into one. Retraining with such a combined dataset yielded a performance improvement of up to 1.9% compared to training on a single category and of 2.0% compared to a network of the same architecture pre-trained only on ImageNet data, validating takeaway 3.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"5-An-increased-validation-accuracy-does-not-correlate-with-a-better-image-retrieval-capability-but-a-hyperparameter-optimization-might-be-worth-it\"><\/span>5. An increased validation accuracy does not correlate with a better image retrieval capability (but a hyperparameter optimization might be worth it)<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>The graph shows the evaluation results of the models that scored the best validation accuracies during retraining, in descending order. It is easy to see that an increased validation accuracy does not correlate with a better image retrieval ability.<\/p>\n<p>Nevertheless, hyperparameter optimization allowed us to identify models with high image retrieval capabilities. 
The best evaluation result measured in this study (+3.9%) was achieved by a model found during hyperparameter optimization.<\/p>\n<figure id=\"attachment_32404\" aria-describedby=\"caption-attachment-32404\" style=\"width: 613px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-32404 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_3-1.png\" alt=\"graph of Evaluation result of the best-scoring models\" width=\"613\" height=\"418\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_3-1.png 613w, https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_3-1-300x205.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_3-1-400x273.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/resnet_eval_3-1-360x245.png 360w\" sizes=\"auto, (max-width: 613px) 100vw, 613px\" \/><figcaption id=\"caption-attachment-32404\" class=\"wp-caption-text\">Evaluation result of the best-scoring models of hyperparameter optimization<\/figcaption><\/figure>\n<h2><span class=\"ez-toc-section\" id=\"And-how-does-it-perform-in-action\"><\/span>And how does it perform in action?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Since our model will later deal not with images from manuals but with photos of built-in devices, the really important question is: how does our ResNet specialized on Krohne data perform in practice? 
We already deployed the model in our search engine, so let\u2019s answer this by uploading some test images:<i><span style=\"font-weight: 400;\"><br \/>\n<\/span><\/i><\/p>\n<table>\n<tbody>\n<tr>\n<td><span style=\"font-weight: 400;\">Input image<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Retrieved image(s)<\/span><\/td>\n<\/tr>\n<tr>\n<td><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-32394\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_1.png\" alt=\"\" width=\"231\" height=\"154\" \/><\/td>\n<td><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-32396\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_1_1.png\" alt=\"\" width=\"132\" height=\"132\" \/><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-32398\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_1_2.png\" alt=\"\" width=\"146\" height=\"132\" \/><\/td>\n<\/tr>\n<tr>\n<td><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-32400\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_2.png\" alt=\"\" width=\"231\" height=\"152\" \/><\/td>\n<td><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-32406\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_2_1.png\" alt=\"\" width=\"131\" height=\"135\" \/><\/td>\n<\/tr>\n<tr>\n<td><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-32408\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_3.png\" alt=\"\" width=\"207\" height=\"117\" \/><\/td>\n<td><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-32410\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_3_1.png\" alt=\"\" width=\"145\" height=\"119\" \/><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-32412\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/krohne_3_2.png\" alt=\"\" width=\"118\" height=\"94\" 
\/><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>As you can see, devices with no distracting objects in the background (see examples #1 and #2) are recognized by the model. The retrieved images look similar to the captured image. Note, however, that the retrieved device does not necessarily match the exact device shown in the input image. Further, if other objects are present, the results indicate that the model cannot isolate the device of interest within the image (see example #3). Instead, it considers the entire image context \u2013 unfavorable, considering that the technician will later take a picture of a built-in device.<\/p>\n<p>Therefore, we are not yet satisfied with the final result \u2013 however, it is quite obvious what our model is struggling with. To support our model in extracting the object of interest, we developed the following approaches:<\/p>\n<ul>\n<li>Removing irrelevant background details during preprocessing<\/li>\n<li>Retraining the model with captured photos of built-in devices<\/li>\n<\/ul>\n<p>We will pursue these approaches and evaluate whether they can further boost the accuracy of our model. Any updates will be shared on this blog \u2013 stay tuned!<\/p>\n<h2><span class=\"ez-toc-section\" id=\"References\"><\/span>References<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>[1] Ryszard S. Choras. Image feature extraction techniques and their applications for CBIR and biometrics systems. International Journal of Biology and Biomedical Engineering, 1(1):6\u201316, 2007.<\/p>\n<p>[2]\u00a0 Artem Babenko, Anton Slesarev, Alexandr Chigorin, and Victor Lempitsky. Neural codes for image retrieval. In European Conference on Computer Vision, pages 584\u2013599. Springer, 2014.<\/p>\n<p>[3]\u00a0 Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A deep representation for volumetric shapes. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912\u20131920, 2015.<\/p>\n<p>[4]\u00a0 Eyad Elyan, Carlos Moreno-Garc\u00eda, and Pamela Johnston. Symbols in Engineering Drawings (SiED): An Imbalanced Dataset Benchmarked by Convolutional Neural Networks, pages 215\u2013224, May 2020.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this blog post, I will show you how we finetuned and evaluated a ResNet pre-trained on generic ImageNet data to a specific use case. I will share the takeaways we gained during our evaluation and reveal how well our optimized document retrieval works in practice.<\/p>\n","protected":false},"author":255,"featured_media":32950,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[150,151,393,381,570],"service":[76,436],"coauthors":[{"id":255,"display_name":"Marina Siebold","user_nicename":"msiebold"}],"class_list":["post-32704","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-computer-vision","tag-deep-learning","tag-image-retrieval","tag-service-meister","tag-xai","service-artificial-intelligence","service-computer-vision"],"acf":[]}