{"id":20158,"date":"2020-11-13T10:19:37","date_gmt":"2020-11-13T09:19:37","guid":{"rendered":"https:\/\/www.inovex.de\/blog\/?p=20158"},"modified":"2022-12-02T08:33:20","modified_gmt":"2022-12-02T07:33:20","slug":"tensorflow-lite-concepts-architectures","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/tensorflow-lite-concepts-architectures\/","title":{"rendered":"Deep Learning for Mobile Devices with TensorFlow Lite: Concepts and Architectures"},"content":{"rendered":"<p>The number of mobile applications making use of some sort of machine learning is increasing quickly, and so is the number of potential use cases in this area. Whenever you chat with your virtual assistant about upcoming events or convert yourself into an avocado with a Snapchat filter for your social media followers, you employ machine learning. The jungle of tools and frameworks comprises more and more libraries that run machine learning models directly on your mobile device instead of sending the data back and forth to an inference server. One of those libraries is TensorFlow Lite, the successor of TensorFlow Mobile, which you may have read about in our <a href=\"https:\/\/www.inovex.de\/blog\/tensorflow-mobile-training-and-deploying-a-neural-network\/\">older blog post<\/a>. Indeed, this blog post series serves as a revised and extended rewrite of our article about TensorFlow Mobile and is divided into three parts. This first post tackles some of the theoretical background of on-device machine learning, including quantization and state-of-the-art model architectures. The second part deals with quantization-aware model training with the TensorFlow Object Detection API. 
The third part of this series describes how you can convert a model with the TensorFlow Lite model converter and how you can deploy the model using Android Studio.<\/p>\n<p>If you are new to deep learning, I recommend that you first read our blog posts about <a href=\"https:\/\/www.inovex.de\/blog\/deep-learning-fundamentals\/\">Deep Learning Fundamentals<\/a> and the <a href=\"https:\/\/www.inovex.de\/blog\/artificial-neural-networks-concepts-methods\/\">concepts and methods of artificial neural networks<\/a> before you move on with this post.<\/p>\n<p><!--more--><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_79_2 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/tensorflow-lite-concepts-architectures\/#Why-TensorFlow-Lite\" >Why TensorFlow Lite?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/tensorflow-lite-concepts-architectures\/#Basic-concepts-in-mobile-DL\" >Basic concepts in mobile DL<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/tensorflow-lite-concepts-architectures\/#Quantization\" >Quantization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/tensorflow-lite-concepts-architectures\/#State-of-the-art-model-architectures-for-mobile-DL\" >State-of-the-art model architectures for mobile DL<\/a><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-5\" 
href=\"https:\/\/www.inovex.de\/de\/blog\/tensorflow-lite-concepts-architectures\/#MobileNet\" >MobileNet<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/tensorflow-lite-concepts-architectures\/#ShuffleNet\" >ShuffleNet<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/tensorflow-lite-concepts-architectures\/#EfficientNet\" >EfficientNet<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.inovex.de\/de\/blog\/tensorflow-lite-concepts-architectures\/#Once-for-all-Network-OFA\" >Once-for-all Network (OFA)<\/a><\/li><\/ul><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Why-TensorFlow-Lite\"><\/span>Why TensorFlow Lite?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>&nbsp;<\/p>\n<p>TensorFlow Lite is a set of tools within the TensorFlow ecosystem to help developers run TensorFlow models on mobile, embedded, and IoT devices. It enables on-device machine learning inference with low latency and a small binary size. It comprises two main components: the TensorFlow Lite interpreter and the TensorFlow Lite converter. The former is used in the application to run your optimized models on different platforms, including mobile phones, embedded Linux devices and microcontrollers. The TensorFlow Lite converter transforms machine learning models into a format that can be understood by the interpreter. It also applies optimizations during the conversion process to improve the performance and binary size of the model. 
With TensorFlow Lite, machine learning models run directly on the device hardware, which brings some obvious advantages: reduced latency, since there is no network round-trip; increased privacy, because all data can stay on the device; and no need for an internet connection. Since TensorFlow Lite for Microcontrollers is still experimental, we will focus on TensorFlow Lite for mobile phones.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Basic-concepts-in-mobile-DL\"><\/span>Basic concepts in mobile DL<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>&nbsp;<\/p>\n<p>Before we start building our TensorFlow Lite model, we will have a short look at some basic concepts and optimizations applied to machine learning models for use in mobile applications, including weight quantization and state-of-the-art model architectures. The TensorFlow Lite interpreter currently supports a limited subset of TensorFlow operators that have been optimized for on-device use. This means that some models require additional steps to work with TensorFlow Lite. Thus, we will also have a look at the operators compatible with TensorFlow Lite. I will not provide a full list of all compatible operators here, especially since the number of compatible operators increases with every release of TensorFlow Lite. If you want to build your own model and use it in a mobile application, have a look at the <a href=\"https:\/\/www.tensorflow.org\/lite\/guide\/ops_compatibility?hl=de\">documentation on compatible operators<\/a>.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Quantization\"><\/span>Quantization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Machine learning models are everywhere nowadays, and it is crucial to run them efficiently, no matter whether you deploy them in the cloud or on the edge. For cloud deployment, you probably want to reduce latency or restrict inference to CPUs because GPU servers are too expensive. 
Devices at the edge typically have lower computing capabilities and are constrained in memory and power consumption. Therefore, there is a pressing need for techniques to optimize models for reduced model size, faster inference and lower power consumption. One way to achieve this is to quantize the model either during the training process or afterwards.<\/p>\n<p>Quantization refers to the process of reducing the number of bits that represent a number. In the context of deep learning, this means reducing the information stored in each weight. It has been extensively demonstrated that weights and activations can be represented using 8-bit integers without incurring significant loss in accuracy. If you are interested in details on this topic, have a look at some <a href=\"https:\/\/paperswithcode.com\/task\/quantization\">papers with code on the task of quantization<\/a>. Basically, the idea behind quantization is to map floating point numbers from a continuous range onto a fixed set of 8-bit integer values. Consequently, quantization is lossy by nature. For example, all floating point numbers between 1.5 and 1.8 might be mapped to the same integer representation, resulting in loss of precision. To mitigate this, TensorFlow also provides quantization-aware model training, which ensures that the forward pass computes the same result for training and inference, regardless of whether the weights are stored as floating point numbers or integers. This is achieved by adding fake quantization nodes to the graph in order to simulate the effect of quantization during forward and backward passes. These fake quantization nodes determine minimum and maximum values for activations during training. In other words, the fake quantization nodes are required to gather dynamic range information as a calibration for the quantization operation. Using this technique, a quantization loss is added to the optimizer as part of the overall loss. 
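The float-to-integer mapping described above can be sketched in a few lines of NumPy. This is a minimal affine quantization scheme with a scale and a zero-point; the details are my own illustrative choices, not the exact TensorFlow Lite implementation:

```python
import numpy as np

def quantize(x, num_bits=8):
    # Map floats from [x.min(), x.max()] onto the signed integer range, e.g. [-128, 127].
    qmin, qmax = -2 ** (num_bits - 1), 2 ** (num_bits - 1) - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = np.round(qmin - x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float values from the integer representation.
    return scale * (q.astype(np.float32) - zero_point)

np.random.seed(0)
weights = np.random.uniform(-3.0, 3.0, size=1000).astype(np.float32)
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
# Quantization is lossy: all floats inside one bucket collapse to the same integer,
# so the round-trip error stays below one bucket width (the scale).
max_err = np.abs(weights - restored).max()
```

Every float within one bucket of width `scale` maps to the same int8 value, which is exactly the loss of precision described above.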
As a result, the model tries to learn parameters that are less prone to quantization errors. I created a <a href=\"https:\/\/colab.research.google.com\/github\/inovex\/notebooks\/blob\/main\/quantization_aware_training_mobilenetv2_cars196.ipynb\">colab notebook<\/a> demonstrating quantization-aware training of an image classification model. If you want to get an in-depth look at quantization-aware training in TensorFlow, I recommend reading the Google paper <a href=\"https:\/\/arxiv.org\/pdf\/1712.05877.pdf\">Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference<\/a>.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"State-of-the-art-model-architectures-for-mobile-DL\"><\/span>State-of-the-art model architectures for mobile DL<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Besides the optimizations that are applied to trained models in order to speed up inference, there are plenty of different model architectures that enable efficient CNN computation. At this point I have to mention that I am a computer vision guy, so this list reflects my personal affection for vision models.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"MobileNet\"><\/span>MobileNet<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>Probably the best-known neural network architecture for mobile deep learning is MobileNet, whose <a href=\"https:\/\/arxiv.org\/abs\/1905.02244\">latest version can be found here<\/a>. MobileNet uses depthwise separable convolutions, which factorize a standard convolution into a depthwise convolution and a 1 \u00d7 1 convolution. Specifically, the depthwise convolution applies a single filter to each input channel and the 1 \u00d7 1 convolution is used to combine the outputs of the former. Generally, it needs to be mentioned that not every kind of convolution is separable. Indeed, if we restrict ourselves to separable convolutions, we actually restrict the capabilities of our model. 
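To make the trade-off concrete, we can compare the multiply-accumulate cost of a standard convolution with its depthwise separable counterpart; this is plain arithmetic with illustrative layer sizes, no framework required:

```python
def conv_cost(k, c_in, c_out, h, w):
    # Multiply-accumulates of a standard k x k convolution over an h x w feature map.
    return k * k * c_in * c_out * h * w

def sep_conv_cost(k, c_in, c_out, h, w):
    # Depthwise step (one k x k filter per input channel) plus the 1 x 1 combination.
    return k * k * c_in * h * w + c_in * c_out * h * w

standard = conv_cost(3, 256, 256, 56, 56)
separable = sep_conv_cost(3, 256, 256, 56, 56)
ratio = standard / separable
# With 3 x 3 kernels the reduction approaches a factor of 9 as the
# number of output channels grows.
```

For 256 output channels the ratio already lands between 8 and 9, matching the savings the MobileNet authors report.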
On the other hand, separating convolutions drastically reduces the number of parameters and speeds up computation. The authors of MobileNet mention that by using 3 \u00d7 3 depthwise separable convolutions, they were able to reduce the computational cost by a factor of 8 to 9 compared to standard convolutions. The following graphic illustrates the depthwise separable convolution in two steps.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-20159 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/11\/depthsepconv.png\" alt=\"\" width=\"623\" height=\"339\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/11\/depthsepconv.png 623w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/11\/depthsepconv-300x163.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/11\/depthsepconv-400x218.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/11\/depthsepconv-360x196.png 360w\" sizes=\"auto, (max-width: 623px) 100vw, 623px\" \/><\/p>\n<p>First, in step a), the depthwise convolution is applied to the input image of dimension w \u00d7 h \u00d7 c, where w is the width, h the height, and c the number of input channels. The kernel has different spatial dimensions, for example 3 \u00d7 3, but only one channel. Therefore, we need c filters to retain the same shape for the output. Second, in step b), the newly generated feature maps are combined by a 1 \u00d7 1 \u00d7 c filter to create a new feature map of shape w\u2019 \u00d7 h\u2019 \u00d7 1, which is the result of our depthwise separable convolution. Of course, the output shape can be altered by using multiple kernels in step b), e.g. if you use n kernels, the output shape will be w\u2019 \u00d7 h\u2019 \u00d7 n.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"ShuffleNet\"><\/span>ShuffleNet<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>Another recent efficient CNN model is <a href=\"https:\/\/arxiv.org\/pdf\/1707.01083.pdf\">ShuffleNet<\/a>. 
It combines the previously described depthwise separable convolutions with group convolutions and adds a concept called channel shuffling. Group convolutions were first introduced in <a href=\"https:\/\/papers.nips.cc\/paper\/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf\">AlexNet<\/a> to distribute the training to two GPUs. Unlike a conventional convolution, a group convolution is not applied to all input channels. Instead, each convolution is applied to a subset of the input channels. If we have G groups, each group g is applied to Cin \/ G channels. Now, the problem is that we have to combine the outputs generated by the grouped convolutions, since if we don\u2019t, our model will not learn any cross-group information. This would harm the overall performance of the network. ShuffleNet uses the channel shuffle operation to achieve exactly this. The output of one convolutional layer with G groups results in a G \u00d7 n shaped feature map, where n is the number of channels per group. Before applying the next convolutions, this tensor is transposed and flattened, thus resulting in new subgroups being fed to the next layer. The graphic below visualizes this concept.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-20160 size-full aligncenter\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/11\/channel_shuffle.png\" alt=\"\" width=\"392\" height=\"310\" \/><\/p>\n<p>In the graphic, we have two stacked group convolutions (GConv) with three groups each, where each group consists of multiple channels. On the left, the second group convolution is applied to the output of the first group convolution and has the same number of groups. As a result, the convolutions are applied to the exact same group as in GConv1, preventing information exchange across different groups. On the right, however, we apply the channel shuffle prior to the second group convolution. 
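The transpose-and-flatten trick described above is a one-liner in NumPy. In this sketch each number stands for a channel index, with G = 3 groups of n = 4 channels (the sizes are illustrative):

```python
import numpy as np

def channel_shuffle(channels, groups):
    # Reshape to (G, n), transpose to (n, G) and flatten,
    # which interleaves channels from all groups.
    n = len(channels) // groups
    return np.asarray(channels).reshape(groups, n).T.flatten()

channels = np.arange(12)   # channels 0-3 in group 0, 4-7 in group 1, 8-11 in group 2
shuffled = channel_shuffle(channels, groups=3)
# -> [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

After the shuffle, every consecutive block of channels mixes contributions from all three groups, so the next group convolution sees cross-group information.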
By mixing the output channels of the first group convolution, information from each group is spread across all groups for the second group convolution.<\/p>\n<p>&nbsp;<\/p>\n<p>In combination with the grouped depthwise separable convolution, this forms a ShuffleNet Unit, which is displayed in the following graphic.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-20161 size-full aligncenter\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/11\/ShuffleNet.png\" alt=\"\" width=\"722\" height=\"221\" \/><\/p>\n<p>You can see that the architecture is inspired by <a href=\"https:\/\/arxiv.org\/abs\/1512.03385\">ResNet<\/a>, which introduced this bottleneck unit style with residual connections.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"EfficientNet\"><\/span>EfficientNet<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>A recent branch of efficient CNN architectures is the <a href=\"https:\/\/arxiv.org\/pdf\/1905.11946.pdf\">EfficientNet<\/a> model family, which takes a completely different approach from the previously mentioned models. In its accompanying paper \u201cEfficientNet: Rethinking Model Scaling for Convolutional Neural Networks\u201c, the authors provide an insightful analysis of different model-scaling approaches, including width and depth scaling, as well as image resolution scaling with respect to model performance. They introduce a new scaling strategy called compound scaling, which scales all three of the aforementioned dimensions in relation to each other by a user-specified coefficient. The intuition behind this strategy can be found in the following quote from the paper:<\/p>\n<blockquote><p>\u201cWe empirically observe that different scaling dimensions are not independent. Intuitively, for higher resolution images, we should increase network depth, such that the larger receptive fields can help capture similar features that include more pixels in bigger images. 
Correspondingly, we should also increase network width when resolution is higher, in order to capture more fine-grained patterns with more pixels in high resolution images.\u201c<\/p><\/blockquote>\n<p>The compound scaling introduces four new hyperparameters, where each scaling dimension is associated with one hyperparameter and the fourth works as the compound coefficient to uniformly scale the three dimensions. The relationship of these parameters is shown in the following equations, where \u03d5 denotes the compound scaling factor, while \u03b1, \u03b2 and \u03b3 specify how to assign the scaling to the network depth, width and resolution respectively.<\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: center;\"><span style=\"font-weight: 400;\">depth: d = \u03b1<sup>\u03d5<\/sup><\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-weight: 400;\">width: w = \u03b2<sup>\u03d5<\/sup><\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-weight: 400;\">resolution: r = \u03b3<sup>\u03d5<\/sup><\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-weight: 400;\">s.t. \u03b1 \u00b7 \u03b2\u00b2 \u00b7 \u03b3\u00b2 \u2248 2<\/span><\/p>\n<p style=\"text-align: center;\"><span style=\"font-weight: 400;\">\u03b1 \u2265 1, \u03b2 \u2265 1, \u03b3 \u2265 1<\/span><\/p>\n<p>&nbsp;<\/p>\n<p>As a proof of concept, the authors evaluate their scaling strategy by applying it to the state-of-the-art CNN architectures MobileNet and ResNet. They complement their work by defining a new baseline architecture generated via neural architecture search. 
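Plugging in the coefficients reported in the EfficientNet paper (alpha = 1.2, beta = 1.1, gamma = 1.15, found via grid search on the baseline network), the scaling rule can be evaluated directly:

```python
alpha, beta, gamma = 1.2, 1.1, 1.15   # coefficients from the EfficientNet paper

# The constraint alpha * beta^2 * gamma^2 ~ 2 ensures that increasing the
# compound coefficient phi by one roughly doubles the total FLOPS.
constraint = alpha * beta ** 2 * gamma ** 2

def compound_scale(phi):
    # Scale depth, width and resolution together via the compound coefficient phi.
    return alpha ** phi, beta ** phi, gamma ** phi

depth, width, resolution = compound_scale(phi=2)
```

A larger model in the family is thus obtained by picking a larger phi while the three per-dimension coefficients stay fixed.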
This baseline model, called EfficientNet-B0, is then used to incrementally build a whole model family by first applying a grid search to the parameters \u03b1, \u03b2 and \u03b3 on the baseline network and later scaling the network up by altering the parameter \u03d5 with fixed \u03b1, \u03b2 and \u03b3. The authors show that EfficientNet outperforms the scaled versions of MobileNet and ResNet and reaches state-of-the-art results on different datasets while reducing the number of parameters and FLOPS compared to other recent ConvNet architectures.<\/p>\n<p>At the time of this writing, we can find pretrained MobileNets and EfficientNets on <a href=\"https:\/\/tfhub.dev\/\">TensorFlow Hub<\/a> as ready-to-use modules.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"Once-for-all-Network-OFA\"><\/span>Once-for-all Network (OFA)<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>A very recent approach to providing neural networks with efficient inference and competitive performance on limited hardware was presented at the International Conference on Learning Representations (ICLR) 2020 in the paper <a href=\"https:\/\/arxiv.org\/abs\/1908.09791\">\u201cOnce-for-All: Train One Network and Specialize it for Efficient Deployment\u201c<\/a>. The authors describe a new approach to Neural Architecture Search (NAS), separating model training from the actual architecture search. This decomposition results in lower design cost (measured in GPU hours per training) and thus a smaller CO2 footprint than previous NAS techniques. In the section about EfficientNet above, you already learned about another NAS approach to designing efficient neural networks that fit different hardware requirements by incorporating hardware feedback in the architecture search. However, given new inference hardware platforms, these methods need to repeat the architecture search process and retrain the model, resulting in rising GPU hours and cost. OFA takes a different approach. 
The main idea behind OFA is to jointly train a large network with many different sub-networks and later distill specific sub-networks for specific tasks.<\/p>\n<p>Similar to EfficientNet, the architecture space of OFA comprises different layer configurations with respect to an arbitrary number of layers, channels, kernel sizes and input sizes (in the paper denoted as elastic depth, width, kernel size and resolution, respectively). However, the training approach differs from the compound scaling introduced with EfficientNet. OFA networks are trained using progressive shrinking, a scheme that jointly trains many sub-networks of different sizes while preventing interference between them. This is achieved by enforcing a training order from large sub-networks to small sub-networks for the elastic depth, width and kernel size dimensions, while keeping the resolution elastic throughout the whole training process by sampling different image sizes for each batch. As a result, smaller sub-networks share weights with larger ones. Let\u2019s have a look at what this means for each of the scaled model dimensions:<\/p>\n<ul>\n<li>Elastic depth: To derive a small sub-network with only D layers from a larger network with N layers, OFA keeps the first D layers and skips the last N &#8211; D layers. Consequently, the weights of the first D layers are shared between small and large sub-networks. This is illustrated in the figure from the OFA paper below.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-20162 size-fusion-600\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/11\/ofa_elastic_depth-600x201.png\" alt=\"\" width=\"600\" height=\"201\" \/><\/p>\n<ul>\n<li>Elastic width: Width corresponds to the number of channels per layer. The number of channels is reduced analogously to the number of layers. However, the channels are first sorted according to their importance with respect to the learning objective. 
The importance of a channel is measured as the L1 norm of its weights, where a larger L1 norm means the channel is more important. This is illustrated in the figure from the OFA paper below.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-20164 size-fusion-600\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/11\/ofa_elastic_width-600x142.png\" alt=\"\" width=\"600\" height=\"142\" \/><\/p>\n<ul>\n<li>Elastic kernel size: OFA supports kernel sizes of 7&#215;7, 5&#215;5 and 3&#215;3. The smaller kernels are nested in the center of the larger kernel, sharing the same center position. However, a problem with this approach is that the smaller sub-kernels may need to serve in different roles with different distributions or magnitudes. Kernel transformation matrices are introduced to compensate for this issue. This is illustrated in the figure from the OFA paper below.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-20163 size-fusion-600\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/11\/ofa_elastic_kernel-600x234.png\" alt=\"\" width=\"600\" height=\"234\" \/><\/p>\n<p>Once the training is finished, we can derive a specialized sub-network that fits our deployment scenario. A deployment scenario comprises a combination of efficiency constraints, like latency or energy consumption, that must be met while maintaining a high accuracy. As mentioned earlier, OFA decouples model training from architecture search. This is achieved by building neural-network-twins to predict latency and accuracy given a neural network architecture. Specifically, an accuracy predictor is trained on 16K randomly sampled sub-networks, where each sub-network is evaluated on 10K validation samples (in the paper this corresponds to ImageNet validation images). 
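Stepping back to the elastic width bullet above: the L1-based channel ranking can be sketched in NumPy. The kernel shape and the shrink factor here are illustrative, not taken from the OFA implementation:

```python
import numpy as np

def shrink_width(weights, keep):
    # Rank output channels by the L1 norm of their weights and keep the
    # `keep` most important ones, so the small layer reuses their weights.
    # weights: (out_channels, in_channels, k, k) convolution kernel.
    importance = np.abs(weights).sum(axis=(1, 2, 3))   # L1 norm per output channel
    order = np.argsort(-importance)                    # descending importance
    return weights[order[:keep]]

np.random.seed(42)
kernel = np.random.randn(8, 4, 3, 3)
small = shrink_width(kernel, keep=4)
# The shrunken layer consists of the 4 channels with the largest L1 norms.
```

Because the kept channels are a subset of the full layer's weights, the small and large sub-networks share parameters exactly as progressive shrinking requires.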
The accuracy predictor is a three-layer feedforward neural network, which receives each layer of a sub-network and the input resolution encoded as one-hot vectors with zero vectors for layers that are skipped. Alongside this accuracy predictor, a latency lookup table is used to derive the latency of a model on each target hardware platform. Given the target hardware constraint, an evolutionary NAS is performed on the neural-network-twins to derive the optimal sub-network.<\/p>\n<p>The authors provide a PyTorch implementation of their approach on <a href=\"https:\/\/github.com\/mit-han-lab\/once-for-all\">GitHub<\/a>, as well as a nice <a href=\"https:\/\/colab.research.google.com\/github\/mit-han-lab\/once-for-all\/blob\/master\/tutorial\/ofa.ipynb\">Colab Notebook tutorial<\/a> that demonstrates OFA on ImageNet. Unfortunately, at the time of this writing, there is no TensorFlow implementation of OFA.<\/p>\n<p>That is it for the theory for now. In the next part of this series, we will get our hands dirty and start with our model training using quantization-aware training with the TensorFlow Object Detection API. Stay tuned!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The number of mobile applications making use of some sort of machine learning is increasing quickly, and so is the number of potential use cases in this area. 
Whenever you chat with your virtual assistant about upcoming events or convert yourself into an avocado with a snapchat filter for your social media followers, you employ machine [&hellip;]<\/p>\n","protected":false},"author":137,"featured_media":20178,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[510,151,140],"service":[420,76],"coauthors":[{"id":137,"display_name":"Robin Baumann","user_nicename":"rbaumann"}],"class_list":["post-20158","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-apps-2","tag-deep-learning","tag-machine-learning","service-apps","service-artificial-intelligence"],"acf":[]}
-->"}