Smart assistants, fancy image filters in Snapchat, and apps like Prisma all have one thing in common: they are powered by Machine Learning. The use of Machine Learning in mobile apps is growing, and new mobile apps are being built with Machine Learning based services at the heart of their business models. In this blog series we want to give you hands-on advice on how to train a convolutional neural network for image classification and deploy it to a mobile app using TensorFlow Mobile, the mobile variant of the popular machine learning framework TensorFlow.
Our task will be to classify images of houseplants which we have collected ourselves. You don’t have to go and snap pictures of plants yourself, because our approach is generic and can be used to train and deploy a convolutional neural network for image classification regardless of the subject of the images. If you’d like to stick with houseplants, however, we have written an image crawler to save you the manual labor. You’ll find the instructions here.
For your convenience we have published a repository containing all necessary files and source code used in this tutorial.
As a concrete implementation of a convolutional neural network we’ll use one of the MobileNets, a class of efficient convolutional neural networks for mobile and embedded vision applications. These are already implemented in one of the high-level APIs of TensorFlow which is called TF-Slim. You can find the TF-Slim models in the model repository of TensorFlow. In this blog series we will use TF-Slim for the training of the MobileNet.
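To give you an idea of what TF-Slim provides, the following sketch shows roughly how the MobileNet is instantiated through the nets package of the model repository; treat it as an illustration of the interface rather than code you need to run for this tutorial:

import tensorflow as tf

from nets import mobilenet_v1  # lives in models/research/slim/nets

# A single function call builds the whole convolutional network
# for a batch of 224x224 RGB images with 26 output classes.
images = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
logits, end_points = mobilenet_v1.mobilenet_v1(
    images, num_classes=26, is_training=True)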
For the deployment of neural networks to a mobile device there are currently two solutions:
- TensorFlow Mobile: TensorFlow was designed from the ground up to be a good deep learning solution for mobile platforms such as Android and iOS. TensorFlow Mobile is the mobile version of the framework which you can use in your mobile apps. It also comes with multiple guides and scripts for deploying a model into a mobile app.
- TensorFlow Lite: This is an evolution of TensorFlow Mobile. In most cases, apps developed with TensorFlow Lite will have a smaller binary size, fewer dependencies, and better performance. At the time of writing, however, TensorFlow Lite is in developer preview: not all use cases are covered yet, and it only supports a limited set of operators, so not every model will work with it out of the box.
In this blog series we will use TensorFlow Mobile because TensorFlow Lite is in developer preview and TensorFlow Mobile has a greater feature set. As mentioned before, we will use images of houseplants as our dataset. In total there are 9364 images across 26 classes available.
The Setup
First off we need to install TensorFlow. The easiest way is to follow the official installation guide, which covers different platforms and operating systems. Just use the current version of TensorFlow.
To get started we clone the TensorFlow model repository: git clone https://github.com/tensorflow/models.git and switch to the TF-Slim models directory with cd models/research/slim. There we find the following tree structure:
.
├── BUILD
├── README.md
├── WORKSPACE
├── __init__.py
├── datasets
├── deployment
├── download_and_convert_data.py
├── eval_image_classifier.py
├── export_inference_graph.py
├── export_inference_graph_test.py
├── nets
├── preprocessing
├── scripts
├── setup.py
├── slim_walkthrough.ipynb
└── train_image_classifier.py
Next we need to define our dataset by creating a Python dataset description in the datasets directory, alongside the existing dataset descriptions. With cd datasets we switch to the directory, and with touch hp_plants.py we create the required Python file, to which we add the following code:
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Provides data for the Houseplants dataset."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import tensorflow as tf

from datasets import dataset_utils

slim = tf.contrib.slim

# DATASET-VARIABLE: TFRecord file pattern
_FILE_PATTERN = 'hp_plants_%s_*.tfrecord'

# DATASET-VARIABLE: splits the dataset into 80 % for training and 20 % for evaluation
SPLITS_TO_SIZES = {'train': 7532, 'validation': 1883}

# DATASET-VARIABLE: num classes of the houseplants dataset
_NUM_CLASSES = 26

_ITEMS_TO_DESCRIPTIONS = {
    'image': 'A color image.',
    'label': 'A single integer between 0 and 26',
}


def get_split(split_name, dataset_dir, file_pattern=None, reader=None):
  """Gets a dataset tuple with instructions for reading the houseplants dataset.

  Args:
    split_name: A train/test split name.
    dataset_dir: The base directory of the dataset sources.
    file_pattern: The file pattern to use when matching the dataset sources.
      It is assumed that the pattern contains a '%s' string so that the split
      name can be inserted.
    reader: The TensorFlow reader type.

  Returns:
    A `Dataset` namedtuple.

  Raises:
    ValueError: if `split_name` is not a valid train/test split.
  """
  if split_name not in SPLITS_TO_SIZES:
    raise ValueError('split name %s was not recognized.' % split_name)

  if not file_pattern:
    file_pattern = _FILE_PATTERN
  file_pattern = os.path.join(dataset_dir, file_pattern % split_name)

  # Allowing None in the signature so that dataset_factory can use the default.
  if not reader:
    reader = tf.TFRecordReader

  keys_to_features = {
      'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
      'image/format': tf.FixedLenFeature((), tf.string, default_value='jpg'),
      'image/class/label': tf.FixedLenFeature(
          [], tf.int64, default_value=tf.zeros([], dtype=tf.int64)),
  }

  items_to_handlers = {
      'image': slim.tfexample_decoder.Image(),
      'label': slim.tfexample_decoder.Tensor('image/class/label'),
  }

  decoder = slim.tfexample_decoder.TFExampleDecoder(
      keys_to_features, items_to_handlers)

  labels_to_names = None
  if dataset_utils.has_labels(dataset_dir):
    labels_to_names = dataset_utils.read_label_file(dataset_dir)

  return slim.dataset.Dataset(
      data_sources=file_pattern,
      reader=reader,
      decoder=decoder,
      num_samples=SPLITS_TO_SIZES[split_name],
      items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
      num_classes=_NUM_CLASSES,
      labels_to_names=labels_to_names)
You can use any dataset you want; you just have to change the name of the Python file and adjust the dataset variables inside it. The dataset variables are marked in the code above. Next we need to register our dataset in the dataset factory: open dataset_factory.py in the datasets folder and add the dataset to the datasets_map.
datasets_map = {
    'cifar10': cifar10,
    'flowers': flowers,
    'imagenet': imagenet,
    'mnist': mnist,
    # added hp_plants as dataset
    'hp_plants': hp_plants,
}
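Note that dataset_factory.py also has to import the new module at the top of the file, next to the existing dataset imports (the exact import block may differ slightly between versions of the model repository):

# at the top of datasets/dataset_factory.py
from datasets import cifar10
from datasets import flowers
from datasets import hp_plants  # our new houseplants dataset description
from datasets import imagenet
from datasets import mnist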
To convert our image data to an appropriate binary file format (TFRecord) we use a script provided by Kwotsin in a GitHub repository. Clone the repository and copy create_tfrecord.py and dataset_utils.py to the slim folder. To create the TFRecord files, run the following command in your terminal:
python create_tfrecord.py \
    --dataset_dir=../../../hp_dataset \
    --tfrecord_filename=hp_plants \
    --validation_size=0.2
With the dataset_dir parameter we define where our dataset is stored, and with the tfrecord_filename parameter we define the naming pattern of the TFRecord files. This pattern must match the pattern we defined in our dataset description and in the dataset factory. With the last parameter, validation_size, we define what fraction of the dataset should be held out for validation. This step creates TFRecord files for training and validation: with the current setting, 80 % of the data is used for training and 20 % for validation. The results can be viewed in the houseplants dataset folder.
.
├── hp_plants_train_00000-of-00002.tfrecord
├── hp_plants_train_00001-of-00002.tfrecord
├── hp_plants_validation_00000-of-00002.tfrecord
├── hp_plants_validation_00001-of-00002.tfrecord
├── images
└── labels.txt
The script has created two TFRecord files for training and two for validation. It has also created a labels file, which contains the 26 class names of the dataset. The images folder contains the houseplant images organized by class: for every class there is a folder named after the class, containing the images of that class.
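If you want to make sure the conversion actually wrote data into the TFRecord files, you can count the records in one of the files with a few lines of Python (the file name here simply matches the output above):

import tensorflow as tf

# Count the serialized examples in one of the generated TFRecord files.
# An empty (0-byte) file would report 0 examples here.
path = '../../../hp_dataset/hp_plants_train_00000-of-00002.tfrecord'
count = sum(1 for _ in tf.python_io.tf_record_iterator(path))
print('%s contains %d examples' % (path, count))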
Training
So far we have done general preparation and pre-processing. In the next steps we will set up our training using Transfer Learning. In practice an entire convolutional neural network is rarely trained from scratch, because it is rare to have a dataset of sufficient size. With Transfer Learning, however, we can train a convolutional neural network on a small dataset, because we start from pre-trained weights and only have to fine-tune them on our data.
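Under the hood, this amounts to restoring all pre-trained variables except those of the final classification layer, which are re-initialized and trained from scratch. The training script below handles this for us via a flag, but a minimal sketch of the idea in TF-Slim could look like this (it assumes the MobileNet graph has already been built and that the checkpoint, downloaded in the next step, is in place):

import tensorflow as tf

slim = tf.contrib.slim

# Restore every pre-trained variable except the final logits layer,
# whose shape does not match our 26 houseplant classes.
variables_to_restore = slim.get_variables_to_restore(
    exclude=['MobilenetV1/Logits'])
restorer = tf.train.Saver(variables_to_restore)

with tf.Session() as sess:
    # All variables first get fresh initial values ...
    sess.run(tf.global_variables_initializer())
    # ... then the pre-trained weights overwrite everything except the logits.
    restorer.restore(
        sess, './mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.ckpt')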
The model repository of TensorFlow offers pre-trained weights for several different convolutional neural networks trained on ImageNet data. As mentioned above, we are using a MobileNet in this blog series, so we have to download its pre-trained weights. They are listed in the MobileNet v1 description, where we download MobileNet_v1_1.0_224. Copy the downloaded .tgz file to the slim folder, create a subfolder named mobilenet_v1_1.0_224 and extract the archive into it with tar xf mobilenet_v1_1.0_224.tgz -C ./mobilenet_v1_1.0_224. The subfolder then contains multiple files.
.
├── mobilenet_v1_1.0_224.ckpt.data-00000-of-00001
├── mobilenet_v1_1.0_224.ckpt.index
├── mobilenet_v1_1.0_224.ckpt.meta
├── mobilenet_v1_1.0_224.tflite
├── mobilenet_v1_1.0_224_eval.pbtxt
├── mobilenet_v1_1.0_224_frozen.pb
└── mobilenet_v1_1.0_224_info.txt
The weights for transfer learning are stored in the .ckpt files.
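If you are curious what exactly is stored in such a checkpoint, you can list its variables and their shapes with a few lines of Python; this is purely for inspection and not required for the rest of the tutorial:

import tensorflow as tf

# Point the reader at the checkpoint prefix (without the .data/.index suffix).
reader = tf.train.NewCheckpointReader(
    './mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.ckpt')

# Print every variable stored in the checkpoint together with its shape.
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)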
TensorFlow uses a dataflow graph to represent computations in terms of the dependencies between individual operations. Dataflow is a common programming model for parallel computing in which the nodes represent units of computation and the edges represent the data consumed or produced; this also applies to neural networks in TensorFlow. In the subfolder we also have a complete graph of the MobileNet, stored in the provided .pb file. We will need such a graph later to provision our model for mobile.
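To make the idea of a dataflow graph more concrete, here is a tiny, unrelated example: first the graph of operations is built, and only when the session runs it are the actual values computed and passed along the edges.

import tensorflow as tf

# Three nodes: two constants and a multiplication. The edges between them
# carry the tensor values once the graph is executed.
a = tf.constant(2.0, name='a')
b = tf.constant(3.0, name='b')
c = tf.multiply(a, b, name='c')

with tf.Session() as sess:
    print(sess.run(c))  # 6.0 -- the computation happens here, not above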
To start our training we need to run train_image_classifier.py with some arguments. We recommend training on a GPU to speed up the process considerably. Run this command in your terminal to start training on the GPU:
python train_image_classifier.py \
    --train_dir=./train_dir \
    --dataset_dir=../../../hp_dataset \
    --dataset_name=hp_plants \
    --dataset_split_name=train \
    --model_name=mobilenet_v1 \
    --train_image_size=224 \
    --checkpoint_path=./mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.ckpt \
    --max_number_of_steps=30000 \
    --checkpoint_exclude_scopes=MobilenetV1/Logits
If you don’t have a GPU available, use the following command instead. With the argument clone_on_cpu=True all computations will be executed on the CPU.
python train_image_classifier.py \
    --train_dir=./train_dir \
    --dataset_dir=../../../hp_dataset \
    --dataset_name=hp_plants \
    --dataset_split_name=train \
    --model_name=mobilenet_v1 \
    --train_image_size=224 \
    --checkpoint_path=./mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.ckpt \
    --max_number_of_steps=30000 \
    --clone_on_cpu=True \
    --checkpoint_exclude_scopes=MobilenetV1/Logits
With the argument train_dir we specify where the training output (checkpoints and event logs) will be written, and with dataset_dir we point to the TFRecord files we created beforehand. The argument dataset_name selects the dataset we want to use for training; here we choose the houseplants dataset previously specified in the dataset description. The dataset_split_name argument specifies which TFRecord files are used for training, so we select the train split from above. With the argument model_name we specify which model we want to train; here we choose the MobileNet and set the input image size to 224x224x3 with the argument train_image_size. With the argument checkpoint_path we refer to the downloaded checkpoint of the MobileNet, which enables Transfer Learning as the training method. The argument max_number_of_steps defines the number of training steps, which we set to 30,000.
Finally, with checkpoint_exclude_scopes we need to exclude some weights of the checkpoint, because the model was pre-trained on the ImageNet dataset. The weights of the final fully connected layer, which performs the actual classification, were trained for ImageNet’s 1000 classes, whereas our houseplants dataset has 26 classes. So we cannot reuse the pre-trained weights for this layer and have to train it from scratch. During training you can watch the loss converge, with output looking like this:
INFO:tensorflow:global step 29790: loss = 0.3194 (0.236 sec/step)
INFO:tensorflow:global step 29800: loss = 0.1820 (0.175 sec/step)
INFO:tensorflow:global step 29810: loss = 0.1972 (0.230 sec/step)
INFO:tensorflow:global step 29820: loss = 0.2426 (0.232 sec/step)
INFO:tensorflow:global step 29830: loss = 0.2625 (0.241 sec/step)
INFO:tensorflow:global step 29840: loss = 0.1558 (0.188 sec/step)
INFO:tensorflow:global step 29850: loss = 0.1601 (0.230 sec/step)
INFO:tensorflow:global step 29860: loss = 0.2257 (0.245 sec/step)
INFO:tensorflow:global step 29870: loss = 0.3663 (0.269 sec/step)
INFO:tensorflow:global step 29880: loss = 0.1686 (0.198 sec/step)
INFO:tensorflow:global step 29890: loss = 0.3222 (0.216 sec/step)
INFO:tensorflow:global step 29900: loss = 0.2520 (0.217 sec/step)
INFO:tensorflow:global step 29910: loss = 0.3735 (0.243 sec/step)
INFO:tensorflow:global step 29920: loss = 0.2633 (0.204 sec/step)
INFO:tensorflow:global step 29930: loss = 0.2714 (0.185 sec/step)
INFO:tensorflow:global step 29940: loss = 0.3153 (0.194 sec/step)
INFO:tensorflow:global step 29950: loss = 0.1891 (0.215 sec/step)
INFO:tensorflow:global step 29960: loss = 0.2570 (0.197 sec/step)
INFO:tensorflow:global step 29970: loss = 0.1911 (0.203 sec/step)
INFO:tensorflow:global step 29980: loss = 0.1798 (0.222 sec/step)
INFO:tensorflow:global step 29990: loss = 0.1881 (0.218 sec/step)
INFO:tensorflow:global step 30000: loss = 0.1761 (0.226 sec/step)
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
The model will automatically be saved in the specified train directory. After that we need to evaluate the trained model using the provided Python script eval_image_classifier.py. Run the script with the following command:
python eval_image_classifier.py \
    --alsologtostderr \
    --checkpoint_path=./train_dir/model.ckpt-30000 \
    --dataset_dir=../../../hp_dataset \
    --dataset_name=hp_plants \
    --dataset_split_name=validation \
    --model_name=mobilenet_v1 \
    --eval_image_size=224
It’s very important to point the evaluation at the right checkpoint. We specify it with the argument checkpoint_path, which refers to the checkpoint after 30,000 training steps. The output looks like this:
INFO:tensorflow:Evaluation [5/19]
INFO:tensorflow:Evaluation [6/19]
INFO:tensorflow:Evaluation [7/19]
INFO:tensorflow:Evaluation [8/19]
INFO:tensorflow:Evaluation [9/19]
INFO:tensorflow:Evaluation [10/19]
INFO:tensorflow:Evaluation [11/19]
INFO:tensorflow:Evaluation [12/19]
INFO:tensorflow:Evaluation [13/19]
INFO:tensorflow:Evaluation [14/19]
INFO:tensorflow:Evaluation [15/19]
INFO:tensorflow:Evaluation [16/19]
INFO:tensorflow:Evaluation [17/19]
INFO:tensorflow:Evaluation [18/19]
INFO:tensorflow:Evaluation [19/19]
eval/Accuracy[0.757894754]
eval/Recall_5[0.928947389]
We can see that our MobileNet, trained with Transfer Learning for 30,000 steps, achieves an accuracy of about 75 % and a top-5 recall of about 93 %. That’s quite good for such a short training. To boost the performance we could raise the number of training steps, but this may lead to overfitting. Another approach is Hyperparameter Optimization, which automatically searches for good hyperparameters such as the learning rate or the regularization strength of our neural network.
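As a rough illustration, a very simple form of such a search just re-runs the training script with different values for the learning_rate flag of train_image_classifier.py and compares the evaluation results afterwards; the specific values and directory layout below are only assumptions:

import subprocess

# Try a few learning rates; each run writes to its own train_dir so the
# resulting checkpoints can be evaluated and compared afterwards.
for lr in ['0.01', '0.001', '0.0001']:
    subprocess.check_call([
        'python', 'train_image_classifier.py',
        '--train_dir=./train_dir_lr_' + lr,
        '--dataset_dir=../../../hp_dataset',
        '--dataset_name=hp_plants',
        '--dataset_split_name=train',
        '--model_name=mobilenet_v1',
        '--train_image_size=224',
        '--checkpoint_path=./mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.ckpt',
        '--checkpoint_exclude_scopes=MobilenetV1/Logits',
        '--max_number_of_steps=5000',
        '--learning_rate=' + lr,
    ])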
Mobile Deployment
After we have finished training and evaluating our MobileNet, we can now start preparing the mobile deployment. For this we first need to create an inference graph of our MobileNet, which represents the whole network and is used to map our trained weights onto the correct graph structure of the MobileNet. To create the inference graph we need to run this command:
python export_inference_graph.py \
    --alsologtostderr \
    --model_name=mobilenet_v1 \
    --output_file=./inference_graph_mobilenet.pb \
    --dataset_name=hp_plants
You can now find the correct graph representation of the MobileNet in the slim folder as a .pb file named inference_graph_mobilenet.pb. Next we need to freeze our trained MobileNet, which maps the trained weights onto the correct graph representation of the MobileNet:
python freeze_graph.py \
    --input_graph=./inference_graph_mobilenet.pb \
    --input_binary=true \
    --input_checkpoint=./train_dir/model.ckpt-30000 \
    --output_graph=./frozen_mobilenet.pb \
    --output_node_names=MobilenetV1/Predictions/Reshape_1
As you can see, our previously generated inference graph is used as input for the freezing, and we are using the trained weights from our latest checkpoint. With the argument output_graph we specify the output name of the frozen graph. Furthermore we need to provide the argument output_node_names with the right output node name. The information about the input and output nodes of the MobileNet can be found in the previously downloaded files at ./mobilenet_v1_1.0_224/mobilenet_v1_1.0_224_info.txt.
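If you are working with a model for which no such info file exists, you can also list the node names directly from the exported inference graph to find suitable input and output nodes; a small sketch:

import tensorflow as tf

# Load the exported inference graph as a GraphDef protocol buffer.
graph_def = tf.GraphDef()
with tf.gfile.GFile('./inference_graph_mobilenet.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Print every node name; the first and last entries are usually good
# candidates for the input and output node names.
for node in graph_def.node:
    print(node.name)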
As a last step we need to optimize our graph for mobile. This step reduces the binary size of the graph by removing operations that are unnecessary for classification and by rounding the weights of the model. Rounding the weights leads to a small accuracy loss but greatly improves the classification speed of the model, which is very important for mobile devices. To optimize our graph we need to run the following command:
python optimize_for_inference.py \
    --input=./frozen_mobilenet.pb \
    --output=./opt_frozen_mobilenet.pb \
    --input_names=input \
    --output_names=MobilenetV1/Predictions/Reshape_1
As you can see, the input for the optimization is our previously frozen graph, which is optimized for mobile and saved as the .pb file opt_frozen_mobilenet.pb.
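Before moving to a device, you can sanity-check the optimized graph on your desktop by loading it into a TensorFlow session and classifying a single image. The sketch below uses the input and output node names from the info file mentioned above; the test image test_plant.jpg and the [-1, 1] input scaling are assumptions based on the standard MobileNet preprocessing:

import numpy as np
import tensorflow as tf

# Load the optimized, frozen graph into a fresh Graph object.
graph_def = tf.GraphDef()
with tf.gfile.GFile('./opt_frozen_mobilenet.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

    # Simple preprocessing: decode, resize to 224x224 and scale to [-1, 1].
    raw = tf.read_file('test_plant.jpg')
    image = tf.image.decode_jpeg(raw, channels=3)
    image = tf.image.resize_images(image, [224, 224])
    image = (tf.cast(image, tf.float32) / 127.5) - 1.0
    image = tf.expand_dims(image, 0)

with tf.Session(graph=graph) as sess:
    input_image = sess.run(image)
    predictions = sess.run('MobilenetV1/Predictions/Reshape_1:0',
                           feed_dict={'input:0': input_image})
    print('Predicted class index:', np.argmax(predictions))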
Now we have a fully functional and mobile optimized graph which we can deploy to our Android or iOS app—we’ll show you how in future articles!
Read on to learn how to integrate the graph with your Android app!
Hello,
first of all, thank you for the tutorial.
I’m new to TensorFlow and would like to train a dataset from scratch.
I followed your steps, but I encountered a problem: when creating the .tfrecord files, they end up with a size of 0.
I don’t understand why this is happening.
Thanks.
This could be due to multiple things, e.g. wrongly specified paths, so it is hard to help you with so little information. Please check that all paths you have specified are spelled correctly and that the actual data files are present in the corresponding directories.
Hey there, thanks for the tutorial! I have successfully built the TFRecord files, but when I try to train the dataset against mobilenet I get the error:
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 198 for 'MobilenetV1/Logits/SpatialSqueeze' (op: 'Squeeze') with input shapes: [32,198,198,73].
Do you have any thoughts why this should be the case? I do have 73 classes in the dataset.
While I am executing optimize_for_inference.py, the output file is not being created and I get the error:
Input graph file "" does not exist!
I am running the .py file by passing:
optimize_for_inference.py \ --input_graph="C:/Users/TPoornima/PycharmProjects/eggs_detection/inference_graph/frozen_inference_graph.pb" / output_graph=./frozen_fasterrcnn_graph.pb \ --input_name=input --output_names=faster_rcnn/Predictions/Reshape_1
Hi,
this error likely occurs because TensorFlow uses argparse (https://docs.python.org/3/library/argparse.html) for parsing the command line arguments. Try removing the quotes on the input_graph argument.

Hello, I am currently following this article for my own project, as I am using the same MobileNet model. Everything worked fine until the training part. After the training completed, a new .ckpt file should have appeared in my train_dir, but this did not happen.
No errors were reported up to this point. I tried to solve this by downloading the slim directory from your repository, but the same thing happened again.
I searched the internet for a solution, and a frequent suggestion I came across was to use the 'saver' function, but that did not work either. I would be glad if you could help out.
Hi, can you please provide the TensorFlow version you are using? Then I can try to reproduce your error.