
Android Snapdragon Integration & Tensor Interpretation with Kotlin


During a recent tech spike, we had the opportunity to evaluate the integration of the Qualcomm Snapdragon SDK (SNPE) in an Android/Kotlin environment. The spike was divided into two segments: first integrating the Snapdragon SDK, and second using it for object detection on images. This article helps you understand the complexity of the Snapdragon SDK and shows how to convert an output tensor into drawable bounding boxes with Kotlin.

The following image shows the Snapdragon training pipeline that provides the required .dlc file (see [1]). This .dlc file contains the trained model, based on the Pascal data set, and needs to be added to the Android project.

Image [1]: Training pipeline
The underlying story of the tech spike was:

“As a user, I want to detect a cat on an image with an Android app, in order to classify the cat.”

We started with a simple Android test app, which loaded a test data set of images and integrated the Snapdragon SDK. The app simply takes images and detects whether a cat is on them (see [2]).

Image [2]: Use case
Therefore we began with a simple app and an instrumented test that loads an image and a trained model. As the spike demands, we are supposed to use the Snapdragon SDK. Snapdragon provides a neural network, which can be fed with an input tensor; in our case, this input tensor is a converted image. As soon as the neural network responds, we can interpret its output tensor.

From here on we are able to integrate the conversion algorithm. Before we go deeper into the algorithm, it is worth taking a moment to refresh some terminology, since the manual conversion of a neural network output tensor requires specific knowledge of it. The following short glossary covers the essential terms:

Definitions

model:

  • In the context of this article, a model refers to a single-stage real-time object detection model, provided by a .dlc file. (source)

config:

  • A config refers to a data class to collect all meta-data that are required by the algorithm.

tensor:

  • A tensor is an n-dimensional array of values. (source)

logit:

  • The vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. (source)

Functions

sigmoid:

  • A logistic, s-shaped function whose output is bounded between 0 and 1. (source)

soft-max:

  • A function that turns an array of values into an array of values that sum to one, i.e. a probability distribution. (source)

non-max-suppression:

  • Non-Max-Suppression is a computer vision method that selects a single entity out of many overlapping entities. (source)

Integration of the Snapdragon SDK

Now we can start with the integration of the Snapdragon SDK into the Android app, in order to use the neural network for object detection. See the links below for the Snapdragon SDK and a helpful tutorial provided by Qualcomm.

  • Download: SNPE Release
  • Follow: SNPE Android Tutorial

Setup

  • Add a root-level lib/ directory.
  • Update lib/ with the .aar SNPE Release.
  • Update lib/ with the .aar Platform-Validator.
  • Update src/res/raw with a trained model file (.dlc).
  • Update AndroidManifest.xml.
  • Add the app packagingOptions to build.gradle.

Build the neural network

Load a model (.dlc) from the Android resource directory (res/) and build the neural network as shown in the code fragment below:
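A sketch of such a builder call is shown below. The method names follow the SNPE Android tutorial (NeuralNetworkBuilder, setModel, setRuntimeOrder); verify them against the SDK release you downloaded, and note that the fallback runtime order is our own choice.

```kotlin
import android.app.Application
import com.qualcomm.qti.snpe.NeuralNetwork
import com.qualcomm.qti.snpe.SNPE

fun buildNeuralNetwork(application: Application, rawModelId: Int): NeuralNetwork {
    // Load the trained model (.dlc) from res/raw.
    val modelStream = application.resources.openRawResource(rawModelId)
    val modelSize = modelStream.available()
    return SNPE.NeuralNetworkBuilder(application)
        .setModel(modelStream, modelSize)
        // Prefer the DSP runtime, fall back to GPU and finally CPU.
        .setRuntimeOrder(
            NeuralNetwork.Runtime.DSP,
            NeuralNetwork.Runtime.GPU,
            NeuralNetwork.Runtime.CPU
        )
        .build()
}
```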

Interpretation of a Snapdragon tensor

In order to use the SNPE neural network, we have to prepare the environment by loading a model and defining its parameters. Later we will use this config to set the necessary parameters for the algorithm that converts the Snapdragon output tensor. It is crucial that the type of the loaded model (for example yoloV2.dlc) is mapped to the matching model configuration. Otherwise the conversion will fail, since parameters like gridSize or cellSize have a major impact on the conversion algorithm.

Model configuration

The example below wraps all the parameters in a configuration that is used by the algorithm.

A typical YoloV2 configuration would use the following setup:
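A minimal sketch of such a config follows; the class and field names are our own, not part of SNPE. The anchor values are the standard YoloV2 anchors trained on Pascal VOC.

```kotlin
// Hypothetical ModelConfig collecting the meta-data the conversion
// algorithm needs; names are illustrative, not an SNPE API.
data class ModelConfig(
    val inputSize: Int,        // width/height of the (square) network input
    val gridSize: Int,         // number of grid cells per axis
    val numberOfClasses: Int,  // logits per anchor box
    val anchors: List<Pair<Float, Float>> // anchor width/height pairs
) {
    // Size of one grid cell in pixels.
    val cellSize: Int get() = inputSize / gridSize
}

// A typical YoloV2 setup: 416x416 input, 13x13 grid, 5 anchors,
// 20 Pascal VOC classes.
val yoloV2Config = ModelConfig(
    inputSize = 416,
    gridSize = 13,
    numberOfClasses = 20,
    anchors = listOf(
        1.3221f to 1.73145f,
        3.19275f to 4.00944f,
        5.05587f to 8.09892f,
        9.47112f to 4.84053f,
        11.2364f to 10.0071f
    )
)
```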

Snapdragon input tensor setup

Since the main goal is to draw a list of boxes on an image that contains a cat object, we need to abstract this functionality. Starting at step one of the following code snippets, we implemented this abstraction within a single function.

1. Execute the neural network and return a list of boxes
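The entry-point function could look as sketched below. The SNPE calls (createFloatTensor, write, execute, read) follow the SNPE Java API as described in the tutorial; the helper names (toNormalizedRgb, toGrid, createBoxes) are our own and correspond to the steps of this article.

```kotlin
// Sketch: convert the image to an input tensor, execute the network,
// and convert the output tensor back to a list of boxes.
fun detectBoxes(network: NeuralNetwork, bitmap: Bitmap, config: ModelConfig): List<Box> {
    // 1. Transform the image into a float array of normalized RGB values.
    val input = bitmap.toNormalizedRgb()

    // Wrap the float array in an SNPE input tensor (NHWC layout assumed).
    val inputTensor = network.createFloatTensor(1, config.inputSize, config.inputSize, 3)
    inputTensor.write(input, 0, input.size)

    // 2. Execute the neural network with the named input tensor.
    val outputs = network.execute(mapOf(network.inputTensorsNames.first() to inputTensor))

    // 3. Read the raw output values, reshape them into the grid, and
    //    convert the grid into a list of boxes.
    val outputTensor = outputs.values.first()
    val output = FloatArray(outputTensor.size)
    outputTensor.read(output, 0, output.size)
    return createBoxes(output.toGrid(config.gridSize), config)
}
```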

2. Transform the given image to a float array of normalized RGB values in order to create the input tensor: 
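A possible implementation of this step is sketched below. To keep it Android-free and easy to test, it operates on an IntArray of ARGB pixel values; on Android you would first fetch those via Bitmap.getPixels. The function name is our own.

```kotlin
// Convert ARGB pixel values into a flat float array of RGB values
// normalized to the range 0..1, forming the network's input tensor.
fun toNormalizedRgb(pixels: IntArray): FloatArray {
    val result = FloatArray(pixels.size * 3)
    pixels.forEachIndexed { i, pixel ->
        result[i * 3] = ((pixel shr 16) and 0xFF) / 255f     // red
        result[i * 3 + 1] = ((pixel shr 8) and 0xFF) / 255f  // green
        result[i * 3 + 2] = (pixel and 0xFF) / 255f          // blue
    }
    return result
}
```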

3. Get the grid and convert the output tensor to a list of boxes
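Reshaping the flat output into grid cells could look like the extension below. We assume the output tensor is laid out row-major as [gridSize, gridSize, channels], where channels = anchors * (5 + numberOfClasses); check this against your model's actual output layout.

```kotlin
// Reshape the flat output tensor into one FloatArray per grid cell.
fun FloatArray.toGrid(gridSize: Int): Array<FloatArray> {
    val cells = gridSize * gridSize
    val channels = size / cells
    return Array(cells) { cell ->
        copyOfRange(cell * channels, (cell + 1) * channels)
    }
}
```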

From here on, all interfaces are defined. In order to keep things simple, we implemented several extensions for transforming arrays or converting a tensor. Follow along with the next chapter to see how the output tensor is converted into a list of boxes.

Output tensor conversion

Before we dive into the code base for converting an SNPE neural network output tensor, you should recap the underlying model. For this example we used YoloV2. A basic introduction can be found here, or in the original paper here. The conversion of the neural network output tensor, which is based on a YoloV2 model, is divided into four main steps (see [3]).

Image [3]: General steps to convert an output tensor

The following code snippets should inspire future implementations for tensor conversions in Kotlin. At the time we published this article, the conversion outputs from YoloV2 and YoloV5 models were implemented and tested.

1. Create boxes from grid and config:
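The outer loop could be sketched as follows. It assumes a ModelConfig with gridSize, cellSize, anchors, and numberOfClasses; the helper functions (createBox, nonMaxSuppression) correspond to steps 2.1 and 3 of this section, and the confidence threshold of 0.3 is a hypothetical choice.

```kotlin
// Sketch: for every grid cell and every anchor, decode a candidate
// box and keep it only if its confidence passes the threshold; then
// suppress overlapping duplicates.
fun createBoxes(grid: Array<FloatArray>, config: ModelConfig): List<Box> {
    val boxes = mutableListOf<Box>()
    val stride = 5 + config.numberOfClasses // tx, ty, tw, th, to + logits
    grid.forEachIndexed { cellIndex, cell ->
        val cellX = cellIndex % config.gridSize
        val cellY = cellIndex / config.gridSize
        config.anchors.forEachIndexed { anchorIndex, anchor ->
            val offset = anchorIndex * stride
            val anchorBox = cell.copyOfRange(offset, offset + stride)
            val box = createBox(anchorBox, cellX, cellY, anchor, config.cellSize.toFloat())
            if (box.confidence > 0.3f) boxes += box // hypothetical threshold
        }
    }
    return nonMaxSuppression(boxes)
}
```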

2.1 Create a box from an anchor box:
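Decoding one anchor box follows the YoloV2 paper: the box center is the sigmoid-squashed offset within the grid cell, and width/height scale the anchor exponentially. A minimal Box type and the sigmoid function are redeclared here to keep the sketch self-contained; the article's full Box carries more fields (see step 2.5).

```kotlin
import kotlin.math.exp

// Minimal box type for this snippet: top-left corner, size, confidence.
data class Box(val x: Float, val y: Float, val width: Float, val height: Float, val confidence: Float)

fun sigmoid(x: Float): Float = 1f / (1f + exp(-x))

// Decode the raw values [tx, ty, tw, th, to] of one anchor box into
// pixel coordinates; cellSize converts grid units into pixels.
fun createBox(
    values: FloatArray,
    cellX: Int, cellY: Int,
    anchor: Pair<Float, Float>,
    cellSize: Float
): Box {
    val centerX = (cellX + sigmoid(values[0])) * cellSize
    val centerY = (cellY + sigmoid(values[1])) * cellSize
    val width = anchor.first * exp(values[2]) * cellSize
    val height = anchor.second * exp(values[3]) * cellSize
    return Box(centerX - width / 2, centerY - height / 2, width, height, sigmoid(values[4]))
}
```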

In order not to blow up this article, we excluded some of the simple function implementations, like scaling coordinates, soft-max, or non-max-suppression.

2.2 Create a box Instance from a filtered anchor box:

The given starting index, here 5, points to a defined object class. This object class, or “logit”, is defined by the loaded model. In our case the logit class name is cat, since we are only interested in cat objects. But there could be more logits, depending on the use case.

2.3 Extract a logit from the given anchor box starting at the given index: 
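This step is a simple slice: the first five values of an anchor box are tx, ty, tw, th, to, and the class logits follow from the given start index. A sketch, with a name of our own choosing:

```kotlin
// Extract the numberOfClasses logits from an anchor-box slice,
// starting at the given index (5 in our setup).
fun extractLogits(anchorBox: FloatArray, startIndex: Int, numberOfClasses: Int): FloatArray =
    anchorBox.copyOfRange(startIndex, startIndex + numberOfClasses)
```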

2.4 Use the soft-max algorithm to normalize the output of a network to a probability distribution over the given array:
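A possible stand-in for the excluded soft-max implementation is shown below, in the numerically stable form that subtracts the maximum before exponentiating:

```kotlin
import kotlin.math.exp

// Soft-max: turn an array of logits into a probability distribution
// (non-negative values that sum to one).
fun softMax(logits: FloatArray): FloatArray {
    val max = logits.maxOrNull() ?: return FloatArray(0)
    val exps = FloatArray(logits.size) { exp(logits[it] - max) }
    val sum = exps.sum()
    return FloatArray(logits.size) { exps[it] / sum }
}
```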

2.5 Create the box instance:

where a box is a pure Kotlin data class.
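Such a Box data class could look like the following; the field names are our own, combining the decoded pixel coordinates with the winning logit and its probability:

```kotlin
// Pure Kotlin data class holding one detection: top-left corner and
// size in pixels, detection confidence, and the best class with its
// soft-max probability.
data class Box(
    val x: Float,
    val y: Float,
    val width: Float,
    val height: Float,
    val confidence: Float,
    val classIndex: Int,
    val classProbability: Float
)
```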

3. Filter boxes by non-max-suppression:
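The excluded non-max-suppression could be sketched as below: repeatedly keep the most confident box and drop every remaining box that overlaps it beyond an IoU threshold. A minimal Box type is redeclared to keep the snippet runnable; the threshold default is an assumption.

```kotlin
// Minimal box type for this snippet.
data class Box(val x: Float, val y: Float, val width: Float, val height: Float, val confidence: Float)

// Intersection-over-union of two axis-aligned boxes.
fun iou(a: Box, b: Box): Float {
    val interW = maxOf(0f, minOf(a.x + a.width, b.x + b.width) - maxOf(a.x, b.x))
    val interH = maxOf(0f, minOf(a.y + a.height, b.y + b.height) - maxOf(a.y, b.y))
    val inter = interW * interH
    val union = a.width * a.height + b.width * b.height - inter
    return if (union <= 0f) 0f else inter / union
}

// Keep the most confident box, drop heavily overlapping ones, repeat.
fun nonMaxSuppression(boxes: List<Box>, iouThreshold: Float = 0.5f): List<Box> {
    val candidates = boxes.sortedByDescending { it.confidence }.toMutableList()
    val result = mutableListOf<Box>()
    while (candidates.isNotEmpty()) {
        val best = candidates.removeAt(0)
        result += best
        candidates.removeAll { iou(best, it) > iouThreshold }
    }
    return result
}
```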

Draw boxes around the object

At this point, we have a filtered list of boxes returned from the createBoxes function (see step 1 of the output tensor conversion). What remains is to draw the best box on the initially given image.

1. Draw box:

2. Create a box shape:
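Both drawing steps can be sketched with the Android Canvas API as below; the paint setup is an assumption, and the Box type is the one from step 2.5.

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.RectF

// 1. Draw the box onto a mutable copy of the input image.
fun drawBox(bitmap: Bitmap, box: Box): Bitmap {
    val result = bitmap.copy(Bitmap.Config.ARGB_8888, true)
    val canvas = Canvas(result)
    val paint = Paint().apply {
        style = Paint.Style.STROKE
        strokeWidth = 4f
        color = Color.RED
    }
    // 2. Create the box shape as a rectangle from the box coordinates.
    val shape = RectF(box.x, box.y, box.x + box.width, box.y + box.height)
    canvas.drawRect(shape, paint)
    return result
}
```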

Lessons learned

The above implementation leads to a list of boxes. The best box is drawn onto the canvas of the given image (see [4]).

Image [4]: Input → Output
Implementing an algorithm to convert a Snapdragon neural network output tensor has been a challenge across all the domains involved. On the machine learning side, we had to follow the algorithm through all its details, such as filtering. On the app side, we started the implementation from scratch and prepared multiple test sets for running simple Android instrumented tests on given images. While implementing the algorithm we used a test-driven development approach, which proved well suited to the process.

  • Communication between the domain developers is crucial for a successful implementation.
  • Understanding the domain helped a lot while implementing the algorithm.
  • The model abstraction in a config file helped to scale to other Yolo versions.
  • Unnoticed corrupted test data sets make testing quite hard.
  • A test is only as good as its test data set.
  • The Snapdragon SDK runs on the DSP and is quite fast.
