Android Graphics Pipeline: From Button to Framebuffer (Part 2)

Last time, we took a thorough look at how Android converts the Java-side onDraw() method into a native display list on the C++ side. This time we will look at how Android draws these display lists to the screen. We are now leaving the comfortable realm of garbage-collected Java and entering the dark and scary dungeon called C++. But don’t worry, we’ll keep it quite simple and only show the relevant and interesting bits of code.

Drawing the display list

Before Android 4.3, the rendering operations of the UI were executed in the same order in which the UI elements were added to the view hierarchy, and therefore to the resulting display list. This can be the worst-case scenario for GPUs, as they must switch state for every element. For example, when drawing two buttons, the GPU needs to draw the NinePatch and text for the first button, and then the same for the second button, resulting in at least 3 state changes.

Reordering and merging of operations

So in order to minimize these expensive state changes, Android reorders all drawing operations based on their type and state attributes. We leave our example application with its single button for a moment and look at an arbitrary activity instead:

Merging Layout

Example Activity with overlapping elements, carefully chosen to illustrate possible problems when reordering drawing operations.

As seen in the image above, a simple approach to reordering and merging by type is not sufficient in most cases. Drawing all text elements and then the bitmap (or the other way around) does not result in the same final image as it would without reordering, which is clearly not acceptable.

In order to correctly render the example activity, text elements A and B have to be drawn first, followed by the bitmap C, followed by the text element D. The first two text elements could be merged into one operation, but the text element D cannot, as it would be overlapped by the bitmap.

To further reduce the drawing time needed for a view hierarchy, most operations can be merged after they have been reordered. This happens in the DeferredDisplayList, so-called because the execution of the drawing operations does not happen in order, but is deferred until all operations have been analyzed, reordered and merged.

Because every display list operation is responsible for drawing itself, an operation that supports the merging of multiple operations with the same type must be able to draw multiple, different operations in one draw call. Not every operation is capable of merging, so some can only be reordered.

Canvas Drawdisplaylist

The OpenGLRenderer is an implementation of the Skia 2D drawing API, but instead of using the CPU it does all the drawing hardware-accelerated with OpenGL. On the way through the pipeline, this is the first native-only class implemented in C++. The renderer is designed to be used with the GLES20Canvas and was introduced with Android 3.0. It is only used in conjunction with display lists.

To merge multiple operations into one draw call, each operation is added to the deferred display list by calling addDrawOp(DrawOp). The drawing operation is asked to supply the batchId, which indicates the type of operation it can be merged with, and the mergeId, which indicates the merged operations, by calling DrawOp.onDefer(...).

Possible batchIds include kOpBatch_Patch for a 9-Patch and kOpBatch_Text for a normal text element. These are defined in a simple enum. The mergeId is determined by each DrawOp itself, and is used to decide if two operations of the same DrawOp type can be merged. For a 9-Patch, the mergeId is a pointer to the asset atlas (or bitmap); for a text element it is the paint color. Multiple drawables from the same asset atlas texture can potentially be merged into one batch, resulting in a greatly reduced rendering time.

All information about an operation is collected into a simple struct:
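A minimal sketch of what such a struct could look like, assuming only the three pieces of information the text mentions (batch type, merge key and mergeability). The names DeferInfo, MergeId, sameMergeKey, patchInfo and textInfo are illustrative assumptions and not taken from the Android sources; only the kOpBatch_ entries are named in this article.

```cpp
#include <cstdint>

// Illustrative batch types; the article only names these three entries.
enum OpBatchId { kOpBatch_None = 0, kOpBatch_Patch, kOpBatch_Text };

// Opaque merge key (assumption): a bitmap pointer for 9-Patches,
// a paint color for text.
using MergeId = std::uintptr_t;

// Hypothetical per-operation record filled in while deferring.
struct DeferInfo {
    OpBatchId batchId;
    MergeId mergeId;
    bool mergeable;
};

// Two operations are merge candidates only if both are mergeable
// and agree on both ids.
inline bool sameMergeKey(const DeferInfo& a, const DeferInfo& b) {
    return a.mergeable && b.mergeable &&
           a.batchId == b.batchId && a.mergeId == b.mergeId;
}

// Example key builders following the conventions described in the text.
inline DeferInfo patchInfo(const void* bitmap) {
    return DeferInfo{kOpBatch_Patch, reinterpret_cast<MergeId>(bitmap), true};
}

inline DeferInfo textInfo(std::uint32_t paintColor) {
    return DeferInfo{kOpBatch_Text, static_cast<MergeId>(paintColor), true};
}
```

With keys like these, two 9-Patches from the same bitmap compare equal, while two text elements with different paint colors do not.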

After the batchId and mergeId of an operation are determined, it will be added to the last batch if it is not mergeable. If no batch is available yet, a new batch will be created. The more likely case is that the operation is mergeable. To keep track of all recently merged batches, one hashmap per batchId is used, called MergeBatches in the simplified algorithm. Using one hashmap per batchId avoids the need to resolve mergeId collisions between different batch types.

If the current operation can be merged with another operation of the same mergeId and batchId, the operation is added to the existing batch and the next operation can be processed. But if it cannot be merged due to different states, drawing flags or bounding boxes, the algorithm needs to insert a new merging batch. For this to happen, the position inside the list of all batches (Batches) needs to be found. In the best case, it finds a batch that shares the same state with the current drawing operation. But it is also essential that the operation does not intersect with any other batches in the process of finding a correct spot. Therefore, the list of all batches is iterated over in reverse order to find a good position and to check for intersections with other elements. In case of an intersection, the operation cannot be merged and a new DrawBatch is created and inserted into the MergeBatches hashmap. The new batch is added to Batches at the position found earlier. In any case, the operation is added to the current batch, which can be a new or an existing batch.
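The merge decision described above can be sketched roughly as follows. This is a deliberately reduced model and not the Android implementation: the types Rect, Op, Batch and DeferredList are hypothetical, new batches are always appended at the end instead of searching for a better position, and a non-mergeable operation opens a new plain batch whenever the last batch is a merging one (an assumption of this sketch).

```cpp
#include <cstddef>
#include <map>
#include <vector>

struct Rect {
    float left, top, right, bottom;
    bool intersects(const Rect& o) const {
        return left < o.right && o.left < right && top < o.bottom && o.top < bottom;
    }
};

struct Op { int batchId; long mergeId; bool mergeable; Rect bounds; };

struct Batch {
    bool merging;          // true if all ops are drawn in one merged call
    std::vector<Op> ops;
    bool intersects(const Rect& r) const {
        for (const Op& op : ops) if (op.bounds.intersects(r)) return true;
        return false;
    }
};

class DeferredList {
public:
    void addDrawOp(const Op& op) {
        if (!op.mergeable) {
            // Non-mergeable: reuse the last plain batch or open a new one.
            if (batches_.empty() || batches_.back().merging)
                batches_.push_back(Batch{false, {}});
            batches_.back().ops.push_back(op);
            return;
        }
        std::map<long, std::size_t>& byMergeId = mergeBatches_[op.batchId];
        auto it = byMergeId.find(op.mergeId);
        // Merge only if no batch drawn after the candidate overlaps this op.
        if (it != byMergeId.end() && !overlapsLaterBatch(it->second, op.bounds)) {
            batches_[it->second].ops.push_back(op);
            return;
        }
        // Otherwise start a new merging batch and remember it for later ops.
        batches_.push_back(Batch{true, {op}});
        byMergeId[op.mergeId] = batches_.size() - 1;
    }

    const std::vector<Batch>& batches() const { return batches_; }

private:
    // Walks the batch list in reverse, from the end down to target + 1.
    bool overlapsLaterBatch(std::size_t target, const Rect& bounds) const {
        for (std::size_t i = batches_.size(); i-- > target + 1;)
            if (batches_[i].intersects(bounds)) return true;
        return false;
    }

    std::vector<Batch> batches_;                              // draw order
    std::map<int, std::map<long, std::size_t>> mergeBatches_; // batchId -> mergeId -> index
};
```

Fed with the example activity (text A, text B, bitmap C overlapping both, text D on top of C), this sketch produces three batches: A and B merged, then C, then D on its own.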

The actual implementation is more complex than the simplified version presented here, and a few of its optimizations are worth mentioning. The algorithm tries to avoid overdraw by removing occluded drawing operations, and it also tries to reorder non-mergeable operations to avoid GPU state changes.

Actually drawing the (deferred) display list

After reordering and merging, the new deferred display list can finally be drawn to the screen.

DDL Flush

 

Inside the OpenGLRenderer::drawDisplayList() method, the deferred display list is created and filled with operations from the normal display list. The deferred display list is then asked to draw itself (flush()).

For a batch of merged operations, the method multiDraw() is called on the first operation in the batch, with all operations of the batch as an argument. The called operation is responsible for drawing all supplied operations at once and will also call the OpenGLRenderer to actually execute the operation itself.

Display List Operations

Each drawing operation to be executed on a canvas has a corresponding display list operation. All display list operations must implement the replay() method, which executes the wrapped drawing operation. These drawing operations call the OpenGLRenderer to render themselves. The reference to the renderer needs to be supplied when creating an operation. onDefer() must also be implemented and must return the operation’s batchId and mergeId. Non-mergeable operations set the batchId to kOpBatch_None. Mergeable operations must implement the multiDraw() method, which is used when a whole batch of merged operations needs to be rendered at once.
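To make the contract between these three methods more tangible, here is a reduced sketch. It keeps the method names from the text (replay(), onDefer(), multiDraw()), but everything else is an assumption for illustration: the Renderer stub that only records calls, the DeferInfo fields, the TextOp class, the numeric batch id and the flushBatch() helper are all hypothetical, and the renderer is passed as an argument here purely to keep the sketch short.

```cpp
#include <string>
#include <utility>
#include <vector>

// Stand-in for the renderer: it only records what it was asked to draw.
struct Renderer { std::vector<std::string> calls; };

struct DeferInfo { int batchId; long mergeId; bool mergeable; };

class DrawOp {
public:
    virtual ~DrawOp() = default;
    // Reports which batch type and merge key this operation belongs to.
    virtual void onDefer(DeferInfo& info) const = 0;
    // Draws this single operation.
    virtual void replay(Renderer& renderer) const = 0;
    // Draws a whole batch at once; the fallback replays one by one.
    virtual void multiDraw(Renderer& renderer,
                           const std::vector<const DrawOp*>& ops) const {
        for (const DrawOp* op : ops) op->replay(renderer);
    }
};

class TextOp : public DrawOp {
public:
    explicit TextOp(std::string text) : text_(std::move(text)) {}
    void onDefer(DeferInfo& info) const override {
        info = DeferInfo{2 /* hypothetical text batch id */, 0, true};
    }
    void replay(Renderer& r) const override { r.calls.push_back("text:" + text_); }
    void multiDraw(Renderer& r,
                   const std::vector<const DrawOp*>& ops) const override {
        // A batch only contains ops with the same batchId, so every
        // element is assumed to be a TextOp here.
        std::string merged;
        for (const DrawOp* op : ops) {
            if (!merged.empty()) merged += '|';
            merged += static_cast<const TextOp*>(op)->text_;
        }
        r.calls.push_back("text:" + merged);  // one call for the whole batch
    }
private:
    std::string text_;
};

// Flushes one batch: merged batches go through the first op's multiDraw().
inline void flushBatch(Renderer& r, const std::vector<const DrawOp*>& ops,
                       bool merging) {
    if (ops.empty()) return;
    if (merging) ops.front()->multiDraw(r, ops);
    else for (const DrawOp* op : ops) op->replay(r);
}
```

In this model, flushing a merged batch of two text operations results in a single renderer call, whereas flushing the same operations unmerged results in two.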

For example, the operation to draw a 9-Patch (called DrawPatchOp) contains the following multiDraw() implementation:

 

The batchId of a 9-Patch is always kOpBatch_Patch; the mergeId is a pointer to the bitmap used. Therefore, all patches that use the same bitmap can be merged together. This is even more important with the use of the asset atlas, as now all heavily used 9-Patches from the Android framework can potentially be merged together, as they reside on the same texture.

Texture Atlas

The Android start-up process zygote always keeps a number of assets preloaded which are shared with all processes. These assets contain frequently used 9-Patches and images for the standard Android framework widgets. But before Android 4.4, every process kept a separate copy of these assets in GPU memory. Starting with Android 4.4 KitKat, these frequently used assets are packed into a texture atlas, uploaded to the GPU and shared between all processes. Only then is merging of 9-Patches and other drawables from the standard framework possible.

The texture atlas generated by the system to reduce GPU stress caused by switching textures too often.

The image above shows an asset atlas texture generated on a Nexus 7 (2013) running Android 4.4, which contains all frequently used framework assets. If you look closely, the 9-Patches do not feature the typical borders which indicate the layout and padding areas. The original asset files are still used to parse these areas on system start, but they are not used for rendering any longer.

When a system boots for the first time after an Android update (or ever), the AssetAtlasService regenerates the texture atlas. This atlas is then used for all subsequent reboots, until a new Android update is applied.

To generate the atlas, the service brute-forces through all possible atlas configurations and looks for the best one. The best configuration is the one that fits the maximum number of assets on the texture at the minimum texture size. It is written to /data/system/framework_atlas.config and contains the chosen algorithm, dimensions, whether rotations are allowed and whether padding has been added. This configuration is then used in subsequent reboots to regenerate the texture atlas. An RGBA8888 graphic buffer is allocated as the asset atlas texture and all assets are rendered onto it using a temporary Skia bitmap. This asset atlas texture is valid for the lifetime of the AssetAtlasService, only being deallocated when the system itself is shutting down.

To actually pack all assets into the atlas, the service starts with an empty texture. After placing the first asset, the remaining space is divided into two rectangular cells. Depending on the algorithm used, this split can either be horizontal or vertical. The next asset texture is added in the first cell that is large enough to fit. This now occupied cell will be split again and the next asset is processed. The AssetAtlasService uses multiple threads to reduce the time it takes to iterate through all combinations.
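The packing step can be illustrated with a small first-fit packer. This is only a sketch of the general idea under simplifying assumptions (no rotation, no padding, a single thread, one fixed split direction); the names Cell, Placement and AtlasPacker are hypothetical and do not come from the AssetAtlasService.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Cell { int x, y, w, h; };
struct Placement { int x, y; bool placed; };

class AtlasPacker {
public:
    AtlasPacker(int width, int height, bool splitHorizontally)
        : splitHorizontally_(splitHorizontally) {
        assert(width > 0 && height > 0);
        free_.push_back(Cell{0, 0, width, height});  // start with an empty texture
    }

    // Places a w x h asset into the first free cell that is large enough.
    Placement pack(int w, int h) {
        for (std::size_t i = 0; i < free_.size(); ++i) {
            const Cell c = free_[i];
            if (w > c.w || h > c.h) continue;
            free_.erase(free_.begin() + static_cast<std::ptrdiff_t>(i));
            // Divide the remaining L-shaped space into two rectangular cells.
            if (splitHorizontally_) {
                pushIfNonEmpty(Cell{c.x + w, c.y, c.w - w, h});      // right of the asset
                pushIfNonEmpty(Cell{c.x, c.y + h, c.w, c.h - h});    // full-width strip below
            } else {
                pushIfNonEmpty(Cell{c.x + w, c.y, c.w - w, c.h});    // full-height strip right
                pushIfNonEmpty(Cell{c.x, c.y + h, w, c.h - h});      // below the asset
            }
            return Placement{c.x, c.y, true};
        }
        return Placement{0, 0, false};  // the asset does not fit anywhere
    }

private:
    void pushIfNonEmpty(const Cell& c) {
        if (c.w > 0 && c.h > 0) free_.push_back(c);
    }

    bool splitHorizontally_;
    std::vector<Cell> free_;
};
```

A brute-force search like the one described above would run such a packer once per candidate configuration (texture size, split direction and so on) and keep the configuration that places the most assets on the smallest texture.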

When a new app is started, its HardwareRenderer queries the AssetAtlasService for this texture, and every time the renderer needs to draw a bitmap or 9-Patch it checks the atlas first.

Font caching and rendering

In order to merge text views, a similar approach is used and a font cache is generated. But in contrast to the texture atlas, this font atlas is unique for each app and font type. The color of the font can be applied in a shader and is therefore not considered in the atlas.

Left: Font atlas generated by the font renderer. Right: Geometry generated on the CPU, used to render the characters.

If you take a quick glance at the font atlas, you will instantly see that only a few characters are present. When taking a closer look, you will see only the used characters! If you think about how many languages Android supports, and how many characters are supported, only caching the used ones makes perfectly good sense. And because the action bar and the button are using the same font, all characters from both text views can be merged onto one texture.
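The idea of caching only the characters that are actually used can be sketched with a lazily filled lookup table. This is a strongly simplified model and not the FontRenderer's cache: it assumes fixed-size cells and one slot per character, and the names GlyphSlot and GlyphCache are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical position of one glyph inside the cache texture.
struct GlyphSlot { int x, y; };

class GlyphCache {
public:
    GlyphCache(int textureWidth, int cellSize)
        : columns_(textureWidth / cellSize), cellSize_(cellSize) {
        assert(cellSize > 0 && columns_ > 0);
    }

    // Returns the slot of `c`, reserving a new cell only on first use.
    const GlyphSlot& get(char c) {
        auto it = slots_.find(c);
        if (it != slots_.end()) return it->second;  // already cached
        const int index = static_cast<int>(slots_.size());
        const GlyphSlot slot{(index % columns_) * cellSize_,
                             (index / columns_) * cellSize_};
        return slots_.emplace(c, slot).first->second;
    }

    // Makes sure every character of `text` has a slot.
    void precache(const std::string& text) {
        for (char c : text) get(c);
    }

    std::size_t size() const { return slots_.size(); }

private:
    int columns_;
    int cellSize_;
    std::map<char, GlyphSlot> slots_;
};
```

Because the cache is keyed by character, two text views that use the same font share their glyphs: caching "Button" reserves five cells, and a second string made up of the same letters reserves none.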

To draw the font to the screen, the renderer needs to generate a geometry to which the texture gets bound. The geometry is generated on the CPU and then drawn via the OpenGL command glDrawElements(). If the device supports OpenGL ES 3.0, the FontRenderer will update and upload the font cache texture asynchronously at the start of the frame, while the GPU is mostly idle, which saves precious milliseconds per frame. The cache texture is implemented as an OpenGL Pixel Buffer Object, which makes an asynchronous upload possible.
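As a rough illustration of the CPU-side geometry, the following sketch builds one textured quad (four vertices, two triangles) per glyph, together with an index list of the kind that is handed to glDrawElements(). The types Vertex, GlyphQuad and TextMesh and the function buildTextMesh() are hypothetical and only model the data layout; no OpenGL calls are made here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, u, v; };  // screen position + texture coordinate

// Screen rectangle of one glyph and its rectangle in the font texture.
struct GlyphQuad { float x, y, w, h; float u0, v0, u1, v1; };

struct TextMesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

TextMesh buildTextMesh(const std::vector<GlyphQuad>& glyphs) {
    // Two triangles per quad: (top-left, top-right, bottom-left) and
    // (bottom-left, top-right, bottom-right).
    static const std::uint16_t kQuadIndices[6] = {0, 1, 2, 2, 1, 3};

    TextMesh mesh;
    for (const GlyphQuad& g : glyphs) {
        // 16-bit indices limit one mesh to 65536 vertices.
        assert(mesh.vertices.size() + 4 <= 65536);
        const std::uint16_t base = static_cast<std::uint16_t>(mesh.vertices.size());
        mesh.vertices.push_back(Vertex{g.x,       g.y,       g.u0, g.v0});  // top-left
        mesh.vertices.push_back(Vertex{g.x + g.w, g.y,       g.u1, g.v0});  // top-right
        mesh.vertices.push_back(Vertex{g.x,       g.y + g.h, g.u0, g.v1});  // bottom-left
        mesh.vertices.push_back(Vertex{g.x + g.w, g.y + g.h, g.u1, g.v1});  // bottom-right
        for (std::uint16_t offset : kQuadIndices)
            mesh.indices.push_back(static_cast<std::uint16_t>(base + offset));
    }
    return mesh;
}
```

With a mesh like this, all glyphs that live on the same font texture can be submitted in a single indexed draw call, which is what makes merging several text views worthwhile.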

OpenGL

At the start of this mini-series I promised you some raw OpenGL drawing commands. So without further ado, I present to you the (not quite complete) OpenGL drawing log for the button of our simple one-button activity:

The complete OpenGL draw call log can be seen at the end of this blog post.

Conclusion

We have seen how Android converts its view hierarchy to a series of render commands inside a display list, reorders and merges these commands and finally how these commands are executed.

Returning to our example activity with one button, the entire view can be rendered in just 5 steps:

Five Steps

  1. The layout draws the background image, which is a linear gradient.
  2. Both the ActionBar and Button background 9-Patches are drawn. These two operations were merged into one batch, as both 9-Patches are located on the same texture.
  3. A linear gradient is drawn for the ActionBar.
  4. Text for the Button and the ActionBar is drawn simultaneously. As these two views use the same font, the font renderer can use the same font texture and therefore merge the two operations.
  5. The application’s icon is drawn.

And there you have it, we traced all the way from the view hierarchy to the final OpenGL commands, which concludes this mini-series.

Download

The full Bachelor’s Thesis on which this article is based is available for download.

Full Listings

Display List

OpenGL

