
How to Use Google’s ML Kit to Enhance Pepper With AI (Part 5)

9 min read

Welcome to the fifth and final part of this blog series, “How to Use Google’s ML Kit to Enhance Pepper With AI”! In this part, we are going to see how to leverage Google’s ML Kit Translation API to give Pepper translation capabilities.

In case you missed the previous articles, I recommend you start reading here for an introduction to what we are building in this series.

In previous articles, we saw several ways to enhance Pepper’s abilities by integrating ML Kit in our apps to build some cool new features, e.g. recognizing the objects around it, playing a game where it recognizes what we draw on its tablet, or reading text aloud. Before we start with the implementation, as usual, let’s look at an example video to see what we would like to achieve.

Text Translation

With this demo, you can ask Pepper to translate a word or a sentence between any pair of the languages available on your robot. Pepper will respond by uttering the translation in the target language by means of the TextToSpeech Android library. The translation is powered by ML Kit’s on-device Translation API, which uses the same models as the Google Translate app’s offline mode. Google warns that this on-device translation is intended for casual, simple translations only, as it does not offer the same quality as the Cloud Translation API; in particular, it should not be used for long texts. That is not a problem here: the translations in our demo are short and uncomplicated, so this is more than enough and the results are satisfying.

Usage guidelines

Before using this Google product in your application, make sure to refer first to the Guidelines page for important guidelines and restrictions on the usage of this API, as it must comply with the Google Cloud Translation API attribution requirements. These requirements include guidelines on how the app must handle layout, Google attribution, and branding.


Here you can find the full code of the application we’re building throughout this series.

The structure of this demo is very similar to those of our Object Detection Demo and our Text Recognition Demo. We have a fragment, a ViewModel, and a TextTranslator, where we interact with the API. The difference is that, after initialization, nothing needs to be updated on the screen or kept running in the background except the speech recognition engine; the robot simply waits to be asked to translate something, and all actions are triggered from our dialog, the QiChat topic file.

How to recognize speech

When the user asks any of the variants of the question “how do you say [text] in German/Spanish/…?” defined in the dialog topic file, a bookmark is reached. This notifies the activity, which in turn calls the corresponding method in the fragment.
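As a rough sketch, the activity could listen for that bookmark as follows. The bookmark name and the fragment method onTranslationRequested() are hypothetical; only QiChatbot.bookmarkReachedSignal is QiSDK API:

```kotlin
// Sketch: reacting to the translation bookmark in the activity.
qiChatbot.bookmarkReachedSignal.connect { bookmark ->
    if (bookmark.name == "translate_request") {
        // Hand over to the fragment currently on screen (hypothetical method).
        runOnUiThread { translationFragment?.onTranslationRequested() }
    }
}
```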

While the translation happens completely offline, recognizing free speech, i.e., any word or phrase without it being predefined, requires Pepper’s standard remote speech recognition engine, which is based on Nuance’s technology. This cloud-based recognition is not needed when we only translate and respond to words and sentences we know in advance and can list and hardcode in the dialog topic file written in the QiChat language. If restrictions applied to the development of an application such that we were not allowed to send speech excerpts to this cloud, the alternatives would be either working with the mentioned offline variant, which limits recognition to predefined sentences, or replacing Pepper’s standard speech recognition with our own. Replacing this whole system would be possible, although arduous: it detects speech in progress as well as its end and performs the speech-to-text conversion for us, providing the heard text directly to the chatbots that are running and listening for input, as explained in the introductory article of this series.

How to parse the question

We will make use of the free speech recognition function to be able to translate any text. In the topic file, we signal that we want to recognize free speech by using the symbol “*” as a wildcard for the part of the utterance that corresponds to the text to be translated. For the target language, we choose from the list of available languages. We mark both with “_” in front of them so they are stored in the temporary variables “$1” and “$2” respectively, and then we store them in named variables, as in previous examples, by using the “$” symbol before the name we gave them in the code.
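A minimal topic rule following this pattern could look like the sketch below. The concept, variable, and bookmark names are illustrative, not the exact ones from the sample app:

```
topic: ~translator()

concept:(languages) [english german spanish]

u:(how do you say _* in _~languages)
    $text=$1 $targetLanguage=$2
    %translate_request
    Let me translate that for you.
```

Here “_*” captures the free-speech part into “$1”, “_~languages” captures the matched language into “$2”, and reaching “%translate_request” notifies the application code.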

We programmed our application to support English, German, and Spanish, but the list could be extended with as many languages as you want to support.

As mentioned earlier, once the extracted information is stored in variables and the bookmark is reached, the activity is notified and triggers the translation in the fragment, provided the fragment is currently being displayed.

How to translate

Once the question has been processed, we use the information we stored in the variables to complete the task: the text and the target language. If the target language is one of those installed on the robot, we initiate the translation. Although the translator supports more than 50 languages, it might be confusing for the user if Pepper could translate into languages it is not able to speak (because they are not installed on the robot), so in this case we stick to the three languages supported in our app.

Our TextTranslator, called by the ViewModel, is very straightforward, since the Translation API itself is simple. We only need to create the options with the source and target languages, a client, and, optionally, the download conditions, and we are ready to translate. We use a simple map to convert between the language nomenclatures used by the different libraries.
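The core of such a translator could be sketched as follows. The function and map names are assumptions for illustration; the ML Kit calls (TranslatorOptions, Translation.getClient, downloadModelIfNeeded, translate) are the actual API:

```kotlin
import com.google.mlkit.common.model.DownloadConditions
import com.google.mlkit.nl.translate.TranslateLanguage
import com.google.mlkit.nl.translate.Translation
import com.google.mlkit.nl.translate.TranslatorOptions

// Map from the language names used in the dialog to ML Kit language codes.
val languageCodes = mapOf(
    "english" to TranslateLanguage.ENGLISH,
    "german" to TranslateLanguage.GERMAN,
    "spanish" to TranslateLanguage.SPANISH
)

fun translate(text: String, source: String, target: String, onResult: (String) -> Unit) {
    val options = TranslatorOptions.Builder()
        .setSourceLanguage(languageCodes.getValue(source))
        .setTargetLanguage(languageCodes.getValue(target))
        .build()
    val translator = Translation.getClient(options)
    // Only download the (roughly 30 MB) model over Wi-Fi.
    val conditions = DownloadConditions.Builder().requireWifi().build()
    translator.downloadModelIfNeeded(conditions)
        .onSuccessTask { translator.translate(text) }
        .addOnSuccessListener { translated -> onResult(translated) }
        .addOnFailureListener { /* model not available or translation failed */ }
}
```

Note that downloadModelIfNeeded is a no-op once the model is on the device, so the first translation may take noticeably longer than subsequent ones.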

When the results are ready, we display them in text form on the screen and use them for the spoken reply.

How to utter the translation

To reply via voice, once we have the translation, we jump to another bookmark in the dialog topic. We do that by first setting the variable with the text and then going to the bookmark as follows:
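In code, this could look roughly like the sketch below. The variable and bookmark names are illustrative; setValue and goToBookmark are QiSDK API:

```kotlin
// Sketch: hand the translated text back to the dialog.
val translationVariable = qiChatbot.variable("translation")
translationVariable.async.setValue(translatedText)
// Jump to the bookmark that makes Pepper reply.
qiChatbot.goToBookmark(
    bookmarks.getValue("say_translation"),
    AutonomousReactionImportance.HIGH,
    AutonomousReactionValidity.IMMEDIATE
)
```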

In the topic, Pepper starts by replying to the initial question. Then, instead of uttering the translation in a foreign language with the wrong pronunciation, we make use of yet another bookmark to jump back into the code and pronounce the translated text with a speech engine set to the target language.

The activity will do so by triggering a call to pronounceTranslation. This method uses the TextToSpeech class from Android’s speech library to utter the text with the right pronunciation in the selected language. This library synthesizes speech from text for immediate playback, and it can also create a sound file.
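A minimal version of such a method could look like this, assuming a TextToSpeech instance that has already been initialized; setLanguage (via the language property) and speak are the standard Android API:

```kotlin
import android.speech.tts.TextToSpeech
import java.util.Locale

fun pronounceTranslation(tts: TextToSpeech, text: String, locale: Locale) {
    // Switch the engine to the target language, e.g. Locale.GERMAN.
    tts.language = locale
    // QUEUE_FLUSH drops anything still queued; the last argument is an utterance id.
    tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "translation")
}
```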

If the request was to translate to a language other than those supported, Pepper will politely respond that the requested language is not available.


That is it! That is how we can let Pepper translate something into another language.

A social robot like Pepper can benefit greatly from translating capabilities. For instance, when acting as a helper or companion, it can be of great help for the user to have this feature available and accessible anytime through natural language.

I hope you enjoyed the implementation of this demo! Check out the other articles of this series, where we look at more use cases and how to implement them in our ML Kit-powered Android app for the Pepper robot!


  1. Introduction
  2. Demo with ML Kit’s Object Detection API
  3. Demo with ML Kit’s Digital Ink Recognition API
  4. Demo with ML Kit’s Text Recognition API
  5. Demo with ML Kit’s Translation API (this article)

One thought on “How to Use Google’s ML Kit to Enhance Pepper With AI (Part 5)”

  1. Overall a very good tutorial. I am currently using the DeepL API for translations in a project with Pepper. The actual translation happens then online, but that also works very well 🙂
