The Client

The Client is a leading electronics brand that builds and supplies low-cost PCs, smartphones, tablets, and wearables. It is well established, present in more than 70 countries, and well regarded in emerging markets across Europe, the Middle East, Africa, and the CIS region. The Client had developed a version of its Android tablets aimed at children and wanted to add a voice assistant to it.

Service Offerings

Services

Architecture Design
Backend Development
Mobile App Development
Quality Assurance

Technology Stack

Kotlin
Houndify
Google Assistant API
Java
Spring Boot

Problem Statement

The client wanted to build a voice assistant application that would let children interact with the tablet by voice. The application should be triggered by a spoken wake word (like "Ok Google" or "Hey Siri"). It should work both online and offline and answer questions from a predefined library. If a question is not in the library and the tablet is online, the application should search Google and provide an answer. The question library will be updated regularly and pushed through software updates.
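
The offline-first behaviour described above can be illustrated with a minimal Java sketch. It is not the production code: the class name `OfflineFirstAnswers`, the `normalize` rule, and the `onlineFallback` function (a stand-in for whatever online search call is used) are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Offline-first lookup: consult the bundled library, then an optional online fallback. */
final class OfflineFirstAnswers {
    private final Map<String, String> library = new LinkedHashMap<>();
    // Hypothetical stand-in for the online search call; returns empty when nothing is found.
    private final Function<String, Optional<String>> onlineFallback;

    OfflineFirstAnswers(Map<String, String> entries,
                        Function<String, Optional<String>> onlineFallback) {
        // Store questions in normalized form so lookups tolerate small transcription differences.
        entries.forEach((question, answer) -> library.put(normalize(question), answer));
        this.onlineFallback = onlineFallback;
    }

    /** Lower-cases, replaces punctuation with spaces, and collapses whitespace. */
    static String normalize(String question) {
        return question.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9 ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    /** Returns the library answer if present; otherwise tries the fallback only when online. */
    Optional<String> answer(String question, boolean online) {
        String local = library.get(normalize(question));
        if (local != null) {
            return Optional.of(local);
        }
        return online ? onlineFallback.apply(question) : Optional.empty();
    }
}
```

Because the library is checked first, a question it contains is answered the same way online and offline; the fallback is only reached for questions outside the library, and only when a connection is available.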

Business Requirements

  • The application should wake up in response to the specified wake word (like "Hey Siri" or "Ok Google").
  • The application should work in both online and offline mode.
  • A predetermined library of questions will be stored in the application so that a child can ask them even when offline.
  • The application should recognize and process various English accents.
  • The application should answer questions or trigger actions such as opening apps or changing the volume.
  • If the child asks a question that is not in the library, the application should use the Google API to find an answer online.
  • The application should interact visually through emojis in addition to displaying results.
  • In case the question is not recognized, the application should probe further in an intelligent manner.
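
To make the "answer, act, or probe further" requirements concrete, here is a small Java sketch of how a recognised transcript might be routed. It is an illustration under assumptions, not the shipped logic: the class `CommandRouter`, the `Kind` values, and the trigger phrases ("open", "volume up", "louder", and so on) are hypothetical.

```java
import java.util.Locale;

/** Routes a recognised transcript to an action, a question lookup, or a clarifying prompt. */
final class CommandRouter {
    enum Kind { OPEN_APP, VOLUME_UP, VOLUME_DOWN, QUESTION, UNCLEAR }

    /** Routing result: what to do, plus the app name, question text, or prompt to speak. */
    static final class Routed {
        final Kind kind;
        final String payload;

        Routed(Kind kind, String payload) {
            this.kind = kind;
            this.payload = payload;
        }
    }

    static Routed route(String transcript) {
        String text = transcript == null ? "" : transcript.trim().toLowerCase(Locale.ROOT);
        if (text.isEmpty()) {
            // Nothing usable was recognised, so ask again instead of failing silently.
            return new Routed(Kind.UNCLEAR, "Sorry, could you say that again?");
        }
        if (text.startsWith("open ")) {
            // Everything after "open" is treated as the app name.
            return new Routed(Kind.OPEN_APP, text.substring("open ".length()).trim());
        }
        if (text.contains("volume up") || text.contains("louder")) {
            return new Routed(Kind.VOLUME_UP, "");
        }
        if (text.contains("volume down") || text.contains("quieter")) {
            return new Routed(Kind.VOLUME_DOWN, "");
        }
        // Anything else is treated as a question for the answer library.
        return new Routed(Kind.QUESTION, text);
    }
}
```

For example, `CommandRouter.route("Open Camera")` yields `OPEN_APP` with the payload `camera`, while an empty transcript yields `UNCLEAR` with a prompt the assistant can speak back to the child.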

Our Solution

We set up an agile product team to execute this project. The team consisted of an Architect, a Tech Lead, Backend Developers, Android Developers, and Quality Analysts. We worked in bi-weekly sprints to demonstrate continuous progress and iterate quickly on feedback.

The application is based on Natural Language Processing (NLP), Automatic Speech Recognition (ASR), and Text to Speech (TTS). It converts the child's speech input into text before processing it. The result is then converted to voice through TTS and played back. The voice of the application was adapted from the existing Google voice libraries.
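
The speech-to-text, processing, and text-to-speech flow can be sketched as three pluggable stages. This is a simplified Java illustration, assuming each stage can be modelled as a plain function; `VoicePipeline` and its field names are hypothetical, and the real ASR and TTS engines are not shown.

```java
import java.util.function.Function;

/** Chains the three stages: audio in -> transcript -> reply text -> audio out. */
final class VoicePipeline {
    private final Function<byte[], String> speechToText;   // ASR stage (engine not shown)
    private final Function<String, String> answerQuestion; // NLP / library lookup stage
    private final Function<String, byte[]> textToSpeech;   // TTS stage (engine not shown)

    VoicePipeline(Function<byte[], String> speechToText,
                  Function<String, String> answerQuestion,
                  Function<String, byte[]> textToSpeech) {
        this.speechToText = speechToText;
        this.answerQuestion = answerQuestion;
        this.textToSpeech = textToSpeech;
    }

    /** Runs one request through all three stages and returns the audio to play back. */
    byte[] handle(byte[] audioIn) {
        String transcript = speechToText.apply(audioIn);
        String reply = answerQuestion.apply(transcript);
        return textToSpeech.apply(reply);
    }
}
```

Keeping the stages behind simple function types means each one can be replaced with a stub in tests, or swapped for a different engine, without touching the other two.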

  • We defined wake words: specific words or phrases that wake the tablet and launch the voice assistant application. The wake word was chosen after comparing recognition results when it was spoken in different accents.
  • The principal underlying technologies were Natural Language Processing (NLP) and Automatic Speech Recognition (ASR).
  • The questions were designed to trigger either a simple spoken response or an action.
  • We used the Google Assistant API as a fallback option when answering questions in online mode.
  • We decided to use face emoticons to provide visual feedback in the application.
  • We also incorporated a question log to track questions which the application was unable to answer. These questions will help the client update the question library in the future.
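
As an illustration of the question log, the following Java sketch counts unanswered questions and lists the most frequent ones, which is the kind of output that could guide library updates. The class `UnansweredQuestionLog`, its in-memory storage, and the lower-casing rule are assumptions; how the real log is stored and reported is not described here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Counts questions the assistant could not answer so the library can be extended later. */
final class UnansweredQuestionLog {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Records one unanswered question; blank input is ignored. */
    void record(String question) {
        String key = question.trim().toLowerCase(Locale.ROOT);
        if (!key.isEmpty()) {
            counts.merge(key, 1, Integer::sum);
        }
    }

    /** Returns up to {@code limit} questions, most frequently asked first (ties in alphabetical order). */
    List<String> mostAsked(int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (result.size() >= limit) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }
}
```

Ranking by frequency puts the questions children ask most often at the top of the list, so they can be considered first when the library is next updated.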

Challenges

  • The client had multiple versions of the tablets with different hardware configurations. They were also using custom versions of Android OS.
  • The application had to be integrated with the target hardware and locked so that it would not run on any other hardware.
  • Since these were low-cost tablets, the hardware specifications were modest and did not offer high-end performance.

Results/Outcome

  • We developed the application for one specific hardware version. We decided to expand it to other hardware versions after extensive trials.
  • The application was designed to support Android 4.4 and above.
  • We developed the first version of the application, which supported different global English accents including US, AU, CA, GB, SA, IN & UK.
  • We incorporated a child's-voice TTS to answer the questions. This was done to make the application more relatable to children, who were the target audience. We evaluated several voice samples before finalising one.
  • We created an initial library of 200 questions that are available in offline mode.
  • The performance of the app was fine-tuned to provide offline responses in under 2 seconds.
