Text-To-Speech and Speech Recognition in CSCS
“Think before you speak. Read before you think.”
Fran Lebowitz
In this chapter, we are going to look at some non-GUI features, namely text-to-speech and speech recognition, for iOS and Android. We will see that it is very easy to add these features to CSCS as well.
Text-to-speech in CSCS
Code Listing 43 contains a fragment of the text-to-speech implementation for Android. Some non-essential details are not shown.
Code Listing 43: A Fragment of the TTS Implementation for Android

public class TTS : Java.Lang.Object, TextToSpeech.IOnInitListener,
                   IDisposable
{
    public static bool Sound { set; get; } = true;
    public static float SpeechRate { set; get; } = 1.0f;
    public static float Volume { set; get; } = 0.7f;
    public static float PitchMultiplier { set; get; } = 1.0f;
    public static string Voice { set; get; } = "en-US";

    static Context m_context;
    TextToSpeech m_textToSpeech;
    Java.Util.Locale m_language;

    public TTS(string voice)
    {
        m_language = UtilsDroid.LocaleFromString(voice, false);
        m_textToSpeech = new TextToSpeech(Application.Context,
            this, "com.google.android.tts");
        m_textToSpeech.SetLanguage(m_language);
    }

    public void Speak(string text, bool force = false)
    {
        if (!Sound && !force) {
            return;
        }
        m_textToSpeech.Stop();
        m_textToSpeech.SetLanguage(m_language);
        m_textToSpeech.SetSpeechRate(SpeechRate);
        m_textToSpeech.SetPitch(PitchMultiplier);

        if (Build.VERSION.SdkInt >= BuildVersionCodes.Lollipop) { // API 21
            string utteranceId = text.GetHashCode() + "";
            m_textToSpeech.Speak(text, QueueMode.Flush, null, utteranceId);
        } else {
            m_textToSpeech.Speak(text, QueueMode.Flush, null);
        }
    }

    void TextToSpeech.IOnInitListener.OnInit(OperationResult status)
    {
        if (!Sound) {
            return;
        }
        if (status == OperationResult.Error) {
            m_textToSpeech.SetLanguage(Java.Util.Locale.Default);
            return;
        }
        m_textToSpeech.SetLanguage(m_language);
    }

    public static void OnTTSResult(int req, Result res, Intent data)
    {
        if (req == TTS_CHECK_DATA || req == TTS_INSTALLED_DATA) {
            IList<string> availableLanguages = data.GetStringArrayListExtra(
                TextToSpeech.Engine.ExtraAvailableVoices);
            IList<string> unavailableLanguages = data.GetStringArrayListExtra(
                TextToSpeech.Engine.ExtraUnavailableVoices);

            if (availableLanguages.Count <= 1 && m_context != null) {
                var installTTS = new Intent();
                installTTS.SetAction(TextToSpeech.Engine.ActionInstallTtsData);
                m_context.StartActivity(installTTS);
            }
        }
    }
}
To register the text-to-speech functionality with the parser, we use the following calls:
ParserFunction.RegisterFunction("InitTTS", new InitTTSFunction());
ParserFunction.RegisterFunction("Speak", new SpeakFunction());
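The InitTTSFunction class is not shown in this chapter. A plausible sketch, assuming its only job is to set the voice and warm up the speech engine so that the first Speak() call starts faster, could look like this (the TTS.Init() helper is hypothetical):

public class InitTTSFunction : ParserFunction
{
    protected override Variable Evaluate(ParsingScript script)
    {
        // A sketch only: this body and the TTS.Init() helper are
        // assumptions, not the chapter's actual code.
        List<Variable> args = script.GetFunctionArgs();
        TTS.Voice = Utils.GetSafeString(args, 0, TTS.Voice);
        TTS.Init(TTS.Voice); // hypothetical helper that creates the engine
        return Variable.EmptyInstance;
    }
}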
The implementation of the glue class SpeakFunction is shown in Code Listing 44.
Code Listing 44: The Implementation of the SpeakFunction Class

public class SpeakFunction : ParserFunction
{
    protected override Variable Evaluate(ParsingScript script)
    {
        List<Variable> args = script.GetFunctionArgs();
        Utils.CheckArgs(args.Count, 1, m_name);

        string phrase = args[0].AsString();
        TTS.Voice = Utils.GetSafeString(args, 1, TTS.Voice);
        bool force = Utils.GetSafeInt(args, 2) != 0;

        TTS tts = new TTS(TTS.Voice);
        tts.Speak(phrase, force);
        return Variable.EmptyInstance;
    }
}
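From a CSCS script, the registered function can then be called with one to three arguments, the last two being optional:

Speak("Hello, World!");          // default voice, respects the Sound flag
Speak("Hallo, Welt!", "de-DE");  // explicit voice
Speak("Wake up!", "en-US", 1);   // forced, even when sound is off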
Speech recognition in CSCS
For the speech recognition functionality, the app requires certain permissions. On iOS, you have to add the NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription keys to your Info.plist file. Figure 24 shows how this looks in Visual Studio.
Figure 24: Speech Recognition in Info.plist on iOS
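In the raw Info.plist XML, the two entries look as follows; the description strings here are placeholders, and you should replace them with text explaining to your users why the app needs microphone and speech access:

<key>NSSpeechRecognitionUsageDescription</key>
<string>Speech recognition is used to convert your voice to text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>The microphone is used to record your speech.</string>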
On Android, I use the SpeechRecognizer class for speech recognition. The following permission must be added to your AndroidManifest.xml file:

<uses-permission android:name="android.permission.RECORD_AUDIO" />
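The Android implementation itself is not listed in this chapter. The following is a minimal sketch of how the SpeechRecognizer class is typically driven from Xamarin.Android; the STTDroid class name and its OnSpeechOK/OnSpeechError callbacks are my assumptions, chosen to mirror the iOS STT class shown in Code Listing 45:

using System;
using Android.Content;
using Android.OS;
using Android.Runtime;
using Android.Speech;

// A sketch only: the class name and callbacks are assumptions.
public class STTDroid : Java.Lang.Object, IRecognitionListener
{
    public Action<string> OnSpeechOK;
    public Action<string> OnSpeechError;

    SpeechRecognizer m_recognizer;

    public void StartRecording(Context context, string voice = "en-US")
    {
        m_recognizer = SpeechRecognizer.CreateSpeechRecognizer(context);
        m_recognizer.SetRecognitionListener(this);

        var intent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
        intent.PutExtra(RecognizerIntent.ExtraLanguageModel,
                        RecognizerIntent.LanguageModelFreeForm);
        intent.PutExtra(RecognizerIntent.ExtraLanguage, voice);
        intent.PutExtra(RecognizerIntent.ExtraPartialResults, true);
        m_recognizer.StartListening(intent);
    }

    public void OnResults(Bundle results)
    {
        var matches = results.GetStringArrayList(
            SpeechRecognizer.ResultsRecognition);
        if (matches != null && matches.Count > 0) {
            OnSpeechOK?.Invoke(matches[0]);
        }
    }

    public void OnError([GeneratedEnum] SpeechRecognizerError error)
    {
        OnSpeechError?.Invoke(error.ToString());
    }

    // The remaining IRecognitionListener callbacks are not needed here.
    public void OnBeginningOfSpeech() { }
    public void OnBufferReceived(byte[] buffer) { }
    public void OnEndOfSpeech() { }
    public void OnEvent(int eventType, Bundle @params) { }
    public void OnPartialResults(Bundle partialResults) { }
    public void OnReadyForSpeech(Bundle @params) { }
    public void OnRmsChanged(float rmsdB) { }
}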
The essential details of the speech recognition on iOS are shown in Code Listing 45.

Code Listing 45: A Fragment of the Speech Recognition Implementation for iOS

public class STT
{
    public Action<string> OnSpeechOK;
    public Action<string> OnSpeechError;
    static public string Voice { set; get; } = "en-US";

    static UIViewController m_controller;
    static AVAudioEngine AudioEngine;
    static SFSpeechRecognizer SpeechRecognizer;
    static SFSpeechAudioBufferRecognitionRequest LiveSpeechRequest;
    static SFSpeechRecognitionTask RecognitionTask;
    static System.Timers.Timer m_speechTimer;
    static int m_timeout;
    static DateTime m_lastSpeech;
    static DateTime m_startSpeech;

    public static bool IsCancelled { get; set; }
    public static bool IsRecording { get; set; }
    public static bool SpeechEnabled { get; set; }
    public string LastResult { get; set; }

    public static void RequestAuthorization()
    {
        SFSpeechRecognizer.RequestAuthorization(
            (SFSpeechRecognizerAuthorizationStatus status) => {
                switch (status) {
                    case SFSpeechRecognizerAuthorizationStatus.Authorized:
                        SpeechEnabled = true;
                        break;
                    case SFSpeechRecognizerAuthorizationStatus.Denied:
                        SpeechEnabled = false;
                        break;
                    case SFSpeechRecognizerAuthorizationStatus.NotDetermined:
                        SpeechEnabled = false;
                        break;
                    case SFSpeechRecognizerAuthorizationStatus.Restricted:
                        SpeechEnabled = false;
                        break;
                }
            });
    }

    public void StartRecording(string voice, bool longTimeout = false)
    {
        if (!SpeechEnabled) {
            return;
        }
        AudioEngine = new AVAudioEngine();
        NSLocale voiceLocale = NSLocale.FromLocaleIdentifier(voice);
        SpeechRecognizer = new SFSpeechRecognizer(voiceLocale);
        LiveSpeechRequest = new SFSpeechAudioBufferRecognitionRequest();

        NSError error;
        var audioSession = AVAudioSession.SharedInstance();
        audioSession.SetCategory(AVAudioSessionCategory.Record);
        audioSession.SetMode(AVAudioSession.ModeMeasurement, out error);
        if (error != null) {
            OnSpeechError?.Invoke("Audio session error: " + error.ToString());
            return;
        }
        audioSession.SetActive(true,
            AVAudioSessionSetActiveOptions.NotifyOthersOnDeactivation);

        LiveSpeechRequest.ShouldReportPartialResults = true;

        var node = AudioEngine.InputNode;
        if (node == null) {
            OnSpeechError?.Invoke("Couldn't initialize Speech Input");
            return;
        }

        RecognitionTask = SpeechRecognizer.GetRecognitionTask(
            LiveSpeechRequest,
            (SFSpeechRecognitionResult result, NSError err) => {
                if (IsCancelled) {
                    node.RemoveTapOnBus(0);
                    return;
                }
                if (err != null) {
                    OnSpeechError?.Invoke(err.ToString());
                } else if (result != null) {
                    LastResult = result.BestTranscription.FormattedString;
                    m_lastSpeech = DateTime.Now;
                    if (result.Final) {
                        OnSpeechOK?.Invoke(LastResult);
                    }
                }
                if ((result != null && result.Final) || err != null) {
                    IsRecording = false;
                    AudioEngine.Stop();
                    m_speechTimer.Close();
                }
            });

        var recordingFormat = node.GetBusOutputFormat(0);
        node.InstallTapOnBus(0, 1024, recordingFormat,
            (AVAudioPcmBuffer buffer, AVAudioTime when) => {
                LiveSpeechRequest.Append(buffer);
            });

        // Start recording.
        AudioEngine.Prepare();
        AudioEngine.StartAndReturnError(out error);
        if (error != null) {
            OnSpeechError?.Invoke("Speech init error: " + error.ToString());
            IsRecording = false;
            return;
        }

        IsRecording = true;
        IsCancelled = false;
        LastResult = "";
        m_lastSpeech = DateTime.MaxValue;
        m_startSpeech = DateTime.Now;
        m_timeout = longTimeout ? m_phraseTimeout : m_wordTimeout;

        m_speechTimer = new System.Timers.Timer(250);
        m_speechTimer.AutoReset = true;
        m_speechTimer.Elapsed += (sender, e) => {
            CheckRecording();
        };
        m_speechTimer.Start();
    }
}
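The m_wordTimeout and m_phraseTimeout fields and the CheckRecording() method are referenced in the listing but not shown. A plausible sketch, assuming the timer's only job is to end recognition after a period of silence, could look like this; the timeout values and the exact stop logic are assumptions:

// A sketch of the pieces Code Listing 45 references but does not show.
static int m_wordTimeout   = 2; // seconds of silence after a single word
static int m_phraseTimeout = 4; // seconds of silence after a longer phrase

static void CheckRecording()
{
    if (!IsRecording) {
        return;
    }
    // Measure silence from the last partial result, or from the start
    // of the recording if nothing has been recognized yet.
    DateTime reference = m_lastSpeech == DateTime.MaxValue ?
                         m_startSpeech : m_lastSpeech;
    if ((DateTime.Now - reference).TotalSeconds >= m_timeout) {
        LiveSpeechRequest.EndAudio(); // lets the final result be delivered
        AudioEngine.Stop();
        m_speechTimer.Stop();
    }
}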
The user is explicitly asked for the speech recognition authorization; this is handled in the RequestAuthorization() method.
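A natural place to trigger this request is application startup, so that SpeechEnabled is already set by the time the user taps a microphone button; for example (a sketch, the exact call site is up to you):

// A sketch: request speech-recognition authorization once, early in the
// app lifecycle, e.g., from FinishedLaunching in the iOS AppDelegate.
STT.RequestAuthorization();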
We hook everything up with the parser by using the following statement:
ParserFunction.RegisterFunction("VoiceRecognition", new VoiceFunction());
The implementation of the VoiceFunction class is in Code Listing 46. Note that the speech recognition functionality is available only starting with iOS 10.0, so there is a corresponding version check. The same check can be performed from the CSCS code as well.
Code Listing 46: The Implementation of the VoiceFunction Class

public class VoiceFunction : ParserFunction
{
    static STT m_speech;

    protected override Variable Evaluate(ParsingScript script)
    {
        List<Variable> args = script.GetFunctionArgs();
        Utils.CheckArgs(args.Count, 1, m_name);

        string strAction = args[0].AsString();
        STT.Voice = Utils.GetSafeString(args, 1, STT.Voice).Replace('_', '-');

        bool speechEnabled = UIDevice.CurrentDevice.CheckSystemVersion(10, 0);
        if (!speechEnabled) {
            UIVariable.GetAction(strAction, "\"" + string.Format(
                "Speech recognition requires iOS 10.0 or higher. You have iOS {0}",
                UIDevice.CurrentDevice.SystemVersion) + "\"", "");
            return Variable.EmptyInstance;
        }

        UIViewController controller = AppDelegate.GetCurrentController();
        m_speech = new STT(controller);

        m_speech.OnSpeechError += (errorStr) => {
            controller.InvokeOnMainThread(() => {
                UIVariable.GetAction(strAction, "\"" + errorStr + "\"", "");
            });
        };
        m_speech.OnSpeechOK += (recognized) => {
            controller.InvokeOnMainThread(() => {
                UIVariable.GetAction(strAction, "", "\"" + recognized + "\"");
            });
        };

        m_speech.StartRecording(STT.Voice);
        return Variable.EmptyInstance;
    }
}
An example of text-to-speech and speech recognition
Code Listing 47 contains a CSCS example for both text-to-speech and speech recognition.
Code Listing 47: Text-to-Speech and Speech Recognition in CSCS

AutoScale();
SetBackground("scream");
voice = "en-US";

locButtonTalk = GetLocation("ROOT", "CENTER", "ROOT", "BOTTOM", 0, -10);
AddButton(locButtonTalk, "buttonTalk", "", 200, 200);
AddAction(buttonTalk, "talk_click");
SetImage(buttonTalk, "microphone");

function talk_click(sender, arg) {
  ShowToast("Start talking...", 5, "white", "black");
  VoiceRecognition("voice_recog", voice);
}

function voice_recog(errorStatus, recognized) {
  if (errorStatus != "") {
    AlertDialog("CSCS", "Error: " + errorStatus);
  } else {
    ShowToast("Recognized: " + recognized, 5, "blue", "green");
    Speak("You said: " + recognized, voice);
  }
}
The ShowToast() CSCS function shows a native Toast on Android and a custom-made Toast on iOS (visually very similar to the Android one). The AlertDialog() CSCS function invokes the corresponding native dialogs on iOS and Android.
The result of running the CSCS code from Code Listing 47 is shown in Figure 25. On iOS, the phrase "Hello, World!" is recognized successfully: a Toast is shown, and the recognized phrase is spoken back via text-to-speech.

On Android, the speech recognition fails because it is not straightforward to set up the microphone on Android emulators. On real devices, everything works fine.
Figure 25: Running the Speech Example on iOS and Android
Summary
In this chapter, we saw how to add non-GUI functionality to CSCS. The principle remains the same: implement a class deriving from the ParserFunction class and override its Evaluate() method.

This method serves as a wrapper for all of the functionality that you want to add to CSCS. Don't forget to register the newly created class with the CSCS parser.
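For reference, the whole pattern fits in a few lines; this is a minimal sketch, and the MyFeature name and its behavior are placeholders rather than part of the chapter's code:

// A minimal sketch of the extension pattern.
public class MyFeatureFunction : ParserFunction
{
    protected override Variable Evaluate(ParsingScript script)
    {
        List<Variable> args = script.GetFunctionArgs();
        Utils.CheckArgs(args.Count, 1, m_name);
        string arg = args[0].AsString();

        // Call into the native functionality here.
        return new Variable(arg.ToUpper());
    }
}

// Don't forget the registration with the parser:
ParserFunction.RegisterFunction("MyFeature", new MyFeatureFunction());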
In the next chapter, we are going to learn how to add yet another useful feature to your app: In-App Purchases on iOS, and In-App Billing on Android.