Text-To-Speech and Speech Recognition in CSCS
“Think before you speak. Read before you think.”
Fran Lebowitz
In this chapter, we are going to look at some non-GUI features, namely text-to-speech and speech recognition, for iOS and Android. We will see that it is very easy to add these features to CSCS as well.
Text-to-speech in CSCS
Code Listing 43 contains a fragment of the text-to-speech implementation for Android. Some non-essential details are not shown.
Code Listing 43: A Fragment of the TTS Implementation for Android

public class TTS : Java.Lang.Object, TextToSpeech.IOnInitListener,
                   IDisposable
{
    public static bool Sound { set; get; } = true;
    public static float SpeechRate { set; get; } = 1.0f;
    public static float Volume { set; get; } = 0.7f;
    public static float PitchMultiplier { set; get; } = 1.0f;
    public static string Voice { set; get; } = "en-US";

    static Context m_context;
    TextToSpeech m_textToSpeech;
    Java.Util.Locale m_language;

    public TTS(string voice)
    {
        m_language = UtilsDroid.LocaleFromString(voice, false);
        m_textToSpeech = new TextToSpeech(Application.Context,
            this, "com.google.android.tts");
        m_textToSpeech.SetLanguage(m_language);
    }

    public void Speak(string text, bool force = false)
    {
        if (!Sound && !force) {
            return;
        }
        m_textToSpeech.Stop();
        m_textToSpeech.SetLanguage(m_language);
        m_textToSpeech.SetSpeechRate(SpeechRate);
        m_textToSpeech.SetPitch(PitchMultiplier);

        if (Build.VERSION.SdkInt >= BuildVersionCodes.Lollipop) { // API 21
            string utteranceId = text.GetHashCode() + "";
            m_textToSpeech.Speak(text, QueueMode.Flush, null, utteranceId);
        } else {
            m_textToSpeech.Speak(text, QueueMode.Flush, null);
        }
    }

    void TextToSpeech.IOnInitListener.OnInit(OperationResult status)
    {
        if (!Sound) {
            return;
        }
        if (status == OperationResult.Error) {
            m_textToSpeech.SetLanguage(Java.Util.Locale.Default);
            return;
        }
        m_textToSpeech.SetLanguage(m_language);
    }

    public static void OnTTSResult(int req, Result res, Intent data)
    {
        if (req == TTS_CHECK_DATA || req == TTS_INSTALLED_DATA) {
            IList<string> availableLanguages = data.GetStringArrayListExtra(
                TextToSpeech.Engine.ExtraAvailableVoices);
            IList<string> unavailableLanguages = data.GetStringArrayListExtra(
                TextToSpeech.Engine.ExtraUnavailableVoices);

            if (availableLanguages.Count <= 1 && m_context != null) {
                var installTTS = new Intent();
                installTTS.SetAction(TextToSpeech.Engine.ActionInstallTtsData);
                m_context.StartActivity(installTTS);
            }
        }
    }
}
To register the text-to-speech functionality with the parser, we use the following calls:
ParserFunction.RegisterFunction("InitTTS", new InitTTSFunction());
ParserFunction.RegisterFunction("Speak", new SpeakFunction());
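The InitTTSFunction class is not shown in this chapter. A plausible sketch, assuming its only job is to set the voice and warm up the speech engine so that the first Speak() call starts faster, could look like this (the TTS.Init() helper is hypothetical):

public class InitTTSFunction : ParserFunction
{
    protected override Variable Evaluate(ParsingScript script)
    {
        // A sketch only: this body and the TTS.Init() helper are
        // assumptions, not the chapter's actual code.
        List<Variable> args = script.GetFunctionArgs();
        TTS.Voice = Utils.GetSafeString(args, 0, TTS.Voice);
        TTS.Init(TTS.Voice); // hypothetical helper that creates the engine
        return Variable.EmptyInstance;
    }
}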
The implementation of the glue class SpeakFunction is shown in Code Listing 44.
Code Listing 44: The Implementation of the SpeakFunction Class

public class SpeakFunction : ParserFunction
{
    protected override Variable Evaluate(ParsingScript script)
    {
        List<Variable> args = script.GetFunctionArgs();
        Utils.CheckArgs(args.Count, 1, m_name);

        string phrase = args[0].AsString();
        TTS.Voice = Utils.GetSafeString(args, 1, TTS.Voice);
        bool force = Utils.GetSafeInt(args, 2) != 0;

        TTS tts = new TTS(TTS.Voice);
        tts.Speak(phrase, force);
        return Variable.EmptyInstance;
    }
}
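From a CSCS script, the registered function can then be called with one to three arguments, the last two being optional:

Speak("Hello, World!");          // default voice, respects the Sound flag
Speak("Hallo, Welt!", "de-DE");  // explicit voice
Speak("Wake up!", "en-US", 1);   // forced, even when sound is off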
Speech recognition in CSCS
For the speech recognition functionality, the app requires certain permissions. On iOS, you have to add the NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription keys to your Info.plist file. Figure 24 shows how this looks in Visual Studio.
Figure 24: Speech Recognition in Info.plist on iOS
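In the raw Info.plist XML, the two entries look as follows; the description strings here are placeholders, and you should replace them with text explaining to your users why the app needs microphone and speech access:

<key>NSSpeechRecognitionUsageDescription</key>
<string>Speech recognition is used to convert your voice to text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>The microphone is used to record your speech.</string>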
On Android, I use the SpeechRecognizer class for speech recognition. The following permission must be added to your AndroidManifest.xml file:

<uses-permission android:name="android.permission.RECORD_AUDIO" />
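The Android implementation itself is not listed in this chapter. The following is a minimal sketch of how the SpeechRecognizer class is typically driven from Xamarin.Android; the STTDroid class name and its OnSpeechOK/OnSpeechError callbacks are my assumptions, chosen to mirror the iOS STT class shown in Code Listing 45:

using System;
using Android.Content;
using Android.OS;
using Android.Runtime;
using Android.Speech;

// A sketch only: the class name and callbacks are assumptions.
public class STTDroid : Java.Lang.Object, IRecognitionListener
{
    public Action<string> OnSpeechOK;
    public Action<string> OnSpeechError;

    SpeechRecognizer m_recognizer;

    public void StartRecording(Context context, string voice = "en-US")
    {
        m_recognizer = SpeechRecognizer.CreateSpeechRecognizer(context);
        m_recognizer.SetRecognitionListener(this);

        var intent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
        intent.PutExtra(RecognizerIntent.ExtraLanguageModel,
                        RecognizerIntent.LanguageModelFreeForm);
        intent.PutExtra(RecognizerIntent.ExtraLanguage, voice);
        intent.PutExtra(RecognizerIntent.ExtraPartialResults, true);
        m_recognizer.StartListening(intent);
    }

    public void OnResults(Bundle results)
    {
        var matches = results.GetStringArrayList(
            SpeechRecognizer.ResultsRecognition);
        if (matches != null && matches.Count > 0) {
            OnSpeechOK?.Invoke(matches[0]);
        }
    }

    public void OnError([GeneratedEnum] SpeechRecognizerError error)
    {
        OnSpeechError?.Invoke(error.ToString());
    }

    // The remaining IRecognitionListener callbacks are not needed here.
    public void OnBeginningOfSpeech() { }
    public void OnBufferReceived(byte[] buffer) { }
    public void OnEndOfSpeech() { }
    public void OnEvent(int eventType, Bundle @params) { }
    public void OnPartialResults(Bundle partialResults) { }
    public void OnReadyForSpeech(Bundle @params) { }
    public void OnRmsChanged(float rmsdB) { }
}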
The essential details of the speech recognition on iOS are shown in Code Listing 45.

Code Listing 45: A Fragment of the Speech Recognition Implementation for iOS

public class STT
{
    public Action<string> OnSpeechOK;
    public Action<string> OnSpeechError;
    static public string Voice { set; get; } = "en-US";

    static UIViewController m_controller;
    static AVAudioEngine AudioEngine;
    static SFSpeechRecognizer SpeechRecognizer;
    static SFSpeechAudioBufferRecognitionRequest LiveSpeechRequest;
    static SFSpeechRecognitionTask RecognitionTask;
    static System.Timers.Timer m_speechTimer;
    static int m_timeout;
    static DateTime m_lastSpeech;
    static DateTime m_startSpeech;

    public static bool IsCancelled { get; set; }
    public static bool IsRecording { get; set; }
    public static bool SpeechEnabled { get; set; }
    public string LastResult { get; set; }

    public static void RequestAuthorization()
    {
        SFSpeechRecognizer.RequestAuthorization(
            (SFSpeechRecognizerAuthorizationStatus status) => {
                switch (status) {
                    case SFSpeechRecognizerAuthorizationStatus.Authorized:
                        SpeechEnabled = true;
                        break;
                    case SFSpeechRecognizerAuthorizationStatus.Denied:
                        SpeechEnabled = false;
                        break;
                    case SFSpeechRecognizerAuthorizationStatus.NotDetermined:
                        SpeechEnabled = false;
                        break;
                    case SFSpeechRecognizerAuthorizationStatus.Restricted:
                        SpeechEnabled = false;
                        break;
                }
            });
    }

    public void StartRecording(string voice, bool longTimeout = false)
    {
        if (!SpeechEnabled) {
            return;
        }
        AudioEngine = new AVAudioEngine();
        NSLocale voiceLocale = NSLocale.FromLocaleIdentifier(voice);
        SpeechRecognizer = new SFSpeechRecognizer(voiceLocale);
        LiveSpeechRequest = new SFSpeechAudioBufferRecognitionRequest();

        NSError error;
        var audioSession = AVAudioSession.SharedInstance();
        audioSession.SetCategory(AVAudioSessionCategory.Record);
        audioSession.SetMode(AVAudioSession.ModeMeasurement, out error);
        if (error != null) {
            OnSpeechError?.Invoke("Audio session error: " + error.ToString());
            return;
        }
        audioSession.SetActive(true,
            AVAudioSessionSetActiveOptions.NotifyOthersOnDeactivation);

        LiveSpeechRequest.ShouldReportPartialResults = true;

        var node = AudioEngine.InputNode;
        if (node == null) {
            OnSpeechError?.Invoke("Couldn't initialize Speech Input");
            return;
        }

        RecognitionTask = SpeechRecognizer.GetRecognitionTask(
            LiveSpeechRequest,
            (SFSpeechRecognitionResult result, NSError err) => {
                if (IsCancelled) {
                    node.RemoveTapOnBus(0);
                    return;
                }
                if (err != null) {
                    OnSpeechError?.Invoke(err.ToString());
                } else if (result != null) {
                    LastResult = result.BestTranscription.FormattedString;
                    m_lastSpeech = DateTime.Now;
                    if (result.Final) {
                        OnSpeechOK?.Invoke(LastResult);
                    }
                }
                if ((result != null && result.Final) || err != null) {
                    IsRecording = false;
                    AudioEngine.Stop();
                    m_speechTimer.Close();
                }
            });

        var recordingFormat = node.GetBusOutputFormat(0);
        node.InstallTapOnBus(0, 1024, recordingFormat,
            (AVAudioPcmBuffer buffer, AVAudioTime when) => {
                LiveSpeechRequest.Append(buffer);
            });

        // Start recording.
        AudioEngine.Prepare();
        AudioEngine.StartAndReturnError(out error);
        if (error != null) {
            OnSpeechError?.Invoke("Speech init error: " + error.ToString());
            IsRecording = false;
            return;
        }

        IsRecording = true;
        IsCancelled = false;
        LastResult = "";
        m_lastSpeech = DateTime.MaxValue;
        m_startSpeech = DateTime.Now;
        m_timeout = longTimeout ? m_phraseTimeout : m_wordTimeout;

        m_speechTimer = new System.Timers.Timer(250);
        m_speechTimer.AutoReset = true;
        m_speechTimer.Elapsed += (sender, e) => {
            CheckRecording();
        };
        m_speechTimer.Start();
    }
}
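The m_wordTimeout and m_phraseTimeout fields and the CheckRecording() method are referenced in the listing but not shown. A plausible sketch, assuming the timer's only job is to end recognition after a period of silence, could look like this; the timeout values and the exact stop logic are assumptions:

// A sketch of the pieces Code Listing 45 references but does not show.
static int m_wordTimeout   = 2; // seconds of silence after a single word
static int m_phraseTimeout = 4; // seconds of silence after a longer phrase

static void CheckRecording()
{
    if (!IsRecording) {
        return;
    }
    // Measure silence from the last partial result, or from the start
    // of the recording if nothing has been recognized yet.
    DateTime reference = m_lastSpeech == DateTime.MaxValue ?
                         m_startSpeech : m_lastSpeech;
    if ((DateTime.Now - reference).TotalSeconds >= m_timeout) {
        LiveSpeechRequest.EndAudio(); // lets the final result be delivered
        AudioEngine.Stop();
        m_speechTimer.Stop();
    }
}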
The user is explicitly asked for the speech recognition authorization; this is handled in the RequestAuthorization() method.
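A natural place to trigger this request is application startup, so that SpeechEnabled is already set by the time the user taps a microphone button; for example (a sketch, the exact call site is up to you):

// A sketch: request speech-recognition authorization once, early in the
// app lifecycle, e.g., from FinishedLaunching in the iOS AppDelegate.
STT.RequestAuthorization();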
We hook everything up with the parser by using the following statement:
ParserFunction.RegisterFunction("VoiceRecognition", new VoiceFunction());
The implementation of the VoiceFunction class is in Code Listing 46. Note that the speech recognition functionality is available only starting with iOS 10.0, so there is a corresponding version check. The same check can be performed from the CSCS code as well.
Code Listing 46: The Implementation of the VoiceFunction Class

public class VoiceFunction : ParserFunction
{
    static STT m_speech;

    protected override Variable Evaluate(ParsingScript script)
    {
        List<Variable> args = script.GetFunctionArgs();
        Utils.CheckArgs(args.Count, 1, m_name);

        string strAction = args[0].AsString();
        STT.Voice = Utils.GetSafeString(args, 1, STT.Voice).Replace('_', '-');

        bool speechEnabled = UIDevice.CurrentDevice.CheckSystemVersion(10, 0);
        if (!speechEnabled) {
            UIVariable.GetAction(strAction, "\"" + string.Format(
                "Speech recognition requires iOS 10.0 or higher. You have iOS {0}",
                UIDevice.CurrentDevice.SystemVersion) + "\"", "");
            return Variable.EmptyInstance;
        }

        UIViewController controller = AppDelegate.GetCurrentController();
        m_speech = new STT(controller);

        m_speech.OnSpeechError += (errorStr) => {
            controller.InvokeOnMainThread(() => {
                UIVariable.GetAction(strAction, "\"" + errorStr + "\"", "");
            });
        };
        m_speech.OnSpeechOK += (recognized) => {
            controller.InvokeOnMainThread(() => {
                UIVariable.GetAction(strAction, "", "\"" + recognized + "\"");
            });
        };

        m_speech.StartRecording(STT.Voice);
        return Variable.EmptyInstance;
    }
}
An example of text-to-speech and speech recognition
Code Listing 47 contains a CSCS example for both text-to-speech and speech recognition.
Code Listing 47: Text-to-Speech and Speech Recognition in CSCS

AutoScale();
SetBackground("scream");
voice = "en-US";

locButtonTalk = GetLocation("ROOT", "CENTER", "ROOT", "BOTTOM", 0, -10);
AddButton(locButtonTalk, "buttonTalk", "", 200, 200);
AddAction(buttonTalk, "talk_click");
SetImage(buttonTalk, "microphone");

function talk_click(sender, arg) {
  ShowToast("Start talking...", 5, "white", "black");
  VoiceRecognition("voice_recog", voice);
}

function voice_recog(errorStatus, recognized) {
  if (errorStatus != "") {
    AlertDialog("CSCS", "Error: " + errorStatus);
  } else {
    ShowToast("Recognized: " + recognized, 5, "blue", "green");
    Speak("You said: " + recognized, voice);
  }
}
The ShowToast() CSCS function shows a native Toast on Android and a custom-made Toast on iOS (visually very similar to the Android one). The AlertDialog() CSCS function invokes the corresponding native dialogs on iOS and Android.
The result of running the CSCS code from Code Listing 47 is shown in Figure 25. On iOS, the phrase "Hello, World!" is recognized successfully: a Toast is shown, and the recognized phrase is spoken back via text-to-speech.

On Android, the speech recognition fails because it is not straightforward to set up the microphone on Android emulators. On real devices, everything works fine.
Figure 25: Running the Speech Example on iOS and Android
Summary
In this chapter, we saw how to add non-GUI functionality to CSCS. The principle remains the same: implement a class deriving from the ParserFunction class and override its Evaluate() method.

This method serves as a wrapper for all of the functionality that you want to add to CSCS. Don't forget to register the newly created class with the CSCS parser.
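For reference, the whole pattern fits in a few lines; this is a minimal sketch, and the MyFeature name and its behavior are placeholders rather than part of the chapter's code:

// A minimal sketch of the extension pattern.
public class MyFeatureFunction : ParserFunction
{
    protected override Variable Evaluate(ParsingScript script)
    {
        List<Variable> args = script.GetFunctionArgs();
        Utils.CheckArgs(args.Count, 1, m_name);
        string arg = args[0].AsString();

        // Call into the native functionality here.
        return new Variable(arg.ToUpper());
    }
}

// Don't forget the registration with the parser:
ParserFunction.RegisterFunction("MyFeature", new MyFeatureFunction());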
In the next chapter, we are going to learn how to add yet another useful feature to your app: In-App Purchases on iOS, and In-App Billing on Android.