pod afFancomSapi

Fancom Classes for Microsoft Speech API (SAPI) 5.4

Mixins

SpeechConstants
SpeechStringConstants

Classes

ISpeechAudio

Supports the control of real-time audio streams, such as those connected to a live microphone or telephone line.

ISpeechAudioBufferInfo

Defines the audio stream buffer information.

ISpeechAudioStatus

Provides control over the operation of real-time audio streams.

ISpeechBaseStream

Defines properties and methods for manipulating data streams.

ISpeechDataKey

Provides read and write access to the speech configuration database.

ISpeechGrammarRule

Defines the properties and methods of a speech grammar rule.

ISpeechGrammarRuleState

Presents the properties and methods of a speech grammar rule state.

ISpeechGrammarRuleStateTransition

Returns data about a transition from one rule state to another, or from a rule state to the end of a rule.

ISpeechGrammarRuleStateTransitions

Represents a collection of ISpeechGrammarRuleStateTransition objects.

ISpeechGrammarRules

Represents a collection of ISpeechGrammarRule objects.

ISpeechLexicon

Provides access to lexicons.

ISpeechLexiconPronunciation

Provides access to the pronunciations of a speech lexicon word.

ISpeechLexiconPronunciations

Represents a collection of ISpeechLexiconPronunciation objects.

ISpeechLexiconWord

Provides access to a lexicon word.

ISpeechLexiconWords

Represents a collection of ISpeechLexiconWord objects.

ISpeechMMSysAudio

Supports audio implementation for the standard Windows wave-in multimedia layer.

ISpeechObjectTokens

Represents a collection of SpObjectToken objects.

ISpeechPhraseAlternate

Enables applications to retrieve alternate phrase information from a speech recognition (SR) engine, and to update the SR engine's language model to reflect committed alternate changes.

ISpeechPhraseAlternates

Represents a collection of ISpeechPhraseAlternate objects.

ISpeechPhraseElement

Provides access to information about a word or phrase.

ISpeechPhraseElements

Represents a collection of ISpeechPhraseElement objects.

ISpeechPhraseInfo

Contains properties detailing phrase elements.

ISpeechPhraseProperties

Represents a collection of ISpeechPhraseProperty objects.

ISpeechPhraseProperty

Stores the information for a semantic property.

ISpeechPhraseReplacement

Specifies a replacement, or text normalization, of one or more spoken words in a recognition result.

ISpeechPhraseReplacements

Represents a collection of ISpeechPhraseReplacement objects.

ISpeechPhraseRule

Represents the part of a recognition result that returns information about the grammar rule that produced the recognition.

ISpeechPhraseRules

Represents a collection of ISpeechPhraseRule objects.

ISpeechRecoContext

Defines a recognition context.

ISpeechRecoGrammar

Enables applications to manage the words and phrases for the SR engine.

ISpeechRecoResult

Returns information about a recognition attempt.

ISpeechRecoResult2

Returns information about a recognition attempt.

ISpeechRecoResultTimes

Contains the time information for speech recognition results.

ISpeechRecognizer

Represents a speech recognition engine.

ISpeechRecognizerStatus

Returns the status of the speech recognition (SR) engine represented by the recognizer object.

ISpeechResourceLoader

Gives applications control over loading resources.

ISpeechVoiceStatus

Defines the types of information returned by the SpVoice.Status method.

ISpeechXMLRecoResult

Gets recognition results from the ISpXMLRecoResult as an SML document.

SPSEMANTICFORMAT

Lists the various values of a grammar's tag-format attribute.

SpAudioFormat

Represents an audio format.

SpCustomStream

Supports the use of existing IStream objects in SAPI.

SpFileStream

Enables data streams to be read and written as files.

SpInProcRecoContext

Defines a recognition context.

SpInProcRecognizer

Represents a speech recognition engine.

SpLexicon

Provides access to lexicons.

SpMMAudioIn

Supports audio implementation for the standard Windows wave-in multimedia layer.

SpMMAudioOut

Supports audio implementation for the standard Windows wave-out multimedia layer.

SpMemoryStream

Supports audio stream operations in memory.

SpObjectToken

Represents an available resource of a type used by SAPI.

SpObjectTokenCategory

Represents a class of object tokens.

SpPhoneConverter

Supports conversion between phoneme symbols and phoneme IDs.

SpPhraseInfoBuilder

Provides the ability to rebuild phrase information from audio data saved to memory.

SpSharedRecoContext

Defines a recognition context.

SpSharedRecognizer

Represents a speech recognition engine.

SpTextSelectionInformation

Provides access to the text selection information pertaining to a word sequence buffer.

SpUnCompressedLexicon

Provides access to lexicons, which contain information about words that can be recognized or spoken.

SpVoice

The SpVoice object brings the text-to-speech (TTS) engine capabilities to applications using SAPI automation.

SpWaveFormatEx

Represents the format of waveform-audio data.

SpeechDiscardType

Lists flags indicating portions of a recognition result to be removed or eliminated once they are no longer needed.

SpeechDisplayAttributes

Lists the possible ways of displaying a word.

SpeechEmulationCompareFlags

Lists the comparison options used in emulation.

SpeechLexiconType

Lists the allowed lexicon types.

SpeechRecoEvents

Lists speech recognition (SR) events.

SpeechRecognitionType

Lists the types of speech recognition.

SpeechRuleAttributes

Lists the possible attributes of a grammar rule.

SpeechVoiceEvents

Lists the types of events which a text-to-speech (TTS) engine can send to an SpVoice object.

SpeechVoiceSpeakFlags

Lists flags that control the SpVoice.Speak method.

Enums

SPCATEGORYTYPE

Lists the different states of the speech recognizer as categories.

SPXMLRESULTOPTIONS

Used to designate whether the main result or the alternates are desired.

SpeechAudioFormatType

Lists the supported stream formats.

SpeechAudioState

Lists the four possible audio input and output states.

SpeechBookmarkOptions

Lists bookmark options.

SpeechDataKeyLocation

Lists the top-level speech configuration database keys.

SpeechEngineConfidence

Specifies levels of confidence.

SpeechFormatType

Requests either the input format for the original audio source, or the format that actually arrives at the speech engine.

SpeechGrammarRuleStateTransitionType

Lists the types of transitions for the speech recognition engine.

SpeechGrammarState

Lists the possible states of a speech grammar.

SpeechGrammarWordType

Lists the types of words in a grammar.

SpeechInterference

Lists factors that can interfere with accurate recognition of speech input.

SpeechLoadOption

Lists the options available when loading a speech grammar.

SpeechPartOfSpeech

Lists the parts-of-speech categories used in SAPI.

SpeechRecoContextState

Lists the states of a recognition context.

SpeechRecognizerState

Lists the states of a Recognizer object.

SpeechRetainedAudioOptions

Lists the options for retaining data from an audio stream.

SpeechRuleState

Lists the states of a speech grammar rule.

SpeechRunState

Lists the running states of a TTS voice.

SpeechSpecialTransitionType

Lists special transitions for the speech recognition engine.

SpeechStreamFileMode

Lists the access modes of a file stream.

SpeechStreamSeekPositionType

Lists the types of positioning from which a Seek method can be performed.

SpeechTokenContext

Lists the context in which the code managing the newly created object runs.

SpeechTokenShellFolder

Lists possible locations storing token information.

SpeechVisemeFeature

Lists the features of phonemes and visemes.

SpeechVisemeType

Lists the visemes supported by the SpVoice object.

SpeechVoicePriority

Lists the possible Priority settings of an SpVoice object.

SpeechWordPronounceable

Lists the possible return values from the IsPronounceable method of the ISpeechRecoGrammar interface.

SpeechWordType

Lists the change state of a word/pronunciation combination in a lexicon.

Overview

Fancom Sapi is a complete collection of classes that wrap the Microsoft Speech API (SAPI) 5.4 when running Fantom on a JVM.

Speech

Making your computer speak couldn't be simpler than:

SpVoice().speak("It's time to kick ass 'n' chew bubble gum!")

A more complete example that initialises COM threading, lists the available voices, and speaks asynchronously in the background is:

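class SpeakExample {
  static Void main(Str[] args) {
    // initialise a single-threaded apartment for COM calls
    afFancom::ComThread.initSta

    spVoice := afFancomSapi::SpVoice()

    Obj.echo("Available voices:")
    spVoice.getVoices.each {
      Obj.echo(" - ${it->getDescription}")
    }

    // keep the part of the voice description before the first '-'
    name := spVoice.voice.getDescription.split('-')[0]
    spVoice.speak("Hello, I'm $name", afFancomSapi::SpeechVoiceSpeakFlags.SVSFlagsAsync)

    // speak() returns immediately when async, so give the voice time to finish
    concurrent::Actor.sleep(3sec)

    afFancom::ComThread.release
  }
}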

Speech Recognition

Speech recognition is a bit more involved, as you need to initialise an input stream, register a grammar to listen for, and set up an event sink to receive callbacks. Nevertheless, a complete example is given below:

using gfx
using fwt
using afFancom
using afFancomSapi

class SpeechRecognition {

  static Void main(Str[] args) {
    ComThread.initSta

    recoCtx  := SpInProcRecoContext()

    // initialise the input stream / microphone
    // not needed with an SpSharedRecoContext
    category := SpObjectTokenCategory()
    category.setId(SpeechStringConstants.SpeechCategoryAudioIn)
    token := SpObjectToken()
    token.setId(category.default_)
    recoCtx.recognizer.audioInput = token

    // register some commands to listen for
    grammar  := recoCtx.createGrammar
    rule   := grammar.rules.add("awesome", SpeechRuleAttributes.SRATopLevel)
    rule.initialState.addWordTransition(null, "Kick Ass")
    rule.initialState.addWordTransition(null, "Chew Bubblegum")
    grammar.rules.commit
    grammar.cmdSetRuleState("awesome", SpeechRuleState.SGDSActive)

    // register an event sink
    recoCtx.withEvents(SpeechRecognition())

    // open() blocks until the window is closed, keeping the app listening
    Window {
      it.size = Size(320, 240)
      it.title = "Say Kick Ass!"
    }.open

    ComThread.release
  }

  Void onRecognition(Int streamNumber, Variant streamPosition, SpeechRecognitionType recognitionType, ISpeechRecoResult result) {
    utterance := result.phraseInfo.getText.capitalize
    if (utterance.contains("gum"))
      Obj.echo("Chewing gum.")
    else
      Obj.echo("Hur hur, you said, 'Ass'!!!")
  }
}

See ISpeechRecoContext (Events) for a list of possible callback events.
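The recognition example notes that the audio-input set up is not needed with an SpSharedRecoContext. A minimal sketch of that variant follows. It assumes SpSharedRecoContext exposes the same createGrammar and withEvents members as SpInProcRecoContext (in SAPI both are recognition contexts implementing ISpeechRecoContext); the class name SharedSpeechRecognition is hypothetical.

```fantom
using gfx
using fwt
using afFancom
using afFancomSapi

** Hypothetical example class: recognition via the shared recognizer.
** Assumes SpSharedRecoContext mirrors SpInProcRecoContext's members.
class SharedSpeechRecognition {

  static Void main(Str[] args) {
    ComThread.initSta

    // the shared recognizer uses the system's default audio input,
    // so no SpObjectTokenCategory / SpObjectToken set up is done here
    recoCtx := SpSharedRecoContext()

    // register a single top-level rule with one phrase
    grammar := recoCtx.createGrammar
    rule    := grammar.rules.add("awesome", SpeechRuleAttributes.SRATopLevel)
    rule.initialState.addWordTransition(null, "Kick Ass")
    grammar.rules.commit
    grammar.cmdSetRuleState("awesome", SpeechRuleState.SGDSActive)

    // this class doubles as the event sink
    recoCtx.withEvents(SharedSpeechRecognition())

    // open() blocks until the window is closed, keeping the app listening
    Window { it.size = Size(320, 240); it.title = "Say Kick Ass!" }.open

    ComThread.release
  }

  Void onRecognition(Int streamNumber, Variant streamPosition, SpeechRecognitionType recognitionType, ISpeechRecoResult result) {
    Obj.echo("Heard: ${result.phraseInfo.getText}")
  }
}
```

The in-process context gives the application its own recognizer and audio input, at the cost of selecting the input token by hand; the shared context trades that control for a shorter set up.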

Release Notes

v1.0.4

v1.0.2

  • Enums with values were not auto-generated with a Variant surrogate fromVariant() static factory method

v1.0.0

  • Initial release