Some Kotlin Problem with solutions Every developer need to know

How to Make Simple Speech to Text Recognition using Java Sphinx

Today I'm going to write about how to recognize a few spoken words using the Java Sphinx library.
First, download "sphinx4-1.0beta3". It is available at http://sourceforge.net/projects/cmusphinx/files/sphinx4/1.0%20beta6/.
  • Create a new NetBeans project and name it "sphinxtest".
  • Open the downloaded "sphinx4-1.0beta3" folder, copy the "lib" folder inside it, and paste it into the project folder you created.


  • Open the "lib" folder you just copied. It contains jsapi.exe: double-click it, run it and accept the license, which extracts jsapi.jar into the same folder (on Linux or macOS, run jsapi.sh instead).

  • Now add the jar files in this folder to the project, the same way you would add any other jar file: in NetBeans, right-click the project's "Libraries" node and choose "Add JAR/Folder...".

  • Select all the jar files in that folder and add them.
  • The Sphinx setup is now complete, so we can start programming. First we define the "recognizer" configuration in the following XML file. Save it as "helloworld.config.xml" in the same package as the Main class (spinxtest), because Main loads it with Main.class.getResource("helloworld.config.xml").
<?xml version="1.0" encoding="UTF-8"?>

<!--
   Sphinx-4 Configuration file
-->

<!-- ******************************************************** -->
<!--  an4 configuration file                             -->
<!-- ******************************************************** -->

<config>

    <!-- ******************************************************** -->
    <!-- frequently tuned properties                              -->
    <!-- ******************************************************** -->

    <property name="logLevel" value="WARNING"/>

    <property name="absoluteBeamWidth"  value="-1"/>
    <property name="relativeBeamWidth"  value="1E-80"/>
    <property name="wordInsertionProbability" value="1E-36"/>
    <property name="languageWeight"     value="8"/>

    <property name="frontend" value="epFrontEnd"/>
    <property name="recognizer" value="recognizer"/>
    <property name="showCreations" value="false"/>


    <!-- ******************************************************** -->
    <!-- word recognizer configuration                            -->
    <!-- ******************************************************** -->

    <component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
        <property name="decoder" value="decoder"/>
        <propertylist name="monitors">
            <item>accuracyTracker </item>
            <item>speedTracker </item>
            <item>memoryTracker </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The Decoder   configuration                              -->
    <!-- ******************************************************** -->

    <component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
        <property name="searchManager" value="searchManager"/>
    </component>

    <component name="searchManager"
        type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
        <property name="logMath" value="logMath"/>
        <property name="linguist" value="flatLinguist"/>
        <property name="pruner" value="trivialPruner"/>
        <property name="scorer" value="threadedScorer"/>
        <property name="activeListFactory" value="activeList"/>
    </component>


    <component name="activeList"
             type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>

    <component name="trivialPruner"
                type="edu.cmu.sphinx.decoder.pruner.SimplePruner"/>

    <component name="threadedScorer"
                type="edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer">
        <property name="frontend" value="${frontend}"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The linguist  configuration                              -->
    <!-- ******************************************************** -->

    <component name="flatLinguist"
                type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
        <property name="logMath" value="logMath"/>
        <property name="grammar" value="jsgfGrammar"/>
        <property name="acousticModel" value="wsj"/>
        <property name="wordInsertionProbability"
                value="${wordInsertionProbability}"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="unitManager" value="unitManager"/>
    </component>


    <!-- ******************************************************** -->
    <!-- The Grammar  configuration                               -->
    <!-- ******************************************************** -->

    <component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
        <property name="dictionary" value="dictionary"/>
        <!-- change grammarLocation to the package/folder that holds your grammar file;
             grammarName is that file's name without the .gram extension -->
        <property name="grammarLocation"
             value="resource:/spinxtest.Main!/spinxtest"/>
        <property name="grammarName" value="hello"/>
        <property name="logMath" value="logMath"/>
    </component>


    <!-- ******************************************************** -->
    <!-- The Dictionary configuration                            -->
    <!-- ******************************************************** -->

    <component name="dictionary"
        type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
        <property name="dictionaryPath"
            value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
        <property name="fillerPath"
            value="resource:/edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/fillerdict"/>
        <property name="addSilEndingPronunciation" value="false"/>
        <property name="allowMissingWords" value="false"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <!-- ******************************************************** -->
    <!-- The acoustic model configuration                         -->
    <!-- ******************************************************** -->
    <component name="wsj"
      type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
        <property name="loader" value="wsjLoader"/>
        <property name="unitManager" value="unitManager"/>
    </component>

    <component name="wsjLoader"
               type="edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
    </component>


    <!-- ******************************************************** -->
    <!-- The unit manager configuration                           -->
    <!-- ******************************************************** -->

    <component name="unitManager"
        type="edu.cmu.sphinx.linguist.acoustic.UnitManager"/>

    <!-- ******************************************************** -->
    <!-- The frontend configuration                               -->
    <!-- ******************************************************** -->

    <component name="frontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>microphone </item>
            <item>preemphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>liveCMN </item>
            <item>featureExtraction </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The live frontend configuration                          -->
    <!-- ******************************************************** -->
    <component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>microphone </item>
            <item>dataBlocker </item>
            <item>speechClassifier </item>
            <item>speechMarker </item>
            <item>nonSpeechDataFilter </item>
            <item>preemphasizer </item>
            <item>windower </item>
            <item>fft </item>
            <item>melFilterBank </item>
            <item>dct </item>
            <item>liveCMN </item>
            <item>featureExtraction </item>
        </propertylist>
    </component>

    <!-- ******************************************************** -->
    <!-- The frontend pipelines                                   -->
    <!-- ******************************************************** -->

    <component name="dataBlocker" type="edu.cmu.sphinx.frontend.DataBlocker">
        <!--<property name="blockSizeMs" value="10"/>-->
    </component>

    <component name="speechClassifier"
               type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
        <property name="threshold" value="13"/>
    </component>

    <component name="nonSpeechDataFilter"
               type="edu.cmu.sphinx.frontend.endpoint.NonSpeechDataFilter"/>

    <component name="speechMarker"
               type="edu.cmu.sphinx.frontend.endpoint.SpeechMarker" >
        <property name="speechTrailer" value="50"/>
    </component>


    <component name="preemphasizer"
               type="edu.cmu.sphinx.frontend.filter.Preemphasizer"/>

    <component name="windower"
               type="edu.cmu.sphinx.frontend.window.RaisedCosineWindower">
    </component>

    <component name="fft"
            type="edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform">
    </component>

    <component name="melFilterBank"
        type="edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank">
    </component>

    <component name="dct"
            type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform"/>

    <component name="liveCMN"
               type="edu.cmu.sphinx.frontend.feature.LiveCMN"/>

    <component name="featureExtraction"
               type="edu.cmu.sphinx.frontend.feature.DeltasFeatureExtractor"/>

    <component name="microphone"
               type="edu.cmu.sphinx.frontend.util.Microphone">
        <property name="closeBetweenUtterances" value="false"/>
    </component>


    <!-- ******************************************************* -->
    <!--  monitors                                               -->
    <!-- ******************************************************* -->

    <component name="accuracyTracker"
                type="edu.cmu.sphinx.instrumentation.BestPathAccuracyTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showAlignedResults" value="false"/>
        <property name="showRawResults" value="false"/>
    </component>

    <component name="memoryTracker"
                type="edu.cmu.sphinx.instrumentation.MemoryTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="showSummary" value="false"/>
        <property name="showDetails" value="false"/>
    </component>

    <component name="speedTracker"
                type="edu.cmu.sphinx.instrumentation.SpeedTracker">
        <property name="recognizer" value="${recognizer}"/>
        <property name="frontend" value="${frontend}"/>
        <property name="showSummary" value="true"/>
        <property name="showDetails" value="false"/>
    </component>


    <!-- ******************************************************* -->
    <!--  Miscellaneous components                               -->
    <!-- ******************************************************* -->

    <component name="logMath" type="edu.cmu.sphinx.util.LogMath">
        <property name="logBase" value="1.0001"/>
        <property name="useAddTable" value="true"/>
    </component>

</config>
Pay particular attention to the "grammarLocation" property of the jsgfGrammar component: change it to point to the place where you saved your grammar file. A more detailed description of this XML format is available at "http://cmusphinx.sourceforge.net/sphinx4/doc/ProgrammersGuide.html".
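The grammarLocation value uses the "resource:" URL form that also appears in the dictionary paths above: the part before "!" names a class whose class loader is used, and the part after it is a folder on the classpath. As I read the convention, the grammar file is then looked up as <folder>/<grammarName>.gram. The helper below is hypothetical (GrammarLocation is not a Sphinx-4 class) and only spells out that reading, so you can see that "resource:/spinxtest.Main!/spinxtest" with grammarName "hello" points at /spinxtest/hello.gram.

```java
/**
 * Hypothetical helper, not part of Sphinx-4: splits a "resource:/<class>!/<folder>"
 * location and builds the classpath path where the grammar file is expected.
 */
public class GrammarLocation {

    public final String className; // class whose class loader is used, e.g. "spinxtest.Main"
    public final String folder;    // classpath folder, e.g. "/spinxtest"

    public GrammarLocation(String className, String folder) {
        this.className = className;
        this.folder = folder;
    }

    /** Parses a location such as "resource:/spinxtest.Main!/spinxtest". */
    public static GrammarLocation parse(String location) {
        String prefix = "resource:/";
        int bang = location.indexOf('!');
        if (!location.startsWith(prefix) || bang < 0) {
            throw new IllegalArgumentException("expected resource:/<class>!/<folder>, got " + location);
        }
        return new GrammarLocation(location.substring(prefix.length(), bang),
                                   location.substring(bang + 1));
    }

    /** Classpath path of the grammar file, e.g. "/spinxtest/hello.gram" for grammarName "hello". */
    public String grammarPath(String grammarName) {
        String base = folder.endsWith("/") ? folder : folder + "/";
        return base + grammarName + ".gram";
    }

    public static void main(String[] args) {
        GrammarLocation loc = parse("resource:/spinxtest.Main!/spinxtest");
        System.out.println(loc.className);            // spinxtest.Main
        System.out.println(loc.grammarPath("hello")); // /spinxtest/hello.gram
    }
}
```

If you move hello.gram to another package, the folder after "!" is the part of grammarLocation to change.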
Next, we write the program's "Main" class. It creates the recognizer and the other components described in "helloworld.config.xml" and allocates their resources. It then starts the microphone and hands control to the allocated recognizer, which acts as the speech engine: it listens to the microphone input and prints the text that matches it.
package spinxtest;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class Main {

    public static void main(String[] args) {
        ConfigurationManager cm;

        // load the configuration from the file given on the command line,
        // or from helloworld.config.xml in the same package as this class
        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(Main.class.getResource("helloworld.config.xml"));
        }

        // look up the "recognizer" component defined in the XML and allocate its resources
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone, or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        System.out.println("Say: (Good morning | Hello) ( One | Two | Three | Four | Five | Six )");

        // loop the recognition until the program exits
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");

            // decode the next utterance captured by the microphone
            Result result = recognizer.recognize();

            if (result != null) {
                // best final hypothesis, with filler words removed
                String resultText = result.getBestFinalResultNoFiller();
                System.out.println("You said: " + resultText + '\n');
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }
}
Only one part of the program remains: the "grammar" file. The grammar file limits the inputs the program can accept.
Let's first look at a sample grammar file and the rules to keep in mind when writing one.
#JSGF V1.0;
/**
 * JSGF Grammar for Hello World example
 */
grammar hello;
public <greet> = (Yes | No ) ( Yes | No);

The last line of the file defines which words this application accepts. Here the valid inputs are "Yes Yes", "Yes No", "No Yes" and "No No", and the recognizer decides what was spoken by matching against these combinations. A grammar file can contain many rules, but only the public rules are used directly for recognition; other rules take part only when a public rule references them. Now let's look at a slightly more complex grammar file.
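A rule that is just a sequence of alternative groups accepts every combination of one choice per group, so the number of accepted sentences is the product of the group sizes. The sketch below enumerates those combinations for (Yes | No) (Yes | No). GrammarExpander is a hypothetical name, not a Sphinx-4 class, and it only handles plain sequences of alternatives, not optional parts or repetition.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: expands a sequence of alternative groups into every accepted sentence. */
public class GrammarExpander {

    /** Returns every sentence formed by picking one alternative from each group, in order. */
    public static List<String> expand(List<List<String>> groups) {
        List<String> sentences = new ArrayList<>();
        sentences.add(""); // start with one empty sentence
        for (List<String> group : groups) {
            List<String> next = new ArrayList<>();
            for (String prefix : sentences) {
                for (String word : group) {
                    next.add(prefix.isEmpty() ? word : prefix + " " + word);
                }
            }
            sentences = next; // each group multiplies the number of sentences
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> yesNo = List.of("Yes", "No");
        System.out.println(expand(List.of(yesNo, yesNo)));
        // prints [Yes Yes, Yes No, No Yes, No No]
    }
}
```

Applied to the grammar of our application, (Good morning | Hello) (One | Two | Three | Four | Five | Six), the same expansion gives 2 × 6 = 12 accepted sentences.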

#JSGF V1.0;
// Define the grammar name
grammar hello;
// Define the rules
public <Command> = [<Polite>] <Action> <Object> (and <Object>)*;
<Action> = open | close | delete;
<Object> = the window | the file;
<Polite> = please;

Anything inside [] is optional: it may or may not be spoken.
<Object> (and <Object>)* means one or more objects separated by "and", because * repeats the group in parentheses zero or more times. For example: "close the window and the file".
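One way to see what [] and * mean is to translate the <Command> rule by hand into a Java regular expression, where [] becomes an optional group and * stays a zero-or-more repetition. This is only an illustration under that assumption: Sphinx-4 does not match text with regular expressions; its linguist turns the grammar into a search graph for the decoder. CommandRule is a hypothetical class name.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration: the <Command> rule from the grammar above, written as a regex. */
public class CommandRule {

    static final String POLITE = "please";                       // <Polite>
    static final String ACTION = "(?:open|close|delete)";        // <Action>
    static final String OBJECT = "(?:the window|the file)";      // <Object>

    // [<Polite>]      -> optional group  (?:please )?
    // (and <Object>)* -> zero or more    (?: and OBJECT)*
    static final Pattern COMMAND = Pattern.compile(
            "(?:" + POLITE + " )?" + ACTION + " " + OBJECT + "(?: and " + OBJECT + ")*");

    /** True if the whole sentence is accepted by the <Command> rule. */
    public static boolean accepts(String sentence) {
        return COMMAND.matcher(sentence).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("close the window and the file")); // true
        System.out.println(accepts("please open the file"));          // true
        System.out.println(accepts("open"));                          // false
    }
}
```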
Now let's write a grammar file for our application.

#JSGF V1.0;
grammar hello;
public <greet> = (Good morning | Hello) ( One | Two | Three | Four | Five | Six );

Write this in a text file and save it as "hello.gram" in the source folder of your project, inside the spinxtest package (next to Main.java), so that it matches the grammarLocation and grammarName ("hello") settings in the configuration file. Then build the project so that the grammar file is copied to the build output along with the classes.
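After building, you can check that hello.gram actually ended up on the classpath at the path the configuration expects. The sketch below is a hypothetical helper (GrammarCheck is not part of Sphinx-4 or of the original project); its default path assumes grammarLocation "resource:/spinxtest.Main!/spinxtest" and grammarName "hello", and it uses only the standard Class.getResource call. Run it with the same classpath as Main.

```java
import java.net.URL;

/** Hypothetical check, not part of Sphinx-4: reports whether the grammar file is on the classpath. */
public class GrammarCheck {

    /** Returns the resource URL, or null when the path is not on the classpath. */
    public static URL findOnClasspath(String absolutePath) {
        // a leading "/" makes Class.getResource resolve from the classpath root
        return GrammarCheck.class.getResource(absolutePath);
    }

    public static void main(String[] args) {
        // default path assumes the grammarLocation and grammarName from helloworld.config.xml
        String path = args.length > 0 ? args[0] : "/spinxtest/hello.gram";
        URL url = findOnClasspath(path);
        if (url == null) {
            System.out.println(path + " not found: rebuild the project and check the package folder");
        } else {
            System.out.println("Found grammar at " + url);
        }
    }
}
```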
That's all. Now you can run Main.java and convert speech input into text.
When you speak into your computer's microphone using phrases from the grammar we defined, the program prints the recognized phrase after "You said:".
The recognized text may differ from what you actually said. Accuracy depends on several factors:
  • Recognition accuracy is usually higher in a quiet environment.
  • Higher-quality microphones and audio hardware can improve accuracy.
  • Users that speak clearly (but naturally) usually achieve better accuracy.
  • Users with accents or atypical voices may get lower accuracy.
  • Applications with simpler grammars typically get better accuracy.
  • Applications with less confusable grammars typically get better accuracy. Similar sounding words are harder to distinguish.








Comments

  1. Its does't work ...
    How to create this XML File ... that not properly specified..
    Please Suggest us ... ???
