Published: Jul 17, 2018
Speech recognition produces text output for a given voice input; in short, it is speech-to-text (STT) conversion. It is helpful for people who are deaf, who have speech impairments, or who have other disabilities. This project aims to improve the accuracy of speech recognition. We developed a speech recognition system with its own dictionary in order to improve that accuracy. Errors not only vary in number but also have different degrees of impact on optimizing a set of acoustic models. It is important to correct the errors in speech recognition results to increase the performance of the system. Errors are detected and corrected according to a database learned from erroneous-correct utterance pairs. When run, the speech recognition system displays the reference and hypothesis values together with the errors. By reducing these errors we can improve recognition accuracy. Removing silence from the speech signal can also improve accuracy.
Speech recognition is the process of converting spoken words into text: it analyzes an acoustic speech signal to identify the linguistic message. A speech recognition system is evaluated by comparing the recognized text with the words actually spoken, which gives its accuracy. These systems play a vital role in facilitating daily activities. Applications include voice dialing, call routing, content-based spoken audio search, data entry, preparation of structured documents, speech-to-text processing, and use in aircraft cockpits. In addition, speech recognition can help people with visual impairments or with limited use of their hands. In underdeveloped countries where the literacy rate is low, it can provide a means of information access for people who are unable to read and write, as well as for people who are literate but have no computing skills.
Speech recognition (SR) is defined as the ability of a computer to understand spoken commands or responses, and it is an important factor in human-computer interaction. SR has been available for many years, but for a long time it was not practical because of the high cost of applications and computing resources. It has since seen significant growth in telephony and voice-to-text applications. Its advantages include increasing the efficiency of workers who do extensive typing, assisting people with disabilities, and reducing staffing costs in call centers. Put simply, speech recognition means talking to your computer and having it correctly recognize what you are saying. It is a signal-to-symbol transformation: it takes speech as input and gives text as output.
Recognition Models:
Speaker dependent: A speech recognition system that can only recognize the speech of the users it was trained on is called a speaker-dependent speech recognizer. It is limited to understanding those selected speakers.
Speaker independent: Speech recognition software that recognizes a variety of speakers without any per-speaker training is called a speaker-independent speech recognizer.
Hidden Markov Model:
Classical speech recognition systems such as Sphinx 4 are built on the Hidden Markov Model (HMM):
A Hidden Markov Model is a probabilistic state machine that can be used to model and recognize speech. Consider the speech signal as a sequence of observable events generated by the mechanical speech production system, which transitions from one state to another when producing speech. The term hidden refers to the fact that the state of the system (i.e. the configuration of the speech articulators) is not known to the observer of the speech signal. Speech recognition systems use HMMs to model each sound unit in the language. In an HMM, each state is associated with a probability distribution that measures the likelihood of events generated by the state. These distributions are known as output or observation probability distributions. Each state is also associated with a set of transition probabilities. Given the current state, transition probabilities model the likelihood that the system will be in a certain state when the next observation is produced. Typically, Gaussian distributions are used to model the output distribution of each HMM state. The transition probabilities determine the rate at which the model moves from one state to the next, giving the model some flexibility with respect to sound units that vary in duration.
HMM = (π, A, B)
π = the vector of initial state probabilities
A = the state transition matrix
B = the confusion matrix (the observation probabilities of each state)
Given this definition of an HMM, there are three problems of interest:
The Evaluation Problem: Given a model and a sequence of observations, find the probability that the model generated the observations. The forward algorithm is used for this.
The Decoding Problem: Given a model and a sequence of observations, find the most likely state sequence in the model that produced the observations. The Viterbi algorithm is used for this.
The Learning Problem: Given a model and a sequence of observations, adjust the model's parameters so that the probability of generating the observations is maximized. The Baum-Welch algorithm is used for this.
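(A) Forward Algorithm:
The forward algorithm accounts for all possible state sequences, each as long as the observation sequence, that could generate that sequence, and sums their probabilities. The probability of each path is the product of the state sequence probability and the joint output probability along the path. Rather than enumerating every path explicitly, the algorithm builds up this sum recursively, one observation at a time.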
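The evaluation step can be illustrated with a minimal Python sketch. It assumes a discrete-observation HMM (rather than the Gaussian outputs mentioned earlier), and the two-state model, its probabilities, and the function names are hypothetical values chosen only for illustration. The brute-force function enumerates every path as described in the text; the forward function reaches the same total recursively.

```python
from itertools import product


def brute_force_likelihood(pi, A, B, obs):
    """Sum the probability of every possible state path (exponential cost)."""
    n_states = len(pi)
    total = 0.0
    for path in product(range(n_states), repeat=len(obs)):
        # Probability of starting in path[0] and emitting the first symbol.
        p = pi[path[0]] * B[path[0]][obs[0]]
        # Multiply in each transition and the emission that follows it.
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


def forward(pi, A, B, obs):
    """Return P(obs | model) using the forward recursion."""
    n_states = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    # Termination: sum over the final states.
    return sum(alpha)


# Hypothetical toy model: 2 states, 3 observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]

likelihood = forward(pi, A, B, obs)
print(likelihood, brute_force_likelihood(pi, A, B, obs))
```

Both functions return the same likelihood; the recursion simply avoids visiting each path separately, which matters once the observation sequence is more than a few frames long.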
(B) Viterbi Algorithm:
The forward algorithm computes the probability that an HMM generates an observation sequence by summing the probabilities of all possible paths, so it does not provide the best path or state sequence. In many applications it is desirable to find such a path; finding the best path is the cornerstone of search in continuous speech recognition. Since the state sequence is hidden in the HMM framework, the most widely used criterion is to find the state sequence that has the highest probability of being taken while generating the observation sequence. The Viterbi algorithm can be regarded as dynamic programming applied to the HMM, or as a modified forward algorithm. Instead of summing probabilities from the different paths that arrive at the same destination state, the Viterbi algorithm picks and remembers the best path.
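The difference from the forward recursion can be seen in a short Python sketch: the sum over predecessor states becomes a maximum, and a back-pointer records which predecessor won. As before, the discrete two-state model and all names are hypothetical and serve only as an illustration.

```python
def viterbi(pi, A, B, obs):
    """Return (best_state_path, probability_of_that_path) for a discrete HMM."""
    n_states = len(pi)
    # delta[i]: probability of the best path that ends in state i so far.
    delta = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    backpointers = []  # one list per time step after the first

    for o in obs[1:]:
        best_prev = []
        new_delta = []
        for j in range(n_states):
            # Pick (rather than sum over) the best predecessor of state j.
            i_best = max(range(n_states), key=lambda i: delta[i] * A[i][j])
            best_prev.append(i_best)
            new_delta.append(delta[i_best] * A[i_best][j] * B[j][o])
        backpointers.append(best_prev)
        delta = new_delta

    # Trace the remembered choices back from the best final state.
    last = max(range(n_states), key=lambda i: delta[i])
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical toy model: 2 states, 3 observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]

best_path, best_prob = viterbi(pi, A, B, obs)
print(best_path, best_prob)
```

Practical decoders usually work with log probabilities to avoid numerical underflow on long utterances; plain products are kept here so the sketch stays close to the description above.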
(C) Baum-Welch Algorithm:
The Baum-Welch algorithm, also known as the forward-backward algorithm, is used to fit the HMM parameters to the observations in the training data. It is a kind of EM (Expectation Maximization) algorithm that iterates through the data first in a forward pass and then in a backward pass. In each iteration, we adjust a set of probabilities to maximize the probability of the observations in the training data given the HMM states they are assigned to. Because this estimation problem has no analytical solution, incremental iterations are necessary until convergence is achieved. In each iteration, the algorithm tries to find better probabilities that increase the likelihood of the observations in the training data. During this phase, we re-estimate the mixture weights, the transition probabilities, and the mean and variance parameters.
After each Baum-Welch re-estimation iteration, we insert a normalization step. We compute the re-estimated model parameters from the re-estimation counts obtained through Baum-Welch. The combined Baum-Welch and normalization iteration repeats until we achieve an acceptable parameter convergence.
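One re-estimation iteration can be sketched in Python as follows. This is a simplified illustration, not the Sphinx trainer: it assumes a discrete-observation HMM and a single training sequence, so it re-estimates π, A and B instead of mixture weights, means and variances. The starting model, the training sequence and the function names are hypothetical. The expected counts are accumulated first and then divided by their totals, which corresponds to the normalization step.

```python
def forward_pass(pi, A, B, obs):
    """alpha[t][i]: probability of obs[0..t] with the model in state i at time t."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward_pass(A, B, obs):
    """beta[t][i]: probability of obs[t+1..] given state i at time t."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(pi, A, B, obs):
    """One EM iteration; returns (new_pi, new_A, new_B, likelihood_before_update)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha = forward_pass(pi, A, B, obs)
    beta = backward_pass(A, B, obs)
    likelihood = sum(alpha[-1])

    # E-step: expected state occupancies (gamma) and transitions (xi).
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]

    # M-step: divide the expected counts by their totals (normalization).
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood


# Hypothetical starting model and training sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
train_obs = [0, 1, 2, 2, 1, 0, 0, 2]

likelihoods = []
for _ in range(5):
    pi, A, B, before = baum_welch_step(pi, A, B, train_obs)
    likelihoods.append(before)
likelihoods.append(sum(forward_pass(pi, A, B, train_obs)[-1]))
print(likelihoods)
```

The printed likelihoods should not decrease from one iteration to the next, which is the convergence behaviour the iteration relies on. A real trainer would stop when the improvement falls below a threshold rather than after a fixed number of iterations.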
Implementation:
First we write the batch-mode file. Each entry pairs a raw audio file with its text transcription; the raw file is referenced by the pathname where it was saved. After setting up the configuration file, we build the XML file, point it at the files stored within the sphinx4 folder, and run it. When Sphinx 4 runs, it displays the reference and hypothesis values with the accuracy and error rate, and it reports the insertion, substitution and deletion errors. The aim is to improve recognition accuracy with the Sphinx 4 speech recognition system. The system is developed with its own dictionary in order to improve that accuracy. Recognition errors not only vary in number but also have different degrees of impact on optimizing a set of acoustic models, so it is important to correct the errors in the recognition results to increase the performance of the system.
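Writing the batch file by hand is error-prone, so it might be generated with a few lines of Python. This is only a sketch: it assumes that each line of the batch file holds the audio pathname followed by its transcription, and the file names, transcripts and function name are hypothetical. The exact format should be checked against the Sphinx 4 configuration in use.

```python
def build_batch_lines(utterances):
    """Format (audio_path, transcript) pairs as one batch entry per line.

    Assumed layout of each line: "<audio path> <transcript words>".
    """
    lines = []
    for audio_path, transcript in utterances:
        # Collapse repeated whitespace so each entry stays on a single line.
        clean_transcript = " ".join(transcript.split())
        lines.append(f"{audio_path} {clean_transcript}")
    return "\n".join(lines) + "\n"


# Hypothetical recordings and their reference transcriptions.
utterances = [
    ("audio/utt01.raw", "one two three"),
    ("audio/utt02.raw", "open   the door"),
]

batch_text = build_batch_lines(utterances)
print(batch_text, end="")
```

The resulting text can then be saved under whatever file name the configuration refers to.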
Three types of errors can occur:
1. Insertion
2. Substitution
3. Deletion
An insertion error occurs when an extra word is added to the recognized sentence.
A substitution error occurs when an incorrect word is substituted for the correct word.
A deletion error occurs when a correct word is omitted from the recognized sentence.
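The three error types can be counted by aligning the reference with the hypothesis word by word. The Python sketch below uses a standard minimum edit distance alignment and reports the word error rate as (substitutions + deletions + insertions) divided by the number of reference words. It is an illustration, not the Sphinx 4 aligner; the sentences and function names are hypothetical.

```python
def count_errors(reference, hypothesis):
    """Align two sentences word by word and count each error type."""
    ref, hyp = reference.split(), hypothesis.split()
    rows, cols = len(ref) + 1, len(hyp) + 1

    # dist[i][j]: minimum edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # only deletions
    for j in range(cols):
        dist[0][j] = j  # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + sub_cost,  # match or substitution
                             dist[i - 1][j] + 1,             # deletion
                             dist[i][j - 1] + 1)             # insertion

    # Walk back through the table to label each edit.
    counts = {"substitution": 0, "deletion": 0, "insertion": 0}
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub_cost:
                counts["substitution"] += sub_cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletion"] += 1
            i -= 1
        else:
            counts["insertion"] += 1
            j -= 1
    return counts


def word_error_rate(reference, hypothesis):
    """Total errors divided by the number of words in the reference."""
    counts = count_errors(reference, hypothesis)
    return sum(counts.values()) / len(reference.split())


reference = "please call the front office now"
hypothesis = "please fall the office now today"
errors = count_errors(reference, hypothesis)
wer = word_error_rate(reference, hypothesis)
print(errors, wer)
```

In this example "fall" replaces "call" (substitution), "front" is missing (deletion) and "today" is extra (insertion), giving one error of each type and a word error rate of 0.5.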
By correcting speech recognition errors we can improve recognition accuracy. An error pattern is made up of a pair of strings. The first is the erroneous string of the utterance as predicted by the speech recognition system. The second is the corresponding section of the actual utterance, that is, the correct string. Errors are detected and corrected according to a database of these patterns, so when examining errors we have to check the whole database in which the errors are recorded.
The two parts of each pattern are extracted from the speech recognition results and the corresponding actual utterances. A correction is made by substituting the correct part for the error part whenever the error part is detected in a recognition result. We compare the reference and hypothesis values from the database, correct the dictionary, reduce the insertion, substitution and deletion errors, and so improve recognition accuracy with the corrected string. Speech recognizers usually produce these three types of errors, which not only vary in number but also have different degrees of impact on optimizing a set of acoustic models.
By using error-pattern correction we can reduce the error rate and improve recognition accuracy. To do so, we have to correct the dictionary and the batch file. If all three error types are corrected, accuracy improves and the error rate falls. Insertion and substitution errors can be corrected together in one pass and deletion errors in a separate pass. The pronunciation dictionary is one of the core components of a speech recognition system, and the performance of the system depends mainly on the choice of subunits and the accuracy of the speech. Applying audio fingerprint methods to the speech recognition system may also change the accuracy values, and classification techniques could be used to improve accuracy further.
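The substitution of a correct part for a detected error part can be sketched in Python. This is a minimal illustration under stated assumptions: the error-pattern database is modelled as a plain dictionary from erroneous string to correct string, matching is done on whole words, and the example patterns and function names are hypothetical.

```python
import re


def learn_error_patterns(pairs):
    """Build the pattern database from (erroneous, correct) string pairs."""
    patterns = {}
    for erroneous, correct in pairs:
        if erroneous != correct:
            patterns[erroneous] = correct
    return patterns


def apply_corrections(hypothesis, patterns):
    """Replace every known error part in a recognition result."""
    corrected = hypothesis
    # Try longer patterns first so a multi-word error is not broken up
    # by a shorter pattern that happens to overlap it.
    for erroneous in sorted(patterns, key=len, reverse=True):
        regex = r"\b" + re.escape(erroneous) + r"\b"
        replacement = patterns[erroneous]
        # A function replacement avoids special handling of backslashes.
        corrected = re.sub(regex, lambda match, r=replacement: r, corrected)
    return corrected


# Hypothetical erroneous-correct pairs collected from earlier recognition runs.
training_pairs = [
    ("wreck a nice beach", "recognize speech"),
    ("fall", "call"),
    ("office", "office"),  # identical pair, carries no error pattern
]
patterns = learn_error_patterns(training_pairs)

result = apply_corrections("please fall the office and wreck a nice beach", patterns)
print(result)
```

A lookup this simple replaces a pattern wherever it appears, so a word such as "fall" would be changed even where it was recognized correctly. A usable system would need some context around each pattern before applying it.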