
Text-to-speech Device for Patients with Low Vision


Published: Jan 21, 2020


Table of contents

  1. Abstract
  2. Introduction
  3. Methods
  4. System Specification
    Software Design of Image Processing Module
    Design Implementation
    Compiling word correction module
    Phoneme to Speech converter
    Setting
  5. Results
  6. Conclusion
  7. References

Abstract

With a maximum visual range of six meters and a field of view of at most 20 degrees, people with low vision cannot see the words and letters in ordinary newsprint. This makes reading difficult, which can disturb the learning process and slow the development of the patient's intelligence. A device is therefore needed to help them read more easily. One kind of device being developed today makes use of another sense, namely hearing. The text-to-speech device described here scans printed Indonesian text and reads it aloud by converting it to speech.

The purpose of the device is to convert an image input into a voice output. This paper describes the design, implementation, and experimental results of the device. The device consists of three modules: an image processing module, a word correction module, and a voice processing module. It was developed on a Raspberry Pi 2 with a 900 MHz processor. The audio output is easy to understand, with a total error rate of less than 2% and a processing time of nearly two minutes for A4-size input text. The device guides people with low vision by voice, and it can play and stop the output while reading.

Introduction

According to Thylefors in Gianini (2004), impaired vision can have negative effects on learning and social interaction. It can affect the natural development of intelligence as well as academic, social, and professional ability [1]. Based on Riskesdas data from 2013, the number of people with low vision in Indonesia was 2,133,017 [2]. Low vision cannot be corrected with glasses. The maximum visual range of these patients is 6 meters, with a field of view of at most 20 degrees. As a result, people with low vision cannot read normally printed paper; they can read only if the characters are large enough. This lengthens the reading process and tires the eyes.

To help improve the quality of life of people with low vision, a tool that reads text for them is needed. The degree of vision impairment varies between individuals with low vision. The device developed in this work therefore uses another sense to receive information from a text. It converts text to speech and is designed specifically for Indonesian people with low vision, so that they can use it without asking others for help and can use it to understand literature in the Indonesian language.

Methods

The text-to-speech device consists of three main modules: the image processing module, the word correction module, and the voice processing module. The image processing module sets the object position, camera focus, and illumination, takes the picture, and converts the image into text. The word correction module corrects the output of the image processing module, improving accuracy by matching words against an Indonesian dictionary. The voice processing module converts the text into sound and processes it with specific physical characteristics so that the sound can be understood.

One element of the image processing module is OCR. An OCR engine needs certain conditions and initial steps to receive the best possible input and to reduce the effect of its weaknesses. These setup conditions are adapted to the desired specifications of the device, so that the output has a minimal error rate and a short processing time. The module does not change the OCR algorithm; it adds conditions that give the OCR engine the best input.

OCR, or Optical Character Recognition, is a technology that automatically recognizes characters through an optical mechanism. It imitates the human sense of sight: the camera replaces the eye, and image processing in the computer substitutes for the human brain [3]. Tesseract OCR is an OCR engine that uses matrix matching [4]. Tesseract was selected because it is widely accepted, flexible, and extensible, and because an active community of researchers continues to develop it. OCR engines still have weaknesses, such as sensitivity to distortion at the page edges and to dim light, so it remains difficult for most of them to produce highly accurate text [5]. Supporting conditions are needed to keep these defects to a minimum.

System Specification

The device is designed on the basis of the following restrictions:

  a. The reading distance is 38-42 cm.
  b. The maximum thickness of the reading material is 3 cm.
  c. The minimum illumination is 250 lumen/m2 (environmental class: office with easy work).
  d. The maximum tilt of a text line is 5 degrees from the vertical.
  e. The maximum size of the reading material is A4 (210 x 297 mm).
  f. The minimum character size is 10 pt.
  g. The typefaces include Roman, Egyptian, or Sans Serif types.

Hardware System Design

The stand in Fig. 2 is designed so that a sheet of paper up to A4 size can be captured entirely by the camera. The distance from the camera to the object is 40 cm, and a 15 cm pole is added to position the camera above the center of the object.

The Raspberry Pi camera module uses manual focus, so the initial lens setting must be adjusted. Good lighting is needed to obtain a sharp input image, so a series of LEDs is added to provide extra light when the ambient light intensity is low.

Tesseract OCR Implementation

The input image captured by the camera has a size of 5 MP (2592 x 1944 pixels), or 215 ppi (pixels per inch). According to the specifications of the Tesseract OCR engine, the smallest character that can be read is an uppercase letter 20 pixels high. Tesseract accuracy decreases at a font size of 10 pt.

Software Design

The software processes the input image and converts it into text format.
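The 20-pixel limit can be related to the 10 pt restriction with simple arithmetic. The sketch below converts a point size to pixels at the stated 215 ppi. The cap-height ratio of 0.7 is an assumed typical value, not a figure from this work, and real typefaces vary.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def font_height_px(point_size, ppi):
    """Nominal font (em) height in pixels at a given image resolution."""
    return point_size / POINTS_PER_INCH * ppi


def cap_height_px(point_size, ppi, cap_ratio=0.7):
    """Approximate uppercase letter height in pixels.

    cap_ratio is an assumption: uppercase letters are often roughly
    70% of the nominal font size, but this depends on the typeface.
    """
    return font_height_px(point_size, ppi) * cap_ratio


if __name__ == "__main__":
    for point_size in (8, 10, 12):
        height = cap_height_px(point_size, ppi=215)
        print(f"{point_size} pt -> about {height:.1f} px uppercase height")
```

Under these assumptions a 10 pt uppercase letter is about 21 pixels high, just above the 20-pixel minimum, which is consistent with accuracy starting to drop around 10 pt.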

Software Design of Image Processing Module

The user triggers image capture with a tactile key connected to a GPIO pin, which is handled by an interrupt function. The picture is then taken with the raspistill program, using its sharpness setting to sharpen the image. The resulting image is in .jpg format with a resolution of 2592 x 1944 pixels.

B. The Word Correction Module

Spellchecking

Spellchecking is the task of predicting which words in a document are misspelled. These predictions can be presented to the user in various ways. Word correction is the task of replacing a misspelled word with a hypothesis of the correct spelling. The most appropriate approach is to model the causes of the errors directly and encode them into an algorithm or an error model. Damerau-Levenshtein edit distance was introduced as a way to detect misspellings (Damerau, 1964).
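As an illustration of edit distance, the following is a minimal sketch of the restricted Damerau-Levenshtein distance (also called optimal string alignment), which counts insertions, deletions, substitutions, and transpositions of adjacent letters. The function name and the sample words are illustrative and are not taken from the device's code.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    # "bvku" is a hypothetical OCR misreading of "buku" (book).
    print(osa_distance("bvku", "buku"))
```

A single misread letter, as in the sample pair, gives a distance of one, while an OCR confusion that changes the word length (for example "m" read as "rn") costs two edits.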

Phonetic indexing algorithms such as Metaphone, used by GNU Aspell (Atkinson, 2009), return words with similar pronunciations (a 'soundslike' approach) and allow the corrected word to look different orthographically from the input word. Metaphone relies on a data file that contains phonetic information. Linguistic intuition about the different causes of spelling errors can also be represented explicitly in the spelling system (Deorowicz and Ciura, 2005). Almost every spelling system currently uses a lexicon (dictionary). Dictionary-based systems have trouble handling terms that do not appear in the dictionary, such as proper nouns, foreign terms or loanwords, and neologisms, which can make up a growing proportion of out-of-dictionary terms (Ahmad and Kondrak, 2005) [6].

The word correction module receives its input, in the form of text, from the image processing module. The image processing module cannot determine whether a word it outputs is correct, so the word correction module must check every word that the image processing module produces. The word correction module is designed to improve the accuracy of that output.

The word correction module consists of several functions. The main one is the correct function, which matches each input word against a dictionary (a list of Indonesian words) and corrects it. The other functions are supporting functions that adapt the input to Indonesian grammar. They overcome the constraints on numbers and names that dictionary-based correction faces, as described in the literature:

  1. A function to break text into words.
  2. A function to check for numbers in the text.
  3. A function to check for an uppercase letter at the beginning of a sentence.
  4. A function to check for a punctuation mark at the end of a sentence.
  5. A function to check for names (which use an uppercase letter) in a sentence.
  6. A function to combine all word output from the previous steps.
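The supporting functions listed above might be organized roughly as follows. This is a minimal sketch under stated assumptions: all function names are hypothetical, the corrector is passed in as a callable, a capitalized word in mid-sentence is treated as a name, and punctuation handling is deliberately simple.

```python
import re

# A token is either a run of letters/digits or one punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")


def split_tokens(text):
    """Break text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def is_number(token):
    return token.isdigit()


def is_name(token, sentence_start):
    """Assumption: a capitalized word that is not sentence-initial is a name."""
    return token[0].isupper() and not sentence_start


def join_tokens(tokens):
    """Recombine tokens, attaching punctuation to the preceding word."""
    text = ""
    for token in tokens:
        if text and token[0].isalnum():
            text += " "
        text += token
    return text


def correct_text(text, correct):
    """Apply `correct` to ordinary words; keep numbers, names, punctuation."""
    out = []
    sentence_start = True
    for token in split_tokens(text):
        if not token[0].isalnum():            # punctuation mark
            out.append(token)
            sentence_start = token in ".!?"
            continue
        if is_number(token) or is_name(token, sentence_start):
            out.append(token)                 # not looked up in the dictionary
        else:
            fixed = correct(token.lower())
            if token[0].isupper():            # restore sentence-initial capital
                fixed = fixed.capitalize()
            out.append(fixed)
        sentence_start = False
    return join_tokens(out)


if __name__ == "__main__":
    # Stand-in corrector for the demo: fixes one hypothetical OCR error.
    demo_correct = lambda word: {"bvku": "buku"}.get(word, word)
    print(correct_text("Saya membaca 2 bvku di Bandung.", demo_correct))
```

Keeping numbers and names out of the dictionary lookup prevents the corrector from "fixing" tokens that are valid but absent from the word list.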

Design Implementation

Implementation of the word correction module consists of:

The first step is to arrange the Indonesian words to be used in the dictionary. The dictionary is used to compare each input with the Indonesian language. Its words come from the KBBI (Kamus Besar Bahasa Indonesia). After reduction, the dictionary contains 50,850 words, a combination of base words, conjunctions, reduplicated words, loanwords, numerals, question words, pronouns, affixes, prefixes, and suffixes.

Compiling word correction module

The word correction module was compiled by adapting the spelling corrector written by Peter Norvig. Because errors in the output of the image processing module usually affect individual letters rather than the length of a word, the correction function only replaces letters. It replaces a word only if the input has the same length as a word in the dictionary. This choice also takes the computational load into account: with replacement as the only edit operation, the candidates for a word of length n at edit distance one are limited to single-letter substitutions, and the n-1 transpositions, along with the insertions and deletions, of a general corrector are not generated.
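A replace-only candidate generator can be sketched as below. It assumes lowercase input and a 26-letter alphabet; the function names and the tiny word list are illustrative only.

```python
import string

ALPHABET = string.ascii_lowercase


def substitutions(word):
    """All same-length strings that differ from `word` in exactly one letter."""
    return {
        word[:i] + letter + word[i + 1:]
        for i in range(len(word))
        for letter in ALPHABET
        if letter != word[i]
    }


def same_length_matches(word, dictionary):
    """Dictionary words reachable from `word` by one letter replacement."""
    return substitutions(word) & dictionary


if __name__ == "__main__":
    words = {"buku", "batu", "bukan"}  # hypothetical mini dictionary
    print(len(substitutions("bvku")))
    print(same_length_matches("bvku", words))
```

For a word of length n this produces 25n candidates, compared with the 54n+25 strings that a general corrector generates when deletions, transpositions, and insertions are included as well.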

The literature on spelling correction claims that 80% to 95% of spelling errors have an edit distance of one from the target. In Peter Norvig's test on 270 spelling errors, only 76% had an edit distance of one. Extending the search to distance two gave good coverage: of the 270 test cases, only three had a distance greater than two. In other words, correcting up to two letters covers 98.9% of the cases. Because corrections do not go beyond distance two, a simple optimization is to keep only candidate words that are actually known words [7].
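The coverage figure and the two-letter limit can be illustrated with a short sketch. It follows the general idea of Norvig's corrector (prefer known words at the smallest distance, then the most frequent one) but uses substitutions only; the word counts are made-up sample data, not the device's training data.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

# 267 of the 270 test errors lie within distance two.
coverage = (270 - 3) / 270


def edits1(word):
    """Same-length strings one substitution away from `word`."""
    return {
        word[:i] + c + word[i + 1:]
        for i in range(len(word))
        for c in ALPHABET
        if c != word[i]
    }


def edits2(word):
    """Same-length strings reachable by two successive substitutions."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def correct(word, freq):
    """Return the most frequent known word within two substitutions."""
    if word in freq:
        return word
    for candidates in (edits1(word), edits2(word)):
        known = [w for w in candidates if w in freq]
        if known:
            # Ties on frequency are broken alphabetically for determinism.
            return max(known, key=lambda w: (freq[w], w))
    return word  # nothing close enough: leave the word unchanged


if __name__ == "__main__":
    sample_freq = Counter({"buku": 50, "baku": 5, "batu": 20})  # hypothetical
    print(f"coverage: {coverage:.1%}")
    print(correct("bvku", sample_freq), correct("bvkv", sample_freq))
```

Filtering each candidate set against the known words before ranking is what keeps the distance-two search cheap: the full candidate set is large, but very few of its members are dictionary words.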

There is no general rule limiting how many differing characters may be corrected. Based on the results above and on the computational load, this correction function uses a limit of two characters. The function uses a probability-based method: it is trained on a body of text, so that the word chosen as the replacement depends on how frequently each candidate word occurs.

The Voice Processing Module

Text-to-Speech

TTS (Text-to-Speech) is a system that converts text input into speech.

In principle, a text-to-speech system consists of two subsystems: a text to phoneme converter and a phoneme to speech converter.

The text to phoneme converter converts an input sentence, given as text in a particular language, into a series of codes that usually represent phonemes together with their duration and pitch. This part is language dependent.

Phoneme to Speech converter

The phoneme to speech converter accepts the phoneme codes, together with the pitch and duration, produced by the previous part.

System Design

Fig. 5 shows the diagram of the voice processing module.

Fig. 5. Design level 0 of voice processing module

Considering the use of the Linux platform, the availability of an Indonesian dialect, and the results of TTS simulations, eSpeak and Google TTS were selected as the TTS software. The general functional specifications to be achieved are as follows:

  1. The output voice has an Indonesian dialect, with a reading intelligibility tolerance of 0.02%.
  2. Additional features are provided to play, stop, and pause the sound.

Design Implementation

The voice processing module implementation diagram.

The voice processing module uses the os and subprocess modules from Python's standard library together with the pygame and RPi.GPIO packages. The os module provides file and process operations, pygame provides functions for playing sounds, RPi.GPIO provides a class to control the GPIO on a Raspberry Pi, and subprocess makes it possible to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. The isPause and isStop variables are used for the audio player features. They are initialized to False, meaning that neither state is active yet.
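How the two flags might drive the player can be sketched as a small state holder. The class and the `started` flag are hypothetical additions for illustration; the backend is any object with play, pause, unpause, and stop methods, which is the interface that `pygame.mixer.music` offers once a sound file has been loaded. A recording stand-in is used here so the sketch runs without audio hardware.

```python
class VoicePlayer:
    """Tracks the isPause/isStop flags and forwards actions to a backend."""

    def __init__(self, backend):
        self.backend = backend
        self.is_pause = False   # isPause in the text
        self.is_stop = False    # isStop in the text
        self.started = False    # hypothetical helper flag, not in the source

    def on_play(self):
        if self.is_pause:
            self.backend.unpause()   # resume where playback was paused
            self.is_pause = False
        else:
            self.backend.play()      # start from the beginning
            self.started = True
        self.is_stop = False

    def on_pause(self):
        # Pausing only makes sense while something is playing.
        if self.started and not self.is_stop and not self.is_pause:
            self.backend.pause()
            self.is_pause = True

    def on_stop(self):
        self.backend.stop()
        self.is_stop = True
        self.is_pause = False


class RecordingBackend:
    """Stand-in for pygame.mixer.music that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def play(self):
        self.calls.append("play")

    def pause(self):
        self.calls.append("pause")

    def unpause(self):
        self.calls.append("unpause")

    def stop(self):
        self.calls.append("stop")


if __name__ == "__main__":
    backend = RecordingBackend()
    player = VoicePlayer(backend)
    player.on_play()
    player.on_pause()
    player.on_play()
    player.on_stop()
    print(backend.calls)
```

On the device, the backend would be `pygame.mixer.music` after `pygame.mixer.init()` and `pygame.mixer.music.load(...)` have been called with the generated sound file.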

Setting

The GPIO pin numbering is set in accordance with the breakout board.

The main program provides functions to retrieve and process the input image, convert it into a sound signal, and play, stop, pause, or exit the voice output.
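The chain of external programs behind such a main program could look like the sketch below, which only builds and runs command lines. The file names, the sharpness value of 50, the Tesseract language code `ind`, and the eSpeak voice `id` are assumptions chosen for illustration; the actual settings used by the device are not given in the text.

```python
import subprocess
from pathlib import Path


def capture_command(image_path, sharpness=50):
    """raspistill call: full-resolution still with extra sharpening."""
    return ["raspistill", "-o", image_path,
            "-w", "2592", "-h", "1944", "-sh", str(sharpness)]


def ocr_command(image_path, output_base, lang="ind"):
    """tesseract call: writes the recognized text to <output_base>.txt."""
    return ["tesseract", image_path, output_base, "-l", lang]


def speech_command(text_path, wav_path, voice="id"):
    """espeak call: reads a text file and writes the speech to a .wav file."""
    return ["espeak", "-v", voice, "-f", text_path, "-w", wav_path]


def pipeline_commands(workdir):
    """Commands for capture, OCR, and speech synthesis, in order."""
    work = Path(workdir)
    image = str(work / "page.jpg")
    text_base = str(work / "page")
    # The word correction step would rewrite page.txt between OCR and speech.
    return [
        capture_command(image),
        ocr_command(image, text_base),
        speech_command(text_base + ".txt", str(work / "page.wav")),
    ]


def run_pipeline(workdir):
    """Run each stage in turn, stopping at the first failure."""
    for command in pipeline_commands(workdir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for command in pipeline_commands("work"):
        print(" ".join(command))
```

Building the commands separately from running them makes each stage easy to inspect or replace, for example when swapping the TTS engine.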

[Flowchart: start; import pygame and subprocess modules; convert text file (.txt) into sound file (.wav); set GPIO pin; isPause = False, isStop = False; load sound file (.wav) into pygame module; activate voice guidance to press the play button; when the play button is pressed, resume the voice if isPause is set, otherwise play the voice; when the pause button is pressed, pause the voice and set isPause = True; when the stop button is pressed, stop the voice and set isStop = True; finish.]

[Block diagram: system of voice processing module. Text (.txt) from image processing and the user's control buttons (play, stop, pause, and exit) are inputs to the text-to-speech software on the single board unit, which outputs voice (.wav) through the audio jack.]

2015 4th International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME).

Flowchart of voice processing module

The sound function converts written text into speech using a simple TTS API via espeak. It is best to use interrupts to run the audio player features such as play, stop, pause, and exit.
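Interrupt-driven buttons can be sketched with the edge-detection calls of RPi.GPIO. The pin numbers, the BCM numbering mode, the pull-up wiring, and the debounce time are all assumptions, since the text does not give the wiring. The GPIO module is passed in as a parameter so that the sketch can be exercised with a stand-in object away from the Raspberry Pi.

```python
# Hypothetical BCM pin assignment for the four control buttons.
BUTTON_PINS = {"play": 17, "pause": 27, "stop": 22, "exit": 23}


def register_buttons(gpio, handlers, pins=BUTTON_PINS, bouncetime_ms=200):
    """Attach one interrupt callback per button.

    `gpio` is the RPi.GPIO module (or a stand-in with the same calls);
    `handlers` maps a button name to a function taking no arguments.
    """
    gpio.setmode(gpio.BCM)
    for name, pin in pins.items():
        # Buttons are assumed to pull the pin to ground when pressed.
        gpio.setup(pin, gpio.IN, pull_up_down=gpio.PUD_UP)
        gpio.add_event_detect(
            pin,
            gpio.FALLING,
            callback=lambda channel, n=name: handlers[n](),
            bouncetime=bouncetime_ms,
        )


class FakeGPIO:
    """Minimal stand-in for RPi.GPIO, for running the sketch off-device."""

    BCM, IN, PUD_UP, FALLING = "BCM", "IN", "PUD_UP", "FALLING"

    def __init__(self):
        self.mode = None
        self.callbacks = {}

    def setmode(self, mode):
        self.mode = mode

    def setup(self, pin, direction, pull_up_down=None):
        pass

    def add_event_detect(self, pin, edge, callback=None, bouncetime=None):
        self.callbacks[pin] = callback

    def press(self, pin):
        """Simulate a button press by invoking the registered callback."""
        self.callbacks[pin](pin)


if __name__ == "__main__":
    gpio = FakeGPIO()
    register_buttons(gpio, {n: (lambda n=n: print(n, "pressed")) for n in BUTTON_PINS})
    gpio.press(BUTTON_PINS["play"])
```

On the device itself, `import RPi.GPIO as GPIO` followed by `register_buttons(GPIO, handlers)` would replace the stand-in, with the handlers bound to the player's play, pause, stop, and exit actions.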

Results

The testing was done on the Raspberry Pi platform with the following specifications:

  • SBU Raspberry Pi 2, 900 MHz quad-core ARM Cortex-A7
  • Raspberry Pi 5MP Camera Board Module
  • Bootable SanDisk Ultra 8GB microSD card

The experimental results show that the image processing module has the following restrictions:

  • The maximum size of the input image is A4.
  • Any input image that uses block letter fonts works fine.
  • The minimum font size is 10 point.

Conclusion

The text-to-speech device for people with low vision can convert a text image input into sound with reasonably high performance: a readability tolerance of less than 2% and an average processing time of less than two minutes for A4-size paper. The errors introduced by the image processing module are reduced by the word correction module. The device is portable, does not require an internet connection, and can be used independently by people with low vision. It also has a user interface that lets people with low vision interact with it easily.

References

  1. Kementrian Kesehatan RI. Situasi Gangguan Penglihatan dan Kebutaan, pp. 5. Jakarta: Pusat Data dan Informasi, 2014.
  2. R. Mithe, S. Indalkar, and N. Divekar. "Optical Character Recognition", International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume-2, Issue-1, March 2013.
  3. R. Smith. "An Overview of the Tesseract OCR Engine", USA: Google Inc.
  4. H. Shah and A. Shah. "Optical Character Recognization of Gujarati Numericals", International Conference on Signals, Systems & Automation, 2009, pp. 49-53.
  5. C. Whitelaw, B. Hutchinson, G. Y. Chung, and G. Ellis. "Using the Web for Language Independent Spellchecking and Autocorrection", Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009.
  6. P. Norvig. "How to Write a Spelling Corrector". USA: Google Inc. Available: http://norvig.com/spell-correct.html.
  7. S.V. Rice, F.R. Jenkins, T.A. Nartker. The Fourth Annual Test of OCR Accuracy, Technical Report 95-03. Las Vegas: Information Science Research Institute, University of Nevada, July 1995.
  8. R. Mengko and A. Ayuningtyas. "Indonesian TTS system using syllable concatenation: Speech Optimization", Proc. International Conference on Instrumentation, Communication, Information Technology, and Biomedical Engineering (ICICI-BME), November 2013, pp. 412-415.

2015 4th International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME), Bandung, November 2-3, 2015