Published: Nov 8, 2019
With the advent of computers and Internet technology, the scope for collecting data and using it for various purposes has exploded. The possibilities are especially alluring when it comes to textual data. Converting the vast amount of text that has accumulated over human history into digital format is vital for preservation, data mining, sentiment analysis and similar tasks, all of which contribute to the advancement of our society. The tool used for this purpose is called OCR.
Like many other languages, Bangla can also profit from OCR technology, all the more so since it is the seventh most-spoken language in the world, with a speaker population of about 300 million. Bangla speakers are mostly found in Bangladesh, in the Indian states of West Bengal, Assam and Tripura, in the Andaman & Nicobar Islands, and in the ever-growing diaspora in the United Kingdom (UK), United States (US), Canada, the Middle East, Australia, Malaysia and elsewhere. Progress in the digital use of the Bangla language is therefore of interest to many countries.
OCR is short for Optical Character Recognition. It is a technology that converts images of printed or handwritten text into a machine-readable, i.e. digital, format. Although OCR systems these days focus mainly on digitizing text, the earliest ones were analogue. The first OCR device is generally credited to the American inventor Charles R. Carey, whose image transmission system used a mosaic of photocells. Later inventions focused on scanning documents to produce more copies or to convert them into telegraph code, and digital output gradually became more popular. In 1966, the IBM Rochester lab developed the IBM 1287, the first scanner that could read handwritten numbers. The first commercial OCR was introduced in 1977 by Caere Corporation. OCR began to be made available online as a service (WebOCR) in 2000 across a variety of platforms through cloud computing.
Based on its method, OCR can be divided into two types -
Most of the successful research in Bangla OCR so far has been done on printed text, although researchers are gradually moving into handwritten text recognition. Sanchez and Pal * proposed a classic line-based approach for continuous Bangla handwriting recognition based on hidden Markov models and n-gram models. They used both a word-based LM (language model) and a character-based LM in their experiment and found better results with the word-based LM.
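The difference between a word-based and a character-based n-gram model comes down to what counts as a token. The following is a minimal sketch of an unsmoothed bigram model built at both levels; the function names and the romanized toy sentence are illustrative assumptions and are not taken from the cited work.

```python
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams for a maximum-likelihood bigram model."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def bigram_prob(model, first, second):
    """Estimate P(second | first) as count(first, second) / count(first)."""
    unigrams, bigrams = model
    if unigrams[first] == 0:
        return 0.0  # unseen history; a real system would apply smoothing here
    return bigrams[(first, second)] / unigrams[first]

# Romanized toy sentence, used only for illustration.
text = "ami bhat khai ami jol khai"
word_model = train_bigram(text.split())  # tokens are whole words
char_model = train_bigram(list(text))    # tokens are single characters

word_level = bigram_prob(word_model, "ami", "bhat")
char_level = bigram_prob(char_model, "a", "m")
```

The same counting code serves both models; only the tokenization changes, which is why the two variants are easy to compare side by side.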
Garain, Mioulet, Chaudhuri, Chatelain and Paquet * developed a recurrent neural network model for recognizing unconstrained Bangla handwriting at character level. They used a BLSTM-CTC based recognizer on a dataset of 2338 unconstrained Bangla handwritten lines, about 21000 words in total. Instead of horizontal segmentation, they chose vertical segmentation, classifying the words into “semi-ortho syllables”. Their experiment yielded an accuracy of 75.40% without any post-processing.
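In a CTC-based recognizer the network emits one label per time step, including a special blank label, and the final text is obtained by merging consecutive repeats and dropping the blanks. The sketch below shows only this greedy (best-path) decoding step, assuming the most likely label per frame has already been chosen; the function name and the blank symbol are hypothetical and do not come from the cited paper.

```python
BLANK = "-"  # assumed blank symbol, chosen only for this illustration

def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence as CTC best-path decoding does."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only if it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return "".join(decoded)

# Thirteen frames that spell a three-letter Bangla word.
print(ctc_greedy_decode(list("--ককক--ললল-মম")))
```

Here the thirteen frames collapse to the word "কলম". A blank between two identical labels is what lets the decoder output a genuinely doubled character.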
Hasnat, Chowdhury and Khan * developed a Tesseract-based OCR for Bangla script, which they applied to printed documents. They achieved a maximum accuracy of 93% on clean printed documents and a lowest accuracy of 70% on screen-print images. This suggests that the approach is very sensitive to variations in letter forms and is not well suited to Bangla handwritten character recognition.
Chowdhury and Rahman * proposed an optimal neural network setting for recognizing Bangla handwritten numerals, consisting of two convolutional layers with tanh activation, one hidden layer with tanh activation and one output layer with softmax activation. For recognizing the 10 Bangla numeric characters, they used a dataset of 70000 samples and reported an error rate of 1.22% to 1.33%.
Purkayastha, Datta and Islam * also used a convolutional neural network for Bangla handwritten character recognition. They were the first to work on compound Bangla handwritten characters. Their recognition experiment also included numerals and basic letters. They achieved 98.66% accuracy on numerals and 89.93% accuracy on almost all Bengali characters (80 classes).
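The last two stages of such a network, a tanh hidden layer followed by a softmax output layer, can be sketched in plain Python. This is a toy forward pass only: the weights, the layer sizes and the three output classes are made-up values for illustration and are not the configuration reported in the cited work.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one weighted sum (plus bias) per output unit."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs

# Hypothetical toy values: 3 input features, 2 hidden units, 3 output classes.
features = [0.5, -1.0, 0.25]
hidden = dense(features, [[0.2, -0.1, 0.4], [-0.3, 0.8, 0.1]], [0.0, 0.1], math.tanh)
scores = dense(hidden, [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]], [0.0, 0.0, 0.0])
probs = softmax(scores)
predicted_class = probs.index(max(probs))
```

The tanh activation squashes each hidden value into the range -1 to 1, while softmax turns the final scores into a probability for each class.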
Several projects have been developed for Bangla OCR, although it should be noted that none of them works on handwritten text.
BanglaOCR * is an open source OCR developed by Hasnat, Chowdhury and Khan * which uses the Google Tesseract engine for character recognition and works on printed documents, as discussed in Section 3.1.
Puthi OCR aka GIGA Text Reader is a cross-platform Bangla OCR application developed by Giga TECH. This application works on printed documents written in Bangla, English and Hindi. The Android app version is free to download but the desktop version and enterprise version require payment.
Chitrolekha * is another Bangla OCR, built on Google Tesseract and the OpenCV image library. The application is free and appears to have been available in the Google Play Store in the past, but at present (as of 15.07.2018) it is no longer available.
i2OCR * is a multilingual OCR supporting more than 60 languages including Bangla.
Many of the existing Bangla OCRs have major limitations such as
Deep CNN stands for Deep Convolutional Neural Network.
First, let us try to understand what a convolutional neural network (CNN) is. Neural networks are machine learning tools inspired by the architecture of the human brain. The most basic artificial neuron is called a perceptron, which makes a decision by comparing a weighted sum of its inputs against a threshold value. A neural network consists of interconnected perceptrons, whose connectivity differs from one configuration to another. The simplest topology is the feed-forward network consisting of three layers: an input layer, a hidden layer and an output layer.
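The thresholding behaviour of a single perceptron can be sketched in a few lines. The weights and threshold below are hypothetical values, chosen only so that the unit behaves like a logical AND gate.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0

# Hypothetical weights and threshold that make the unit act as an AND gate.
and_weights = [1.0, 1.0]
and_threshold = 1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], and_weights, and_threshold))
```

Only the input pair (1, 1) produces a weighted sum of 2.0, which is the only sum that reaches the threshold of 1.5, so only that pair yields 1.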
Deep neural networks have more than one hidden layer, so a deep CNN is a convolutional neural network with more than one hidden layer. This brings us to the convolutional neural network itself. While neural networks in general are inspired by the human brain, CNNs take the analogy further by also drawing on the visual cortex of animals *. Since CNNs are influenced by research in receptive field theory * and the neocognitron model * , they are better suited to learning multilevel hierarchies of visual features from images than other computer vision techniques. CNNs have produced significant achievements in AI and computer vision in recent years.
The main difference between a convolutional neural network and other neural networks is that a neuron in a hidden layer is connected only to a subset of the neurons (perceptrons) in the previous layer. As a result of this sparse connectivity, CNNs are able to learn features implicitly, i.e. they do not need predefined features for training.
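This local connectivity can be illustrated with a plain-Python sliding-window operation, in which every output value depends only on a small patch of the input. The sketch assumes a stride of 1 and no padding, and the image and kernel are toy values; as in most CNN libraries, the operation shown is technically a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide a small kernel over a 2-D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output unit sees only a kh x kw patch of the input,
            # and every unit reuses the same kernel weights.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        output.append(row)
    return output

# Toy 4x4 image with a vertical edge, and a hypothetical 2x2 edge kernel.
image = [[0, 0, 1, 1] for _ in range(4)]
kernel = [[-1, 1], [-1, 1]]
feature_map = conv2d_valid(image, kernel)
```

The resulting 3x3 feature map is non-zero only in its middle column, where the patch straddles the edge between the dark and bright halves of the image. In a trained CNN the kernel values are learned from the data instead of being set by hand.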
A CNN consists of several layers such as
As mentioned to some extent in section 4.1, deep CNNs have proven to be especially effective in computer vision. This opens up new opportunities in handwriting recognition, since it is also a computer vision task. Deep CNNs have already been used successfully to process Bangla handwritten characters with better results than other methods * , although that work does not cover the full spectrum of Bangla written forms. Since Bangla has a huge number of complex letter forms, a deep CNN is a good fit because it does not require features to be defined beforehand in order to learn.
To build the OCR, we first have to collect the necessary data. Bangla has many compound letter forms, which makes this task somewhat challenging, as discussed in detail in chapter 5. After error correction, we will convert the images to greyscale for binarization. If skewed copies of the images are also generated during processing, the number of data samples will increase significantly.
After the necessary error correction, we will train a deep CNN classifier on the dataset. For optimal results, we will have to experiment with various combinations of convolutional layers and fully connected layers, as the number of layers alone does not guarantee maximum accuracy.
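The greyscale and binarization steps can be sketched as follows. This is an illustration under stated assumptions, not the pipeline actually used: the luma weights are the common ITU-R BT.601 values, and the fixed threshold of 128 is an arbitrary choice (adaptive methods such as Otsu's are often preferred in practice).

```python
def to_greyscale(rgb_image):
    """Convert rows of (R, G, B) tuples to luminance values in 0..255."""
    # ITU-R BT.601 luma weights, a common choice for greyscale conversion.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def binarize(grey_image, threshold=128):
    """Map each pixel to 1 (ink) if darker than the threshold, else 0 (paper)."""
    return [[1 if pixel < threshold else 0 for pixel in row]
            for row in grey_image]

# Toy 2x2 image: black, white, dark grey and pale yellow pixels.
sample = [[(0, 0, 0), (255, 255, 255)],
          [(60, 60, 60), (250, 250, 200)]]
binary = binarize(to_greyscale(sample))
```

In this toy image the black and dark grey pixels are marked as ink, and the white and pale yellow pixels as paper.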
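One simple way to organize such experiments is to enumerate the layer combinations up front and score each one on held-out data. The sketch below only builds the list of candidates and picks the best under a caller-supplied scoring function; the option values and the `evaluate` callable are hypothetical placeholders, since training and validation are outside its scope.

```python
from itertools import product

def candidate_architectures(conv_options, dense_options):
    """List every (convolutional, fully connected) layer-count combination."""
    return [{"conv_layers": c, "dense_layers": d}
            for c, d in product(conv_options, dense_options)]

def pick_best(configs, evaluate):
    """Return the configuration with the highest score from `evaluate`."""
    return max(configs, key=evaluate)

# Hypothetical search space: 3 convolutional depths x 2 fully connected depths.
configs = candidate_architectures(conv_options=[2, 3, 4], dense_options=[1, 2])
```

In practice `evaluate` would train a network with the given configuration and return its validation accuracy.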