Bangla OCR

About this sample

Words: 1586 | Pages: 3 | 8 min read

Published: Mar 14, 2019

Table of contents

  1. Introduction
  2. Background Study
  3. Proposed Methodology and Implementation


Introduction

With the advent of computers and Internet technology, the scope for collecting data and using it for various purposes has exploded. The possibilities are especially alluring when it comes to textual data. Converting the vast amount of data accumulated over the years of human history into digital format is vital for preservation, data mining, sentiment analysis, etc., which will only add to the advancement of our society. The tool used for this purpose is called OCR.


Like many other languages, Bangla can also profit from OCR technology – all the more so since it is the seventh most-spoken language in the world, with a speaker population of about 300 million. The Bangla-speaking demographic is mostly found in Bangladesh; the Indian states of West Bengal, Assam and Tripura; the Andaman & Nicobar Islands; and the ever-increasing diaspora in the United Kingdom (UK), the United States (US), Canada, the Middle East, Australia, Malaysia, etc. So progress in the digital utilization of the Bangla language is something that encompasses the interest of many countries.

Background Study

OCR is short for Optical Character Recognition. It is a technology for converting images of printed or handwritten text into machine-readable, i.e. digital, format. Although OCRs these days focus predominantly on digitizing text, earlier OCRs were analogue. The first OCR in the world is considered to have been invented by the American inventor Charles R. Carey, whose image transmission system used a mosaic of photocells.

Later inventions focused on scanning documents to produce more copies or to convert them into telegraph code; digital formats then gradually became more popular. In 1966, the IBM Rochester lab developed the IBM 1287, the first scanner that could read handwritten numbers. The first commercial OCR was introduced in 1977 by Caere Corporation. In 2000, OCR began to be made available online as a service (WebOCR) across a variety of platforms through cloud computing.

Based on its method, OCR can be divided into two types:

  • On-line OCR (not to be confused with “online” in Internet technology) involves the automatic conversion of text as it is written on a special digitizer or PDA, where a sensor picks up the pen-tip movements as well as pen-up/pen-down switching. This kind of data is known as digital ink and can be regarded as a digital representation of handwriting. The obtained signal is converted into letter codes which are usable within computer and text-processing applications.
  • Off-line OCR scans an image as a whole and does not deal with stroke order. It is a kind of image processing, since it tries to recognize character patterns in given image files.

On-line OCR can only process texts written in real time, whereas off-line OCR can process images of both handwritten and printed texts and no special device is needed.
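The "digital ink" captured by on-line OCR can be illustrated with a minimal sketch. The stroke structure below (time-stamped pen positions, with pen-up events separating strokes) follows the description above, but all concrete values and names are made up for illustration:

```python
# Illustrative "digital ink" representation used by on-line OCR:
# a stroke is a time-ordered list of pen-tip samples recorded while
# the pen is down; lifting the pen starts a new stroke.

# Each sample is (x, y, t); the values here are invented example data.
stroke_1 = [(10, 20, 0.00), (12, 22, 0.01), (15, 25, 0.02)]
stroke_2 = [(30, 20, 0.10), (30, 28, 0.12)]  # pen lifted, new stroke

digital_ink = [stroke_1, stroke_2]  # the whole character as a stroke list

# Off-line OCR, by contrast, only ever sees the rendered bitmap;
# the stroke order and timing information above is lost.
n_points = sum(len(stroke) for stroke in digital_ink)
print(len(digital_ink), "strokes,", n_points, "sampled pen positions")
```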

Most successful research in Bangla OCR has so far been done for printed text, although researchers are gradually foraying further into handwritten text recognition.

Sanchez and Pal proposed a classic line-based approach for continuous Bangla handwriting recognition based on hidden Markov models and n-gram models. They used both a word-based language model (LM) and a character-based LM in their experiment and found better results with the word-based LM.

Garain, Mioulet, Chaudhuri, Chatelain and Paquet developed a recurrent neural network model for recognizing unconstrained Bangla handwriting at the character level. They used a BLSTM-CTC based recognizer on a dataset consisting of 2,338 unconstrained Bangla handwritten lines, about 21,000 words in total. Instead of horizontal segmentation, they chose vertical segmentation, classifying the words into “semi-ortho syllables”. Their experiment yielded an accuracy of 75.40% without any post-processing.

Hasnat, Chowdhury and Khan developed a Tesseract-based OCR for Bangla script which they used on printed documents. They achieved a maximum accuracy of 93% on clean printed documents and a lowest accuracy of 70% on screen-print images. It is apparent that this approach is very sensitive to variations in letter forms and is not well suited to Bengali handwritten character recognition.

Chowdhury and Rahman proposed an optimal neural network setting for recognizing Bangla handwritten numerals, consisting of two convolutional layers with tanh activation, one hidden layer with tanh activation and one output layer with softmax activation. For recognizing the 10 Bangla numeric characters, they used a dataset of 70,000 samples and achieved an error rate of 1.22% to 1.33%.

Purkayastha, Datta and Islam also used a convolutional neural network for Bangla handwritten character recognition. They were the first to work on compound Bangla handwritten characters. Their recognition experiment also included numeric and alphabetic characters. They achieved 98.66% accuracy on numerals and 89.93% accuracy on almost all Bengali characters (80 classes).

Some projects have been developed for Bangla OCR; it is to be noted that none of them work on handwritten text:

  • BanglaOCR is an open-source OCR developed by Hasnat, Chowdhury and Khan which uses the Google Tesseract engine for character recognition and works on printed documents, as discussed in Section 3.1.
  • Puthi OCR, aka GIGA Text Reader, is a cross-platform Bangla OCR application developed by Giga TECH. This application works on printed documents written in Bangla, English and Hindi. The Android app version is free to download, but the desktop version and enterprise version require payment.
  • Chitrolekha is another Bangla OCR, using Google Tesseract and the OpenCV image library. The application is free and was possibly available in the Google Play Store in the past, but at present (as of 15.07.2018) it is no longer available.
  • i2OCR is a multilingual OCR supporting more than 60 languages, including Bangla.

Proposed Methodology and Implementation

Deep CNN stands for Deep Convolutional Neural Network. First, let us try to understand what a convolutional neural network (CNN) is. Neural networks are machine learning tools inspired by the architecture of the human brain. The most basic version of an artificial neuron is called a perceptron, which makes a decision by weighing its inputs against a threshold value. A neural network consists of interconnected perceptrons whose connectivity may differ according to various configurations. The simplest topology is the feed-forward network, consisting of three layers – an input layer, a hidden layer and an output layer.
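The perceptron decision rule described above can be sketched in a few lines. The weights and threshold below are illustrative values chosen to make the example perceptron compute a logical AND; they are not taken from any model in this essay:

```python
# A minimal perceptron: compare the weighted sum of inputs to a threshold.

def perceptron(inputs, weights, threshold):
    """Fire (return 1) if the weighted input sum exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# Illustrative configuration: a two-input perceptron computing logical AND.
and_weights = [1.0, 1.0]
and_threshold = 1.5

print(perceptron([1, 1], and_weights, and_threshold))  # 1*1 + 1*1 = 2 > 1.5, fires
print(perceptron([1, 0], and_weights, and_threshold))  # 1 <= 1.5, does not fire
```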

Deep neural networks have more than one hidden layer. So, a deep CNN is a convolutional neural network with more than one hidden layer. Now we come to the matter of the convolutional neural network. While neural networks in general are inspired by the human brain, CNNs are a type of neural network that takes this further by also drawing similarities from the visual cortex of animals. Since CNNs are influenced by research on receptive fields and the neocognitron model, they are better suited to learning multilevel hierarchies of visual features from images than other computer vision techniques. CNNs have earned significant achievements in AI and computer vision in recent years.

The main difference between a convolutional neural network and other neural networks is that a neuron in a hidden layer is connected only to a subset of the neurons (perceptrons) in the previous layer. As a result of this sparse connectivity, CNNs are able to learn features implicitly, i.e. they do not need predefined features in training.
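The saving from sparse, shared connectivity can be shown with simple arithmetic. The sizes below (a 28 x 28 single-channel input, 5 x 5 kernels, 32 units or filters) are illustrative assumptions, not figures from the essay:

```python
# Parameters needed to connect a 28x28 input to a 32-unit feature layer,
# fully connected vs convolutional (weights only, biases omitted).

input_h, input_w = 28, 28

# Fully connected: each of 32 hidden neurons sees all 784 input pixels.
fc_params = (input_h * input_w) * 32      # 784 * 32 = 25088 weights

# Convolutional: 32 filters of size 5x5, each shared across all positions.
conv_params = (5 * 5) * 32                # 25 * 32 = 800 weights

print(fc_params, conv_params)  # the conv layer needs roughly 31x fewer weights
```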


A CNN consists of several layers such as:

  • Convolutional Layer: This is the basic unit of a CNN, where most of the computation happens. A CNN consists of a number of convolutional and pooling (subsampling) layers, optionally followed by fully connected layers. The input to a convolutional layer is an m x m x r image, where m is the height and width of the image and r is the number of channels. The convolutional layer has k filters (or kernels) of size n x n x q, where n is smaller than the dimension of the image and q can either be the same as the number of channels r or smaller, and may vary for each kernel. The size of the filters gives rise to the locally connected structure; each filter is convolved with the image to produce k feature maps of size (m − n + 1) x (m − n + 1).
  • Pooling Layer: Each feature map is then subsampled, typically with mean or max pooling, over p x p contiguous regions, where p ranges from 2 for small images (e.g. MNIST) up to usually not more than 5 for larger inputs. Alternating convolutional layers and pooling layers reduces the spatial dimension of the activation maps, leading to less overall computational complexity. Some common pooling operations are max pooling, average pooling, stochastic pooling, spectral pooling, spatial pyramid pooling and multiscale orderless pooling.
  • Fully Connected Layer: In this layer, neurons are fully connected to all neurons in the previous layer, as in a regular neural network. High-level reasoning is done here. As the neurons here no longer retain the spatial arrangement of the image, another convolutional layer cannot be present after this layer. Some architectures have the fully connected layer replaced by a global average pooling layer, as in “Network In Network” (NIN).
  • Loss Layer: The last fully connected layer is called the loss layer, since it computes the loss, or error, between the predicted and correct output. Softmax loss is a commonly used loss function; it is used to predict a single class out of K mutually exclusive classes. For SVMs (Support Vector Machines), hinge loss is used, and for regressing to real-valued labels, Euclidean loss can be used.
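The dimension bookkeeping and the softmax function from the layers above can be sketched as follows. This is a minimal illustration assuming a single-channel m x m input, stride-1 "valid" convolution, and non-overlapping p x p pooling; the concrete sizes (m = 28, n = 5, k = 32, p = 2) are example values:

```python
import numpy as np

def conv_output_size(m, n):
    """Side length after 'valid' convolution of an m x m image with an n x n kernel."""
    return m - n + 1

def pool_output_size(m, p):
    """Side length after non-overlapping p x p pooling (m assumed divisible by p)."""
    return m // p

def softmax(z):
    """Softmax over K class scores, as used in the loss layer."""
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

m, n, k, p = 28, 5, 32, 2
conv_side = conv_output_size(m, n)           # 28 - 5 + 1 = 24
pool_side = pool_output_size(conv_side, p)   # 24 // 2 = 12
print(k, "feature maps of", conv_side, "x", conv_side,
      "->", pool_side, "x", pool_side, "after pooling")

# Softmax turns arbitrary class scores into probabilities summing to 1.
probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)
```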