Bangla OCR

About this sample

Words: 1586 | Pages: 3 | 8 min read

Published: Mar 14, 2019

Table of contents

  1. Introduction
  2. Background Study
  3. Proposed Methodology and Implementation


Introduction

With the advent of computers and Internet technology, the scope for collecting data and using it for various purposes has exploded. The possibilities are especially alluring when it comes to textual data. Converting the vast amount of data accumulated over the years of human history into digital format is vital for preservation, data mining, sentiment analysis, etc., which will only add to the advancement of our society. The tool used for this purpose is called OCR.


Like many other languages, Bangla can also profit from OCR technology – all the more so since it is the seventh most-spoken language in the world, with a speaker population of about 300 million. The Bangla-speaking demographic is mostly found in Bangladesh; the Indian states of West Bengal, Assam and Tripura; the Andaman & Nicobar Islands; and the ever-increasing diaspora in the United Kingdom (UK), the United States (US), Canada, the Middle East, Australia, Malaysia, etc. So progress in the digital utilization of the Bangla language is something that encompasses the interest of many countries.

Background Study

OCR is short for Optical Character Recognition. It is a technology for converting images of printed or handwritten text into machine-readable, i.e. digital, format. Although OCRs these days focus predominantly on digitizing text, earlier OCRs were analogue. The first OCR in the world is considered to have been invented by the American inventor Charles R. Carey, whose image transmission system used a mosaic of photocells.

Later inventions focused on scanning documents to produce more copies or to convert them into telegraph code; digital formats then gradually became more popular. In 1966, the IBM Rochester lab developed the IBM 1287, the first scanner that could read handwritten numbers. The first commercial OCR was introduced in 1977 by Caere Corporation. In 2000, OCR began to be made available online as a service (WebOCR) across a variety of platforms through cloud computing.

Based on its method, OCR can be divided into two types:

  • On-line OCR (not to be confused with “online” in Internet technology) involves the automatic conversion of text as it is written on a special digitizer or PDA, where a sensor picks up the pen-tip movements as well as pen-up/pen-down switching. This kind of data is known as digital ink and can be regarded as a digital representation of handwriting. The obtained signal is converted into letter codes which are usable within computer and text-processing applications.
  • Off-line OCR scans an image as a whole and does not deal with stroke order. It is a kind of image processing, since it tries to recognize character patterns in given image files.

On-line OCR can only process texts written in real time, whereas off-line OCR can process images of both handwritten and printed texts and no special device is needed.
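The "digital ink" captured by on-line OCR can be illustrated with a minimal sketch. The stroke structure below (time-stamped pen positions, with pen-up events separating strokes) follows the description above, but all concrete values and names are made up for illustration:

```python
# Illustrative "digital ink" representation used by on-line OCR:
# a stroke is a time-ordered list of pen-tip samples recorded while
# the pen is down; lifting the pen starts a new stroke.

# Each sample is (x, y, t); the values here are invented example data.
stroke_1 = [(10, 20, 0.00), (12, 22, 0.01), (15, 25, 0.02)]
stroke_2 = [(30, 20, 0.10), (30, 28, 0.12)]  # pen lifted, new stroke

digital_ink = [stroke_1, stroke_2]  # the whole character as a stroke list

# Off-line OCR, by contrast, only ever sees the rendered bitmap;
# the stroke order and timing information above is lost.
n_points = sum(len(stroke) for stroke in digital_ink)
print(len(digital_ink), "strokes,", n_points, "sampled pen positions")
```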

Most successful research in Bangla OCR has so far been done for printed text, although researchers are gradually foraying further into handwritten text recognition.

Sanchez and Pal proposed a classic line-based approach for continuous Bangla handwriting recognition based on hidden Markov models and n-gram models. They used both a word-based language model (LM) and a character-based LM in their experiment and found better results with the word-based LM.

Garain, Mioulet, Chaudhuri, Chatelain and Paquet developed a recurrent neural network model for recognizing unconstrained Bangla handwriting at the character level. They used a BLSTM-CTC based recognizer on a dataset consisting of 2,338 unconstrained Bangla handwritten lines, about 21,000 words in total. Instead of horizontal segmentation, they chose vertical segmentation, classifying the words into “semi-ortho syllables”. Their experiment yielded an accuracy of 75.40% without any post-processing.

Hasnat, Chowdhury and Khan developed a Tesseract-based OCR for Bangla script which they used on printed documents. They achieved a maximum accuracy of 93% on clean printed documents and a lowest accuracy of 70% on screen-print images. It is apparent that this approach is very sensitive to variations in letter forms and is not well suited to Bengali handwritten character recognition.

Chowdhury and Rahman proposed an optimal neural network setting for recognizing Bangla handwritten numerals, consisting of two convolutional layers with tanh activation, one hidden layer with tanh activation and one output layer with softmax activation. For recognizing the 10 Bangla numeric characters, they used a dataset of 70,000 samples and achieved an error rate of 1.22% to 1.33%.

Purkayastha, Datta and Islam also used a convolutional neural network for Bangla handwritten character recognition. They were the first to work on compound Bangla handwritten characters. Their recognition experiment also included numeric and alphabetic characters. They achieved 98.66% accuracy on numerals and 89.93% accuracy on almost all Bengali characters (80 classes).

Some projects have been developed for Bangla OCR; it is to be noted that none of them work on handwritten text:

  • BanglaOCR is an open-source OCR developed by Hasnat, Chowdhury and Khan which uses the Google Tesseract engine for character recognition and works on printed documents, as discussed in Section 3.1.
  • Puthi OCR, aka GIGA Text Reader, is a cross-platform Bangla OCR application developed by Giga TECH. This application works on printed documents written in Bangla, English and Hindi. The Android app version is free to download, but the desktop version and enterprise version require payment.
  • Chitrolekha is another Bangla OCR, using Google Tesseract and the OpenCV image library. The application is free and was possibly available in the Google Play Store in the past, but at present (as of 15.07.2018) it is no longer available.
  • i2OCR is a multilingual OCR supporting more than 60 languages, including Bangla.

Proposed Methodology and Implementation

Deep CNN stands for Deep Convolutional Neural Network. First, let us try to understand what a convolutional neural network (CNN) is. Neural networks are machine learning tools inspired by the architecture of the human brain. The most basic version of an artificial neuron is called a perceptron, which makes a decision by weighing its inputs against a threshold value. A neural network consists of interconnected perceptrons whose connectivity may differ according to various configurations. The simplest topology is the feed-forward network, consisting of three layers – an input layer, a hidden layer and an output layer.
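The perceptron decision rule described above can be sketched in a few lines. The weights and threshold below are illustrative values chosen to make the example perceptron compute a logical AND; they are not taken from any model in this essay:

```python
# A minimal perceptron: compare the weighted sum of inputs to a threshold.

def perceptron(inputs, weights, threshold):
    """Fire (return 1) if the weighted input sum exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# Illustrative configuration: a two-input perceptron computing logical AND.
and_weights = [1.0, 1.0]
and_threshold = 1.5

print(perceptron([1, 1], and_weights, and_threshold))  # 1*1 + 1*1 = 2 > 1.5, fires
print(perceptron([1, 0], and_weights, and_threshold))  # 1 <= 1.5, does not fire
```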

Deep neural networks have more than one hidden layer. So, a deep CNN is a convolutional neural network with more than one hidden layer. Now we come to the matter of the convolutional neural network. While neural networks in general are inspired by the human brain, CNNs are a type of neural network that takes this further by also drawing similarities from the visual cortex of animals. Since CNNs are influenced by research on receptive fields and the neocognitron model, they are better suited to learning multilevel hierarchies of visual features from images than other computer vision techniques. CNNs have earned significant achievements in AI and computer vision in recent years.

The main difference between a convolutional neural network and other neural networks is that a neuron in a hidden layer is connected only to a subset of the neurons (perceptrons) in the previous layer. As a result of this sparse connectivity, CNNs are able to learn features implicitly, i.e. they do not need predefined features in training.
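The saving from sparse, shared connectivity can be shown with simple arithmetic. The sizes below (a 28 x 28 single-channel input, 5 x 5 kernels, 32 units or filters) are illustrative assumptions, not figures from the essay:

```python
# Parameters needed to connect a 28x28 input to a 32-unit feature layer,
# fully connected vs convolutional (weights only, biases omitted).

input_h, input_w = 28, 28

# Fully connected: each of 32 hidden neurons sees all 784 input pixels.
fc_params = (input_h * input_w) * 32      # 784 * 32 = 25088 weights

# Convolutional: 32 filters of size 5x5, each shared across all positions.
conv_params = (5 * 5) * 32                # 25 * 32 = 800 weights

print(fc_params, conv_params)  # the conv layer needs roughly 31x fewer weights
```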


A CNN consists of several layers such as:

  • Convolutional Layer: This is the basic unit of a CNN, where most of the computation happens. A CNN consists of a number of convolutional and pooling (subsampling) layers, optionally followed by fully connected layers. The input to a convolutional layer is an m x m x r image, where m is the height and width of the image and r is the number of channels. The convolutional layer has k filters (or kernels) of size n x n x q, where n is smaller than the dimension of the image and q can either be the same as the number of channels r or smaller, and may vary for each kernel. The size of the filters gives rise to the locally connected structure; each filter is convolved with the image to produce k feature maps of size (m − n + 1) x (m − n + 1).
  • Pooling Layer: Each feature map is then subsampled, typically with mean or max pooling, over p x p contiguous regions, where p ranges from 2 for small images (e.g. MNIST) up to usually not more than 5 for larger inputs. Alternating convolutional layers and pooling layers reduces the spatial dimension of the activation maps, leading to less overall computational complexity. Some common pooling operations are max pooling, average pooling, stochastic pooling, spectral pooling, spatial pyramid pooling and multiscale orderless pooling.
  • Fully Connected Layer: In this layer, neurons are fully connected to all neurons in the previous layer, as in a regular neural network. High-level reasoning is done here. As the neurons here no longer retain the spatial arrangement of the image, another convolutional layer cannot be present after this layer. Some architectures have the fully connected layer replaced by a global average pooling layer, as in “Network In Network” (NIN).
  • Loss Layer: The last fully connected layer is called the loss layer, since it computes the loss, or error, between the predicted and correct output. Softmax loss is a commonly used loss function; it is used to predict a single class out of K mutually exclusive classes. For SVMs (Support Vector Machines), hinge loss is used, and for regressing to real-valued labels, Euclidean loss can be used.
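The dimension bookkeeping and the softmax function from the layers above can be sketched as follows. This is a minimal illustration assuming a single-channel m x m input, stride-1 "valid" convolution, and non-overlapping p x p pooling; the concrete sizes (m = 28, n = 5, k = 32, p = 2) are example values:

```python
import numpy as np

def conv_output_size(m, n):
    """Side length after 'valid' convolution of an m x m image with an n x n kernel."""
    return m - n + 1

def pool_output_size(m, p):
    """Side length after non-overlapping p x p pooling (m assumed divisible by p)."""
    return m // p

def softmax(z):
    """Softmax over K class scores, as used in the loss layer."""
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

m, n, k, p = 28, 5, 32, 2
conv_side = conv_output_size(m, n)           # 28 - 5 + 1 = 24
pool_side = pool_output_size(conv_side, p)   # 24 // 2 = 12
print(k, "feature maps of", conv_side, "x", conv_side,
      "->", pool_side, "x", pool_side, "after pooling")

# Softmax turns arbitrary class scores into probabilities summing to 1.
probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)
```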