The Concept & Types of Ocr (optical Character Recognition)

Published: Nov 8, 2019

1. Introduction
1.1 Background
1.2 Motivation
2. Background Study
2.1 OCR
2.2 Types of OCR
3. Bangla OCR
3.1 Existing Research
3.2 Existing Projects
3.3 Limitations
4. Proposed Methodology and Implementation
4.1 Deep CNN
4.2 Why Deep CNN
4.3 Experiment Data
4.4 Training and Recognition

1. Introduction

1.1 Background

With the advent of computers and Internet technology, the scope for collecting data and using it for various purposes has expanded enormously. The possibilities are especially alluring for textual data. Converting the vast amount of text accumulated over human history into digital format is vital for preservation, data mining, sentiment analysis and similar applications, all of which contribute to the advancement of society. The technology used for this conversion is optical character recognition (OCR).

1.2 Motivation

Like many other languages, Bangla can profit from OCR technology, all the more so because it is the seventh most-spoken language in the world, with about 300 million speakers. Bangla speakers are concentrated in Bangladesh, the Indian states of West Bengal, Assam and Tripura, and the Andaman & Nicobar Islands, with an ever-growing diaspora in the United Kingdom (UK), the United States (US), Canada, the Middle East, Australia, Malaysia and elsewhere. Progress in the digital use of Bangla is therefore of interest to many countries.

2. Background Study

2.1 OCR

OCR is short for optical character recognition. It is a technology for converting images of printed or handwritten text into machine-readable, i.e. digital, text. Although modern OCR systems focus on digitizing text, the earliest OCR devices were analogue. The first OCR device is generally credited to the American inventor Charles R. Carey, whose image transmission system used a mosaic of photocells. Later inventions focused on scanning documents to produce copies or to convert them into telegraph code, and digital formats gradually became more popular. In 1966, the IBM Rochester lab developed the IBM 1287, the first scanner that could read handwritten numbers. The first commercial OCR was introduced in 1977 by Caere Corporation. In 2000, OCR became available online as a service (WebOCR) across a variety of platforms through cloud computing.

2.2 Types of OCR

Based on its input method, OCR can be divided into two types:

On-line OCR (not to be confused with “online” in internet technology) involves the automatic conversion of text as it is written on a special digitizer or PDA, where a sensor picks up the pen-tip movements as well as pen-up/pen-down switching. This kind of data is known as digital ink and can be regarded as a digital representation of handwriting. The obtained signal is converted into letter codes which are usable within computer and text-processing applications.
Off-line OCR scans an image as a whole and does not deal with stroke order. It is a form of image processing, since it tries to recognize character patterns in given image files.

On-line OCR can only process text written in real time, whereas off-line OCR can process images of both handwritten and printed text and needs no special device.
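To make the difference concrete, here is a minimal Python sketch. It assumes digital ink is stored as a list of strokes, each a list of (x, y) pen positions (a simplified representation chosen for illustration; real devices often also record timestamps and pressure). Rendering two recordings that draw the same shape in a different stroke order shows that the off-line image no longer carries the order.

```python
from typing import List, Tuple

# Assumed representation: a stroke is the list of (x, y) pen positions
# recorded between a pen-down and the following pen-up.
Stroke = List[Tuple[int, int]]


def rasterize(strokes: List[Stroke], width: int, height: int) -> List[List[int]]:
    """Render on-line digital ink into the kind of bitmap an off-line OCR receives.

    Only the sampled points are marked (no line interpolation), which is enough
    to show that stroke order and timing are not preserved in the image.
    """
    image = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for x, y in stroke:
            image[y][x] = 1  # ink present at this pixel
    return image


# Two hypothetical recordings of the same shape, written in a different stroke order.
ink_a = [[(0, 0), (1, 0), (2, 0)], [(1, 1), (1, 2)]]
ink_b = [[(1, 2), (1, 1)], [(2, 0), (1, 0), (0, 0)]]

print(ink_a == ink_b)  # False: an on-line recognizer sees two different inputs
print(rasterize(ink_a, 3, 3) == rasterize(ink_b, 3, 3))  # True: the images match
```

This is also why on-line recognizers can exploit stroke order as an extra cue, while off-line recognizers must work from pixel patterns alone.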

3. Bangla OCR

3.1 Existing Research

Most successful research on Bangla OCR has so far targeted printed text, although researchers are gradually moving into handwritten text recognition. Sanchez and Pal * proposed a classic line-based approach for continuous Bangla handwriting recognition based on hidden Markov models and n-gram models. They experimented with both a word-based language model (LM) and a character-based LM and obtained better results with the word-based LM.
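The role of the n-gram language model can be illustrated with a minimal Python sketch: a word bigram model with add-one (Laplace) smoothing, trained on a toy corpus of romanized placeholder words, that rescores candidate transcriptions proposed by a recognizer. The smoothing method, the corpus and the class name `BigramLM` are assumptions for illustration, not details of Sanchez and Pal's system.

```python
import math
from collections import Counter
from typing import List


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: List[List[str]]):
        self.history = Counter()  # how often each token appears as the left context
        self.pairs = Counter()    # how often each (previous, current) pair appears
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.history.update(padded[:-1])
            self.pairs.update(zip(padded[:-1], padded[1:]))
        # Possible next tokens: every training token plus the end-of-sentence marker.
        self.vocab_size = len({t for s in sentences for t in s}) + 1

    def log_prob(self, tokens: List[str]) -> float:
        """Smoothed log-probability of a whole token sequence."""
        padded = ["<s>"] + tokens + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded[:-1], padded[1:]):
            total += math.log((self.pairs[(prev, cur)] + 1) /
                              (self.history[prev] + self.vocab_size))
        return total


# Toy corpus of romanized placeholder sentences (not real training data).
corpus = [["ami", "bhat", "khai"], ["ami", "boi", "pori"], ["tumi", "bhat", "khao"]]
lm = BigramLM(corpus)

# Two hypothetical recognizer outputs for the same line; "bhot" is a misreading.
candidates = [["ami", "bhat", "khai"], ["ami", "bhot", "khai"]]
best = max(candidates, key=lm.log_prob)
print(best)  # ['ami', 'bhat', 'khai']
```

Passing lists of characters instead of lists of words to the same class gives a character-based LM, the other variant compared in that work.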

Garain, Mioulet, Chaudhuri, Chatelain and Paquet * developed a recurrent neural network model for recognizing unconstrained Bangla handwriting at the character level. They used a BLSTM-CTC (bidirectional long short-term memory with connectionist temporal classification) recognizer on a dataset of 2,338 unconstrained Bangla handwritten lines, about 21,000 words in total. Instead of horizontal segmentation, they chose vertical segmentation, dividing words into “semi-ortho syllables”. Their experiment yielded an accuracy of 75.40% without any post-processing.
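The CTC part of such a recognizer lets the network emit one label distribution per frame without pre-segmented characters. The minimal Python sketch below shows greedy (best-path) CTC decoding. It assumes the per-frame probabilities are already available from the BLSTM and that index 0 is the blank symbol, and it uses a placeholder Latin alphabet rather than the semi-ortho syllable units of Garain et al. Beam search decoding, optionally with a lexicon, is a common and usually more accurate alternative.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: pick the most probable label in every frame,
    merge consecutive repeats, then remove blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded: List[str] = []
    previous = None
    for label in best_path:
        if label != previous and label != BLANK:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


# Hypothetical per-frame outputs over the alphabet "-abc" ("-" is the blank at index 0).
frames = [
    [0.1, 0.8, 0.05, 0.05],   # a
    [0.1, 0.7, 0.1, 0.1],     # a (repeat, merged)
    [0.9, 0.05, 0.03, 0.02],  # blank (separates the two a's)
    [0.2, 0.6, 0.1, 0.1],     # a
    [0.1, 0.1, 0.7, 0.1],     # b
    [0.2, 0.1, 0.6, 0.1],     # b (repeat, merged)
]
print(ctc_greedy_decode(frames, "-abc"))  # aab
```

The blank between the a frames is what allows a genuinely doubled letter to survive the merging step.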

Hasnat, Chowdhury and Khan * developed a Tesseract-based OCR for Bangla script and applied it to printed documents. They achieved a maximum accuracy of 93% on clean printed documents and a lowest accuracy of 70% on screen-print images. This suggests that the approach is very sensitive to variations in letter forms and is not well suited to Bangla handwritten character recognition.

Chowdhury and Rahman * proposed an optimal neural network configuration for recognizing Bangla handwritten numerals, consisting of two convolutional layers with tanh activation, one hidden layer with tanh activation and an output layer with softmax activation. For recognizing the ten Bangla numerals, they used a dataset of 70,000 samples and reported an error rate of 1.22% to 1.33%.

Purkayastha, Datta and Islam * also used a convolutional neural network for Bangla handwritten character recognition. They were the first to work on compound Bangla handwritten characters, and their recognition experiment also included numerals and alphabets. They achieved 98.66% accuracy on numerals and 89.93% accuracy on almost all Bengali characters (80 classes).

3.2 Existing Projects

Several projects have been developed for Bangla OCR; notably, none of them work on handwritten text.

BanglaOCR * is an open-source OCR developed by Hasnat, Chowdhury and Khan * that uses the Google Tesseract engine for character recognition and works on printed documents, as discussed in Section 3.1.

Puthi OCR, also known as GIGA Text Reader, is a cross-platform Bangla OCR application developed by Giga TECH. It works on printed documents written in Bangla, English and Hindi. The Android app is free to download, but the desktop and enterprise versions are paid.

Chitrolekha * is another Bangla OCR, built on Google Tesseract and the OpenCV image library. The application is free and was possibly available in the Google Play Store in the past, but at present (as of 15.07.2018) it is no longer available.

i2OCR * is a multilingual OCR supporting more than 60 languages including Bangla.

3.3 Limitations

Many of the existing Bangla OCRs have major limitations, such as:

Segmentation: two types of segmentation, horizontal and vertical, are used to separate individual characters or forms. Handwriting recognition OCRs that use horizontal segmentation do not give effective results on cursive Bangla text.
Cursive forms: many OCRs recognize individually written Bangla numerals or characters successfully, but they do not yield favorable results on text with Bangla cursive forms.
Variation in forms: the way people write characters varies greatly from person to person, especially since Bangla has many forms because of kar (dependent vowel signs) and compound letters. No OCR has yet been developed that can recognize all these forms in handwriting.
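Both kinds of segmentation are often implemented with projection profiles: summing the ink in each row separates text lines, and summing the ink in each column separates the pieces within a line. The minimal Python sketch below assumes an already binarized image stored as nested lists (1 = ink); the function names are hypothetical and describe the axis each one cuts along. In Bangla, the headline (matra) that joins the letters of a word usually leaves no empty column inside the word, which illustrates why such simple cuts struggle with connected and cursive text.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def ink_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """(start, end) pairs of consecutive non-zero profile entries; end is exclusive."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i  # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))  # an ink-free row/column ends the run
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def split_rows(img: Image) -> List[Tuple[int, int]]:
    """Row ranges separated by ink-free rows (e.g. text lines on a page)."""
    return ink_runs([sum(row) for row in img])


def split_columns(img: Image) -> List[Tuple[int, int]]:
    """Column ranges separated by ink-free columns (e.g. pieces within a line)."""
    return ink_runs([sum(col) for col in zip(*img)])


page = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
print(split_rows(page))               # [(0, 2), (3, 4)]
print(split_columns(page[0:2]))       # [(0, 2), (3, 5)]
```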

4. Proposed Methodology and Implementation

4.1 Deep CNN

Deep CNN stands for Deep Convolutional Neural Network.

First, let us try to understand what a convolutional neural network (CNN) is. Neural networks are machine learning tools inspired by the architecture of the human brain. The most basic artificial neuron is the perceptron, which makes a decision by comparing a weighted sum of its inputs against a threshold value. A neural network consists of interconnected perceptrons whose connections vary with the chosen configuration. The simplest topology is the feed-forward network, consisting of three layers: an input layer, a hidden layer and an output layer.
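The perceptron's decision rule fits in a few lines of Python. In this minimal sketch the weights and threshold are hand-picked for illustration (a trained perceptron would learn them), and whether the comparison uses ≥ or > is a convention that varies between texts.

```python
from typing import Sequence


def perceptron(inputs: Sequence[float], weights: Sequence[float], threshold: float) -> int:
    """Output 1 if the weighted sum of the inputs reaches the threshold, otherwise 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


# Hand-picked weights and threshold that make the perceptron act as a logical AND.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], [1.0, 1.0], threshold=1.5))
```

Only the input pair (1, 1) produces a weighted sum of 2.0, the only sum that reaches the threshold of 1.5.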

Deep neural networks have more than one hidden layer, so a deep CNN is a convolutional neural network with more than one hidden layer. This brings us to convolutional neural networks themselves. While neural networks in general are inspired by the human brain, CNNs go further by also drawing on the structure of the animal visual cortex *. Because CNNs are influenced by research on receptive field theory * and the neocognitron model *, they are better suited than other computer vision techniques to learning multilevel hierarchies of visual features from images. CNNs have achieved significant results in AI and computer vision in recent years.

The main difference between a convolutional neural network and other neural networks is that a neuron in a hidden layer is connected only to a small, local subset of neurons (perceptrons) in the previous layer. Because of this sparse connectivity, CNNs learn features implicitly, i.e. they do not need predefined features for training.
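The effect of sparse connectivity on model size can be checked with simple arithmetic. The sketch below assumes a single-channel 28 × 28 input (the MNIST image size), six 5 × 5 filters, stride one, no padding and no biases. It compares a convolutional layer with a fully connected layer that produces the same number of output neurons, and it also reflects weight sharing, the standard CNN practice of reusing one filter's weights at every image position. The function name is hypothetical.

```python
def layer_sizes(m: int, n: int, k: int) -> dict:
    """Compare a conv layer (k filters of n x n, stride 1, no padding, one channel)
    with a fully connected layer producing the same number of neurons from an m x m input."""
    side = m - n + 1              # each feature map is side x side
    neurons = k * side * side     # total output neurons in both layers
    return {
        "neurons": neurons,
        "inputs_per_neuron_dense": m * m,   # every pixel of the image
        "inputs_per_neuron_conv": n * n,    # one local n x n patch
        "weights_dense": neurons * m * m,
        "weights_conv_shared": k * n * n,   # one n x n filter reused at every position
    }


for name, value in layer_sizes(m=28, n=5, k=6).items():
    print(name, value)
# neurons 3456
# inputs_per_neuron_dense 784
# inputs_per_neuron_conv 25
# weights_dense 2709504
# weights_conv_shared 150
```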

A CNN consists of several kinds of layers:

Convolutional Layer: This is the basic unit of a CNN, where most of the computation happens. A CNN consists of a number of convolutional and pooling (subsampling) layers, optionally followed by fully connected layers. The input to a convolutional layer is an m × m × r image, where m is the height and width of the image and r is the number of channels. The convolutional layer has k filters (or kernels) of size n × n × q, where n is smaller than the image dimension and q either equals the number of channels r or is smaller, and may vary for each kernel. The filter size gives rise to the locally connected structure: each filter is convolved with the image to produce one of k feature maps of size (m − n + 1) × (m − n + 1), assuming a stride of one and no padding.
Pooling Layer: Each feature map is then subsampled, typically with mean or max pooling over p × p contiguous regions, where p ranges from 2 for small images (e.g. MNIST) to usually no more than 5 for larger inputs. Alternating convolutional and pooling layers reduces the spatial dimensions of the activation maps and thus the overall computational complexity. Common pooling operations include max pooling, average pooling, stochastic pooling *, spectral pooling *, spatial pyramid pooling * and multiscale orderless pooling *.
Fully Connected Layer: In this layer, neurons are connected to all neurons in the previous layer, as in a regular neural network, and high-level reasoning is done here. Since the neurons in this layer are no longer arranged spatially (their output is one-dimensional), a convolutional layer cannot follow it. In some architectures, such as "Network In Network" (NIN) *, the fully connected layer is replaced by a global average pooling layer. The outputs of convolutional and fully connected layers are commonly passed through a non-linear activation such as the rectified linear unit (ReLU), f(x) = max(0, x).
Loss Layer: The final layer, which follows the last fully connected layer, is the loss layer; it computes the loss, or error, between the network's output and the correct output. Softmax loss is a commonly used loss function for predicting a single class out of K mutually exclusive classes. Hinge loss, as used in support vector machines (SVMs), is an alternative, and Euclidean loss can be used for regressing to real-valued labels. Some common CNN architectures include LeNet, AlexNet, GoogLeNet and VGGNet. LeNet, the earliest of these, was originally designed for handwritten digit recognition.
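The layers listed above can be sketched end to end in plain Python: one valid convolution, a ReLU, 2 × 2 max pooling, a fully connected layer and a softmax loss. The 6 × 6 image, the vertical-edge kernel and the dense weights are made-up values chosen so the arithmetic is easy to follow; a real network learns its kernels and weights by backpropagation, which this sketch omits. Like most deep learning libraries, it implements convolution as cross-correlation (the kernel is not flipped).

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1 'valid' convolution (as cross-correlation):
    an m x m input and an n x n kernel give an (m-n+1) x (m-n+1) feature map."""
    m, n = len(image), len(kernel)
    size = m - n + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(n) for b in range(n))
             for j in range(size)]
            for i in range(size)]


def relu(x: Matrix) -> Matrix:
    """f(x) = max(0, x), applied element-wise."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x: Matrix, p: int) -> Matrix:
    """Non-overlapping p x p max pooling (leftover rows/columns are dropped)."""
    size = len(x) // p
    return [[max(x[i * p + a][j * p + b] for a in range(p) for b in range(p))
             for j in range(size)]
            for i in range(size)]


def dense(features: List[float], weights: Matrix, biases: List[float]) -> List[float]:
    """Fully connected layer: every output sees every input feature."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, biases)]


def softmax_cross_entropy(logits: List[float], target: int) -> float:
    """Softmax loss for one example whose correct class index is `target`."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    return -math.log(exps[target] / sum(exps))


image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]  # dark left half, bright right half
kernel = [[-1, 0, 1] for _ in range(3)]          # responds to left-to-right increases
fmap = conv2d_valid(image, kernel)                # 4 x 4, since 6 - 3 + 1 = 4
pooled = max_pool(relu(fmap), 2)                  # 2 x 2
features = [v for row in pooled for v in row]     # flatten for the dense layer
logits = dense(features, [[0.1] * 4, [-0.1] * 4], [0.0, 0.0])
loss = softmax_cross_entropy(logits, target=0)
print(len(fmap), len(pooled), round(loss, 4))     # 4 2 0.0868
```

The 4 × 4 feature map confirms the m − n + 1 rule for m = 6 and n = 3, and pooling halves each spatial dimension before the fully connected layer.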

4.2 Why Deep CNN

As noted in Section 4.1, deep CNNs have proven especially effective in computer vision. This opens new opportunities for handwriting recognition, which is itself a computer vision task. Deep CNNs have already been used to recognize Bangla handwritten characters with better results than other methods *, although existing work does not cover the full spectrum of Bangla written forms. Since Bangla has a huge number of complex letter forms, deep CNNs are well suited to the task because they do not require features to be defined beforehand in order to learn.

4.3 Experiment Data

To build the OCR, we first have to collect the necessary data. Bangla has many compound letter forms, which makes this task challenging; it is discussed in detail in chapter 5. After error correction, we will convert the images to greyscale and then binarize them. If skewed copies of the images are also generated during processing, the number of data samples will increase significantly.
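A minimal sketch of the planned preprocessing in plain Python, under several assumptions: images arrive as nested lists of RGB tuples, greyscale conversion uses the common ITU-R BT.601 luma weights, binarization uses Otsu's method (one standard choice; the plan above does not name a method) with dark ink on light paper, and the skewed copies are produced by a horizontal shear with hand-picked factors. All function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_greyscale(rgb: List[List[RGB]]) -> List[List[int]]:
    """Luma with ITU-R BT.601 weights, a common greyscale conversion."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]


def otsu_threshold(grey: List[List[int]]) -> int:
    """Otsu's method: the threshold t that maximizes between-class variance
    when pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for row in grey:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_low = sum_low = 0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue
        if w_high == 0:
            break
        diff = sum_low / w_low - (sum_all - sum_low) / w_high
        var_between = w_low * w_high * diff * diff
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(grey: List[List[int]], t: int) -> List[List[int]]:
    """1 = ink (dark pixel), 0 = paper; assumes dark text on a light background."""
    return [[1 if v <= t else 0 for v in row] for row in grey]


def shear(img: List[List[int]], factor: float) -> List[List[int]]:
    """Skewed copy: row y is shifted right by round(factor * y); the canvas is
    widened so that no ink is clipped."""
    shifts = [round(factor * y) for y in range(len(img))]
    offset = -min(shifts)
    width = len(img[0]) + max(shifts) + offset
    out = [[0] * width for _ in img]
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            out[y][x + shifts[y] + offset] = v
    return out


rgb = [[(250, 250, 250), (240, 240, 240), (30, 30, 30)],
       [(20, 20, 20), (245, 245, 245), (25, 25, 25)]]
binary = binarize(to_greyscale(rgb), otsu_threshold(to_greyscale(rgb)))
augmented = [shear(binary, f) for f in (-1.0, 1.0)]  # factors exaggerated for a tiny image
```

For realistic page images, far smaller shear factors (slight slants) would be used, and each skewed copy keeps the label of its original.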

4.4 Training and Recognition

After the necessary error correction, we will train a deep CNN classifier on the dataset. To obtain optimal results, we will have to experiment with various combinations of convolutional and fully connected layers, since the number of layers alone does not guarantee maximum accuracy.
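The layer-count experiment can be organized as a simple grid search. In the minimal Python sketch below, `evaluate` is a hypothetical caller-supplied function that would build, train and validate a deep CNN with the given numbers of convolutional and fully connected layers; the `fake_accuracy` stand-in exists only so the code runs, and its numbers are not results.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Config = Tuple[int, int]  # (convolutional layers, fully connected layers)


def search_architectures(
    conv_options: Iterable[int],
    fc_options: Iterable[int],
    evaluate: Callable[[int, int], float],
) -> Tuple[Config, Dict[Config, float]]:
    """Evaluate every (conv, fc) combination and return the best one plus all scores."""
    results: Dict[Config, float] = {
        (c, f): evaluate(c, f) for c, f in product(conv_options, fc_options)
    }
    best = max(results, key=results.__getitem__)
    return best, results


def fake_accuracy(conv_layers: int, fc_layers: int) -> float:
    """Stand-in scorer so the sketch runs; NOT a real measurement."""
    return 0.9 - 0.01 * abs(conv_layers - 3) - 0.02 * abs(fc_layers - 1)


best, results = search_architectures([2, 3, 4, 5], [1, 2], fake_accuracy)
print(best)  # (3, 1)
```

Scoring each configuration on a held-out validation set, separate from the final test set, keeps the chosen architecture from being tuned to the test data.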

Cite this Essay

The Concept & Types of Ocr (optical Character Recognition). (2019, September 13). GradesFixer. Retrieved April 17, 2026, from https://gradesfixer.com/free-essay-examples/the-concept-types-of-ocr-optical-character-recognition/
