This essay has been submitted by a student. This is not an example of the work written by professional essay writers.

A Framework for Word Clustering of Bangla Sentences Using Higher Order N-gram Language Model


Word clustering is a method for partitioning a set of words into subsets of semantically similar words. It plays a crucial role in many natural language processing applications, such as POS tagging, spell checking, grammar checking and word sense disambiguation. In this paper we propose a model based on higher order N-gram language models that clusters Bangla words efficiently on the basis of similarity in meaning and context.

N-gram rules are used to generate probabilities for different sentence structures. For the implementation, we also propose a system that generates word clusters and tests them against threshold values to validate the results. In experiments on a large corpus of Bangla sentences, the proposed model achieves an accuracy of approximately 89% for higher order N-grams, which is quite satisfactory.

Keywords: Bangla language processing, word cluster, corpus, higher order n-gram, threshold values

I. Introduction


Word prediction is commonly based on probabilistic models called n-gram models, which predict the next word from the previous n-1 words [1]. The n-gram is one of the most important tools in speech and language processing, and such statistical models of word sequences are also called language models. A language model controls length, decides on suitable words, and is necessary for statistical machine translation. Models of different orders have been implemented, such as bi-gram and tri-gram models. Word clustering is a growing research direction in Bangla language processing, and the research history of word clustering makes clear that it has significant applications in the language processing field. We therefore introduce an efficient method of Bangla word clustering using N-gram language models. Previous work used very few words, whereas here we use about 2 lakh (200,000) words for clustering to improve efficiency. We also report the performance of the method for higher order N-grams.
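The prediction of the next word from the previous n-1 words can be illustrated with a maximum-likelihood estimate, P(word | history) = count(history, word) / count(history). The following is a minimal sketch under stated assumptions, not the system described in this paper: the function names, the `<s>` and `</s>` padding tokens, and the romanized toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def ngram_counts(sentences, n):
    """Count n-grams and their (n-1)-word histories over tokenized sentences."""
    ngrams, histories = Counter(), Counter()
    for tokens in sentences:
        # Pad so that the first real word also has a full history (assumed convention).
        padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
        for i in range(len(padded) - n + 1):
            gram = tuple(padded[i:i + n])
            ngrams[gram] += 1
            histories[gram[:-1]] += 1
    return ngrams, histories

def next_word_probability(ngrams, histories, history, word):
    """Maximum-likelihood estimate of P(word | history)."""
    history = tuple(history)
    if histories[history] == 0:
        return 0.0  # unseen history: no estimate without smoothing
    return ngrams[history + (word,)] / histories[history]

# Toy corpus (romanized, purely illustrative; not the paper's data).
corpus = [
    ["ami", "bhat", "khai"],
    ["ami", "bhat", "ranna", "kori"],
    ["tumi", "bhat", "khao"],
]
bigrams, bigram_histories = ngram_counts(corpus, 2)
p_bhat_given_ami = next_word_probability(bigrams, bigram_histories, ["ami"], "bhat")
p_khai_given_bhat = next_word_probability(bigrams, bigram_histories, ["bhat"], "khai")
```

Raising `n` gives the higher order models discussed here, at the cost of sparser counts: a longer history is seen less often, so more estimates fall back to zero unless some smoothing is applied.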

Research on word clustering in Bangla language processing is at an early stage. Word clusters can be helpful in many areas of natural language processing, such as word sense disambiguation, text classification, recommendation systems, spell checking, grammar checking, knowledge discovery and many other applications. Word sense disambiguation (WSD) identifies which sense of a word is used in a sentence when the word has multiple meanings. That natural language requires so much disambiguation is a reflection of the underlying neurologic reality. To reduce the WSD problem, word clustering can point out the most suitable form of a word [2]. Text classification assigns one or more classes to a document according to its content. POS tagging, also called grammatical tagging or word category disambiguation, can be treated as a supervised learning problem that uses features such as the previous word, the next word, and whether the first letter of the word is capitalized [3].
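The POS tagging features named above (previous word, next word, capitalization) can be sketched as a small feature extractor. This is an illustrative assumption about how such features might be encoded, not the method of [3]; the function name `pos_features` and the boundary markers are hypothetical. Note that the Bangla script has no letter case, so the capitalization feature is only informative for languages such as English.

```python
def pos_features(tokens, i):
    """Build a feature dictionary for the token at position i.

    Boundary markers "<s>" and "</s>" are an assumed convention for
    positions before the first and after the last token.
    """
    word = tokens[i]
    return {
        "word": word,
        "prev_word": tokens[i - 1] if i > 0 else "<s>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "</s>",
        # Always False for caseless scripts such as Bangla.
        "is_capitalized": word[:1].isupper(),
    }

features = pos_features(["The", "cat", "sleeps"], 1)
```

A supervised tagger would be trained on such dictionaries paired with gold tags; a word cluster identifier could be added as one more feature for unknown words.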

Word clusters can help determine the POS tag of an unknown word [4]. Word clustering can also help a spell checker, as it provides many candidate corrections for a misspelled word [5]. The main idea of clustering is to group words so that the words within a cluster are homogeneous or similar, while words in different clusters are clearly different from each other. For that reason, we propose a framework that implements a word clustering system with the help of higher order n-gram rules. This paper analyzes the system with about 3019 different Bangla sentences.

Bangla is now the 4th most spoken language in the world, spoken by over 245 million people, and it is rich in cultural and historical resources. A good deal of research on word clustering has already been done for languages such as Russian, Arabic, Chinese, Japanese and English. For English, enough methods have already been implemented to enrich its resources; Bangla, on the other hand, still lags behind and has not reached a satisfactory level. It is therefore essential to advance Bangla word clustering.

The aim of our research is to speed up the entire process through higher order N-grams and to observe which order of N-gram gives better performance. Our proposed methodology can also play an important role in search engines. Bangla word clustering currently lacks efficient methods. To preserve the excellent work done in Bangla, it is necessary to strengthen Bangla language processing.

II. Related work

Word clustering has been implemented for many other languages, but owing to a shortage of resources its implementation for Bangla lags behind. An early approach used a bigram model to calculate the weight matrix of a neural network [6]. Another N-gram method was introduced in [7], whose authors present a similarity function and a greedy algorithm that groups words into the same cluster. For Japanese and English, an effective method based on deleted interpolation was developed [8]; with this method, the authors obtained better results than with class-based N-gram models. A machine learning technique has been used to implement word clustering based on tri-gram, 4-gram and 5-gram models. A further English-language paper was published after that experiment [9]; its authors used the Naïve Bayes method to classify words, with surrounding context words as features, which works effectively. Some work has described the technical challenges and design issues in Bangla language processing [10]. Another methodology implemented word clustering with an unsupervised machine learning technique [11].
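The idea behind interpolated N-gram models can be sketched as a weighted sum of estimates from different orders. The snippet below is a simplified illustration with fixed, hand-picked weights, which is an assumption made for brevity: deleted interpolation proper estimates the weights from held-out data, and that estimation step, as used in [8], is not reproduced here. The function name and the default weights are hypothetical.

```python
def interpolate(p_unigram, p_bigram, p_trigram, weights=(0.1, 0.3, 0.6)):
    """Linearly interpolate unigram, bigram and trigram probability estimates.

    The weights are illustrative constants and must sum to 1.
    """
    l1, l2, l3 = weights
    if abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return l1 * p_unigram + l2 * p_bigram + l3 * p_trigram

# An unseen trigram (probability 0) still receives mass from the lower orders.
p = interpolate(p_unigram=0.2, p_bigram=0.5, p_trigram=0.0)
```

The benefit for higher order models is that a word sequence never seen as a full trigram does not collapse to zero probability, as long as its shorter sub-sequences were observed.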

A stochastic language model has been used for automatic word prediction in Bangla [12]. Another Bangla paper presented corpus-based unsupervised Bangla word stemming using the N-gram model [13]. A machine learning technique has been used to implement word clustering based on tri-gram, 4-gram and 5-gram models for better results [14]. From these papers it is clear that many experiments have been carried out, but no existing model can generate word clusters efficiently for higher order n-grams. Other languages have already started implementing word clustering, so this can be a new direction for our language. In this paper we work with a new approach that will help word clustering in Bangla natural language processing.

III. Proposed framework

Our proposed framework has six modules: input sentences, n-gram selector, rule generator, word cluster, threshold value and output. Fig. 1 shows our system.
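The flow through the six modules can be sketched roughly as follows. The text names the modules but does not specify their internals, so everything below is an assumption made for illustration: the "rules" are taken to be n-gram contexts with the target word replaced by a slot, similarity is taken to be the Jaccard overlap of context sets, and clustering is a simple greedy pass. All function names and the romanized toy sentences are hypothetical.

```python
from collections import defaultdict

def select_ngrams(sentences, n):
    """N-gram selector: yield every n-gram (as a tuple) from tokenized sentences."""
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def generate_rules(ngrams):
    """Rule generator (assumed form): map each word to the set of contexts it fills.

    A context is the n-gram with the word's position replaced by the slot "_".
    """
    contexts = defaultdict(set)
    for gram in ngrams:
        for pos, word in enumerate(gram):
            contexts[word].add(gram[:pos] + ("_",) + gram[pos + 1:])
    return contexts

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_words(contexts, threshold):
    """Greedy clustering: join the first cluster whose seed word is similar enough."""
    clusters = []  # each cluster is a list; its first word acts as the seed
    for word in sorted(contexts):
        for cluster in clusters:
            if jaccard(contexts[word], contexts[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])  # no similar cluster found: start a new one
    return clusters

# Input sentences (romanized toy data, not the paper's corpus).
sentences = [
    ["ami", "bhat", "khai"],
    ["ami", "ruti", "khai"],
    ["ami", "school", "jai"],
]
contexts = generate_rules(select_ngrams(sentences, 3))
clusters = cluster_words(contexts, threshold=0.5)  # output module: the clusters
```

In this toy run, "bhat" and "ruti" share the trigram context ("ami", "_", "khai") and end up in one cluster, while the other words stay in singleton clusters. Raising the threshold makes clusters stricter and smaller; lowering it merges more words at the risk of grouping dissimilar ones.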


A Framework For Word Clustering Of Bangla Sentences Using Higher Order N-Gram Language Model. (2019, September 13). GradesFixer. Retrieved October 15, 2021, from