Word clustering is a method for partitioning a set of words into subsets of semantically similar words. It plays a crucial role in many natural language processing applications, such as POS tagging, spell checking, grammar checking, word sense disambiguation, and many more. In this paper we propose a model based on higher-order N-gram language models that clusters Bangla words efficiently according to their similarity in meaning and context.
N-gram rules are used to generate different probabilities for different sentence structures. For the implementation, we also propose a system that generates word clusters and tests them against threshold values to validate the results. In experiments on a large corpus of Bangla sentences of varying word lengths, our proposed model achieves an accuracy of approximately 89% for higher-order N-grams, which is quite satisfactory.
Keywords: Bangla language processing, word cluster, corpus, higher-order N-grams, threshold values
I. Introduction
Word prediction is based on probabilistic models called N-gram models, which predict the next word from the previous n-1 words. The N-gram is one of the most important tools in speech and language processing. Such statistical models of word sequences are also called language models. A language model controls output length, selects suitable words, and is necessary for statistical machine translation. Models of different orders have been used, such as bigram and trigram models. Research on word clustering in Bangla language processing is growing day by day, and the research history of word clustering makes clear that it has many valuable applications in language processing. We therefore present an efficient method of Bangla word clustering using N-gram language models. Earlier work used very few words, whereas here we use about 2 lakh (200,000) words for clustering. We also report the performance of this method for higher-order N-grams.
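To make the idea of predicting a word from the previous n-1 words concrete, the following is a minimal sketch of a maximum-likelihood N-gram estimate. It is not the authors' implementation: the function names `ngram_counts` and `next_word_probability` are hypothetical, the toy corpus is invented and uses transliterated Bangla, and no smoothing is applied.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def next_word_probability(sentences, context, word):
    """Estimate P(word | context) as count(context + word) / count(context + any word).

    `context` holds the previous n-1 words, so the N-gram order is
    len(context) + 1. N-grams never cross sentence boundaries.
    """
    context = tuple(context)
    n = len(context) + 1
    counts = Counter()
    for sentence in sentences:
        counts.update(ngram_counts(sentence.split(), n))
    # Total occurrences of the context when it is followed by some word.
    context_total = sum(c for gram, c in counts.items() if gram[:-1] == context)
    if context_total == 0:
        return 0.0  # unseen context; a real system would back off or smooth
    return counts[context + (word,)] / context_total


# Invented toy corpus (assumption, for illustration only).
toy_corpus = ["ami bhat khai", "ami ruti khai", "ami bhat khai", "tumi bhat khao"]
print(next_word_probability(toy_corpus, ("ami",), "bhat"))         # bigram estimate
print(next_word_probability(toy_corpus, ("ami", "bhat"), "khai"))  # trigram estimate
```

A longer context (higher order) gives sharper estimates when the context has been seen, but it is also more likely to be unseen, which is why the sketch returns 0.0 in that case instead of guessing.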
Research on word clustering in Bangla language processing is at an early stage. Word clusters can help in many areas of natural language processing, such as word sense disambiguation, text classification, recommendation systems, spell checking, grammar checking, knowledge discovery, and many other applications. Word sense disambiguation (WSD) identifies which sense of a word is used in a sentence when the word has multiple meanings. Natural language is formed in a way that largely reflects neurological reality, which makes this task hard to automate. To reduce the WSD problem, word clustering can point out the most suitable form of a word. Text classification assigns one or more classes to a document according to its content. POS tagging is a supervised learning task that uses features such as the previous word, the next word, and whether the first letter of the word is capitalized. It is also called grammatical tagging or word category disambiguation.
A word cluster can determine the POS tag of a specific unknown word. Word clustering can also help a spell checker, because it provides many candidate corrections for a misspelled word. The main idea of clustering is to group words so that the words within a cluster are homogeneous or similar, while words in different clusters are clearly different from each other. For that reason, we propose a framework that implements a word clustering system with the help of higher-order N-gram rules. This paper analyzes the system with about 3019 different Bangla sentences.
Bangla is now the 4th most spoken language in the world, spoken by over 245 million people. It is also rich in cultural and historical resources. A good deal of research on word clustering has already been done for languages such as Russian, Arabic, Chinese, Japanese, and English. Many methods have already been implemented for English, enriching its resources, whereas Bangla still lags behind and has not reached a satisfactory level. Work on Bangla word clustering is therefore essential.
The aim of our research is to speed up the entire process through higher-order N-grams and to observe which N-gram order gives better performance. Our proposed methodology can also play an important role in search engines. Bangla word clustering currently lacks efficient methods. To preserve the achievements made in Bangla, it is necessary to strengthen computational support for the Bangla language.
Many implementations exist for other languages, but because of a shortage of resources, the implementation of word clustering in Bangla lags behind. An early approach used a bigram model to calculate the weight matrix of a neural network. Another N-gram-based method introduced a similarity function and a greedy algorithm to group words into the same cluster. For Japanese and English, an effective method based on deleted interpolation was developed. With this method, the authors obtained better results than with class-based N-gram models. A machine learning technique has been used to implement word clustering based on trigrams, 4-grams, and 5-grams. Another English-language study used the Naïve Bayes method to classify words, using the surrounding context words as features, and found that it works effectively. Some work has described the technical challenges and design issues in Bangla language processing. Another methodology implemented word clustering using an unsupervised machine learning technique.
A stochastic language model has been used for automatic word prediction in Bangla. Another Bangla study presented corpus-based unsupervised Bangla word stemming using the N-gram model. A machine learning technique has also been used to implement word clustering based on trigrams, 4-grams, and 5-grams for better results. From all of these papers it is clear that many experiments have been carried out, but no existing model generates word clusters efficiently for higher-order N-grams. Other languages have already started implementing word clustering, so this can be a new dimension for our language. In this paper we work with a new approach that will help word clustering in Bangla natural language processing.
III. Proposed framework
Our proposed framework has six modules: input sentences, N-gram selector, rule generator, word cluster, threshold value, and output. Fig. 1 shows our system.
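One way to picture how input sentences, an N-gram order, a cluster step, and a threshold could fit together is the sketch below. It rests on assumptions, since the framework's internal rules are not specified here: words are grouped when they follow the same (n-1)-word context, and an N-gram is kept only if it occurs at least `threshold` times. The function name `cluster_by_context` and the transliterated toy sentences are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_by_context(sentences, n, threshold):
    """Group words that follow the same (n-1)-word context.

    An n-gram contributes its last word to the cluster of its context only
    if the n-gram occurs at least `threshold` times. Contexts that end up
    with fewer than two words are dropped, since they form no real cluster.
    """
    ngram_freq = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            ngram_freq[tuple(tokens[i:i + n])] += 1

    candidates = defaultdict(set)
    for gram, freq in ngram_freq.items():
        if freq >= threshold:  # threshold filter (assumed rule)
            candidates[gram[:-1]].add(gram[-1])

    return {ctx: sorted(words) for ctx, words in candidates.items() if len(words) > 1}


# Invented toy input (assumption, for illustration only).
toy_sentences = [
    "ami bhat khai",
    "ami ruti khai",
    "ami bhat khai",
    "tumi bhat khao",
    "ami mach khai",
]
print(cluster_by_context(toy_sentences, n=2, threshold=1))
print(cluster_by_context(toy_sentences, n=2, threshold=2))
```

With a threshold of 1, the words that follow "ami" are grouped together, and so are the two verb forms that follow "bhat". Raising the threshold to 2 removes the rare N-grams, and no multi-word cluster survives on this tiny input, which illustrates the trade-off a threshold value introduces.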