Machine Translation (MT) is the field of translating text from a source language into a target language. It is one of the most dominant emerging fields of today's world. Machine Translation came into existence in the early 1940s, during the Cold War, when there was a great need to decipher secretly coded messages exchanged between English and Russian; the technology at that time was referred to as the "science of cryptography". MT is a key to the success of many new services. Nowadays, many IT and other private-sector companies are converging on MT technology in order to enhance their existing products and services. This has promoted the development of many new models and resulted in the open-source Statistical Machine Translation (SMT) system Moses, which is deployed in various institutes, research projects, etc. There are various approaches to Machine Translation.
Rule-based Machine Translation (RBMT) is one of the most basic and earliest approaches to Machine Translation. In RBMT, we have to develop and maintain rules using different grammatical conventions and lexicons, and we need to process those rules [1]. Generally, the rules are coded by linguistic experts with great expertise in this field. The advantage of RBMT is that it is very simple and can be easily extended to handle new situations. There are different approaches to RBMT: transfer-based RBMT, interlingua-based RBMT, and dictionary-based RBMT. One of the limitations of RBMT is that we humans need to create rules for every analysis and generation stage, which is a cumbersome and sometimes tedious task, and new rules must be developed in order to adapt to a changing environment. Thus, corpus-based approaches came into use due to the shortcomings of rule-based approaches, aided by the increasing availability of machine-readable text and the increasing capability of hardware. Two corpus-based MT approaches, Example-based MT and Statistical MT, are described below.
Example-based Machine Translation (EBMT) is a corpus-based MT approach based on analogy. It uses a bilingual corpus as its main knowledge base [1]. Given a new source sentence, it is translated using examples, or analogies, drawn from the reference sentences in the knowledge base, and the translated sentences are stored back into the knowledge base. This saves the effort of translating every new test sentence from scratch. One limitation of this approach is that if a test sentence has no match in the knowledge base, its translation needs to be generated from scratch; EBMT cannot use close phrases or words to predict the translation [1, 10].
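As a toy illustration of the analogy idea (not the actual EBMT system discussed here; the function names, the word-overlap similarity, and the placeholder knowledge base are all illustrative assumptions), a translation-by-retrieval step might look like this:

```python
# Toy sketch of the EBMT idea: translate by retrieving the most similar
# stored example from a bilingual knowledge base.
# Names, the Jaccard similarity, and the threshold are illustrative only.

def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over word sets, standing in for analogy matching."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def ebmt_translate(source: str, knowledge_base: dict, threshold: float = 0.8):
    """Return the stored translation of the closest example, if it is close enough."""
    best_src = max(knowledge_base, key=lambda s: word_overlap(s, source), default=None)
    if best_src is not None and word_overlap(best_src, source) >= threshold:
        return knowledge_base[best_src]
    return None  # no sufficiently close example: must translate from scratch

kb = {"the weather is nice today": "<stored Manipuri translation>"}
print(ebmt_translate("the weather is nice today", kb))
```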
Statistical Machine Translation (SMT) is a data-driven, or corpus-based, approach to MT. It uses supervised and unsupervised machine learning techniques to train the translation model. The goal of an SMT system is to produce a translated target sentence from a given source sentence. Among all the possible candidate translations for a given input sentence, the SMT decoder tries to find the best one. The approach is based on the noisy channel model derived from Bayes' theorem; the argmax operation in this model searches for the best translation in the space of all possible translations of the input sentence. A bootstrapping technique can be used in SMT to learn from the bilingual data, and it can be iterated repeatedly to obtain better results.
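The noisy channel formulation referred to above is usually written as follows (a standard statement of the model, with $f$ the source sentence, $e$ a candidate target sentence, and $\hat{e}$ the decoder output):

$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} \frac{P(f \mid e)\,P(e)}{P(f)} = \arg\max_{e} P(f \mid e)\,P(e),$$

where $P(f \mid e)$ is the translation model and $P(e)$ is the target-side language model; $P(f)$ is constant over $e$ and can be dropped from the maximization.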
The Moses decoder can accept input in the form of plain test sentences, confusion networks, or lattices. In our work, we used the phrase-based model of Statistical Machine Translation, which is a more refined SMT model. In this approach, the input and output sentences are divided into segments called phrases, so one advantage of this model is that it does not translate word by word. When there is a mismatch for a word, the preceding n-gram context is used to predict the new word, but with a word penalty. In addition, a reordering cost, or distortion cost, is assigned depending on the number of words skipped, either forward or backward.
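A common way to write the phrase-based score with such a distortion penalty is the standard formulation below (the actual feature weights used by Moses are tuned and not shown here): the source sentence is segmented into $I$ phrases $\bar{f}_1 \ldots \bar{f}_I$, each translated as a phrase $\bar{e}_i$,

$$\hat{e} = \arg\max_{e}\; p_{\mathrm{LM}}(e)\,\prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\; d\!\left(\mathrm{start}_i - \mathrm{end}_{i-1} - 1\right),$$

where $\phi$ is the phrase translation probability, $p_{\mathrm{LM}}$ the language model, and $d(x) = \alpha^{|x|}$ a distance-based distortion penalty that grows with the number of words skipped.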
The phrase-based model is one of the most successful approaches to Statistical Machine Translation, but it cannot handle the syntactic and semantic information of the target language [1, 5]. Thus, the main focus of this paper is to build an SMT system based on the phrase-based model and use it to translate available text or documents for the English-Manipuri language pair. For this, we require an enormous amount of parallel corpus, or bitext, of sentence-aligned text. If the corpus contains ill-formed input, it will not be translated correctly and will therefore affect the translation model. In addition, the main aim of this paper is to increase the fluency of the translated output language. However, electronically available parallel text corpora for this language pair are very limited.
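As a minimal illustration of what a sentence-aligned bitext means here, the sketch below reads two parallel files, one sentence per line as Moses-style training pipelines expect, and drops pairs with an empty side; the file names are placeholders, not the actual corpus or cleaning rules used in this work.

```python
# Minimal sketch: load a sentence-aligned English-Manipuri bitext.
# File names are placeholders; the real corpus and cleaning steps may differ.

def load_bitext(src_path: str, tgt_path: str):
    """Yield (source, target) sentence pairs, skipping pairs with an empty side."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for src_line, tgt_line in zip(src, tgt):
            src_line, tgt_line = src_line.strip(), tgt_line.strip()
            if src_line and tgt_line:   # drop ill-formed (empty) pairs
                yield src_line, tgt_line

pairs = list(load_bitext("train.en", "train.mni"))
print(f"{len(pairs)} aligned sentence pairs loaded")
```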
Our work clearly shows that even though we do not have much parallel training text, we can still improve the fluency of the translated output. This has been achieved by incorporating a monolingual corpus on the target side of the language pair during language model training. We also show that incorporating the monolingual corpus significantly improves the Bilingual Evaluation Understudy (BLEU) score, which is one of the key results of our research findings and experimentation. In the succeeding sections, we discuss the methodology and system architecture of the PBSMT system. We then evaluate and compare the performance of our improved PBSMT system and the baseline PBSMT system using various automatic and human evaluation metrics, and subsequently we analyse the various types of errors generated by the PBSMT system. In Sect. 5, we conclude with a discussion of the results and observations obtained from our research findings.
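For reference, BLEU can be computed with off-the-shelf tooling such as NLTK's corpus_bleu; the snippet below uses toy tokenized data and is only an illustration of the metric, not the evaluation pipeline used in this paper.

```python
# Illustrative BLEU computation with NLTK (toy data, not the paper's evaluation setup).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One list of reference translations per hypothesis; sentences are pre-tokenized.
references = [[["this", "is", "a", "test", "sentence"]]]
hypotheses = [["this", "is", "a", "test", "sentence"]]

smooth = SmoothingFunction().method1  # avoids zero scores on short toy inputs
score = corpus_bleu(references, hypotheses, smoothing_function=smooth)
print(f"BLEU = {score:.4f}")
```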