Methods of Large Scale Text Classification in Natural Language Processing

Words: 2459 | Pages: 5 | 13 min read

Published: Apr 15, 2020

Table of contents

  1. Introduction
  2. Text Classification Process
  3. Word Based Representation
  4. Graph Based Representation
  5. Semantic Relation
  6. Application of Text Classification Algorithm
  7. Observation
  8. Conclusion

Text classification is the task of assigning unlabelled natural language documents to a predefined set of categories. The task can depend on various factors, such as the structure and size of the data being processed. Many real-world problems, however, involve a huge amount of data to be classified from many sources. Large scale text classification assigns text to thousands of classes; in some cases each document belongs to only a single class, while in others it may belong to more than one.

Hierarchical relations can offer extra information to a classification system, which can improve scalability and accuracy. This work surveys various methods used for text classification in NLP, including both machine learning and deep learning techniques. It also describes the evaluation measures commonly used for classification systems.

Introduction

Text classification deals with the problem of assigning documents to a predefined set of classes. Consider the case of binary classification, where there is just one class and each document either belongs to it or not. Spam filtering is such an example, where emails are classified as fraudulent or not. In machine learning, a classifier can be trained on positive and negative instances in order to perform the classification automatically, but it is rarely 100% correct even in the simplest case.

In large scale text classification, the volume of documents to be processed is also very large (hundreds of thousands or even millions), leading to a large vocabulary (the unique words in the documents, also known as types). One aspect of multi-label classification is that the classes are connected to each other, for example through parent-child relations that compose a hierarchy. A class taxonomy offers extra information to a classification system, which can be exploited either to improve scalability or to improve accuracy.

Text Classification Process

The goal of text classification is to automatically classify text documents into one or more defined categories. Classes are selected from a previously established taxonomy (a hierarchy of categories or classes). The task of representing a given document in a form suitable for a data mining system is referred to as document representation. Since data can be structured or unstructured, the form of representation is very important for the classification process: documents must be expressed as instances with a fixed number of attributes. Plain-text documents are therefore converted into a fixed number of attributes in a training set. This can be done in several ways.

Word Based Representation

The process of assigning one of the parts of speech to a given word in a document is termed parts-of-speech tagging, commonly referred to as POS tagging. Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories.

A parts-of-speech tagger, or POS tagger, tags words automatically. Taggers use several kinds of information for tagging, such as dictionaries, lexicons and rules. A dictionary lists the category or categories of a particular word, since a word may belong to more than one category. For example, the word "run" is both a noun and a verb. Taggers use probabilistic information to resolve this ambiguity.
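As a rough illustration of how probabilistic information can resolve the noun/verb ambiguity of a word such as "run", the sketch below counts which tag each word received after a given previous tag in a tiny hand-made corpus and picks the most frequent one. The corpus, the tag names and the helper names `train` and `tag` are illustrative assumptions, not part of any particular tagger.

```python
from collections import Counter, defaultdict

# Tiny hand-labelled corpus (an illustrative assumption, not real data).
TRAINING = [
    [("i", "PRON"), ("run", "VERB"), ("daily", "ADV")],
    [("they", "PRON"), ("run", "VERB"), ("fast", "ADV")],
    [("a", "DET"), ("run", "NOUN"), ("helps", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("ended", "VERB")],
]

def train(sentences):
    """Count tags per word, both overall and conditioned on the previous tag."""
    by_word = defaultdict(Counter)
    by_context = defaultdict(Counter)
    for sentence in sentences:
        prev = "<START>"
        for word, label in sentence:
            by_word[word][label] += 1
            by_context[(prev, word)][label] += 1
            prev = label
    return by_word, by_context

def tag(words, by_word, by_context):
    """Greedy tagging: prefer counts for (previous tag, word), else counts for the word alone."""
    prev = "<START>"
    tagged = []
    for word in words:
        counts = by_context.get((prev, word)) or by_word.get(word)
        best = counts.most_common(1)[0][0] if counts else "UNK"
        tagged.append((word, best))
        prev = best
    return tagged

by_word, by_context = train(TRAINING)
print(tag(["they", "run"], by_word, by_context))
print(tag(["the", "run"], by_word, by_context))
```

Real taggers estimate these probabilities from large annotated corpora and search over whole tag sequences rather than deciding greedily word by word.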

Graph Based Representation

Bag-of-words (BOW) is a typical and standard way to model a text document, and it is well suited to capturing word frequency. But BOW overlooks structural and semantic information. In a graph representation, mathematical constructs are used to model relationships and structural information effectively. Here, a text can suitably be represented as a graph in which each feature term is a vertex and each edge is a relation between feature terms.

Computations related to operations such as term weighting and ranking, which are useful in many information retrieval applications, are supported by this model. Graph-based representation is an appropriate way to represent a text document, and it improves the results of analysis over the traditional model in various text applications. A document is modeled as a graph in which terms are represented by vertices and relations between terms are represented by edges: G = {Vertex, EdgeRelation}

There are generally five different types of vertices in the graph representation: Vertex = {F, S, P, D, C}, where F is a feature term, S a sentence, P a paragraph, D a document and C a concept. EdgeRelation = {Syntax, Statistical, Semantic}. Edge relations between two feature terms may differ depending on the context of the graph.

  1. Words occurring together in a sentence, paragraph, section or document.
  2. Common words in a sentence, paragraph, section or document.
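To make the graph representation concrete, here is a minimal sketch that builds a statistical co-occurrence graph: each feature term is a vertex, and an edge between two terms is weighted by the number of sentences in which both occur. The function name `cooccurrence_graph`, the whitespace tokenisation and the sample sentences are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(sentences):
    """Return (vertices, edges); edges maps an alphabetically ordered term pair to a count."""
    vertices = set()
    edges = defaultdict(int)
    for sentence in sentences:
        terms = sorted(set(sentence.lower().split()))
        vertices.update(terms)
        # Every pair of distinct terms in the same sentence shares an edge.
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return vertices, dict(edges)

sentences = ["graph models capture structure", "graph models capture relations"]
vertices, edges = cooccurrence_graph(sentences)
print(sorted(vertices))
print(edges[("graph", "models")])
```

The same structure could carry syntactic or semantic edge relations instead by changing how the pairs are generated.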

Semantic Relation

Semantic relations include words with similar meanings, words spelled the same way but with different meanings, and words with opposite meanings. Term significance is not effectively captured by the bag-of-words approach. The relationships between texts can be preserved by maintaining a structural representation of the data, which leads to better classification performance.

B. Constructing Vector Space Model

The Vector Space Model (VSM) represents a set of documents as vectors in a common vector space, and is fundamental to a host of information retrieval (IR) operations, ranging from scoring documents against a query to document classification and document clustering. VSM is an algebraic model for representing text documents as vectors of identifiers, such as index terms.
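A minimal sketch of the idea, assuming raw term counts as vector components and cosine similarity as the comparison measure (both common choices, though not the only ones). The helper names `to_vector` and `cosine` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def to_vector(text, vocabulary):
    """Represent a document as term counts over a shared, ordered vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

documents = [
    "text classification of documents",
    "classification of text",
    "graph of terms",
]
vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
vectors = [to_vector(doc, vocabulary) for doc in documents]

print(cosine(vectors[0], vectors[1]))  # documents sharing most terms score higher
print(cosine(vectors[0], vectors[2]))
```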

Feature subset selection for the text document classification task uses an evaluation function that is applied to a single word. Individual words can be scored using measures such as document frequency (DF) and term frequency (TF). The feature extraction approach, unlike feature selection, does not weight terms in order to discard the lower-weighted ones, but instead compacts the vocabulary based on feature co-occurrences.

TF-IDF: Term Frequency-Inverse Document Frequency uses all tokens in the dataset as the vocabulary. TF is the frequency of a token in each document. IDF is derived from the number of documents in which the token occurs: the more documents contain the token, the lower its IDF. The intuition for this measure is that an important word in a document occurs frequently and should be given a high score, but if the word occurs in too many documents it is probably not distinctive and is thus assigned a lower score. The formula for this measure is tfidf(t, d, D) = tf(t, d) * idf(t, D), where t denotes the term, d denotes a document and D denotes the collection of documents.

Advantages

  • Easy to compute
  • Provides a basic metric for extracting the most descriptive terms in a document
  • Makes it easy to compute the similarity between two documents

Disadvantages

  • TF-IDF is based on the bag-of-words (BoW) model, so it does not capture the position of words in the text, semantics, co-occurrences in different documents, etc.
  • TF-IDF is only useful as a lexical-level feature
  • It cannot capture semantics (e.g. as compared to topic models or word embeddings)
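The TF-IDF measure can be sketched in a few lines. This version assumes relative term frequency for tf and the unsmoothed logarithm log(N / df) for idf; libraries often use smoothed variants, so the exact scores will differ between implementations. The function names and the three-document corpus are illustrative.

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    """Relative frequency of the term within one document."""
    return Counter(doc_tokens)[term] / len(doc_tokens)

def idf(term, corpus_tokens):
    """log(N / df), where df is the number of documents containing the term."""
    df = sum(1 for doc in corpus_tokens if term in doc)
    return math.log(len(corpus_tokens) / df) if df else 0.0

def tfidf(term, doc_tokens, corpus_tokens):
    return tf(term, doc_tokens) * idf(term, corpus_tokens)

corpus = [doc.split() for doc in ["the cat sat", "the dog barked", "the cat purred"]]

for term in ["the", "cat", "sat"]:
    print(term, tfidf(term, corpus[0], corpus))
```

A word that appears in every document ("the") scores zero, while a word unique to one document ("sat") scores highest, which matches the intuition described above.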

Principal Component Analysis: PCA is a classical multivariate data analysis tool and an effective dimensionality reduction technique. Suppose there are N data samples, each expressed with n observed variables x1, x2, ..., xn; together these form a sample data matrix. PCA uses the variance of the features to maximize separability. It is an unsupervised algorithm. The steps of PCA are:

  • Standardize the data.
  • Obtain the eigenvectors and eigenvalues from the covariance matrix or correlation matrix.
  • Sort the eigenvalues in descending order and choose the k eigenvectors that correspond to the k largest eigenvalues, where k is the number of dimensions of the new feature subspace (k ≤ d, with d the original number of dimensions).
  • Construct the projection matrix W from the selected k eigenvectors.
  • Transform the original dataset X via W to obtain a k-dimensional feature subspace Y.
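The steps above can be sketched with NumPy as follows. The function name `pca`, the random test data and the choice of `numpy.linalg.eigh` (suitable because a covariance matrix is symmetric) are assumptions of this sketch rather than details given in the text.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x d) onto its top-k principal components."""
    # 1. Standardize each feature to zero mean and unit variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Eigen-decompose the covariance matrix of the standardized data.
    cov = np.cov(X_std, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 3. Sort eigenvalues in descending order and keep the k largest.
    order = np.argsort(eigvals)[::-1][:k]
    # 4. Build the projection matrix W (d x k) from the chosen eigenvectors.
    W = eigvecs[:, order]
    # 5. Transform the data into the k-dimensional subspace.
    return X_std @ W, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = 2 * X[:, 0] + 0.1 * rng.normal(size=50)  # make two features correlated

Y, top_eigvals = pca(X, k=2)
print(Y.shape, top_eigvals)
```

Each retained eigenvalue equals the variance of the data along the corresponding new axis, which is why sorting by eigenvalue keeps the most informative directions.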

Application of Text Classification Algorithm

Data mining algorithms in natural language processing are used to get insights from large amounts of text data. A data mining algorithm is a set of heuristics and calculations that creates a model from data. The algorithm first analyzes the data provided and identifies specific types of patterns or trends. It then uses the results of this analysis over many iterations to find the optimal parameters for creating the mining model.

These parameters are then applied across the entire data set to extract actionable patterns and detailed statistics. Machine Learning (ML) is an area of Artificial Intelligence (AI) comprising a set of statistical techniques for problem solving. In order to apply ML techniques to NLP problems, the unstructured text is converted into a structured format. Deep Learning (which includes Recurrent Neural Networks, Convolutional Neural Networks and others) is a type of machine learning approach. It is an extension of neural networks.

Deep Learning can be used for NLP tasks as well.

Fig. 2. Relationship between ML, Deep Learning and NLP

A. Machine learning techniques for text classification

Machine learning is a set of algorithms that parse data, learn from it, and then apply what they have learned to make intelligent decisions. Two techniques are briefly discussed below:

Naive Bayes classification: The Naive Bayes classifier is a supervised classifier that provides a way to express positive, negative and neutral sentiments in text. It categorizes words into their respective labels using the idea of conditional probability. The advantage of using Naive Bayes for text classification is that it needs only a small data set for training.

The raw data from the web undergoes pre-processing: removal of numerals, foreign words, HTML tags and special symbols, yielding a set of words. Words are tagged with positive, negative and neutral labels, and this labelling is performed manually by human experts. This pre-processing produces word-category pairs for the training set.
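The conditional-probability idea can be illustrated with a small multinomial Naive Bayes sketch. It assumes add-one (Laplace) smoothing and works in log space to avoid numerical underflow; the training pairs, labels and the function names `train_nb` and `predict_nb` are made up for the example.

```python
import math
from collections import Counter, defaultdict

def train_nb(labelled_docs):
    """labelled_docs: list of (list_of_words, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(words, model):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # log P(label) + sum of log P(word | label), with add-one smoothing
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in words:
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

train_data = [
    (["good", "great", "film"], "positive"),
    (["great", "acting"], "positive"),
    (["bad", "boring", "film"], "negative"),
    (["bad", "plot"], "negative"),
]
model = train_nb(train_data)
print(predict_nb(["great", "film"], model))
print(predict_nb(["boring", "plot"], model))
```

The "naive" part is the assumption that words are conditionally independent given the label, which is what allows the per-word probabilities to be multiplied (added in log space).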

Consider a word y from the test set (an unlabeled word set) and a window of n words (x1, x2, ..., xn) from a document. The conditional probability that the data point y belongs to the category of the n words from the training set is obtained from Bayes' rule, treating the n words as conditionally independent.

J48 algorithm used for sentiment prediction: J48 is a decision tree based classifier used to produce rules for the identification of target terms.

The feature space is divided into distinct regions, followed by the classification of test samples into category labels in a hierarchical manner. Larger training collections are handled more efficiently by this approach than by other classifiers. In the test set, the level of a node is raised when a nearby feature satisfies the label condition of an internal feature in the same branch of the tree. The two different branches of the decision tree are created step by step by assigning word labels.
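Decision-tree learners of this family choose which attribute to split on using an entropy-based criterion. The sketch below computes entropy and plain information gain for boolean term features; J48 is Weka's implementation of C4.5, which normalises this quantity into a gain ratio, so treat this as a simplified illustration. The feature names `has_not` and `has_good` and the four labelled rows are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting on rows[i][attribute]."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

rows = [
    {"has_not": True, "has_good": True},
    {"has_not": False, "has_good": True},
    {"has_not": True, "has_good": False},
    {"has_not": False, "has_good": True},
]
labels = ["negative", "positive", "negative", "positive"]

for attribute in ["has_not", "has_good"]:
    print(attribute, information_gain(rows, labels, attribute))
```

Here the attribute that separates the two labels perfectly has the larger gain, so a tree learner would place it at the root.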

The J48 algorithm uses an entropy function to test the classification of terms from the test set. Additional features of J48 include accounting for missing values, decision tree pruning, continuous attribute value ranges and derivation of rules. Here a term can be a unigram, bigram or trigram.

B. Deep Learning techniques for text classification

Deep learning is a technique in machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones. Two deep learning techniques are discussed below:

Convolutional Neural Network: CNNs have been widely used in image processing, where they have produced relatively accurate results. In NLP, however, the inputs are texts or sentences represented as a matrix, and each row of the matrix corresponds to one token, which is typically a word but could be a character.

That is, each row is a vector that represents a word. Typically, these vectors are word embeddings (low-dimensional representations), but they could also be one-hot vectors that index the word into a vocabulary. For a 10-word sentence using a 100-dimensional embedding, we would have a 10×100 matrix as our input. For example, consider the sentence classification using a CNN depicted in figure 2.3. Here three filter region sizes (2, 3 and 4) are depicted, each of which has 2 filters.

Each filter slides over the sentence matrix to produce a feature map, and the largest value of each map is kept. A univariate feature vector is thus generated from each of the six maps, and these 6 features are concatenated to form a feature vector for the penultimate layer. The final softmax layer then receives this feature vector as input and uses it to classify the sentence; here binary classification is assumed, and hence two possible output states are shown.
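The forward pass of the architecture described above can be sketched with NumPy. The embeddings and all weights are random stand-ins, and the ReLU activation and 1-max pooling are assumptions of this sketch, so the output probabilities carry no meaning; the point is only to show how a 10×100 input becomes six features and then two class probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

sentence_len, embed_dim = 10, 100
# One row per word; random values stand in for real word embeddings.
sentence = rng.normal(size=(sentence_len, embed_dim))

region_sizes = [2, 3, 4]
filters_per_size = 2

features = []
for h in region_sizes:
    for _ in range(filters_per_size):
        filt = rng.normal(size=(h, embed_dim))  # a filter spans h whole words
        # Slide the filter down the sentence: one activation per window of h words.
        feature_map = np.array([
            np.sum(sentence[i:i + h] * filt)
            for i in range(sentence_len - h + 1)
        ])
        feature_map = np.maximum(feature_map, 0.0)  # ReLU (assumed activation)
        features.append(feature_map.max())          # 1-max pooling: one number per map

feature_vector = np.array(features)  # 3 region sizes x 2 filters = 6 features

# Softmax layer for binary classification (random, untrained weights).
W = rng.normal(size=(2, feature_vector.size))
logits = W @ feature_vector
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(feature_vector.shape, probs)
```

Because each filter covers the full embedding width, it only moves along the word axis, which is what distinguishes this setup from the two-dimensional sliding used for images.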

Recurrent Neural Network: The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that is a poor assumption. If you want to predict the next word in a sentence, you had better know which words came before it.

RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to think about RNNs is that they have a "memory" which captures information about what has been computed so far.

In theory, RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps. The uses of RNN language models are twofold. First, a model allows us to score arbitrary sentences based on how likely they are to occur in the real world. This gives us a measure of syntactic and semantic correctness.

Second, a language model allows us to generate new text. The figure below shows an RNN being unrolled (or unfolded) into a full network. By unrolling we simply mean that we write out the network for the complete sequence. For example, if the sequence we care about is a sentence of 5 words, the network would be unrolled into a 5-layer neural network, one layer for each word.

Fig. 4. An RNN and the unfolding in time of the computation involved in its forward computation
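Unrolling can be sketched as a simple loop in which the same weights are reused at every time step and the hidden state carries the "memory" forward. The weight names `U`, `W` and `V`, the tanh activation, the sizes and the word ids below are assumptions made for illustration; the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 8, 4

U = rng.normal(scale=0.5, size=(hidden_size, vocab_size))   # input -> hidden
W = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # hidden -> hidden
V = rng.normal(scale=0.5, size=(vocab_size, hidden_size))   # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(word_ids):
    """Run the RNN over a sequence; returns next-word distributions and the final state."""
    h = np.zeros(hidden_size)
    outputs = []
    for t in word_ids:                      # one "layer" of the unrolled network per word
        x = np.zeros(vocab_size)
        x[t] = 1.0                          # one-hot encoding of the current word
        h = np.tanh(U @ x + W @ h)          # new state depends on the input and the old state
        outputs.append(softmax(V @ h))      # distribution over the next word
    return outputs, h

sentence = [3, 1, 4, 1, 5]                  # a 5-word sentence as hypothetical word ids
outputs, final_state = forward(sentence)
print(len(outputs), outputs[-1].round(3))
```

Word id 1 appears twice in the example sentence, yet the two predictions differ because the hidden state at each position reflects the words seen before it.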

  1. Precision: Precision for a class C is the ratio of the number of documents correctly classified into C to the total number of documents classified into C. Precision = TP / (TP + FP), in which TP, FN, FP and TN refer respectively to the number of true positive, false negative, false positive and true negative instances.
  2. Recall: Recall is the ratio of the number of documents correctly classified into C to the total number of documents that belong to class C. Recall = TP / (TP + FN), with TP and FN as defined above.
  3. F-measure: The F-measure or F1-measure is a combination of recall and precision used for performance evaluation. It is a derived effectiveness measure, and the resulting value is interpreted as the harmonic mean of precision and recall. F-measure = 2 * precision * recall / (precision + recall)
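The three measures can be computed directly from the counts of true positives, false positives and false negatives for one class. The function name `precision_recall_f1` and the sample labels are illustrative; returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def precision_recall_f1(y_true, y_pred, target):
    """Precision, recall and F1 for one class, treating `target` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == target and t == target)
    fp = sum(1 for t, p in pairs if p == target and t != target)
    fn = sum(1 for t, p in pairs if p != target and t == target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "spam", "ham", "spam"]

print(precision_recall_f1(y_true, y_pred, "spam"))
```

In a multi-class setting the same function can be called once per class, and the per-class values averaged if a single summary figure is needed.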

Observation

On comparing the various methods of text classification, some methods work well only with small datasets. However, most real-world NLP problems deal with large-scale data, whereas most ML techniques were found to be fast only on small datasets.

Considering a class taxonomy or hierarchy is also important, since it offers extra information to a classification system, which can improve scalability and accuracy. For complex problems, deep learning was therefore found to be more promising. With deep learning, learning can also be done unsupervised. Since the data to be classified can come from varied sources, the representation of the data, the features to be selected, the classification approach (whether ML or DL) and the evaluation measure to be used all depend mostly on the context.

Conclusion

Text classification assigns one or more classes to a document according to its content. Classes are automatically selected from a previously established set of classes, which makes the process fast and efficient. Deep learning has become an essential part of machine learning workflows. It has been used extensively in natural language processing (NLP) because it is well suited for learning the complex underlying structure of a sentence and the semantic proximity of various words. Various evaluation measures are also described for checking the accuracy of classification.
