Text Mining as Document Search

Categories: Data Mining Information Technology

Published: Oct 11, 2018

In the course of their day-to-day business, organizations accumulate textual data. Sources include electronic text, call center logs, social media, corporate documents, research papers, application forms, service notes, emails, and so on. This data may be accessible yet remain untapped, either because the organization is unaware of the wealth of information it possesses or because it lacks the methodology or technology to analyze the data and extract useful insight.

The purpose of text mining is to process unstructured (textual) information, extract meaningful numeric indices from the text, and thus make the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms. Information can be extracted to derive summaries of the words contained in the documents or to compute summaries of the documents based on the words contained in them. Hence, you can analyze words and clusters of words used in documents, or you can analyze documents and determine how similar they are to one another or how they relate to other variables of interest in the data mining project. In the most general terms, text mining will "turn text into numbers" (meaningful indices), which can then be incorporated into other analyses such as predictive data mining projects or the application of unsupervised learning methods (clustering).

In short, text mining is knowledge discovery from textual data: the exploration of text to uncover useful but hidden information. Different authors, however, have defined text mining slightly differently. The following are two such definitions:

“The objective of Text Mining is to exploit the information contained in textual documents in various ways, including …discovery of patterns and trends in data, associations among entities, predictive rules, etc.” (Grobelnik et al., 2001).

“Another way to view text data mining is as a process of exploratory data analysis that leads to heretofore unknown information, or to answers for questions for which the answer is not currently known.” (Hearst, 1999).

Text mining, also known as text data mining or text analytics, is the process of discovering high-quality information from textual data sources. The application of text mining techniques to specific business problems is called business text analytics or simply text analytics. Text mining techniques can help organizations derive valuable business insight from the wealth of textual information they possess.

Text mining transforms textual data into a structured format through the use of several techniques. These include identification and collection of the textual data sources; NLP techniques such as part-of-speech tagging and syntactic parsing; entity/concept extraction, which identifies named entities such as people, places, and organizations; disambiguation; establishing relationships between different entities/concepts; and pattern and trend analysis and visualization.

Text mining is similar to data mining, except that data mining tools are designed to handle structured data from databases, whereas text mining can also work with unstructured or semi-structured data sets such as emails, text documents, and HTML files. As a result, text mining is the better fit when the information of interest is contained in free text rather than in database fields.

Text mining usually involves three steps: structuring the input text (typically parsing, along with the addition of some derived linguistic features, the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output.

Approaches to Text Mining

To reiterate, text mining can be summarized as a process of "numericizing" text. At the simplest level, all words found in the input documents are indexed and counted in order to compute a table of documents and words, i.e., a matrix of frequencies that enumerates the number of times each word occurs in each document. This basic process can be refined to exclude certain common words such as "the" and "a" (stop word lists) and to combine different grammatical forms of the same word, such as "traveling," "traveled," and "travel" (stemming). Once a table of (unique) words (terms) by documents has been derived, all standard statistical and data mining techniques can be applied to derive dimensions or clusters of words or documents, or to identify "important" words or terms that best predict another outcome variable of interest.
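The counting process described above can be sketched in a few lines of Python. This is a minimal illustration, not a production tokenizer: the stop word list is deliberately tiny, and `crude_stem` is a hypothetical toy helper that strips a few English suffixes in place of a real stemming algorithm. All function names and the sample documents are assumptions made for this example.

```python
import re
from collections import Counter

# Illustrative stop word list (an assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was"}

def crude_stem(word):
    """Strip one common English suffix. A toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip if at least three letters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase the text, keep alphabetic words, drop stop words, stem the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]

def document_term_matrix(documents):
    """Return (vocabulary, matrix): matrix[i][j] is the count of term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix

docs = [
    "The traveler traveled to the coast.",
    "Traveling is a joy and travel is a teacher.",
]
vocab, matrix = document_term_matrix(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row of `matrix` is one document and each column one term, so "traveled," "traveling," and "travel" all land in the same column. Note that the toy stemmer leaves "traveler" as a separate term, which shows why real projects usually rely on a tested stemming or lemmatization library.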

Using well-tested methods and understanding the results of text mining

Once a data matrix has been computed from the input documents and the words found in them, various well-known analytic techniques can be used to process those data further, including methods for clustering, factoring, or predictive data mining.
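As one small example of such a well-known technique, the sketch below computes cosine similarity between rows of a document-by-term count matrix. Pairwise similarity of this kind is a common building block for clustering documents. It is shown here only as an illustration, not as the specific method the text has in mind. The matrix values and the helper name `most_similar` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two term-count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(matrix, index):
    """Return the row index of the document most similar to matrix[index]."""
    scores = [
        (cosine_similarity(matrix[index], row), j)
        for j, row in enumerate(matrix)
        if j != index
    ]
    return max(scores)[1]

# Hypothetical term counts; columns stand for: "engine", "fuel", "recipe", "oven".
matrix = [
    [2, 1, 0, 0],  # document 0
    [1, 2, 0, 0],  # document 1
    [0, 0, 3, 1],  # document 2
]

print(cosine_similarity(matrix[0], matrix[1]))  # documents sharing vocabulary
print(cosine_similarity(matrix[0], matrix[2]))  # documents with no shared terms
print(most_similar(matrix, 0))
```

Because every step is an explicit, inspectable computation on the count matrix, the results can be traced back to the words that produced them.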

"Black-box" approaches to text mining and extraction of concepts

Some text mining applications offer "black-box" methods that claim to extract "deep meaning" from documents with little human effort (that is, without anyone first reading and understanding those documents). These applications rely on proprietary algorithms for presumably extracting "concepts" from the text, and may even claim to summarize large numbers of text documents automatically while retaining the core and most important meaning of those documents. While there are numerous algorithmic approaches to extracting "meaning" from documents, this type of technology is still very much in its infancy, and the aspiration to provide meaningful automated summaries of large numbers of documents may forever remain elusive.

Skepticism is urged when using such algorithms because

1) if it is not clear to the user how those algorithms work, it cannot possibly be clear how to interpret the results of those algorithms, and

2) the methods used in those programs are not open to scrutiny, for example by the academic community and peer review and, hence, we simply don't know how well they might perform in different domains.

As a final thought on this subject, consider a concrete example. Try one of the automated translation services available on the Web that can translate entire paragraphs of text from one language into another. Translate some text, even simple text, from your native language into another language and back again, and review the results. Translating even short sentences to another language and back while retaining the original meaning often produces humorous rather than accurate results. This illustrates the difficulty of automatically interpreting the meaning of text.

Another type of application is often described as "text mining": the automatic search of large numbers of documents based on keywords or key phrases.

This is the domain of, for example, the popular internet search engines that have been developed over the last decade to provide efficient access to Web pages with certain content.
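Keyword-based document search of this kind is commonly built on an inverted index, a mapping from each word to the set of documents that contain it. The sketch below is a minimal illustration under simple assumptions (lowercased alphabetic words, every query word must appear in a document). It is not a description of how any particular search engine works, and the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercased word to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return sorted ids of the documents that contain every word in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    # Intersect the document sets of all query words; unknown words match nothing.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

docs = [
    "Text mining turns text into numbers.",
    "Search engines index Web pages by keyword.",
    "Keyword search is sometimes called text mining.",
]
index = build_inverted_index(docs)
print(search(index, "text mining"))
print(search(index, "keyword search"))
```

The index is built once, and each query then needs only a few set lookups, which is what makes this approach efficient for large document collections.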

This essay was reviewed by Alex Wood.

Text Mining as Document Search. (2018, October 08). GradesFixer. Retrieved April 26, 2024, from https://gradesfixer.com/free-essay-examples/text-mining-as-document-search/
