
Term Frequency - Inverse Document Frequency in Document Corpus



Published: Jan 4, 2019


TF-IDF stands for Term Frequency - Inverse Document Frequency. It is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus, and it is often used as a weighting factor in information retrieval, text mining, and user modeling. The TF-IDF weight of a term in a document is the product of two components:

Term Frequency (TF) measures how frequently a term occurs in a document. Suppose we have a set of English text documents and wish to rank them by relevance to the query “the brown cow”. A simple way to start is to eliminate documents that do not contain all three words “the”, “brown”, and “cow”, but this still leaves many documents. To distinguish them further, we might count the number of times each term occurs in each document; the number of times a term occurs in a document is called its term frequency. Because documents differ in length, a term may appear many more times in a long document than in a short one, so the raw count is often divided by the document length as a way of normalization (see the definition below).

The first form of term weighting is due to Hans Peter Luhn (1957) and may be summarized as: the weight of a term that occurs in a document is simply proportional to the term frequency. [3]

TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)
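The length-normalized term frequency can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `term_frequency`, the sample sentence, and the lowercase, whitespace-based tokenization are all assumptions made for the example.

```python
from collections import Counter

def term_frequency(term, document):
    """Return occurrences of `term` divided by the total number of terms."""
    # Assumption: lowercase, whitespace-separated tokens.
    # Real systems usually strip punctuation and may apply stemming.
    tokens = document.lower().split()
    if not tokens:
        return 0.0  # avoid dividing by zero for an empty document
    return Counter(tokens)[term.lower()] / len(tokens)

# Hypothetical sample document with 7 tokens.
doc = "the brown cow saw the other cow"
print(term_frequency("cow", doc))    # "cow" occurs 2 times out of 7 tokens
print(term_frequency("brown", doc))  # "brown" occurs 1 time out of 7 tokens
```

Dividing by the token count keeps a long document from scoring higher merely because it contains more words.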

Inverse Document Frequency (IDF) measures how important a term is. When computing TF, all terms are considered equally important. However, certain terms, such as “is”, “of”, and “that”, may appear many times but carry little importance. Because the term “the” is so common, term frequency alone will tend to incorrectly emphasize documents that happen to use the word “the” more frequently, without giving enough weight to the more meaningful terms “brown” and “cow”. Unlike the less common words “brown” and “cow”, the term “the” is not a good keyword for distinguishing relevant from non-relevant documents. Hence an inverse document frequency factor is incorporated, which diminishes the weight of terms that occur very frequently in the document set and increases the weight of terms that occur rarely. Karen Spärck Jones (1972) conceived a statistical interpretation of term specificity called Inverse Document Frequency (IDF), which became a cornerstone of term weighting:

The specificity of a term can be quantified as an inverse function of the number of documents in which it occurs. [4]

IDF(t) = log_e(Total number of documents / Number of documents with term t in it)
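To see how the two factors interact on the “the brown cow” query, the sketch below computes IDF with the natural logarithm and ranks a tiny corpus by the sum of TF × IDF over the query terms. The three sample documents, the helper names (`tokenize`, `tf`, `idf`, `score`), the whitespace tokenization, and the choice to give an unseen term zero weight are assumptions for illustration only; summing per-term weights is one common way to score a query, not the only one.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase, whitespace-separated tokens.
    return text.lower().split()

def tf(term, tokens):
    # Occurrences of the term divided by the document length.
    return Counter(tokens)[term] / len(tokens) if tokens else 0.0

def idf(term, corpus_tokens):
    # log_e(total documents / documents containing the term).
    containing = sum(1 for tokens in corpus_tokens if term in tokens)
    if containing == 0:
        return 0.0  # assumption: a term absent from the corpus gets zero weight
    return math.log(len(corpus_tokens) / containing)

def score(query, tokens, corpus_tokens):
    # Sum of TF * IDF over the query terms.
    return sum(tf(t, tokens) * idf(t, corpus_tokens) for t in tokenize(query))

# Hypothetical three-document corpus.
corpus = [
    "the brown cow eats the grass",
    "the cow is in the barn",
    "the quick fox jumps over the fence",
]
corpus_tokens = [tokenize(d) for d in corpus]

ranked = sorted(
    corpus,
    key=lambda d: score("the brown cow", tokenize(d), corpus_tokens),
    reverse=True,
)
for d in ranked:
    print(round(score("the brown cow", tokenize(d), corpus_tokens), 4), d)
```

Because “the” appears in every document, its IDF is log_e(3/3) = 0 and it contributes nothing to any score, so the ranking is driven entirely by “brown” and “cow”.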

WHOIS: WHOIS is a query and response protocol that is widely used for querying databases that store the registered users or assignees of an Internet resource, such as a domain name, an IP address block, or an autonomous system, but is also used for a wider range of other information. The protocol stores and delivers database content in a human-readable format.
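The query and response exchange can be sketched with a plain TCP socket: the client connects to port 43, sends the query followed by CRLF, and reads until the server closes the connection (as described in RFC 3912). The function names, the timeout, and the "Key: Value" parsing are assumptions for this example; reply formats differ between registries, so the parser is only a rough illustration.

```python
import socket

def build_whois_query(name):
    # A WHOIS request is the query text terminated by CRLF.
    return name.strip().encode("ascii") + b"\r\n"

def whois_lookup(name, server, port=43, timeout=10.0):
    # `server` is chosen by the caller; which server holds the record
    # depends on the resource being queried.
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_whois_query(name))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_fields(response):
    # Assumption: the reply contains "Key: Value" lines.
    # Lines starting with % or # are treated as comments and skipped.
    fields = {}
    for line in response.splitlines():
        if line.lstrip().startswith(("%", "#")):
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields
```

A call such as `parse_fields(whois_lookup("example.com", "whois.iana.org"))` would perform a live lookup; it needs network access, and the fields returned depend on the server queried.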

Locating Phishing Server:

The host name in a URL resolves to an IP address.
Using that IP address, our system locates the phishing server.
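The step from URL to IP address might look like the sketch below, which extracts the host from a URL and resolves it with the standard library. The function names and the sample URLs are hypothetical, and this is not necessarily how the system described here does it; the resulting address is what a WHOIS or geolocation lookup would then take as input.

```python
import socket
from urllib.parse import urlsplit

def host_from_url(url):
    # urlsplit only recognizes the host when a scheme or a leading "//" is present.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return urlsplit(url).hostname

def resolve_ipv4(host):
    # Returns the IPv4 address as a string, or None if resolution fails.
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None

# Hypothetical suspicious URL; the .test domain is reserved and will not resolve.
url = "http://login.example.test/verify?id=1"
host = host_from_url(url)
print(host, resolve_ipv4(host))
```

Note that a single host name can map to several addresses, and phishing sites often sit behind shared hosting or a content delivery network, so the resolved address identifies the serving infrastructure rather than a specific operator.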


This essay was reviewed by

Alex Wood


Cite this Essay

Term Frequency – Inverse Document Frequency in Document Corpus. (2019, January 03). GradesFixer. Retrieved July 31, 2026, from https://gradesfixer.com/free-essay-examples/term-frequency-inverse-document-frequency-in-document-corpus/

