This essay has been submitted by a student. This is not an example of the work written by professional essay writers.

What is Similarity Measures

Semantic Similarity

Semantic similarity is a metric defined over a set of documents or terms, where the distance between items is based on the likeness of their meaning or semantic content, as opposed to similarity estimated from their syntactic representation (e.g. their string format). Semantic similarity measures are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts, or instances, through a numerical description obtained by comparing the information that supports their meaning or describes their nature.

Similarity is subjective and highly dependent on the domain and application. For example, two fruits may be considered similar because of their colour, their size, or their taste. Care should be taken when calculating distance across dimensions (features) that are unrelated or measured on different scales: the values of each feature must be normalized, or one feature could end up dominating the distance calculation. Similarity scores are commonly normalized to the range [0, 1].
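To illustrate why normalization matters, here is a minimal sketch of min-max scaling, one common normalization choice among several (z-score standardization is another). The fruit features (weight in grams and a 1 to 10 sweetness score) are hypothetical: without scaling, differences of tens of grams would swamp differences of a few sweetness points in any distance calculation.

```python
def min_max_normalize(rows):
    """Rescale each feature (column) of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column carries no information; map it to 0.0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, mins, maxs)
        ])
    return normalized


# Hypothetical fruit features: [weight in grams, sweetness score 1-10].
fruits = [[150.0, 8.0], [170.0, 2.0], [160.0, 5.0]]
scaled = min_max_normalize(fruits)
print(scaled)  # [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
```

After scaling, both features span the same [0, 1] range, so each contributes comparably to a distance.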

Similarity Measures

A similarity measure quantifies how alike two data objects are. In a data mining context, similarity is usually expressed through the distance between points whose dimensions represent the features of the objects: a small distance indicates a high degree of similarity, whereas a large distance indicates a low degree of similarity.

A similarity measure is also known as a similarity function: a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity measure exists, such measures are usually in some sense the inverse of distance metrics: they take on large values for similar objects and either zero or a negative value for very dissimilar objects.
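Because similarity is roughly the inverse of distance, a distance can be turned into a similarity score. The sketch below assumes the mapping 1 / (1 + d), which is one common convention rather than a standard definition (exp(-d) is another option); the function name is hypothetical.

```python
def distance_to_similarity(distance):
    """Map a non-negative distance to a similarity score in (0, 1].

    Uses 1 / (1 + d): identical objects (d = 0) get similarity 1, and
    the similarity shrinks towards 0 as the distance grows.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


print(distance_to_similarity(0.0))  # 1.0
print(distance_to_similarity(3.0))  # 0.25
```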

Similarity between two documents, or between a document and query terms: a similarity measure can be used to calculate the similarity between two documents, two queries, or one document and one query.

Document ranking: the similarity score between each document and a query can be used to rank the documents.
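As a sketch of document ranking, the example below scores each document against a query and sorts by score. It assumes cosine similarity over raw term counts as the score and a naive lowercase, whitespace-splitting tokenizer; both are illustrative choices, and any of the measures in this essay could be swapped in. The helper names are hypothetical.

```python
import math
from collections import Counter


def term_counts(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty text as sharing nothing with the query
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    q = term_counts(query)
    scored = [(cosine(q, term_counts(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = [
    "k-means clustering uses euclidean distance",
    "cosine similarity compares document vectors",
    "jaccard similarity compares sets",
]
for score, doc in rank_documents("document similarity", docs):
    print(f"{score:.3f}  {doc}")
```

The second document shares both query terms and ranks first; the first shares none and scores 0.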

Clustering algorithms generally rely on a similarity measure, or so-called “distance function”, to determine cluster membership. A few of the most popular similarity measures are discussed in the following subsections.

Euclidean Distance

Euclidean distance is the standard metric for geometric problems. It is the ordinary straight-line distance between two points and can easily be measured with a ruler in two- or three-dimensional space. Euclidean distance is widely used in clustering problems, including text clustering, and it is the default distance measure used with the K-means algorithm. To measure the distance between text documents, consider two documents $d_a$ and $d_b$ represented by their term vectors $\vec{t_a}$ and $\vec{t_b}$, respectively. The Euclidean distance of the two documents is defined as:

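$$D_E(\vec{t_a}, \vec{t_b}) = \left( \sum_{t=1}^{n} \left| w_{t,a} - w_{t,b} \right|^2 \right)^{1/2}$$

where the term set is $T = \{t_1, t_2, \ldots, t_n\}$ and the weight of term $t$ in document $d_a$ is $w_{t,a} = \text{tf-idf}(d_a, t)$ (and likewise $w_{t,b}$ for $d_b$).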
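A minimal sketch of this distance, assuming each document is stored as a sparse dictionary from term to weight, with absent terms counting as weight 0. The function does not compute tf-idf itself; the weights in the example are made up for illustration.

```python
import math


def euclidean_distance(weights_a, weights_b):
    """Euclidean distance between two sparse term-weight vectors.

    Each argument maps a term to its weight (e.g. a tf-idf score);
    terms absent from a document have weight 0.
    """
    terms = set(weights_a) | set(weights_b)  # the term set T
    return math.sqrt(sum(
        (weights_a.get(t, 0.0) - weights_b.get(t, 0.0)) ** 2 for t in terms
    ))


# Hypothetical tf-idf weights for two short documents.
d_a = {"cluster": 0.5, "distance": 0.3}
d_b = {"cluster": 0.2, "vector": 0.4}
print(euclidean_distance(d_a, d_b))

# The same function works for ordinary points: (0, 0) to (3, 4).
print(euclidean_distance({"x": 0.0, "y": 0.0}, {"x": 3.0, "y": 4.0}))  # 5.0
```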

Euclidean distance is the most commonly used distance measure. In most cases, when people talk about distance without qualification, they mean Euclidean distance, which is why it is often called simply “distance”. It is a natural choice of proximity measure when data is dense or continuous.

Manhattan Distance

Manhattan distance is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates. Put simply, in two dimensions it is the absolute difference between the x-coordinates plus the absolute difference between the y-coordinates.

Suppose we have two points A and B. To find the Manhattan distance between them, we sum the absolute variation along the x-axis and along the y-axis, i.e. how much A and B differ in X and in Y. More formally, the Manhattan distance between two points is the distance between them measured along axes at right angles.

In a plane with p1 at (x1, y1) and p2 at (x2, y2), Manhattan distance = |x1 – x2| + |y1 – y2|

The Manhattan distance metric is also known as Manhattan length, rectilinear distance, L1 distance (the L1 norm of the difference vector), city block distance, or the taxicab metric.
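The formula above generalizes directly to any number of dimensions by summing the absolute difference along every axis. A minimal sketch, assuming points are given as equal-length sequences of numbers:

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences (L1 distance)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(p, q))


print(manhattan_distance((1, 2), (4, 6)))         # |1 - 4| + |2 - 6| = 7
print(manhattan_distance((0, 0, 0), (1, -2, 3)))  # 1 + 2 + 3 = 6
```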

Cosine Similarity

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them.

The cosine similarity metric computes the normalized dot product of the two attribute vectors. By determining the cosine similarity, we effectively find the cosine of the angle between the two objects. The cosine of 0° is 1, and it is less than 1 for any other angle.

It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

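Cosine similarity is particularly used in positive space (for example, with term-frequency vectors, whose components are never negative), where the outcome is neatly bounded in [0, 1]. One reason for its popularity is that it is very efficient to evaluate, especially for sparse vectors, because only the dimensions that are non-zero in both vectors contribute to the dot product.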
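The sketch below computes cos θ = (a · b) / (‖a‖ ‖b‖) for sparse vectors stored as dictionaries, so the dot product only visits dimensions present in both vectors. It assumes the convention of raising an error for a zero vector, where the angle is undefined; other code may return 0 instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {dimension: value}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector for the dot product
    # Only dimensions present in both vectors contribute to the dot product.
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity({"x": 1.0}, {"y": 1.0}))   # 0.0 (at 90 degrees)
print(cosine_similarity({"x": 1.0}, {"x": -3.0}))  # -1.0 (opposed)
# Same orientation, different magnitude:
print(round(cosine_similarity({"x": 1.0, "y": 1.0}, {"x": 2.0, "y": 2.0}), 6))  # 1.0
```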

Jaccard Coefficient

The Jaccard coefficient measures the similarity between sets. It is calculated by dividing the size of the intersection of the sets by the size of their union.

So far we have discussed metrics for finding the similarity between objects that are points or vectors. With Jaccard similarity, the objects are sets, so let us first review a few set basics.

A set is an unordered collection of objects, such as {a, b, c}. We write a set as its elements separated by commas inside curly brackets {}. Because sets are unordered, {a, b} = {b, a}.

The cardinality of a set A, denoted |A|, counts how many elements are in A.

The intersection of two sets A and B, denoted A ∩ B, contains all items that are in both A and B.

The union of two sets A and B, denoted A ∪ B, contains all items that are in either set.
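These set operations map directly onto Python's built-in `set` type, which is convenient for computing the Jaccard coefficient. A short sketch with arbitrary example sets:

```python
A = {"a", "b", "c"}
B = {"b", "c", "d"}

print({"a", "b"} == {"b", "a"})           # True: sets are unordered
print(len(A))                             # cardinality |A| = 3
print(A & B == {"b", "c"})                # intersection A ∩ B: True
print(A | B == {"a", "b", "c", "d"})      # union A ∪ B: True
```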

The Jaccard coefficient measures the similarity between finite sample sets and is defined as the cardinality of the intersection of the sets divided by the cardinality of their union. In other words, the Jaccard similarity of two sets A and B is the ratio of |A ∩ B| to |A ∪ B|:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The Jaccard coefficient can also be used to calculate the similarity between a query and a given document, by treating each as a set of terms.
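A minimal sketch of the Jaccard coefficient, together with a query-document variant. It assumes a naive tokenizer (lowercase, split on whitespace) to turn text into term sets, and assumes the convention that two empty sets have similarity 1; `query_document_jaccard` is a hypothetical helper name.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention (an assumption): two empty sets are identical
    return len(a & b) / len(union)


def query_document_jaccard(query, document):
    # Hypothetical helper: treat query and document as sets of lowercase,
    # whitespace-separated terms.
    return jaccard(query.lower().split(), document.lower().split())


print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 2 / 4 = 0.5
print(query_document_jaccard("similarity measures",
                             "Similarity measures compare objects"))  # 2 / 4 = 0.5
```

Because only set membership counts, repeated terms and term order have no effect on the score.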

Cite this Essay

What is Similarity Measures. (2019, January 28). GradesFixer. Retrieved June 27, 2022, from https://gradesfixer.com/free-essay-examples/what-is-similarity-measures/