Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between them is based on the likeness of their meaning or semantic content, as opposed to lexical similarity, which can be estimated from their syntactic representation (e.g. their string format). Semantic similarity measures are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts, or instances, through a numerical description obtained by comparing the information supporting their meaning or describing their nature.
Similarity is subjective and highly dependent on the domain and application. For example, two fruits may be judged similar because of their colour, size, or taste. Care should be taken when calculating distance across dimensions or features that are unrelated: the relative values of each feature must be normalized, or one feature could end up dominating the distance calculation. Similarity scores are usually measured in the range [0, 1].
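As a brief sketch of that normalization step (the feature names and values below are invented for illustration), min-max scaling rescales each feature into [0, 1] so that a feature with a large numeric range cannot dominate the distance calculation:

    def min_max_normalize(values):
        # Rescale a list of feature values into [0, 1] so no single feature
        # dominates whichever distance measure is applied afterwards.
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]

    weights_g  = [120, 180, 95, 200]   # fruit weight in grams (large range)
    sweetness  = [0.2, 0.9, 0.4, 0.7]  # sweetness score (already a small range)
    print(min_max_normalize(weights_g))  # now comparable to sweetness on [0, 1]

After scaling, every feature contributes on a comparable scale to the distance calculation.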
A similarity measure is a measure of how much alike two data objects are. In the context of data mining, a similarity measure is a distance computed over the dimensions that represent the features of the objects. If this distance is small, there is a high degree of similarity, whereas a large distance indicates a low degree of similarity.
A similarity measure is also known as a similarity function: a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity measure exists, such measures are usually in some sense the inverse of distance metrics: they take on large values for similar objects and either zero or a negative value for very dissimilar objects.
Similarity between two documents, or between a document and query terms: a similarity measure can be used to calculate the similarity between two documents, two queries, or a document and a query.
Document ranking: a similarity measure's score can be used to rank documents.
All clustering algorithms use similarity or so-called “distance functions” to determine cluster membership. A few of the most popular similarity measures are discussed in the following subsections.
It is a standard metric for geometrical problems. It is the ordinary distance between two points and can be easily measured with a ruler in two- or three-dimensional space. Euclidean distance is widely used in clustering problems, including clustering text, and it is the default distance measure used with the K-means algorithm. Measuring the distance between text documents: given two documents da and db represented by their term vectors ta and tb respectively, the Euclidean distance of the two documents is defined as
DE(ta, tb) = √( Σ t∈T |wt,a − wt,b|² )
where the term set is T = {t1, t2, …, tn} and the weights are given by wt,a = tf-idf(da, t).
Euclidean distance is the most common distance measure. In most cases, when people talk about distance, they mean Euclidean distance, and it is often referred to simply as “distance”. When data is dense or continuous, it is the preferred proximity measure.
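As an illustrative sketch of the formula above (the documents and tf-idf weights are made up for demonstration; real weights would come from a tf-idf computation over a corpus):

    import math

    # Toy tf-idf weight vectors over a shared term set T.
    doc_a = {"data": 0.42, "mining": 0.31, "cluster": 0.00, "distance": 0.18}
    doc_b = {"data": 0.10, "mining": 0.05, "cluster": 0.55, "distance": 0.20}

    def euclidean_distance(wa, wb):
        # Square root of the sum of squared weight differences over all terms.
        terms = set(wa) | set(wb)
        return math.sqrt(sum((wa.get(t, 0.0) - wb.get(t, 0.0)) ** 2 for t in terms))

    print(euclidean_distance(doc_a, doc_b))  # smaller value => more similar documents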
Manhattan distance is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates. Put simply, it is the total of the differences between the points’ x-coordinates and y-coordinates.
Suppose we have two points A and B. To find the Manhattan distance between them, we sum up the absolute variation along the x-axis and the y-axis, i.e. we find how much the two points differ along each axis. In more mathematical terms, the Manhattan distance between two points is measured along axes at right angles.
In a plane with p1 at (x1, y1) and p2 at (x2, y2), Manhattan distance = |x1 – x2| + |y1 – y2|
This metric is also known as Manhattan length, rectilinear distance, L1 distance or L1 norm, city block distance, or the taxi-cab metric.
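A minimal sketch of the calculation, generalising |x1 – x2| + |y1 – y2| to any number of coordinates (the example points are arbitrary):

    def manhattan_distance(p1, p2):
        # Sum of absolute coordinate differences, e.g. |x1 - x2| + |y1 - y2| in the plane.
        return sum(abs(a - b) for a, b in zip(p1, p2))

    print(manhattan_distance((1, 2), (4, 6)))  # |1-4| + |2-6| = 7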
Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them.
The cosine similarity metric finds the normalized dot product of the two attribute vectors. By determining the cosine similarity, we effectively find the cosine of the angle between the two objects. The cosine of 0° is 1, and it is less than 1 for any other angle.
It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.
Cosine similarity is particularly useful in positive space, where the outcome is neatly bounded in [0, 1]. One reason for its popularity is that it is very efficient to evaluate, especially for sparse vectors.
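The normalized dot product described above can be sketched as follows (the example vectors are arbitrary):

    import math

    def cosine_similarity(a, b):
        # Dot product of the vectors divided by the product of their magnitudes.
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    print(cosine_similarity([1, 0, 2], [2, 0, 4]))  # 1.0: same orientation, different magnitude
    print(cosine_similarity([1, 0], [0, 1]))        # 0.0: vectors at 90 degrees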
The Jaccard coefficient is used to measure similarity between sets; it is calculated by dividing the size of the intersection of the sets by the size of their union.
So far we have discussed metrics for finding the similarity between objects that are points or vectors. With Jaccard similarity, the objects are sets, so let us first cover some basics about sets.
A set is an (unordered) collection of objects, e.g. {a, b, c}; we write its elements separated by commas inside curly brackets {}. Because sets are unordered, {a, b} = {b, a}.
The cardinality of A, denoted |A|, counts how many elements are in A.
The intersection of two sets A and B, denoted A ∩ B, contains all items that are in both A and B.
The union of two sets A and B, denoted A ∪ B, contains all items that are in either set.
The Jaccard coefficient measures the similarity between finite sample sets and is defined as the cardinality of the intersection of the sets divided by the cardinality of their union. To find the Jaccard similarity between two sets A and B, we take the ratio of the cardinality of A ∩ B to the cardinality of A ∪ B:
J(A, B) = |A ∩ B| / |A ∪ B|
The Jaccard coefficient can also be used to calculate the similarity between a query and a given document by treating each as a set of terms, as sketched below.
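A small sketch of the set-based calculation, treating a query and a document as sets of their terms (the sample texts are invented for illustration):

    def jaccard(a, b):
        # |A ∩ B| / |A ∪ B| for two finite sets.
        return len(a & b) / len(a | b)

    query = set("data mining similarity".split())
    document = set("similarity measures in data mining and clustering".split())
    print(jaccard(query, document))  # 3 shared terms out of 7 distinct terms ≈ 0.43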