About this sample
Words: 449 | Page: 1 | 3 min read
Updated: 16 November, 2024
Most previous work in the database literature has focused on indexing lower-dimensional data and on query types other than similarity queries. The k-d tree was one of the first structures proposed for indexing multidimensional data for nearest neighbor queries. Recently, this structure has been used in geographic information systems for queries that resemble similarity queries, and it might be useful for similarity indexing. Other methods, such as space-filling curves, linear quadtrees, and grid files, do not scale well to high dimensions but may be useful for medium-dimensional data.
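To make the nearest neighbor query on a k-d tree concrete, here is a minimal sketch in Python. It is an illustration only, not the structure as originally published: the function names `build_kdtree` and `nearest`, the tuple-based node layout, and the choice of Euclidean distance are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                   # the median splits the set in half
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return the stored point closest to `query`."""
    if node is None:
        return best
    point, left, right = node
    if best is None or dist(query, point) < dist(query, best):
        best = point
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < dist(query, best):
        best = nearest(far, query, depth + 1, best)
    return best
```

For example, `nearest(build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]), (9, 2))` returns `(8, 1)`. The pruning test on the splitting plane is what lets the search skip whole subtrees in low dimensions; as the number of dimensions grows, that test succeeds less often and more of the tree must be visited.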
The R-tree and its most successful variant, the R*-tree, have been used most often for indexing high-dimensional data in the database literature. However, since a range is stored for each dimension, the index requires more space and more search time as dimensionality grows. For this reason, higher-dimensional data is typically mapped to a lower-dimensional space before indexing in R-trees.
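The cost of storing a range on each dimension can be seen in a small sketch. This is not an R-tree implementation; it only shows the bounding rectangle that an R-tree entry holds, and the helper names `bounding_box`, `overlaps`, and `entry_size` are hypothetical.

```python
def bounding_box(points):
    """Minimum bounding rectangle: one (low, high) range per dimension."""
    return [(min(coords), max(coords)) for coords in zip(*points)]

def overlaps(box_a, box_b):
    """Two rectangles overlap only if their ranges overlap on every dimension."""
    return all(lo_a <= hi_b and lo_b <= hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(box_a, box_b))

def entry_size(box):
    """Number of coordinates stored for one rectangle: two per dimension."""
    return 2 * len(box)
```

A rectangle over 3-dimensional points stores 6 numbers, while one over 64-dimensional feature vectors stores 128, and every overlap test has to compare all of those ranges. Larger entries also mean fewer entries fit in a fixed-size node, which is one reason to reduce dimensionality before indexing.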
The TV-tree is the only method in the database literature thus far that has been proposed specifically for indexing high-dimensional data. Performance comparisons clearly show that the TV-tree can be much more efficient than the R*-tree. However, the improved performance depends on two assumptions. The first assumption is that the dimensions of the feature vectors are ordered by “importance.” The second assumption is that sets of feature vectors in the dataset will tend to match exactly on some dimensions, especially on the first few “important” dimensions.
The first assumption is reasonable (if not desirable) since an appropriate transform may be used. The second assumption was not explicitly stated in the TV-tree paper, but a careful analysis of its algorithms reveals that the performance improvement depends upon it. In some applications, the original feature vectors contain a small set of discrete quantities, so the second assumption does hold. Unfortunately, this second assumption will normally not be true in visual information systems, or in many other applications. Features in these applications are typically real-valued, so the chance of two vectors matching exactly on any dimension is negligible. In this case, the TV-tree reduces to an index on only the first few dimensions. Small changes in the proposed algorithms should allow the TV-tree to be a modest improvement over the R*-tree in these applications. However, in this paper, we will refer to the R-tree (and variants) as the best previously known structure for similarity indexing because it has proven itself in more similarity indexing applications (Beckmann et al., 1990; Guttman, 1984).
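The second assumption can be illustrated with a short sketch. It does not reproduce the TV-tree algorithms; it only counts how many leading dimensions a group of vectors shares exactly. The function name `shared_prefix_length` and both sample datasets are made up for this example.

```python
def shared_prefix_length(vectors):
    """Count the leading dimensions on which every vector has exactly the same value."""
    count = 0
    for values in zip(*vectors):      # walk the dimensions in order of "importance"
        if len(set(values)) != 1:     # stop at the first dimension that differs
            break
        count += 1
    return count

# Hypothetical discrete features: exact matches on leading dimensions are common.
discrete = [(1, 0, 3, 2), (1, 0, 5, 2), (1, 0, 4, 7)]

# Hypothetical real-valued features: similar vectors that never match exactly.
real_valued = [(0.812, 0.114, 0.530), (0.809, 0.120, 0.527), (0.815, 0.109, 0.533)]
```

Here `shared_prefix_length(discrete)` is 2, while `shared_prefix_length(real_valued)` is 0 even though the three real-valued vectors are close to one another. With no exactly shared dimensions to exploit, an index built on that assumption gains nothing from the match.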
There is also related work outside of the database literature. In the information retrieval literature, work on cluster files has proposed structures similar to the SS-tree. In the image database community, a static indexing structure based on Kohonen nets was suggested. There is also related work in the computational geometry and vector quantization literature. These fields offer valuable insights and potential improvements for database indexing, suggesting that interdisciplinary approaches may further enhance our understanding and capabilities in similarity indexing (Kohonen, 1990; Jain & Dubes, 1988).
The exploration of various indexing methods for multidimensional data highlights the complexity and importance of finding efficient solutions for similarity queries. While methods like the TV-tree show promise, their applicability is limited by certain assumptions. As database demands evolve, continuous research and innovation will be crucial in developing robust and adaptable indexing structures.