Ball tree

In computer science, a ball tree, balltree or metric tree, is a space partitioning data structure for organizing points in a multi-dimensional space. The ball tree gets its name from the fact that it partitions data points into a nested set of hyperspheres known as "balls". The resulting data structure has characteristics that make it useful for a number of applications, most notably nearest neighbor search.

Informal description

A ball tree is a binary tree in which every node defines a D-dimensional hypersphere, or ball, containing a subset of the points to be searched. Each internal node of the tree partitions the data points into two disjoint sets which are associated with different balls. While the balls themselves may intersect, each point is assigned to one or the other ball in the partition according to its distance from the ball's center. Each leaf node in the tree defines a ball and enumerates all data points inside that ball.
Each node in the tree defines the smallest ball that contains all data points in its subtree. This gives rise to the useful property that, for a given test point, the distance to any point in a ball in the tree is greater than or equal to the distance from to the ball. Formally:
Where is the minimum possible distance from any point in the ball to some point.
Ball-trees are related to the M-tree, but only support binary splits, whereas in the M-tree each level splits to fold, thus leading to a shallower tree structure, therefore need fewer distance computations, which usually yields faster queries. Furthermore, M-trees can better be stored on disk, which is organized in pages. The M-tree also keeps the distances from the parent node precomputed to speed up queries.
Vantage-point trees are also similar, but they binary split into one ball, and the remaining data, instead of using two balls.

Construction

A number of ball tree construction algorithms are available. The goal of such an algorithm is to produce a tree that will efficiently support queries of the desired type efficiently in the average case. The specific criteria of an ideal tree will depend on the type of question being answered and the distribution of the underlying data. However, a generally applicable measure of an efficient tree is one that minimizes the total volume of its internal nodes. Given the varied distributions of real-world data sets, this is a difficult task, but there are several heuristics that partition the data well in practice. In general, there is a tradeoff between the cost of constructing a tree and the efficiency achieved by this metric.
This section briefly describes the simplest of these algorithms. A more in-depth discussion of five algorithms was given by Stephen Omohundro.

k-d Construction Algorithm

The simplest such procedure is termed the "k-d Construction Algorithm", by analogy with the process used to construct k-d trees. This is an off-line algorithm, that is, an algorithm that operates on the entire data set at once. The tree is built top-down by recursively splitting the data points into two sets. Splits are chosen along the single dimension with the greatest spread of points, with the sets partitioned by the median value of all points along that dimension. Finding the split for each internal node requires linear time in the number of samples contained in that node, yielding an algorithm with time complexity, where n is the number of data points.

Pseudocode

function construct_balltree is
input: D, an array of data points.
output: B, the root of a constructed ball tree.
if a single point remains then
create a leaf B containing the single point in D
return B
else
let c be the dimension of greatest spread
let p be the central point selected considering c
let L, R be the sets of points lying to the left and right of the median along dimension c
create B with two children:
B.pivot := p
B.child1 := construct_balltree,
B.child2 := construct_balltree,
let B.radius be maximum distance from p among children
return B
end if
end function

Nearest-neighbor search

An important application of ball trees is expediting nearest neighbor search queries, in which the objective is to find the k points in the tree that are closest to a given test point by some distance metric. A simple search algorithm, sometimes called KNS1, exploits the distance property of the ball tree. In particular, if the algorithm is searching the data structure with a test point t, and has already seen some point p that is closest to t among the points encountered so far, then any subtree whose ball is further from t than p can be ignored for the rest of the search.

Description

The ball tree nearest-neighbor algorithm examines nodes in depth-first order, starting at the root. During the search, the algorithm
maintains a max-first priority queue, denoted Q here, of the k nearest points encountered so far. At each node B, it may perform one of three operations, before finally returning an updated version of the priority queue:

If the distance from the test point t to the current node B is greater than the furthest point in Q, ignore B and return Q.
If B is a leaf node, scan through every point enumerated in B and update the nearest-neighbor queue appropriately. Return the updated queue.
If B is an internal node, call the algorithm recursively on B's two children, searching the child whose center is closer to t first. Return the queue after each of these calls has updated it in turn.

Performing the recursive search in the order described in point 3 above increases likelihood that the further child will be pruned
entirely during the search.

Pseudocode

function knn_search is
input:
t, the target point for the query
k, the number of nearest neighbors of t to search for
Q, max-first priority queue containing at most k points
B, a node, or ball, in the tree
output:
Q, containing the k nearest neighbors from within B
if distance - B.radius ≥ distance then
return Q unchanged
else if B is a leaf node then
for each point p in B do
if distance < distance then
add p to Q
if size > k then
remove the furthest neighbor from Q
end if
end if
repeat
else
let child1 be the child node closest to t
let child2 be the child node furthest from t
knn_search
knn_search
end if
end function

Performance

In comparison with several other data structures, ball trees have been shown to perform fairly well on
the nearest-neighbor search problem, particularly as their number of dimensions grows.
However, the best nearest-neighbor data structure for a given application will depend on the dimensionality, number of data points, and underlying structure of the data.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...