B+ tree

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.
A B+ tree can be viewed as a B-tree in which each node contains only keys, and to which an additional level is added at the bottom with linked leaves.
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context — in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have very high fanout, which reduces the number of I/O operations required to find an element in the tree.
The ReiserFS, NSS, XFS, JFS, ReFS, and BFS filesystems all use this type of tree for metadata indexing; BFS also uses B+ trees for storing directories. NTFS uses B+ trees for directory and security-related metadata indexing. EXT4 uses extent trees for file extent indexing. Relational database management systems such as IBM DB2, Informix, Microsoft SQL Server, Oracle 8, Sybase ASE, and SQLite support this type of tree for table indices. Key–value database management systems such as CouchDB and Tokyo Cabinet support this type of tree for data access.

Overview

The order, or branching factor, b of a B+ tree measures the capacity of internal nodes, that is, the maximum number of children a node may have. The actual number of children of a node, referred to here as m, is constrained for internal nodes so that ⌈b/2⌉ ≤ m ≤ b. The root is an exception: it is allowed to have as few as two children. For example, if the order of a B+ tree is 7, each internal node may have between 4 and 7 children; the root may have between 2 and 7. Leaf nodes have no children, but are constrained so that the number of keys must be at least ⌈b/2⌉ and at most b. When a B+ tree is nearly empty, it contains only one node, which is a leaf node. This node is permitted to have as few as one key if necessary and at most b − 1.

Node Type	Children Type	Min Number of Children	Max Number of Children	Example (b = 7)	Example (b = 100)
Root Node (when it is the only node in the tree)	Records	1	b − 1	1–6	1–99
Root Node	Internal Nodes or Leaf Nodes	2	b	2–7	2–100
Internal Node	Internal Nodes or Leaf Nodes	⌈b/2⌉	b	4–7	50–100
Leaf Node	Records	⌈b/2⌉	b	4–7	50–100
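
The occupancy limits in the table can be computed directly from the order. The following is a minimal sketch; the function name occupancy_bounds and the dictionary keys are hypothetical labels chosen for illustration, not part of any library.

```python
def occupancy_bounds(b):
    """Return (min, max) occupancy per node type for a B+ tree of order b.

    The labels are illustrative names for the rows of the table above.
    """
    half = -(-b // 2)  # integer form of ceil(b / 2)
    return {
        "root_as_only_node_records": (1, b - 1),
        "root_children": (2, b),
        "internal_children": (half, b),
        "leaf_records": (half, b),
    }


if __name__ == "__main__":
    for order in (7, 100):
        print(order, occupancy_bounds(order))
```

For b = 7 and b = 100 this reproduces the two example columns of the table.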

Algorithms

Search

The root of a B+ tree represents the whole range of values in the tree, and every internal node represents a subinterval of that range.
Suppose we are looking for a value k in the B+ tree. Starting from the root, we look for the leaf that may contain k. At each node, we work out which child pointer to follow. An internal B+ tree node has m ≤ b children, where b is the order of the tree, and each child represents a different subinterval. We select the appropriate child by searching the key values of the node. In the pseudocode below, k_0, ..., k_d are the keys of the current node and p_0, ..., p_{d+1} are its child pointers.
function search(k) is
    return tree_search(k, root)

function tree_search(k, node) is
    if node is a leaf then
        return node
    switch k do
        case k ≤ k_0
            return tree_search(k, p_0)
        case k_i < k ≤ k_{i+1}
            return tree_search(k, p_{i+1})
        case k_d < k
            return tree_search(k, p_{d+1})
This pseudocode assumes that no duplicates are allowed.
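
The pseudocode above can be sketched in Python as follows. The Node class is a hypothetical in-memory layout assumed for illustration (sorted keys, plus a list of children that is None for a leaf); a disk-based implementation would read pages instead of following object references.

```python
from bisect import bisect_left


class Node:
    """Hypothetical node layout: a leaf has children=None."""

    def __init__(self, keys, children=None):
        self.keys = keys          # sorted list of keys
        self.children = children  # len(keys) + 1 subtrees, or None for a leaf


def tree_search(k, node):
    """Return the leaf that may contain k (no duplicates assumed)."""
    while node.children is not None:
        # bisect_left counts the keys strictly smaller than k, which matches
        # the three cases: k <= k_0 gives child 0, k_i < k <= k_{i+1} gives
        # child i+1, and k_d < k gives the last child.
        node = node.children[bisect_left(node.keys, k)]
    return node


if __name__ == "__main__":
    leaves = [Node([1, 2, 3]), Node([5, 8]), Node([13, 21])]
    root = Node([3, 8], leaves)
    print(tree_search(5, root).keys)
```

Using a binary search inside each node is a design choice of this sketch; a linear scan over the keys would follow the same three cases.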

Prefix key compression

It is important to increase fanout, as this allows searches to be directed to the leaf level more efficiently.
Index entries serve only to 'direct traffic', so they can be compressed.

Insertion

Perform a search to determine which bucket the new record should go into.
If the bucket is not full, add the record.
Otherwise, before inserting the new record:
* Split the bucket. Here L is the maximum number of records a bucket can hold, so L + 1 items are distributed:
** the original node has ⌈(L+1)/2⌉ items
** the new node has ⌊(L+1)/2⌋ items
* Copy the ⌈(L+1)/2⌉-th key to the parent (when an internal node splits, this key is moved up rather than copied), and insert the new node into the parent.
* Repeat until a parent is found that need not split.
If the root splits, treat it as if it has an empty parent and split as outlined above.

B+ trees, like B-trees, grow at the root and not at the leaves.
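
The leaf-level part of the insertion steps can be sketched as follows. This is an illustration under stated assumptions, not a full implementation: the function name insert_into_leaf and its return convention are hypothetical, the bucket is modelled as a sorted Python list, and propagating the separator into the parent (and splitting parents in turn) is left out.

```python
from bisect import insort


def insert_into_leaf(keys, new_key, capacity):
    """Insert new_key into a sorted leaf bucket holding at most `capacity` keys.

    Returns (left, right, separator). If no split was needed, right and
    separator are None. The sketch inserts first and then splits, which
    distributes the same capacity + 1 items as splitting before inserting.
    """
    merged = list(keys)
    insort(merged, new_key)
    if len(merged) <= capacity:
        return merged, None, None
    mid = -(-len(merged) // 2)          # ceil((capacity + 1) / 2)
    left, right = merged[:mid], merged[mid:]
    separator = left[-1]                # the ceil((capacity + 1) / 2)-th key
    return left, right, separator


if __name__ == "__main__":
    print(insert_into_leaf([1, 3, 5], 4, capacity=4))
    print(insert_into_leaf([1, 3, 5, 7], 4, capacity=4))
```

The separator is the largest key of the left node, which suits a search that follows the left pointer when the search key is less than or equal to the separator.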

Bulk-loading

Given a collection of data records, we want to create a B+ tree index on some key field. One approach is to insert each record into an empty tree. However, it is quite expensive, because each entry requires us to start from the root and go down to the appropriate leaf page. An efficient alternative is to use bulk-loading.

The first step is to sort the data entries according to a search key in ascending order.
We allocate an empty page to serve as the root, and insert a pointer to the first page of entries into it.
When the root is full, we split the root, and create a new root page.
Keep inserting entries into the rightmost index page just above the leaf level, until all entries are indexed.
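
As a rough illustration of building an index from sorted data without per-record root-to-leaf descents, the sketch below uses a simpler level-by-level (bottom-up) construction rather than the rightmost-page insertion procedure described above. The function names and the dictionary-based node layout are assumptions made for the example, and the last node on each level may end up underfull because no rebalancing is attempted.

```python
def max_key(node):
    """Largest key stored below node (follow the rightmost path to a leaf)."""
    while node["children"] is not None:
        node = node["children"][-1]
    return node["keys"][-1]


def bulk_load(sorted_keys, b):
    """Build a B+ tree of order b from keys already sorted in ascending order."""
    if not sorted_keys:
        return {"keys": [], "children": None, "next": None}
    # Pack the sorted keys into leaves of up to b keys and link the leaves.
    level = [{"keys": sorted_keys[i:i + b], "children": None, "next": None}
             for i in range(0, len(sorted_keys), b)]
    for left, right in zip(level, level[1:]):
        left["next"] = right
    # Build index levels until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), b):
            group = level[i:i + b]
            # Each separator is the largest key in the subtree to its left.
            seps = [max_key(child) for child in group[:-1]]
            parents.append({"keys": seps, "children": group})
        level = parents
    return level[0]


if __name__ == "__main__":
    root = bulk_load(list(range(1, 11)), b=3)
    print(root["keys"], [c["keys"] for c in root["children"]])
```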

Note:

When the rightmost index page above the leaf level fills up, it is split.
This action may, in turn, cause a split of the rightmost index page one step closer to the root.
Splits only occur on the rightmost path from the root to the leaf level.

Characteristics

For a b-order B+ tree with h levels of index:

The maximum number of records stored is
The minimum number of records stored is
The minimum number of keys is
The maximum number of keys is
The space required to store the tree is O(n), where n is the number of records
Inserting a record requires O(log_b n) operations
Finding a record requires O(log_b n) operations
Removing a record requires O(log_b n) operations
Performing a range query with k elements occurring within the range requires O(log_b n + k) operations

Implementation

The leaves of the B+ tree are often linked to one another in a linked list; this makes range queries or an iteration through the blocks simpler and more efficient. This does not substantially increase space consumption or maintenance on the tree. It illustrates one of the significant advantages of a B+ tree over a B-tree: in a B-tree, since not all keys are present in the leaves, such an ordered linked list cannot be constructed. A B+ tree is thus particularly useful as a database system index, where the data typically resides on disk, as it allows the B+ tree to provide an efficient structure for housing the data itself.
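
A range query over linked leaves can be sketched as below. The Leaf class and range_scan function are hypothetical names used for illustration; the sketch assumes the starting leaf has already been located (for example by a root-to-leaf search for the lower bound) and that each leaf holds a reference to its right neighbour.

```python
class Leaf:
    """Hypothetical leaf layout: sorted keys plus a link to the next leaf."""

    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next_leaf = next_leaf


def range_scan(start_leaf, lo, hi):
    """Yield every key with lo <= key <= hi, walking the leaf chain."""
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key < lo:
                continue   # still before the range
            if key > hi:
                return     # keys are sorted, so nothing further can match
            yield key
        leaf = leaf.next_leaf


if __name__ == "__main__":
    third = Leaf([13, 21])
    second = Leaf([5, 8], third)
    first = Leaf([1, 2, 3], second)
    print(list(range_scan(first, 2, 13)))
```

Once the first leaf is found, the scan never returns to the internal nodes, which is why the cost grows only with the number of elements in the range.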
If a storage system has a block size of B bytes, and the keys to be stored have a size of k bytes, arguably the most efficient B+ tree is one where b = (B/k) − 1. Although theoretically the subtraction of one is unnecessary, in practice index blocks often take up a little extra space. An index block that is slightly larger than the storage system's actual block causes a significant performance decrease, so erring on the side of caution is preferable.
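
To make the effect of this choice concrete, the sketch below computes an order from a block size and key size, and then an idealized number of levels needed to reach n records. The block size, key size and record count are hypothetical figures chosen for illustration, and levels_needed assumes every node is completely full, so real trees will typically be somewhat taller.

```python
def order_for_block(block_size, key_size):
    """Order b = (B / k) - 1, using integer division for whole keys."""
    return block_size // key_size - 1


def levels_needed(n, fanout):
    """Smallest h such that fanout ** h >= n (integer arithmetic only)."""
    h, reach = 0, 1
    while reach < n:
        reach *= fanout
        h += 1
    return h


if __name__ == "__main__":
    b = order_for_block(4096, 16)   # hypothetical 4 KiB blocks, 16-byte keys
    n = 10 ** 9                     # hypothetical number of records
    print("order:", b)
    print("levels with fanout b:", levels_needed(n, b))
    print("levels with fanout 2:", levels_needed(n, 2))
```

With these figures the order is 255, and reaching a billion records takes 4 levels at that fanout compared with 30 levels at a fanout of 2, which is the reduction in I/O operations that high fanout provides.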
If nodes of the B+ tree are organized as arrays of elements, then it may take a considerable time to insert or delete an element as half of the array will need to be shifted on average. To overcome this problem, elements inside a node can be organized in a binary tree or a B+ tree instead of an array.
B+ trees can also be used for data stored in RAM. In this case a reasonable choice for block size would be the size of processor's cache line.
Space efficiency of B+ trees can be improved by using compression techniques. One possibility is to use delta encoding to compress the keys stored in each block. For internal blocks, space can be saved by compressing either keys or pointers. For string keys, space can be saved by using the following technique: normally the i-th entry of an internal block contains the first key of block i+1. Instead of storing the full key, we could store the shortest prefix of the first key of block i+1 that is strictly greater (in lexicographic order) than the last key of block i. There is also a simple way to compress pointers: if we suppose that some consecutive blocks are stored contiguously, then it suffices to store only a pointer to the first block and the count of consecutive blocks.
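
The string-key prefix technique can be sketched as follows. The function name shortest_separator is hypothetical, and the sketch assumes plain Python strings compared in code-point order, with the last key of the left block strictly smaller than the first key of the right block.

```python
def shortest_separator(last_of_left, first_of_right):
    """Shortest prefix of first_of_right that is strictly greater than last_of_left.

    Assumes last_of_left < first_of_right, so the loop always returns
    (at worst with the full key).
    """
    for length in range(1, len(first_of_right) + 1):
        prefix = first_of_right[:length]
        if prefix > last_of_left:
            return prefix
    raise ValueError("keys must satisfy last_of_left < first_of_right")


if __name__ == "__main__":
    print(shortest_separator("apple pie", "banana"))
    print(shortest_separator("internal", "internet"))
```

The more the neighbouring keys differ near their beginning, the shorter the stored separator; keys sharing a long common prefix save correspondingly less.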
All the above compression techniques have some drawbacks. First, a full block must be decompressed to extract a single element. One technique to overcome this problem is to divide each block into sub-blocks and compress them separately. In this case searching or inserting an element will only need to decompress or compress a sub-block instead of a full block. Another drawback of compression techniques is that the number of stored elements may vary considerably from a block to another depending on how well the elements are compressed inside each block.

History

The B-tree was first described in the paper "Organization and Maintenance of Large Ordered Indices" by Rudolf Bayer and Edward M. McCreight (Acta Informatica 1: 173–189). There is no single paper introducing the B+ tree concept. Instead, the notion of maintaining all data in leaf nodes is repeatedly brought up as an interesting variant. An early survey of B-trees that also covers B+ trees is by Douglas Comer. Comer notes that the B+ tree was used in IBM's VSAM data access software, and he refers to an article published by IBM in 1973.

Implementations
