# B-trees

A B-tree is a well known efficient method of indexing data in the database. B-tree indexes are balanced trees where leaf elements are equidistant from the root. Retrieval and updates of records traverse the a B-tree from the root following a path through interior nodes that progressively narrow intervals that a key must fall within.

In Arity/Prolog32 you can define as many B-trees as you wish, subject to database size limitations (2 Gbytes). Each B-tree may be structured independently. Each node can have up to 85 branches. If you do not specify otherwise, then each node consists of 17 branches. The bottom level of the B-tree is made up of the leaves, which are the Arity/Prolog32 terms that are to be stored in the B-tree. Often, you may find it convenient to store database references as the terms maintained in the B-tree.

All B-tree predicates operate within the current workspace and data world.

## Specifying the characteristics of a B-tree

An Arity/Prolog32 application may define the characteristics of each B-tree which govern its internal storage structure, whether each index key must be unique, the ordering of keys stored and retrieved, and whether keys are case sensitive.

defineb(+BTree, +SplitSize, +Uniqueness, +Order)

defineb(+BTree, +SplitSize, +Uniqueness, +Order, +Case)

The defineb/4 and defineb/5 predicates are used when you want to define specific attributes of a B-tree. Depending on your application, specifying B-tree attributes can result in more efficient B-tree organization or can be used to customize B-tree organization.

The attributes which may be specified are:

SplitSize

The SplitSize argument indicates how many branches a specific node of a B-tree can have before it splits into a new node. For example, if the SplitSize is set to five, each node of the B-tree will have at most five branches. The larger the SplitSize argument, the fewer levels of branching the B-tree will make. For instance, 100 items indexed with a split size of 17 will require fewer B-tree branches than 100 items indexed with a split size of 11.

The SplitSize must be an odd integer from 3 to 85. You must specify an odd integer due to the internal representation of the split size. The default split size, which is used for all B-trees not explicitly defined using defineb, is 17. For most applications, the default split size results in the most efficient B-tree organization. However, if you are creating a very large B-tree, you may want to increase split size to increase efficiency. If your B-tree is indexed using a large number of complex keys, you may want to decrease split size for maximum efficiency. The optimal split size is application specific, so you may need to experiment to achieve maximum efficiency.

Uniqueness

A unique B-tree is one in which each index key is unique and each unique key stores a single unique term. For example, if you are creating a B-tree which uses social security numbers as the index key, and the term stored under the key is a person's driver's license number, then the B-tree would be unique. The Uniqueness argument can be set to true or false. If set to true, any attempt to record more than one term under a particular key will fail. Setting the Uniqueness argument to false does not eliminate the possibility of a unique B-tree, it simply means that the uniqueness of the keys will not be checked by the system. The default uniqueness setting is false.

Order

The Order argument can be used to specify the order of retrieval of elements of a composite (complex) key. A composite key is one which contains more than a single piece of information and is represented by a Prolog structure. For example, the composite key student(Name,Grade) gives both the name and grade of the student. Unlike other terms which are stored in a B-tree under a single element, a term stored under a composite key is stored under the whole structure:

recordb(grades,student(adam,b),student(adam,b)).
recordb(grades,student(ralph,c),student(ralph,c)).

The Order argument is used to specify the order in which composite keys are stored in a B-tree. Since a composite key contains more than a single piece of information, you can specify the order in which keys are returned by specifying the order of each item of information returned by the key. The Order argument is in the form:

keyOrder(argOrder,argOrder,...)

The keyOrder and argOrder arguments can be either + (plus), indicating ascending order, or - (minus), indicating descending order. The order refers to the elements placement in the Prolog standard order (see section 7.3). For example, suppose you have created a B-tree named grades which stores composite keys representing the names and grades for the student in a class. The structures containing the names and grades are grouped according to whether a student is a freshman (1), sophomore (2), junior (3), or senior (4). A partial listing of the keys under which information is stored is:

'1'(jones,a).
'1'(james,d).
'1'(barnes,b).
'2'(tyler,f).
'2'(tyler,c).
'3'(regis,a).
'4'(daniels,b).

You want to return the terms in the following order:

1. Seniors should be returned first, followed by juniors, sophomores, and freshman.
2. Names should be returned in alphabetical order.
3. Grades should be returned so that the highest grade (a) is returned first.

This is accomplished with the following Order argument:

-(+,+)

This will cause the terms to be returned as:

'4'(daniels,b).
'3'(regis,a).
'2'(tyler,c).
'2'(tyler,f).
'1'(barnes,b).
'1'(james,d).
'1'(jones,a).

As a shorthand notation, you can provide the value + as the Order argument to indicate that all elements of the composite key should be returned in ascending order. You can provide the value - as the Order argument to indicate that all elements of the composite key should be returned in descending order.

Case

You can specify with the Case argument whether or not comparisons within the B-tree are sensitive to case. If the Case argument is 0 then the comparison will be case sensitive. If the Case argument is 1, then the comparison will be case insensitive and the following keys will be considered the same:

employee('Paul',15)
'Employee'('paul',15)

The default value for the Case argument is to be case sensitive (0).

Note: When you specify that a B-tree is case insensitive, only the first key specified when adding terms is kept in the B-tree.

Thus, to define a unique B-tree with a split size of 25 and all elements selected in case insensitive and ascending order, you provide the following:

defineb(example_tree, 25, true, +, 1).

The defineb/4 and defineb/5 predicates can only be used when the B-tree is empty. If any terms have been stored in the B-tree, then any call to defineb/4 or defineb/5 will fail.

## Adding terms to a B-tree

recordb(+BTree, +Key, +Term)

The recordb predicate is used for recording terms in a btree. The BTree argument is the name you give to the B-tree. In the student database example, the tree name is student. The Key argument is the term by which the terms you store in the B-tree can be accessed. For example, you could sort the student database using any one of the terms in the student clause: Name, ID, Test, or Total. The Key argument needs to be a ground term (no variables within the term). The Term argument is the term that you want recorded in the B-tree.

To illustrate how the recordb predicate is used, consider the example of creating a B-tree to be used for sorting the student information by student name. The term for ralph is entered in the B-tree as follows:

?- recordb(student,ralph,student(ralph,01,93,270)).

The term is stored in the B-tree with the Key of 'ralph'. The remaining students can be entered in the B-tree as follows:

?- recordb(student,susan,student(susan,24,75,250)).
?- recordb(student,linda,student(linda,5,53,188)).
?- recordb(student,dean,student(dean,33,0,160)).

replaceb(+BTree,+Key,+Old,+New)

The replaceb/4 predicate is used to replace a term at a particular position in the B-tree. You must specify both the Old term to be replaced and the New term. For example, the following replaces a student term with a new student term that has a different test score:

?- replaceb(student,ralph,student(ralph,01,93,270),student(ralph,01,88,265)).

## Reviewing terms in a B-tree

Several mechanisms are provided for reviewing the keys and terms stored in a B-tree. You can retrieve the terms and keys singularly through backtracking or collected into a list. Since the B-trees are ordered, you can take advantage of the order to direct the search within the B-tree.

retrieveb(+BTree, ?Key, ?Term)

To return the terms in sorted order, you use the retrieveb/3 predicate.

For example, to display the student database sorted by student name, you type the following:

?- retrieveb(student,X,Y).
X = dean
Y = student(dean,33,0,160) ->;
X = linda
Y = student(linda,5,53,188) ->;
X = ralph
Y = student(ralph,01,93,270) ->;
X = susan
Y = student(susan,24,75,250) ->;
no

If you want to see the term associated with a particular key, you provide both the B-tree name and key. For example:

?- retrieveb(student,susan,X).
X = student(susan,24,75,250) ->;
no

retrieveb_terms(+BTree,+Key,-Terms)

Similar to the retrieveb/3 predicate, the retrieveb_terms/3 predicate returns all of the Terms associated with a Key in a BTree. The Terms are returned in a list in the order that they are stored in the B-tree.

For instance, if you had an employee database indexed on last names with the first name and length of employment sorted in the term, you may have several employees with the last name Smith:

?- recordb(employees,smith,employee(jennifer,4)),
recordb(employees,smith,employee(tom,1)),
recordb(employees,smith,employee(fred,2)).

You may want to gather the employee terms for all employees with the last name Smith:

?- retrieveb_terms(employees,smith,X).
X = [ employee(jennifer,4),employee(tom,1),employee(fred,2) ]
yes
?-

betweenb(+BTree, +Key1, +Key2, +Relation1, +Relation2, -Key, -Term)

The betweenb/7 predicate is used when you want to return a specific portion of a B-tree. The Key1 and Key2 arguments indicate the keys between which you want the information returned. The order (ascending or descending) in which the keys are returned depends on whether Key1 comes before or after Key2 in the order specified for the BTree with the defineb/4 and defineb/5 predicates. For example, if Key1 is jane and Key2 is alan, betweenb returns the keys between alan and jane in descending order from jane.

The Relation1 and Relation2 arguments indicate whether the limit keys are returned or not. A relation argument can be =, <, or >. An equal (=) indicates that the limit key is returned as part of the solution. A less than (<) indicates that all items less then the limit key (up to the lower limit) are returned, but the limit key is not returned. A greater than (>) indicates that all items greater then the limit key (up to the upper limit) are returned, but the limit key is not returned. However, note that the use of greater than and less than must limit the return space to a single portion of the B-tree. That is, the upper bounds of a returned section can only use an = or < as a relation, and a lower bounds of a returned section can only use an = or > as a relation.

For instance, suppose you have stored the student terms in a B-tree under keys representing the test score of a student. You name this B-tree score. You want to return terms for students who passed with scores between 70 and 100, inclusive. Since you want the highest scores to be returned first, you use 100 as the Key1 argument:

?- betweenb(score, 100, 70 ,=, =, Key, Student).
Key = 100
Key = 93
Student = student(ralph,01,93,270) ->;
Key = 75
Student = student(susan,24,75,250) ->;
no

The betweenb/7 predicate can be used as the basis for writing other convenient B-tree searching predicates. For instance, the predicate forwardb/4 can be defined that searches forward from a specific point in a B-tree and returns the keys and terms from that point forward:

forwardb(Btree, Start_key, Return_key, Term) :-
betweenb(Btree, Start_key, _, _, _, Return_key, Term).

Similarly, you can write a backwardb/4 predicate to search backward from a specific point in a B-tree. However, to write such a predicate, you need to be able to indicate that the upper limit of the search is the first entry in the B-tree. The special notation $begin is used to indicate the first entry in a B-tree. Thus, the backward predicate is written as: backwardb(Btree, Start_key, Return_key, Term) :- betweenb(Btree, Start_key, '$begin', _, _, Return_key, Term).

betweenkeysb(+BTree, +Key1, +Key2, -Key)

The betweenkeysb/4 predicate is used to return only the keys between two specific points in the B-tree. If Key1 is before Key2 in the standard order, then keys are returned in ascending order. If Key1 is after Key2 in the standard order, then keys are returned in descending order.

retrieveb_nth(+BTree, +N, -Key)

The retrieveb_nth/3 predicate is used to return the Nth key stored in a B-tree. Keys are numbered from 1.

retrieveb_keys(+BTree, -Keys)

retrieve_terms(+BTree, -Terms)

These predicates can be used to return either all of the Keys or all of the Terms stored in a B-tree in the order that they are stored.

btree_count(+BTree, -Count)

The number of unique keys in a BTree is returned by the btree_count/2 predicate.

what_btrees(-Btree)

The what_btrees/1 predicate returns, through backtracking, the names of each of the existing B-trees.

## Removing terms from a B-tree

A single term, all terms associated with a particular key or all keys in a B-tree can be removed using the following predicates.

removeb(+BTree,+Key)

removeb(+BTree,+Key,+Term)

The removeb/2 and removeb/3 predicates are used to delete terms from a B-tree. The BTree argument specifies the B-tree, the Key is the key under which the term(s) are stored. If the Term argument is not used, then all terms indexed by Key will be removed. If Term is specified, then only that Term will be removed.

removeallb(+BTree)

To delete an entire BTree and all its terms, use the removeallb/1 predicate with the name of the B-tree that you want to delete.