Hash tables

A hash table is a set of key-value pairs. Access, insertion, and deletion of a key-value pair in a hash table is made efficient by computing a numerical value from the key even if the key is a non-numeric data value such as a text string or structure. The "hashed value" is used as an offset index to quickly locate in a table one or more candidate matches to a full unhashed key value. The actual strategy of locating and verifying an actual match from a candidate hashed value may be "hash and chain" (used in Arity/Prolog32) or "rehash" but the idea is the same: Attempt to complete an operation in essentially the same (fast!) time regardless of the number of entries in the table. This differs from B-trees where the average access time grows as the log of number of entries in the table.

An important difference between hash tables and B-trees in Arity/Prolog32 is that hash tables only use the principal functor of the Key argument used for recording and retrieving. In contrast, the entire Key argument is used in B-tree operations.

Hash tables work well when the hash function creates values that are approximately evenly distributed in range, such that the keys span the table with a minimum of collisions. The hash function used within Arity/Prolog32 was chosen based on a variety of use cases that were thought to be common. The number of hash buckets for a hash table in Arity/Prolog32 may be adjusted using the defineh/2 predicate but is limited to a maximum of 1018.

Generally, a hash table uses less storage space and is faster than a B-tree. However, a B-tree returns data in sorted order, whereas a hash table does not. You should consider using a hash table over a B-tree when fast access is important but sorted term retrieval is not. You should experiment with both hash tables and B-trees (including varying defineb/4, defineb/5, and defineh/2 parameters) to determine what works best for your application. Generally speaking, both methods are fast.

Arity/Prolog32 application developers sometimes use only B-trees because the sorted order of B-tree retrieval and added flexibility is convenient during development and testing.

Note that hash tables are used for managing the namespaces of code worlds and data worlds. All predicate names, for example, are stored in an internal system hash table.

Terms recorded in a hash bucket using the recordh/3 predicate cannot be accessed by other database predicates. For example, you cannot use recorded/3 to search for terms in the hash table, and you cannot use recorda/3 or recordz/3 to add terms to the hash table.

All hash table predicates operate within the current workspace and data world.

Hash table predicates

defineh(+Table, +HashBuckets)

The defineh/2 predicate is used to specify the number of hash buckets to be created for a hash table. The Table argument is the name of the hash table you are defining. The HashBuckets argument is an integer indicating the number of hash buckets to be created for the hash table. The use of the defineh/2 predicate is optional.

If you use recordh/3 without using defineh/2, the system will store the hash table terms in a hash table containing 64 hash buckets (appropriate for less than 1000 distinct keys, depending on your application). Increasing the number of hash buckets may improve hash table performance. Up to a point, the more hash buckets defined, the more efficiently a term stored in a hash table can be retrieved. Each hash bucket uses 4 bytes of database space and all hash buckets for a hash table must fit on a single database page which is 4kb minus a small amount of overhead. The number of hash buckets that will be created is the minimum of HashBuckets and 1018.

Regardless of the number of HashBuckets that a table has, you can store as many keys and as many terms under a key in a table as you would like up to the size limit of the database itself (2 Gb).

recordh(+Table, +Key, +Term)

The recordh/3 predicate is used for recording terms in a hash table. The Table argument is the name you give to the hash table. The Key argument indicates the category under which you want the term recorded. In other words, the Key specifies the hash bucket into which the term should be recorded. The Term argument is the term that is recorded in the hash table.

NOTE: Only the principal functor of the Key is used in a hash table. The value of Key must be instantiated.

replaceh(+Table, +Key, +OldTerm, +NewTerm)

The replaceh/4 predicate can be used to replace the occurrence of OldTerm indexed by Key in a hash Table with NewTerm.

retrieveh(+Table, -Key, -Term)

The retrieveh/3 predicate returns through backtracking all entries in Table that unify with Key and Term. The order of retrieval is not well-defined.

removeh(+Table, -Key, -Term)

The removeh/3 predicate is used to delete one or more terms from the Table. The removeh/3 deletes one stored term at a time, returning through backtracking each term that unifies with Key and Term.

removeallh(+Table)

The removeallh/1 predicate deletes the entire Table and removes its definition.