Program structure and data types

This chapter provides a refresher on the structure of Arity/Prolog32 programs and unification. It also provides detail on the data types supported by Arity/Prolog32.

Program structure

Arity/Prolog32 program development uses conventions and methods that are common to those used for many programming languages, especially 'C'.

  • Prolog program source files may contain directives, comments, and predicate definitions. Good practice for all programming activities encourages modularity, the use of consistent naming and indentation conventions, and sufficient use of comments. We suggest the use of include files for macro definitions and common directives.
  • The Arity/Prolog32 interpreter should be used for experimentation, development and debugging. We suggest that as you complete areas of functionality of your programs you create or extend your own dynamically linked libraries. These may be loaded and used as extensions to your interpreted code. When development has completed, you can create a finished application by compiling a main module which uses these DLLs.
  • The Arity/Prolog32 alint utility can often find typos and other bugs, similar to 'lint' utilities for other languages. It runs quickly and will save you time.
  • Comment liberally. Prolog adopts the same conventions as 'C' and many other languages. A block comment may span multiple lines and appears as /* this is a comment */ . A percent character (%) is used to comment out the remainder of the line in which it appears.
  • While you may have more than one predicate that has the same name but different arities, it is generally not a good idea because you may find your programs to be harder to debug.
  • Prolog predicates should have all clauses defined together. It is not recommended to intermingle clauses from different predicates. Some Prolog implementations enforce this as a restriction.
  • Recall that a Prolog clause or directive is just a Prolog term, but made easier to read with the operator definitions include those for ':-'/1 and ':-'/2. Because the various predicates for reading Prolog clauses into the interpreter and compiler (such as consult/1) use the Arity/Prolog32 read/1 and read/2 predicates, that the same convention of providing a terminating period (.) followed by some white space (such as a <cr>) is used. The period-whitespace is not part of the Prolog term but is used just as a terminator.

Unification

Arity/Prolog32 data types have been extended beyond that typically available within other Prolog implementations. Unification, which we first review, is the same operation in Arity/Prolog32 as in other mainstream Prolog implementations.

In conventional programming languages, the basic computational step is the assignment of values to variables. Prolog uses a much more general type of assignment known as unification. Unification may succeed or fail. If successful, then the two terms that are unified become the same equivalent term. This process is explained in more detail below.

We start with a quick summary of some of the vocabulary used to describe Prolog data structures. More detail about these data structures is provided later.

Term

Prolog programs are built from and manipulate terms. A term is any Prolog data object, including variables

Atomic

A term that is one of the basic data types: a number, a text string, etc.

Structure

A term that consists of a type (a functor) and one or more arguments, each of which is a term.

List

A list is a structure which provides a convenient way to represent an ordered sequence of terms.

Unbound (or uninstantiated or free) variable

An unbound variable is a “placeholder” for a value but does not yet contain a value.

Bound (or instantiated) variable

A bound variable is a variable that has been unified with a structure or atomic value.

Ground Term

A ground term is a term that has no parts (sub terms) that are unbound variables.

Unification is a more general operation than conventional assignment found in most programming languages. Unbound variables may become bound to values and also to other unbound variables. Unification is also defined as a recursive operation on terms that are complex data structures.

We call the event of an unbound variable becoming bound to a term instantiation.

Not all data structures can unify with one another, and it is important to note which data structures can and cannot unify. The following is a list of the effects of unification on combinations of data types.

variable with variable

Unbound variables always unify with other unbound variables, becoming in effect the same unbound variable. Later, if one becomes instantiated, the other also becomes instantiated to the same value.

atomic or structure with variable

The variable becomes instantiated with the value of the atomic or the structure. This is analogous to conventional assignment.

atomic with atomic

The values of the atomics must match by having the same internal representation. For example, unifying the two terms that are each the integer 3 always succeeds. However, unifying the integer 3 with the atom 'Oscar' fails.

structure with structure

The functors of both structures must match, that is, they must have the same name and arity. In addition, each argument must unify. If structures contain substructures, then the unification process is recursive.

Prolog data structures

Arity/Prolog32 provides the ability to build and manipulate complex data structures composed from simpler data types.

Unbound variables

A variable may become instantiated to any other term. In addition, variables may be unified with other variables. When one variable becomes instantiated, the other also becomes instantiated.

A variable name can be any sequence of alphanumeric characters that begins with a capital letter or an underscore (), or it can be a single underscore () to represent an anonymous variable.

Some examples of variable names are:

BinaryTree1
X
_
_0041

You can use the anonymous variable (_) in your code whenever you do not need to unify that variable with any other terms. This arises frequently when you need to unify only some but not all subterms of a term or only some arguments of a predicate. In the following example, the term takes 5 arguments. Only the third and fifth arguments are explicitly named and the first, second, and fourth arguments are distinct anonymous variables. Y and Z are named.

find(_,_,Y,_,Z)

Atoms and Strings

In most Prolog implementations, atoms and strings (if they are supported at all) are distinct data types. In Arity/Prolog32 they are not.

Atoms (or equivalently in Arity/Prolog32, strings) are textual values. They can contain letters, digits, or symbols. In your code, if they begin with an upper-case letter, or a digit, then they must be enclosed in single quotes or in dollar signs. Also in your code, if you use a single quote as an apostrophe inside of the single quotes, it is written as two apostrophes.

Some examples of atoms are:

tennis
'23'
'Paris'
->
'Mary''s'
$Mary's$

An empty atom is written either as two sequential single quotes or two sequential dollar signs:

''
$$

Atoms always unify with atoms that are the exactly the same. This matching is case sensitive.

Most Arity/Prolog32 evaluable predicates that take atoms as arguments are not affected by character set issues, others assume the ASCII character set. It is possible to write Arity/Prolog32 applications that process text in different character sets.

The maximum length of an atom is 512K bytes. Atoms are stored on the global stack and so the practical maximum length may be less, depending on your application and on your environment settings. Practically speaking, individual atoms of thousands or even tens of thousands of bytes in length may be reasonably processed, particularly if your program employs a repeat-fail style of work. There is no limit to the number of atoms that your program may have.

Arity/Prolog32 also supports another style of encoding text: character lists. Refer to the discussion below on lists for more detail.

Integers and Floating-point numbers

Arity/Prolog32 has an integrated approach to the processing of integer and floating point numbers. Whenever a numeric result of a read or a computation (including the is/2 predicate and embedded C calls) is returned to Prolog, it is checked to see if it can be exactly represented as a signed 32-bit integer. If so, it is represented internally as an integer, if not, it is represented internally as a IEEE double precision (64-bit) float. The implications of this are the following:

  • Less space is consumed on the stacks and in the database.
  • Calculations using the is/2 predicate and other arithmetic predicates is consistent with expectations.
  • Unification of numbers is more consistent (but be wary of matching two floats because two floats that appear to be the same when written may have different low precision bits).
  • The float/1 predicate succeeds for arguments that are numbers that cannot be represented as a signed 32-bit integer.
  • The integer/1 predicate succeeds for arguments that are numbers that can be represented as a signed 32-bit integer.
  • The number/1 predicate succeeds for any numeric argument.
  • The float_text/3 predicate now will accept integers as well as floats as its first argument.
  • The write and display predicates output a number as an integer whenever that number is may be represented as a signed 32-bit integer and otherwise will output a number in general floating point format.

Integers are positive and negative whole numbers with a range between -2,147,483,648 and 2,147,483,647 (a 32-bit signed integer). Some examples of integers are:

299
234567
-889
0

Integers always unify with integers of the same value.

ASCII characters are also integers when input using the read/1 or similar predicates (i.e., those used for Prolog term input from textual sources). A convenient notation for the integer ASCII value of a character is that character prefaced by a single backquote character (`). For example, the following two terms are equivalent:

`a
97

Floating-point numbers allow Prolog to perform arithmetic on fractions. Floating-point numbers have a range of -1.7e308 to 1.7e308. Floating-point numbers have 15 decimal places of accuracy.

A floating-point number must follow a specific format when input using the read/1 or similar predicates. This format is: an optional sign followed by at least one digit followed by a decimal point followed by at least one digit optionally followed by the exponent. The exponent is in the form E (or e) followed by an optional sign followed by an integer in the range -308 to 308.

Some examples of floating-point numbers are:

0.5
55.3
-83.0E21
2134.2
122.345e25

The following terms are not accepted as floating-point numbers:

5.
.5
-17.23E-1000

Floating point numbers unify with other floating point numbers. However, be aware floating-point operations are inexact and attempting to unify two seemingly identical numbers may not succeed. You should never rely on exact comparison of floating point numbers for any program, regardless of programming language; Unification of floating point numbers is no exception.

Database references

Database references (or “database refs”) serve as unique identifiers for terms stored in the database. Everything stored in the database is assigned its own unique database reference by the system; you should not attempt to invent your own database references. These values are displayed as a tilde (~) followed by eight hexadecimal digits, such as:

~00255515
~00035D18

Database references always unify with themselves.

Structures

Structures are general purpose data types that may be used to group or express a relationship between terms. A structure is composed of an object (the functor) and its arguments. The number of arguments must be between 1 and 255 (a "structure" with 0 arguments is an atom).

The following is an example of a Prolog data structure:

book(plants, publisher(green_house))

Structures may be unified with other structures. They unify if the functors of the structures match and if each of the respective arguments unify. For example, the following structures unify:

book(Author, Title)
book(james, $The lonely tree$)

The following do not unify because the atom charles does not unify with the atom chuck:

name(jones, charles)
name(jones, chuck)

The number of arguments for a structure is often called its "arity." The name of a structure and its arity together are known as the "functor." A common notation for a functor is "name/arity." For example, the functor associated with the binary operation of addition is '+'/2.

Structures may be complex, having sub-structures, etc. The outermost functor is frequently called the "principal functor." Database key and hash table keys may be structures but only the principal functor is significant.

In general, structures are represented in prefix Polish notation. That is, the functor is listed first, followed by its arguments. However, as a notational convenience, operators may be defined such that Prolog will accept a term if it is not in prefix Polish form. For example, the functor :-/2 is declared as an infix operator, so the clause:

:- (animal(X), human(X)).

is accepted by the Prolog read predicate when it is in the form

animal(X) :- human(X).

Arithmetic expressions are examples of structures in which operator definitions are commonly used. For example, the structure +(3,2) is usually written as

3 + 2

Two special operators are defined in Arity/Prolog32 that each take a single argument. The following pairs are equivalent:

'[!!]'(X)
[! X !]

'{}'(X)
{ X }

For additional information about operators refer to the chapter on Arity/Prolog32 input and output predicates.

Lists and list notation

Lists are a specific type of Prolog data structure that is commonly used to represent an ordered sequence of terms. Any Prolog term (for example, variables, structures, or other lists) may be elements in a list. List notation is a convenient way to write a list. In list notation, each element of the list is separated by a comma, and the entire set of elements is placed inside brackets ([]). For example, the list of elements a, b and c is written in list notation as:

[a, b, c]

The notation that we use for lists is merely a convenience (a "syntactic sugar") for less terse representations of the equivalent Prolog terms. A list is built from a "cons functor" which is '.'(Head, Tail). The empty list which is also used to signify an empty list is the atom '[]'. For convenience, Arity/Prolog32 code may also represent the empty list as an opening square bracket '[' followed by white space followed by as closing square bracket ']'.

The list above consisting of the elements a, b and c could have been written as

'.'(a,'.'(b,'.'(c,[])))

The list with one element, a, would be written as [a] in list notation, but without using list notation it is written as

'.'(a,[])

As you can see, list notation is both more convenient to write as well as read.

There is also a notation for indicating the head of the list and the tail of the list. You use a vertical bar (|) to separate the head from the tail. Thus, using head / tail list notation, the list [a, b, c, d] is represented as any of the following:

[a|[b ,c, d]]
[a,b|[c, d]]
[a,b,c|[d]]

Most often you will see code that uses variables within list notation that is used to unify one or more leading elements from the rest (the tail). For example the following can be unified with a list to bind H1 to the first element, H2 to the second element, and T to the remaining:

[H1,H2|T]

Character lists are a common form of lists that have their own notation. A character list is a list of integers corresponding to the ASCII values of a sequence of characters. Character lists are enclosed in double quotes. The following three lists are equivalent:

"abc"
[97, 98, 99]
'.'(97, '.'(98, '.'(99,[])))