The role of reasoning in the application of ontologies
Originally published on Thu, 04/04/2013 - 11:31
Latest Revision: Mon, 12/23/19 - 1:47
This is a short note about the role of reasoning in the application of ontologies. I am writing it because I often hear claims that can lead good efforts astray, such as:
- Ontologies are sufficient to encode knowledge about some domain.
- Some particular ontological formalism (say, semantic nets, or frames, or description logics, or ???) is the "best" way to encode knowledge.
- A "universal" ontology integrating knowledge from all ontologies is possible.
I do not want to get too technical in this note. Instead, I want to sketch some arguments and ideas that show why these claims are fallacies.
Fallacy 1: Ontologies are sufficient to encode knowledge.
This fallacy is easy to demonstrate. Almost every domain of knowledge is a mixture of assertions (facts) and procedural knowledge. Imagine creating an automated system for doing arithmetic on a computer. (Ah, but there is one built in, you say!) The arithmetic portion of the CPU has circuitry for binary arithmetic. But imagine building up an ontology for arithmetic instead.
An ontology for arithmetic would presumably have to encode an enormous number of individual facts (every sum, every product), which would be enormously wasteful. But as we learned more math, we learned some things that can be efficiently encoded in an ontology: the rules of trigonometry, calculus, that sort of thing. In fact, let's encode some rules of differentiation using the fragment of first-order logic called "Horn clauses", which is also the form that Prolog programs take.
Rule | Prolog | English equivalent |
---|---|---|
$\frac{dx}{dx} = 1$ | d(X,X,1). | "the derivative of X with respect to itself is 1" |
$\frac{dC}{dx} = 0$ | d(C,X,0) :- number(C). | "the derivative of a constant C with respect to X is 0" |
$\frac{d(U+V)}{dx} = \frac{dU}{dx}+\frac{dV}{dx}$ | d(U+V,X,A+B) :- d(U,X,A), d(V,X,B). | "the derivative of a sum is the sum of the derivatives" |
$\frac{d(U \cdot V)}{dx} = U\frac{dV}{dx} + V\frac{dU}{dx}$ | d(U*V,X,B*U+A*V) :- d(U,X,A), d(V,X,B). | "the derivative of a product is the first times the derivative of the second plus the second times the derivative of the first" |
$\frac{d(\sin(T))}{dx} = \cos(T)\frac{dT}{dx}$ | d(sin(T),X,R*cos(T)) :- d(T,X,R). | "the derivative of the sine is the cosine times the derivative of its argument (the chain rule)" |
Although the Prolog rules look a bit odd compared to what we saw in our intro calculus classes, the correspondence is quite clear. But we learned more in class: we learned the systematic procedural knowledge to apply these rules (and a handful more) so that we could solve problems posed in a suitable form. This procedural knowledge enabled us to reason (rather mechanically) using whatever rules we happened to be able to recall. A bit more complex is the integral calculus, where more search using different methods (such as integration by parts) would enable us to recognize how to apply "anti-derivative" rules. Experience allowed us to learn how to search more efficiently through combinations of tricks (rules) that we had memorized.
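To see how the declarative rules and the procedure for applying them fit together, here is a minimal sketch of the five differentiation rules from the table as a recursive Python function. The class names (`Add`, `Mul`, `Sin`, `Cos`) and the function name `d` are illustrative choices made for this sketch, not part of the Prolog above. Each branch mirrors one clause, and the recursion and the order of the tests are the "procedural knowledge" that Prolog supplies implicitly through its search strategy.

```python
from dataclasses import dataclass
from typing import Union


# Compound terms; numbers are plain int/float and variables are plain strings.
@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Sin:
    arg: "Expr"


@dataclass(frozen=True)
class Cos:
    arg: "Expr"


Expr = Union[int, float, str, Add, Mul, Sin, Cos]


def d(expr: Expr, x: str) -> Expr:
    """Differentiate expr with respect to the variable x (no simplification)."""
    if expr == x:                          # d(X,X,1).
        return 1
    if isinstance(expr, (int, float)):     # d(C,X,0) :- number(C).
        return 0
    if isinstance(expr, Add):              # d(U+V,X,A+B) :- d(U,X,A), d(V,X,B).
        return Add(d(expr.left, x), d(expr.right, x))
    if isinstance(expr, Mul):              # d(U*V,X,B*U+A*V) :- d(U,X,A), d(V,X,B).
        u, v = expr.left, expr.right
        a, b = d(u, x), d(v, x)
        return Add(Mul(b, u), Mul(a, v))
    if isinstance(expr, Sin):              # d(sin(T),X,R*cos(T)) :- d(T,X,R).
        return Mul(d(expr.arg, x), Cos(expr.arg))
    # Like the Prolog rules, this sketch has no case for other terms
    # (for example a second variable), so it fails rather than guessing.
    raise ValueError(f"no differentiation rule applies to {expr!r}")


# Differentiate x*x + sin(x) with respect to x.
result = d(Add(Mul("x", "x"), Sin("x")), "x")
print(result)
```

The result is an unsimplified expression tree, just as the Prolog rules would return `1*x+1*x+1*cos(x)` rather than a tidied form. Simplifying it would require yet more rules and yet more procedure.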
Ontologies encode some of our knowledge. However, ontologically encoded knowledge is only brought to life through some kind of reasoning. Sometimes the reasoning takes the form of automating the organization of the ontology itself - such as when we can automatically infer the "is a" hierarchical organization of concepts from the comparison of their descriptions. Sometimes reasoning takes the form of "walking" relationships within the ontology such as finding indirect causation by following individual "causes" links. And sometimes reasoning takes the form of some overarching algorithmic or heuristic procedure that uses the ontology as a resource of encoded knowledge.
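The first two kinds of reasoning can be sketched in a few lines of Python. Everything here is a toy assumption made for illustration: the concepts, their feature sets, the "causes" links, and the function names `subsumers` and `all_effects` are hypothetical, and describing a concept as a flat set of features is far simpler than what a real description logic reasoner works with.

```python
from collections import deque

# Hypothetical toy ontology: each concept is described by a set of features.
descriptions = {
    "animal": {"alive"},
    "bird": {"alive", "has_feathers"},
    "penguin": {"alive", "has_feathers", "swims"},
}

# Hypothetical direct "causes" links between events.
causes = {
    "rain": ["wet_road"],
    "wet_road": ["skidding"],
    "skidding": ["collision"],
}


def subsumers(concept, descriptions):
    """Infer "is a" parents: concepts whose description is a strict subset."""
    mine = descriptions[concept]
    return {
        other
        for other, features in descriptions.items()
        if other != concept and features < mine
    }


def all_effects(start, causes):
    """Walk "causes" links breadth-first to collect direct and indirect effects."""
    seen = set()
    queue = deque(causes.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also protects against cycles
        seen.add(node)
        queue.extend(causes.get(node, []))
    return seen


print(subsumers("penguin", descriptions))
print(all_effects("rain", causes))
```

Neither the "is a" links for `penguin` nor the link from `rain` to `collision` is stored anywhere; both exist only once a reasoning procedure runs over the encoded facts.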
"Applied ontologies" are the combination of ontologically encoded knowledge and procedural knowledge. The ontologically encoded knowledge is sometimes called "declarative knowledge" - it asserts statements and relationships that are held to be true.
This brings us to:
Fallacy 2: There exists a "best" way to encode knowledge.
Oh, I wish. But the not so simple truth is that the encoding of ontological knowledge constrains the efficiency and the decidability of the reasoning methods that use that knowledge.
The deepest reasons go into the depths of logic. As it turns out, in the 1930s the entire positivist program in mathematics and science, which aimed to axiomatize all knowledge, was proven to be impossible. Roughly stated, Kurt Gödel's incompleteness results, together with the undecidability results of Church and Turing that followed, showed that there cannot be any universal algorithm guaranteed to solve all problems in finite time. This holds even for many perfectly innocent-looking problems posed in perfectly simple logical statements.
The bottom line is that if you put too much representational power in your ontology language + reasoning procedure, then you will be faced with the unpleasant problem that there will be nice-looking questions for which no answer can be guaranteed in any amount of time, no matter how long. And worse, even if your ontology language + reasoning procedure is decidable, you might find that certain (deceptively simple in appearance) questions fall off "computational cliffs": they are decidable in principle but not tractable in practice.
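A small sketch can make the "computational cliff" tangible. Propositional satisfiability is decidable, yet the obvious procedure of trying every truth assignment examines up to 2^n of them for n variables, and no polynomial-time algorithm for the general problem is known. The function name `brute_force_sat` and the clause encoding (a positive integer k means variable k is true, a negative one means it is false) are assumptions made for this illustration.

```python
from itertools import product


def brute_force_sat(clauses, n_vars):
    """Try every truth assignment; return (satisfying assignment or None, tries)."""
    checked = 0
    for assignment in product([False, True], repeat=n_vars):
        checked += 1
        satisfied = all(
            any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        if satisfied:
            return assignment, checked
    return None, checked


# "x1 and not x1" is unsatisfiable, so every assignment must be examined.
# Each extra (irrelevant) variable doubles the work.
for n in (5, 10, 15):
    _, tries = brute_force_sat([[1], [-1]], n)
    print(n, tries)
```

The question looks equally innocent at every size, but the work doubles with each added variable. Real reasoners are far smarter than this brute-force loop, yet for sufficiently expressive languages their worst cases still grow in this way.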
The approach that software development has always taken is to match the representation of data to the kinds of procedures that need to be executed. The same holds true of knowledge representation and reasoning, particularly where the coupling is tight. If, for example, a natural language processor needs to use a large lexicon, then an organization such as that used by WordNet (essentially frames) is often considered best. If, for example, metabolic pathways are to be represented, then a semantic net style such as that used by KEGG is appropriate (although this can also be represented very well by frames).
The applied ontology approach picks the representation and reasoning method best matched to the problem. OK, so the next fallacy seems clear - but is there hope?
Fallacy 3: A "universal" ontology could encode any and all knowledge.
Knowledge representations (which are declarative in nature) fall very roughly into four classes: frames, semantic nets, production rules, and various forms of clausal logic. Each representation discipline has spawned its own perspective on what kinds of tightly coupled reasoning work best. Frame people have moved in the direction of various description logics (an approach I favor). Semantic net people generally favor very ad hoc graph traversal. Production rule people (few to find these days!) advocate RETE or backward chaining. Logicians have of course moved the furthest and shine light on the future for us all, particularly in the area of fusions of modal logics.
Fusions of modal logics are very interesting because they point the way toward greater universality of representation and reasoning. Simply put, the idea is to partition the representation of knowledge in such a way that, for a very broad class of reasoning problems, we can break each problem into parts that individually use a decidable (and hopefully tractable) subset of logic. Then, hopefully, we can find a recombination of the solved parts to get a full solution that is also decidable (tractable). This divide-and-conquer approach to reasoning has much appeal: with a bit of cleverness in how we think about knowledge representation, we can solve a much larger class of reasoning problems. It also seems psychologically real in that the approach seems to parallel our "common sense." Much of the most interesting work now being done in frame-based reasoning is the re-casting of these results from modal logic.
In the larger sense, then, the important thing is that there are ways to create the appearance of a more "universal" representation by cleverly partitioning what is represented and how reasoning is applied across everything that needs to be maintained. The universality is not achieved by dumping all knowledge into the same representation. The universality (such as it may be) is achieved by carefully stitching together the knowledge while minding the trade-off between expressive power and computational tractability.