Formulation and Validation of the

Axiomatic
Semantics of OWL

**Abstract:
**

The constructs of the Web Ontology
Language (OWL) have
been described in a first-order axiomatic theory. A
theorem prover was used to ensure that intended consequences follow
from the
axioms. The theorem prover searched
for, and failed to find, inconsistencies in the axioms.
Several new consequences of the axioms were
discovered by the theorem prover, redundant axioms were eliminated, and
other
axioms were simplified. The axiomatic
theory of OWL, with a first-order theorem prover, may be used as a
reasoner for
the Full version of OWL.

The OWL language has been developed to annotate Web pages with semantic tags so that their contents will be more easily understood by machine. It includes facilities for presenting ontologies that define the meaning of the semantic tags. The language is based on a description logic, which admits procedures for deciding logical implication and checking consistency. There are, however, three layers of OWL: OWL-lite, OWL-DL, and OWL-Full; in this effort we are concerned with OWL-Full, which is the more expressive than a description logic but is not decidable.

OWL has been adopted by the DARPA DAML program as its recommended Semantic Web markup language. It is the latest of a series of languages, in which its predecessor was called DAML+OIL. These languages were defined by an informal reference document and two formal presentations: a model-theoretic semantics and an axiomatic semantics.

Validation of the earlier DAML+ OIL axioms is described in another document. By application of the validation methodology described in this paper, many flaws were discovered in those axioms. These included missing axioms, unnecessarily complex axioms, redundant axioms, and logical inconsistencies. We even found inconsistencies in the axioms for the logical language KIF, on which the DAML+OIL axioms were based; these axioms had been available for scrutiny on the Web since 1998. None of these were indications of conceptual problems; rather, they were easily fixed and only occurred because the axioms had not been subjected to a validation process.

It is now generally recognized that, no matter how carefully a program has been written, it is likely to contain bugs unless it has been tested, validated, or verified in some way. On the other hand, people are likely to accept without question the correctness of a set of logical axioms; expressing a sentence in logic gives it an aura of correctness. Yet the same kinds of errors that occur in programs will also occur in axioms; even cursory examination with an automated formal tool will usually uncover problems.

For the OWL axioms presented here, therefore, it was decided to perform the formulation and the validation of the axiomatic semantics hand in hand. With each group of new axioms, some intended consequences were posed as conjectures. The theory was thus presented as a sequence of axioms and conjectures; each conjecture was expected to follow from the previous axioms and conjectures, and we used a theorem prover to attempt to prove them. Failure to prove a conjecture suggested that either the axioms were too weak or that we were wrong to expect the conjecture to follow. If the conjecture was proved, however, it was regarded as a theorem and could be used to prove subsequent conjectures.

Also we periodically ran consistency checks. In this mode, we would reason forward from the axioms and theorems to see if a contradiction (the sentence false) could be derived. If so, we would examine the proof to see which axioms had been used in proving the contradictory sentence; at least one of those axioms had to be false.

Failure to find a contradiction does not necessarily establish consistency, even if we let the theorem prover run all night. To prove consistency it would suffice to construct a model for the axioms, a concrete interpretation for the symbols of the theory under which all the axioms are true, but this would be more time consuming. Furthermore, even if we had proved consistency, it would not necessarily follow that the axioms were true of the intended model of OWL; they could be consistent but incorrect.

There are advantages to running consistency checks other than giving marginally better confidence in our axioms. Every so often the theorem prover would derive a result that caused many other results to be simplified or subsumed altogether. In this case, it was advantageous to add the new result to the theory as a theorem. Also, axioms could be eliminated if they were subsumed by earlier axioms and theorems. Sometimes the theorem prover would point out axioms and theorems that could be simplified.

In addition to proving conjectures that
we invented
ourselves, we also proved conjectures that were posed as test cases in
the OWL
documentation. Time did not permit
proving all the test cases. Again,
although proving all the test cases would give us more confidence in
the
axioms, it would not prove their correctness, since nothing guarantees
the
correctness or adequacy of the test cases.

The formulation of the OWL theory was conducted in Specware, an automated environment created by the Kestrel Institute. In support of its primary application of formal methods for developing software, Kestrel has facilities for constructing, manipulating, and verifying axiomatic theories. It enables us to develop a theory in a structured and modular way, by combining and altering simpler theories. It also has an interface that connects it to a selection of automated theorem provers. For this work, we have used the interface with the theorem prover SNARK.

SNARK (Stickel) is a first-order-logic theorem prover with equality, associative-commutative unification, and sorts. It contains a variety of standard machine inference rules, including resolution, paramodulation, and rewriting. It has a procedural-attachment mechanism, which allows certain computations, such as arithmetic operations, to be performed procedurally rather than axiomatically; it would be cumbersome to use the axioms of arithmetic to compute 22+22, when the computer already has built-in procedures for that purpose. SNARK also contains strategic controls that enable us to tailor it to exhibit high performance in a specific subject domain; for instance, we can declare an ordering on symbols to indicate which symbols are preferred and which are to be eliminated.

An interface developed by Kestrel translates axioms and conjectures from the language of Specware into the language of SNARK, so that SNARK can prove that the conjecture is implied by earlier axioms and conjectures. While Specware is higher-order and SNARK is first-order, the interface can translate some higher-order constructs into first-order logic. Also in formulating the axioms we have paraphrased some higher-order features in first-order logic.

While other environments could be used for the described work, the Specware/SNARK combination had many advantages, in addition to the features we have mentioned. Specware enables us to check all the conjectures in a theory with a single command, and reports on the time required to prove each conjecture. This makes it easy to compare alternative axiomatizations. Specware also keeps track of which strategic settings were used to prove each conjecture. The notation of Specware is close to conventional mathematical notation, so the axioms can be read and understood. Because of the modular construction of theories, we can verify a conjecture in the subtheory to which it applies, which limits the number of axioms that need be considered in the search for a proof; the structure of our OWL theory reflects the structure of the OWL documentation. Thus, instead of forcing us to think in terms of individual theorems, the environment allows us to maintain, experiment with and alter a complex theory structure.

We will not attempt to present the
entire language here, and
we will not adhere to OWL notation; the axioms also are not in OWL
notation. We present just enough here
to understand the discussion in this report; the axiom file contains
quotations
from the reference manual and other sources. The language contains *individuals*;
*classes*, which correspond to unary relations; and *properties*,
which are binary relations. There are
no general n-ary relations. For
example, if John
and
Sally are individuals and hasParent is a property, we
may
write hasParent(John,
Sally) to mean that Sally is a parent of John; we will also say
that Sally
is related to John
by the hasParent
property.

In addition, there are constructs for
describing, combining,
and manipulating classes and properties.
These include the usual set-theoretic functions and relations,
such as
union, intersection, and subclass; operators such as inverse and
composition;
and the *cardinality restrictions*, which allow reference to
quantities. There are no variables or
logical quantifiers. The language has
some higher-order features, in that everything is an individual, so
that
classes and properties may be members of classes and arguments of
properties.

We describe some of the cardinality
restrictions, whose
intended consequences were among the most complicated to prove. Suppose p
is a property (binary relation) and n
is a nonnegative integer; then the *cardinality restriction on **p
of value **n*,
written Cardinality(p,n), is the class of all
individuals that
are related by p to
exactly n individuals. For example, Cardinality(hasParent,2)
is the class of all individuals x
that have exactly two parents y. Thus, if Person
is the class of all people and SubClass
is the ordinary subclass relation between classes,

SubClass(Person,
Cardinality(hasParent,2))

means that all people have precisely two parents; such a statement might be included in an ontology that describes people.

Similarly, maxCardinality(p,n)is the class of all individuals that are related by property p to at most n individuals, and minCardinality(p,n) is the class of all individuals that are related by p to at least n individuals. Thus we might say

SubClass(Person,
maxCardinality(hasSpouse,1))

to indicate that all people have at most one spouse (in the described society, at least), and

SubClass(Person,
minCardinality(residesAt,1))

to indicate that all people reside in at least one place.

To describe the cardinality restrictions and other constructs, we introduced fragments of set theory and number theory. OWL classes differ from conventional sets in that it is possible for two distinct classes to have precisely the same elements. For instance, the class of students in the course Advanced Sanskrit 404 might be exactly the same as the class consisting of George Smith, but in OWL these classes are regarded as distinct because their definitions are not equivalent, and they themselves may have distinct properties. If two classes have the same elements, they do satisfy the property equivalentClass.

In the theory of classes and numbers we can introduce the ordinary cardinality function card(C), the number of elements in a class C. This is defined by axioms

card(C) = 0 <=>

equivalentClass(C,Nothing)

card(addOne(x,C)) =

if Type(x,C)

then card(C)

else 1+card(C),

for all individuals x and classes C. Here Nothing is the empty class; addOne(x,C) adds the element x to the class C, if it is not there already; and Type is the OWL version of the member relationÂ—Type(y,C) means that y is an element of the class C. (Actually Specware and SNARK do not have the conditional term construct, so the second axiom above is broken into two.)

If p is a property and x an individual, we define image(x,p), the image of x under p, to be the class of individuals that are related to x by p; thus

image(John, hasParent)

is the class of people related to John by the hasParent relation, i.e., the parents of John. This is defined by the axiom

Type(y, image(x, p)) <=>
p(x,y).

In other words,

image(x,p)
= {y | p(x,y)}.

Note that here p
occurs as both a predicate symbol and an argument, which takes us
outside the
first-order-logic syntax that SNARK accepts; in writing the axioms, we
actually
write Holds(p,x,y)
instead of p(x,y).

We can then define the cardinality restriction by the axiom

Type(x, Cardinality(p,n)) <=> card(image(x,p)) = n,

for all individuals x, properties p, and nonnegative integers n. In other words, an individual x belongs to the cardinality restriction on p of value n if the image of x under p has cardinality n.

Similarly, we define the other cardinality restrictions by axioms

Type(x, maxCardinality(p, n)) <=> card(image(x,p)) ≤ n

and

Type(x, minCardinality(p, n))
<=>
card(image(x,p)) ≥ n,

for all individuals x, properties p, and nonnegative integers n.

Once we have formulated the axioms for a construct, we articulate intended consequences, i.e., statements we expect those axioms to imply. For example, we expect that if an individual belongs to the cardinality restriction on p of value 2, it is related by property p to two distinct elements:

Type(x,
Cardinality(p,2)) =>

ex(y1 :
Individual, y2 : Individual)

p(x,y1) & p(x,y2) & y1≠y2,

for all individuals x and properties p. In general, ex(x:C)F, a sorted existentially quantified sentence, means that for some entity x in class C, the condition F is true. Note that one cannot say this directly in OWL itself, because there are no existential quantifiers in OWL.

Furthermore, one would expect that it is impossible to find three distinct individuals that are related to x by p:

Type(x,
Cardinality(p,2)) =>

~(ex(y1 :
Individual, y2 : Individual, y3 : Individual)

p(x,y1) &
p(x,y2) & p(x,y3) & allDifferent(<y1,y2,y3>).

Here allDifferent(t) means that all the elements of the tuple t are distinct.

The proofs of these depend on the corresponding properties of classes and tuples, such as that a class of cardinality 2 contains 2 distinct elements, i.e.,

card(C) = 2 =>

ex(y1 :
Individual, y2 : Individual)

Type(y1,C) &
Type(y2,C) & y1≠y2,

and that a class of cardinality 2 cannot contain 3 distinct elements, i.e.,

card(C)
= 2 =>

~(ex(y1 :
Individual, y2 : Individual, y3 : Individual)

Type(y1,C)&
Type(y2,C) & Type(y3,C) &

allDifferent(<y1,y2,y3>).

These are proved by SNARK from the axioms for card and other constructs.

We expect that SNARK would fail to prove the corresponding propositions for classes of cardinality much higher than 2Â—these are versions of the pigeonhole principle, which is known to be challenging to resolution theorem provers. We have, however, established generalizations of these properties for classes of any cardinality n.

While we proved a number of test cases from the OWL documentation, the cardinality conjectures were more challenging. The test cases are written in OWL itself, and hence cannot contain explicit quantifiers, which add to the difficulty of the reasoning.

Sometimes we would formulate conjectures that the theorem prover could not prove, and which on reflection turned out not to be true. For example, for any property p and class C, OWL defines

someValuesFrom(p,C)

to be the class of all individuals x which are related by p to some element y of C; thus

someValuesFrom(hasParent,
Monkey)

is the class of individuals having a monkey for a parent.

OWL also defined the class Thing to be the class that contains all individuals. We conjectured that, for any property p, every individual belongs to

someValuesFrom(p,Thing),

i.e.,

equivalentClass(Thing,
someValuesFrom(p,Thing)).

SNARK failed to prove this, and in fact can prove its negation. If we take p to be false_property, which is false for any individuals, it turns out that no individual belongs to someValuesFrom(p,Thing). Therefore we discarded this conjecture.

We remarked that we searched for inconsistencies in the axioms by reasoning forward and seeking a contradiction. This uncovered problems in earlier versions of the axioms, but with the current set we have not found inconsistencies even after letting the theorem prover run several hours.

We have found a number of useful theorems about OWL by examining the proof trace for results that caused many other results to be subsumed or simplified. For instance, SNARK proved the theorem

Type(x,
minCardinality(p,0)),

for any individual x and property p. Intuitively this means that any individual is related by the relation p to 0 or more individuals. Apparently this formula appeared as a condition in many of the results obtained by SNARK, which were then simplified when this theorem was proved. When the theorem was added to the theory, the proof times for subsequent conjectures were reduced.

Another useful result observed by SNARK is that

~(allDifferent(<x, x, …>)),

for all individuals x. In other words, any tuple beginning with two copies of the same element cannot consist of distinct elements. (Actually this result was paraphrased in terms of the tuple constructs, since the logic does not allow the ellipsis notation.)

Another advantage of reasoning forward from the axioms was that SNARK discovered simplifications of axioms and conjectures. For instance, we had originally formulated the axiom

~(C = Nothing)
=>

~(Type(choice(C),
remove(choice(C),C))),

for all classes C. Here choice(C) is an arbitrary element of C and remove(x,C) is the result of removing x from C; thus, the axiom said that, for a nonempty class C, the selected element choice(C) does not belong the result of removing that element from the class. SNARK observed that, since no individual belongs to the empty class and the result of removing an element from the empty class is still the empty class, the hypothesis ~(C = Nothing) is in fact unnecessary. The axiom was simplified to

~(Type(choice(C),
remove(choice(C),C))).

In all we formulated about 75 axioms for OWL and 65 conjectures; when these were translated into conjunctive normal form, this resulted in about 175 clauses. Conjectures were given a time limit of 10 seconds each. All the conjectures were proved, and the proofs typically required less than a second each; the most difficult required about 6 seconds. About six different strategic settings were required, and it was determined empirically which setting was appropriate for a given conjecture. The settings varied in such dimensions as whether the set of support was used, whether numerical computations were performed axiomatically or procedurally, and whether binary resolution or hyperresolution was used. Although we had hoped to find a single setting that would be appropriate for proving new conjectures and checking OWL ontologies for consistency, we can still imagine a single strategy that tries each of the six settings in turn.

There are several efforts in progress similar to ours. Surnia (Hawke) applies the first-order theorem prover Otter (McCune) to axioms for OWL-Full. It has the advantage that the axioms are phrased in a function-free logic (relations only), so that Otter is a decision-procedureÂ—it is complete and always terminates. This means that failure to find a proof implies that the conjecture is not a logical consequence of the axioms. I believe that the axioms do not describe the arithmetic and set-theoretic constructs we use in proving the cardinality conjectures.

The theorem prover Vampire (Voronkov and Riazanov) has been applied to axioms for OWL-DL and has succeeded in establishing most of the OWL-DL test cases (Tsarkov, Riazanov, Bechhofer, and Horrocks). The system has also been tested on larger ontologies. The authors have developed a servelet for translating from OWL syntax into conventional logical notation; this servelet would also allow the Specware/SNARK framework to prove test cases expressed in OWL notation, but we have not yet attempted this.

In a similar effort, the theorem prover Gandalf is being applied by Tammet to axioms for a subset of OWL.

Until we have applied the Specware/SNARK system to more test cases, it is difficult to compare our work with theirs. None of these efforts seem intended to provide an alternative definition of OWLÂ—in fact, the axioms themselves are not presented for examination. Rather they seem aimed at establishing the inconsistency of ontologies and at proving that a conclusion is entailed by an ontology.

This work adds to the evidence that general-purpose theorem proving, in addition to being an object of research effort, can be a useful tool to assist research in other topics.

This work has been carried out under the
guidance and
encouragement of Richard Fikes, Pat Hayes, and Deborah McGuinness. It employs the Specware software development
environment of the Kestrel Institute and the SNARK theorem prover of
SRI
International. David Cyrluk created the
link between Specware and SNARK, and Mark Stickel developed SNARK
itself. Thanks also to David Martin, Jerry
Hobbs,
Grit Denker, Paul Kogut, Mitch Kokar, Ken Baclawski, and Sandro Hawke
for
support and suggestions. This work has
been supported by the DARPA DAML project, through contracts to the
Kestrel
Institute (via Lockheed Martin) and SRI International.