Inference in Queries

As an experiment, we consider what kinds of inference is required to answer one of the queries for searching for documents. We phrase this query in terms of the ontologies and theories we have developed for bibliographical references, names, addresses, topics, and dates. We then use SNARK to find answers to this query, based on theories we have developed for these areas.

Before we give the example, it may be useful to explain some basic principles of our ontology. We distinguish between strings, names, and entities. For example, "James Hendler" is a string, (personq "James Hendler") is a name, and james-hendler is a person.

While james-hendler refers to a unique individual (ultimately a URI), the name (personq "James Hendler") is shared by several people. While the name (personq "Washington") is uniquely a name of a person, the same string "Washington" can stand for a person, a city, or a state. These distinctions seem pedantic, but when we try to ignore them we get into trouble.

The relation person-val relates a name of a person with the person itself. Thus, if

  (person-val ?personq ?person)
holds, ?personq is a name for ?person. By convention, ?person, ?person0, ?person1, etc., are variables that range over people; on the other hand, ?personq, ?personq0, etc., range over names of people.

A paper is of sort paper; a reference to a paper is of sort paperq. Thus we think of a reference as a kind of name for a paper.

Other concepts are best explained in the context of the example. We consider the first query from the homework assignment:

Find a reference to the most recent paper on SHOE with James Hendler as a co-author.
This query may be phrased as follows:
(get 
  ?paperq
 such-that
  (and
   (pub-val ?paperq ?paper)
   (author ?paper ?person)
   (person-val (personq "James Hendler") ?person)
   (about ?paper (topic "SHOE"))
   (= (pub-to-year ?paper) (year-fn ?natural)))
 prefer
  starts-after-starting-of
 on
  (year-fn ?natural)
 time-limit
  10)

In other words, we want to find a reference ?paperq such that

Furthermore, if there are several papers found that satisfy above criteria, we prefer the later one. The relation starts-after-starting-of is a relation on time intervals such that

  (starts-after-starting-of ?time-interval1 ?time-interval2)
holds if ?time-interval1 starts after ?time-interval2; thus
  (starts-after-starting-of (year-fn 2000) (year-fn 1997))

We execute this by first finding a single paper reference that meets all our criteria. We then look for another one, with a later publication date. We do not ask for the latest of all of Hendler's publications, since we can never be sure if we have seen all of them; instead, we give the search a time limit of 10 seconds, and return the latest paper we have found in that time.

Let us follow a few steps on the inference process by which an answer was found.

We begin with a logical sentence obtained from the query:

(Row 193
 (or (not (pub-val ?paperq ?paper))
     (not (author ?paper ?person))
     (not (person-val (personq "James Hendler") ?person))
     (not (about ?paper (topic "SHOE")))
     (not (= (pub-to-year ?paper) (year-fn ?integer&nonnegative))))
 Answer (ans ?paperq (year-fn ?integer&nonnegative)))
Note that, because SNARK is a refutation procedure, queries are negated, and inference proceeds until a contradiction is obtained. Also note that the query, and its logical descendents, is accompanied by an Answer expression indicating what answer we expect to obtain from the proof. Because our preference is based on the year, we include the year of publication as part of our answer. The expression integer&nonnegative is SNARK's internal notation for the sort of natural numbers.

Formulas and their associated answers are called "rows."

We assume the existence of a name service that gives the URI of people with name "James Hendler"; here that service is simulated by assertions. Using that assertion on the third disjunct, we obtain the next row

(Row 194
     (or (not (pub-val ?paperq ?paper))
         (not (author ?paper james-hendler))
         (not (about ?paper (topic "SHOE")))
         (not (= (pub-to-year ?paper) (year-fn ?integer&nonnegative))))
     (resolve 193 name-of-james-hendler)
     Answer (ans ?paperq (year-fn ?integer&nonnegative)))
Here we have decided to look for publications of the individual with URI james-hendler.

We assume we have access to DamL-annotated publication lists; for the time being, we simulate the contents of the annotation in SNARK notation:

  (assert
   '(pub-val
     (inproceedingsq
      author (coq
              (personq "Sean Luke")
              (personq "Lee Spector")
              (personq "David Rager")
              (personq "James Hendler"))
      title (titleq "Ontology-based Web Agents")
      booktitle (titleq "Proceedings of the First International Conference 
       on Autonomous Agents (Agents97)")
      year (year-fn 1997)
      publisher (publisherq "Association for Computing Machinery")
      address (cityq "New York" "NY" "US")
      topic (topic "SHOE"))
     shoe-acm-paper)
   :name 'shoe-acm-paper-reference)
This is a description of a 1997 paper on SHOE, of which Hendler is a co-author. Here
  (coq ?personq1 ?personq2 ...)
is a publication list containing names ?personq1, ?personq2, ....

The reference indicates that the paper appears in a conference proceedings, gives the title of the paper, the title of the proceedings, the date of publication, the publisher, the address of the publisher, and the topic. The notation for references in this theory is based on that of Bibtex.

Names, indicated by the keyword :name, have no logical or functional significance; they are used only to make the trace easier for us to follow.

This assertion applies to the first disjunct of Row 194; we obtain

(Row 202
  (or (not (author shoe-acm-paper james-hendler))
      (not (about shoe-acm-paper (topic "SHOE")))
      (not (= (pub-to-year shoe-acm-paper)
              (year-fn ?integer&nonnegative))))
   (resolve 194 shoe-acm-paper-reference)
   Answer (ans
           (inproceedingsq author
            (coq (personq "Sean Luke")
                 (personq "Lee Spector")
                 (personq "David Rager")
                 (personq "James Hendler"))
            title (titleq "Ontology-based Web Agents")
            booktitle
            (titleq
             "Proceedings of the First International Conference on
              Autonomous Agents (Agents97)")
            year (year-fn 1997)
            publisher
            (publisherq "Association for Computing Machinery")
            address
            (cityq "New York" "NY" "US")
            topic (topic "SHOE"))
           (year-fn ?integer&nonnegative)))

Note that the answer already contains a reference satisfying the conditions of our query, but we have not yet checked that all these conditions are satisfied. We also have not yet found the date of this publication, although it is implicit in the reference.

Let us skip ahead to the end of this particular refutation:

(Row 321
     false
     (resolve 320 name-of-lee-spector)
     Answer
     (ans
      (inproceedingsq
       author
       (coq (personq "Sean Luke")
            (personq "Lee Spector")
            (personq "David Rager")
            (personq "James Hendler"))
       title (titleq "Ontology-based Web Agents") booktitle
       (titleq
        "Proceedings of the First International Conference on
         Autonomous Agents (Agents97)")
       year (year-fn 1997)
       publisher
       (publisherq "Association for Computing Machinery")
       address
       (cityq "New York" "NY" "US")
       topic (topic "SHOE"))
      (year-fn 1997)))
Here we have verified all the conditions and have determined that the date of the publication we have found is 1997.

Since we want the latest possible publication, we begin a new query, to find a publication later than 1997:

(Row 324
     (or (not (pub-val ?paperq ?paper))
         (not (author ?paper ?person))
         (not (person-val (personq "James Hendler") ?person))
         (not (about ?paper (topic "SHOE")))
         (not (= (pub-to-year ?paper) (year-fn ?integer&nonnegative)))
         (not (starts-after-starting-of
               (year-fn ?integer&nonnegative)
               (year-fn 1997))))
     Answer (ans ?paperq (year-fn ?integer&nonnegative)))

This query is the same as the one we started with, except it contains an additional disjunct

  (starts-after-starting-of
   (year-fn ?integer&nonnegative)
   (year-fn 1997))
In other words, we insist that the publication we find is more recent than 1997.

The refutation proceeds similarly to what we have seen already, except this time we find the following reference:

  (assert
   '(pub-val
     (inproceedingsq
      author (coq
              (personq "Jeff Heflin")
              (personq "James Hendler"))
      title (titleq "Searching the Web with SHOE")
      booktitle (titleq "Artificial Intelligence for Web Search.
                         Papers from the AAAI Workshop.")
      year (year-fn 2000)
      publisher (publisherq "AAAI Press")
      number (pubnumber "WS-00-01")
      address (cityq "Menlo Park" "CA" "US")
      topic (topic "SHOE"))
     shoe-aaai-paper)
   :name 'shoe-aaai-paper-reference)

SNARK uses a temporal reasoning procedure, based on the Allen temporal calculus, to check temporal constraints, whether they arise directly from the query or are a consequence of our preferences.

The AAAI paper satisfies all the constraints and is our next best selection:

(Row 421
  false
   (resolve 420 name-of-jeff-heflin)
   Answer
   (ans
    (inproceedingsq
     author
     (coq
      (personq "Jeff Heflin")
      (personq "James Hendler"))
     title
     (titleq "Searching the Web with SHOE")
     booktitle
     (titleq "Artificial Intelligence for Web Search.
              Papers from the AAAI Workshop.")
     year (year-fn 2000)
     publisher (publisherq "AAAI Press")
     number
     (pubnumber "WS-00-01")
     address (cityq "Menlo Park" "CA" "US")
     topic (topic "SHOE"))
    (year-fn 2000)))

SNARK will continue to look for Hendler SHOE papers more recent than 2000, but we have not given assertions for any. When the time limit is exceeded, it will return the reference for the IEEE paper.

It is also possible to obtain lists of papers, ordered by our preference, so that more recent are returned first.


Richard Waldinger
2000-12-15