Hypothesis Schema Design

Version 2002/07

 Ian Harrison, SRI International

 

 

1.   Introduction

 

This document describes the current status of the hypothesis markup language (HypothesisML), which we propose to use to describe matches (i.e. the output of a pattern match). HypothesisML includes the PatternML schema and can be used within ControlML, either as the subject of a request or as part of a response. These other schemas are described in separate documents.

 

2.   Design of Hypothesis Schema

 

The design goal for HypothesisML was to represent pattern match results in a form that can be interchanged and understood by different components in the EELD program. HypothesisML adds a layer of belief to the pattern match, with the actual matches represented using PatternML elements. This allows a single hypothesis to carry multiple beliefs, each calculated using a different method, and allows multiple different hypotheses to share the same matched pattern.

 

 

The hypothesis schema was developed using Tibco’s XML Turbo tool. Several iterations between members of the SRI EELD team resulted in this current version.

 

3.   Hypothesis Schema

 

hypothesis

The hypothesisBag element (see Figure 1) is the root element for describing an unordered set of hypotheses. hypothesisBag can contain 0 or more hypothesis elements.
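
As a minimal sketch of the root element: because hypothesisBag allows 0 hypothesis children, an empty bag should be a valid document. The schema location is the one used in Appendix B.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- An empty bag: hypothesisBag may contain 0 or more hypothesis elements. -->
<hypothesisBag xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="http://www.ai.sri.com/~law/schemas/2002/07/hypothesis"/>
```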

 

 

 

 

 

Figure 1: hypothesis element

 

 

A hypothesis element can have 4 attributes. The id attribute (required) is a unique id within the document; eventually it should be a UUID. The uri attribute is a network-accessible reference to this hypothesis. The label attribute is a human-readable string given to the hypothesis, e.g. by a user. The class attribute will be used to record the class that this hypothesis is an instance of (if there is a hypothesis class hierarchy). For now the class attribute is just a string.

              

A hypothesis element must contain exactly 1 pattern element as a direct child, followed by 0 or more belief elements. Further pattern elements can still appear nested inside it, as child elements of the body element, which is a valid child element of pattern.
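
The attributes and content model described above might be combined as in the following sketch. The uri and class values are hypothetical placeholders, not values defined by the schema; the pattern content is abbreviated from Appendix B.

```xml
<!-- uri and class values below are illustrative assumptions -->
<hypothesis id="contract_murder_match1"
            uri="http://example.org/hypotheses/contract_murder_match1"
            label="contract murder match 1"
            class="ContractMurderHypothesis">
  <!-- Exactly one pattern element comes first -->
  <pattern id="contract_murder1">
    <body>
      <node id="contract_murder.x" label="murder">
        <value>Murder55</value>
      </node>
    </body>
  </pattern>
  <!-- 0 or more belief elements follow the pattern -->
  <belief>
    <method>graph-edit-distance</method>
    <result>0.98</result>
  </belief>
</hypothesis>
```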

 

The pattern element defines the matched pattern. For efficiency reasons, we do not expect an EELD matcher to return the whole pattern passed to it for matching, with match data interspersed within the pattern (as value elements; see the PatternML document). Instead, pattern matchers will likely pass back only those parts of the pattern that matched, using a pattern element with node/edge elements as content. Alternatively, the pattern element has a uri attribute, which allows a URI reference to the complete pattern to be given.
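
A sketch of a partial match that also points back to the complete pattern is shown below. The URI is a hypothetical placeholder. Whether the uri attribute may be combined with node/edge content, or must be used on its own, is governed by the PatternML schema, which is not reproduced in this document, so treat the combination as an assumption.

```xml
<hypothesis id="contract_murder_match1">
  <!-- uri (hypothetical value) refers to the complete pattern that was matched -->
  <pattern id="contract_murder1" uri="http://example.org/patterns/contract_murder1">
    <body>
      <!-- Only the parts of the pattern that matched are returned -->
      <node id="contract_murder.x" label="murder">
        <value>Murder55</value>
      </node>
    </body>
  </pattern>
</hypothesis>
```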

 

The belief element is designed to hold information about the current belief in the hypothesis. A belief element must contain exactly 1 method/result pair, i.e. one method element and one result element.

 

A method element is a string description of the method used to calculate the belief.

 

The result element is a string description of the actual belief value, using the associated belief calculation method.
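
For illustration, the fragment below shows two belief elements as they might appear as siblings inside one hypothesis, each recording a belief calculated by a different method. The method names are taken from Appendix B; attaching both to the same hypothesis is our own example. Since the schema declares the pair with xsd:all, method and result may appear in either order.

```xml
<!-- Fragment: children of a hypothesis element, after its pattern -->
<belief>
  <method>graph-edit-distance</method>
  <result>0.98</result>
</belief>
<belief>
  <!-- xsd:all permits result before method -->
  <result>0.97</result>
  <method>analyst rating</method>
</belief>
```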

 

The exhibit element describes a piece of evidence used to support the hypothesis. It is most likely used as a child element of the value element of a node. The exhibit element has 3 attributes: id, uri and timestamp. The id attribute (required) is a unique id for the piece of evidence (eventually it should be a UUID). The uri attribute is a reference to the piece of evidence if it is network accessible. The timestamp attribute is the date/time at which the piece of evidence was created (not the time of any event referred to in the evidence).

 

The exhibit element can contain 3 child elements in the sequence: assertion, source, belief. There must be 1 and only 1 assertion element. There must be 1 and only 1 source element. There can be 0 or 1 belief elements.
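
A sketch of an exhibit that uses all three attributes and omits the optional belief element follows. The uri value is a hypothetical placeholder; the other values are taken from Appendix B.

```xml
<!-- uri value is an illustrative assumption -->
<exhibit id="exhibit123"
         uri="http://example.org/evidence/exhibit123"
         timestamp="2001-08-03T13:29:18">
  <!-- Children appear in this order: assertion, source, then the optional belief -->
  <assertion>
    <relationName>eventOccursAt</relationName>
    <relationArgument>Murder55</relationArgument>
    <relationArgument>'Moscow'</relationArgument>
  </assertion>
  <source id="pravda-newspaper123" reliability="0.9"/>
  <!-- belief omitted here: it may occur 0 or 1 times -->
</exhibit>
```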

 

The assertion element describes a fact, and can contain 3 types of child elements in the sequence: negation, relationName, relationArgument. The negation element says whether it is a (not (relation arg1 arg2 ... argn)) style assertion; there can be 0 or 1 negation elements. The relationName element is the name of the relation; there must be 1 and only 1 relationName element. There can be 1 or more relationArgument elements, which represent the relation arguments in order.
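
As a sketch, a negated assertion corresponding to (not (eventOccursAt Murder55 'StPetersburg')) might be written as follows. The argument value 'StPetersburg' is invented for illustration.

```xml
<assertion>
  <!-- negation is an xsd:boolean and, when present, comes first -->
  <negation>true</negation>
  <relationName>eventOccursAt</relationName>
  <!-- Arguments are listed in order: arg1, then arg2 -->
  <relationArgument>Murder55</relationArgument>
  <relationArgument>'StPetersburg'</relationArgument>
</assertion>
```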

 

The source element describes the source of a piece of evidence. It is most likely used as the source for an exhibit element. The source element has 3 attributes: id, uri and reliability. The id attribute (required) is a unique id for the source of evidence (which can be a person or a thing); eventually it should be a UUID. The uri attribute is a reference to the source if it is network accessible (e.g. an online news source). The reliability attribute is a measure of how reliable the source is for evidence: a number between 0 and 1, although the schema currently types it as a string.

 

The listOfIntegers element is provided to record a list of integers separated by whitespace. It is most likely to be used as a child element of the value element (e.g. <value><listOfIntegers>1 3 5 7</listOfIntegers></value>).

 

The listOfStrings element is provided to record a list of delimited strings separated by whitespace. It is most likely to be used as a child element of the value element (e.g. <value><listOfStrings>'StringA' 'StringB' 'StringC'</listOfStrings></value>).
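
One consequence of declaring listOfStrings as an xsd:list is worth illustrating: a schema processor splits list items on whitespace only, and the quote delimiters have no meaning to it. The sketch below uses invented values to show the difference.

```xml
<!-- Read by a schema processor as 3 items: 'StringA', 'StringB', 'StringC' -->
<value><listOfStrings>'StringA' 'StringB' 'StringC'</listOfStrings></value>

<!-- Also read as 3 items, not 2: the space inside 'New York' splits the string -->
<value><listOfStrings>'New York' 'Moscow'</listOfStrings></value>
```

Strings that contain whitespace would therefore need to be reassembled by the consuming application using the delimiters, or encoded without spaces.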

 

 

Appendix A: HypothesisML Schema

 

<?xml version="1.0" encoding="UTF-8"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

       <xsd:include schemaLocation="http://www.ai.sri.com/~law/schemas/2002/07/pattern"/>

       <xsd:element name="hypothesis">

              <xsd:annotation>

                     <xsd:documentation>A hypothesis element can have 4 attributes. id is a unique id within the document, but is not a uuid for now. If we use this it'll be for convenience to store an object id from a particular application, but this will have no meaning to other applications. The uri is a pointer to the hypothesis, if it is network accessible. The label attribute is just a pretty string given to the hypothesis, e.g. by a user. The class attribute will be used to record the class that this hypothesis is an instance of (if there is a hypothesis class hierarchy). For now the class attribute is just a string.

          </xsd:documentation>

                     <xsd:documentation>A hypothesis must contain 1 (and only 1) pattern element, and can also contain 0 or more belief elements.

          </xsd:documentation>

              </xsd:annotation>

              <xsd:complexType>

                     <xsd:sequence>

                           <xsd:element ref="pattern"/>

                           <xsd:element ref="belief" minOccurs="0" maxOccurs="unbounded"/>

                     </xsd:sequence>

                     <xsd:attribute name="uri" type="xsd:anyURI"/>

                     <xsd:attribute name="id" type="xsd:string" use="required"/>

                     <xsd:attribute name="label" type="xsd:string"/>

                     <xsd:attribute name="class" type="xsd:string"/>

              </xsd:complexType>

       </xsd:element>

       <xsd:element name="method" type="xsd:string">

              <xsd:annotation>

                     <xsd:documentation>The method element is a string description of the method used to calculate the belief.

          </xsd:documentation>

              </xsd:annotation>

       </xsd:element>

       <xsd:element name="result" type="xsd:string">

              <xsd:annotation>

                     <xsd:documentation>The result element is a string description of the actual belief value, using the associated belief calculation method.

          </xsd:documentation>

              </xsd:annotation>

       </xsd:element>

       <xsd:element name="belief">

              <xsd:annotation>

                     <xsd:documentation>The belief element is designed to hold information about belief in an object. A belief element can contain 1 and only 1 method/result pair.

          </xsd:documentation>

              </xsd:annotation>

              <xsd:complexType>

                     <xsd:all>

                           <xsd:element ref="method"/>

                           <xsd:element ref="result"/>

                     </xsd:all>

              </xsd:complexType>

       </xsd:element>

       <xsd:element name="exhibit">

              <xsd:annotation>

                     <xsd:documentation>The exhibit element describes a piece of evidence used to support the hypothesis. It is most likely used as a child element of the value element of a node.

          </xsd:documentation>

              </xsd:annotation>

              <xsd:complexType>

                     <xsd:sequence>

                           <xsd:element ref="assertion" minOccurs="1" maxOccurs="1"/>

                           <xsd:element ref="source" minOccurs="1" maxOccurs="1"/>

                           <xsd:element ref="belief" minOccurs="0" maxOccurs="1"/>

                     </xsd:sequence>

                     <xsd:attribute name="id" type="xsd:string" use="required"/>

                     <xsd:attribute name="uri" type="xsd:anyURI"/>

                     <xsd:attribute name="timestamp" type="xsd:dateTime"/>

              </xsd:complexType>

       </xsd:element>

       <xsd:element name="assertion">

              <xsd:annotation>

                     <xsd:documentation>The assertion element describes a fact where the negation element says whether it is a (not (relation arg1 arg2 ...argn)) style assertion. The second child element is the name of the relation and the rest of the child elements are the relation arguments in order.

          </xsd:documentation>

              </xsd:annotation>

              <xsd:complexType>

                     <xsd:sequence>

                           <xsd:element ref="negation" minOccurs="0" maxOccurs="1"/>

                           <xsd:element ref="relationName"/>

                           <xsd:element ref="relationArgument" minOccurs="1" maxOccurs="unbounded"/>

                     </xsd:sequence>

              </xsd:complexType>

       </xsd:element>

       <xsd:element name="negation" type="xsd:boolean">

              <xsd:annotation>

                     <xsd:documentation>Whether the assertion is negated.

          </xsd:documentation>

              </xsd:annotation>

       </xsd:element>

       <xsd:element name="relationName" type="xsd:string">

              <xsd:annotation>

                     <xsd:documentation>The name of the relation.

          </xsd:documentation>

              </xsd:annotation>

       </xsd:element>

       <xsd:element name="relationArgument" type="xsd:string">

              <xsd:annotation>

                     <xsd:documentation>The relation argument.

          </xsd:documentation>

              </xsd:annotation>

       </xsd:element>

       <xsd:element name="source">

              <xsd:annotation>

                     <xsd:documentation>The source element describes the source of a piece of evidence.

          </xsd:documentation>

              </xsd:annotation>

              <xsd:complexType>

                     <xsd:attribute name="id" type="xsd:string" use="required"/>

                     <xsd:attribute name="uri" type="xsd:anyURI"/>

                     <xsd:attribute name="reliability" type="xsd:string"/>

              </xsd:complexType>

       </xsd:element>

       <xsd:element name="hypothesisBag">

              <xsd:annotation>

                     <xsd:documentation>hypothesisBag is the root of the schema and can contain 0 or more hypothesis elements, i.e. it's a bag for holding all hypotheses.

          </xsd:documentation>

              </xsd:annotation>

              <xsd:complexType>

                     <xsd:sequence>

                           <xsd:element ref="hypothesis" minOccurs="0" maxOccurs="unbounded"/>

                     </xsd:sequence>

              </xsd:complexType>

       </xsd:element>      

       <xsd:element name = "listOfIntegers" type="listOfIntegersType">

            <xsd:annotation>

              <xsd:documentation>list of integers separated by whitespace. </xsd:documentation>

            </xsd:annotation>

        </xsd:element>

       <xsd:simpleType name="listOfIntegersType">

                 <xsd:list itemType = "xsd:integer"/>

       </xsd:simpleType>

       <xsd:element name = "listOfStrings" type="listOfStringsType">

            <xsd:annotation>

              <xsd:documentation>list of delimited strings, separated by whitespace. </xsd:documentation>

            </xsd:annotation>

        </xsd:element>

       <xsd:simpleType name="listOfStringsType">

              <xsd:list itemType = "xsd:string"/>

        </xsd:simpleType>

</xsd:schema>

 

Appendix B: Example HypothesisML File

 

Example HypothesisML files.

 

1) The pattern is a query for all contract killings in Moscow, run against a database of evidence (it uses example 1 from the PatternML document). Three matches are returned.

 

<?xml version="1.0" encoding="UTF-8" ?>

<hypothesisBag xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.ai.sri.com/~law/schemas/2002/07/hypothesis">

 <hypothesis label="contract murder match 1" id="contract_murder_match1">

   <pattern id="contract_murder1">

    <body>

      <node id="contract_murder.x" label="murder">

        <value>Murder55</value>

      </node>

      <node id="eventOccursAt.contract_murder.x" label="eventOccursAt">

         <value>

          <exhibit id="exhibit123" timestamp="2001-08-03T13:29:18">

           <assertion>

             <relationName>eventOccursAt</relationName>

            <relationArgument>Murder55</relationArgument>

            <relationArgument>'Moscow'</relationArgument>

           </assertion>

           <source id="pravda-newspaper123" uri="www.pravda.ru" reliability="0.9"/>

           <belief>

            <method>analyst rating</method>

            <result>0.97</result>

           </belief>

         </exhibit>

        </value>

      </node> 

      <node id="location.contract_murder.x" label="location">

        <value>'Moscow'</value>

      </node> 

      <edge from="contract_murder.x" to="eventOccursAt.contract_murder.x" id="contract_murder.x.eventOccursAt" label="location"/>

      <edge from="eventOccursAt.contract_murder.x" to="location.contract_murder.x" id="eventOccursAt.contract_murder.x.location" label="location"/>

    </body>

   </pattern>

   <belief>

    <method>graph-edit-distance</method>

    <result>0.98</result>

   </belief>

  </hypothesis>

 <hypothesis label="contract murder match 2" id="contract_murder_match2">

   <pattern id="contract_murder1">

    <body>

      <node id="contract_murder.x" label="murder">

        <value>Murder67</value>

      </node>

      <node id="eventOccursAt.contract_murder.x" label="eventOccursAt">

         <value>

          <exhibit id="exhibit987" timestamp="2001-09-17T15:31:19">

           <assertion>

             <relationName>eventOccursAt</relationName>

            <relationArgument>Murder67</relationArgument>

            <relationArgument>'Moscow'</relationArgument>

           </assertion>

           <source id="informer4567" reliability="0.6"/>

           <belief>

            <method>analyst rating</method>

            <result>0.78</result>

           </belief>

         </exhibit>

        </value>

      </node> 

      <node id="location.contract_murder.x" label="location">

        <value>'Moscow'</value>

      </node> 

      <edge from="contract_murder.x" to="eventOccursAt.contract_murder.x" id="contract_murder.x.eventOccursAt" label="location"/>

      <edge from="eventOccursAt.contract_murder.x" to="location.contract_murder.x" id="eventOccursAt.contract_murder.x.location" label="location"/>

    </body>

   </pattern>

   <belief>

    <method>graph-edit-distance</method>

    <result>0.87</result>

   </belief>

  </hypothesis>

 <hypothesis label="contract murder match 3" id="contract_murder_match3">

   <pattern id="contract_murder1">

    <body>

      <node id="contract_murder.x" label="murder">

        <value>Murder84</value>

      </node>

      <node id="eventOccursAt.contract_murder.x" label="eventOccursAt">

         <value>

          <exhibit id="exhibit2013" timestamp="2002-01-03T17:23:28">

           <assertion>

             <relationName>eventOccursAt</relationName>

            <relationArgument>Murder84</relationArgument>

            <relationArgument>'Moscow'</relationArgument>

           </assertion>

           <source id="police-report345" reliability="0.95"/>

           <belief>

            <method>analyst rating</method>

            <result>1.0</result>

           </belief>

         </exhibit>

        </value>

      </node> 

      <node id="location.contract_murder.x" label="location">

        <value>'Moscow'</value>

      </node> 

      <edge from="contract_murder.x" to="eventOccursAt.contract_murder.x" id="contract_murder.x.eventOccursAt" label="location"/>

      <edge from="eventOccursAt.contract_murder.x" to="location.contract_murder.x" id="eventOccursAt.contract_murder.x.location" label="location"/>

    </body>

   </pattern>

   <belief>

    <method>graph-edit-distance</method>

    <result>0.98</result>

   </belief>

  </hypothesis>

</hypothesisBag>