Hypothesis Schema Design
Version 2002/07
Ian Harrison, SRI International
1. Introduction
This document describes the current status of the hypothesis
markup language (HypothesisML), which we propose to use to describe
matches (i.e. the output of a pattern match). HypothesisML includes the
PatternML schema and can be used within ControlML, either as the subject of a request
or as part of a response. These other schemas are described in separate
documents.
2. Design of Hypothesis Schema
The design goal for HypothesisML was to represent pattern match
results in a form that could be interchanged and understood by different
components in the EELD program. HypothesisML adds a layer of belief on top of
the pattern match, with the actual matches represented using PatternML
elements. This allows a single hypothesis to carry multiple beliefs calculated
using different methods, and allows multiple different hypotheses to share the
same matched pattern.
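To make these two design points concrete, here is a minimal sketch of a hypothesisBag in which one hypothesis carries two beliefs calculated by different methods, and a second hypothesis shares the same matched pattern. It assumes that PatternML lets a pattern element be given purely by reference through its uri attribute; the example.org URI, the class name, and the second and third belief values are hypothetical, chosen only for illustration.

```xml
<hypothesisBag>
  <!-- One hypothesis, two beliefs computed by different methods. -->
  <hypothesis id="h1" label="contract murder match 1">
    <!-- Hypothetical URI: the matched pattern is referenced, not inlined. -->
    <pattern id="contract_murder1" uri="http://example.org/patterns/contract_murder1"/>
    <belief>
      <method>graph-edit-distance</method>
      <result>0.98</result>
    </belief>
    <belief>
      <method>analyst rating</method>
      <!-- Hypothetical value -->
      <result>0.90</result>
    </belief>
  </hypothesis>
  <!-- A second hypothesis about the same matched pattern, distinguished
       here only by a hypothetical class name. -->
  <hypothesis id="h2" label="alternative reading of match 1" class="HypotheticalClassB">
    <pattern id="contract_murder1" uri="http://example.org/patterns/contract_murder1"/>
    <belief>
      <method>analyst rating</method>
      <!-- Hypothetical value -->
      <result>0.40</result>
    </belief>
  </hypothesis>
</hypothesisBag>
```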
The hypothesis schema was developed using Tibco’s Turbo XML
tool. This current version is the result of several iterations among members
of the SRI EELD team.
3. Hypothesis Schema
The hypothesisBag element (see Figure 1) is the root element for
describing an unordered set of hypotheses. A hypothesisBag can contain 0
or more hypothesis elements.

[Figure 1: diagram of the hypothesisBag element]
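Because hypothesis is declared with minOccurs="0", an empty bag is also valid; a matcher could, for instance, return one when it finds no matches. A minimal sketch, using the schema location from Appendix B:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A bag with no hypotheses validates: hypothesis has minOccurs="0". -->
<hypothesisBag
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://www.ai.sri.com/~law/schemas/2002/07/hypothesis"/>
```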
hypothesis element
A hypothesis element can
have 4 attributes. The id attribute (required) is a unique id within the
document; eventually it should be a UUID. The uri attribute is a
network-accessible reference to this hypothesis. The label attribute is just a
pretty string given to the hypothesis, e.g. by a user. The class attribute will
be used to record the class of which this hypothesis is an instance (if there
is a hypothesis class hierarchy). For now the class attribute is just a string.
A hypothesis element must
contain exactly 1 pattern element, followed by 0 or more belief elements.
(Further pattern elements may still appear nested within that pattern, since
pattern is a valid child of the body element, which is itself a valid child of
pattern.)
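As a minimal sketch, with the matched nodes and edges elided from the body, a hypothesis using all four attributes looks like this; the pattern comes first and any beliefs follow it, as the schema's xsd:sequence requires. The uri and class values are hypothetical.

```xml
<!-- All four attributes; only id is required. -->
<hypothesis id="contract_murder_match1"
            uri="http://example.org/hypotheses/contract_murder_match1"
            label="contract murder match 1"
            class="ContractMurderHypothesis">
  <!-- Exactly one pattern, and it must come first. -->
  <pattern id="contract_murder1">
    <body>
      <!-- matched node and edge elements go here (see the PatternML document) -->
    </body>
  </pattern>
  <!-- Zero or more beliefs follow the pattern. -->
  <belief>
    <method>graph-edit-distance</method>
    <result>0.98</result>
  </belief>
</hypothesis>
```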
The pattern element is used
to define the matched pattern. For efficiency reasons, we do not expect an
EELD matcher to return the whole pattern passed to it for matching, with match
data interspersed within the pattern (as value elements; see the PatternML
document). Instead, pattern matchers are likely to pass back just those parts
of the pattern that match, using a pattern element with node/edge elements as
content. Alternatively, the pattern element has a uri attribute, which allows
a URI reference to the complete pattern to be given.
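The two alternatives might look as follows. This is a sketch: the first fragment reuses a node from the Appendix B example, and the second assumes that PatternML accepts a pattern element with no body when a uri is given; the URI itself is a hypothetical placeholder.

```xml
<!-- Option 1: return only the parts of the pattern that matched. -->
<pattern id="contract_murder1">
  <body>
    <node id="location.contract_murder.x" label="location">
      <value>'Moscow'</value>
    </node>
  </body>
</pattern>

<!-- Option 2 (assumes PatternML permits a body-less pattern carrying only
     a uri): refer to the complete pattern. The URI is hypothetical. -->
<pattern id="contract_murder1" uri="http://example.org/patterns/contract_murder1"/>
```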
The belief element is designed to hold information
about the current belief in the hypothesis. A belief element must contain
exactly 1 method/result pair, i.e. one method element and one result
element.
A method element is a string description of the
method used to calculate the belief.
The result element is a string description of the
actual belief value, using the associated belief calculation method.
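Because Appendix A declares the content of belief with xsd:all, the method and result elements are each required exactly once but may appear in either order. Both of the following fragments should therefore validate; the method name and value are taken from Appendix B.

```xml
<belief>
  <method>graph-edit-distance</method>
  <result>0.98</result>
</belief>

<!-- Also valid: xsd:all does not fix the order of its children. -->
<belief>
  <result>0.98</result>
  <method>graph-edit-distance</method>
</belief>
```

A belief holding only a result, or two result elements, would fail validation, since each xsd:all child defaults to minOccurs="1" and maxOccurs="1".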
The exhibit
element describes a piece of evidence used to support the hypothesis. It is
most likely to be used as a child element of the value element of a node. The
exhibit element has 3 attributes: id, uri and timestamp. The id attribute
(required) is a unique id for the piece of evidence (eventually it should be a
UUID). The uri attribute is a reference to the piece of evidence, if it is
network accessible. The timestamp attribute (an xsd:dateTime) is the date and
time at which the piece of evidence was created, not when any event referred to
in the evidence occurred.
The exhibit element contains
3 child elements, in this order: assertion, source, belief. There must be
exactly 1 assertion element and exactly 1 source element, followed by 0 or 1
belief elements.
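Putting the exhibit attributes and children together, a complete exhibit could look like the following sketch. The content is taken from the first match in Appendix B, except the uri attribute, whose example.org value is hypothetical.

```xml
<!-- timestamp: when this evidence was created, not when the murder occurred -->
<exhibit id="exhibit123"
         uri="http://example.org/evidence/exhibit123"
         timestamp="2001-08-03T13:29:18">
  <!-- 1. exactly one assertion -->
  <assertion>
    <relationName>eventOccursAt</relationName>
    <relationArgument>Murder55</relationArgument>
    <relationArgument>'Moscow'</relationArgument>
  </assertion>
  <!-- 2. exactly one source -->
  <source id="pravda-newspaper123" uri="www.pravda.ru" reliability="0.9"/>
  <!-- 3. optional belief in this piece of evidence -->
  <belief>
    <method>analyst rating</method>
    <result>0.97</result>
  </belief>
</exhibit>
```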
The assertion
element describes a fact, and can contain 3 types of child elements, in this
order: negation, relationName, relationArgument. The negation element says
whether this is a (not (relation arg1 arg2 ... argn)) style assertion; there
can be 0 or 1 negation elements. The relationName element is the name of the
relation; there must be exactly 1 relationName element. There must be 1 or more
relationArgument elements, which represent the relation arguments in order.
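For example, the negated form (not (eventOccursAt Murder55 'Moscow')), built from the assertion in Appendix B, could be written as below. Since negation is declared as xsd:boolean, the values true, false, 1 and 0 are all accepted. The schema does not say what an absent negation element means; by analogy with Appendix B it presumably denotes a positive assertion.

```xml
<!-- Encodes (not (eventOccursAt Murder55 'Moscow')) -->
<assertion>
  <negation>true</negation>
  <relationName>eventOccursAt</relationName>
  <relationArgument>Murder55</relationArgument>
  <relationArgument>'Moscow'</relationArgument>
</assertion>
```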
The source
element describes the source of a piece of evidence. It is most likely to be
used as the source child of an exhibit element. The source element has 3
attributes: id, uri and reliability. The id attribute (required) is a unique id
for the source of the evidence (which can be a person or a thing); eventually it
should be a UUID. The uri attribute is a reference to the source if it is
network accessible (e.g. an online news source). The reliability attribute is a
measure of how reliable the source is as evidence (a number between 0 and 1).
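In Appendix A, reliability is declared as xsd:string, so a validator will not enforce the 0 to 1 range. If that check were wanted, one possible sketch (not part of the current schema; the type name reliabilityType is hypothetical) is a restricted decimal type:

```xml
<!-- Hypothetical type: a decimal constrained to the closed range [0, 1]. -->
<xsd:simpleType name="reliabilityType">
  <xsd:restriction base="xsd:decimal">
    <xsd:minInclusive value="0"/>
    <xsd:maxInclusive value="1"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- The reliability attribute of source would then be declared as: -->
<xsd:attribute name="reliability" type="reliabilityType"/>
```

The reliability values used in Appendix B (0.9, 0.6 and 0.95) would all remain valid under this type, while a value such as "high" or "1.5" would be rejected.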
The listOfIntegers element is provided to record a list of integers
separated by whitespace. It is most likely to be used as a child element of
the value element (e.g. <value><listOfIntegers>1 3 5 7</listOfIntegers></value>).
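Because listOfIntegersType is an xsd:list of xsd:integer, a validator splits the content on whitespace and checks each token as an integer. As a sketch, the first fragment below is valid, while the second is not:

```xml
<!-- Valid: four integer items separated by whitespace (line breaks count too). -->
<value><listOfIntegers>1 3 5 7</listOfIntegers></value>

<!-- Invalid: commas are not list separators, so the single token
     "1,3,5,7" is not a lexically valid xsd:integer. -->
<value><listOfIntegers>1,3,5,7</listOfIntegers></value>
```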
The listOfStrings element is provided to record a list of delimited
strings separated by whitespace. It is most likely to be used as a child
element of the value element (e.g.
<value><listOfStrings>'StringA' 'StringB' 'StringC'</listOfStrings></value>).
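One consequence of basing listOfStringsType on xsd:list is that the validator does not interpret the quotes: each whitespace-separated token, quotes included, is a separate item, so an individual string cannot itself contain whitespace. A sketch of the difference:

```xml
<!-- Three items: 'StringA', 'StringB' and 'StringC' (quote characters included). -->
<value><listOfStrings>'StringA' 'StringB' 'StringC'</listOfStrings></value>

<!-- Two items, not one: xsd:list splits on the space inside the quotes,
     yielding the tokens 'New and York' as separate items. -->
<value><listOfStrings>'New York'</listOfStrings></value>
```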
Appendix A: HypothesisML Schema
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:include schemaLocation="http://www.ai.sri.com/~law/schemas/2002/07/pattern"/>
  <xsd:element name="hypothesis">
    <xsd:annotation>
      <xsd:documentation>A hypothesis element can have 4 attributes. id is a unique id within the
        document, but is not a UUID for now. If we use it, it will be for convenience, to store
        an object id from a particular application, but this id will have no meaning to other
        applications. The uri is a pointer to the hypothesis, if it is network accessible. The
        label attribute is just a pretty string given to the hypothesis, e.g. by a user. The
        class attribute will be used to record the class that this hypothesis is an instance of
        (if there is a hypothesis class hierarchy). For now the class attribute is just a string.
      </xsd:documentation>
      <xsd:documentation>A hypothesis must contain 1 (and only 1) pattern element, and can also
        contain 0 or more belief elements.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="pattern"/>
        <xsd:element ref="belief" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="uri" type="xsd:anyURI"/>
      <xsd:attribute name="id" type="xsd:string" use="required"/>
      <xsd:attribute name="label" type="xsd:string"/>
      <xsd:attribute name="class" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="method" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>The method element is a string description of the method used to
        calculate the belief.
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:element name="result" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>The result element is a string description of the actual belief value,
        using the associated belief calculation method.
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:element name="belief">
    <xsd:annotation>
      <xsd:documentation>The belief element is designed to hold information about belief in an
        object. A belief element can contain 1 and only 1 method/result pair.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:all>
        <xsd:element ref="method"/>
        <xsd:element ref="result"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="exhibit">
    <xsd:annotation>
      <xsd:documentation>The exhibit element describes a piece of evidence used to support the
        hypothesis. It is most likely used as a child element of the value element of a node.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="assertion" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="source" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="belief" minOccurs="0" maxOccurs="1"/>
      </xsd:sequence>
      <xsd:attribute name="id" type="xsd:string" use="required"/>
      <xsd:attribute name="uri" type="xsd:anyURI"/>
      <xsd:attribute name="timestamp" type="xsd:dateTime"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="assertion">
    <xsd:annotation>
      <xsd:documentation>The assertion element describes a fact. The optional negation element
        says whether it is a (not (relation arg1 arg2 ... argn)) style assertion. The
        relationName element follows, giving the name of the relation, and the remaining child
        elements are the relation arguments in order.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="negation" minOccurs="0" maxOccurs="1"/>
        <xsd:element ref="relationName"/>
        <xsd:element ref="relationArgument" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="negation" type="xsd:boolean">
    <xsd:annotation>
      <xsd:documentation>Whether the assertion is negated.
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:element name="relationName" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>The name of the relation.
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:element name="relationArgument" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>The relation argument.
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:element name="source">
    <xsd:annotation>
      <xsd:documentation>The source element describes the source of a piece of evidence.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:attribute name="id" type="xsd:string" use="required"/>
      <xsd:attribute name="uri" type="xsd:anyURI"/>
      <xsd:attribute name="reliability" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="hypothesisBag">
    <xsd:annotation>
      <xsd:documentation>hypothesisBag is the root of the schema and can contain 0 or more
        hypothesis elements; i.e. it is a bag for holding all the hypotheses.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="hypothesis" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="listOfIntegers" type="listOfIntegersType">
    <xsd:annotation>
      <xsd:documentation>List of integers separated by whitespace.</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:simpleType name="listOfIntegersType">
    <xsd:list itemType="xsd:integer"/>
  </xsd:simpleType>
  <xsd:element name="listOfStrings" type="listOfStringsType">
    <xsd:annotation>
      <xsd:documentation>List of delimited strings, separated by whitespace.</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:simpleType name="listOfStringsType">
    <xsd:list itemType="xsd:string"/>
  </xsd:simpleType>
</xsd:schema>
Appendix B: Example HypothesisML File
1) The pattern is a query about all contract killings in Moscow, run
against a database of evidence (this uses example 1 from the PatternML
document). Three matches are returned.
<?xml version="1.0" encoding="UTF-8"?>
<hypothesisBag xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://www.ai.sri.com/~law/schemas/2002/07/hypothesis">
  <hypothesis label="contract murder match 1" id="contract_murder_match1">
    <pattern id="contract_murder1">
      <body>
        <node id="contract_murder.x" label="murder">
          <value>Murder55</value>
        </node>
        <node id="eventOccursAt.contract_murder.x" label="eventOccursAt">
          <value>
            <exhibit id="exhibit123" timestamp="2001-08-03T13:29:18">
              <assertion>
                <relationName>eventOccursAt</relationName>
                <relationArgument>Murder55</relationArgument>
                <relationArgument>'Moscow'</relationArgument>
              </assertion>
              <source id="pravda-newspaper123" uri="www.pravda.ru" reliability="0.9"/>
              <belief>
                <method>analyst rating</method>
                <result>0.97</result>
              </belief>
            </exhibit>
          </value>
        </node>
        <node id="location.contract_murder.x" label="location">
          <value>'Moscow'</value>
        </node>
        <edge from="contract_murder.x" to="eventOccursAt.contract_murder.x"
              id="contract_murder.x.eventOccursAt" label="location"/>
        <edge from="eventOccursAt.contract_murder.x" to="location.contract_murder.x"
              id="eventOccursAt.contract_murder.x.location" label="location"/>
      </body>
    </pattern>
    <belief>
      <method>graph-edit-distance</method>
      <result>0.98</result>
    </belief>
  </hypothesis>
  <hypothesis label="contract murder match 2" id="contract_murder_match2">
    <pattern id="contract_murder1">
      <body>
        <node id="contract_murder.x" label="murder">
          <value>Murder67</value>
        </node>
        <node id="eventOccursAt.contract_murder.x" label="eventOccursAt">
          <value>
            <exhibit id="exhibit987" timestamp="2001-09-17T15:31:19">
              <assertion>
                <relationName>eventOccursAt</relationName>
                <relationArgument>Murder67</relationArgument>
                <relationArgument>'Moscow'</relationArgument>
              </assertion>
              <source id="informer4567" reliability="0.6"/>
              <belief>
                <method>analyst rating</method>
                <result>0.78</result>
              </belief>
            </exhibit>
          </value>
        </node>
        <node id="location.contract_murder.x" label="location">
          <value>'Moscow'</value>
        </node>
        <edge from="contract_murder.x" to="eventOccursAt.contract_murder.x"
              id="contract_murder.x.eventOccursAt" label="location"/>
        <edge from="eventOccursAt.contract_murder.x" to="location.contract_murder.x"
              id="eventOccursAt.contract_murder.x.location" label="location"/>
      </body>
    </pattern>
    <belief>
      <method>graph-edit-distance</method>
      <result>0.87</result>
    </belief>
  </hypothesis>
  <hypothesis label="contract murder match 3" id="contract_murder_match3">
    <pattern id="contract_murder1">
      <body>
        <node id="contract_murder.x" label="murder">
          <value>Murder84</value>
        </node>
        <node id="eventOccursAt.contract_murder.x" label="eventOccursAt">
          <value>
            <exhibit id="exhibit2013" timestamp="2002-01-03T17:23:28">
              <assertion>
                <relationName>eventOccursAt</relationName>
                <relationArgument>Murder84</relationArgument>
                <relationArgument>'Moscow'</relationArgument>
              </assertion>
              <source id="police-report345" reliability="0.95"/>
              <belief>
                <method>analyst rating</method>
                <result>1.0</result>
              </belief>
            </exhibit>
          </value>
        </node>
        <node id="location.contract_murder.x" label="location">
          <value>'Moscow'</value>
        </node>
        <edge from="contract_murder.x" to="eventOccursAt.contract_murder.x"
              id="contract_murder.x.eventOccursAt" label="location"/>
        <edge from="eventOccursAt.contract_murder.x" to="location.contract_murder.x"
              id="eventOccursAt.contract_murder.x.location" label="location"/>
      </body>
    </pattern>
    <belief>
      <method>graph-edit-distance</method>
      <result>0.98</result>
    </belief>
  </hypothesis>
</hypothesisBag>