AIC Seminar Series
Semantic Web Information Processing
Guizhen Yang, University at Buffalo
Date: Thursday, June 17th 2004 at 10:00am
Location: EJ228
The vision of the Semantic Web is to define and share machine-processable
data on the Web, which will enable a variety of automated
tasks ranging from information search to data integration to content
management to Web services. This talk will present our approach to
realizing the Semantic Web vision by addressing two fundamental
issues: (1) infrastructure for reasoning with semantically enriched
data; (2) creation of semantic content by transforming semistructured
Web documents into structured data.
In the first part of the talk, I will focus on infrastructure for
reasoning with semantically enriched data. I will present my work on
the design and implementation of Flora-2. Flora-2 unifies the
well-known F-logic, HiLog, and Transaction Logic into one coherent
rule-based, object-oriented knowledge representation system. I will
discuss the engineering issues of language and compiler design,
system architecture, and query optimization, as well as the theoretical
issues related to the new semantics and algorithms for nonmonotonic
multiple value and code inheritance.
Flora-2 (and its predecessor Flora-1) has been used in a variety of
application domains, ranging from Web agents to information
integration in bioinformatics to ontology management to building CASE
systems. Since its last alpha release in late 2002, it has been downloaded
hundreds of times and has attracted a small community of devoted users. Currently,
the Flora-2 system consists of 18,000 lines of Prolog/C code and is
freely available at http://flora.sourceforge.net/.
In the second part of the talk, I will deal with creation of semantic
content from Web documents. Specifically, I will describe novel
techniques for data extraction from Web documents that exhibit a high
degree of precision and recall. The theory behind these techniques is
based on the concept of unambiguity in automatic learning of
extraction patterns and the notion of resilience to changes in Web
documents. I will present complexity results and efficient algorithms
for learning unambiguous and resilient extraction patterns, as well as
experimental results to demonstrate the effectiveness of these
techniques in practice.
At the end of the talk I will outline ongoing and future research on
the Flora-2 system and mining semantic information from Web documents.
Please arrive at least 10 minutes early, as you will need to sign in by
following the instructions by the lobby phone at Building E (or call Wilma
Lenz at 650 859 4904, or Vicenta Lopez at 650 859 5750). SRI is
located at 333 Ravenswood Avenue in Menlo Park. Visitors may park in the
parking lots off Fourth Street. Detailed directions to SRI, as well as maps,
are available from the Visiting AIC web page.
There are two entrances to SRI International located on Ravenswood Ave.
Please check the Building E entrance signage.