AIC Seminar Series
Semantic Web Information Processing
Guizhen Yang, University at Buffalo
Date: 2004-06-17 at 10:00
Location: EJ228
The vision of the Semantic Web is to define and share machine-processable
data on the Web, which will enable a variety of automated
tasks ranging from information search to data integration to content
management to Web services. This talk will present our approach to
realizing the Semantic Web vision, by addressing two fundamental
issues: (1) infrastructure for reasoning with semantically enriched
data; (2) creation of semantic content by transforming semistructured
Web documents into structured data.
In the first part of the talk, I will focus on infrastructure for
reasoning with semantically enriched data. I will present my work on
the design and implementation of Flora-2. Flora-2 unifies the
well-known F-logic, HiLog, and Transaction Logic into one coherent
rule-based, object-oriented knowledge representation system. I will
discuss the engineering issues of language and compiler design,
system architecture, and query optimization, as well as the theoretical
issues related to the new semantics and algorithms for nonmonotonic
multiple value and code inheritance.
Flora-2 (and its predecessor Flora-1) has been used in a variety of
application domains, ranging from Web agents to information
integration in bioinformatics to ontology management to building CASE
systems. Since its last alpha release in late 2002, it has been
downloaded hundreds of times and has attracted a small community of
devoted users. Currently, the Flora-2 system consists of 18,000 lines
of Prolog/C code and is
freely available at http://flora.sourceforge.net/.
In the second part of the talk, I will deal with creation of semantic
content from Web documents. Specifically, I will describe novel
techniques for data extraction from Web documents that exhibit a high
degree of precision and recall. The theory behind these techniques is
based on the concept of unambiguity in automatic learning of
extraction patterns and the notion of resilience to changes in Web
documents. I will present complexity results and efficient algorithms
for learning unambiguous and resilient extraction patterns, as well as
experimental results to demonstrate the effectiveness of these
techniques in practice.
At the end of the talk, I will outline ongoing and future research on
the Flora-2 system and mining semantic information from Web documents.
Please arrive at least 10 minutes early in order to sign in and be escorted to the conference room. SRI is located at 333 Ravenswood Avenue in Menlo Park. Visitors may park in the visitors lot in front of Building E, and should follow the instructions by the lobby phone to be escorted to the meeting room. Detailed directions to SRI, as well as maps, are available from the Visiting AIC web page.