AIC Seminar Series

Semantic Web Information Processing

Guizhen Yang, University at Buffalo

Date:  2004-06-17 at 10:00

Location:  EJ228


The vision of the Semantic Web is to define and share machine-processable data on the Web, enabling a variety of automated tasks ranging from information search to data integration to content management to Web services. This talk will present our approach to realizing that vision by addressing two fundamental issues: (1) infrastructure for reasoning with semantically enriched data; and (2) creation of semantic content by transforming semistructured Web documents into structured data.

In the first part of the talk, I will focus on infrastructure for reasoning with semantically enriched data. I will present my work on the design and implementation of Flora-2, which unifies the well-known F-logic, HiLog, and Transaction Logic into one coherent rule-based, object-oriented knowledge representation system. I will discuss the engineering issues of language and compiler design, system architecture, and query optimization, as well as the theoretical issues related to the new semantics and algorithms for nonmonotonic multiple value and code inheritance. Flora-2 and its predecessor Flora-1 have been used in a variety of application domains, ranging from Web agents to information integration in bioinformatics to ontology management to building CASE systems. Since its last alpha release in late 2002, Flora-2 has had hundreds of downloads and has attracted a small community of devoted users. Currently the Flora-2 system consists of 18,000 lines of Prolog/C code and is freely available.

In the second part of the talk, I will deal with the creation of semantic content from Web documents. Specifically, I will describe novel techniques for data extraction from Web documents that exhibit a high degree of precision and recall. The theory behind these techniques is based on the concept of unambiguity in automatic learning of extraction patterns and the notion of resilience to changes in Web documents. I will present complexity results and efficient algorithms for learning unambiguous and resilient extraction patterns, as well as experimental results that demonstrate the effectiveness of these techniques in practice.

At the end of the talk, I will outline ongoing and future research on the Flora-2 system and on mining semantic information from Web documents.

