FASTUS is a (slightly permuted) acronym for Finite State Automata-based Text Understanding System. It is a system for extracting information from free text. English and Japanese versions of the system currently exist. Typically, an application either marks text with annotations that indicate items of interest, such as names of people or companies, or fills templates with information that could be entered into a relational database. FASTUS works as a series of cascaded finite-state automata.
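To make the idea of a cascade concrete, here is a minimal Python sketch, not the FASTUS implementation (which is written in Common Lisp). It assumes two hypothetical stages, each a regular expression standing in for a finite-state recognizer: the first marks company names, and the second matches over the first stage's annotations rather than over raw words. The patterns, the `<CO>` tag, and the function names are all illustrative assumptions.

```python
import re

# Stage 1 (hypothetical): mark company names, approximated here as one or
# more capitalized words followed by a corporate suffix.
COMPANY = re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Corp|Inc|Ltd)\.?")

# Stage 2 (hypothetical): runs over stage 1's output, so it matches on the
# <CO> annotations instead of on the original words.
JOINT_VENTURE = re.compile(
    r"<CO>(?P<a>[^<]+)</CO> (?:formed|set up) a joint venture with "
    r"<CO>(?P<b>[^<]+)</CO>"
)


def mark_companies(text):
    """Wrap every recognized company name in <CO>...</CO>."""
    return COMPANY.sub(lambda m: "<CO>" + m.group(0) + "</CO>", text)


def find_joint_ventures(marked):
    """Return (partner, partner) pairs found in annotated text."""
    return [(m.group("a"), m.group("b")) for m in JOINT_VENTURE.finditer(marked)]


def run_cascade(text):
    """Feed the output of stage 1 into stage 2."""
    return find_joint_ventures(mark_companies(text))


sentence = "Acme Corp. formed a joint venture with Tokyo Widget Ltd. last year."
print(mark_companies(sentence))
print(run_cascade(sentence))
```

Each stage stays simple because it only has to recognize patterns over the units the previous stage has already identified.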
FASTUS was developed in response to the needs of the government intelligence community
for scanning and processing huge volumes of written texts. Government intelligence
agencies collect much information from around the world from both classified and
unclassified sources. Assimilating important facts from this data can be a daunting
task for an analyst. One analyst described the problem by saying, "If I read
every bit of information that might be important to what I am working on, it would
be like reading War and Peace every day." FASTUS provides the analyst with a tool
that will help him or her to avoid being overwhelmed by the flood of information.
FASTUS is most appropriate for information extraction tasks, rather than full text understanding. That is, it is most effective for tasks in which (1) only a fraction of the text contains relevant information, and (2) there is a relatively simple, predefined, rigid target representation into which the information is mapped.
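The two conditions above can be illustrated with a small Python sketch. It is an assumption-laden toy, not FASTUS code: the `HireTemplate` class, the `HIRE` pattern, and the sample report are all hypothetical. The template has a fixed set of slots (the rigid target representation), and sentences that do not match the pattern are simply skipped (only a fraction of the text is relevant).

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HireTemplate:
    # Fixed slots: a simple, predefined target representation.
    person: str
    company: str
    position: Optional[str] = None


# Hypothetical pattern for one kind of event.
HIRE = re.compile(
    r"(?P<company>[A-Z]\w+) (?:hired|appointed) (?P<person>[A-Z]\w+ [A-Z]\w+)"
    r"(?: as (?P<position>[a-z ]+))?"
)


def extract(text: str) -> List[HireTemplate]:
    """Fill one template per matching sentence; ignore everything else."""
    templates = []
    for sentence in re.split(r"(?<=\.)\s+", text):
        m = HIRE.search(sentence)
        if m:
            templates.append(
                HireTemplate(m.group("person"), m.group("company"), m.group("position"))
            )
    return templates


report = (
    "Shares fell on Tuesday. Initech appointed Jane Doe as chief engineer. "
    "Analysts expect rain."
)
print(extract(report))
```

Because every filled template has the same slots, each one maps directly onto a row of a relational table.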
FASTUS has been under development since 1992. The system is implemented in Common Lisp and has been ported to several hardware platforms.
If you're curious, you can read a paper (in HTML) about how FASTUS works.
For more detailed information, you can download the following publications (gzipped PostScript):
If you want more information about how FASTUS can solve your text processing problem, drop us a line and let's talk about it.