PAL Semantic Extraction API

The PAL Semantic Extraction API lets the user train and untrain the extractor through examples, configure the extraction system, and reset the learning to its original state.

Datasets

Several training datasets are provided with the PAL Semantic Extractor. They are shipped in the jar file and should be extracted to a directory on the same disk. The datasets are read during initialization. The location of the files is specified using the method:

void setWorkDir(String dir);
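
As an illustration of how the data directory might be configured safely, the sketch below checks that the directory exists before calling setWorkDir and startup, in the same order the example program later in this document uses. The WorkDirSetup class, the Extractor interface, and RecordingExtractor are hypothetical stand-ins introduced only so the sketch runs without the real library; only setWorkDir(String) and startup() come from the API described here.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

class WorkDirSetup {

    /** Hypothetical stand-in for the two IFindAndSummarize methods used here. */
    interface Extractor {
        void setWorkDir(String dir);
        void startup();
    }

    /**
     * Points the extractor at the data directory and starts it, but only
     * if the directory exists. Returns false when the directory is missing.
     */
    static boolean initialize(Extractor extractor, String dir) {
        if (!Files.isDirectory(Paths.get(dir))) {
            return false;
        }
        extractor.setWorkDir(dir); // same order as the example program
        extractor.startup();
        return true;
    }

    /** Records the calls it receives so the sketch can run on its own. */
    static final class RecordingExtractor implements Extractor {
        final StringBuilder calls = new StringBuilder();
        public void setWorkDir(String dir) { calls.append("setWorkDir;"); }
        public void startup() { calls.append("startup;"); }
    }

    public static void main(String[] args) {
        RecordingExtractor stub = new RecordingExtractor();
        System.out.println(initialize(stub, ".")); // the current directory exists
        System.out.println(stub.calls);
    }
}
```

With the real library, an IFindAndSummarize instance would take the place of the stub.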

Input Methods

Class IFindAndSummarize has several methods to train the Semantic Extractor.

Methods: see "API for IFindAndSummarize" below.

Output Methods

The main container of information is "Thing," which contains the type of the item (for example, Date, Time or Person); its value/name (Nov 26, 16:00, Bob Smith); its sentence and word positions; and its sub-structures, containing hierarchical information where appropriate.

The wrapper IExtractWrapper returns either a List of the Things of a given Thing type or a Map of all Things.
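
The lookup behaviour described above can be pictured as a map from Thing type to a list of Things. The following is a minimal sketch under that assumption, not the library's implementation. ThingLookupSketch and SimpleThing are hypothetical names, and SimpleThing carries only the type and value fields, leaving out positions and sub-structures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ThingLookupSketch {

    /** Hypothetical minimal stand-in for Thing: type and value only. */
    static final class SimpleThing {
        final String type;
        final String value;

        SimpleThing(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    // Thing type -> all Things of that type, in insertion order.
    private final Map<String, List<SimpleThing>> byType = new LinkedHashMap<>();

    void add(String type, String value) {
        byType.computeIfAbsent(type, k -> new ArrayList<>())
              .add(new SimpleThing(type, value));
    }

    /** All Things, grouped by type. */
    Map<String, List<SimpleThing>> getThingMap() {
        return byType;
    }

    /** The Things of one type, or an empty list if none were added. */
    List<SimpleThing> getThing(String type) {
        return byType.getOrDefault(type, Collections.emptyList());
    }

    /** The Thing with the given type and value, or null if there is none. */
    SimpleThing getThing(String type, String value) {
        for (SimpleThing thing : getThing(type)) {
            if (thing.value.equals(value)) {
                return thing;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ThingLookupSketch wrapper = new ThingLookupSketch();
        wrapper.add("Person", "Larry Jones");
        wrapper.add("Person", "Bob Robertson");
        wrapper.add("Time", "14:15");
        System.out.println(wrapper.getThing("Person").size());
        System.out.println(wrapper.getThing("Time", "14:15").value);
    }
}
```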

Class IExtractWrapper

Methods: see "API for IExtractWrapper" below.

Class IThing

Methods: see "API for IThing" below.

API

API for IFindAndSummarize


//-------------------------------------------------------------------------------	
 
    /**
     * Sets the working data directory for the data files
     *
     * @param dir working directory
     */
	
     void setWorkDir(String dir);
 
//-------------------------------------------------------------------------------	
    /**
     * Executes semantic extraction on a given string
     *
     * @param inStr the String that is to be analyzed
     * @return ExtractWrapper
     */
     public ExtractWrapper semanticExtraction(String inStr);
 
//-------------------------------------------------------------------------------
      
    /**
     * Executes signature extraction from a given string
     * such as name, address, telephone number etc.
     * 
     * @param inStrIn input string that contains signature
     * @return IExtractWrapper of extracted info
     */
     public IExtractWrapper signatureExtraction(String inStrIn);
 
//-------------------------------------------------------------------------------
    /**
     * Defines groups for meta-Structures
     * 
     * @param groupName name of the group
     * @param groupList String[] list of the types to group together
     */
 
     public void setGroup(String groupName, String[] groupList);
  
//-------------------------------------------------------------------------------	
 
    /**
     * Train the Semantic Extractor on a specific example
     *
     * @param example an example of the text string to learn
     * @param category a defining organizational group for this example, such as 'LATLON'
     * @param type currently only "general" works, more TODO
     * @return 
     */
   
     boolean trainOnNewExample(String example, String category, String type);
 
//-------------------------------------------------------------------------------	
 
    /**
     * Un-Trains the Semantic Extractor on a specific example
     * 
     * @param example an example of the text string to forget
     * @param category a defining organizational group for this example, such as 'LATLON'
     * @param type supports only "general" type
     * @return 
     */
   
     boolean forgetNewExample(String example, String category, String type);
 
//-------------------------------------------------------------------------------
     
    /**
     * Adds the person's full name to the semantic database.
     *
     * @param example a person's full name
     * @return
     */
     boolean learnPersonName(String example);
//-------------------------------------------------------------------------------	
     
    /**
     * Adds the person's first name to the semantic database. Typically used for unusual names.
     *
     * @param example a person's first name
     * @return
     */

     public boolean learnPersonFirstName(String example);

//-------------------------------------------------------------------------------	
    /**
     * Adds the person's last name to the semantic database. Typically used for unusual names.
     *
     * @param example a person's last name
     * @return
     */
     public boolean learnPersonLastName(String example);

//-------------------------------------------------------------------------------	
    /**
     * Removes the person's last name Prefix from the semantic database.
     *
     * @param example a person's last name Prefix
     * @return
     */
     public boolean forgetLastNamePrefix(String example);     

//-------------------------------------------------------------------------------	
    /**
     * Adds the person's last name prefix to the semantic database.
     *
     * @param example a person's last name Prefix (such as 'al-' or 'Abu')
     * @return
     */
     public boolean learnLastNamePrefix(String example);     
    
//-------------------------------------------------------------------------------	
    /**
     * Adds the location name to the semantic database.
     *
     * @param example a location name
     * @return
     */
     boolean learnLocationName(String example);

//-------------------------------------------------------------------------------
    /**
     * Adds a name (e.g. of an organization, group or bomb) together with
     *  its type (Organization, ...) to the semantic database.
     *
     * @param example the name to learn, such as an organization name
     * @param type the type of the name (for example, Organization)
     * @return
     */
     
     public boolean learnNameAndType(String example, String type);
 
//-------------------------------------------------------------------------------	
    /** 
     * Removes a name and its type from the database
     *
     * @param example the name to remove
     * @return
     */
     public boolean forgetNameAndType(String example);

//-------------------------------------------------------------------------------	
    /**
     * Learns an alias for a known name
     * (e.g. Jim for James, Bin Ladin for Bin Laden)
     *
     * @param alias alias for a thing in the DB
     * @param realObject the thing in the DB
     * @return
     */
     public boolean learnAliasName(String alias, String realObject);

//-------------------------------------------------------------------------------	
    /** 
     * Removes an alias from the database
     *
     * @param alias the alias to remove
     * @return
     */
     public boolean forgetAliasName(String alias); 

//-------------------------------------------------------------------------------	
    /**
     * Controls aspects of program output by turning
     *  individual extraction features on or off.
     *  
     *  The String parameter contains any of these, separated by spaces:
     *  	Question_Off
     *  	Question_On  
     *  	Acronym_Off
     *  	Acronym_On
     *  	Quotes_Off
     *  	Quotes_On
     *  	Signature_Off
     *  	Signature_On
     *  
     * @param onOff the settings to apply, for example:
     *       String strOnOff = "Question_Off Acronym_Off Quotes_On Signature_On";
     *       onOff(strOnOff);
     *       
     */
     public void onOff(String onOff);   
 
//-------------------------------------------------------------------------------	
 
    /**
     * starts up the system and reads in training files
     *
     */
     void startup();
 
//-------------------------------------------------------------------------------	
    /**
     * Resets the Semantic Extraction system back to its out-of-the-box state:
     *   organizations, acronyms and regular expressions are cleared;
     *   names (people and locations) are cleared and 
     *   	restored from the out-of-the-box backup files
     *
     */
     void reset();
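
The flag string accepted by onOff is easy to mistype, since each token must combine a feature name with an _On or _Off suffix. One way to guard against this is to assemble the string from a checked list of feature names, as in the sketch below. OnOffFlags and its build method are hypothetical helpers, not part of the API; the four feature names are the ones listed in the onOff documentation above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class OnOffFlags {

    /** Feature names taken from the onOff documentation. */
    static final List<String> FEATURES =
            Arrays.asList("Question", "Acronym", "Quotes", "Signature");

    /**
     * Builds a space-separated flag string such as "Question_Off Acronym_On".
     * Unknown feature names are rejected instead of being passed through.
     */
    static String build(Map<String, Boolean> settings) {
        StringJoiner flags = new StringJoiner(" ");
        for (Map.Entry<String, Boolean> entry : settings.entrySet()) {
            if (!FEATURES.contains(entry.getKey())) {
                throw new IllegalArgumentException("Unknown feature: " + entry.getKey());
            }
            flags.add(entry.getKey() + (entry.getValue() ? "_On" : "_Off"));
        }
        return flags.toString();
    }

    public static void main(String[] args) {
        Map<String, Boolean> settings = new LinkedHashMap<>();
        settings.put("Question", false);
        settings.put("Acronym", true);
        System.out.println(build(settings)); // prints "Question_Off Acronym_On"
    }
}
```

The resulting string would then be passed to onOff on an IFindAndSummarize instance.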
 

API for IExtractWrapper


 
//-------------------------------------------------------------------------------
    /**
     * Gets the map of all Things; each map value is a list of Things
     */
     public HashMap  getThingMap(); 
 
//-------------------------------------------------------------------------------	
    /**
     * Gets the list of Things of the given type
     */
     public List getThing(String type);
 
//-------------------------------------------------------------------------------	
    /**	   
     * Returns a specific Thing of type and value.
     */
	
     public Thing getThing(String type, String  value);

//-------------------------------------------------------------------------------	
 
    /**
     * Prints information about each Thing to System.out
     */
     public void print();

API for IThing


//-------------------------------------------------------------------------------
    /**
     * Returns the Thing type
     */				
     String getType();

//-------------------------------------------------------------------------------
    /**
     * Returns the Thing Value
     */
     String getValue();

//-------------------------------------------------------------------------------
    /**
     * Returns the HashMap Template for the Thing
     *   allows hierarchical information to 
     *   be included with a thing
     */
     HashMap getTemplate();

//-------------------------------------------------------------------------------
    /**
     * Returns the first Name
     */
     String getFirstName();

//-------------------------------------------------------------------------------
    /**
     * Returns the last Name
     */
     String getLastName();

//-------------------------------------------------------------------------------
    /**
     * Returns the middle name
     */
     String getMiddleName();
		
//-------------------------------------------------------------------------------
    /**
     * Returns the word type ID in the form %TYPE_Num
     */
     String getTypeID();
		
//-------------------------------------------------------------------------------
    /**
     * Returns the document name/number
     */
     String getDocument();

//-------------------------------------------------------------------------------
    /**
     * Returns the sentence position in
     *  the document that the word(s) is in
     */
     int getSentencePos();

//-------------------------------------------------------------------------------
    /**
     * Returns word position in the sentence
     */
     int getWordPos();

//-------------------------------------------------------------------------------
     /**
     * Returns the word position of the last word
     *  in the word phrase in the sentence 
     *  (one word start =x, end=x+1)
     */
     int getWordEndPos();

//-------------------------------------------------------------------------------
    /**
     * Returns the beginning character 
     * position over the entire document
     */
     int getCharPosition();

//-------------------------------------------------------------------------------
    /**
     * Returns the end character position 
     *  over the entire document
     */
     int getEndCharPosition();

//-------------------------------------------------------------------------------
    /**
     * Returns the confidence of the extraction
     */
     int getConfidence();

//-------------------------------------------------------------------------------
    /**
     * Returns the internal relevance of the 
     *   word phrase
     */
     int getRelevance();

//-------------------------------------------------------------------------------
    /**
     * Returns an external relevance 
     */
     int getExternalRelevance();
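
The note on getWordEndPos (a one-word phrase starts at x and ends at x+1) implies that word positions form a half-open range: the start is included and the end is excluded. The sketch below works through that arithmetic. It assumes plain whitespace tokenization, which may differ from the extractor's own tokenizer, and WordSpan is a hypothetical helper class. The positions (9, 12) are the ones reported for "133 Oak St" in the example output later in this document.

```java
import java.util.Arrays;

class WordSpan {

    /** Number of words covered by the half-open range [start, end). */
    static int wordCount(int start, int end) {
        return end - start;
    }

    /** Returns the words in [start, end) of a whitespace-tokenized sentence. */
    static String phrase(String sentence, int start, int end) {
        String[] words = sentence.trim().split("\\s+");
        if (start < 0 || start > end || end > words.length) {
            throw new IllegalArgumentException("Range outside sentence");
        }
        return String.join(" ", Arrays.copyOfRange(words, start, end));
    }

    public static void main(String[] args) {
        String sentence =
                "Can you pick me up and take me to 133 Oak St, Menlo Park, CA?";
        System.out.println(wordCount(9, 12));        // prints 3
        System.out.println(phrase(sentence, 9, 12)); // prints "133 Oak St,"
    }
}
```

Whitespace splitting keeps the trailing comma, whereas the extractor reports the value without punctuation.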
 

Coding Example

The following snippet from the Semantic Extraction example program shows the Semantic Extraction API calls.

package sri.semantic_extractor;
 
import sri.info_extractor.*;
 
import java.util.HashMap;
import java.util.List;
import java.util.Collection;
import java.util.Iterator;
 
import org.apache.log4j.Logger;
import org.apache.log4j.Level;
import org.apache.log4j.BasicConfigurator;
 
/**
 * 
 */
// ----------------------------------------------------------------
public class Main {
 
	private static final Logger LOG = Logger.getRootLogger();
 
	public static  void main(String argv[]) {
 
	    Logger.getRootLogger().setLevel(Level.ERROR);
	    BasicConfigurator.configure(); 
 
	    
	    
	//  Setup
	    
    	IFindAndSummarize fas = new FindAndSummarize();
    	
        // Set up the directory location where the data lives
    	
    	fas.setWorkDir("C:/Semantic_Ex_Test/data");
    	
    	fas.startup();
    	
 
    	fas.reset();
 
   
    	// Training examples   
	
    	fas.trainOnNewExample("42SXF3646255095", "MGRS", fas.GENERAL);
    	fas.trainOnNewExample("38S LB 94630 94810", "MGRS", fas.GENERAL);
    	fas.trainOnNewExample("39STD 64934 83804", "MGRS", fas.GENERAL);
    	fas.trainOnNewExample("38S MB 3737988209", "MGRS", fas.GENERAL);
    	fas.trainOnNewExample("364210N 0711833E", "LATLON", fas.GENERAL);
    	fas.trainOnNewExample("393047.379S 0481834.522W", "LATLON", fas.GENERAL);
    	fas.trainOnNewExample("070100Apr13", "Date", fas.GENERAL);
    	fas.trainOnNewExample("BL333", "Office", fas.GENERAL);
 
    	fas.learnPersonName("Abu Viddi");
    	
    	fas.learnLocationName("Mosal");
    	fas.learnLocationName("Puskino");
 
    	fas.learnNameAndType("al-Shabab", "Organization");
    	fas.learnNameAndType("SAPA", "Organization");
    	fas.learnNameAndType("IED", "BombType");
 
 
    	fas.trainOnNewExample("(\\b[Tt]o:\\b)(\\w+)@(\\w+\\.)(\\w+)(\\.\\w+)*", "Email To", fas.REGEXP);
 
     	fas.onOff("Acronym_On");
     	fas.onOff("Question_On");
 
    	
    	fas.trainOnNewExample("13NOV07", "Date", fas.GENERAL);
   
    	fas.forgetNewExample("42SXF3646255097", "MGRS", fas.GENERAL);
 
    	fas.onOff("Questions_Off Acronyms_On");
 
    	fas.forgetNameAndType("al-Qaeda");
 
     	String [] groupA = {"Street Address","csz"};
    	fas.setGroup("Address", groupA);
 
 
    	// Input
    	
    	String inStr = "I'm arriving tomorrow (3/24) at SFO at 2:15pm.  I should be out of the airport 20 minutes later.\n"
    	    + "Can you pick me up and take me to 133 Oak St, Menlo Park, CA?  Larry Jones may be with me.\n"
    	    + "\n"
    	    + "thank you,\n"
    	    + "\n"
    	    + "Bob Robertson\n"
    	    + "273 Emerson Ave.\n"
    	    + "Omaha, NE\n"
    	    + "Cell: 343-555-2312\n"
    	    + "bjones@pamnr.com";
       		
 
 
 
    	// Call the Semantic Extractor
 
   	System.out.println("\n<<<<<<<<<<<<<<<<<<"+inStr);
 
    	IExtractWrapper eW = fas.semanticExtraction(inStr);  	
    	eW.print();
    	
    }
 
//----------------------------------------------------------
//	The print() routine from the ExtractWrapper class
//----------------------------------------------------------
	public  void print(){		
    	// Print the results
    	    	
    	Collection  v = listOfThings.values();
    	
	  for (Iterator iter = v.iterator(); iter.hasNext();) {			
	    List d= (List)iter.next();
						
            for(int i=0; i < d.size(); i++){
	      Thing t = (Thing) d.get(i);   // raw List, so a cast is needed
	      System.out.println("\t\t    "+t.getType()+" = "+t.getValue()
	          +"   ID="+t.getTypeID()+"  Doc="+t.getDocument()+"   SentencePos="+t.getSentencePos()
	          +"   WordPos=("+t.getWordPos()+", "+t.getWordEndPos()+")"
	          +"   charPos="+t.getCharPosition()+"   UniqueID="+t.getUniqueID());
				
	      printTemplate(t, 1);					
	    }
	  }  
	}
	
//----------------------------------------------------------
	private  void printTemplate(Thing t, int over){
	
	// look at the template for this Thing
	// ---------------------------------------
	  if(t.getTemplate() != null){
	    HashMap  template = t.getTemplate();
	    Collection  temV = template.values();
	    	
	    for (Iterator iterTem = temV.iterator(); iterTem.hasNext();) {			
	      List temD= (List)iterTem.next();
	      for(int ii=0; ii < temD.size(); ii++){
		Thing tt = (Thing) temD.get(ii);   // raw List, so a cast is needed
		System.out.print("\t\t\t");
		// indent one extra tab per nesting level
		for(int ov=0; ov < over; ov++) {
		  System.out.print("\t");
		}
		System.out.println("   "+tt.getType()+" = "+tt.getValue()
		    +"   ID="+tt.getTypeID()+"  Doc="+tt.getDocument()
		    +"   SentencePos="+tt.getSentencePos()
		    +"   WordPos=("+tt.getWordPos()+", "+tt.getWordEndPos()+")"
		    +"   charPos="+tt.getCharPosition()+"   UniqueID="+tt.getUniqueID());
		// recurse into the sub-structure of this Thing
		printTemplate(tt, over+1);
		}
	      }
            }
	  }
        }
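
A note on the "Email To" pattern passed to trainOnNewExample in the example: it is an ordinary Java string literal, so every regex backslash has to be doubled. A single \b in a Java string is the backspace character, not a word boundary. The sketch below demonstrates the difference using java.util.regex. Whether the Semantic Extractor compiles its patterns with java.util.regex is an assumption, and EmailToPatternCheck is a hypothetical class used only for this check.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailToPatternCheck {

    /** Backslashes doubled: the regex engine sees \b and \w. */
    static final String ESCAPED =
            "(\\b[Tt]o:\\b)(\\w+)@(\\w+\\.)(\\w+)(\\.\\w+)*";

    /** Single \b: the compiler turns each one into a backspace character. */
    static final String UNESCAPED =
            "(\b[Tt]o:\b)(\\w+)@(\\w+\\.)(\\w+)(\\.\\w+)*";

    /** True if the regex is found anywhere in the text. */
    static boolean matches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    /** Returns the part of the address before '@', or null if nothing matches. */
    static String localPart(String text) {
        Matcher matcher = Pattern.compile(ESCAPED).matcher(text);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "To:bjones@pamnr.com";
        System.out.println(matches(ESCAPED, line));   // prints true
        System.out.println(matches(UNESCAPED, line)); // prints false
        System.out.println(localPart(line));          // prints bjones
    }
}
```

Under java.util.regex semantics the pattern expects the address to follow the colon directly; "To: bjones@pamnr.com" with a space does not match, because there is no word boundary between the colon and the space.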

Example Output

Sample output from the Semantic Extractor is presented below for the following input.

I'm arriving tomorrow (3/24) at SFO at 2:15pm. I should be out of the airport 20 minutes later. Can you pick me up and take me to 133 Oak St, Menlo Park, CA? Larry Jones may be with me.

thank you,

Bob Robertson
273 Emerson Ave.
Omaha, NE
Cell: 343-555-2312
bjones@pamnr.com

   **** SEMANTIC EXTRACTOR Results ------------------------------------
 
  Time = 14:15   ID=%TIME_0       Doc=null   SentencePos=0   WordPos=(7, 8)   charPos=37   UniqueID=6
  Date = tomorrow   ID=%DATE_0       Doc=null   SentencePos=0   WordPos=(2, 3)   charPos=13   UniqueID=7
  Date = 3/24   ID=%DATE_0       Doc=null   SentencePos=0   WordPos=(3, 4)   charPos=22   UniqueID=8
  Date = minutes   ID=%DATE_0       Doc=null   SentencePos=1   WordPos=(8, 9)   charPos=34   UniqueID=10
  Address_0 = Group   ID=%ADDRESS_0_0       Doc=null   SentencePos=3   WordPos=(12, 15)   charPos=-1   UniqueID=19
  Street Address = 133 Oak St   ID=%STREET ADDRESS_0       Doc=null   SentencePos=3   WordPos=(9, 12)   charPos=34   UniqueID=17
  csz = Menlo Park CA   ID=%CSZ_0       Doc=null   SentencePos=3   WordPos=(12, 15)   charPos=45   UniqueID=11
  Question = Can you pick me up and take me to 133 Oak St, Menlo Park, CA?    ID=%QUESTION_0       Doc=null   SentencePos=3   WordPos=(0, 15)   charPos=102   UniqueID=2
  Duration = 20 minutes   ID=%DURATION_0       Doc=null   SentencePos=1   WordPos=(7, 9)   charPos=31   UniqueID=9
  Acronym = SFO   ID=%ACRONYM_0       Doc=null   SentencePos=0   WordPos=(5, 6)   charPos=36   UniqueID=1
  Person = Larry Jones   ID=%PERSON_0       Doc=null   SentencePos=4   WordPos=(0, 2)   charPos=0   UniqueID=3
  Person = Bob Robertson   ID=%PERSON_0       Doc=null   SentencePos=9   WordPos=(0, 2)   charPos=0   UniqueID=4
    Street = Emerson Ave   ID=%STREET_0       Doc=null   SentencePos=10   WordPos=(1, 3)   charPos=0   UniqueID=21
    Cell/Mobile = Cell 343-555-2312   ID=%CELL/MOBILE_0       Doc=null   SentencePos=13   WordPos=(0, 2)   charPos=0   UniqueID=22
    Address_1 = Group   ID=%ADDRESS_1_0       Doc=null   SentencePos=12   WordPos=(0, 2)   charPos=-1   UniqueID=23
    Street Address = 273 Emerson Ave   ID=%STREET ADDRESS_0       Doc=null   SentencePos=10   WordPos=(0, 3)   charPos=0   UniqueID=18
    csz = Omaha NE   ID=%CSZ_0       Doc=null   SentencePos=12   WordPos=(0, 2)   charPos=0   UniqueID=14
    Email Address = bjones@pamnr.com   ID=%EMAIL ADDRESS_0       Doc=null   SentencePos=14   WordPos=(0, 1)   charPos=0   UniqueID=24
 
   **** SEMANTIC EXTRACTOR --------------------------------------------

Implementation Notes

PAL Semantic Extraction uses the following techniques to keep the software footprint small.