SRI International.  333 Ravenswood Avenue.  Menlo Park, CA 94025-3493. SRI International is a nonprofit corporation.

AIC Seminar Series

Deriving Paraphrases for Highly Inflected Languages, with a Focus on Machine Translation

Kfir Bar, Tel-Aviv University

Notice:  Hosted by Richard Waldinger.

Date:  2014-01-20 at 16:00

Location:  EJ228 (SRI E building)

   Abstract

Paraphrasing is the act of generating an alternate sequence of words that conveys the same meaning. In this work, we explore the potential of using paraphrases to improve a corpus-based translation system designed to translate a morphologically rich language into English. We focus on Arabic, a highly inflected language whose words are generated by a comprehensive derivational and inflectional morphological system. We describe an automatic, data-driven paraphrasing procedure for Arabic, starting with two limited case studies. Our procedure uses comparable documents, that is, distinct documents covering the same topic, to learn the characteristics of potential paraphrases. We take a co-training approach with two classifiers: one models the contexts surrounding occurrences of paraphrases, and the other is trained to identify significant features of the words within paraphrases. Morpho-syntactic features are computed for both classifiers and proved effective for this task. We employ a simplified version of our paraphrasing procedure to support a corpus-based translation system by deriving paraphrases for fragments of the input Arabic text. The experimental results are encouraging.
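The abstract does not say which classifiers or exact features are used, so the following is only a minimal, generic co-training sketch in the style of Blum and Mitchell, not the speaker's method. It assumes each candidate paraphrase has two views: a set of context features (hypothetical strings such as `left=said`) and a set of word-internal morpho-syntactic features (hypothetical flags such as `same_root`). The names `NaiveBayes`, `co_train` and `classify` are illustrative, and Bernoulli naive Bayes is a swapped-in choice of classifier.

```python
import math
from collections import Counter


class NaiveBayes:
    """Bernoulli naive Bayes over sets of string features (add-one smoothing)."""

    def fit(self, feature_sets, labels):
        self.classes = sorted(set(labels))
        self.vocab = set().union(*feature_sets)
        self.class_count = Counter(labels)
        self.feat_count = {c: Counter() for c in self.classes}
        for feats, y in zip(feature_sets, labels):
            self.feat_count[y].update(feats)
        self.n = len(labels)
        return self

    def predict_proba(self, feats):
        """Return a normalized {class: probability} dict for one feature set."""
        scores = {}
        for c in self.classes:
            nc = self.class_count[c]
            s = math.log(nc / self.n)
            for f in self.vocab:  # features never seen in training are ignored
                p = (self.feat_count[c][f] + 1) / (nc + 2)
                s += math.log(p if f in feats else 1 - p)
            scores[c] = s
        top = max(scores.values())  # subtract the max before exp for stability
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: (context_feats, word_feats, label) triples;
    unlabeled: (context_feats, word_feats) pairs.

    In each round, each view's classifier labels the unlabeled candidates it is
    most confident about, and those become training data for both views.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):  # 0 = context view, 1 = word-internal view
            if not pool:
                break
            clf = NaiveBayes().fit([ex[view] for ex in labeled],
                                   [ex[2] for ex in labeled])
            scored = []
            for i, ex in enumerate(pool):
                probs = clf.predict_proba(ex[view])
                label = max(probs, key=probs.get)
                scored.append((probs[label], i, label))
            scored.sort(reverse=True)
            chosen = scored[:per_round]
            labeled.extend((pool[i][0], pool[i][1], label) for _, i, label in chosen)
            taken = {i for _, i, _ in chosen}
            pool = [ex for j, ex in enumerate(pool) if j not in taken]
    ctx_clf = NaiveBayes().fit([ex[0] for ex in labeled], [ex[2] for ex in labeled])
    word_clf = NaiveBayes().fit([ex[1] for ex in labeled], [ex[2] for ex in labeled])
    return ctx_clf, word_clf, labeled


def classify(ctx_clf, word_clf, ctx_feats, word_feats):
    """Combine the two views by multiplying their class posteriors."""
    p_ctx = ctx_clf.predict_proba(ctx_feats)
    p_word = word_clf.predict_proba(word_feats)
    return max(p_ctx, key=lambda c: p_ctx[c] * p_word.get(c, 0.0))


# Toy, hypothetical candidates: label True means "is a paraphrase".
seed = [
    ({"left=said", "right=that"}, {"same_root", "same_pos"}, True),
    ({"left=the", "right=of"}, {"diff_root", "diff_pos"}, False),
]
candidates = [
    ({"left=said", "right=that"}, {"same_root", "diff_pos"}),
    ({"left=the", "right=of"}, {"diff_root", "same_pos"}),
    ({"left=said", "right=of"}, {"same_root", "same_pos"}),
]
ctx_clf, word_clf, grown = co_train(seed, candidates)
print(len(grown), classify(ctx_clf, word_clf,
                           {"left=said", "right=that"}, {"same_root", "same_pos"}))
# prints: 5 True
```

Adding only the single most confident candidate per view per round keeps the sketch short; a fuller version would typically add a fixed number of positive and negative examples per round and draw candidates from a smaller sampled pool of unlabeled data.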

   Bio for Kfir Bar

I have submitted my PhD thesis, titled “Deriving Paraphrases for Highly Inflected Languages, with a Focus on Machine Translation”, at Tel-Aviv University under the supervision of Prof. Nachum Dershowitz. My research focuses on automatic learning techniques for identifying paraphrases in Arabic texts, in order to improve the performance of an automatic Arabic-to-English translation system. I recently co-founded Comprendi, a company focusing on business intelligence and hyper-segmentation that leverages large amounts of text. Previously, I worked at IntuView Inc. for seven years, where we developed semantic-based algorithms for named entity recognition and document classification.

   Note for Visitors to SRI

Please arrive at least 10 minutes early, as you will need to sign in by following the instructions at the lobby phone in Building E. SRI is located at 333 Ravenswood Avenue in Menlo Park. Visitors may park in the parking lots off Fourth Street. Detailed directions to SRI, as well as maps, are available from the Visiting AIC web page. There are two entrances to SRI International on Ravenswood Avenue; please check the signage for the Building E entrance.

SRI International
©2014 SRI International 333 Ravenswood Avenue, Menlo Park, CA 94025-3493
SRI International is an independent, nonprofit corporation.