SRI International. 333 Ravenswood Avenue. Menlo Park, CA 94025-3493. SRI International is a nonprofit corporation.

Publication Details

On the Empirical Complexity of Text Classification Problems

by Madani, Omid; Raghavan, Hema; Jones, Rosie

Technical Report 567
Institution: SRI International
Address: 333 Ravenswood Ave, Menlo Park, CA 94025
Sep 2009.

Abstract

In order to train a classifier that generalizes well, different learning problems, in particular high-dimensional ones such as text classification, can require widely different amounts of training, as measured in terms of the number of training instances required to reach adequate accuracy or the number of features effectively utilized in the classifier. We define several measures of learning difficulty and explore their utility in approximately capturing the inherent complexity of text classification problems. These measures can be efficiently computed for real-world problems for which linear classifiers are effective. We observe an intimate relationship (a high positive correlation) between feature complexity and instance complexity when using the measures.

AIC Personnel

Name: Madani, Omid
Title: Senior Computer Scientist

©2014 SRI International