Koller, Daphne and Sahami, Mehran (1996) Toward Optimal Feature Selection. Technical Report. Stanford InfoLab.
In this paper, we examine a method for feature subset selection based on information theory. We first present a framework that defines the theoretically optimal, but computationally intractable, method for feature subset selection. We show that the goal should be to eliminate a feature if it gives little or no additional information beyond that subsumed by the remaining features; this is the case for both irrelevant and redundant features. We then give an efficient algorithm for feature selection that computes an approximation to the optimal feature selection criterion, and we examine the conditions under which this approximate algorithm succeeds. Empirical results on a number of datasets show that the algorithm effectively handles datasets with a very large number of features.
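The elimination criterion described in the abstract can be illustrated with a small sketch. This is not the report's algorithm: it is a brute-force backward elimination that scores each feature by the estimated conditional mutual information between the class and that feature given all remaining features, which is only feasible on tiny discrete data. The function names, the toy dataset, and the `keep` parameter are assumptions made for illustration; the report's efficient approximation is not reproduced here.

```python
from collections import Counter
from itertools import product
from math import log2


def conditional_mutual_information(rows, labels, i, rest):
    """Estimate I(C; F_i | F_rest) in bits from discrete samples (plug-in counts)."""
    n = len(rows)
    joint = Counter()      # counts of (rest values, F_i value, class)
    rest_fi = Counter()    # counts of (rest values, F_i value)
    rest_c = Counter()     # counts of (rest values, class)
    rest_only = Counter()  # counts of rest values
    for row, c in zip(rows, labels):
        z = tuple(row[j] for j in rest)
        x = row[i]
        joint[(z, x, c)] += 1
        rest_fi[(z, x)] += 1
        rest_c[(z, c)] += 1
        rest_only[z] += 1
    cmi = 0.0
    for (z, x, c), n_zxc in joint.items():
        # p(z,x,c) * log2[ p(z,x,c) p(z) / (p(z,x) p(z,c)) ], written with counts
        ratio = n_zxc * rest_only[z] / (rest_fi[(z, x)] * rest_c[(z, c)])
        cmi += (n_zxc / n) * log2(ratio)
    return cmi


def backward_eliminate(rows, labels, keep):
    """Repeatedly drop the feature that adds the least information about the class."""
    selected = list(range(len(rows[0])))
    while len(selected) > keep:
        scores = {
            i: conditional_mutual_information(
                rows, labels, i, [j for j in selected if j != i]
            )
            for i in selected
        }
        selected.remove(min(selected, key=lambda i: scores[i]))
    return selected


# Hypothetical toy data: column 1 duplicates column 0 (redundant),
# column 2 is unrelated to the class (irrelevant), class = col0 XOR col3.
rows = [(a, a, noise, b) for a, noise, b in product((0, 1), repeat=3)]
labels = [a ^ b for a, _, _, b in rows]

selected = backward_eliminate(rows, labels, keep=2)
print(selected)
```

On this toy data the loop discards one copy of the duplicated column and the unrelated column, keeping column 3 and a single copy of the other informative feature. Conditioning on every remaining feature at once needs an exponential number of samples in general, which is why an approximation is needed for realistic feature counts.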
|Item Type:|Techreport (Technical Report)|
|Additional Information:|Previous number = SIDL-WP-1996-0032|
|Subjects:|Computer Science > Digital Libraries|
|Related URLs:|Project Homepage: http://www-diglib.stanford.edu/diglib/pub/|
|Deposited By:|Import Account|
|Deposited On:|28 Oct 2001 16:00|
|Last Modified:|09 Dec 2008 08:40|