Gene Selection Using Random Forest and Proximity Differences Criterion on DNA Microarray Data
 
 
Qifeng Zhou (corresponding author), Wencai Hong, Linkai Luo, Fan Yang
Department of Automation, Xiamen University, Xiamen 361005, China
zhouqf@xmu.edu.cn, 369414951@163.com, luolk@xmu.edu.cn
doi: 10.4156/jcit.vol5.issue6.17 

Abstract
 Selecting relevant genes for sample classification is a common task in most gene expression studies. Random forest, a powerful classification approach, has been applied in this field and shows excellent performance compared with other classification methods. The variable importance measure is the key to gene selection with random forest. However, existing methods consider only the original variable importance measure, which is based on the out-of-bag (OOB) error. In this paper, we propose a new variable importance measure based on differences in the proximity matrix and use it for gene selection on DNA microarray data. Compared with the existing variable importance analysis of random forest, the new method is more sensitive to informative genes and yields small sets of genes while preserving predictive accuracy.

Keywords
 Gene selection, Random forest, Variable importance, Proximity matrix
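
The abstract does not state the formula for the proposed criterion, so the following is only a minimal sketch of the two ingredients it names. The proximity of two samples is computed as it is commonly defined for random forests: the fraction of trees in which both samples reach the same terminal node. The difference score (mean absolute entrywise difference between two proximity matrices), the function names, and the toy leaf assignments are all assumptions made for illustration, not the authors' definition.

```python
from itertools import combinations


def proximity_matrix(leaves):
    """Proximity matrix from leaf assignments.

    leaves[i][t] is the id of the terminal node that sample i reaches in
    tree t. Entry (i, j) is the fraction of trees in which samples i and j
    share a terminal node.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        prox[i][i] = 1.0  # a sample always shares a leaf with itself
    for i, j in combinations(range(n), 2):
        shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def proximity_difference(prox_a, prox_b):
    """Mean absolute entrywise difference (an assumed, illustrative score)."""
    n = len(prox_a)
    total = sum(
        abs(prox_a[i][j] - prox_b[i][j]) for i in range(n) for j in range(n)
    )
    return total / (n * n)


# Hypothetical leaf assignments for 3 samples and 4 trees, before and
# after perturbing one gene (e.g. by permuting its values).
leaves_original = [[0, 1, 2, 0], [0, 1, 3, 0], [5, 4, 3, 1]]
leaves_perturbed = [[0, 1, 2, 0], [1, 1, 3, 2], [5, 4, 3, 1]]

prox_original = proximity_matrix(leaves_original)
prox_perturbed = proximity_matrix(leaves_perturbed)
score = proximity_difference(prox_original, prox_perturbed)
```

Under this reading, a gene whose perturbation changes the proximity matrix more would be ranked as more important. In practice the leaf assignments would come from a trained forest rather than being written by hand.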

Qifeng Zhou, Wencai Hong, Linkai Luo, Fan Yang, "Gene Selection Using Random Forest and Proximity Differences Criterion on DNA Microarray Data", JCIT, Vol. 5, No. 6, pp. 161-170, 2010