THE INTERACTIVE DECISION COMMITTEE FOR CHEMICAL TOXICITY ANALYSIS
Keywords:
Chemical toxicity, Decision committee method, Ensemble, Ensemble feature selection, QSAR modeling, Statistical learning
Abstract
We introduce the Interactive Decision Committee method for classification when high-dimensional
feature variables are grouped into feature categories. The proposed method
uses the interactive relationships among feature categories to build base classifiers, which
are then combined using decision committees. A two-stage or a single-stage 5-fold cross-validation
procedure determines the total number of base classifiers to be combined.
The proposed procedure is useful for classifying biochemicals on the basis of toxicity activity,
where the feature space consists of chemical descriptors and the responses are binary
indicators of toxicity activity. Each descriptor belongs to at least one descriptor category.
Support vector machines, random forests, and tree-based AdaBoost are used as classifier
inducers. Given the number of base classifiers, forward selection chooses the best combination
of base classifiers. Simulation studies demonstrate
that the proposed method outperforms a single large, unaggregated classifier in the presence
of interactive feature category information. We applied the proposed method to two
toxicity data sets associated with chemical compounds. For these data sets, the proposed
method improved classification performance for the majority of outcomes compared to a
single large, unaggregated classifier.
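The abstract does not state the committee's combination rule or the exact cross-validation layout, so the following Python sketch rests on stated assumptions: each base classifier's 0/1 toxicity predictions are already available out-of-sample in a matrix `pred_matrix`, members are combined by simple majority vote (ties go to class 1), and folds are unshuffled strides over the compounds. It illustrates forward selection of committee members for a fixed size k and a single-stage 5-fold cross-validation over k. All function names are hypothetical, training of the SVM, random forest, or AdaBoost base classifiers is not shown, and this is not the authors' implementation.

```python
from typing import Dict, List, Sequence

# pred_matrix[j][i]: 0/1 prediction of (hypothetical) base classifier j on compound i,
# assumed to have been computed out-of-sample beforehand.
Matrix = Sequence[Sequence[int]]


def majority_vote(preds: Matrix, rows: Sequence[int]) -> List[int]:
    """Committee label per row: 1 if at least half of the members vote 1 (ties -> 1, an assumption)."""
    return [1 if 2 * sum(p[i] for p in preds) >= len(preds) else 0 for i in rows]


def committee_accuracy(pred_matrix: Matrix, members: Sequence[int],
                       y: Sequence[int], rows: Sequence[int]) -> float:
    """Fraction of `rows` on which the majority vote of `members` matches y."""
    votes = majority_vote([pred_matrix[j] for j in members], rows)
    return sum(v == y[i] for v, i in zip(votes, rows)) / len(rows)


def forward_select(pred_matrix: Matrix, y: Sequence[int],
                   rows: Sequence[int], k: int) -> List[int]:
    """Greedily add, k times, the unused base classifier that maximizes committee accuracy on `rows`."""
    members: List[int] = []
    candidates = list(range(len(pred_matrix)))
    for _ in range(k):
        # max() keeps the first (lowest-index) candidate among ties
        best = max(candidates,
                   key=lambda j: committee_accuracy(pred_matrix, members + [j], y, rows))
        members.append(best)
        candidates.remove(best)
    return members


def choose_committee_size(pred_matrix: Matrix, y: Sequence[int],
                          max_k: int, n_folds: int = 5) -> int:
    """Single-stage n_folds cross-validation over committee sizes k = 1..max_k."""
    n = len(y)  # requires n >= n_folds and max_k <= len(pred_matrix)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]  # unshuffled stride folds
    totals: Dict[int, float] = {k: 0.0 for k in range(1, max_k + 1)}
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        # Greedy selection is deterministic and nested: the first k picks form the size-k committee.
        order = forward_select(pred_matrix, y, train, max_k)
        for k in totals:
            totals[k] += committee_accuracy(pred_matrix, order[:k], y, held_out)
    # Highest total (equivalently, mean) held-out accuracy; ties go to the smallest k.
    return max(totals, key=lambda k: totals[k])
```

Because greedy forward selection is deterministic, running it once up to `max_k` per fold and scoring each prefix gives the same committees as rerunning it for every k, which keeps the cross-validation loop cheap. A two-stage variant would, for example, nest a second cross-validation inside each training fold; that is not shown here.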