Graph-based strategies for performing the exhaustive and random k-fold cross-validations
Journal
Journal of Computational and Graphical Statistics
Date Issued
2009
Author(s)
DOI
10.1198/jcgs.2009.08019
Abstract
An efficient graph-based strategy for performing the exhaustive k-fold cross-validation procedure is proposed. All training (and testing) subsets are represented as nodes of a complete weighted graph. The arcs between the nodes indicate the different possibilities for deriving the solution of the destination node given the solution of the source node. The weights of the arcs represent the complexities of (the numerical operations involved in) updating and downdating the corresponding data matrices. The complete graph with arcs connecting every pair of nodes is defined and its properties are investigated. The optimum way of performing the exhaustive k-fold cross-validation is equivalent to deriving the path within the graph that has the minimum computational complexity. Furthermore, a generalization of the complete k-fold cross-validation graph is used to derive new strategies for performing random k-fold cross-validations. The proposed strategies generate additional nodes during the computations, which are part of the generalized graph. The additional nodes represent new models which were not required initially, but which provide additional information about the evaluated model. The advantages and the drawbacks of the proposed strategies are discussed. Numerical results are presented and analyzed. Finally, the computation of all nearest neighbors of a given node is also considered. The Fortran 90 source code for the algorithms in the manuscript is available online.
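The graph described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or their Fortran 90 code; it is a toy Python model under stated assumptions: each node is a training subset obtained by leaving m of n observations out, the arc weight is simply the number of observations added (update) plus the number removed (downdate), and the visiting order is chosen by a greedy nearest-neighbor heuristic rather than by a provably minimum-cost path. The names `training_subsets`, `arc_weight`, `greedy_path`, and `path_cost` are hypothetical.

```python
from itertools import combinations


def training_subsets(n, m):
    """All training index sets obtained by leaving m of n observations out."""
    everything = frozenset(range(n))
    return [everything - frozenset(test) for test in combinations(range(n), m)]


def arc_weight(source, dest):
    """Assumed cost model: rows to add (update) plus rows to remove (downdate)."""
    return len(dest - source) + len(source - dest)


def greedy_path(nodes):
    """Visit every node once, always moving to the cheapest unvisited node.

    This is a heuristic stand-in for a minimum-complexity path, not an
    optimal solver. Ties are broken by the sorted index list so the result
    is deterministic.
    """
    path = [nodes[0]]
    remaining = set(nodes[1:])
    while remaining:
        current = path[-1]
        nxt = min(remaining, key=lambda node: (arc_weight(current, node), sorted(node)))
        path.append(nxt)
        remaining.remove(nxt)
    return path


def path_cost(path):
    """Total weight of the arcs traversed along the path."""
    return sum(arc_weight(a, b) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    nodes = training_subsets(n=4, m=2)
    path = greedy_path(nodes)
    print("enumeration-order cost:", path_cost(nodes))
    print("greedy-order cost:     ", path_cost(path))
```

Comparing the two printed costs shows why the visiting order matters: consecutive training subsets that share most of their observations need only a few update and downdate steps, whereas an arbitrary order may force larger changes between successive models.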

