Efficient algorithms for computing the best subset regression models for large-scale problems

Hofmann, Marc H.; Gatu, Cristian; Kontoghiorghes, Erricos John

doi:10.1016/j.csda.2007.03.017

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/2017

Title:	Efficient algorithms for computing the best subset regression models for large-scale problems
Authors:	Hofmann, Marc H.; Gatu, Cristian; Kontoghiorghes, Erricos John
Major Field of Science:	Social Sciences
Keywords:	Branch and bound algorithms;Decision trees;Regression analysis;Mathematical models
Issue Date:	15-Sep-2007
Source:	Computational Statistics and Data Analysis, 2007, vol. 52, no. 1, pp. 16-29.
Volume:	52
Issue:	1
Start page:	16
End page:	29
Journal:	Computational Statistics and Data Analysis
Abstract:	Several strategies for computing the best subset regression models are proposed. Some of the algorithms are modified versions of existing regression-tree methods, while others are new. The first algorithm selects the best subset models within a given size range. It uses a reduced search space and is found to computationally outperform the existing branch-and-bound algorithm. The properties and computational aspects of the proposed algorithm are discussed in detail. The second new algorithm preorders the variables inside the regression tree. A radius is defined to measure the distance of a node from the root of the tree. The algorithm applies the preordering to all nodes whose distance from the root is smaller than a radius given a priori. An efficient method of preordering the variables is employed. The experimental results indicate that the algorithm performs best when preordering is employed on a radius of between one quarter and one third of the number of variables. The algorithm has been applied with such a radius to tackle large-scale subset-selection problems that are considered to be computationally infeasible by conventional exhaustive-selection methods. A class of new heuristic strategies is also proposed. The most important of these is one that assigns a different tolerance value to each subset model size. With different kinds of tolerances, this strategy is equivalent to all exhaustive and heuristic subset-selection strategies. In addition, the strategy can be used to investigate submodels having noncontiguous size ranges. Its implementation provides a flexible tool for tackling large-scale models.
URI:	https://hdl.handle.net/20.500.14279/2017
ISSN:	0167-9473
DOI:	10.1016/j.csda.2007.03.017
Rights:	© Elsevier
Type:	Article
Affiliation:	Cyprus University of Technology
Affiliation:	University of Cyprus
Publication Type:	Peer Reviewed
Appears in Collections:	Άρθρα/Articles
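
The branch-and-bound idea with per-size tolerances summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it refits each node with ordinary least squares (`numpy.linalg.lstsq`) rather than updating a matrix factorization, it omits the preordering step and the radius, and the cutting rule `(1 + tol[m]) * bound >= best[m]` is an assumed form of the tolerance test. The names `rss`, `best_subsets`, `brute_force` and `tol` are hypothetical.

```python
import itertools
import math

import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on X[:, cols]."""
    A = X[:, list(cols)]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subsets(X, y, tol=None):
    """Return {size: (rss, columns)} for sizes 1..n via branch and bound.

    tol[m] is an assumed per-size tolerance (index 0 is unused).
    All zeros (the default) gives an exhaustive, exact search.
    """
    n = X.shape[1]
    tol = [0.0] * (n + 1) if tol is None else list(tol)
    best = {m: (math.inf, None) for m in range(1, n + 1)}

    def visit(cols, k):
        # Node: tuple of variables; the first k can no longer be dropped.
        m = len(cols)
        r = rss(X, y, cols)
        if r < best[m][0]:
            best[m] = (r, cols)
        for i in range(k, m):
            # Child i drops cols[i]. Its subtree contains models of sizes
            # i..m-1, all subsets of cols, so each has RSS >= r.
            sizes = range(max(i, 1), m)
            if all((1.0 + tol[j]) * r >= best[j][0] for j in sizes):
                continue  # cut: nothing below can improve enough
            visit(cols[:i] + cols[i + 1:], i)

    visit(tuple(range(n)), 0)
    return best


def brute_force(X, y):
    """Reference answer: smallest RSS for every size by full enumeration."""
    n = X.shape[1]
    return {
        m: min(rss(X, y, c) for c in itertools.combinations(range(n), m))
        for m in range(1, n + 1)
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 6))
    y = X[:, [0, 3]] @ np.array([2.0, -1.0]) + rng.normal(size=40)
    for size, (value, cols) in best_subsets(X, y).items():
        print(size, cols, round(value, 3))
```

The cut is valid because adding variables never increases the residual sum of squares, so the fit at a node bounds every submodel beneath it. With nonzero tolerances the search becomes heuristic: under the assumed rule, the model returned for size m has a residual sum of squares at most (1 + tol[m]) times the optimum for that size.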


SCOPUS citations:	41 (checked on Nov 9, 2023)
Web of Science citations:	36; last week: 0; last month: 0 (checked on Oct 29, 2023)
Page views:	558; last week: 0; last month: 6 (checked on Dec 22, 2024)


This item is licensed under a Creative Commons License
