A relevant s-matching classifier for the covariate shift machine learning problem

Yatracos, Yannis G.

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14279/3388

Title:	A relevant s-matching classifier for the covariate shift machine learning problem
Other Titles:	Relevant S-Matching in covariate shift
Authors:	Yatracos, Yannis G.
Other Contributors:	Γιατράκος, Γιάννης
Major Field of Science:	Social Sciences
Field Category:	Economics and Business
Keywords:	Covariate Shift Problem;Machine Learning;Matching;Minimal Sufficient Statistic
Issue Date:	23-Ιου-2014
Abstract:	In Machine Learning (ML), the x-covariate and its y-label may have different joint probability distributions in the learning (also called training) and the test populations. This situation occurs often in practice and is studied, among others, in the covariate shift problem (Bickel et al., 2007), sample selection bias (Zadrozny, 2004), domain adaptation (Daumé and Marcu, 2006) and distance ML (Cao et al., 2009). For any loss l, the classifier minimizing, over a collection of classifiers of interest d ∈ D, the (statistical) risk E l(d(x), y) in the training population may not be the risk's minimizer over D in the test population; see, for example, Bickel et al. (2007, section 2). An additional problem not addressed is whether the whole learning population or its available subset (called learning "data" or sample) is relevant when obtaining a classifier for the test population or the test data. Both problems are solved herein using tools from causal inference, the minimal sufficient statistic S or equivalently ratios of generalized propensity scores (Yatracos, 2011), to identify relevant "matching" groups of x-covariates; x1 and x2 belong to the same matching group when S(x1) ≈ S(x2). Due to sufficiency, the conditional risks on each S-matching group coincide for the learning and the test distributions of (x, y) and are minimized by the same classifier, which is used to predict the y-label in the test population. When D consists, for example, of linear classifiers in x, the classifier obtained herein via matching will consist of piecewise linear classifiers, one for each matching group. This approach solves directly the problem of obtaining the same piecewise classifier both for the training and the test populations and reduces the mean square error.
The above description is now supplemented with the comments of a reader trained in ML: "the basic idea of the paper seems sensible: learn localized models for different regions of the input space, as defined by similarity to the test distribution. Then, pick the appropriate model for a given test example, and use this to make a prediction." Previously, S(x) was used as a weight to adjust the log-likelihood function for covariate shift and improve predictive inference (Shimodaira, 2000, p. 231); x-covariates with the same S-value have equal "importance" (see, for example, Shimodaira, 2000 or Zadrozny, 2004). More recently, S has been used to adjust the loss function l to the randomized loss Sl and obtain the same optimal classifier in the Sl-risk and l-risk minimization problems (Bickel et al., 2007). In applications with learning and test data, conditional risk minimization via S allows for the use of learning data relevant to the test data and reduces potential sampling bias as well as the intensity of the optimization problem when the size of the training data and the covariates' dimension are large. With k (> 1) learning (x, y)-populations and the test population, the use of Shimodaira's S factor is not possible unless the mixture distribution of the learning populations is available. In this case, for several related "tasks", i.e. parameters in the densities of the learning populations, Bickel et al. (2009) provided for task t the Shimodaira-type weight r_t(x, y) and its estimate, in order to "train a hypothesis for task t by minimizing the expected loss over the distributions of all tasks", i.e. for the learning mixture distribution. It is seen herein that r_t is the minimal sufficient statistic for the test distribution of task t and the learning mixture distribution.
When the learning mixture distribution is unknown but the covariate shift distributional assumption holds for the (k+1) populations, the k-dimensional minimal sufficient statistic S is used to obtain matching groups of x-covariates and the corresponding classifiers. With learning samples, conditional risk minimization on S-matched groups pooled together from all learning populations is used to obtain the corresponding classifiers as in the case k = 1. This matching approach has been recently used with multiple treatments ("tasks" in ML), when the data is obtained from an observational study (Yatracos, 2011). For the interested reader, a recent review on matching, propensity scores and causal inference is presented in Stuart (2010). In sections 2-4, results are presented for k = 1; a brief description of the results for k > 1 is in section 5.
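
The matching idea described in the abstract for k = 1 can be illustrated with a minimal Python sketch. It is not the paper's implementation; it only shows the general pattern of grouping covariates by S and fitting one linear classifier per group. Everything specific in it is an assumption made for illustration: the training and test covariates are one-dimensional Gaussians with known means and a common variance (so that log S has a closed form), the matching groups are quantile bins of log S, the linear classifiers are fitted by least squares on labels in {-1, +1}, and all function names are hypothetical. In practice S would have to be estimated from the data.

```python
import numpy as np

def log_density_ratio(x, mu_train=0.0, mu_test=1.0, sigma=1.0):
    """log S(x) = log p_test(x) - log p_train(x) for two Gaussians with a
    common variance (assumed known here purely for illustration)."""
    return ((x - mu_train) ** 2 - (x - mu_test) ** 2) / (2.0 * sigma ** 2)

def fit_linear(x, y):
    """Least-squares linear classifier on labels in {-1, +1}.
    Returns the array [slope, intercept]."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y.astype(float), rcond=None)
    return coef

def predict_linear(coef, x):
    """Sign of the fitted linear score, mapped to {-1, +1}."""
    return np.where(coef[0] * x + coef[1] >= 0.0, 1, -1)

def fit_s_matching(x_train, y_train, n_groups=4):
    """Form matching groups as quantile bins of log S on the training data
    and fit one linear classifier per group."""
    s = log_density_ratio(x_train)
    # Interior quantile edges: n_groups - 1 cut points.
    edges = np.quantile(s, np.linspace(0.0, 1.0, n_groups + 1)[1:-1])
    groups = np.digitize(s, edges)          # group index in 0..n_groups-1
    fallback = fit_linear(x_train, y_train)  # used if a group is too small
    coefs = []
    for g in range(n_groups):
        mask = groups == g
        coefs.append(fit_linear(x_train[mask], y_train[mask])
                     if mask.sum() >= 2 else fallback)
    return edges, coefs

def predict_s_matching(model, x):
    """Assign each x to its matching group via log S, then apply that
    group's classifier."""
    edges, coefs = model
    groups = np.digitize(log_density_ratio(x), edges)
    y_hat = np.empty(x.shape, dtype=int)
    for g, coef in enumerate(coefs):
        mask = groups == g
        y_hat[mask] = predict_linear(coef, x[mask])
    return y_hat

# Synthetic data under covariate shift: p(x) differs, p(y | x) is shared.
rng = np.random.default_rng(0)

def sample(mu, n):
    x = rng.normal(mu, 1.0, n)
    p = 1.0 / (1.0 + np.exp(-2.0 * x))      # shared conditional P(y = 1 | x)
    y = np.where(rng.random(n) < p, 1, -1)
    return x, y

x_train, y_train = sample(0.0, 2000)
x_test, y_test = sample(1.0, 500)

model = fit_s_matching(x_train, y_train, n_groups=4)
y_hat = predict_s_matching(model, x_test)
print("test accuracy:", np.mean(y_hat == y_test))
```

With equal-variance Gaussians, log S is monotone in x, so the matching groups in this toy setting are simply intervals of x; with higher-dimensional covariates, points far apart in x can share a group when their S-values are close.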
URI:	https://hdl.handle.net/20.500.14279/3388
Type:	Article
Affiliation:	Cyprus University of Technology
Appears in Collections:	Άρθρα/Articles

Files in This Item:

File	Description	Size	Format
matchingandml14Aweb.pdf		91.84 kB	Adobe PDF
