|Title:||A relevant s-matching classifier for the covariate shift machine learning problem||Other Titles:||Relevant S-Matching in covariate shift||Authors:||Yatracos, Yannis G.||Keywords:||Covariate Shift Problem;Machine Learning;Matching;Minimal Sufficient Statistic||Category:||Economics and Business||Field:||Social Sciences||Issue Date:||23-Jun-2014||Abstract:||In Machine Learning (ML), the x-covariate and its y-label may have different joint probability distributions in the learning (also called training) and the test populations. This situation occurs often in practice and is studied, among others, in the covariate shift problem (Bickel et al., 2007), sample selection bias (Zadrozny, 2004), domain adaptation (Daumé and Marcu, 2006) and distance ML (Cao et al., 2009). For any loss l, the classifier minimizing, over a collection of classifiers of interest d ∈ D, the (statistical) risk El(d(x), y) in the training population may not be the risk's minimizer over D in the test population; see, for example, Bickel et al. (2007, section 2). An additional problem not addressed is whether the whole learning population or its available subset (called learning "data" or sample) is relevant when obtaining a classifier for the test population or the test data. Both problems are solved herein using tools from causal inference, the minimal sufficient statistic S or equivalently ratios of generalized propensity scores (Yatracos, 2011), to identify relevant "matching" groups of x-covariates; x1 and x2 belong to the same matching group when S(x1) ≈ S(x2). Due to sufficiency, the conditional risks on each S-matching group coincide for the learning and the test distributions of (x, y) and are minimized by the same classifier that is used to predict the y-label in the test population. When D consists, for example, of linear classifiers in x, the classifier obtained herein via matching will consist of piecewise linear classifiers, one for each matching group.
This approach solves directly the problem of obtaining the same piecewise classifier both for the training and the test populations and reduces the mean square error. The above description is now supplemented with the comments of a reader trained in ML: "the basic idea of the paper seems sensible: learn localized models for different regions of the input space, as defined by similarity to the test distribution. Then, pick the appropriate model for a given test example, and use this to make a prediction." Previously, S(x) was used as a weight to adjust the log-likelihood function for covariate shift and improve predictive inference (Shimodaira, 2000, p. 231); x-covariates with the same S-value have equal "importance" (see, for example, Shimodaira, 2000 or Zadrozny, 2004). More recently, S has been used to adjust the loss function l to the randomized l* = Sl and obtain the same optimal classifier in the l*-risk and l-risk minimization problems (Bickel et al., 2007). In applications with learning and test data, conditional risk minimization via S allows for the use of learning data relevant to the test data and reduces potential sampling bias as well as the intensity of the optimization problem when the size of the training data and the covariates' dimension are large. With k (> 1) learning (x, y)-populations and the test population, the use of Shimodaira's S factor is not possible unless the mixture distribution of the learning populations is available. In this case, for several related "tasks", i.e. parameters in the densities of the learning populations, Bickel et al. (2009) provided for task t the Shimodaira-type weight r_t(x, y) and its estimate, in order to "train a hypothesis for task t by minimizing the expected loss over the distributions of all tasks", i.e. for the learning mixture distribution. It is seen herein that r_t is the minimal sufficient statistic for the test distribution of task t and the learning mixture distribution.
When the learning mixture distribution is unknown but the covariate shift distributional assumption holds for the (k+1) populations, the k-dimensional minimal sufficient statistic S is used to obtain matching groups of x-covariates and the corresponding classifiers. With learning samples, conditional risk minimization on S-matched groups pooled together from all learning populations is used to obtain the corresponding classifiers as in the case k = 1. This matching approach has been recently used with multiple treatments ("tasks" in ML), when the data is obtained from an observational study (Yatracos, 2011). For the interested reader, a recent review on matching, propensity scores and causal inference is presented in Stuart (2010). In sections 2-4, results are presented for k = 1; a brief description of the results for k > 1 is in section 5.||URI:||http://ktisis.cut.ac.cy/handle/10488/3547||Type:||Article|
|Appears in Collections:||Άρθρα/Articles|
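
The matching idea summarized in the abstract for k = 1 (group x-covariates with S(x1) ≈ S(x2), then fit one linear classifier per group) can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions made only for illustration: the covariate is one-dimensional, S(x) is taken to be the ratio of the test to the training covariate density, both densities are Gaussian with known parameters, matching groups are formed from quantile bins of S on the training sample, and each linear classifier is fitted by least squares on labels coded as -1/+1. The names `density_ratio`, `fit_matched_classifiers` and `predict` are hypothetical.

```python
import numpy as np


def density_ratio(x, mu_train=0.0, mu_test=1.0, sigma=1.0):
    """Assumed S(x): test density over training density (both Gaussian, known)."""
    log_test = -0.5 * ((x - mu_test) / sigma) ** 2
    log_train = -0.5 * ((x - mu_train) / sigma) ** 2
    return np.exp(log_test - log_train)


def fit_matched_classifiers(x_train, y_train, s_train, n_groups=4):
    """Fit one least-squares linear classifier per S-matching group."""
    # Interior quantiles of S on the training sample define the group edges.
    edges = np.quantile(s_train, np.linspace(0.0, 1.0, n_groups + 1))[1:-1]
    groups = np.searchsorted(edges, s_train)  # group index in 0..n_groups-1
    models = {}
    for g in range(n_groups):
        mask = groups == g
        # Design matrix with an intercept column and the covariate.
        design = np.column_stack([np.ones(mask.sum()), x_train[mask]])
        coef, *_ = np.linalg.lstsq(design, y_train[mask], rcond=None)
        models[g] = coef
    return edges, models


def predict(x_test, s_test, edges, models):
    """Pick the model of the matching group of each test point, then classify."""
    groups = np.searchsorted(edges, s_test)
    coefs = np.array([models[g] for g in groups])
    scores = coefs[:, 0] + coefs[:, 1] * x_test
    return np.where(scores >= 0, 1, -1)


# Synthetic data: training covariates centred at 0, test covariates at 1,
# with the same labelling rule in both populations (covariate shift).
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=400)
y_train = np.where(x_train + 0.3 * rng.normal(size=400) > 0.5, 1, -1)
x_test = rng.normal(1.0, 1.0, size=200)

edges, models = fit_matched_classifiers(x_train, y_train, density_ratio(x_train))
predictions = predict(x_test, density_ratio(x_test), edges, models)
```

In practice the densities are unknown and S would have to be estimated; the number of groups is also a tuning choice that the sketch fixes arbitrarily at four.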
Page view(s) 5032
checked on Nov 24, 2017
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.