Robustness against separation and outliers in binary regression
Peter J. Rousseeuw and Andreas Christmann
Date: 05/OCT/2001
Abstract
The logistic regression model is commonly used to describe the effect of one or several
explanatory variables on a binary response variable.
Here we consider an alternative model under which the observed response is strongly
related to, but not equal to, the unobservable true response.
We call this the hidden logistic regression (HLR) model because
the unobservable true responses are comparable to a
hidden layer in a feedforward neural net.
We propose the maximum estimated likelihood method in this model,
which, unlike existing methods for logistic regression,
is robust against separation.
We also consider outlier-robust estimation in this setting.
Keywords: Logistic regression; Hidden layer; Overlap; Robustness.
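
To illustrate why separation matters, the following is a minimal sketch and not the estimator proposed in the paper. It assumes that each observed 0/1 response is replaced by a pseudo-response slightly inside (0, 1), and that the usual logistic log-likelihood is then maximized with these pseudo-responses as targets. The constants DELTA0 and DELTA1, the helper fit_logistic, and the toy data are all hypothetical choices made for illustration. On perfectly separated data the slope of the ordinary fit keeps growing with the number of iterations, whereas the fit to the pseudo-responses settles at a finite value.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, targets, steps, lr=0.1):
    """Gradient ascent on sum(t*log(p) + (1-t)*log(1-p)), p = sigmoid(a + b*x).

    Targets may be 0/1 responses or values strictly between 0 and 1.
    """
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, t in zip(xs, targets):
            r = t - sigmoid(a + b * x)  # residual: target minus fitted probability
            grad_a += r
            grad_b += r * x
        a += lr * grad_a
        b += lr * grad_b
    return a, b


# Toy data with complete separation: every x < 0 has y = 0, every x > 0 has y = 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# Assumed (hypothetical) pseudo-response levels, chosen only for this sketch.
DELTA0, DELTA1 = 0.01, 0.99
pseudo = [DELTA1 if y == 1 else DELTA0 for y in ys]

for steps in (5000, 50000):
    _, slope_raw = fit_logistic(xs, ys, steps)
    _, slope_pseudo = fit_logistic(xs, pseudo, steps)
    print(steps, round(slope_raw, 3), round(slope_pseudo, 3))
```

The raw slope differs noticeably between the two iteration counts because the ordinary likelihood has no finite maximizer under separation; the pseudo-response slope is essentially the same in both runs.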