Statistical Modelling 16 (3) (2016), 161–200

Regularized regression for categorical data

Gerhard Tutz
Department of Statistics,
Ludwig-Maximilians-Universität Munich,
Germany
e-mail: gerhard.tutz@stat.uni-muenchen.de

Jan Gertheiss
Institute of Applied Stochastics and Operations Research,
Clausthal University of Technology,
Germany


Abstract:

In the last two decades, regularization techniques, in particular penalty-based methods, have become very popular in statistical modelling. Driven by technological developments, most approaches have been designed for high-dimensional problems with metric variables, whereas categorical data has largely been neglected. In recent years, however, it has become clear that regularization is also very promising when modelling categorical data. A specific trait of categorical data is that many parameters are typically needed to model the underlying structure. This results in complex estimation problems that call for structured penalties which are tailored to the categorical nature of the data. This article gives a systematic overview of penalty-based methods for categorical data developed so far and highlights some issues where further research is needed. We deal with categorical predictors as well as models for categorical response variables. The primary interest of this article is to give insight into basic properties of and differences between methods that are important with respect to statistical modelling in practice, without going into technical details or extensive discussion of asymptotic properties.

Keywords:

boosting; categorical data; fused lasso; group lasso; multinomial model; proportional odds model; regression trees.

Downloads:

Example code in zipped archive.