Statistical Modelling 14 (6) (2014), 523–547

Random forest and variable importance rankings for correlated survival data, with applications to tooth loss

M.J. Hallett
Computational Science Research Center,
San Diego State University,
San Diego, CA,
USA


J.J. Fan
Department of Mathematics and Statistics,
San Diego State University,
San Diego, CA,
USA
e-mail: jjfan@mail.sdsu.edu


X.G. Su
Department of Mathematical Sciences,
University of Texas,
El Paso, TX,
USA


R.A. Levine
Department of Mathematics and Statistics,
San Diego State University,
San Diego, CA,
USA


M.E. Nunn
Department of Periodontology,
Creighton University,
Omaha, NE,
USA


Abstract:

Oral health is a significant issue for adults because of its relationship to quality of life, as well as to systemic health and well-being. Impaired oral health can lead to significant health problems, such as pain and infection. This article considers a tree-based method to assess tooth loss. In particular, a variable importance measure based on extremely randomized trees (Geurts et al., 2006) is proposed for correlated survival data, and is applied to the VA Dental Longitudinal Study. This new variable importance method aims to remove the bias in traditional random forest variable selection, which may favour input variables with more categories, as shown by Strobl et al. (2007). The multivariate exponential tree algorithm of Fan et al. (2009) is used to build trees, as it has superior prediction accuracy and computational efficiency compared to marginal and semiparametric frailty model-based trees (Nunn et al., 2011). Simulation studies for assessing various variable importance methods are presented. To limit the final number of prognostic groups, an amalgamation procedure is used to develop tooth prognostic groups from a forest of trees. The resulting prognosis rules and variable importance rankings may be used in clinical practice to increase tooth retention and establish rational treatment plans. By ranking the relative importance of various clinical and genetic factors for tooth loss, we are able to provide clinicians with critical information so that they can develop and implement an effective treatment plan.
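The selection bias attributed above to Strobl et al. (2007) can be illustrated with a small toy simulation. This is a minimal sketch under simplifying assumptions, not the authors' method: it uses an uncensored Gaussian response and reduction in sum of squared errors (SSE) as the split criterion instead of a survival splitting rule, and it ignores within-subject correlation. All function names (`split_gain`, `exhaustive_gain`, `random_cut_gain`, `selection_frequency`) are hypothetical. Two predictors, one binary and one with 20 levels, are both unrelated to the response. Searching every cutpoint gives the 20-level predictor many more chances to look useful by chance, whereas drawing a single cutpoint at random (a simplified version of the random-cutpoint idea behind extremely randomized trees) gives both predictors the same null distribution of gain.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_gain(x, y, cut):
    """Reduction in SSE from splitting observations into x <= cut and x > cut."""
    left = [yi for xi, yi in zip(x, y) if xi <= cut]
    right = [yi for xi, yi in zip(x, y) if xi > cut]
    if not left or not right:
        return 0.0
    return sse(y) - sse(left) - sse(right)


def exhaustive_gain(x, y):
    """Best gain over every candidate cutpoint (CART-style exhaustive search)."""
    cuts = sorted(set(x))[:-1]
    return max((split_gain(x, y, c) for c in cuts), default=0.0)


def random_cut_gain(x, y, rng):
    """Gain at one cutpoint drawn without looking at y (simplified random split)."""
    cuts = sorted(set(x))[:-1]
    if not cuts:
        return 0.0
    return split_gain(x, y, rng.choice(cuts))


def selection_frequency(n_trials=500, n=100, seed=1):
    """Share of trials in which the 20-level predictor beats the binary one."""
    rng = random.Random(seed)
    wins = {"exhaustive": 0, "random_cut": 0}
    for _ in range(n_trials):
        x_binary = [rng.randint(0, 1) for _ in range(n)]
        x_many = [rng.randint(0, 19) for _ in range(n)]
        # Null setting: the response is pure noise, unrelated to either predictor.
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if exhaustive_gain(x_many, y) > exhaustive_gain(x_binary, y):
            wins["exhaustive"] += 1
        if random_cut_gain(x_many, y, rng) > random_cut_gain(x_binary, y, rng):
            wins["random_cut"] += 1
    return {k: v / n_trials for k, v in wins.items()}


if __name__ == "__main__":
    print(selection_frequency())
```

Under exhaustive search the 20-level predictor should be preferred in most trials even though neither predictor carries any signal, while with a randomly drawn cutpoint it should be preferred in roughly half of them. The sketch only illustrates the direction of the bias; it says nothing about the magnitude of the bias for survival splitting rules.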

Keywords:

Random forest; variable importance; correlated survival data; prognostic rules; dental applications; VA Dental Longitudinal Study

Downloads:

Unfortunately, the data used for this article are unavailable.