Centile estimation comprises methods for estimating the age-related distribution of human growth.
The standard estimation of centile curves involves two continuous variables:
- the response variable, that is, the variable of interest for which we want the centile curves, e.g. weight, BMI or head circumference, and
- the explanatory variable ‘age’.
The 100p centile of a random variable Y is the value y such that P(Y <= y) = p, i.e. y = inv.cdf(p), where inv.cdf() is the inverse cumulative distribution function of Y applied to p. (Note that inv.cdf() corresponds to the q function in R.) Here we consider the conditional centile of Y given the explanatory variable ‘age’. By varying ‘age’, a 100p centile curve of y against ‘age’ is obtained. Centile curves can be obtained for different values of p. The World Health Organisation uses 100p = (3, 15, 50, 85, 97) in its charts and 100p = (1, 3, 5, 15, 25, 50, 75, 85, 95, 97, 99) in its tables.
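As a minimal sketch of the definition above, the following Python snippet computes the WHO chart centiles from an inverse cdf and traces a centile curve by varying age. It is an illustration only: the response is assumed to be normal, and the mean and standard deviation functions of age are made-up numbers, not fitted growth data. The names `conditional_centiles` and `centile_curve` are hypothetical helpers; `NormalDist.inv_cdf` from the Python standard library plays the role of inv.cdf().

```python
from statistics import NormalDist

# WHO chart centiles quoted in the text, as percentages (100p).
WHO_CHART_CENTILES = (3, 15, 50, 85, 97)


def conditional_centiles(age, centiles=WHO_CHART_CENTILES):
    """Return {100p: y} for a hypothetical normal response at the given age."""
    # Assumption: made-up mean and spread as functions of age (not real data).
    mean = 3.5 + 0.6 * age
    sd = 0.4 + 0.05 * age
    dist = NormalDist(mu=mean, sigma=sd)
    # inv_cdf(p) is the value y such that P(Y <= y) = p.
    return {c: dist.inv_cdf(c / 100) for c in centiles}


def centile_curve(ages, centile):
    """Trace a single 100p centile curve by varying age."""
    return [conditional_centiles(a, (centile,))[centile] for a in ages]


if __name__ == "__main__":
    for age in (0, 1, 2):
        print(age, conditional_centiles(age))
```

In practice the conditional distribution would be estimated from data rather than assumed, which is what the methods described next are for.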
This can be extended to more than one explanatory variable.
The methodology for creating growth centile references for individuals from a population relies mainly on two methods:
- the nonparametric method of quantile regression (Koenker and Bassett, 1978; He and Ng, 1999; Koenker, 2005; Koenker and Ng, 2005; Ng and Maechler, 2007), and
- the parametric LMS (i.e. Lambda, Mu and Sigma) method of Cole (1988) and Cole and Green (1992), and its extensions; see, for example, Wright and Royston (1997), van Buuren and Fredriks (2001), and Rigby and Stasinopoulos (2004, 2006).
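To make the first of these two approaches concrete: quantile regression estimates a centile by minimising the check (pinball) loss of Koenker and Bassett. The Python sketch below shows only the simplest case, an intercept-only model with no ‘age’ effect, where the minimiser is a sample quantile. The function names and the data values are hypothetical, chosen purely for illustration. It is not how quantSheets() or any R package implements the method.

```python
def check_loss(residual, p):
    """Check function: p * r for r >= 0 and (p - 1) * r for r < 0."""
    return residual * (p - (1.0 if residual < 0 else 0.0))


def fit_constant_quantile(ys, p):
    """Intercept-only quantile regression by brute-force search."""
    # For an intercept-only model a minimiser always lies at a data point,
    # so searching over the observations is enough for this sketch.
    return min(ys, key=lambda q: sum(check_loss(y - q, p) for y in ys))


# Made-up response values (e.g. weights in kg), for illustration only.
weights = [3.1, 3.4, 2.9, 3.8, 3.3, 4.0, 3.6, 3.2, 3.5]

if __name__ == "__main__":
    print(fit_constant_quantile(weights, 0.5))  # sample median
```

A full quantile regression replaces the constant by a smooth function of age and minimises the same loss over that function.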
The LMS method and its extensions are a subclass of GAMLSS. Using the LMS method within the gamlss() function is equivalent to assuming the Box-Cox Cole and Green (BCCG) distribution for the response variable. The BCCG distribution is suitable for both positively and negatively skewed data but does not cope with kurtotic data.
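The way the LMS parameters turn into centiles can be sketched with the classical formula of Cole's method: for a standard normal quantile z, the centile is mu * (1 + lambda * sigma * z)^(1/lambda) when lambda is non-zero, and mu * exp(sigma * z) when lambda is zero. The Python snippet below is a minimal sketch under that assumption. It ignores the truncation adjustment mentioned for the extended distributions, so it need not match the gamlss implementation exactly. The function names and the parameter values in the example are hypothetical.

```python
import math
from statistics import NormalDist


def lms_centile(p, lam, mu, sigma):
    """Centile y of the LMS model for a probability p with 0 < p < 1."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile z_p
    if abs(lam) < 1e-12:
        # Limiting case lambda -> 0: log-normal type transformation.
        return mu * math.exp(sigma * z)
    return mu * (1.0 + lam * sigma * z) ** (1.0 / lam)


def lms_zscore(y, lam, mu, sigma):
    """Inverse transformation: z-score of a positive observation y."""
    if abs(lam) < 1e-12:
        return math.log(y / mu) / sigma
    return ((y / mu) ** lam - 1.0) / (lam * sigma)


if __name__ == "__main__":
    # Made-up parameter values at a single age, for illustration only.
    lam, mu, sigma = -1.6, 16.0, 0.08
    for c in (3, 15, 50, 85, 97):
        print(c, round(lms_centile(c / 100, lam, mu, sigma), 2))
```

In a fitted model lambda, mu and sigma are smooth functions of age, so evaluating `lms_centile` along a grid of ages with the fitted values at each age gives one centile curve per p.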
Rigby and Stasinopoulos (2004, 2006) extended the LMS method by introducing the Box-Cox power exponential (BCPE) and the Box-Cox t (BCT) distributions, and called the resulting methods LMSP and LMST, respectively. The BCPE assumes that the transformed random variable Z has a (truncated) power exponential distribution, while the BCT assumes that Z has a (truncated) t distribution.
More recently, Hossain et al. (2016) developed methodology to deal with response variables defined on [0,1], including zero and one. These models can be fitted in GAMLSS.
The functions lms() and quantSheets() are available in the gamlss package for fitting the LMS method and quantile regression, respectively. More information on how to use these functions can be found in Chapter 13 of the book ‘Flexible Regression and Smoothing: Using GAMLSS in R’.