Centile estimation includes methods for estimating the age-related distribution of human growth.
The standard estimation of centile curves involves two continuous variables:
- the response variable, that is, the variable for which we want to find the centile curves, e.g. weight, BMI or head circumference, and
- the explanatory variable, age.
The 100p centile of a random variable Y is the value y such that P(Y ≤ y) = p, i.e. y = q-function(p), where q-function() is the inverse cumulative distribution function of Y. Here we consider the conditional centile of Y given the explanatory variable X = x (usually age), written y(x). Plotting y(x) against x as x varies gives a 100p centile curve. Centile curves can be obtained for different values of p. The World Health Organisation uses 100p=(3, 15, 50, 85, 97) in its charts and 100p=(1, 3, 5, 15, 25, 50, 75, 85, 95, 97, 99) in its tables.
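The definition above can be illustrated with a minimal sketch in Python. It assumes, purely for illustration, that the response at one fixed age follows a normal distribution; the mean and standard deviation below are hypothetical values, not estimates from any data set, and the helper name centile() is our own.

```python
from statistics import NormalDist

def centile(dist, p):
    """Return the 100p centile of dist, i.e. the value y with P(Y <= y) = p."""
    return dist.inv_cdf(p)

# Hypothetical conditional distribution of weight (kg) at one fixed age.
weight_at_age = NormalDist(mu=10.0, sigma=1.2)

# The centiles used in the WHO charts, expressed as 100p.
who_chart_centiles = [3, 15, 50, 85, 97]
values = {c: centile(weight_at_age, c / 100) for c in who_chart_centiles}

for c, y in values.items():
    print(f"{c:>2}th centile: {y:.2f} kg")
```

Repeating this calculation at each age, with the distribution parameters changing smoothly with age, traces out one centile curve per value of p.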
The methodology can be extended to more than one explanatory variable.
The literature for creating growth centile references for individuals from a population comprises two different methods:
- the quantile regression methodology (Koenker and Bassett, 1978; Koenker, 2005; Koenker and Ng, 2005; He and Ng, 1999; Ng and Maechler, 2007), and
- the LMS (i.e. Lambda, Mu and Sigma) method of Cole (1988) and Cole and Green (1992), and its extensions; see Wright and Royston (1997), van Buuren and Fredriks (2001), and Rigby and Stasinopoulos (2004, 2006).
The LMS method, within GAMLSS, is equivalent to assuming the three-parameter Box-Cox Cole and Green (BCCG) distribution for the response variable and fitting smooth curves against age for μ, σ, and ν. The BCCG distribution is derived by assuming that the response variable Y is a specific function of a random variable Z which has a (truncated) normal distribution. The BCCG distribution is suitable for positively or negatively skewed data, depending on the value of the parameter ν.
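As a rough sketch of how centiles follow from the three parameters, the Python code below uses the usual LMS relation of Cole and Green (1992), z = ((y/μ)^ν − 1)/(νσ) for ν ≠ 0 and z = log(y/μ)/σ for ν = 0, and inverts it at the standard normal quantile z_p. It ignores the truncation of Z mentioned above, so it is only an approximation to the BCCG centile; the function names and the parameter values are hypothetical and are not part of gamlss.

```python
import math
from statistics import NormalDist

def bccg_zscore(y, mu, sigma, nu):
    """Transform a response y > 0 to the z scale (truncation of Z ignored)."""
    if abs(nu) < 1e-12:
        return math.log(y / mu) / sigma
    return ((y / mu) ** nu - 1.0) / (nu * sigma)

def bccg_centile(p, mu, sigma, nu):
    """Approximate 100p centile given mu (median), sigma and nu (skewness)."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile z_p
    if abs(nu) < 1e-12:
        return mu * math.exp(sigma * z)
    return mu * (1.0 + nu * sigma * z) ** (1.0 / nu)

# Hypothetical parameter values at one fixed age.
mu, sigma, nu = 10.0, 0.1, -1.0
for c in (3, 15, 50, 85, 97):
    print(f"{c:>2}th centile: {bccg_centile(c / 100, mu, sigma, nu):.2f}")
```

In a fitted model μ, σ and ν are smooth functions of age, so evaluating the centile at each age with the fitted parameter values gives the centile curve.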
Rigby and Stasinopoulos (2004, 2006) extended the LMS method, which allows for skewness but not for kurtosis in the data, by introducing the Box-Cox power exponential (BCPE) and the Box-Cox t (BCT) distributions, and called the resulting methods LMSP and LMST respectively. The BCPE assumes that the transformed random variable Z has a (truncated) power exponential distribution, while the BCT assumes that Z has a (truncated) t-distribution.
More recently, Hossain et al. (2016) developed methodology to deal with response variables defined on [0,1], including zero and one. Those models can also be fitted in GAMLSS.
The functions lms() and quantSheets() are available in the gamlss package for fitting the LMS method and quantile regression respectively. More information on how to use these functions can be found in Chapter 13 of the book 'Flexible Regression and Smoothing: Using GAMLSS in R'.