The function NPGL2() defines the mean-parametrized version of the Poisson-generalised Lindley distribution, a two-parameter distribution, for a gamlss.family object to be used in GAMLSS fitting using the function gamlss().

NPGL2(mu.link = "log", sigma.link = "log")

Arguments

defines the mu.link, with "log" link as the default for the mu parameter (the mean of the distribution).

defines the sigma.link, with "log" link as the default for the sigma parameter (parameter alpha of the original NPGL).

Value

Returns a gamlss.family object which can be used to fit a mean-parametrized NPGL distribution in the gamlss() function.

Details

This family uses the mean-parametrized version of the NPGL distribution proposed in Section 6 of Altun (2021). The new parameters are:

  • \(\mu > 0\): the mean of the distribution, i.e. \(E[X]\).

  • \(\sigma > 0\)

The reparametrization links the mean and \(\sigma\) to the original parameter \(\mu\) via the mean equation \(E[X] = (\sigma + \mu) / (\mu (1 + \mu))\)

References

Altun, E. A new two-parameter discrete poisson-generalized Lindley distribution with properties and applications to healthcare data sets. Comput Stat 36, 2841-2861 (2021). https://doi.org/10.1007/s00180-021-01097-0

See also

Author

Tomas Mesa, tomas.mesaz@udea.edu.co

Examples

# Example 1
# Generating some random values with
# known mu and sigma

set.seed(123)
y <- rNPGL2(n=5000, mu=4, sigma=2)

# Fitting the model
library(gamlss)
mod1 <- gamlss(y~1, sigma.fo=~1, family=NPGL2,
               control=gamlss.control(n.cyc=500, trace=TRUE))
#> GAMLSS-RS iteration 1: Global Deviance = 24740.63 

# Extracting the fitted values for mu and sigma
# using the inverse link function
exp(coef(mod1, what="mu"))
#> (Intercept) 
#>    3.954113 
exp(coef(mod1, what="sigma"))
#> (Intercept) 
#>    2.045665 

# Example 2
# Generating random values under some model

# A function to simulate a data set with Y ~ NPGL2
gendat <- function(n) {
  x1 <- runif(n)
  x2 <- runif(n)
  mu    <- exp(1.5 - 1.8 * x1)  # mean of the distribution
  sigma <- exp(0.5 + 0.8 * x2)  # shape parameter alpha
  y <- rNPGL2(n=n, mu=mu, sigma=sigma)
  data.frame(y=y, x1=x1, x2=x2)
}

set.seed(123)
dat <- gendat(n=1000)

# Fitting the model
mod2 <- gamlss(y~x1, sigma.fo=~x2, family=NPGL2, data=dat,
               control=gamlss.control(n.cyc=800, trace=TRUE))
#> GAMLSS-RS iteration 1: Global Deviance = 3688.414 
#> GAMLSS-RS iteration 2: Global Deviance = 3688.414 

summary(mod2)
#> Warning: summary: vcov has failed, option qr is used instead
#> ******************************************************************
#> Family:  c("NPGL2", "Mean-parametrized Poisson-generalised Lindley") 
#> 
#> Call:  gamlss(formula = y ~ x1, sigma.formula = ~x2, family = NPGL2,  
#>     data = dat, control = gamlss.control(n.cyc = 800, trace = TRUE)) 
#> 
#> Fitting method: RS() 
#> 
#> ------------------------------------------------------------------
#> Mu link function:  log
#> Mu Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  1.44320    0.06522   22.13   <2e-16 ***
#> x1          -1.69399    0.12653  -13.39   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> ------------------------------------------------------------------
#> Sigma link function:  log
#> Sigma Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   0.5676     0.7228   0.785    0.432
#> x2            0.7323     0.9646   0.759    0.448
#> 
#> ------------------------------------------------------------------
#> No. of observations in the fit:  1000 
#> Degrees of Freedom for the fit:  4
#>       Residual Deg. of Freedom:  996 
#>                       at cycle:  2 
#>  
#> Global Deviance:     3688.414 
#>             AIC:     3696.414 
#>             SBC:     3716.045 
#> ******************************************************************

# Example 3
# Using the data from Altun (2021), Section 8.1.
# The dataset is about the 1991 Arizona cardiovascular patient.
# We model the length of hospital stay (los) with cardiovascular
# procedure, sex, type of admission and age as covariates.
# The response variable has a dispersion index of 5.43,
# confirming over-dispersion.
# Data available in the azpro dataset of the COUNT package.

library(COUNT)
#> Loading required package: msme
#> Loading required package: MASS
#> Loading required package: lattice
#> Loading required package: sandwich
data(azpro)
mod3 <- gamlss(
  los ~ procedure + sex + admit + age75,
  sigma.fo = ~1,
  family = NPGL2(mu.link = "log"),
  data = azpro,
  control = gamlss.control(n.cyc = 500, trace = TRUE)
  )
#> GAMLSS-RS iteration 1: Global Deviance = 21127.37 
#> GAMLSS-RS iteration 2: Global Deviance = 21126.52 
#> GAMLSS-RS iteration 3: Global Deviance = 21126.52 
#> GAMLSS-RS iteration 4: Global Deviance = 21126.51 
#> GAMLSS-RS iteration 5: Global Deviance = 21126.51 

# Extracting the fitted values for mu and sigma
# using the inverse link function
coef(mod3, what="mu")
#> (Intercept)   procedure         sex       admit       age75 
#>   1.4046131   0.9749667  -0.1263800   0.3756807   0.1190247 
coef(mod3, what="sigma")
#> (Intercept) 
#>   0.9999109 

# Note: coef(mod3, what="mu") is the estimated mean length
# of hospital stay for the reference group (PTCA procedure,
# female, elective admission, age <= 75), directly interpretable
# because mu is the mean of the NPGL2 distribution.

# Example 4
# Using the data from Altun (2021), Section 8.2.
# The dataset originates from the US National Medical Expenditure
# Survey (NMES) conducted in 1987 and 1988. We model the number
# of physician office visits with hospital stays, chronic conditions,
# activity limitations, age, gender, marital status, income,
# employment status, Medicaid, private insurance and self-perceived
# health status as covariates.
# The response variable has a dispersion index of 7.91,
# confirming over-dispersion.
# Data available in the NMES1988 dataset of the AER package.

library(AER)
#> Loading required package: car
#> Loading required package: carData
#> Loading required package: lmtest
#> Loading required package: zoo
#> 
#> Attaching package: ‘zoo’
#> The following objects are masked from ‘package:base’:
#> 
#>     as.Date, as.Date.numeric
#> Loading required package: survival
data("NMES1988")

# Set "average" as the reference category for health status,
# so that healthexcellent and healthpoor are the two dummies,
# matching x11 and x12 in Altun (2021).

NMES1988$health <- relevel(factor(NMES1988$health), ref = "average")

mod4 <- gamlss(
  visits ~ hospital + chronic + adl + age + gender + married +
    income + employed + medicaid + insurance + health,
  sigma.fo = ~1,
  family = NPGL2(mu.link = "log"),
  data = NMES1988,
  control = gamlss.control(n.cyc = 500, trace = TRUE)
)
#> GAMLSS-RS iteration 1: Global Deviance = 24345.32 
#> GAMLSS-RS iteration 2: Global Deviance = 24345.25 
#> GAMLSS-RS iteration 3: Global Deviance = 24345.25 

coef(mod4, what="mu")
#>     (Intercept)        hospital         chronic      adllimited             age 
#>     1.441319681     0.215708889     0.171200735     0.074079761    -0.054148935 
#>      gendermale      marriedyes          income     employedyes     medicaidyes 
#>    -0.099282798    -0.032870035     0.007873323     0.029981452     0.231061396 
#>    insuranceyes      healthpoor healthexcellent 
#>     0.389632677     0.227960025    -0.325949741 
coef(mod4, what="sigma")
#> (Intercept) 
#>    0.264392