The function NPGL2() defines the mean-parametrized version of the
Poisson-generalised Lindley distribution, a two-parameter distribution,
for a gamlss.family object to be used in GAMLSS fitting using
the function gamlss().
NPGL2(mu.link = "log", sigma.link = "log")Returns a gamlss.family object which can be used to fit a
mean-parametrized NPGL distribution in the gamlss() function.
This family uses the mean-parametrized version of the NPGL distribution proposed in Section 6 of Altun (2021). The new parameters are:
\(\mu > 0\): the mean of the distribution, i.e. \(E[X]\).
\(\sigma > 0\)
The reparametrization links the mean and \(\sigma\) to the original parameter \(\mu\) via the mean equation \(E[X] = (\sigma + \mu) / (\mu (1 + \mu))\)
Altun, E. A new two-parameter discrete poisson-generalized Lindley distribution with properties and applications to healthcare data sets. Comput Stat 36, 2841-2861 (2021). https://doi.org/10.1007/s00180-021-01097-0
# Example 1
# Generating some random values with
# known mu and sigma
set.seed(123)
y <- rNPGL2(n=5000, mu=4, sigma=2)
# Fitting the model
library(gamlss)
mod1 <- gamlss(y~1, sigma.fo=~1, family=NPGL2,
control=gamlss.control(n.cyc=500, trace=TRUE))
#> GAMLSS-RS iteration 1: Global Deviance = 24740.63
# Extracting the fitted values for mu and sigma
# using the inverse link function
exp(coef(mod1, what="mu"))
#> (Intercept)
#> 3.954113
exp(coef(mod1, what="sigma"))
#> (Intercept)
#> 2.045665
# Example 2
# Generating random values under some model
# A function to simulate a data set with Y ~ NPGL2
gendat <- function(n) {
x1 <- runif(n)
x2 <- runif(n)
mu <- exp(1.5 - 1.8 * x1) # mean of the distribution
sigma <- exp(0.5 + 0.8 * x2) # shape parameter alpha
y <- rNPGL2(n=n, mu=mu, sigma=sigma)
data.frame(y=y, x1=x1, x2=x2)
}
set.seed(123)
dat <- gendat(n=1000)
# Fitting the model
mod2 <- gamlss(y~x1, sigma.fo=~x2, family=NPGL2, data=dat,
control=gamlss.control(n.cyc=800, trace=TRUE))
#> GAMLSS-RS iteration 1: Global Deviance = 3688.414
#> GAMLSS-RS iteration 2: Global Deviance = 3688.414
summary(mod2)
#> Warning: summary: vcov has failed, option qr is used instead
#> ******************************************************************
#> Family: c("NPGL2", "Mean-parametrized Poisson-generalised Lindley")
#>
#> Call: gamlss(formula = y ~ x1, sigma.formula = ~x2, family = NPGL2,
#> data = dat, control = gamlss.control(n.cyc = 800, trace = TRUE))
#>
#> Fitting method: RS()
#>
#> ------------------------------------------------------------------
#> Mu link function: log
#> Mu Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.44320 0.06522 22.13 <2e-16 ***
#> x1 -1.69399 0.12653 -13.39 <2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> ------------------------------------------------------------------
#> Sigma link function: log
#> Sigma Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.5676 0.7228 0.785 0.432
#> x2 0.7323 0.9646 0.759 0.448
#>
#> ------------------------------------------------------------------
#> No. of observations in the fit: 1000
#> Degrees of Freedom for the fit: 4
#> Residual Deg. of Freedom: 996
#> at cycle: 2
#>
#> Global Deviance: 3688.414
#> AIC: 3696.414
#> SBC: 3716.045
#> ******************************************************************
# Example 3
# Using the data from Altun (2021), Section 8.1.
# The dataset is about the 1991 Arizona cardiovascular patient.
# We model the length of hospital stay (los) with cardiovascular
# procedure, sex, type of admission and age as covariates.
# The response variable has a dispersion index of 5.43,
# confirming over-dispersion.
# Data available in the azpro dataset of the COUNT package.
library(COUNT)
#> Loading required package: msme
#> Loading required package: MASS
#> Loading required package: lattice
#> Loading required package: sandwich
data(azpro)
mod3 <- gamlss(
los ~ procedure + sex + admit + age75,
sigma.fo = ~1,
family = NPGL2(mu.link = "log"),
data = azpro,
control = gamlss.control(n.cyc = 500, trace = TRUE)
)
#> GAMLSS-RS iteration 1: Global Deviance = 21127.37
#> GAMLSS-RS iteration 2: Global Deviance = 21126.52
#> GAMLSS-RS iteration 3: Global Deviance = 21126.52
#> GAMLSS-RS iteration 4: Global Deviance = 21126.51
#> GAMLSS-RS iteration 5: Global Deviance = 21126.51
# Extracting the fitted values for mu and sigma
# using the inverse link function
coef(mod3, what="mu")
#> (Intercept) procedure sex admit age75
#> 1.4046131 0.9749667 -0.1263800 0.3756807 0.1190247
coef(mod3, what="sigma")
#> (Intercept)
#> 0.9999109
# Note: coef(mod3, what="mu") is the estimated mean length
# of hospital stay for the reference group (PTCA procedure,
# female, elective admission, age <= 75), directly interpretable
# because mu is the mean of the NPGL2 distribution.
# Example 4
# Using the data from Altun (2021), Section 8.2.
# The dataset originates from the US National Medical Expenditure
# Survey (NMES) conducted in 1987 and 1988. We model the number
# of physician office visits with hospital stays, chronic conditions,
# activity limitations, age, gender, marital status, income,
# employment status, Medicaid, private insurance and self-perceived
# health status as covariates.
# The response variable has a dispersion index of 7.91,
# confirming over-dispersion.
# Data available in the NMES1988 dataset of the AER package.
library(AER)
#> Loading required package: car
#> Loading required package: carData
#> Loading required package: lmtest
#> Loading required package: zoo
#>
#> Attaching package: ‘zoo’
#> The following objects are masked from ‘package:base’:
#>
#> as.Date, as.Date.numeric
#> Loading required package: survival
data("NMES1988")
# Set "average" as the reference category for health status,
# so that healthexcellent and healthpoor are the two dummies,
# matching x11 and x12 in Altun (2021).
NMES1988$health <- relevel(factor(NMES1988$health), ref = "average")
mod4 <- gamlss(
visits ~ hospital + chronic + adl + age + gender + married +
income + employed + medicaid + insurance + health,
sigma.fo = ~1,
family = NPGL2(mu.link = "log"),
data = NMES1988,
control = gamlss.control(n.cyc = 500, trace = TRUE)
)
#> GAMLSS-RS iteration 1: Global Deviance = 24345.32
#> GAMLSS-RS iteration 2: Global Deviance = 24345.25
#> GAMLSS-RS iteration 3: Global Deviance = 24345.25
coef(mod4, what="mu")
#> (Intercept) hospital chronic adllimited age
#> 1.441319681 0.215708889 0.171200735 0.074079761 -0.054148935
#> gendermale marriedyes income employedyes medicaidyes
#> -0.099282798 -0.032870035 0.007873323 0.029981452 0.231061396
#> insuranceyes healthpoor healthexcellent
#> 0.389632677 0.227960025 -0.325949741
coef(mod4, what="sigma")
#> (Intercept)
#> 0.264392