Inference Procedures on the Generalized Poisson Distribution from Multiple Samples: Comparisons with Nonparametric Models for Analysis of Covariance (ANCOVA) of Count Data

1. Introduction

The Poisson distribution is commonly used to model count data. A restriction of this distribution, however, is that the mean of the response variable must equal its variance. This restriction often fails for biological and epidemiological data: in many applications the variance is much larger than the mean, a phenomenon known as “overdispersion”. Overdispersion may arise from population heterogeneity or from the presence of outliers in the data [1]. If overdispersion is ignored, the analysis of overdispersed counts leads to underestimation of the standard errors of the parameter estimates. The issue of overdispersion in both binary and count data was reviewed by Hinde and Demetrio [2], and more recently by Hayat and Higgins [3]. Diagnosing and accounting for overdispersion is not a simple matter, and it must be dealt with appropriately to avoid bias in interpreting the results.

The Negative Binomial (NB) distribution has been used as an alternative to the Poisson distribution for modeling data that exhibit overdispersion. The NB distribution has two parameters and a variance that is a quadratic function of the mean, and it has been the model of choice for the analysis of overdispersed count data. NB regression was reviewed by Hinde and Demetrio [2], and Joe and Zhu [4] drew a comparison between the NB and a mixture-based generalization of the Poisson distribution.

In this paper we discuss several inferential statistical issues related to a modified form of the Generalized Poisson Distribution (GPD). The GPD was introduced to the statistical literature by Consul and Jain [5], and a detailed account of its properties was given by Consul [6]. The distribution has two parameters and a variance that is a cubic function of the mean. It has been used to analyze data in genetics [7], in queuing models [8] [9] [10], and in genomics [11]. The modified form of the GPD, which we shall call the “Modified Generalized Poisson Distribution” (MGPD), was first discussed in [12]. The modification is a double parametric transformation of the original parameters of the GPD. Its main purpose was to achieve parameter orthogonality [13] and to make the MGPD a member of the class of “Generalized Linear Models” [14]. Recently, Shoukri and Al-Eid investigated several inference procedures for the two-sample situation [15].

The objectives of this paper are three-fold. In Section 2 we present the model. Assuming that we have k independent samples, we demonstrate how to construct statistical tests on the dispersion parameters: we first test the hypothesis of homogeneity of the dispersion parameters, and thereafter the significance of the common dispersion parameter. In Section 3 we test the hypothesis of equality of the k means in the presence of overdispersion. When covariates are measured, testing the equality of group means becomes an Analysis of Covariance (ANCOVA) in the presence of overdispersion. In Section 4 we use COVID-19 mortality data to draw a comparison between the MGPD and the Generalized Additive Models (GAM). We demonstrate the differences between the two analytic strategies and highlight the superiority of the MGPD in the analysis of count data exhibiting overdispersion in Section 5. A general discussion is presented in Section 6.

2. The Model and Its Parameters Estimation

2.1. Modified Generalized Poisson Distribution

The GPD was introduced by Consul and Jain [5] with probability function:

$P\left(Y=y\right)=\frac{\lambda_{1}\left(\lambda_{1}+\lambda_{2}y\right)^{y-1}}{y!}\exp\left[-\lambda_{1}-\lambda_{2}y\right], \quad \lambda_{1}>0, \quad 0\le \lambda_{2}<1$ (2.1)

The GPD, whose probability function is given in (2.1), reduces to the well-known Poisson distribution when $\lambda_{2}=0$. Therefore the parameter $\lambda_{2}$, with the above restriction on its range, is considered the dispersion parameter. Shoukri and Mian [12] employed the parametric transformations:

$\lambda_{1}=\mu/\left(1+\epsilon\mu\right), \quad \lambda_{2}=\epsilon\lambda_{1}$ (2.2)

Using the transformations in (2.2) we therefore have:

$P\left(X=x\right)=\frac{\left(1+\epsilon x\right)^{x-1}}{x!}g^{x}\left(\mu,\epsilon\right)\exp\left[-\frac{\mu}{1+\epsilon\mu}\right]$ (2.3)

where $g\left(\mu,\epsilon\right)=\frac{\mu}{1+\epsilon\mu}\exp\left[\frac{-\epsilon\mu}{1+\epsilon\mu}\right]$

For fixed $\epsilon$, the function $g(\cdot)$ in (2.3) is the natural parameter transformation that renders the GPD a member of the linear exponential class (see [14]), with the general structure:

$f\left(x\right)=h\left(x\right)\mathrm{exp}\left[\varphi T\left(x\right)-A\left(\varphi \right)\right]$ (2.4)

We call the transformed GPD, the “Modified Generalized Poisson Distribution” or MGPD.

Shoukri and Mian [12] showed that the $r^{th}$ non-central moments $\vartheta'_{r}$ satisfy the recurrence relation:

$\vartheta'_{r+1}=\sigma^{2}\left(\mu\right)\frac{\partial \vartheta'_{r}}{\partial \mu}+\mu\,\vartheta'_{r}$ (2.5)

From (2.5) we can show that:

$\vartheta'_{0}\equiv 1, \quad \vartheta'_{1}\equiv \mu=E\left(Y\right), \quad \text{and} \quad \sigma^{2}\left(\mu\right)=\mu\left(1+\epsilon\mu\right)^{2}\equiv \mathrm{var}\left(Y\right)$ (2.6)

That is, the variance is a cubic function of the population mean. We shall deal with the situation when $\epsilon>0$.
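As a numerical check of (2.3) and (2.6), the following Python sketch (our own illustration, not part of the source) evaluates the MGPD probability function over a long truncated support and verifies that the probabilities sum to one and that the mean and variance agree with $\mu$ and $\mu(1+\epsilon\mu)^2$; the values $\mu=2$, $\epsilon=0.3$ are arbitrary.

```python
import math

def mgpd_pmf(x, mu, eps):
    """MGPD probability (2.3): ((1+eps*x)^(x-1)/x!) * g(mu,eps)^x * exp(-mu/(1+eps*mu))."""
    g = mu / (1 + eps * mu) * math.exp(-eps * mu / (1 + eps * mu))
    log_p = ((x - 1) * math.log(1 + eps * x) - math.lgamma(x + 1)
             + x * math.log(g) - mu / (1 + eps * mu))
    return math.exp(log_p)
```

Summing the probabilities over a long grid recovers $E(Y)=\mu$ and $\mathrm{var}(Y)=\mu(1+\epsilon\mu)^2$ to high accuracy, since the tail of the distribution decays geometrically when $\lambda_2<1$.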

2.2. Point Estimators

Our approach to parameter estimation in this section is for a single random sample. If $Y_{1},Y_{2},\cdots,Y_{n}$ is a random sample from the GPD (2.3), Consul and Shoukri [16] showed that unique maximum likelihood estimates of the parameters exist if and only if the sample variance is larger than the sample mean. Here we shall use the sample moments to obtain estimators for the model parameters $\left(\mu,\epsilon\right)$.

Equating the first two sample moments $\left(\bar{y},s^{2}\right)$ to their corresponding population moments,

$\bar{y}=\mu, \quad s^{2}=\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}=\mu\left(1+\epsilon\mu\right)^{2}$

and solving for the parameters we get:

$\tilde{\mu}=\bar{y}, \quad \tilde{\epsilon}=\left(s^{2}\right)^{1/2}\left(\bar{y}\right)^{-3/2}-\left(\bar{y}\right)^{-1}$ (2.7)
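The moment estimators (2.7) translate directly into code. A Python sketch (our own illustration; the sample below is hypothetical):

```python
def moment_estimates(y):
    """Moment estimators (2.7): mu~ = ybar and eps~ = s * ybar^(-3/2) - 1/ybar,
    where s^2 is the divide-by-n sample variance used in the text."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / n
    return ybar, s2 ** 0.5 * ybar ** -1.5 - 1.0 / ybar
```

For example, a sample with mean 4 and variance 16 gives $\tilde{\epsilon} = 4/8 - 1/4 = 0.25$.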

The variances of the moment estimators of the mean and the dispersion parameter are given by Shoukri and Al-Eid [15] as, respectively:

$\mathrm{var}\left(\hat{\mu}\right)=\mu\left(1+\epsilon\mu\right)^{2}/n$ (2.8a)

$v=\mathrm{var}\left(\hat{\epsilon}\right)=\frac{\left(1+\epsilon\mu\right)^{2}}{2n\mu^{2}}\left[1+2\epsilon+3\epsilon^{2}\mu\right]$ (2.8b)

2.2.1. Homogeneity of Dispersion Parameters

Suppose that we have $k$ independent random samples from (2.3), denoted $Y_{ij}\sim \mathrm{GPD}\left(\mu_{i},\epsilon_{i}\right)$, with $n_{i}$ observations from the $i^{th}$ population $\left(i=1,2,\cdots,k\right)$.

We denote the variance of the estimator of $\epsilon_{i}$, given in (2.8b), by $v_{i}$, and let $w_{i}=1/v_{i}$.

Cochran [17] developed a general statistic $Q$ that may be used to test the homogeneity of several population parameters. The $Q$ statistic has, asymptotically, a chi-square distribution with $k-1$ degrees of freedom. It is defined as:

$Q_{\mathrm{eps}}=\sum_{i=1}^{k}w_{i}\left(\hat{\epsilon}_{i}-\bar{\epsilon}\right)^{2}$ (2.9)

where

$\bar{\epsilon}=\sum_{i=1}^{k}w_{i}\hat{\epsilon}_{i}\Big/\sum_{i=1}^{k}w_{i}$ (2.10)

The hypothesis $H_{0}:\epsilon_{1}=\epsilon_{2}=\cdots=\epsilon_{k}=\epsilon$ of homogeneity of dispersion parameters is rejected whenever the statistic $Q_{\mathrm{eps}}$ exceeds $\chi^{2}_{\alpha,k-1}$, the upper $\alpha$ quantile of a chi-square random variable with $k-1$ degrees of freedom.
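The homogeneity test can be sketched in a few lines of Python (our own illustration; the group estimates and their variances below are hypothetical, and 5.991 is the 5% critical value of a chi-square with 2 degrees of freedom for $k=3$ groups):

```python
def cochran_q(eps_hats, variances):
    """Cochran-type homogeneity statistic: Q = sum_i w_i (eps_i - eps_bar)^2,
    with weights w_i = 1/var(eps_i) and eps_bar the weighted mean (2.10)."""
    w = [1.0 / v for v in variances]
    eps_bar = sum(wi * ei for wi, ei in zip(w, eps_hats)) / sum(w)
    return sum(wi * (ei - eps_bar) ** 2 for wi, ei in zip(w, eps_hats))
```

The resulting Q is referred to a chi-square distribution with $k-1$ degrees of freedom.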

2.2.2. Testing the Significance of the Common Dispersion Parameter: $H_{0}:\epsilon=0$

Here we develop a test of the null hypothesis of absence of overdispersion. When the $\mu_{i}$'s are unknown, a uniformly most powerful test of $H_{0}:\epsilon=0$ (Poisson) versus $H_{1}:\epsilon>0$ (GPD) cannot be obtained; however, the locally most powerful Neyman $C\left(\alpha\right)$ test can be constructed [18]. The log-likelihood function is given by

$\ell=\sum_{i=1}^{k}\sum_{j=1}^{n_{i}}\left(y_{ij}-1\right)\log\left(1+\epsilon y_{ij}\right)+\sum_{i=1}^{k}n_{i}\bar{y}_{i.}\left[\log\mu_{i}-\log\left(1+\epsilon\mu_{i}\right)\right]-\sum_{i=1}^{k}n_{i}\mu_{i}\left(\frac{1+\epsilon\bar{y}_{i.}}{1+\epsilon\mu_{i}}\right)$ (2.11)

where $\bar{y}_{i.}=y_{i.}/n_{i}=\sum_{j=1}^{n_{i}}y_{ij}/n_{i}$.

The locally asymptotically most powerful $C\left(\alpha\right)$ test rejects $H_{0}$ for large values of $\left(\partial\ell/\partial\epsilon\right)_{\epsilon=0}$. From (2.11):

$\left(\partial\ell/\partial\epsilon\right)_{\epsilon=0}=\sum_{i=1}^{k}\sum_{j=1}^{n_{i}}y_{ij}\left(y_{ij}-1\right)-\sum_{i=1}^{k}n_{i}\mu_{i}\bar{y}_{i.}-\sum_{i=1}^{k}n_{i}\mu_{i}\left(\bar{y}_{i.}-\mu_{i}\right)$ (2.12)

Therefore, the test rejects $H_{0}$ for large values of $T$, where

$T=\left(\partial\ell/\partial\epsilon\right)_{\epsilon=0}=\sum_{i=1}^{k}\sum_{j=1}^{n_{i}}\left[\left(y_{ij}-\bar{y}_{i.}\right)^{2}-y_{ij}\right]$ (2.13)

The statistic (2.13) is obtained from (2.12) by replacing each $\mu_{i}$ with a root-$n_{i}$ consistent estimator $\hat{\mu}_{i}$. The simplest such estimator is the maximum likelihood estimator $\hat{\mu}_{i}=\bar{y}_{i.}$. Moran [18] pointed out that the $C\left(\alpha\right)$ test statistic $T$ is asymptotically normal. It can easily be shown that:

$E\left(T\right)=\sum_{i=1}^{k}\left[\left(n_{i}-1\right)\mu_{i}\left(1+\epsilon\mu_{i}\right)^{2}-n_{i}\mu_{i}\right]$ (2.14)

and

$\mathrm{var}\left(T\right)=\sum_{i=1}^{k}\Big\{2\left(n_{i}-1\right)\mu_{i}^{2}\left(1+\epsilon\mu_{i}\right)^{4}+\frac{1}{n_{i}}\big[\mu_{i}\left(1+\epsilon\mu_{i}\right)^{4}\left(1+3\mu_{i}+10\epsilon\mu_{i}+15\epsilon^{2}\mu_{i}^{2}\right)-3\mu_{i}^{2}\left(1+\epsilon\mu_{i}\right)^{4}+\mu_{i}\left(1+\epsilon\mu_{i}\right)^{2}-2\mu_{i}\left(1+\epsilon\mu_{i}\right)^{3}\big]+\frac{\mu_{i}}{n_{i}}\Big\}$ (2.15)

Under $H_{0}:\epsilon=0$, (2.14) and (2.15) reduce respectively to $E_{0}=-\sum_{i=1}^{k}\mu_{i}$ and

$v_{0}=\sum_{i=1}^{k}\left(2\left(n_{i}-1\right)\mu_{i}^{2}+\frac{\mu_{i}}{n_{i}}\right)$ (2.16)

The hypothesis $H_{0}:\epsilon=0$ is rejected whenever $Q\left(\epsilon=0\right)=\left(T-E_{0}\right)^{2}/v_{0}$ exceeds $\chi^{2}_{\alpha,1}$, the upper $\alpha$ quantile of a chi-square random variable with one degree of freedom.
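The score test above is easy to compute from the raw data. A Python sketch (our own illustration with a small hypothetical two-group data set), following (2.13) and (2.16) with $\hat{\mu}_{i}=\bar{y}_{i.}$:

```python
def score_test_eps(groups):
    """C(alpha) test of H0: eps = 0. Returns (T, Q), where
    T = sum_ij [(y_ij - ybar_i)^2 - y_ij]                    (2.13)
    Q = (T - E0)^2 / v0, with E0 = -sum_i ybar_i and
    v0 = sum_i [2(n_i - 1) ybar_i^2 + ybar_i / n_i]          (2.16)"""
    T = E0 = v0 = 0.0
    for y in groups:
        n = len(y)
        ybar = sum(y) / n
        T += sum((yi - ybar) ** 2 - yi for yi in y)
        E0 -= ybar
        v0 += 2 * (n - 1) * ybar ** 2 + ybar / n
    return T, (T - E0) ** 2 / v0
```

Q is referred to a chi-square with one degree of freedom (5% critical value 3.841).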

3. Testing Equality of Means

Based on the one-way layout data considered in the previous section, we would like to test the null hypothesis $H_{0}:\mu_{1}=\cdots=\mu_{k}=\mu$ against $H_{a}$: at least two of the $\mu_{i}$'s are different, for all $\epsilon>0$. The log-likelihood under $H_{a}$ is given by (2.11) and will be denoted by $\ell_{a}$; the log-likelihood under $H_{0}$, denoted by $\ell_{0}$, is obtained by setting $\mu_{i}=\mu$ $\left(i=1,2,\cdots,k\right)$ in (2.11). Under $H_{a}$, the maximum likelihood estimator of $\mu_{i}$ is

$\hat{\mu}_{i}=\bar{y}_{i.}$

and the maximum likelihood estimator $\hat{\epsilon}_{a}$ of $\epsilon$ is the non-negative root of

$\sum_{i=1}^{k}\left[\sum_{j=1}^{n_{i}}\frac{\left(y_{ij}-1\right)y_{ij}}{1+\hat{\epsilon}_{a}y_{ij}}-\frac{n_{i}\left(\bar{y}_{i.}\right)^{2}}{1+\hat{\epsilon}_{a}\bar{y}_{i.}}\right]=0$ (3.1)

Under $H_{0}$ the maximum likelihood estimator of the common mean $\mu$ is $\hat{\mu}=y_{..}/N=\bar{y}$, where $N=\sum_{i=1}^{k}n_{i}$.

The maximum likelihood estimator $\hat{\epsilon}_{0}$ of $\epsilon$ under $H_{0}$ is the positive root of

$\sum_{i=1}^{k}\sum_{j=1}^{n_{i}}\frac{\left(y_{ij}-1\right)y_{ij}}{1+\hat{\epsilon}_{0}y_{ij}}-\frac{N\bar{y}^{2}}{1+\hat{\epsilon}_{0}\bar{y}}=0$ (3.2)

A detailed discussion of the necessary and sufficient conditions for (3.1) and (3.2) to have unique roots is given in Consul and Shoukri [16].
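Equation (3.2) has no closed-form solution, but its left-hand side is monotone enough in practice for simple bisection to work. A Python sketch (our own illustration, assuming the data are overdispersed so a positive root exists; the sample and bracketing interval are hypothetical):

```python
def eps_mle_common(y, lo=1e-9, hi=100.0, iters=200):
    """Solve the likelihood equation (3.2) for the common dispersion eps by
    bisection: sum_i (y_i - 1) y_i / (1 + eps y_i) - N ybar^2 / (1 + eps ybar) = 0."""
    N = len(y)
    ybar = sum(y) / N

    def f(eps):
        return (sum((yi - 1) * yi / (1 + eps * yi) for yi in y)
                - N * ybar ** 2 / (1 + eps * ybar))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:  # root lies in [lo, mid]
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Since the sample variance exceeds the sample mean for overdispersed data, the function changes sign on the bracketing interval and the iteration converges.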

Denoting the maximized log-likelihood under $H_{a}$ by $L_{a}$, and that under $H_{0}$ by $L_{0}$, the likelihood ratio test statistic, which has an asymptotic chi-square distribution with $k-1$ degrees of freedom, is:

$\lambda=2\left(L_{a}-L_{0}\right)=2\left[\sum_{i=1}^{k}\sum_{j=1}^{n_{i}}\left(y_{ij}-1\right)\log\left\{\frac{1+\hat{\epsilon}_{a}y_{ij}}{1+\hat{\epsilon}_{0}y_{ij}}\right\}+\sum_{i=1}^{k}n_{i}\bar{y}_{i.}\log\left\{\frac{\bar{y}_{i.}\left(1+\hat{\epsilon}_{0}\bar{y}\right)}{\bar{y}\left(1+\hat{\epsilon}_{a}\bar{y}_{i.}\right)}\right\}\right]$ (3.3)

As an alternative to the likelihood ratio test (3.3), we present Neyman's $C\left(\alpha\right)$ statistic, which has locally optimal properties. Suppose that $\mu_{i}$ can be written as $\mu_{i}=\mu+\delta_{i}$ with $\delta_{k}=0$. Then testing the null hypothesis $H_{0}:\mu_{1}=\mu_{2}=\cdots=\mu_{k}$ is equivalent to testing $H_{0}:\delta_{i}=0$ $\left(i=1,2,\cdots,k-1\right)$, where $\mu$ and $\epsilon$ are nuisance parameters. We reparametrize (2.11) and denote the resulting function by $\ell^{*}$.

Define $\delta=\left(\delta_{1},\delta_{2},\cdots,\delta_{k-1}\right)$, $\tau=\left(\tau_{1},\tau_{2}\right)^{\prime}=\left(\mu,\epsilon\right)^{\prime}$,

$\varphi_{i}\left(\tau\right)=\left[\frac{\partial\ell^{*}}{\partial\delta_{i}}\right]_{\delta=0}, \quad i=1,2,\cdots,k-1$

$\Delta_{j}\left(\tau\right)=\left[\frac{\partial\ell^{*}}{\partial\tau_{j}}\right]_{\delta=0}, \quad j=1,2$

Let $\hat{\tau}$ be any root-$n$ consistent estimator of $\tau$ under the null hypothesis. Moran [18] showed that the $C\left(\alpha\right)$ test is based on $F_{i}\left(\hat{\tau}\right)=\varphi_{i}\left(\hat{\tau}\right)-\gamma_{i1}\Delta_{1}\left(\hat{\tau}\right)-\gamma_{i2}\Delta_{2}\left(\hat{\tau}\right)$, where $\gamma_{i1}$ and $\gamma_{i2}$ are the partial regression coefficients of $\varphi_{i}$ on $\Delta_{1}$ and $\Delta_{2}$, respectively. Define the following matrices:

$P_{ij}=-E\left[\frac{\partial^{2}\ell^{*}}{\partial\delta_{i}\partial\delta_{j}}\right]_{\delta=0}, \quad Q_{ij}=-E\left[\frac{\partial^{2}\ell^{*}}{\partial\delta_{i}\partial\tau_{j}}\right]_{\delta=0},$

and $R_{ij}=-E\left[\frac{\partial^{2}\ell^{*}}{\partial\tau_{i}\partial\tau_{j}}\right]_{\delta=0}$.

Replacing $\tau$ by its estimator $\hat{\tau}$ in $F$, $P$, $Q$, and $R$, the $C\left(\alpha\right)$ test statistic is given by

$F^{\prime}\left(P-QR^{-1}Q^{\prime}\right)^{-1}F$ (3.5)

The asymptotic distribution of the test statistic in (3.5) is chi-square with $k-1$ degrees of freedom.

Now, there are two possible root-n consistent estimators of $\tau $, under ${H}_{0}$ :

The first is the maximum likelihood estimator $\hat{\tau}=\left(\bar{y},\hat{\epsilon}_{0}\right)^{\prime}$, for which on substitution we get $\Delta_{j}\left(\hat{\tau}\right)=0$ $\left(j=1,2\right)$, and hence $F_{i}\left(\hat{\tau}\right)=\varphi_{i}\left(\hat{\tau}\right)$. Accordingly, (3.5) reduces to

$C^{2}=\sum_{i=1}^{k}\frac{n_{i}\left(\bar{y}_{i.}-\bar{y}\right)^{2}}{\bar{y}\left(1+\hat{\epsilon}_{0}\bar{y}\right)^{2}}$ (3.6)

The hypothesis of equality of population means is thus rejected whenever $C^{2}$ exceeds $\chi^{2}_{\alpha,k-1}$, the upper $\alpha$ quantile of a chi-square random variable with $k-1$ degrees of freedom. For more details we refer the reader to [12].
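The statistic (3.6) combines the group means with the pooled dispersion estimate. A Python sketch (our own illustration; both the data and the value of $\hat{\epsilon}_{0}$ below are hypothetical):

```python
def c_squared(groups, eps0):
    """C(alpha) statistic (3.6): sum_i n_i (ybar_i - ybar)^2 / (ybar (1 + eps0 ybar)^2)."""
    all_y = [yi for g in groups for yi in g]
    ybar = sum(all_y) / len(all_y)
    denom = ybar * (1 + eps0 * ybar) ** 2
    return sum(len(g) * (sum(g) / len(g) - ybar) ** 2 for g in groups) / denom
```

With group means 2 and 4 (four observations each, grand mean 3) and $\hat{\epsilon}_{0}=0.1$, the statistic is $8/(3\cdot 1.3^{2})\approx 1.578$.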

4. ANCOVA: The Generalized Poisson Regression

It is well known that ANOVA and regression are related techniques concerned with testing differences in group means after adjusting for the confounding effects of potential risk factors and covariates. Since the MGPD is a member of the linear exponential family (for fixed $\epsilon$), Shoukri and Mian [12] expressed the expectation $\mu_{i}$ of $y_{i}$ as:

$\eta \left({\mu}_{i}\right)={X}_{i}^{T}\beta $ (4.1)

In Equation (4.1), $X_{i}$ is a set of $\left(p+1\right)$ measured covariates, a subset of which defines a set of indicator (dummy) variables identifying categorical effects. The transformation $\eta\left(\cdot\right)$ is a monotone, differentiable function called the “link function”. To estimate $\beta_{0},\beta_{1},\cdots,\beta_{p}$ and $\epsilon$ we use the log-link, so that:

${\mu}_{i}\left(x\right)=\mathrm{exp}\left[{X}_{i}^{T}\beta \right]$

The logarithm of the likelihood function will be proportional to

$\ell=\ell\left(\beta,\epsilon\right)=\sum_{i=1}^{k}\left(y_{i}-1\right)\ln\left(1+\epsilon y_{i}\right)+\sum_{i=1}^{k}y_{i}\ln\mu_{i}\left(x\right)-\sum_{i=1}^{k}y_{i}\ln\left(1+\epsilon\mu_{i}\left(x\right)\right)-\sum_{i=1}^{k}\frac{\mu_{i}\left(x\right)\left(1+\epsilon y_{i}\right)}{1+\epsilon\mu_{i}\left(x\right)}$ (4.2)
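The log-likelihood (4.2) is straightforward to code. A Python sketch (our own illustration with hypothetical data); note that at $\epsilon=0$ the expression reduces to the Poisson log-likelihood $\sum\left[y_{i}\ln\mu_{i}-\mu_{i}\right]$, up to the constant $-\ln y_{i}!$ dropped in (4.2), which provides a handy sanity check:

```python
import math

def mgpd_loglik(y, mu, eps):
    """Log-likelihood (4.2), up to an additive constant not involving (beta, eps)."""
    ll = 0.0
    for yi, mi in zip(y, mu):
        ll += ((yi - 1) * math.log(1 + eps * yi) + yi * math.log(mi)
               - yi * math.log(1 + eps * mi) - mi * (1 + eps * yi) / (1 + eps * mi))
    return ll
```

Maximizing this function over the regression coefficients (through $\mu_i$) and $\epsilon$ reproduces the estimation carried out by the R code in the Appendix.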

The first and second partial derivatives are given by:

$\frac{\partial\ell}{\partial\epsilon}=\dot{\ell}_{\epsilon}=\sum_{i=1}^{k}\frac{y_{i}\left(y_{i}-1\right)}{1+\epsilon y_{i}}-\sum_{i=1}^{k}\frac{y_{i}\mu_{i}\left(x\right)}{1+\epsilon\mu_{i}\left(x\right)}-\sum_{i=1}^{k}\frac{\mu_{i}\left(x\right)\left(y_{i}-\mu_{i}\left(x\right)\right)}{\left(1+\epsilon\mu_{i}\left(x\right)\right)^{2}}$ (4.3)

$\frac{\partial\ell}{\partial\beta_{r}}=\dot{\ell}_{r}=\sum_{i=1}^{k}\frac{\left(y_{i}-\mu_{i}\left(x\right)\right)}{\left(1+\epsilon\mu_{i}\left(x\right)\right)^{2}}x_{ir}$ (4.4)

$\frac{\partial^{2}\ell}{\partial\epsilon\partial\beta_{r}}=\ddot{\ell}_{\epsilon r}=-2\sum_{i=1}^{k}\frac{\mu_{i}\left(x\right)\left(y_{i}-\mu_{i}\left(x\right)\right)}{\left(1+\epsilon\mu_{i}\left(x\right)\right)^{3}}x_{ir}$ (4.5)

$\frac{\partial^{2}\ell}{\partial\beta_{r}\partial\beta_{s}}=\ddot{\ell}_{rs}=-\sum_{i=1}^{k}\left[\frac{\mu_{i}\left(x\right)}{\left(1+\epsilon\mu_{i}\left(x\right)\right)^{2}}+2\epsilon\frac{\left(y_{i}-\mu_{i}\left(x\right)\right)\mu_{i}\left(x\right)}{\left(1+\epsilon\mu_{i}\left(x\right)\right)^{3}}\right]x_{ir}x_{is}$ (4.6)

$\frac{\partial^{2}\ell}{\partial\epsilon^{2}}=\ddot{\ell}_{\epsilon\epsilon}=-\sum_{i=1}^{k}\frac{y_{i}^{2}\left(y_{i}-1\right)}{\left(1+\epsilon y_{i}\right)^{2}}+\sum_{i=1}^{k}\frac{y_{i}\mu_{i}^{2}\left(x\right)}{\left(1+\epsilon\mu_{i}\left(x\right)\right)^{2}}+2\sum_{i=1}^{k}\frac{\mu_{i}^{2}\left(x\right)\left(y_{i}-\mu_{i}\left(x\right)\right)}{\left(1+\epsilon\mu_{i}\left(x\right)\right)^{3}}$ (4.7)

Taking the expected value of the negative of the second partial derivatives, we get the Fisher information matrix $I$, whose elements are:

$-E\left[\ddot{\ell}_{rs}\right]=I_{rs}=\sum_{i=1}^{k}\frac{\mu_{i}\left(x\right)}{\left(1+\epsilon\mu_{i}\left(x\right)\right)^{2}}x_{ir}x_{is}, \quad r,s=1,2,\cdots,p$ (4.8)

$-E\left[\ddot{\ell}_{\epsilon r}\right]=I_{\epsilon r}=0, \quad r=1,2,\cdots,p$ (4.9)

From Consul and Shoukri [16] we get:

$-E\left[\ddot{\ell}_{\epsilon\epsilon}\right]=I_{\epsilon\epsilon}=2\left(1+2\epsilon\right)^{-1}\sum_{i=1}^{k}\frac{\mu_{i}^{2}\left(x\right)}{\left(1+\epsilon\mu_{i}\left(x\right)\right)^{2}}$ (4.10)

The asymptotic distributions of the regression estimators can be established using the results in [12].

Our approach to the data analysis, when the main interest is comparing group means in the presence of potential risk factors and confounders, is summarized in three steps. In the first step we use maximum likelihood to estimate the regression parameters using Equation (4.2), without including the groups as an independent variable. In the second step, we extract the residuals ($E$) of the generalized Poisson regression model, defined as:

$E=\text{observed value of the dependent variable}-\text{predicted value of the dependent variable}$

In the final step we test the normality and variance homogeneity of $E$. Thereafter, we use nonparametric ANOVA with the residuals as the dependent variable and the groups as the independent variable to complete the ANCOVA testing.
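The nonparametric ANOVA in the final step is the Kruskal-Wallis rank test. Without ties its statistic is simple enough to compute directly; a Python sketch (our own illustration, with hypothetical residual values):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H = 12/(N(N+1)) * sum_i R_i^2/n_i - 3(N+1),
    where R_i is the rank sum of group i (no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    rank = {x: r + 1 for r, x in enumerate(pooled)}  # assumes no tied values
    N = len(pooled)
    s = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (N * (N + 1)) * s - 3 * (N + 1)
```

H is referred to a chi-square distribution with (number of groups − 1) degrees of freedom.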

5. Data Analyses

Al-Gahtani *et al*. [19] analyzed COVID-19 case fatality data collected retrospectively from the start of the epidemic to December 2, 2020, the day the Pfizer vaccine was approved by the American Centers for Disease Control (CDC). The data were collected from 120 countries grouped into 15 regions [19] as shown in Table 1. We will reanalyze the data such that:

The response variable is the aggregate number of COVID-19 deaths, which we denote by “y”. We shall use different sets of covariates, and these are:

· Region: The factor variable which is the main effect.

· The other covariates are:

1) X_{1} = log (percentage of obese persons in a country reported in 2018) [21] [22];

2) X_{2} = log (population density) [23];

3) X_{3} = log (number of people with colorectal cancer in a country reported in 2017) [24];

4) X_{4} = log (Chronic Kidney Disease—case fatality in a country as reported in 2017) [20].

In Figure 1 we show the histogram of y, the aggregate COVID-19 deaths during the study period. The distribution is positively skewed, with variance much larger than the mean.

Direct calculations from the summary statistics given in Table 2 give $Q_{\mathrm{eps}}=0.00004$, with corresponding p-value = 0.999. Therefore, the hypothesis of homogeneity of dispersion parameters is supported by the data. Moreover, $Q\left(\epsilon=0\right)$ is quite large, with corresponding p-value = 0.00001.

Table 1. Adapted table: Countries and the corresponding regional classification as given in https://doi.org/10.1016/s0140-6736(20)30045-3 [20]. The first column gives the countries, the second column the region (group) name followed by the number of countries within the group, and the last column the region code.

CSSA = Central Sub-Saharan-Africa; MENA = Middle East and North Africa; HIAP = High Income Asian Pacific; WSSA = Western Sub-Saharan-Africa; SSSA = Southern Sub-Saharan Africa.

Figure 1. Histogram of COVID-19 deaths (y). It is skewed to the right, showing clear overdispersion in the data.

Table 2. Summary measures of COVID-19 deaths: group sizes (n), means (m), standard deviation (s) and the estimates of the dispersion parameter (eps) per group.

Therefore, the hypothesis that the common dispersion parameter is not significantly different from zero is not supported by the data. The C^{2} statistic is quite large as well, and the corresponding p-value is near zero; therefore the hypothesis of equality of mean counts (aggregate COVID-19 deaths) across all regions is also not supported by the data.

We shall write a function in R for the estimation of the regression parameters. The iteration process requires starting points, which we obtain by first fitting the classical Poisson regression using the following code:

out1 <- glm(y ~ x1 + x2 + x3 + x4, data = data2, family = poisson)

Having obtained the parameter estimates from the Poisson regression, we use them to start the iteration process and obtain final estimates as shown in the Appendix.

The MGPD regression results are summarized in Table 3.

The correlation between the observed and predicted COVID-19 death counts is 0.758.

Figure 2 gives the Q-Q plot of the quantiles of the model residuals, exhibiting close agreement with the quantiles of the standard normal distribution.

To complete the ANCOVA testing we use the Kruskal-Wallis test, whereby the residuals of the MGPD regression model are the dependent variable and the “Regions” (groups) are the independent variable. The results are summarized as follows:

Kruskal-Wallis chi-squared = 14.936, p-value = 0.1344.

Therefore, after adjusting for the covariates, there is not sufficient evidence to reject the hypothesis of equality of mean COVID-19 death counts among the “Regions”.

Table 3. Maximum likelihood estimation of the MGPD regression using R.

Figure 2. Plot of the quantiles of the residuals of the MGPD regression model against the quantiles of the standard normal distribution.

6. Nonparametric Regression Modeling: Generalized Additive Models

Generalized Additive Models (GAM) are increasingly popular modeling techniques. They are nonparametric in nature and, even though less powerful, quite robust against departures from the assumptions required by classical GLM regression models. GAMs allow us to include non-linear smoothers in the modeling strategy. In mathematical terms, a GAM solves the following equation:

$g\left({\mu}_{i}\right)={f}_{1}\left({x}_{1}\right)+{f}_{2}\left({x}_{2}\right)+{f}_{3}\left({x}_{3}\right)+{f}_{4}\left({x}_{4}\right)+{f}_{5}\left({x}_{5}\right)$ (6.1)

The $f_{j}\left(x_{j}\right)$ are smooth functions to be estimated. Equation (6.1) seems complex, but it is simple to understand. The first thing to notice is that with a GAM we are not necessarily estimating the response directly, *i.e.* we are not modelling y itself. In fact, as with GLM, we may use link functions to model non-normal response variables (and thus perform Poisson or logistic regression) [14]. Therefore, the term $g\left(\mu\right)$ is simply the transformation of y needed to “linearize” the model; when the response is normally distributed this term is simply replaced by y. The right-hand side of the equation may contain two kinds of terms: a parametric part and a nonparametric part. In a GAM we can include all the parametric terms available in a Linear Model or GLM, for example linear or polynomial terms. The nonparametric smoothers are fitted automatically, and they are the key point of GAMs. A complete and lucid account of GAM theory can be found in [25] [26] [27].

We fitted the GAM to the data using the R package “gam”; the next two lines give the needed code:

library(gam)

agam <- gam(y ~ Region + x1 + x2 + x3 + x4, data = data2)

Call: gam(formula = y ~ Region + x1 + x2 + x3 + x4, data = data2)

The following results are obtained from the GAM fitting to the data:

1) Null Deviance: 135512150629 on 112 degrees of freedom;

2) Residual Deviance: 74451674616 on 96 degrees of freedom.

From these, the correlation between observed and predicted counts is: (1 − 74451674616/135512150629)^{1/2} = 0.671.
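This correlation-type goodness-of-fit measure follows directly from the two deviances; a one-line Python check using the deviance values quoted above:

```python
null_dev = 135512150629
resid_dev = 74451674616
# square root of the proportion of deviance explained by the model
r = (1 - resid_dev / null_dev) ** 0.5
```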

The GAM results are shown in Table 4, and the Q-Q plot of the model residuals is given in Figure 3, showing that the model residuals are not as close to normality as the residuals of the MGPD regression model.

7. Discussion

In this paper we demonstrated the use of the MGPD as a model for ANCOVA. We used a two-step approach. In the first step we used the regression models to assess the influence of possible confounders and covariates on the outcome of interest. Thereafter we extracted the model residuals and used them as the dependent variable of a nonparametric ANOVA, with the groups as the independent predictors. We note that while there was a significant difference among the group means in the univariate analysis, this difference was not significant in the second step of the ANCOVA. The MGPD regression model showed a high correlation (0.758) between the observed counts and the model-based predicted counts, indicative of a good fit of the model to the given data. The Q-Q plot shows that the model residuals are in close agreement with the quantiles of the standard normal distribution. This indicates that the model is reliable as a predictive tool, and that the distribution of the estimated regression parameters is approximately multivariate normal.

Table 4. The results of fitting the GAM: ANOVA for parametric effects.

***Significant at a level of significance less than 0.00001.

Figure 3. Plot of the quantiles of the residuals of the GAM nonparametric regression model against the quantiles of the standard normal distribution.

For the sake of comparison, we fitted the data using the GAM, a nonparametric regression approach that treats the covariates as factors. The GAM showed that, after adjusting for the covariates within the same model, there are no significant differences among regions. These findings agree with those based on the MGPD regression. The GAM did not produce an estimate of the dispersion parameter ε. The goodness-of-fit measure of the GAM was 0.671, which is much lower than that of the MGPD. The MGPD model has several advantages over the GAM. First, the GAM cannot be used as a predictive tool, while the MGPD model can be used to predict the mean of the response variable. Second, the residuals of the MGPD regression model have a distribution that is almost normal, which supports the reliability of the likelihood-based estimation of the model parameters. Finally, our two-step approach to data fitting helps avoid both overfitting and possible multicollinearity.

Appendix A

R code for fitting the Generalized Poisson model by the method of maximum likelihood.

### Notations: b0, ..., b4 are the regression parameters, "mu" is the mean function,
### "k" is the dispersion parameter, "ll" denotes the log-likelihood function, and
### "eta" is the linear predictor. The response y and the covariates x1, ..., x4
### are assumed to be available in the workspace. ###

llik = function(par, y){
  b0 = par[1]
  b1 = par[2]
  b2 = par[3]
  b3 = par[4]
  b4 = par[5]
  k  = par[6]
  eta = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4
  mu  = exp(eta)
  ll  = sum(y*log(mu/(1 + k*mu))) +
    sum((y - 1)*log(1 + k*y) - mu*(1 + k*y)/(1 + k*mu) - lgamma(y + 1))
  return(-ll)
}

res = optim(par = c(1.4, .84, .07, .95, -.37, .1), fn = llik, y = y,
  method = "BFGS", hessian = TRUE)
theta = res$par
theta

# CALCULATING THE STANDARD ERRORS OF THE MLE from the observed information matrix
fish = res$hessian
element = diag(solve(fish))
se = sqrt(element)
se
z = theta/se
out.MGPD = data.frame(theta, se, z)
out.MGPD

#### FINAL ESTIMATES -0.30007804 0.65884589 0.48926214 0.42536091 -0.38469265
data2$y_hat = exp(-.3 + .66*data2$x1 + .5*data2$x2 + .425*data2$x3 - .384*data2$x4)
cor(data2$y, data2$y_hat) ##### 0.76
data2_error = data2$y - data2$y_hat
data2$response = sqrt((data2_error)^(1/3))
qqnorm(data2$response)

### SHAPIRO-WILK TEST OF NORMALITY ####
shapiro.test(data2$response)
library(car) # for leveneTest()
leveneTest(data2$response ~ data2$Region)

#### ANOVA ON THE RESIDUALS WITH REGION BEING THE INDEPENDENT VARIABLE ####
levels(data2$Region)
aov_result = aov(data2$response ~ data2$Region)
summary(aov_result)
boxplot(data2_error ~ data2$Region, xlab = "Region", ylab = "GPD Residuals",
  main = "COVID-19 Deaths")

#### KRUSKAL-WALLIS TEST ON THE RESIDUALS ####
kruskal_result = kruskal.test(data2$response ~ data2$Region)
kruskal_result

### END OF CODE ####

References

[1] Cox, D.R. (1983) Some Remarks on Overdispersion. Biometrika, 70, 269-274.

https://doi.org/10.1093/biomet/70.1.269

[2] Hinde, J. and Demetrio, C.G.B. (1998) Overdispersion: Models and Estimation. Computational Statistics and Data Analysis, 27, 151-170.

https://doi.org/10.1016/S0167-9473(98)00007-3

[3] Hayat, M.J. and Higgins, M. (2014) Understanding Poisson Regression. Journal of Nursing Education, 53, 207-215.

https://doi.org/10.3928/01484834-20140325-04

[4] Joe, H. and Zhu, R. (2005) Generalized Poisson Distribution: The Property of Mixture of Poisson and Comparison with the Negative Binomial Distribution. Biometrical Journal, 47, 219-229.

https://doi.org/10.1002/bimj.200410102

[5] Consul, P.C. and Jain, G.C. (1970) On the Generalization of Poisson Distribution. Annals of Mathematical Statistics, 41, 1387.

[6] Consul, P.C. (1989) Generalized Poisson Distribution. Marcel Dekker Inc., New York.

[7] Janardan, K.G. and Schaeffer, D.J. (1977) Models for the Analysis of Chromosomal Aberrations in Human Leukocytes. Biometrical Journal, 19, 599-612.

https://doi.org/10.1002/bimj.4710190804

[8] Tanner, J.C. (1961) A Derivation of the Borel Distribution. Biometrika, 48, 222-224.

https://doi.org/10.1093/biomet/48.1-2.222

[9] Consul, P.C. and Shoukri, M.M. (1988) Some Chance Mechanisms Related to a Generalized Poisson Probability Model. American Journal of Mathematical and Management Sciences, 8, 181-202.

https://doi.org/10.1080/01966324.1988.10737237

[10] Jiang, H. and Wong, W.H. (2009) Statistical Inferences for Isoform Expression in RNA-Seq. Bioinformatics, 25, 1026-1032.

https://doi.org/10.1093/bioinformatics/btp113

[11] Srivastava, S. and Chen, L. (2010) A Two-Parameter Generalized Poisson Model to Improve the Analysis of RNA-seq Data. Nucleic Acids Research, 38, e170.

https://doi.org/10.1093/nar/gkq670

[12] Shoukri, M.M. and Mian, I.U.H. (1991) Some Aspects of Statistical Inference on the Lagrange (Generalized) Poisson Distribution. Communication in Statistics: Computations and Simulations, 20, 1115-1137.

https://doi.org/10.1080/03610919108812999

[13] Cox, D.R. and Reid, N. (1987) Parameter Orthogonality and Approximate Conditional Inference. Journal of the Royal Statistical Society. Series B (Methodological), 49, 1-39.

https://doi.org/10.1111/j.2517-6161.1987.tb01422.x

[14] McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. Chapman and Hall, London.

https://doi.org/10.1007/978-1-4899-3242-6

[15] Shoukri, M.M. and Al-Eid, M. (2020) Inference Procedures on the Ratio of Modified Generalized Poisson Distribution Means: Applications to RNA_SEQ Data. International Journal of Statistics in Medical Research, 9, 41-49.

https://doi.org/10.6000/1929-6029.2020.09.05

[16] Consul, P.C. and Shoukri, M.M. (1984) Maximum Likelihood Estimation of the Generalized Poisson Distribution. Communications in Statistics, Theory and Methods, 13, 1533-1547.

https://doi.org/10.1080/03610928408828776

[17] Cochran, W.G. (1954) Some Methods for Strengthening the Common χ2 Tests. Biometrics, 10, 417-451.

https://doi.org/10.2307/3001616

[18] Moran, P.A.P. (1970) On Asymptotically Optimal Test of Composite Hypotheses. Biometrika, 57, 47-55.

https://doi.org/10.1093/biomet/57.1.47

[19] Al-Gahtani, S., Shoukri, M. and Al-Eid, M. (2021) Predictors of the Aggregate of COVID-19 Cases and Its Case-Fatality: A Global Investigation Involving 120 Countries. Open Journal of Statistics, 11, 259-277.

https://www.scirp.org/journal/ojs

https://doi.org/10.4236/ojs.2021.112014

[20] Cockwell, P. and Fisher, L.-A. (2017) Global, Regional, and National Burden of Chronic Kidney Disease, 1990-2017: A Systematic Analysis for the Global Burden of Disease Study. The Lancet, 395, 709-733.

[21] Rottoli, M., Bernante, P., Garelli, S. and Gianella, M. (2020) How Important Is Obesity as a Risk Factor for Respiratory Failure, Intensive Care Admission and Death in Hospitalized COVID-19 Patients? Results from a Single Italian Center. European Journal of Endocrinology, 183, 389-397.

https://doi.org/10.1530/EJE-20-0541

[22] https://worldpopulationreview.com/en/country-rankings/obesity-rates-by-country

[23] Rashed, E.A., Kodera, S., Gomez-Tames, J. and Hirata, A. (2020) Correlation between COVID-19 Morbidity and Mortality Rates in Japan and Local Population Density, Temperature, and Absolute Humidity. International Journal of Environmental Research and Public Health, 17, 5477.

https://doi.org/10.3390/ijerph17155477

[24] GBD Colorectal Cancer Collaborators (2019) The Global, Regional, and National Burden of Colorectal Cancer and Its Attributable Risk Factors in 195 Countries and Territories, 1990-2017: A Systematic Review for the Global Burden of Disease Study 2017. The Lancet Gastroenterology and Hepatology, 4, 913-933.

[25] Hastie, T.J. and Tibshirani, R.J. (1990) Generalized Additive Models. Chapman & Hall/CRC, London.

[26] Ruppert, D., Wand, M.P. and Carroll, R.J. (2003) Semiparametric Regression. Cambridge University Press, Cambridge.

https://doi.org/10.1017/CBO9780511755453

[27] Wood, S.N. (2017) Generalized Additive Models: An Introduction with R. Second Edition, CRC Press, Boca Raton.

https://doi.org/10.1201/9781315370279