
Poisson Model, Hurdle Model, Likelihood In Machine Learning

Data

The spreadsheet CLGoals.xlsx contains the number of goals scored in each UEFA Champions League game to date this season (three match weeks of sixteen games, i.e. 48 games). The data are count data that take the values 0, 1, 2, ....

Modeling

Poisson Model

The Poisson distribution is probably the most widely used model for count data.

The Poisson model, with parameter λ, assumes that

$$P\{X = x\} = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots$$

Thus, P{X = 0} = exp(−λ), P{X = 1} = λ exp(−λ), P{X = 2} = λ^2 exp(−λ)/2, ...

The expected value (mean) of the Poisson distribution is λ, and the variance is also λ (thus, the standard deviation is √λ).
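As a numerical check of the probability formula and of the mean and variance, the R sketch below evaluates λ^x exp(−λ)/x! directly, compares it with R's built-in dpois(), and computes the mean and variance by summing over a truncated support. The value λ = 2.5 and the cutoff at 100 are arbitrary illustrative choices; they are not taken from the goal data.

```r
# Illustrative rate parameter (not estimated from the goal data)
lambda <- 2.5
x <- 0:100  # truncated support; the tail beyond 100 is negligible for this lambda

# Poisson pmf written out from the formula lambda^x * exp(-lambda) / x!,
# computed on the log scale to avoid overflow in lambda^x and x!
pmf_manual <- exp(x * log(lambda) - lambda - lfactorial(x))

# Largest absolute difference from R's built-in Poisson density
max(abs(pmf_manual - dpois(x, lambda)))

# Mean and variance obtained by summing over the support
pois_mean <- sum(x * pmf_manual)
pois_var  <- sum((x - pois_mean)^2 * pmf_manual)
c(mean = pois_mean, variance = pois_var, sd = sqrt(pois_var))
```

Both the mean and the variance come out at λ, so the standard deviation is √λ.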

Hurdle Model

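The Hurdle model, with parameters θ and λ, assumes that

$$P\{X = 0\} = \theta, \qquad P\{X = x\} = \frac{(1 - \theta)\,\lambda^x e^{-\lambda}}{x!\,(1 - e^{-\lambda})}, \quad x = 1, 2, \ldots$$

that is, a zero occurs with probability θ, and the positive counts follow a Poisson distribution truncated at zero. Thus, P{X = 0} = θ, P{X = 1} = (1 − θ)λ exp(−λ)/(1 − exp(−λ)), ...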

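If θ = exp(−λ), then the Hurdle model is the same as the Poisson model, because 1 − θ then equals 1 − exp(−λ) and the probabilities of the positive counts reduce to λ^x exp(−λ)/x!. If θ < exp(−λ), then zeros are less likely than under a Poisson model. If θ > exp(−λ), then zeros are more likely than under a Poisson model.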
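The relationship between θ and exp(−λ) described above can be checked numerically. This sketch uses an illustrative λ = 1.5 and three hypothetical values of θ (below, equal to and above exp(−λ)); the helper name hurdle_probs is introduced here only for illustration. It computes the Hurdle probabilities over a truncated support, confirms they sum to one, and shows that θ = exp(−λ) reproduces dpois().

```r
lambda <- 1.5   # illustrative rate for the positive counts
x <- 0:100      # truncated support; the tail beyond 100 is negligible

# Hurdle probabilities from the formula: theta at zero, and a zero-truncated
# Poisson(lambda) scaled by (1 - theta) for x = 1, 2, ...
hurdle_probs <- function(theta, lambda, x) {
  ifelse(x == 0, theta,
         (1 - theta) * dpois(x, lambda) / (1 - exp(-lambda)))
}

# Three illustrative values of theta relative to exp(-lambda)
thetas <- c(low = 0.5 * exp(-lambda),
            poisson = exp(-lambda),
            high = 0.5 + 0.5 * exp(-lambda))
probs <- sapply(thetas, hurdle_probs, lambda = lambda, x = x)

colSums(probs)              # each column sums to 1
probs[1, ] - exp(-lambda)   # P{X = 0} minus the Poisson value: negative, zero, positive
max(abs(probs[, "poisson"] - dpois(x, lambda)))  # essentially 0
```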

Likelihood

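The likelihood function is defined to be the probability of the observed data for a given parameter value. If we have independent observations x_1, x_2, ..., x_n, then the likelihood is

$$L = \prod_{i=1}^{n} P\{X = x_i\},$$

where each probability is computed under the model with the given parameter value(s).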

The log-likelihood is the (natural) logarithm of the likelihood, so it takes the form

$$\ell = \log L = \sum_{i=1}^{n} \log P\{X = x_i\}.$$
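For the Poisson model the log-likelihood has a closed form, and maximizing it shows that the best-fitting λ is the sample mean x̄. A short derivation, substituting the Poisson probabilities into the sum of log-probabilities:

```latex
\begin{aligned}
\ell(\lambda) &= \sum_{i=1}^{n} \log\frac{\lambda^{x_i} e^{-\lambda}}{x_i!}
  = \log\lambda \sum_{i=1}^{n} x_i - n\lambda - \sum_{i=1}^{n} \log(x_i!), \\
\frac{d\ell}{d\lambda} &= \frac{1}{\lambda}\sum_{i=1}^{n} x_i - n = 0
  \quad\Longrightarrow\quad \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}, \\
\frac{d^2\ell}{d\lambda^2} &= -\frac{1}{\lambda^2}\sum_{i=1}^{n} x_i < 0
  \quad \text{whenever } \textstyle\sum_{i} x_i > 0 .
\end{aligned}
```

The negative second derivative confirms that x̄ is a maximum. The last term of ℓ(λ) does not involve λ, so it shifts the curve without moving its peak.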

Task 1

1. Read the data into R.

Hint: The read.xlsx() function in the openxlsx R package is useful for doing this.

2. Produce a table that tabulates the frequency of each number of goals.

3. Produce a plot of the frequency of each number of goals.

4. Calculate the mean and the standard deviation of the number of goals.
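A minimal R sketch of these four steps follows. It assumes the openxlsx package is installed and, because the layout of CLGoals.xlsx is not described here, that the goal counts are in the first column of the first sheet (check with names(goals_df) and adjust). The helper names summarise_goals and plot_goals are hypothetical. Converting the counts to a factor with levels 0 to the maximum keeps goal counts that never occur in the table, with a frequency of zero.

```r
# Frequency table, mean and standard deviation of a vector of goal counts
summarise_goals <- function(goals) {
  goals <- as.integer(goals)
  # Levels 0..max keep counts that never occur (frequency 0) in the table
  tab <- table(factor(goals, levels = 0:max(goals)))
  list(table = tab, mean = mean(goals), sd = sd(goals))  # sd() divides by n - 1
}

# Bar plot of the frequency of each number of goals
plot_goals <- function(tab) {
  barplot(tab, xlab = "Number of goals", ylab = "Frequency",
          main = "Goals per UEFA Champions League game")
}

if (file.exists("CLGoals.xlsx")) {
  goals_df <- openxlsx::read.xlsx("CLGoals.xlsx")
  goals <- goals_df[[1]]   # ASSUMPTION: goal counts are in the first column
  goal_summary <- summarise_goals(goals)
  print(goal_summary$table)
  plot_goals(goal_summary$table)
  print(c(mean = goal_summary$mean, sd = goal_summary$sd))
}
```

The mean and standard deviation can then be set against the Poisson properties above: under a Poisson model they should be roughly λ and √λ for the same λ.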

Task 2

1. Write a function that calculates the log-likelihood (for a specified value of λ) of the Poisson model for the UEFA Champions League goal data.

2. Plot the log-likelihood function for a range of values of λ.

Hint: Make sure that λ = x̄ (the sample mean of the goal counts) is in the range.

3. Add a vertical line to the plot at the value x̄ and visually verify that this maximizes the log-likelihood function.

4. Simulate 48 values from a Poisson model with λ = x̄ and summarize the resulting values (contrasting them with the summaries produced in Task 1).

5. Simulate 48 values from a Poisson model for other values of λ and summarize them in the same way.
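The steps above can be sketched in R as follows. The helper names (loglik_pois, plot_loglik_pois, simulate_pois), the plotting grid of 0.5 to 1.5 times the sample mean, the seed and the alternative λ values 1, 2 and 4 are arbitrary choices for illustration, and goals is assumed to be the vector of goal counts read in Task 1. Summing dpois(..., log = TRUE) avoids multiplying many small probabilities together.

```r
# Poisson log-likelihood for a single lambda: sum of log P{X = x_i}
loglik_pois <- function(lambda, x) {
  sum(dpois(x, lambda, log = TRUE))
}

# Plot the log-likelihood over a grid of lambda values and mark the sample mean
plot_loglik_pois <- function(x,
                             lambdas = seq(0.5 * mean(x), 1.5 * mean(x),
                                           length.out = 200)) {
  ll <- sapply(lambdas, loglik_pois, x = x)
  plot(lambdas, ll, type = "l", xlab = expression(lambda),
       ylab = "Log-likelihood")
  abline(v = mean(x), lty = 2, col = "red")  # should sit at the peak
  invisible(ll)
}

# Simulate n values from a Poisson(lambda) model and summarise them
simulate_pois <- function(lambda, n = 48) {
  sims <- rpois(n, lambda)
  list(table = table(factor(sims, levels = 0:max(sims))),
       mean = mean(sims), sd = sd(sims))
}

if (exists("goals")) {
  plot_loglik_pois(goals)
  set.seed(2024)                           # arbitrary seed for reproducibility
  print(simulate_pois(mean(goals)))        # lambda at the sample mean
  print(lapply(c(1, 2, 4), simulate_pois)) # other illustrative lambda values
}
```

Because each call to rpois() draws new values, repeating the simulation a few times gives a sense of how much the summaries of 48 Poisson counts vary on their own.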

Task 3

1. Create a dHurdle() function with arguments x and param that computes P{X = x} for the Hurdle model, where the first element of the vector param is θ and the second element is λ. Ensure that the function can handle x being a vector of values.

2. Write a function that calculates the log-likelihood (for a specified value of param) of the Hurdle model for the UEFA Champions League goal data.

3. Use the optim function to find the values of θ and λ that maximize the log-likelihood.

Hint: By default, optim minimizes functions, so you may want to write a function that computes minus the log-likelihood and minimize that. Alternatively, you can set control=list(fnscale=-1) as an argument in optim to make it maximize.

4. Comment on the estimated value of θ, and compare the maximized log-likelihood values of the Poisson and Hurdle models.
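One way to approach these steps is sketched below. dHurdle follows the argument order given in the task; loglik_hurdle, fit_hurdle, the starting values and the use of method "L-BFGS-B" with box constraints (to keep θ strictly between 0 and 1 and λ positive) are assumptions of this sketch, not requirements of the task. As before, goals is assumed to hold the goal counts read in Task 1.

```r
# Hurdle pmf: P{X = 0} = theta; for x >= 1, a zero-truncated Poisson(lambda)
# scaled by (1 - theta). param = c(theta, lambda); x may be a vector.
dHurdle <- function(x, param) {
  theta <- param[1]
  lambda <- param[2]
  ifelse(x == 0, theta,
         (1 - theta) * dpois(x, lambda) / (1 - exp(-lambda)))
}

# Hurdle log-likelihood: sum of log P{X = x_i}
loglik_hurdle <- function(param, x) {
  sum(log(dHurdle(x, param)))
}

# Maximise with optim: fnscale = -1 makes optim maximise instead of minimise,
# and the bounds (our choice) keep 0 < theta < 1 and lambda > 0.
fit_hurdle <- function(x, start = c(0.5, mean(x))) {
  optim(start, loglik_hurdle, x = x, method = "L-BFGS-B",
        lower = c(1e-6, 1e-6), upper = c(1 - 1e-6, Inf),
        control = list(fnscale = -1))
}

if (exists("goals")) {
  fit <- fit_hurdle(goals)
  print(c(theta = fit$par[1], lambda = fit$par[2],
          exp_minus_lambda = exp(-fit$par[2])))   # compare theta with exp(-lambda)
  print(c(hurdle = fit$value,                     # maximised log-likelihoods
          poisson = sum(dpois(goals, mean(goals), log = TRUE))))
}
```

Two checks on the output: because the Poisson model is the special case θ = exp(−λ), the maximized Hurdle log-likelihood can never be below the Poisson one. Also, the Hurdle log-likelihood separates into a θ part and a λ part, so the maximizing θ equals the proportion of zeros in the data, and the maximizing λ solves λ/(1 − exp(−λ)) = mean of the positive counts.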