
Poisson Model, Hurdle Model, Likelihood In Machine Learning



Data

The spreadsheet CLGoals.xlsx contains the number of goals scored in each UEFA Champions League game played so far this season (three match weeks of sixteen games, 48 games in total). The data are count data that take the values 0, 1, 2, and so on.


Modeling

Poisson Model

The Poisson distribution is probably the most widely used model for count data.

The Poisson model, with parameter λ, assumes that

$$
P\{X = x\} = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots
$$
Thus, P{X = 0} = exp(−λ), P{X = 1} = λ exp(−λ), P{X = 2} = λ^2 exp(−λ)/2, ...

The expected value (mean) of the Poisson distribution is λ, and the variance is also λ (thus, the standard deviation is √λ).
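The formula and the moment claims can be checked numerically in R. The sketch below uses an arbitrary illustrative value λ = 2.3, which is not estimated from the goal data. It compares the formula with R's built-in `dpois()` and then computes the mean and variance by summing over the support. The support is truncated at 100, where the remaining probability is negligible for this λ.

```r
# Arbitrary illustrative rate, not an estimate from the goal data
lambda <- 2.3
# Truncated support; for this lambda the probability beyond 100 is negligible
x <- 0:100

# Probabilities written out from the formula lambda^x exp(-lambda) / x!
manual <- lambda^x * exp(-lambda) / factorial(x)
# R's built-in Poisson probability mass function
p <- dpois(x, lambda)
print(all.equal(manual, p))

# Mean and variance computed by summing over the (truncated) support
pois_mean <- sum(x * p)
pois_var <- sum((x - pois_mean)^2 * p)
print(c(mean = pois_mean, variance = pois_var, sd = sqrt(pois_var)))
```

Both the mean and the variance should print as 2.3, and the standard deviation as √2.3.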


Hurdle Model

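The Hurdle model, with parameters θ and λ, assumes that

$$
P\{X = x\} =
\begin{cases}
\theta, & x = 0, \\[4pt]
(1 - \theta)\,\dfrac{\lambda^{x} e^{-\lambda}}{x!\,\left(1 - e^{-\lambda}\right)}, & x = 1, 2, \ldots
\end{cases}
$$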
Thus, P{X = 0} = θ, P{X = 1} = (1 − θ)λ exp(−λ)/(1 − exp(−λ)), ...
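The factor 1/(1 − exp(−λ)) rescales the Poisson probabilities for x ≥ 1 so that they sum to one; this is a zero-truncated Poisson distribution. Multiplying by 1 − θ then leaves total probability 1 − θ for the positive counts. The following is a minimal numerical check. It uses a hypothetical helper named `hurdle_prob` and arbitrary illustrative values θ = 0.3 and λ = 1.5.

```r
# Hypothetical helper: hurdle probabilities P{X = x} for a vector x
hurdle_prob <- function(x, theta, lambda) {
  # Zero-truncated Poisson probabilities, scaled by (1 - theta), for x >= 1
  positive <- (1 - theta) * dpois(x, lambda) / (1 - exp(-lambda))
  ifelse(x == 0, theta, positive)
}

# Arbitrary illustrative parameter values
theta <- 0.3
lambda <- 1.5
x <- 0:100  # truncated support; the tail beyond 100 is negligible here

probs <- hurdle_prob(x, theta, lambda)
print(sum(probs))      # total probability, should be 1
print(sum(probs[-1]))  # probability of a positive count, should be 1 - theta
```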

If θ = exp(−λ), then the Hurdle model is the same as the Poisson model. If θ < exp(−λ), then zeros are less likely than under a Poisson model. If θ > exp(−λ), then zeros are more likely than under a Poisson model.
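Substituting θ = exp(−λ) into the hurdle probabilities shows why the two models then coincide. The zero probability becomes exp(−λ), and for x ≥ 1 the factor 1 − θ cancels the truncation denominator:

$$
\begin{aligned}
P\{X = 0\} &= \theta = e^{-\lambda}, \\
P\{X = x\} &= \left(1 - e^{-\lambda}\right)\frac{\lambda^{x} e^{-\lambda}}{x!\,\left(1 - e^{-\lambda}\right)} = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 1, 2, \ldots
\end{aligned}
$$

These are exactly the Poisson probabilities.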


Likelihood

The likelihood function is defined to be the probability of the observed data for a given parameter value. If we have independent observations x1, x2, ..., xn, then the likelihood is

$$
L = P\{X_1 = x_1, \ldots, X_n = x_n\} = \prod_{i=1}^{n} P\{X = x_i\}
$$
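For independent observations, the likelihood is the product of the individual probabilities. The minimal R sketch below uses a Poisson model and simulated counts; the seed, λ = 2 and the sample sizes are arbitrary illustrative choices, not the goal data. It also shows why the logarithm is usually taken: with many observations the raw product underflows to zero in double precision, while the sum of log-probabilities stays finite.

```r
set.seed(1)
lambda <- 2  # arbitrary illustrative rate

# 48 simulated counts: the product of probabilities is tiny but representable
x_small <- rpois(48, lambda)
lik_small <- prod(dpois(x_small, lambda))

# 5000 simulated counts: the product underflows to exactly 0 in double precision
x_large <- rpois(5000, lambda)
lik_large <- prod(dpois(x_large, lambda))

# Summing log-probabilities avoids the underflow
loglik_large <- sum(dpois(x_large, lambda, log = TRUE))

print(c(lik_small = lik_small, lik_large = lik_large,
        loglik_large = loglik_large))
```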

The log-likelihood is the (natural) logarithm of the likelihood, so it takes the form

$$
\ell = \log L = \sum_{i=1}^{n} \log P\{X = x_i\}
$$
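For the Poisson model, the log-likelihood has a closed form. Setting its derivative to zero shows that it is maximized at the sample mean x̄, which is the value used in Task 2a:

$$
\ell(\lambda) = \sum_{i=1}^{n} \log\left(\frac{\lambda^{x_i} e^{-\lambda}}{x_i!}\right) = \log(\lambda) \sum_{i=1}^{n} x_i - n\lambda - \sum_{i=1}^{n} \log(x_i!)
$$

$$
\frac{d\ell}{d\lambda} = \frac{1}{\lambda}\sum_{i=1}^{n} x_i - n = 0
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
$$

When at least one observation is positive, the second derivative −(Σ x_i)/λ² is negative, so this stationary point is a maximum.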

Task 1: Exploring

1. Read the data into R.

Hint: The read.xlsx() function in the openxlsx R package is useful for doing this.

2. Produce a table of the frequency of each number of goals.

3. Produce a plot of the frequency of each number of goals.

4. Calculate the mean and the standard deviation of the number of goals.
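Below is a minimal sketch of Task 1. It assumes, hypothetically, that the goal counts are in the first column of the first sheet of CLGoals.xlsx, since the column layout is not given above. So that the sketch also runs without the file, it falls back to simulated placeholder counts, which are not the real Champions League data.

```r
if (file.exists("CLGoals.xlsx")) {
  # 1. Assumption: the goal counts are in the first column of the first sheet
  goals <- openxlsx::read.xlsx("CLGoals.xlsx")[[1]]
} else {
  # Placeholder only: 48 simulated counts, NOT the real Champions League data
  set.seed(2024)
  goals <- rpois(48, lambda = 2.5)
}

# 2. Frequency table; factor levels 0..max keep counts that never occur
goal_freq <- table(factor(goals, levels = 0:max(goals)))
print(goal_freq)

# 3. Bar plot of the frequencies
barplot(goal_freq, xlab = "Goals in a game", ylab = "Number of games")

# 4. Mean and standard deviation of the number of goals
print(c(mean = mean(goals), sd = sd(goals)))
```

Using a factor with explicit levels means a goal count that never occurs still appears in the table and plot with frequency zero.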


Task 2a: Poisson Modelling

1. Write a function that calculates the log-likelihood function (for a specified value of λ) for the Poisson model for the UEFA Champions League data.

2. Plot the log-likelihood function for a range of values of λ.

Hint: Make sure that λ = x̄ (the sample mean of the number of goals) is in the range.

3. Add a vertical line to the plot at the value x̄ and visually verify that this maximizes the log-likelihood function.

4. Simulate 48 values from a Poisson model with λ = x̄ and summarize the resulting values (contrasting them with the summaries produced in Task 1).

5. Simulate 48 values from a Poisson model for other values of λ and summarize the resulting values in the same way.
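Here is a minimal sketch of Task 2a. It assumes `goals` holds the goal counts from Task 1; simulated placeholder counts are used so that the block runs on its own. The function name `pois_loglik`, the λ grid and the extra λ values in step 5 are illustrative choices.

```r
# Placeholder data (hypothetical): replace with the goal counts from Task 1
set.seed(2024)
goals <- rpois(48, lambda = 2.5)

# 1. Poisson log-likelihood at a single value of lambda
pois_loglik <- function(lambda, x) {
  sum(dpois(x, lambda, log = TRUE))
}

# 2. Evaluate and plot on a grid of lambda values centred on the sample mean
xbar <- mean(goals)
lambda_grid <- seq(0.5 * xbar, 1.5 * xbar, length.out = 201)
ll <- sapply(lambda_grid, pois_loglik, x = goals)
plot(lambda_grid, ll, type = "l", xlab = expression(lambda),
     ylab = "Log-likelihood")

# 3. Dashed vertical line at the sample mean; it should sit at the peak
abline(v = xbar, lty = 2)

# 4. Simulate 48 values with lambda = xbar and summarize them
sim <- rpois(48, lambda = xbar)
print(table(factor(sim, levels = 0:max(sim))))
print(c(mean = mean(sim), sd = sd(sim)))

# 5. Repeat for a few other (arbitrary) values of lambda
for (lam in c(1, 2, 4)) {
  s <- rpois(48, lambda = lam)
  cat("lambda =", lam, " mean =", mean(s), " sd =", round(sd(s), 3), "\n")
}
```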


Task 2b: Hurdle Modelling

1. Create a dHurdle() function with arguments x and param that computes P{X = x} for the Hurdle model, where the first element of the vector param is θ and the second element is λ. Ensure that the function can handle x being a vector of values.

2. Write a function that calculates the log-likelihood function (for a specified value of param) for the Hurdle model for the UEFA Champions League goal data.

3. Use the optim function to find the values of θ and λ that maximize the log-likelihood.

Hint: optim minimizes functions by default, so you may want to write a function that computes minus the log-likelihood and minimize that.

Alternatively, you can pass control=list(fnscale=-1) as an argument to optim to make it maximize.

4. Comment on the value of θ found and compare the log-likelihood values found for the Poisson and Hurdle models.
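Here is a minimal sketch of Task 2b, again with simulated placeholder counts standing in for the goal data. It maximizes with `control = list(fnscale = -1)` as the hint suggests. It also uses the `L-BFGS-B` method with box constraints; this is a choice made here, not something the task prescribes. One useful check follows from the structure of the model. The hurdle log-likelihood splits into n₀ log θ + (n − n₀) log(1 − θ), where n₀ is the number of zeros, plus a zero-truncated Poisson part that depends only on λ. So the fitted θ should match the observed proportion of games with no goals.

```r
# Placeholder data (hypothetical): replace with the goal counts from Task 1
set.seed(2024)
goals <- rpois(48, lambda = 2.5)

# 1. Hurdle probabilities P{X = x}; param = c(theta, lambda); x may be a vector
dHurdle <- function(x, param) {
  theta <- param[1]
  lambda <- param[2]
  # Zero-truncated Poisson probabilities, scaled by (1 - theta), for x >= 1
  positive <- (1 - theta) * dpois(x, lambda) / (1 - exp(-lambda))
  ifelse(x == 0, theta, positive)
}

# 2. Hurdle log-likelihood of the observed counts
hurdle_loglik <- function(param, x) {
  sum(log(dHurdle(x, param)))
}

# 3. Maximize with optim: fnscale = -1 makes optim maximize, and the
#    L-BFGS-B bounds keep theta inside (0, 1) and lambda positive
fit <- optim(par = c(0.5, mean(goals)), fn = hurdle_loglik, x = goals,
             method = "L-BFGS-B", lower = c(1e-6, 1e-6),
             upper = c(1 - 1e-6, Inf), control = list(fnscale = -1))

# 4. Compare theta with the observed share of zeros and with exp(-lambda),
#    and compare the maximized log-likelihoods of the two models
pois_ll <- sum(dpois(goals, mean(goals), log = TRUE))
print(c(theta_hat = fit$par[1], lambda_hat = fit$par[2],
        prop_zero = mean(goals == 0), exp_minus_lambda = exp(-fit$par[2]),
        hurdle_ll = fit$value, poisson_ll = pois_ll))
```

The Poisson model is the special case θ = exp(−λ) of the hurdle model. Therefore, the maximized hurdle log-likelihood should never fall below the Poisson one. If the fitted θ is well above exp(−λ̂), the data contain more zero-goal games than a Poisson model would predict.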



Contact us at contact@codersarts.com for affordable help with any machine learning project or assignment.
