# Poisson Model, Hurdle Model, Likelihood In Machine Learning

**Data**

The spreadsheet CLGoals.xlsx contains the number of goals scored in each UEFA Champions League game to date this season (three match weeks of sixteen games). The data are count data that take the values 0, 1, 2, ...

**Modeling**

**Poisson Model**

The Poisson distribution is probably the most standard model for count data.

The Poisson model, with parameter λ, assumes that

P{X = x} = λ^x exp(−λ)/x!, for x = 0, 1, 2, ...

Thus, P{X = 0} = exp(−λ), P{X = 1} = λ exp(−λ), P{X = 2} = λ^2 exp(−λ)/2, ...
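As a quick sanity check, the probabilities above can be compared with R's built-in `dpois()`. This is a minimal sketch: the helper name `pois_pmf` and the value λ = 2.5 are illustrative assumptions, not part of the assignment.

```r
# Poisson probabilities written out from the formula lambda^x * exp(-lambda) / x!
# (pois_pmf is a hypothetical helper name used only for this illustration)
pois_pmf <- function(x, lambda) lambda^x * exp(-lambda) / factorial(x)

lambda <- 2.5   # illustrative value only
x <- 0:10

# The hand-written formula should agree with R's built-in Poisson pmf
agree <- isTRUE(all.equal(pois_pmf(x, lambda), dpois(x, lambda)))
print(agree)
```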

The expected value (mean) of the Poisson distribution is λ and the variance is also λ (thus, the standard deviation is √λ).

**Hurdle Model**

The Hurdle model, with parameters θ and λ, assumes that

P{X = 0} = θ and P{X = x} = (1 − θ)λ^x exp(−λ)/(x!(1 − exp(−λ))), for x = 1, 2, ...

Thus, P{X = 0} = θ, P{X = 1} = (1 − θ)λ exp(−λ)/(1 − exp(−λ)), ...

If θ = exp(−λ), then the Hurdle model is the same as the Poisson model. If θ < exp(−λ), then zeros are less likely than under a Poisson model. If θ > exp(−λ), then zeros are more likely than under a Poisson model.

**Likelihood**

The likelihood function is defined to be the probability of the observed data for a given parameter value. If we have independent observations x1, x2, ..., xn, then the likelihood is

L = P{X = x1} × P{X = x2} × ... × P{X = xn}.

The log-likelihood is the (natural) logarithm of the likelihood; thus, writing L for the likelihood, it takes the form

log L = log P{X = x1} + log P{X = x2} + ... + log P{X = xn}.
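To make the relationship between the two quantities concrete, here is a small sketch for the Poisson case. The observations and the value of λ are made up for illustration; they are not the Champions League data.

```r
x <- c(1, 0, 2, 3, 1)   # made-up observations, for illustration only
lambda <- 1.5           # illustrative parameter value

# Likelihood: product of the individual probabilities
lik <- prod(dpois(x, lambda))

# Log-likelihood: sum of the individual log-probabilities
loglik <- sum(dpois(x, lambda, log = TRUE))

# Taking the log of the product gives the same number as summing the logs
same <- isTRUE(all.equal(log(lik), loglik))
print(same)
```

Working on the log scale is usually preferable in practice, because a product of many small probabilities can underflow to zero while the sum of their logarithms stays well behaved.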

**Task 1: Exploring**

**1.** Read the data into R.

**Hint:** The read.xlsx() function in the openxlsx R package is useful for doing this.

**2.** Produce a table that tabulates the frequency of each number of goals.

**3.** Produce a plot of the frequency of each number of goals.

**4.** Calculate the mean and the standard deviation of the number of goals.
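One possible shape for these exploratory steps is sketched below. It rests on two assumptions: the goal counts sit in the first column of the first sheet of CLGoals.xlsx (the handout does not say how the spreadsheet is laid out), and, when the file or the openxlsx package is unavailable, a short made-up vector stands in for the data so that the code still runs.

```r
# Hypothetical stand-in counts; NOT the real Champions League data.
goals <- c(2, 0, 3, 1, 4, 2, 2, 5, 1, 3, 0, 2)

# If the spreadsheet and openxlsx are available, use the real data instead.
# Assumption: the counts are in the first column of the first sheet.
if (file.exists("CLGoals.xlsx") && requireNamespace("openxlsx", quietly = TRUE)) {
  goals <- openxlsx::read.xlsx("CLGoals.xlsx")[[1]]
}

# Frequency of each number of goals
freq <- table(goals)
print(freq)

# Bar plot of the frequencies
barplot(freq, xlab = "Goals per game", ylab = "Number of games")

# Mean and standard deviation of the number of goals
goals_mean <- mean(goals)
goals_sd <- sd(goals)
print(c(mean = goals_mean, sd = goals_sd))
```

Under a Poisson model the standard deviation should be close to the square root of the mean, so comparing `goals_sd` with `sqrt(goals_mean)` gives a rough first check of how plausible that model is.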

**Task 2a: Poisson Modelling**

**1.** Write a function that calculates the log-likelihood function (for a specified value of λ) for the Poisson model for the UEFA Champions League data.

**2.** Plot the log-likelihood function for a range of values of λ.

**Hint:** Make sure that λ = x̄ (the sample mean of the number of goals) is in the range.

**3.** Add a vertical line to the plot at the value x̄ and visually verify that this maximizes the log-likelihood function.

**4.** Simulate 48 values from a Poisson model with λ = x̄ and summarize the resulting values (contrasting them with the summaries produced in Task 1).

**5.** Simulate 48 values from a Poisson model for other values of λ and summarize the resulting values.
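The steps of this task could be sketched as follows. The vector `goals` is a made-up stand-in for the spreadsheet data, and the names `pois_loglik`, `lambda_grid` and `lambda_best` are hypothetical; the grid limits are an arbitrary choice that happens to contain the sample mean of the stand-in data.

```r
set.seed(1)  # makes the simulation reproducible

# Hypothetical stand-in counts; replace with the goals read from CLGoals.xlsx.
goals <- c(2, 0, 3, 1, 4, 2, 2, 5, 1, 3, 0, 2)

# Poisson log-likelihood for a single value of lambda
pois_loglik <- function(lambda, x) sum(dpois(x, lambda, log = TRUE))

xbar <- mean(goals)
lambda_grid <- seq(0.5, 5, by = 0.01)   # range chosen so that xbar lies inside it
ll <- sapply(lambda_grid, pois_loglik, x = goals)

plot(lambda_grid, ll, type = "l",
     xlab = expression(lambda), ylab = "Log-likelihood")
abline(v = xbar, lty = 2)               # vertical line at the sample mean

# The grid value with the largest log-likelihood should sit next to xbar
lambda_best <- lambda_grid[which.max(ll)]
print(c(xbar = xbar, lambda_best = lambda_best))

# Simulate 48 games (three match weeks of sixteen) and summarize them
sim <- rpois(48, lambda = xbar)
print(c(mean = mean(sim), sd = sd(sim)))
print(table(sim))
```

Repeating the last three lines with other values of `lambda` shows how the simulated mean, spread and number of zero-goal games move away from the observed summaries as λ moves away from the sample mean.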

**Task 2b: Hurdle Modelling**

**1.** Create a dHurdle() function that has arguments x, param that computes P{X = x} for the Hurdle model, where the first element of the vector param is θ and the second element of the vector param is λ. Ensure that the function can handle x being a vector of values.

**2.** Write a function that calculates the log-likelihood function (for a specified value of param) for the Hurdle model for the UEFA Champions League goal data.

**3.** Use the optim function to find the values of θ and λ that maximize the log-likelihood.

**Hint:** optim minimizes functions by default, so you may want to write a function that computes minus the log-likelihood and minimize that. Alternatively, you can set control=list(fnscale=-1) as an argument in optim to make it maximize.

**4.** Comment on the value of θ found and compare the log-likelihood values found for the Poisson and Hurdle models.
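A minimal sketch of how the Hurdle steps might fit together is given below. The vector `goals` is again a made-up stand-in for the spreadsheet data, `hurdle_negloglik` is a hypothetical helper name, and the use of `method = "L-BFGS-B"` with box constraints is one choice among several for keeping θ inside (0, 1) and λ positive; the handout does not prescribe an optimization method.

```r
# Hypothetical stand-in counts; replace with the goals read from CLGoals.xlsx.
goals <- c(2, 0, 3, 1, 4, 2, 2, 5, 1, 3, 0, 2)

# P{X = x} under the Hurdle model; param = c(theta, lambda). Vectorized over x.
dHurdle <- function(x, param) {
  theta <- param[1]
  lambda <- param[2]
  ifelse(x == 0,
         theta,
         (1 - theta) * dpois(x, lambda) / (1 - exp(-lambda)))
}

# Minus the log-likelihood, because optim() minimizes by default
hurdle_negloglik <- function(param, x) -sum(log(dHurdle(x, param)))

# Box constraints keep theta in (0, 1) and lambda positive (an assumption of
# this sketch, not something the handout requires)
fit <- optim(par = c(0.5, 1), fn = hurdle_negloglik, x = goals,
             method = "L-BFGS-B",
             lower = c(1e-6, 1e-6), upper = c(1 - 1e-6, Inf))

theta_hat <- fit$par[1]
lambda_hat <- fit$par[2]
hurdle_ll <- -fit$value

# Poisson log-likelihood at its maximizer (the sample mean), for comparison
pois_ll <- sum(dpois(goals, mean(goals), log = TRUE))

print(c(theta = theta_hat, lambda = lambda_hat))
print(c(hurdle = hurdle_ll, poisson = pois_ll))
print(c(theta_hat = theta_hat, poisson_zero_prob = exp(-lambda_hat)))
```

Because the Poisson model is the special case θ = exp(−λ) of the Hurdle model, the maximized Hurdle log-likelihood cannot be smaller than the Poisson one. Comparing `theta_hat` with `exp(-lambda_hat)` indicates whether the data contain more or fewer zeros than a Poisson model would suggest.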