Data
The spreadsheet CLGoals.xlsx contains the number of goals scored in each UEFA Champions League game to date this season (three match weeks of sixteen games each, 48 games in total). The data are counts that take the values 0, 1, 2, ....
Modeling
Poisson Model
The Poisson distribution is probably the most standard model for count data.
The Poisson model, with parameter λ, assumes that
P{X = x} = λ^x exp(−λ)/x!, for x = 0, 1, 2, ....
Thus, P{X = 0} = exp(−λ), P{X = 1} = λ exp(−λ), P{X = 2} = λ^2 exp(−λ)/2, ...
The expected value (mean) of the Poisson distribution is λ and the variance is also λ (thus,
the standard deviation is √λ).
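As a quick numerical check of the formula above, the following sketch compares the probabilities computed by hand with R's built-in dpois() function. The value λ = 2.5 is an arbitrary illustration and is not estimated from the data.

```r
# Poisson probabilities computed directly from the formula
# P{X = x} = lambda^x * exp(-lambda) / x!
lambda <- 2.5   # illustrative value only, not estimated from the data
x <- 0:10

p_formula <- lambda^x * exp(-lambda) / factorial(x)
p_builtin <- dpois(x, lambda)   # R's built-in Poisson probabilities

# The two versions should agree up to floating-point error
print(all.equal(p_formula, p_builtin))

# Mean and variance computed from the distribution itself
# (the sum is truncated at 100, where the remaining probability is negligible)
xs <- 0:100
pois_mean <- sum(xs * dpois(xs, lambda))
pois_var  <- sum((xs - pois_mean)^2 * dpois(xs, lambda))
print(c(mean = pois_mean, variance = pois_var))
```

Both the mean and the variance come out equal to λ, which is the property stated above.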
Hurdle Model
The Hurdle model, with parameters θ and λ, assumes that
P{X = 0} = θ and P{X = x} = (1 − θ) λ^x exp(−λ)/(x! (1 − exp(−λ))), for x = 1, 2, 3, ....
Thus, P{X = 0} = θ, P{X = 1} = (1 − θ)λ exp(−λ)/(1 − exp(−λ)), ...
If θ = exp(−λ), then the Hurdle model is the same as the Poisson model. If θ < exp(−λ), then zeros are less likely than under a Poisson model. If θ > exp(−λ), then zeros are more likely than under a Poisson model.
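The relationship between the two models can be checked numerically. The sketch below uses arbitrary illustrative values (λ = 2.5 and θ = 0.2, neither taken from the data) to confirm that the Hurdle probabilities sum to one and that setting θ = exp(−λ) recovers the Poisson probabilities.

```r
lambda <- 2.5   # illustrative value only
theta  <- 0.2   # illustrative value only; larger than exp(-lambda) here
x_pos  <- 1:100 # positive counts (truncated where the tail is negligible)

# Hurdle probabilities for the positive counts: a Poisson distribution
# rescaled so that the positive values carry total probability 1 - theta
p_pos <- (1 - theta) * dpois(x_pos, lambda) / (1 - exp(-lambda))

# P{X = 0} + sum of P{X = x} over x >= 1 should be 1
total_prob <- theta + sum(p_pos)
print(total_prob)

# With theta = exp(-lambda) the positive-count probabilities
# reduce to the ordinary Poisson probabilities
theta_pois <- exp(-lambda)
p_pos_pois <- (1 - theta_pois) * dpois(x_pos, lambda) / (1 - exp(-lambda))
print(all.equal(p_pos_pois, dpois(x_pos, lambda)))

# Compare the chance of a zero under each model
print(c(hurdle_zero = theta, poisson_zero = exp(-lambda)))
```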
Likelihood
The likelihood function is defined to be the probability of the observed data for a given parameter value. If we have independent observations x1, x2, ..., xn, then the likelihood is
L = P{X = x1} × P{X = x2} × ... × P{X = xn}.
The log-likelihood is the (natural) logarithm of the likelihood; thus, it takes the form
log L = log P{X = x1} + log P{X = x2} + ... + log P{X = xn}.
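For the Poisson model, the log-likelihood can be written out explicitly. The following is a short worked derivation (a sketch using standard calculus, not part of the original handout) showing why the sample mean x̄ is the natural value of λ to look for when maximizing.

```latex
\log L(\lambda)
  = \sum_{i=1}^{n} \log\!\left( \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \right)
  = \log(\lambda) \sum_{i=1}^{n} x_i \;-\; n\lambda \;-\; \sum_{i=1}^{n} \log(x_i!)

\frac{d}{d\lambda} \log L(\lambda)
  = \frac{1}{\lambda} \sum_{i=1}^{n} x_i - n = 0
  \quad\Longrightarrow\quad
  \hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}
```

The second derivative, −(Σ xi)/λ^2, is negative whenever at least one goal was scored, so this stationary point is a maximum.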
Task 1: Exploring
1. Read the data into R.
Hint: The read.xlsx() function in the openxlsx R package is useful for doing this.
2. Produce a table that tabulates the frequency of each number of goals.
3. Produce a plot of the frequency of each number of goals.
4. Calculate the mean and the standard deviation of the number of goals.
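One possible way to approach the steps above is sketched below. It assumes that the goal counts are stored in the first column of the first sheet of CLGoals.xlsx, which the handout does not confirm. If the file or the openxlsx package is unavailable, the sketch falls back on a small placeholder vector that is purely illustrative and is not the real data.

```r
# Placeholder counts, used only when the spreadsheet is unavailable.
# These are NOT the real Champions League data.
goals <- c(0, 1, 1, 2, 2, 2, 3, 3, 4, 5)

# Assumption: the goal counts sit in the first column of the first sheet.
if (file.exists("CLGoals.xlsx") && requireNamespace("openxlsx", quietly = TRUE)) {
  goals <- openxlsx::read.xlsx("CLGoals.xlsx", sheet = 1)[[1]]
}

# Frequency table of the number of goals per game
goal_table <- table(goals)
print(goal_table)

# Bar plot of the frequencies
# (when run as a script, draw to a null device so no file is written)
if (!interactive()) pdf(NULL)
barplot(goal_table, xlab = "Goals per game", ylab = "Number of games")

# Mean and standard deviation of the number of goals
goal_mean <- mean(goals)
goal_sd   <- sd(goals)
print(c(mean = goal_mean, sd = goal_sd))
```

For Poisson data the variance should be close to the mean, so comparing goal_sd^2 with goal_mean gives a first informal check of the Poisson assumption.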
Task 2a: Poisson Modelling
1. Write a function that calculates the log-likelihood function (for a specified value of λ) for the Poisson model for the UEFA Champions League data.
2. Plot the log-likelihood function for a range of values of λ.
Hint: Make sure that λ = x̄ (the sample mean of the number of goals) is in the range.
3. Add a vertical line to the plot at the value x̄ and visually verify that this maximizes the log-likelihood function.
4. Simulate 48 values from a Poisson model with λ = x̄ and summarize the resulting values (contrasting them with the summaries produced in Task 1).
5. Simulate 48 values from a Poisson model for other values of λ and summarize the resulting values.
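The steps in this task could be sketched as follows. The vector goals is a simulated placeholder standing in for the 48 observed values so that the block runs on its own; it is not the real data. The names pois_loglik and lambda_grid are illustrative choices, not names required by the task.

```r
set.seed(1)
# Placeholder for the 48 observed goal counts (NOT the real data)
goals <- rpois(48, lambda = 2.5)

# Poisson log-likelihood for a single value of lambda:
# the sum of the log-probabilities of the observations
pois_loglik <- function(lambda, x) {
  sum(dpois(x, lambda, log = TRUE))
}

# Evaluate the log-likelihood on a grid centred on the sample mean
x_bar <- mean(goals)
lambda_grid <- seq(0.5 * x_bar, 1.5 * x_bar, length.out = 201)
loglik_values <- sapply(lambda_grid, pois_loglik, x = goals)

# Plot the curve and mark the sample mean with a vertical line
# (when run as a script, draw to a null device so no file is written)
if (!interactive()) pdf(NULL)
plot(lambda_grid, loglik_values, type = "l",
     xlab = expression(lambda), ylab = "Log-likelihood")
abline(v = x_bar, lty = 2)

# Simulate 48 values at lambda = x_bar and summarize them
sim <- rpois(48, lambda = x_bar)
print(table(sim))
print(c(mean = mean(sim), sd = sd(sim)))
```

Repeating the last three lines with other values of lambda in rpois() shows how the simulated table, mean, and standard deviation shift away from the observed summaries.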
Task 2b: Hurdle Modelling
1. Create a dHurdle() function with arguments x and param that computes P{X = x} for the Hurdle model, where the first element of the vector param is θ and the second element is λ. Ensure that the function can handle x being a vector of values.
2. Write a function that calculates the log-likelihood function (for a specified value of param) for the Hurdle model for the UEFA Champions League goal data.
3. Use the optim function to find the values of θ and λ that maximize the log-likelihood.
Hint: optim minimizes functions by default, so you may want to write a function that computes minus the log-likelihood and minimize that.
Alternatively, you can set control=list(fnscale=-1) as an argument in optim to make it maximize.
4. Comment on the value of θ found and compare the log-likelihood values found for the Poisson and Hurdle models.
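A minimal sketch of this task is given below, again using a simulated placeholder vector goals in place of the real data. The starting values c(0.1, 2), the choice of the "L-BFGS-B" method, and the bounds are assumptions made for illustration; the handout does not prescribe them. The name hurdle_loglik is also illustrative.

```r
# Hurdle probability function; param = c(theta, lambda).
# ifelse() makes the function work when x is a vector.
dHurdle <- function(x, param) {
  theta  <- param[1]
  lambda <- param[2]
  ifelse(x == 0,
         theta,
         (1 - theta) * dpois(x, lambda) / (1 - exp(-lambda)))
}

# Log-likelihood: sum of the log-probabilities of the observations
hurdle_loglik <- function(param, x) {
  sum(log(dHurdle(x, param)))
}

set.seed(1)
# Placeholder for the 48 observed goal counts (NOT the real data)
goals <- rpois(48, lambda = 2.5)

# Maximize the log-likelihood; fnscale = -1 turns optim into a maximizer.
# Bounds keep theta strictly inside (0, 1) and lambda positive.
fit <- optim(par = c(0.1, 2), fn = hurdle_loglik, x = goals,
             method = "L-BFGS-B",
             lower = c(1e-6, 1e-6), upper = c(1 - 1e-6, Inf),
             control = list(fnscale = -1))
print(fit$par)    # estimated theta and lambda
print(fit$value)  # maximized Hurdle log-likelihood

# Maximized Poisson log-likelihood for comparison (lambda = sample mean)
pois_max <- sum(dpois(goals, mean(goals), log = TRUE))
print(c(hurdle = fit$value, poisson = pois_max))
```

Because the Poisson model is the special case θ = exp(−λ) of the Hurdle model, the maximized Hurdle log-likelihood can never be lower than the Poisson one, apart from numerical error. The estimated θ should also be close to the observed proportion of games with zero goals, which gives a simple check on the optimizer's output.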