Suppose we want to compute the likelihood for gamma distribution, or log of the gamma lik, to be exact: Using the built-in function > dgamma(x, shape=3, rate = 2, log=TRUE) Or define our own function loglikgamma <- function( x , shape , rate ) { shape * log( rate )+(shape - 1) * log( x ) - rate*x - log(gamma( shape )) } then for data of 3 observations, we verify > x <- c( 3, 3.5, 4.6) > loglikgamma(x, shape=3, rate=2) [1] -2.416481 -3.108180 -4.761593 > dgamma(x, shape=3, rate = 2, log=TRUE) [1] -2.416481 -3.108180 -4.761593 > OK, now define the minus log lik function, ready to be minimized. MinusLogLik <- function( theta, x) { myshape <- theta[1] myrate <- theta[2] - sum(dgamma(x, shape=myshape, rate = myrate, log=TRUE) ) } myMinusLogLik <- function( theta, x) { myshape <- theta[1] myrate <- theta[2] - sum(loglikgamma(x, shape=myshape, rate = myrate) ) } Here is the real fifteen observations x <- c(0.37, 2.72, 2.96, 1.31, 0.67, 1.58, 3.19, 1.32, 0.62, 0.50, 2.48, 1.87, 1.46, 0.81, 2.26) MinusLogLik( theta= c(1,2), x=x) [1] 37.84279 OK, find the maximum value and the MLE > optim( theta <- c(2,4), MinusLogLik, hessian=TRUE, x=x) $par [1] 2.721622 1.692691 $value [1] 18.8886 $counts function gradient 67 NA $convergence [1] 0 $message NULL $hessian [,1] [,2] [1,] 6.644881 -8.86163 [2,] -8.861630 14.24833 We can also use nlm( ) function > nlm( MinusLogLik, theta <- c(2,4), x=x , hessian=TRUE) $minimum [1] 18.8886 $estimate [1] 2.721376 1.692397 $gradient [1] 1.240210e-07 -5.353011e-07 $hessian [,1] [,2] [1,] 6.644806 -8.862726 [2,] -8.862726 14.249133 $code [1] 1 $iterations [1] 10 Warning messages: 1: NaNs produced in: dgamma(x, shape, scale, log) 2: NA/Inf replaced by maximum positive value So, the maximum of log likelihood is -18.8886, (notice the minus sign). the parameter that achieve this is alpha = 2.721376, lambda = 1.692397 This is the Maximum Likelihood Estimator: alpha = 2.721376, lambda = 1.692397 We may also use the log lik function of our own > nlm( myMinusLogLik, theta <- c(2,4), x=x , hessian=TRUE) $minimum [1] 18.8886 $estimate [1] 2.721376 1.692397 $gradient [1] 1.292430e-07 -5.353011e-07 $hessian [,1] [,2] [1,] 6.644806 -8.862726 [2,] -8.862726 14.249133 $code [1] 1 $iterations [1] 10 Warning messages: 1: NaNs produced in: log(x) 2: NA/Inf replaced by maximum positive value > > optim( theta <- c(2,4), myMinusLogLik, hessian=TRUE, x=x) $par [1] 2.721622 1.692691 $value [1] 18.8886 $counts function gradient 67 NA $convergence [1] 0 $message NULL $hessian [,1] [,2] [1,] 6.644881 -8.86163 [2,] -8.861630 14.24833 Is lambda = 2, alpha = 1.5 inside the 90% confidence interval or not? We compute the Wilks statistics: (recall, the first parameter is alpha, second is lambda ) > 2* ( myMinusLogLik(theta = c( 1.5, 2), x=x ) - myMinusLogLik(theta = c(2.721622 , 1.692691), x=x ) ) > [1] 19.68542 and this is larger than 4.60517 (= qchisq(0.90, df=2) ). So lambda = 2, alpha = 1.5 is outside the 90% confidence region. On the other hand if we test alpha=2, lambda = 1.5, then > 2* ( myMinusLogLik(theta = c(2, 1.5), x=x ) - myMinusLogLik(theta = c(2.721622 , 1.692691), x=x ) ) > [1] 1.850313 This imply lambda = 1.5, alpha = 2 is inside 90% confidence region. (because 1.850313 < qchisq(0.90, df=2) ). If we try to use the Wald (approx) pivotal, then the calculation is as follows: First we want to get the (observed) Fisher infomation matrix. (you may also use the expected, if you care to compute the derivative and the expectation). optim( theta <- c(2,4), myMinusLogLik, hessian=TRUE, x=x) $par [1] 2.721622 1.692691 $value [1] 18.8886 $counts function gradient 67 NA $convergence [1] 0 $message NULL $hessian [,1] [,2] [1,] 6.644881 -8.86163 [2,] -8.861630 14.24833 > Finfo <- optim( theta <- c(2,4), myMinusLogLik, hessian=TRUE, x=x)$hessian > Finfo [,1] [,2] [1,] 6.644881 -8.86163 [2,] -8.861630 14.24833 > c( 1.5-2.721622, 2-1.692691 ) %*% Finfo %*% c( 1.5-2.721622, 2-1.692691) [,1] [1,] 17.91574 This is larger than qchisq(0.90, df=2), so (1.5, 2) is outside the confidence region based on Wald method. > c( 2-2.721622, 1.5-1.692691) %*% Finfo %*% c(2-2.721622, 1.5-1.692691) [,1] [1,] 1.524862 This is smaller than 4.60517 = qchisq(0.9, df=2) , so (2, 1.5) is inside the confidence region. The Wald confidence region is always an ellipsoid, while the Wilks is only convex. Notice the p-values are slightly different from those of Wilks, this is due to the fact that both are approx. pivotal. (the chi square values are 19.68542 vs 17.91574; and 1.850313 vs 1.524862)