Suppose we want to compute the likelihood for gamma distribution, 
or log of the gamma lik, to be exact:

Using the built-in function

> dgamma(x,  shape=3, rate = 2, log=TRUE)

Or define our own function 

loglikgamma <- function( x ,  shape , rate ) {
shape * log( rate )+(shape - 1) * log( x ) - rate*x - log(gamma( shape ))
}

then for data of 3 observations, we verify

> x <- c( 3, 3.5, 4.6)
> loglikgamma(x, shape=3, rate=2)
[1] -2.416481 -3.108180 -4.761593
> dgamma(x,  shape=3, rate = 2, log=TRUE)
[1] -2.416481 -3.108180 -4.761593
> 

OK, now define the minus log lik function, ready to be minimized.

MinusLogLik <- function( theta, x) {
myshape <- theta[1]
myrate <- theta[2]
- sum(dgamma(x,  shape=myshape, rate = myrate, log=TRUE)  )
}

myMinusLogLik <- function( theta, x) {
myshape <- theta[1]
myrate <- theta[2]
- sum(loglikgamma(x,  shape=myshape, rate = myrate)  )
}


Here is the real fifteen observations

x <- c(0.37, 2.72, 2.96, 1.31, 0.67, 1.58, 3.19, 1.32, 0.62, 0.50, 2.48, 1.87, 1.46, 0.81, 2.26)

 MinusLogLik( theta= c(1,2), x=x)
[1] 37.84279

OK, find the maximum value and the MLE

> optim( theta <- c(2,4), MinusLogLik, hessian=TRUE, x=x)
$par
[1] 2.721622 1.692691

$value
[1] 18.8886

$counts
function gradient 
      67       NA 

$convergence
[1] 0

$message
NULL

$hessian
          [,1]     [,2]
[1,]  6.644881 -8.86163
[2,] -8.861630 14.24833

We can also use nlm( ) function

> nlm( MinusLogLik, theta <- c(2,4), x=x , hessian=TRUE)
$minimum
[1] 18.8886

$estimate
[1] 2.721376 1.692397

$gradient
[1]  1.240210e-07 -5.353011e-07

$hessian
          [,1]      [,2]
[1,]  6.644806 -8.862726
[2,] -8.862726 14.249133

$code
[1] 1

$iterations
[1] 10

Warning messages:
1: NaNs produced in: dgamma(x, shape, scale, log) 
2: NA/Inf replaced by maximum positive value 

So, the maximum of log likelihood is  -18.8886, (notice the minus sign).
the parameter that achieve this is alpha = 2.721376,  lambda = 1.692397
This is the Maximum Likelihood Estimator: alpha = 2.721376,  lambda = 1.692397

We may also use the log lik function of our own
> nlm( myMinusLogLik, theta <- c(2,4), x=x , hessian=TRUE)
$minimum
[1] 18.8886

$estimate
[1] 2.721376 1.692397

$gradient
[1]  1.292430e-07 -5.353011e-07

$hessian
          [,1]      [,2]
[1,]  6.644806 -8.862726
[2,] -8.862726 14.249133

$code
[1] 1

$iterations
[1] 10

Warning messages:
1: NaNs produced in: log(x) 
2: NA/Inf replaced by maximum positive value 
> 
> optim( theta <- c(2,4), myMinusLogLik, hessian=TRUE, x=x)
$par
[1] 2.721622 1.692691

$value
[1] 18.8886

$counts
function gradient 
      67       NA 

$convergence
[1] 0

$message
NULL

$hessian
          [,1]     [,2]
[1,]  6.644881 -8.86163
[2,] -8.861630 14.24833


Is lambda = 2, alpha = 1.5 inside the 90% confidence interval or not?

We compute the Wilks statistics: (recall, the first parameter is alpha, 
second is lambda )

> 2* ( myMinusLogLik(theta = c( 1.5, 2), x=x ) - myMinusLogLik(theta = c(2.721622 , 1.692691), x=x )  )
> [1] 19.68542

and this is larger than 4.60517 (= qchisq(0.90, df=2) ). So lambda = 2, alpha = 1.5 is outside the 90% confidence region. 
 On the other hand if we test alpha=2, lambda = 1.5, then 

> 2* ( myMinusLogLik(theta = c(2, 1.5), x=x ) - myMinusLogLik(theta = c(2.721622 , 1.692691), x=x ) )
> [1] 1.850313

This imply lambda = 1.5, alpha = 2 is inside 90% confidence region. 
(because 1.850313 < qchisq(0.90, df=2) ).


If we try to use the Wald (approx) pivotal, then the calculation is as follows:

First we want to get the (observed) Fisher infomation matrix.
(you may also use the expected, if you care to compute the derivative
and the expectation).


optim( theta <- c(2,4), myMinusLogLik, hessian=TRUE, x=x)
$par
[1] 2.721622 1.692691

$value
[1] 18.8886

$counts
function gradient 
      67       NA 

$convergence
[1] 0

$message
NULL

$hessian
          [,1]     [,2]
[1,]  6.644881 -8.86163
[2,] -8.861630 14.24833

> Finfo <- optim( theta <- c(2,4), myMinusLogLik, hessian=TRUE, x=x)$hessian
> Finfo
          [,1]     [,2]
[1,]  6.644881 -8.86163
[2,] -8.861630 14.24833
> c( 1.5-2.721622, 2-1.692691 ) %*% Finfo %*% c( 1.5-2.721622, 2-1.692691)
         [,1]
[1,] 17.91574

This is larger than qchisq(0.90, df=2), so  (1.5, 2) is outside the confidence
region based on Wald method.

> c( 2-2.721622, 1.5-1.692691) %*% Finfo %*% c(2-2.721622, 1.5-1.692691)
         [,1]
[1,] 1.524862

This is smaller than 4.60517 = qchisq(0.9, df=2) , so (2, 1.5) is inside
the confidence region.

The Wald confidence region is always an ellipsoid, while the Wilks
is only convex.

Notice the p-values are slightly different from those of Wilks,
this is due to the fact that both are approx. pivotal.
(the chi square values are 19.68542 vs 17.91574; and  1.850313 vs 1.524862)