Max Grossmann: The interpretation of the Hessian matrix in a least squares context


Posted: 2020-07-19 · Last updated: 2022-03-02

As you may know, in maximum likelihood estimation, the inverse of the negative Hessian of the log-likelihood, evaluated at the maximum likelihood estimates, is the usual estimate of the covariance matrix of those estimates.

But what is the interpretation of the Hessian in an (ordinary) least squares context?

Remember that the solution to

$$ \min_{\beta} \underbrace{(y-X\beta)^T(y-X\beta)}_{L\left(\beta\right)} $$

is, provided $X^T X$ is invertible,

$$ \hat{\beta} = (X^T X)^{-1} X^T y. $$
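To make the closed form concrete, here is a minimal NumPy sketch. The simulated data, the seed, and the helper name `ols_normal_equations` are assumptions made purely for illustration; they are not part of the original derivation. The sketch solves the normal equations and compares the result with NumPy's own least squares routine.

```python
import numpy as np

# Simulated data (illustrative assumption): intercept plus two regressors.
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)


def ols_normal_equations(X, y):
    """Return beta_hat by solving (X^T X) beta = X^T y.

    Solving the linear system avoids forming the inverse explicitly.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)


beta_hat = ols_normal_equations(X, y)

# Reference solution from NumPy's least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("normal equations:", beta_hat)
print("np.linalg.lstsq: ", beta_lstsq)
print("agree:", np.allclose(beta_hat, beta_lstsq))
```

Solving the system with `np.linalg.solve` is a design choice of this sketch; it is algebraically the same as the formula with the explicit inverse.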

We arrived at this conclusion by taking the first derivative of the objective function with respect to $\beta$, $L'(\beta) = -2 X^T y + 2 X^{T} X \beta$, and setting it equal to zero. If we take the second derivative, we should get the Hessian; and indeed we do. Differentiating $L'(\beta)$ once more with respect to $\beta$ shows that the Hessian is simply

$$H(\hat{\beta}) = \left.\frac{\partial^2 L\left(\beta\right)}{\partial \beta^2}\right|_{\hat{\beta}} = 2 X^T X.$$
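The Hessian can also be checked numerically. The following sketch approximates the second derivatives of $L(\beta)$ by central finite differences and compares them with $2 X^T X$. The data, the step size `h`, and the helper names `loss` and `numerical_hessian` are assumptions chosen for illustration. Because $L$ is quadratic in $\beta$, the Hessian does not depend on where it is evaluated, which the sketch checks at two different points.

```python
import numpy as np


def loss(beta, X, y):
    """Sum of squared residuals L(beta) = (y - X beta)^T (y - X beta)."""
    r = y - X @ beta
    return r @ r


def numerical_hessian(f, beta, h=1e-2):
    """Approximate the Hessian of f at beta with central differences."""
    k = beta.size
    H = np.zeros((k, k))
    I = np.eye(k)
    for i in range(k):
        for j in range(k):
            ei, ej = h * I[i], h * I[j]
            H[i, j] = (
                f(beta + ei + ej) - f(beta + ei - ej)
                - f(beta - ei + ej) + f(beta - ei - ej)
            ) / (4.0 * h * h)
    return H


# Simulated data (illustrative assumption).
rng = np.random.default_rng(42)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
H_analytic = 2.0 * X.T @ X

# Evaluate the numerical Hessian at beta_hat and at the origin.
H_at_hat = numerical_hessian(lambda b: loss(b, X, y), beta_hat)
H_at_zero = numerical_hessian(lambda b: loss(b, X, y), np.zeros(k))

print(np.allclose(H_at_hat, H_analytic, atol=1e-6))
print(np.allclose(H_at_zero, H_analytic, atol=1e-6))
```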

Recognizing that in ordinary least squares, the covariance matrix of the estimates is estimated as $\widehat{V\left[\hat{\beta}\right]} = \hat{\sigma}^2 (X^T X)^{-1}$, with $\hat{\sigma}^2$ being the estimated error variance, we conclude that the Hessian matrix can be used to estimate that covariance matrix in the following way:

$$\widehat{V\left[\hat{\beta}\right]} = \hat{\sigma}^2 \left(\frac{1}{2}H(\hat{\beta})\right)^{-1} = 2 \hat{\sigma}^2\, H(\hat{\beta})^{-1}.$$

The diagonal of this matrix, $\text{diag}\left\{\widehat{V\left[\hat{\beta}\right]}\right\}$, contains the estimated variances of the individual coefficients; their square roots are the standard errors.
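As a quick sanity check, the sketch below computes the estimated covariance matrix once from the Hessian and once from the textbook formula. The simulated data are an assumption for illustration, and so is the choice of $\hat{\sigma}^2$ as the residual sum of squares divided by $n - k$ (the usual unbiased estimator), since the text above does not specify which estimator is meant.

```python
import numpy as np

# Simulated data (illustrative assumption).
rng = np.random.default_rng(1)
n, k = 150, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.5, -1.0]) + rng.normal(scale=0.7, size=n)

# OLS estimates and residuals.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# Assumed estimator of the error variance: RSS / (n - k).
sigma2_hat = resid @ resid / (n - k)

# Hessian of the least squares objective (constant in beta).
H = 2.0 * X.T @ X

# Covariance matrix via the Hessian and via the textbook formula.
V_from_hessian = sigma2_hat * np.linalg.inv(0.5 * H)
V_textbook = sigma2_hat * np.linalg.inv(X.T @ X)

# Standard errors are the square roots of the diagonal.
std_errors = np.sqrt(np.diag(V_from_hessian))

print("covariance matrices agree:", np.allclose(V_from_hessian, V_textbook))
print("standard errors:", std_errors)
```

The same idea carries over to numerical optimizers: if a routine returns the Hessian of the sum of squared residuals at the optimum, halving it, inverting it, and scaling by $\hat{\sigma}^2$ gives the covariance estimate.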

Pretty awesome, isn't it?