Max R. P. Grossmann


Distribution of the maximum of random variables

Assume you collected one thousand data points. What is your expectation of the maximum of these data points?

In the following, we will assume $n$ random variables $X_{1}, \dots, X_{n}$ that are independently and identically distributed (i.i.d.) with a common probability density function (p.d.f.) $f$. The corresponding (left-tailed) cumulative distribution function (c.d.f.) is $F$. Let the random variable $Y = \max\{X_{1},\dots,X_{n}\}$. We are looking for the distribution of $Y$, i.e. its probability density function $f_{Y}(y)$.

First, consider the case where $n = 2$. Some $y$ is the maximum

  1. if $x_1 = y$ and $x_2 < x_1$ or
  2. if $x_2 = y$ and $x_1 < x_2$.

Since $X_1$ and $X_2$ are independently distributed, it follows that

$$ P(Y = y) = P(X_1 = y)\cdot P(X_2 < x_1) + P(X_2 = y)\cdot P(X_1 < x_2). $$

In the continuous world, probability densities are not probabilities, but they play the analogous role. We recognize that $P(X_1 = y)$ corresponds to the p.d.f. of $X_1$ at the point $y$; that $P(X_2 = y)$ corresponds to the p.d.f. of $X_2$ at the point $y$; that $P(X_2 < x_1)$ is the c.d.f. of $X_2$ at $x_1$; and that $P(X_1 < x_2)$ is the c.d.f. of $X_1$ at $x_2$. Using our definitions of the c.d.f. and p.d.f., it follows that

$$ f_{Y}(y) = f(y)\cdot F(y) + f(y)\cdot F(y). $$

Since $X_1$ and $X_2$ are identically distributed, we can simplify even further:

$$ f_{Y}(y) = 2 f(y) F(y). $$

This is the p.d.f. of the maximum of two i.i.d. random variables. Pretty amazing, isn't it?
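
As a quick numerical sanity check (assuming standard normal $X_1$ and $X_2$, i.e. $\mu = 0$ and $\sigma^2 = 1$, and using R, which we will rely on later anyway), the function $2f(y)F(y)$ should integrate to one, and the expectation it implies should match the simulated mean of the maximum of two draws:

integrate(function(y) 2*dnorm(y)*pnorm(y), lower = -Inf, upper = Inf)
# ~ 1, so this is indeed a proper density
integrate(function(y) y*2*dnorm(y)*pnorm(y), lower = -Inf, upper = Inf)
# ~ 0.5642, which is 1/sqrt(pi) for standard normals
mean(replicate(100000, max(rnorm(2))))
# should also be ~ 0.56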

Let us now generalize. For any $n$, it follows that some $y$ is the maximum

  1. if $x_1 = y$ and $x_2 < x_1$ and $x_3 < x_1$ and … and $x_n < x_1$, or
  2. if $x_2 = y$ and $x_1 < x_2$ and $x_3 < x_2$ and … and $x_n < x_2$, or
  3. (continuing in the same way up to the $n$-th case) if $x_n = y$ and $x_1 < x_n$ and $x_2 < x_n$ and … and $x_{n-1} < x_n$.

In other words,

$$ f_{Y}(y) = \overbrace{f(y)\cdot \underbrace{F(y)\cdot\,\dots\,\cdot F(y)}_{\text{$n-1$ times $F(y)$}} + f(y)\cdot F(y)\cdot\,\dots\,\cdot F(y) + \dots}^{\text{$n$ times the first term}} $$

By recognizing the multiplicative and additive patterns, it follows that

$$ f_{Y}(y) = n\cdot f(y)\cdot F(y)^{n-1}. $$

That's it!
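
As a small check of this general formula (again assuming standard normal $X_i$), the density $n f(y) F(y)^{n-1}$ should integrate to one for any $n$:

sapply(c(2, 10, 100, 1000), function(n) {
	integrate(function(y) n*dnorm(y)*pnorm(y)^(n-1), lower = -Inf, upper = Inf)$value
})
# all four values should be ~ 1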

For $n$ i.i.d. normally distributed random variables with mean $\mu$ and variance $\sigma^2$, the maximum is distributed as follows:

$$ f_{Y}(y) = n\cdot \frac{\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)}{\sqrt{2\pi\sigma^2}}\cdot \left( \frac{1}{2} + \frac{1}{2}\text{erf}\left(\frac{y-\mu}{\sqrt{2\sigma^2}}\right) \right)^{n-1}. $$
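
A practical note: R has no built-in erf function, but the term in parentheses is simply the normal c.d.f., because $\Phi(x) = \frac{1}{2} + \frac{1}{2}\,\text{erf}\left(\frac{x}{\sqrt{2}}\right)$. A minimal sketch of this equivalence, with arbitrary example values:

erf <- function(x) 2*pnorm(x*sqrt(2)) - 1   # standard identity linking erf and pnorm
mu <- 0; sigma <- 1; y <- 1.3               # arbitrary example values
0.5 + 0.5*erf((y - mu)/sqrt(2*sigma^2))
pnorm(y, mean = mu, sd = sigma)             # same value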

At the start of this note, we were looking for the expectation of the maximum. So what is $E[Y]$? First of all, note that $E[Y] = \int_{-\infty}^{\infty} y\,f_{Y}(y)\,dy$. This integral is too unwieldy to evaluate by hand, so let us approximate it numerically. We will set $\mu = 0$ and $\sigma^2 = 1$, although changing that would be easy.

Fire up R and type the following:

fY <- function(y, n) {
	n*dnorm(y)*pnorm(y)^(n-1)
}

We want to calculate $E[Y]$ for $n = 1000$. Let's use the built-in integrate function, which performs numerical integration. Type ?integrate for details.

integrate(function(y) y*fY(y, n = 1000), lower = -Inf, upper = +Inf)

… should give you something like the following output:

3.241436 with absolute error < 1.2e-06

This is basically the result: If you have 1000 normally distributed data points with $\mu = 0$ and $\sigma^2 = 1$, you would expect the maximum of these 1000 data points to be approximately 3.241436. However, maybe you want a confidence interval: For example, in what interval will the maximum lie in 95 percent of cases?
To solve that, define the left-tailed c.d.f. explicitly:

FY <- function(y, n) {
	integrate(fY, lower = -Inf, upper = y, n = n)$value
}
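
Before we use FY, here are a couple of quick sanity checks: it should be near 0 deep in the left tail, near 1 deep in the right tail, and roughly one half around 3.2, which should be close to the median of the maximum:

FY(-10, n = 1000)   # ~ 0
FY(10, n = 1000)    # ~ 1
FY(3.2, n = 1000)   # ~ 0.5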

The lower end of the confidence interval (let's call it $i_{0}$) will be characterized by $F_{Y}(i_{0}) = 0.025$; similarly, the upper end ($i_{1}$) is characterized by $F_{Y}(i_{1}) = 0.975$. We use uniroot to solve these equations for $i_0$ and $i_1$. For convenience, we define a quantile function so that we have a nice interface:

QY <- function(p, n) {
	uniroot(function(y) FY(y, n) - p, interval=c(-10,10))$root
}

Typing QY(0.025, n = 1000) and QY(0.975, n = 1000) gives us the following final result:

In 95 percent of cases, the maximum of 1000 normally distributed data points with $\mu = 0$ and $\sigma^2 = 1$ will lie in the interval [2.679885, 4.052675].

A Monte Carlo simulation confirms this result:

data <- replicate(1000000, max(rnorm(n = 1000)))
mean(data > 2.679885 & data < 4.052675) # 0.949993
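
The same simulated maxima can also be used to double-check the expectation and the interval endpoints from above:

mean(data)                       # should be close to 3.241436
quantile(data, c(0.025, 0.975))  # should be close to 2.68 and 4.05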