Max Grossmann: Distribution of the maximum of random variables

Distribution of the maximum of random variables

Posted: 2020-07-17 · Last updated: 2025-06-19 · Permalink

Assume you collected one thousand data points. What is your expectation of the maximum of these data points?

In the following, we will assume $n$ random variables $X_{1}, \dots, X_{n}$ that are identically and independently distributed with a common probability distribution function (p.d.f.) $f$. The left-tailed cumulative distribution function (c.d.f.) is $F$. Let the random variable $Y = \max\{X_{1},\dots,X_{n}\}$. We are looking for the distribution of $Y$, i. e. its probability distribution function $f_{Y}(y)$.

First, consider the case where $n = 2$. The maximum $Y = y$ when:

$X_1 = y$ and $X_2 \leq y$ or
$X_2 = y$ and $X_1 < y$.

Note that case 1 includes when $X_1 = X_2 = y$, while case 2 uses a strict inequality to avoid double-counting this scenario.

Since $X_1$ and $X_2$ are independently distributed, it follows that

$$ P(Y = y) = P(X_1 = y)\cdot P(X_2 \leq y) + P(X_2 = y)\cdot P(X_1 < y). $$

In the continuous world, we should derive the distribution of the maximum using the c.d.f. We can then find the p.d.f. of the maximum by differentiating. In fact, let us actually derive the c.d.f. of the maximum of $n$ iid random variables $X_i$:

\begin{align} F_{Y}(y) &= P(Y \leq y),\\ &= P(\max\left\{X_{i},\dots,X_{n}\right\} \leq y),\\ &= \prod_{j=1}^{n} P(X_{i} \leq y),\\ &= \prod_{j=1}^{n} F(y)\\ &= F(y)^n. \end{align}

In line 1, we used the definition of the cumulative distribution function. In line 2, we plugged in the definition of $Y$. In line 3, we used the independence of the $X_i$ and applied our prior reasoning. In line 4, we used the definition of the cumulative distribution function and the fact that the $X_i$ are identically distributed. Line 5 follows trivially.

By differentiating, we find that $f_{Y}(y) = n f(y) F(y)^{n-1}$.

For $n$ normal distributions with mean $\mu$ and variance $\sigma^2$, the maximum of these normal distributions is distributed as follows:

$$ f_{Y}(y) = n\cdot \frac{\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)}{\sqrt{2\pi\sigma^2}}\cdot \left( \frac{1}{2} + \frac{1}{2}\text{erf}\left(\frac{y-\mu}{\sqrt{2\sigma^2}}\right) \right)^{n-1}. $$

At the start of this note, we were looking for the expectation of the maximum. Therefore, what is $E[Y]$? First of all, note that $E[Y] = \int_{-\infty}^{\infty} y\,f_{Y}(y)\,dy$. This integral is too tough for us and therefore, let us make a numeric approximation. We will set $\mu = 0$ and $\sigma^2 = 1$, although changing that would be easy.

Fire up R and type the following:

fY <- function(y, n) {
    n*dnorm(y)*pnorm(y)^(n-1)
}

We want to calculate $E[Y]$ for $n = 1000$. Let's use the integrated integrate function that allows us to integrate numerically. Type ?integrate for details.

integrate(function(y) y*fY(y, n = 1000), lower = -Inf, upper = +Inf)

… should give you something like the following output:

3.241436 with absolute error < 1.2e-06

This is basically the result: If you have 1000 normally distributed data points with $\mu = 0$ and $\sigma^2 = 1$, you would expect the maximum of these 1000 data points to be approximately 3.241436. However, maybe you want a confidence interval: For example, in what interval will the maximum lie in 95 percent of cases?
To solve that, define the left-tailed c.d.f. explicitly:

FY <- function(y, n) {
    integrate(fY, lower = -Inf, upper = y, n = n)$value
}

The lower end of the confidence interval (let's call it $i_{0}$) will be characterized by $F(i_{0}) = 0.025$; similarly, the upper end ($i_{1}$) is characterized by $F(i_{1}) = 0.975$. We use uniroot to solve these equations for $i_0$ and $i_1$. Indeed, we define a quantile function so that we have a nice interface:

QY <- function(p, n) {
    uniroot(function(y) FY(y, 1000) - p, interval=c(-10,10))$root
}

Typing QY(0.025) and QY(0.975) gives us the following final result: In 95 percent of cases, the maximum of 1000 normally distributed data points with $\mu = 0$ and $\sigma^2 = 1$ will lie in the interval [2.679885, 4.052675].

A Monte-Carlo simulation confirms this result:

data <- replicate(1000000, max(rnorm(n = 1000)))
mean(data > 2.679885 & data < 4.052675) # 0.949993