Assume you collected one thousand data points. What is your expectation of the *maximum* of these data points?

In the following, we will assume $n$ random variables $X_{1}, \dots, X_{n}$ that are identically and independently distributed with a common probability distribution function (p.d.f.) $f$. The left-tailed cumulative distribution function (c.d.f.) is $F$. Let the random variable $Y = \max\{X_{1},\dots,X_{n}\}$. We are looking for the distribution of $Y$, i. e. its probability distribution function $f_{Y}(y)$.

First, consider the case where $n = 2$. Some $y$ is the maximum

- if $x_1 = y$
**and**$x_2 < x_1$*or* - if $x_2 = y$
**and**$x_1 < x_2$.

Since $X_1$ and $X_2$ are independently distributed, it follows that

$$ P(Y = y) = P(X_1 = y)\cdot P(X_2 < x_1) + P(X_2 = y)\cdot P(X_1 < x_2). $$

In the continuous world, even though probability densities are not probabilities, they are their analogon. We recognize that $P(X_1 = y)$ is the p.d.f. of $X_1$ at the point $y$; that $P(X_2 = y)$ is the p.d.f. of $X_2$ at the point $y$; that $P(X_2 < x_1)$ is the **c.**d.f. of $X_2$ at $x_1$ and that $P(X_1 < x_2)$ is the **c.**d.f. of $X_1$ at $x_2$. Using our definitions of c.d.f. and p.d.f., it follows trivially that

$$ f_{Y}(y) = f(y)\cdot F(y) + f(y)\cdot F(y). $$

Since $X_1$ and $X_2$ are identically distributed, it is visible that we can simplify even further:

$$ f_{Y}(y) = 2 f(y) F(y). $$

This is the p.d.f. of the maximum of two i.i.d. random variables. Pretty amazing, isn't it?

Let us now generalize. For any $n$, it follows that some $y$ is the maximum

- if $x_1 = y$
**and**$x_2 < x_1$**and**$x_3 < x_1$**and**⋯**and**$x_n < x_1$*or* - if $x_2 = y$
**and**$x_1 < x_1$**and**$x_3 < x_1$**and**⋯**and**$x_n < x_1$*or* - ⋮
- if $x_n = y$
**and**$x_1 < x_1$**and**$x_2 < x_1$**and**⋯**and**$x_{n-1} < x_1$.

In other words,

$$ f_{Y}(y) = \overbrace{f(y)\cdot \underbrace{F(y)\cdot\,\dots\,\cdot F(y)}_{\text{$n-1$ times $F(y)$}} + f(y)\cdot F(y)\cdot\,\dots\,\cdot F(y) + \dots}^{\text{$n$ times the first term}} $$

By recognizing the multiplicative and additive patterns, it follows that

$$ f_{Y}(y) = n\cdot f(y)\cdot F(y)^{n-1}. $$

That's it!

For $n$ normal distributions with mean $\mu$ and variance $\sigma^2$, the maximum of these normal distributions is distributed as follows:

$$ f_{Y}(y) = n\cdot \frac{\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)}{\sqrt{2\pi\sigma^2}}\cdot \left( \frac{1}{2} + \frac{1}{2}\text{erf}\left(\frac{y-\mu}{\sqrt{2\sigma^2}}\right) \right)^{n-1}. $$

At the start of this note, we were looking for the expectation of the maximum. Therefore, what is $E[Y]$? First of all, note that $E[Y] = \int_{-\infty}^{\infty} y\,f_{Y}(y)\,dy$. This integral is too tough for us and therefore, let us make a numeric approximation. We will set $\mu = 0$ and $\sigma^2 = 1$, although changing that would be easy.

Fire up R and type the following:

fY <- function(y, n) { n*dnorm(y)*pnorm(y)^(n-1) }

We want to calculate $E[Y]$ for $n = 1000$. Let's use the integrated `integrate` function that allows us to integrate numerically. Type `?integrate` for details.

integrate(function(y) y*fY(y, n = 1000), lower = -Inf, upper = +Inf)

… should give you something like the following output:

`3.241436 with absolute error < 1.2e-06`

This is basically the result: If you have 1000 normally distributed data points with $\mu = 0$ and $\sigma^2 = 1$, you would expect the maximum of these 1000 data points to be approximately 3.241436. However, maybe you want a confidence interval: For example, in what interval will the maximum lie in 95 percent of cases?

To solve that, define the left-tailed c.d.f. explicitly:

FY <- function(y, n) { integrate(fY, lower = -Inf, upper = y, n = n)$value }

The lower end of the confidence interval (let's call it $i_{0}$) will be characterized by $F(i_{0}) = 0.025$; similarly, the upper end ($i_{1}$) is characterized by $F(i_{1}) = 0.975$. We use `uniroot` to solve these equations for $i_0$ and $i_1$. Indeed, we define a quantile function so that we have a nice interface:

QY <- function(p, n) { uniroot(function(y) FY(y, 1000) - p, interval=c(-10,10))$root }

Typing `QY(0.025)` and `QY(0.975)` gives us the following final result:

In 95 percent of cases, the maximum of 1000 normally distributed data points with $\mu = 0$ and $\sigma^2 = 1$ will lie in the interval [2.679885, 4.052675].

A Monte-Carlo simulation confirms this result:

data <- replicate(1000000, max(rnorm(n = 1000))) mean(data > 2.679885 & data < 4.052675) # 0.949993