STAT 450 Lecture 13

Goals of Today's Lecture: to introduce approximate (large sample) distribution theory, to state precise versions of the law of large numbers and the central limit theorem, and to define convergence in probability, almost sure convergence and convergence in distribution.


Approximate Distribution Theory

Up to now we have tried to compute the density or cdf of some transformation, $Y=g(X)$, of the data $X$ exactly. In most cases this is not possible. Instead theoretical statisticians try to find methods to compute $f_Y$ or $F_Y$ approximately. There are really two standard methods: simulation (Monte Carlo) methods and large sample (asymptotic) theory.

In this course we focus on large sample theory.

Asymptotics

You already know two large sample theorems: the law of large numbers and the central limit theorem.

I want to make these theorems precise (though I don't intend to prove them convincingly). I want you to have a good intuitive grasp of the meanings of the assertions, however.

The law of large numbers

There are two different senses in which $\bar{X}_n \to \mu$: convergence in probability and almost sure convergence.

Definition: A sequence $Y_n$ of random variables converges in probability to $Y$ if, for each $\epsilon>0$:

\begin{displaymath}\text{P}(\vert Y_n-Y\vert > \epsilon) \to 0
\end{displaymath}

Definition: A sequence $Y_n$ of random variables converges almost surely (or strongly) to $Y$ if

\begin{displaymath}\text{P}(Y_n \to Y) = 1
\end{displaymath}

Notice that the second kind of convergence asks us to calculate a single probability -- of an event whose definition is very complicated since it mentions all the $Y_n$ at the same time. The first kind of convergence involves computing a sequence of probabilities; the nth probability in the sequence mentions only two random variables, $Y_n$ and $Y$. Typically convergence in probability is easier to prove; it is a theorem that $Y_n \to Y$ almost surely implies $Y_n \to Y$ in probability. (Notice the way we write those assertions.)

Corresponding to the two kinds of convergence are two precise versions of the law of large numbers:

Theorem: the Weak Law of Large Numbers. If $X_1,X_2,\cdots$ are independent and identically distributed random variables such that $\text{E}(\vert X_1\vert) < \infty$ then the sequence of sample means,

\begin{displaymath}\bar{X}_n = \frac{1}{n} \sum_1^n X_i \, ,
\end{displaymath}

converges to $\mu=\text{E}(X_1)$ in probability.

Theorem: the Strong Law of Large Numbers. If $X_1,X_2,\cdots$ are independent and identically distributed random variables such that $\text{E}(\vert X_1\vert) < \infty$ then the sequence of sample means,

\begin{displaymath}\bar{X}_n = \frac{1}{n} \sum_1^n X_i \, ,
\end{displaymath}

converges to $\mu=\text{E}(X_1)$ almost surely.

The SLLN (note the abbreviation) is harder to prove. The WLLN can be deduced, provided $\text{E}(X_1^2) < \infty$, from Chebyshev's inequality:

Chebyshev's inequality: If $\text{Var}(Y) < \infty$ then

\begin{displaymath}\text{P}(\vert Y-\mu_Y\vert > t ) \le \frac{\text{Var}(Y)}{t^2}
\end{displaymath}

for all t > 0.

To apply the theorem let $X_1,\ldots,X_n$ be independent and identically distributed (iid) with mean $\mu$ and variance $\sigma^2$. Then $\bar{X}_n$ has mean $\mu$ and $\text{Var}(\bar{X}_n) = \sigma^2/n$. Thus

\begin{displaymath}\text{P}(\vert\bar{X}_n-\mu\vert > \epsilon) \le
\frac{\text{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}
\to 0
\end{displaymath}

for each $\epsilon>0$. This proves the weak law of large numbers in the special case $\text{Var}(X_1) < \infty$.
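As a quick numerical check (my illustration, not part of the original notes), here is a short Python sketch using numpy that estimates $\text{P}(\vert\bar{X}_n-\mu\vert > \epsilon)$ by simulation for Exponential(1) data and compares it with the Chebyshev bound $\sigma^2/(n\epsilon^2)$; the choice of distribution, of $\epsilon$ and of the sample sizes is arbitrary.

# Sketch: simulated P(|Xbar_n - mu| > eps) versus the Chebyshev bound
# sigma^2 / (n * eps^2), for iid Exponential(1) data (mu = 1, sigma^2 = 1).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 1.0
eps = 0.2
reps = 10000                                     # simulated samples for each n

for n in (10, 100, 1000):
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    prob = np.mean(np.abs(xbar - mu) > eps)      # simulated probability
    bound = sigma2 / (n * eps ** 2)              # Chebyshev bound
    print(f"n={n:5d}  P(|Xbar-mu|>{eps}) ~ {prob:.4f}   bound {bound:.4f}")

For small n the bound exceeds 1 and says nothing; the point is that both the simulated probability and the bound go to 0 as n grows.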

This is a sort of approximate distribution calculation: we say that one random variable has nearly the same distribution as another. Here, if n is large, the random variable $\bar{X}_n$ has nearly the same distribution as the (nonrandom) quantity $\mu$.

In some of the calculations we are about to make that approximation is good enough. In others, however, we want to know just how close to $\mu$ the quantity $\bar{X}_n$ is likely to be. The answer is provided by the central limit theorem.

The central limit theorem

If $X_1,X_2,\cdots$ are iid with mean 0 and variance 1 then $n^{1/2}\bar{X}_n$ converges in distribution to N(0,1). In previous textbooks you will have seen pictures like the following:

[Figure: probability histogram of the Binomial(100, 0.5) distribution with the approximating normal curve superimposed.]

The picture above is an example of one version of the central limit theorem. It shows that the probability mass function of a Binomial(100, 0.5) random variable $Y_{100}$ is close to the density of a N(50, 25) random variable. To see that this is the central limit theorem setting let $X_1,X_2,\cdots$ be independent random variables with $P(X_i=1) = p = 1-P(X_i=0)$; here $p=1/2$. (We call the $X_i$ Bernoulli random variables.) Then $Y_{100}=X_1+\cdots+X_{100} \sim \text{Binomial}(100,1/2)$. The central limit theorem asserts that $\sqrt{n}(\bar{X}_n -p)/\sqrt{p(1-p)}$ is nearly N(0,1), and this quantity is actually $(Y_n-np)/\sqrt{np(1-p)}$.
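A sketch of how a picture like this could be produced (my code, not from the notes), using scipy to compute the exact Binomial probabilities and the approximating normal density:

# Sketch: Binomial(100, 0.5) probabilities versus the N(50, 25) density
# evaluated at the integers (a few values are printed rather than plotted).
import numpy as np
from scipy.stats import binom, norm

n, p = 100, 0.5
k = np.arange(30, 71)                    # values with non-negligible probability
pmf = binom.pmf(k, n, p)                 # exact Binomial probabilities
dens = norm.pdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))

for kk, exact, approx in zip(k[::5], pmf[::5], dens[::5]):
    print(f"k={kk:3d}  P(Y=k) = {exact:.5f}   normal density = {approx:.5f}")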

Here is another example of the central limit theorem. Here the $X_i$ are independent and identically distributed random variables with $P(X_i=0)=P(X_i=2)=127.5/256$ and $P(X_i=1)=1/256$. The top plot shows a histogram-style plot of $P(\sum_1^{256} X_i = k)$ against k. The variable $Y=\sum_1^{256} X_i$ has mean 256 and standard deviation $16\sigma = \sqrt{255} \approx 15.97$, where $\sigma$ is the standard deviation of a single $X_i$. You are meant to see that the superimposed normal curve goes between the odd number bars and the even number bars.

The plots illustrate the difference between the local central limit theorem, which says that the density (here, the probability mass function) of $\bar{X}_n$ is close to the normal density, and the global central limit theorem, which says that the cdf of $\bar{X}_n$ is close to the normal cdf. That is,

\begin{displaymath}P(n^{1/2}(\bar{X}-\mu)/\sigma \le x ) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2} dy
\, .
\end{displaymath}

You should see that for this example the global central limit theorem (generally the word global is omitted and we just call this one ``the'' central limit theorem) provides a good approximation while the local central limit theorem does not. In general the local CLT requires more hypotheses than the global CLT.
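Here is a sketch (mine, not part of the original notes) of how the comparison for the 0/1/2 example could be reproduced: the exact distribution of the sum is found by repeated convolution, and then both the probabilities and the cumulative probabilities are compared with the normal ones.

# Sketch: exact distribution of Y = X_1 + ... + X_256 where
# P(X=0) = P(X=2) = 127.5/256 and P(X=1) = 1/256, versus its normal approximation.
import numpy as np
from scipy.stats import norm

px = np.array([127.5 / 256, 1 / 256, 127.5 / 256])   # probabilities of 0, 1, 2
n = 256
pmf = np.array([1.0])                                 # distribution of an empty sum
for _ in range(n):
    pmf = np.convolve(pmf, px)                        # pmf of the running sum

mean = n * 1.0                                        # E(X_i) = 1
sd = np.sqrt(n * 255 / 256)                           # Var(X_i) = 255/256, sd ~ 15.97
cdf = np.cumsum(pmf)

# Local comparison: P(Y=k) oscillates between even and odd k; the density does not.
for k in range(254, 259):
    print(f"P(Y={k}) = {pmf[k]:.5f}   normal density = {norm.pdf(k, mean, sd):.5f}")

# Global comparison: the cdfs agree closely (continuity correction of 1/2 used).
for k in (240, 256, 272):
    print(f"P(Y<={k}) = {cdf[k]:.4f}   normal cdf = {norm.cdf(k + 0.5, mean, sd):.4f}")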


The central limit theorem: If $X_1,X_2,\cdots$ are independent and identically distributed with mean $\mu$ and finite standard deviation $\sigma$ then the sequence of standardized sample means,

\begin{displaymath}Z_n = n^{1/2} \frac{\bar{X}_n - \mu}{\sigma}
\end{displaymath}

converges in distribution to N(0,1) in the sense that

\begin{displaymath}\text{P}(Z_n \le z ) \to \int_{-\infty}^z \frac{1}{\sqrt{2\pi}}e^{-u^2/2}
\, du
\end{displaymath}
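A small simulation sketch (mine, not from the notes) of this convergence: standardize sample means of Exponential(1) data and compare the simulated $\text{P}(Z_n \le z)$ with the standard normal cdf; the distribution and the values of n and z are arbitrary choices.

# Sketch: simulated P(Z_n <= z) for standardized means of Exponential(1) data
# (mu = 1, sigma = 1) compared with the standard normal cdf Phi(z).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0
reps = 20000

for n in (5, 25, 100):
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    zn = np.sqrt(n) * (xbar - mu) / sigma
    for z in (-1.0, 0.0, 1.0):
        print(f"n={n:4d}  z={z:+.1f}  P(Z_n<=z) ~ {np.mean(zn <= z):.3f}"
              f"   Phi(z) = {norm.cdf(z):.3f}")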

In this course we will state (but not really prove) a number of theorems with conclusions of this form. To do so we need some mathematical tools.

Convergence in Distribution

If $X_1,\ldots,X_n$ are iid from a population with mean $\mu$ and standard deviation $\sigma$ then $n^{1/2}(\bar{X}-\mu)/\sigma$ has approximately a normal distribution. We also say that a Binomial(n,p) random variable has approximately a N(np,np(1-p)) distribution.

To make precise sense of these assertions we need to assign a meaning to statements like ``X and Y have approximately the same distribution''. The meaning we want to give is that X and Y have nearly the same cdf, but even here we need some care. If n is a large number, is the N(0,1/n) distribution close to the distribution of $X\equiv 0$? Is it close to the N(1/n,1/n) distribution? Is it close to the $N(1/\sqrt{n},1/n)$ distribution? If $X_n\equiv 2^{-n}$, is the distribution of $X_n$ close to that of $X\equiv 0$?

The answer to these questions depends in part on how close ``close'' needs to be, so it is partly a matter of definition. In practice the usual sort of approximation we want to make is to say that some random variable X, say, has nearly some continuous distribution, like N(0,1). In this case what we want is to calculate probabilities like P(X>x) and know that this is nearly P(N(0,1) > x). The real difficulty arises in the case of discrete random variables; in this course we will not actually need to approximate a distribution by a discrete distribution.

When mathematicians say two things are close together, they either provide an upper bound on the distance between the two things or they are talking about taking a limit. In this course we do the latter.

Definition: A sequence of random variables $X_n$ converges in distribution to a random variable $X$ if

\begin{displaymath}E(g(X_n)) \to E(g(X))
\end{displaymath}

for every bounded continuous function g.
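For instance (my illustration, not part of the notes), take $X_n$ to be the standardized mean of n Exponential(1) variables, X a N(0,1) variable, and the bounded continuous function $g(x)=1/(1+x^2)$; the definition says the simulated $E(g(X_n))$ should approach $E(g(X))$.

# Sketch: E(g(X_n)) -> E(g(X)) for the bounded continuous g(x) = 1/(1+x^2),
# with X_n the standardized mean of n Exponential(1) variables and X ~ N(0,1).
import numpy as np

rng = np.random.default_rng(2)
reps = 50000

def g(x):
    return 1.0 / (1.0 + x ** 2)

target = np.mean(g(rng.standard_normal(reps)))     # Monte Carlo estimate of E(g(X))
for n in (2, 10, 100):
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    xn = np.sqrt(n) * (xbar - 1.0)                 # standardized mean (mu = sigma = 1)
    print(f"n={n:3d}  E(g(X_n)) ~ {np.mean(g(xn)):.4f}   E(g(X)) ~ {target:.4f}")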

Theorem: The following are equivalent:

1.
$X_n$ converges in distribution to $X$.
2.
$P(X_n \le x) \to P(X \le x)$ for each x such that P(X=x)=0
3.
The characteristic functions of $X_n$ converge to that of $X$:

\begin{displaymath}E(e^{itX_n}) \to E(e^{itX})
\end{displaymath}

for every real t.
These are all implied by

\begin{displaymath}M_{X_n}(t) \to M_X(t) < \infty
\end{displaymath}

for all $\vert t\vert \le \epsilon$ for some positive $\epsilon$.
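As a sketch (not in the original notes) of how the moment generating function condition connects to the central limit theorem, take $\mu=0$ and $\sigma=1$ and assume $M_{X_1}$ is finite near 0, so that $M_{X_1}(s) = 1 + s^2/2 + o(s^2)$ as $s \to 0$. Since $Z_n = n^{-1/2}\sum_1^n X_i$,

\begin{displaymath}M_{Z_n}(t) = \left[ M_{X_1}\left(\frac{t}{\sqrt{n}}\right)\right]^n
= \left[ 1 + \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right]^n \to e^{t^2/2}
\, ,
\end{displaymath}

which is the moment generating function of N(0,1). The characteristic function version of the same calculation works without assuming the mgf is finite, which is why the theorem is stated in terms of characteristic functions.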

Now let's go back to the questions I asked. Each of the N(0,1/n), N(1/n,1/n) and $N(1/\sqrt{n},1/n)$ distributions, and the distribution of $X_n\equiv 2^{-n}$, converges in distribution to that of $X\equiv 0$, so in the sense of the definition all of them are close to the distribution of $X\equiv 0$. The differences between them show up only on the scale of their standard deviations: multiplying by $\sqrt{n}$ turns the first two into distributions close to N(0,1) but turns the third into N(1,1).

Here is the message you are supposed to take away from this discussion. You do distributional approximations by showing that a sequence of random variables $X_n$ converges in distribution to some X. The limit distribution should be non-trivial, like, say, N(0,1). We don't say that $X_n$ is approximately N(1/n,1/n); we say that $n^{1/2} X_n$ converges to N(0,1) in distribution.
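To see the rescaling in action (my illustration, not from the notes): multiply draws from each of the three normal families in the earlier questions by $\sqrt{n}$ and compare the results with N(0,1). In the N(mean, variance) notation used above, the first two become (nearly) N(0,1) while the third becomes N(1,1).

# Sketch: after multiplying by sqrt(n), N(0,1/n) and N(1/n,1/n) look like N(0,1),
# but N(1/sqrt(n),1/n) becomes N(1,1), so it is not close on that scale.
# (Second argument of N(.,.) is the variance, so the numpy scale is its square root.)
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10000, 50000
root_n = np.sqrt(n)

cases = {
    "N(0, 1/n)":         rng.normal(0.0,        1 / root_n, reps),
    "N(1/n, 1/n)":       rng.normal(1 / n,      1 / root_n, reps),
    "N(1/sqrt(n), 1/n)": rng.normal(1 / root_n, 1 / root_n, reps),
}
for name, x in cases.items():
    scaled = root_n * x
    print(f"{name:18s}  sqrt(n)*X_n has mean {scaled.mean():+.3f} and sd {scaled.std():.3f}")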


Richard Lockhart
1999-10-09