Transcript 00:00 All right, so let's get started. 00:02 So today, we're gonna talk about what are probably 00:06 the two most famous theorems in the entire history of probability. 00:11 They're called the law of large numbers and the central limit theorem. 00:15 They're closely related, so it makes sense to do them together, kind of compare and 00:20 contrast them. 00:22 I can't think of a more famous probability theorem than these two. 00:27 So the setup for today is that we have i.i.d. random variables. 00:34 Let's just call them X1, X2, and so on, i.i.d. 00:39 Since they're i.i.d., they have the same mean and variance, 00:44 if the mean and variance exist, but we'll assume they do. 00:49 So the mean, we'll just call it mu. 00:50 And the variance, sigma squared. 00:56 So we're assuming that these are finite for now. 00:59 The mean and variance exist. 01:00 And both of these theorems tell us 01:05 what happens to the sample mean as n gets large. 01:09 So, the sample mean is just defined as Xn bar, the average (X1 + ... + Xn)/n. 01:14 Standard notation in statistics is to put a bar to denote averages, and 01:17 that's just the average of the first n. 01:21 So take the first n random variables and average them; 01:24 that's just called the sample mean. 01:32 So the question is, what can we say about Xn bar as n gets large? 01:37 So the way we would interpret this or use this is, we get to observe 01:44 these X's. They're random variables, but after we observe them they become data. 01:47 We're never going to have an infinite amount of data, so 01:51 at some point we stop it at n. 01:52 We can think of that as the sample size, and hopefully we get a large sample size. 01:57 Of course, it depends on the problem. 01:58 In some problems, you may not be able to get large n. 02:01 Well, we assume n is large, and 02:03 just take the average; the question is, what can we say? 02:07 All right, so first, here's what the law of large numbers says. 02:16 It's a very simple statement. 02:19 And hopefully pretty intuitive, too. 02:22 The law of large numbers says that Xn bar 02:27 converges to mu, as n goes to infinity, 02:36 with probability 1. 02:39 That's the fine print, probability 1. 02:44 With probability 0, something really crazy could happen. 02:47 But we don't worry too much about it, because it has probability 0. 02:50 With probability 1, this is the sample mean, and 02:54 it says that the sample mean converges to the true mean. 03:03 So, that is a pretty nice, intuitive, easy to remember result. 03:11 By true, I mean the theoretical mean. 03:14 That is, the expected value of Xj for any j is the true expected value. 03:21 Whereas this is a random variable. 03:24 Right? We're taking an average of 03:25 random variables. 03:25 That's a random variable. 03:26 So this is just a constant, but this is a random variable. 03:30 But it's gonna converge, and I should say a little bit more about 03:35 what this convergence statement actually means. 03:39 You've all seen limits of sequences, but when we are talking about limits of random 03:43 variables we have to be a little more careful. 03:45 How do we actually define this? 03:47 The definition of this statement is just pointwise, which means, 03:54 remember Xn bar is a random variable. 03:56 A random variable, mathematically speaking, is a function. 03:58 So if you evaluate this at some 04:02 specific outcome of the experiment, then you'll get a sequence of numbers.
04:06 That is, if you actually observe the values, this kind of crystallizes into 04:11 numbers when you evaluate it at the outcome of the experiment. 04:14 And so those numbers converge to mu. 04:20 In other words, this is an event. 04:23 Either these random variables converge or they don't. 04:27 And we say that event has probability 1. 04:31 That's what the statement of the theorem is. 04:34 So to just give a simple example, 04:41 let's think about what happens if we have Bernoulli p. 04:45 So if Xj is Bernoulli p, then intuitively we're 04:50 just imagining an infinite sequence of coin tosses, 04:56 where the probability of heads is p, and 05:00 then this says that if we add up all of these Bernoullis up to n, 05:06 that is, in the first n coin flips, how many times did the coin land heads, 05:13 divided by the number of flips, that should converge to p with probability 1. 05:25 So for example, this is a very intuitive statement. 05:28 If it's a fair coin and you flip the coin a million times, well, 05:33 you're not really expecting that it will be exactly 500,000 heads and 500,000 tails. 05:39 But you do think that, in the long run, it should be the case 05:44 that it's going to be essentially half heads, half tails. 05:48 Not exactly, but essentially. 05:50 And the proportion should get closer and closer to the true value. 05:56 This qualification "with probability 1" is needed because, mathematically speaking, 05:59 even if you have a fair coin, there's nothing in the math that says 06:04 it's impossible that the coin would land heads, heads, heads, heads, heads forever. 06:09 You know that that's never actually gonna happen in reality. 06:13 It's just not gonna happen. 06:15 It's a fair coin. 06:16 It might land heads, heads, heads for a time if you're very lucky or 06:20 unlucky or whatever. 06:21 But it's not gonna be heads, heads, heads forever. 06:26 But there's nothing in the math that says that's an invalid sequence. 06:31 So there are some weird pathological cases like that. 06:35 But with probability one, we get what we expect. 06:39 If we didn't have this result, how would we ever even estimate p? 06:45 You might imagine, if you didn't know what p was, 06:48 kind of the obvious thing to do is flip the coin a lot of times and 06:51 take the proportion of heads and use that as your approximation for p. 06:54 But what justification could you have for 06:57 doing that approximation if you didn't have this? 07:00 So this is a very, very necessary result. 07:06 But I guess to comment a little bit more about what it actually says for 07:09 the coin, because this is kind of related to the gambler's fallacy, and 07:13 things like that. 07:15 The gambler's fallacy is the idea that, let's say you're gambling and 07:19 you lose like ten times in a row, and then it's the feeling that you're due to win. 07:27 You lost all these times, and you might try to justify that using the law 07:32 of large numbers and say, well, you know, let's say 07:36 heads you win money, tails you lose money, and you just lost money ten times in a row. 07:41 But the law of large numbers says, in the long run, 07:44 it's gonna go back to one-half if it's fair. 07:47 So somehow you need to start winning a lot to compensate. 07:51 That's not the way it works. 07:54 The coin is memoryless. 07:56 The coin does not care how many failures or how many losses you had before. 08:00 So the way it works is not that, if you're unlucky at the beginning, 08:04 it somehow gets offset later by an increase in heads.
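Here is a minimal simulation sketch of the coin example above: flip a Bernoulli(p) coin many times and watch the running proportion of heads settle toward p. The particular p, number of flips, and seed are arbitrary choices for illustration, not from the lecture.

```python
import numpy as np

# A minimal sketch of the coin example: flip a Bernoulli(p) coin n times and
# track the running proportion of heads. The values of p, n, and the seed are
# arbitrary choices, not from the lecture.
rng = np.random.default_rng(seed=0)
p = 0.5
n = 1_000_000

flips = rng.binomial(1, p, size=n)                     # X_1, ..., X_n i.i.d. Bernoulli(p)
running_mean = np.cumsum(flips) / np.arange(1, n + 1)  # Xbar_1, Xbar_2, ..., Xbar_n

for k in (100, 10_000, 1_000_000):
    print(f"Xbar after {k:>9,} flips: {running_mean[k - 1]:.5f}")
# The printed proportions drift toward p = 0.5, even though nothing forces the
# raw counts of heads and tails to balance out exactly.
```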
08:09 The way the law of large numbers works is through what we might call swamping. 08:14 Let's say the coin landed tails 100 times in a row. 08:19 It doesn't mean that the probability has changed for the 101st flip. 08:24 What it means, though, is that we're letting n go to infinity here, okay? 08:29 So no matter how unlucky you were in the first 100 or 08:32 the first million trials, that's nothing compared to infinity, right? 08:37 So those first million just get swamped out by the entire infinite future, 08:42 so that's what's going on here. 08:50 Yeah, so to tell you one little story about the law 08:55 of large numbers, a colleague of mine told me this story. 09:01 He had a student once who said he hated statistics. 09:06 And of course, my colleague was very shocked, 09:08 like how can anyone hate statistics? 09:11 And so he asked, why? 09:12 How is it possible that you hate statistics? 09:15 And the student, who was an athlete, was training every day, and 09:19 he had just learned the law of large numbers. 09:22 And he was very, very depressed by this, because he said, the law of large numbers 09:26 says in the long run, I'm gonna only be average and I can't improve. 09:30 So well, of course, the fallacy there is that we assumed i.i.d. right now. 09:38 Now there are generalizations of this theorem beyond i.i.d., but 09:41 we can't just get rid of i.i.d. 09:43 So the i.i.d. is saying that the distribution is not changing with time. 09:48 That doesn't mean that you can't actually improve; if your own distribution changes, then it 09:53 would not be i.i.d. 09:54 So don't be depressed by this, and in fact this theorem, 09:59 I think, is crucial in order for science to actually be possible. 10:05 Because imagine a kind of hypothetical, 10:08 counterfactual world where this theorem was actually false. 10:13 That would make it really depressing to try to ever learn about the world, right? 10:18 Cuz this is saying, you're collecting more and more data. 10:21 You're letting your sample size go to infinity. 10:24 And this says, you converge to the truth, right? 10:28 And it would be some weird setting, where you get more and more data, and more and 10:31 more data, and yet you're not able to converge to the truth, right? 10:35 So that would be really bad. 10:36 So this is very intuitive, very important. 10:40 Okay, so let's prove this, or at least a similar version. 10:47 So this is actually sometimes called the strong law of large numbers. 10:53 And we're actually gonna prove what's sometimes called 10:56 the weak law of large numbers. 10:57 I don't really like the terminology strong and weak here, but 11:02 that's kind of standard. 11:04 The strong law of large numbers is what I just said, 11:07 where it's converging pointwise with probability 1. 11:13 That is, these random variables converge to this constant, 11:19 except on some bad event that has probability 0. 11:23 The weak law of large numbers says that for 11:26 any c greater than 0, 11:32 the probability that Xn bar is more than c away from the mean, 11:37 that is, P(|Xn bar - mu| > c), goes to 0. 11:42 So it's a very similar looking statement. 11:47 It's not exactly equivalent. 11:49 It's possible to show, you have to go through some real analysis for 11:53 this that is not necessary for our purposes, 11:55 but it turns out that 11:57 once you've proven the strong law, it implies this form of convergence. 12:01 This is called convergence in probability, but 12:06 the intuition is very similar.
12:09 So just to interpret this statement in words, we can choose, 12:14 we should interpret c as being some small number. 12:17 So let's say we choose c to be 0.001, okay? 12:21 And then it says that this thing goes to 0, 12:26 as n goes to infinity again. 12:29 So this says that if n is large enough, then 12:34 it's extremely unlikely that these are more than 0.001 apart. 12:38 In other words, if n is large, 12:41 it's extremely likely that this is extremely close to this, right? 12:46 So it's a very similar statement: if n is large, 12:49 it's extremely likely that the sample mean is very close to the true mean. 12:54 Okay, so that's what it says. 12:55 So we'll prove this one, 12:58 because to prove that one takes a lot of work and a lot of time. 13:03 This one, it looks like a nice-looking theorem. 13:06 And it is a nice theorem, but 13:07 we can prove it very easily using Chebyshev's inequality. 13:15 Okay, so let's prove the weak law of large numbers. 13:23 So all we need to do is show that this goes to 0, right? 13:26 That's what the statement is. 13:28 So let's just bound it; this looks pretty similar to what we were doing last 13:32 time, where we did Markov's inequality, Chebyshev's inequality. 13:36 This looks similar to that kind of stuff from last time, 13:39 which is why I did that, well, one reason for doing that last time. 13:42 We need the inequalities anyway, but they're especially useful here. 13:46 So we just need to show this thing goes to 0: 13:48 P(|Xn bar - mu| > c) goes to 0. 13:55 By Chebyshev's inequality, this is less than or 13:59 equal to the variance of Xn bar divided by c squared; 14:03 that's just exactly Chebyshev from last time. 14:09 Now we just need the variance of Xn bar. The variance of Xn bar, 14:15 well, just stare at the definition of Xn bar for a second. 14:18 There's a 1 over n in front, and that comes out as 1 over n squared. 14:25 And then since I'm assuming they're i.i.d., and therefore independent, 14:28 the variance of the sum is just n times the variance of one term. 14:32 So that's 1 over n squared, times n sigma squared, divided by c squared, 14:36 which is sigma squared over n c squared. 14:41 Sigma is a constant, c is a constant, n goes to infinity, so this goes to 0. 14:48 So that proved the weak law of large numbers, just a one-line thing. 14:59 Okay, so that tells us what happens pointwise when we average a bunch 15:06 of i.i.d. random variables: it converges to the mean. 15:11 So let me just rewrite that statement. 15:14 Then we'll write the central limit theorem and kind of compare them. 15:17 So another way to write what we just showed 15:22 is that Xn bar minus mu goes to 0 as n goes to 15:27 infinity, which is a good thing to know. 15:33 However, it doesn't tell us what the distribution of Xn bar looks like. 15:40 So this is true with probability one, but what is the distribution? 15:52 What does the distribution of Xn bar look like? 16:00 So this says it's getting closer; Xn bar is getting closer and 16:05 closer to this constant mu. 16:07 Okay, but that's not really telling us the shape, and 16:10 it's not really telling us the rate. 16:12 This goes to 0, but at what rate? 16:16 So one way to think about problems like that, when you have something going to 0, 16:22 and you wanna study how fast it goes to 0: 16:27 one might, not just here, but 16:30 just as a general approach to that kind of problem, say, 16:33 we know this goes to 0, but we don't know how fast.
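Before moving on to the rate question, here is a minimal numerical sketch of the weak law and the bound just derived: it compares the simulated frequency of |Xbar_n - mu| > c with the Chebyshev bound sigma^2 / (n c^2). The Exponential(1) distribution, the cutoff c, and the sample sizes are arbitrary choices for illustration, not from the lecture.

```python
import numpy as np

# A minimal sketch of the weak law: compare the simulated frequency of
# |Xbar_n - mu| > c with the Chebyshev bound sigma^2 / (n c^2) used in the
# proof. Exponential(1) has mu = 1 and sigma^2 = 1; the choices of
# distribution, c, and the n values are arbitrary.
rng = np.random.default_rng(seed=1)
mu, sigma2, c, n_trials = 1.0, 1.0, 0.1, 1_000

for n in (100, 1_000, 10_000):
    xbar = rng.exponential(scale=1.0, size=(n_trials, n)).mean(axis=1)
    empirical = np.mean(np.abs(xbar - mu) > c)
    bound = sigma2 / (n * c**2)
    print(f"n = {n:>6}: simulated P(|Xbar - mu| > c) = {empirical:.3f}, "
          f"Chebyshev bound = {bound:.3f}")
# Both columns shrink toward 0 as n grows; the bound is loose, but it is all
# the proof of the weak law needs.
```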
16:37 So one way to study how fast this goes to 0 would be to multiply it by something that goes to infinity, right? 16:42 Now, if we multiply it by something that goes to infinity, 16:47 such that this times this goes to infinity, 16:50 then we know that the part that blows up is dominating over this part. 16:55 And if we multiply by something that goes to infinity, but 16:58 this whole thing still goes to 0, then that's more informative, right? 17:02 So what's gonna happen is that we can imagine multiplying here by 17:08 n to some power, and we're gonna show that there's a power here, 17:12 n to some power, fill in the blank. 17:15 What we're gonna show is that 17:18 if the power here is above some threshold, n to a big power 17:24 goes to infinity fast, and this thing will just blow up. 17:29 And if we put a smaller power than the threshold here, then this is still going 17:34 to infinity, as long as it's a positive power of n; this part is still going to 17:39 infinity, this part's going to 0, but this part's dominating, right? 17:43 So this term is competing with this term. 17:46 This one goes to infinity, this one goes to 0, okay? 17:49 So then the question is, what's that magic threshold value? 17:53 And the answer is one-half. 17:57 So that's what we're gonna study right now. 17:58 So we're gonna take the square root of n times Xn bar minus mu. 18:04 This is kind of the happy medium, 18:06 where we're gonna get a non-degenerate distribution; this is gonna converge 18:11 in distribution to an actual distribution. It's not gonna just get killed to 0 or 18:16 blow up to infinity, it's actually gonna give us a nice distribution. 18:22 Okay, and I'm also gonna divide by the sigma here, makes it a little bit cleaner. 18:28 So this is the central limit theorem now. 18:31 I'm stating it, then we'll prove it. 18:37 The central limit theorem says, if you take this and 18:40 look at what happens as n goes to infinity, 18:47 it converges to standard normal in distribution. 18:55 By convergence in distribution, what we mean is that 19:00 the distribution of this converges to the standard normal distribution. 19:06 In other words, you could take the CDF. 19:09 I mean, these may be discrete or continuous or a mixture of discrete and continuous. 19:14 So it doesn't necessarily have a PDF, but every random variable has a CDF. 19:20 So it says if you take the CDF of this, 19:22 it's gonna converge to capital Phi, the standard normal CDF. 19:27 So I think this is kind of an amazing result, that this holds in such generality, 19:33 right? Because I mean the standard normal is just this 19:38 one particular distribution, it's a nice looking bell curve, but it's just one distribution. 19:44 And those X's, they could be discrete, they could be continuous, 19:48 they could be extremely nasty looking distributions, right? 19:52 They could look like anything; 19:54 the only thing we assumed was that there was a finite variance. 19:59 Other than that, they could have an incredibly complicated, 20:03 messy distribution. 20:06 But it's always gonna go to standard normal.
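Here is a minimal simulation sketch of the statement just given: start from a decidedly non-normal distribution, form sqrt(n) * (Xbar_n - mu) / sigma, and compare a few values of its simulated CDF with capital Phi. The Exponential(1) distribution, n, the number of repetitions, and the cutoffs are arbitrary choices for illustration, not from the lecture.

```python
import numpy as np
from scipy.stats import norm

# A minimal sketch of the CLT: standardize the sample mean of Exponential(1)
# data (mu = sigma = 1) and compare its simulated CDF with Phi. The choices of
# distribution, n, number of repetitions, and cutoffs are arbitrary.
rng = np.random.default_rng(seed=2)
mu, sigma, n, n_reps = 1.0, 1.0, 1_000, 10_000

xbar = rng.exponential(scale=1.0, size=(n_reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma      # the standardized sample mean

for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"P(Z <= {t:+.0f}): simulated {np.mean(z <= t):.3f}, "
          f"Phi({t:+.0f}) = {norm.cdf(t):.3f}")
# The simulated CDF values line up closely with Phi(t), even though the
# underlying data are exponential rather than normal.
```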
20:09 So this is one of the reasons why the standard normal distribution is so 20:14 important on the one hand, and so widely used. Because this is a theorem about 20:19 what happens as n goes to infinity, but the way it's used in practice is that 20:24 people use normal approximations all the time, and a lot of the justification for 20:30 normal approximations is coming from this, because this says that if n is large, 20:36 then the sample mean will approximately have a normal distribution. 20:44 Even if the original data did not look like they came from a normal distribution, 20:50 when you average lots and lots of them, it looks normal, okay? 20:55 So this in a sense is a better theorem than the law of large numbers, 20:59 because it's kind of more informative to know the distribution, 21:03 know something about the rate. And you know, it's interesting that 21:07 square root of n is kind of the power of n that's just right, right? 21:11 A larger power, it's gonna blow up; a smaller power, it's gonna go to 0. 21:15 n to the one-half is the compromise, and then you always get a normal distribution. 21:20 It's more informative in some sense, but 21:22 you should also keep in mind, it is a different sense of convergence. 21:27 Up here, we're talking about the random variables actually converging, 21:32 literally the random variables converge: the sample mean converges 21:36 pointwise, with probability 1, to the true mean. 21:41 Here, we're talking about convergence in distribution. 21:43 So we're not talking about convergence of random variables. 21:47 We're just saying the distribution of this converges to the Normal(0, 1) distribution. 21:52 So that's a different sense of convergence, but anyway, 21:57 both of them are telling us what's gonna happen to Xn bar when n is large, okay? 22:04 So well, let's prove this theorem. 22:07 Here's another way to write this, by the way; 22:11 it's good to be familiar with both ways. 22:15 It's just algebra to go from one to the other, but 22:18 they're both useful enough to be worth mentioning. 22:21 Let's just write the central limit theorem in terms of the sum of the X's 22:26 rather than in terms of the sample mean. 22:29 So I'm just gonna take the sum of Xj, j equals 1 to n. 22:34 And so, we can either think of the central limit theorem as 22:38 telling us what happens to the sample mean, or we 22:41 can think of it as telling us what happens to the sum, or the convolution, okay? 22:46 It's equivalent; 22:50 we just have to be careful not to mess up the factor of n, 22:53 but we can go from one to the other cuz it's just a factor of n. 22:57 So the claim is that this is approximately normal when n is large, 23:02 but if we just have this thing, this could easily just blow up. 23:08 You're just adding more and more terms. 23:10 But somehow we wanna standardize this first. 23:15 So if we take this thing, well, this thing has mean 23:20 n mu, right? So let's subtract n mu, 23:26 because then it has zero mean, because I just want to match. 23:30 I wanna make the mean 0 and the variance 1, so 23:32 that it kind of matches up with that, rather than just letting it blow up. 23:37 So this is called centering; by linearity, 23:41 the mean is n mu, so just subtract n mu. 23:43 And then let's divide by the standard deviation; 23:47 this is just how we standardized before. 23:50 So over there we showed that the variance of Xn bar is sigma squared over n.
23:57 And the variance of this sum is just n sigma squared. 24:02 So let's just divide by the standard deviation, right, 24:07 which is square root of n times sigma, okay? 24:12 Cuz the variance is n sigma squared. 24:15 So that's just the standardized version. 24:17 And the statement is again that this converges to the standard normal 24:22 in distribution. 24:23 So if we take this sum and standardize it, then it's gonna go standard normal. 24:33 Okay, so, all right, so now we're ready to prove this theorem. 24:41 And it's sort of just a calculation, but it's kind of a nice 24:46 calculation in some ways. So we're gonna prove it, well, 24:53 this theorem is always true as long as the variance exists. 24:57 We don't need to assume that the third moment or the fourth moment exists. 25:01 But the proof is much more complicated to do in that generality. 25:05 So we're gonna assume that the MGF exists; then we can actually work with the MGFs. 25:11 Because when you see this thing, a sum of independent random variables, 25:15 then we know the MGF is gonna be something useful, if it exists. 25:20 And there are ways to extend this proof to cases where the MGF doesn't exist. 25:23 But for our purposes, we may as well just assume the MGF exists. 25:30 So assuming the MGF exists, let's call it M(t), 25:38 the MGF of Xj. They're i.i.d., so if one of them has an MGF, they all have the same MGF. 25:44 We'll just assume that that exists. 25:54 Once we have MGFs, then our strategy is to show that the MGFs converge. 26:01 So that's a theorem about MGFs: if the MGFs converge to some other MGF, 26:07 then the random variables converge in distribution, right? 26:12 We had a homework problem related to that, where you found that the MGFs converged 26:18 to some MGF, and that implies convergence of the distributions, right? 26:22 Okay, so that's the whole strategy. 26:24 So that means all we need to do is find the MGF of this and 26:28 then take the limit, okay? 26:30 So basically at this point, it's just like, write down the MGF, 26:35 take the limit, and use a few facts about MGFs, okay? 26:40 So first of all, we can assume, 26:50 let's just assume, mu = 0 and 26:54 sigma = 1, just to simplify the notation. 27:00 This is without loss of generality, 27:04 because, well, all we have to do is consider this: 27:10 I wrote the standardized thing this way, but 27:14 I could've just written it as standardizing each X separately. 27:19 I could've written Xj minus mu, over sigma. 27:24 So this would be standardizing each of them separately, j = 1 to n, and 27:29 then we have a 1 over root n. 27:34 That will be the same thing that we're looking at. 27:36 This just says standardize them separately first. 27:39 But then you could just, I mean if you want, just call this thing Yj. 27:43 And once you have the central limit theorem for Yj, then you know that that's true. 27:47 So you might as well just assume that they've already been standardized. 27:51 And so just to have some notation, let's just let Sn equal the sum, 27:57 S for sum, of the first n terms. 28:00 And what we wanna show is that the MGF 28:04 of Sn over root n, that's what we're looking at, right? 28:08 We let mu equal zero, sigma equal one, so we're looking at Sn over root n. 28:12 And we wanna show that that goes to the standard normal MGF. 28:22 Right, so we just need to find this MGF, take a limit. 28:27 Okay, so let's just find the MGF. 28:30 So by definition, that's the expected value of e to the t times Sn over root n. 28:42 And Sn is just the sum.
28:44 So, and we're assuming independence, which means that you can 28:50 write this as e to the t X1 over root n, times e to the t X2 over root n, blah, blah, blah. 28:56 All of those factors are independent, therefore they're uncorrelated. 29:02 So we can just split it up as a product: expected value of e to the t X1 over root n, 29:09 blah, blah, blah, same thing, 29:13 e to the t Xj over root n is the general term, right? 29:18 I'm just using the fact that those are uncorrelated, so 29:23 we can write the expectation of the product as the product of the expectations. 29:28 But since these X's are i.i.d., 29:30 these are really just the same thing written n times. 29:33 So really, this is just this thing to the nth power. 29:40 And this thing, that should remind you of an MGF, right? 29:44 That's just the MGF of X1, 29:46 except that instead of being evaluated at t, it's evaluated at t over root n. 29:51 So really, that's just the MGF 29:54 evaluated at t over root n, raised to the nth power. 30:00 So that's what we have. 30:04 Now we need to take the limit as n goes to infinity. 30:06 So let's just look at what's gonna happen here: n is going to infinity. 30:11 This thing on the inside becomes M of 0. 30:16 M of 0 is 1 for any MGF, right? 30:19 Cuz e to the 0 is 1. 30:21 So this is of the form 1 to the infinity, which is an indeterminate form, right? 30:28 It could evaluate to anything. 30:31 So going back to calculus, how do you deal with 1 to the infinity, 30:35 or 0 over 0, or whatever? 30:37 Usually we try to reduce it to something where we can use L'Hopital's rule for 30:41 those problems, right? 30:42 Or we can use a Taylor series type of thing. 30:45 So, how do we get into that form? 30:51 Take the log, because this looks like 1 to the infinity. 30:56 If we take the log, it'll look like infinity times log of 1. 31:00 So it'll look like infinity times 0. Take logs. 31:04 Then we just have to remember to exponentiate at the end to undo the log. 31:10 Okay, so let's write down then what we have. 31:18 After taking the log, and we're trying to do a limit, so 31:22 we're doing the limit as n goes to infinity, and we take the log. 31:26 It's n log M(t 31:31 over root n). 31:36 So that's of the form infinity times 0. 31:41 If we want 0 over 0 or infinity over infinity, 31:44 we can just write it as 1 over n in the denominator. 31:54 Okay, and now it's of the form 0 over 0. 31:57 So we can almost use L'Hopital's rule, but not quite. 32:00 We have to be a little bit careful. 32:02 Because first of all, I'm assuming n is an integer, 32:05 and you can't do calculus on integers. 32:10 Secondly, even if we pretended that n is a real number, 32:14 the derivative of 1 over n would be minus 1 over n squared, and 32:18 that's kind of annoying to deal with. 32:20 And it's kind of annoying to deal with this square root here. 32:23 So let's first make a change of variables. 32:26 Let's just let y = 1 over root n, and also let y be real, 32:37 not necessarily of the form 1 over the square root of an integer, okay? 32:42 So it's the same limit, just written in terms of y instead of in terms of n. 32:48 So as n goes to infinity, y goes to 0, and 1 over n is y squared, 32:54 so the denominator is just y squared. 32:58 The reason I do it this way is that 1 over root n is just y 33:02 by definition, but then the numerator is just log M of yt. 33:07 That's a lot easier to deal with because we got rid of the square roots. 33:13 So it's still of the form 0 over 0. 33:16 So we're gonna use L'Hopital's rule. 33:21 So limit, y goes to 0.
33:23 Take the derivative of the numerator and the denominator separately. 33:28 The derivative of the denominator is 2y. 33:31 The derivative of the numerator, 33:32 well, we're just going to have to use the chain rule. 33:35 The derivative of log of something is 1 over that thing. 33:39 So that's 1 over M of yt, times the derivative of that thing, which again 33:44 by the chain rule is M prime of yt times the derivative of yt. 33:50 We're treating t as a constant; we're differentiating with respect to y. 33:55 So t comes out. 34:00 And now let's see what we have. 34:02 Let's just summarize a couple facts about MGFs. 34:08 So M of t is the expected value of e to the t X1. 34:14 So M of 0 = 1, okay? 34:22 And when we first started doing MGFs, we said that if we take derivatives of the MGF 34:26 and evaluate them at 0, 34:28 we get the moments; that is why it's called the moment generating function. 34:31 So the first derivative at 0 is the mean, but we assumed that mu is 0. 34:36 So this is 0, here. 34:38 And the second derivative, while we're doing this: 34:42 the second derivative at 0 is the second moment, but since we assumed that the variance is 34:46 1 and the mean is 0, the second moment is 1, okay? 34:51 So over here, as we let y go to 0, the denominator's still going to 0. 34:56 The numerator's also going to 0, because M prime of 0 is 0, 35:01 so it's still of the form 0 over 0, so let's just do the same thing again. 35:08 So first I can simplify it a little bit: this t can come out, 35:13 because that's acting as a constant, and the 2 can come out. 35:17 And limit y goes to 0, and this M of yt part, 35:24 that's just going to 1. 35:28 So we can write that as part of a separate limit, but 35:31 that other limit is just going to 1. 35:34 You can think of it as just the limit of this part times 35:36 the limit of the rest of it. 35:37 But that part's just going to 1, so we can get rid of that. 35:41 So really, what's left is just 35:47 the limit of M prime of yt divided by y. 35:53 Everything else is gone, so it's actually pretty nicely simplified. 35:59 Now, using L'Hopital's rule a second time, 36:02 the derivative of the denominator is just 1, okay? 36:07 And for the numerator, chain rule, M double prime of yt. 36:16 That was a t, not a t squared, but now it's a t squared, 36:19 because by the chain rule, the derivative of yt is t, so we have a t squared over 2. 36:25 Now when we let y go to 0, M double prime of 0 is 1, so 36:29 now this limit is just 1. 36:31 So we get t squared over 2, 36:35 and that's what we wanted, 36:39 because t squared over 2 is the log 36:45 of e to the t squared over 2, and 36:48 e to the t squared over 2 is exactly the Normal(0, 1) MGF. 37:02 Okay so, 37:03 that's the end of the proof of the central limit theorem. 37:08 All we had to do was use basic facts about MGFs and L'Hopital's rule twice. 37:14 And there we have one of the most famous, important theorems in statistics. 37:20 Now, there are more general versions of this; 37:23 you can extend this in various ways where it's not i.i.d., 37:28 but it still has to satisfy some assumptions, right. 37:32 But anyway, this is the basic central limit theorem.
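For reference, here is the argument just carried out at the board, collected into one LaTeX display: the statement in both forms, and then the MGF limit under the standing assumptions that the MGF exists and mu = 0, sigma = 1. This is only a transcription of the steps above, nothing new.

```latex
% Statement in both forms (general mu, sigma):
\[
\frac{\sqrt{n}\,\bigl(\bar{X}_n-\mu\bigr)}{\sigma}
  \;=\;\frac{X_1+\cdots+X_n-n\mu}{\sqrt{n}\,\sigma}
  \;\longrightarrow\;\mathcal{N}(0,1)\quad\text{in distribution.}
\]
% MGF calculation with mu = 0, sigma^2 = 1, and S_n = X_1 + ... + X_n:
\[
E\bigl[e^{tS_n/\sqrt{n}}\bigr]
  =\prod_{j=1}^{n}E\bigl[e^{tX_j/\sqrt{n}}\bigr]
  =M\!\left(\tfrac{t}{\sqrt{n}}\right)^{\!n},
\]
\[
\lim_{n\to\infty} n\log M\!\left(\tfrac{t}{\sqrt{n}}\right)
  =\lim_{y\to 0}\frac{\log M(yt)}{y^{2}}
  =\lim_{y\to 0}\frac{t\,M'(yt)}{2y\,M(yt)}
  =\frac{t}{2}\lim_{y\to 0}\frac{M'(yt)}{y}
  =\frac{t^{2}}{2}\lim_{y\to 0}M''(yt)
  =\frac{t^{2}}{2},
\]
% using M(0) = 1, M'(0) = mu = 0, M''(0) = E[X_1^2] = 1, so the MGF of
% S_n / sqrt(n) converges to e^{t^2/2}, the N(0,1) MGF.
```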
37:36 Okay, so that's pretty good. 37:40 Let's do an example, like how do we actually use this, 37:45 for the sake of approximations, things like that. 37:50 Last time I was talking about the difference between inequalities and 37:53 approximations, right? 37:54 And we talked about Poisson approximation before. 37:57 We haven't really talked about normal approximation. 38:01 This result is giving us the ability to use normal approximations 38:06 when we're studying the sample mean and n is large, okay? 38:11 So historically, though, the first version of 38:16 the central limit theorem that was ever proven, 38:21 I think, was for binomials, okay? 38:24 So what we're saying is that 38:28 a Binomial(n, p), under some conditions, will be approximately normal. 38:33 And well, in the old days that was an incredibly important fact, because 38:38 they didn't have computers; for binomials you have to deal with 38:42 n choose k, and n is large, and you have all these factorials. 38:47 You can't do these things by hand. 38:49 Now we have fast computers, so it's a little bit better. 38:53 But it's still a lot easier working with normal distributions than 38:57 binomial distributions most of the time, right? 39:00 And even now, factorials still grow so fast that even with 39:05 a fast computer with large memory and everything, you may quickly 39:09 exceed its ability when you're doing some big, complicated binomial problem. 39:13 And normals have a lot of nice properties, as we've seen, okay? 39:18 The question is, when can we approximate a binomial 39:24 using a normal, and how do we do that, okay? 39:29 So this is just the binomial approximation 39:34 to the normal, or, the other way around: 39:38 I'll say the binomial approximated by the normal, 39:41 the normal approximation to the binomial. 39:48 When is that valid? 39:52 To contrast it with the Poisson approximation 39:55 that we've seen before, okay? 39:58 So, let X be Binomial(n, p). 40:05 And as we've done many times before, 40:10 we can represent X as a sum of i.i.d. Bernoullis, 40:18 right? Well, these are just 1 if success on the jth 40:23 trial, 0 otherwise, so the Xj are i.i.d. Bernoulli(p). 40:33 So this does fit into the framework of the central limit 40:37 theorem; that is, we are adding up i.i.d. random variables. 40:40 So the central limit theorem says that if n is large, this will be 40:45 approximately normal, at least after we have standardized it, okay? 40:50 So suppose we wanted to approximate, suppose we're 40:55 interested in the probability that X is between a and b. 41:04 And I want to approximate that; 41:07 first we'll write an equality, then we'll approximate it. 41:10 So, I mean, if you had to do this on a computer, or by hand, 41:15 which you wouldn't want to, what you would do would be to take the PMF and 41:19 sum up all the values of the PMF from a to b, right? 41:23 So okay, you would not want to do that by hand most of the time. 41:28 But suppose we just want an approximation for this, not the exact thing. 41:32 So first, the strategy is just gonna be to take X and standardize it first. 41:38 So we're gonna subtract the mean, so we know that the mean is np, 41:44 and we're gonna divide by the standard deviation, 41:48 which we know is the square root of npq, where q is 1 minus p. 41:53 So, I'm just standardizing it right now. 41:55 So this is still an equality, we haven't done any approximations yet. 42:02 And then, now that we've standardized it, 42:05 we can apply the central limit theorem, if n is large enough, right? 42:10 The central limit theorem said n goes to infinity; 42:12 that doesn't answer the question of how large n has to be. 42:16 And for that, there are various theorems and various rules of thumb. 42:20 A lot of books will say, how large does n have to be?
42:23 And some books at least will say 30, and that's just a rule of thumb. 42:30 That's not always gonna work; there are separate rules of thumb for 42:36 the binomial, like you want n times p to be reasonably large and 42:41 n times 1 minus p to be large. There are different rules of thumb. 42:46 But anyway, if n is large enough, 42:49 then what we've just proven is that this is gonna look like it has 42:53 a normal distribution, because that's a sum of i.i.d. things. 42:57 And we standardized it correctly, because we already knew the mean and the variance, 43:02 so we just standardized it. 43:03 Okay, so this is approximately... 43:08 Now we're going to use the normal approximation; 43:10 we're going to say this is approximately normal. 43:13 And if I want the probability that the normal is between something and 43:17 something, that's just the CDF here minus the CDF here, right? 43:22 Because for the normal, I mean, this is discrete but we're approximating it 43:27 using something continuous, and we just say, integrate the PDF from here to here. 43:34 But by the fundamental theorem of calculus, that just says take the CDF difference, okay? 43:37 So we're just gonna do Phi of b minus 43:43 np over square root of npq, minus Phi 43:48 of a minus np over square root of npq. 43:53 So that would be the basic normal approximation; 43:57 I'll talk a little bit about how to improve this approximation. 44:02 But to contrast it with the Poisson approximation: 44:11 we talked before about the fact, and we proved the fact, 44:16 that if n goes to infinity, and p goes to 0, and n times p is fixed, 44:22 then the binomial distribution converges to the Poisson distribution; 44:26 we proved that before. 44:28 So in the Poisson approximation, 44:31 what we had was n is large, but p was very small, right? 44:38 And we let lambda equal np, and x is moderate. 44:46 And the most important thing is that p is small here, p is close to 0. 44:51 We proved it in the case where this goes to infinity and this goes to 0, okay? 44:55 So Poisson is relevant when we're dealing with a large number of very 45:00 rare, unlikely things. 45:02 That's really in contrast to this, 45:06 in this case, for the normal approximation. 45:10 Then, while we still want n to be large, 45:14 if you kind of think intuitively about when this is gonna work well, 45:19 we actually want p to be close to one half. 45:25 Because think about the symmetry: if you have a binomial with p equals one half, 45:30 that's a symmetric distribution. 45:33 The normal is symmetric; every normal distribution is symmetric. 45:39 If p is far from one half, then the binomial is very, very skewed, and in that 45:44 case it kind of doesn't make that much sense to approximate it using a normal. 45:52 So the normal approximation is gonna work well 45:58 when p is close to one half; if p is very small, the Poisson makes a lot more sense than the normal. 46:04 However, think about the statement of the central limit theorem. 46:08 In that theorem I never said p was close to one half; 46:12 in fact, that was just a general theorem, we didn't even have p in the statement 46:17 of the central limit theorem, but somehow this still has to eventually work. 46:22 But as a practical matter, as an approximation, 46:26 if p is close to one half this is going to work quite well: 46:30 if n is like 30 or 50 or 100, it will work fine. 46:34 But if p is .001, the central limit theorem is still true, 46:38 that as n goes to infinity it's gonna work, okay. 46:41 But if n is kind of not that enormous a number, 46:46 then it's gonna be a pretty bad approximation.
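Here is a minimal numerical sketch of the approximation just written down: compare the exact P(a <= X <= b) for X ~ Binomial(n, p) with Phi((b - np)/sqrt(npq)) - Phi((a - np)/sqrt(npq)). The particular (n, p, a, b) values are arbitrary choices for illustration, not from the lecture.

```python
import numpy as np
from scipy.stats import binom, norm

# A minimal sketch of the normal approximation to the binomial: compare the
# exact P(a <= X <= b) with Phi((b - np)/sqrt(npq)) - Phi((a - np)/sqrt(npq)).
# The particular (n, p, a, b) values are arbitrary choices.
def normal_approx(n, p, a, b):
    mu, sd = n * p, np.sqrt(n * p * (1 - p))
    return norm.cdf((b - mu) / sd) - norm.cdf((a - mu) / sd)

cases = [(100, 0.5, 40, 60),    # p near one half: the approximation works well
         (100, 0.01, 0, 1)]     # p tiny and skewed: Poisson territory
for n, p, a, b in cases:
    exact = binom.cdf(b, n, p) - binom.cdf(a - 1, n, p)  # sum of the PMF from a to b
    print(f"n={n}, p={p}: exact {exact:.4f}, "
          f"normal approx {normal_approx(n, p, a, b):.4f}")
# With p = 0.5 the two numbers are close; with p = 0.01 the plain normal
# approximation is noticeably off, matching the discussion above.
```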
46:50 And let's just try to reconcile these statements, though: is there a case? 46:58 If we let n go to infinity and p be very small, 47:02 I still said, if n is going to infinity, 47:05 it's still gonna converge to normal, just much slower, right? 47:11 So, how could the binomial look both normal and Poisson? 47:18 Well, the answer is that the Poisson also looks normal. 47:21 So if you have a Poisson(lambda) where lambda's very large, 47:24 that's also gonna look normal, so there is a case where those come together. 47:30 Okay, one last thing about this is that there is something kind 47:35 of weird about this, in the sense that we're approximating 47:40 a discrete distribution using something continuous. 47:44 And what if we wanted to 47:47 approximate something else in the same problem? 47:53 I just wanna add something to this. 47:54 Well, let's just look at that, just to see what could go wrong 47:59 with this. 47:59 What if we look at the case a equals b? 48:02 So then we're just saying the probability that X equals a, 48:06 that is, approximate the binomial PMF. 48:10 And one kind of weird thing about this is, this thing would change if 48:14 we changed these to strict inequalities, but this part would not. 48:18 As soon as we say that this is approximately normal, then we don't care 48:22 about that anymore. 48:24 So there's something called the continuity correction, which I just wanted to 48:27 briefly mention, 48:28 which is an improvement to deal with the fact that you're using something 48:31 continuous to approximate something discrete. 48:34 And it's often not explained very well, but if you understand what 48:39 it does in this simple case, then it's not hard to see the idea. 48:44 The idea is that if you just said this is approximately normal, then you would just 48:49 say zero, right? 48:50 Because it would be zero for a continuous distribution; that's not very useful, right? 48:53 We want something more useful than zero. 48:56 So the idea is just simply to write this as, 49:00 here let's assume a is an integer, X is discrete, well, 49:05 X equals a is the same thing as saying that X is between 49:10 a minus one-half and a plus one-half, 49:17 right? 49:19 So just use this first. 49:24 So for each value in this range, 49:26 replace it by an interval of length 1 centered there; 49:30 that's exactly the same thing, because X is an integer anyway, so that's true. 49:35 But here at least we're giving it an interval to work with instead of 49:40 just saying zero, so that improves the approximation. 49:44 Anyway, that's the central limit theorem. 49:45 All right, so see you next time.
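For reference, here is a minimal numerical check of the continuity correction described above: approximate the Binomial(n, p) PMF at a by treating {X = a} as {a - 1/2 <= X <= a + 1/2}. The particular n, p, and a are arbitrary choices for illustration, not from the lecture.

```python
import numpy as np
from scipy.stats import binom, norm

# A minimal sketch of the continuity correction: approximate P(X = a) for
# X ~ Binomial(n, p) by the normal probability of the interval
# [a - 1/2, a + 1/2]. The particular n, p, and a are arbitrary choices.
n, p, a = 100, 0.5, 50
mu, sd = n * p, np.sqrt(n * p * (1 - p))

exact = binom.pmf(a, n, p)
without_correction = 0.0   # a continuous distribution puts zero mass on a single point
with_correction = norm.cdf((a + 0.5 - mu) / sd) - norm.cdf((a - 0.5 - mu) / sd)

print(f"P(X = {a}) exact:             {exact:.4f}")
print(f"normal, no correction:        {without_correction:.4f}")
print(f"normal, continuity corrected: {with_correction:.4f}")
# The corrected value is very close to the exact PMF, while the uncorrected
# point probability is just zero, matching the discussion above.
```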