https://www.youtube.com/watch?v=vMaKx9fmJHE&list=PLUl4u3cNGP60uVBMaoNERc6knT_MgPKS0&index=11 Transcript 00:00 The following content is provided under a Creative 00:02 Commons license. 00:03 Your support will help MIT OpenCourseWare 00:06 continue to offer high-quality educational resources for free. 00:10 To make a donation or to view additional materials 00:12 from hundreds of MIT courses, visit MIT OpenCourseWare 00:16 at ocw.mit.edu. 00:17 00:19 PHILIPPE RIGOLLET: We're talking about goodness-of-fit tests. 00:22 Goodness-of-fit tests are, does my data 00:25 come from a particular distribution? 00:27 And why would we want to know this? 00:28 Well, maybe we're interested in, for example, 00:32 knowing if the zodiac signs of the Fortune 500 CEOs 00:36 are uniformly distributed. 00:38 Or maybe we actually have slightly more-- 00:41 slightly deeper endeavors, such as understanding 00:44 if you can actually apply the t-test by testing normality 00:48 of your sample. 00:49 All right? 00:49 So we saw that there's the main result-- 00:51 the main standard test for this. 00:53 It's called the Kolmogorov-Smirnov test 00:55 that people use quite a bit. 00:57 It's probably one of the most used tests out there. 01:00 And there's other versions of it that I mentioned in passing. 01:05 There's the Cramer-von Mises, and there's 01:08 the Anderson-Darling test. 01:09 Now, how would you pick one of such tests? 01:12 Well, they're always going to-- they're always 01:14 going to have their advantages and disadvantages. 01:17 And Kolmogorov-Smirnov is definitely the most widely used 01:22 because-- 01:23 well, I guess because it's a natural notion 01:24 of distance between functions. 01:26 You just look for each point how far they can be, 01:28 and you just look at the farthest 01:30 they can be everywhere. 01:31 Now, Cramer-von Mises involves an L2 distance. 01:34 So if you're not used to Hilbert spaces or notions 01:39 of Euclidean spaces, it's at least a little more complicated. 01:43 And then Anderson-Darling is definitely 01:44 even more complicated. 01:45 Now, each of these tests is going 01:47 to be more powerful against different alternatives. 01:49 So unless you can really guess which alternative 01:52 you're expecting to see, which you probably 01:54 don't, because, again, you're in a case where you want 01:56 to typically declare H0 to be the correct one, 02:00 then it's really a matter of tossing a coin. 02:04 Maybe you can run all three of them 02:06 and just sleep better at night, because all three of them 02:09 have failed to reject, for example. 02:11 All right? 02:12 So as I mentioned, one of the maybe primary goals 02:15 of testing goodness of fit is to be able to check 02:19 whether we can apply Student's test, right, 02:22 and if the Student distribution is actually 02:24 a valid distribution. 02:25 And for that, we need to have normally distributed data. 02:28 Now, as I said several times, normally distributed, 02:32 it's not a specific distribution. 02:34 It's a family of distributions that's 02:35 indexed by means and variances. 02:38 And the way I would want to test if a distribution is normally 02:41 distributed is, well, I would just 02:42 look at the most natural normal distribution 02:45 or Gaussian distribution that my data could follow. 02:48 That means that's the Gaussian distribution that 02:50 has the same mean as my data and the same empirical variance 02:53 as my data, right?
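A minimal R sketch of that plug-in idea, assuming only a numeric sample x; it is an illustration of the statistic that gets formalized next, not code from the lecture.

```r
# A sketch, not the lecture's code: the Kolmogorov-Smirnov distance between
# the empirical CDF of x and the Gaussian CDF with the same mean and the same
# empirical variance as the data.
plugin_ks_stat <- function(x) {
  n      <- length(x)
  muhat  <- mean(x)                       # xbar_n
  sighat <- sqrt(mean((x - muhat)^2))     # square root of S_n (the 1/n convention)
  xs     <- sort(x)
  Phi    <- pnorm((xs - muhat) / sighat)  # Phi_{muhat, sighat^2} at each ordered point
  # the sup of |F_n(t) - Phi(t)| is attained at a jump of F_n, from one side or the other
  max(abs((1:n) / n - Phi), abs((0:(n - 1)) / n - Phi))
}

set.seed(1)
plugin_ks_stat(rnorm(100, mean = 3, sd = 2))  # a hypothetical sample
```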
02:55 And so I'm going to be given some points x1, xn, 03:00 and I'm going to be asking, are those Gaussian? 03:02 03:04 That means this is equivalent to, say, 03:07 are they N mu sigma squared for some mu, sigma squared? 03:15 And of course, the natural choice 03:17 is to take mu hat to be-- 03:20 mu to be equal to mu hat, which is xn bar. 03:23 And sigma squared to be sigma squared hat to be, 03:30 well, Sn hat-- 03:32 Sn-- what we wrote Sn, which is 1/n sum from i equal 1 to n 03:37 of xi minus xn bar squared. 03:40 OK? 03:41 So this is definitely the natural one 03:44 you would want to test. 03:45 And maybe you could actually just close your eyes 03:47 and just stuff that in a Kolmogorov-Smirnov test. 03:52 OK? 03:53 So here, there's a few things that don't work. 03:55 The first one is that Donsker's theorem does not 03:57 work anymore, right? 03:58 Donsker's theorem was the one that 03:59 told us that, properly normalized, 04:02 this thing would actually converge 04:04 to the supremum of a Brownian bridge, which is not true anymore. 04:07 So that's one problem. 04:08 But there's actually an even bigger problem. 04:10 This statistic, we will check in a second, 04:13 actually is-- 04:15 is pivotal itself, right, the statistic is pivotal. 04:19 It does not have a distribution that 04:20 depends on the unknown parameters, which 04:22 is sort of nice, at least under the null. 04:24 However, the distribution is not the same 04:27 as the one that had fixed mu and sigma. 04:31 The fact that they come from some random variables 04:34 is actually distorting the distribution itself. 04:36 And in particular, the quantiles are going to be distorted, 04:39 and we hinted at that last time. 04:41 So one other thing I need to tell you, though, 04:44 is that this thing actually-- so I know there's some-- 04:48 oh, yeah, that's where there's a word missing. 04:51 So we compute the quantiles for this test statistic. 04:54 And so what I need to promise to you 04:56 is that these quantiles do not depend 04:59 on any unknown parameter, right? 05:01 I mean, it's not clear, right? 05:06 So I want to test whether my data has some Gaussian 05:09 distribution. 05:09 So under the null, all I know is that my xi's are 05:15 Gaussian with some mean mu and some variance sigma squared, 05:18 which I don't know. 05:19 So it could be the case that when 05:20 I try to understand the distribution of this quantity 05:23 under the null, it depends on mu and sigma, which I don't know. 05:28 So we need to check that this is not the case. 05:30 And what's actually our redemption 05:33 here is actually going to be the supremum. 05:37 The supremum is going to basically allow 05:39 us to, say, sup out mu and sigma squared. 05:43 So let's check that, right? 05:44 So what I'm interested in is this quantity, the supremum 05:48 over t in R of the difference between Fn of t 05:54 and, what I write, phi mu hat sigma squared of t. 06:02 So phi mu hat sigma hat squared-- 06:07 sorry, sigma hat squared-- 06:09 is the CDF of some Gaussian with mean mu hat and variance sigma 06:15 hat squared. 06:16 And so in particular, this thing here, phi hat of mu hat-- 06:24 sorry, phi of mu hat sigma hat squared of t 06:30 is the probability that some x is less than t, 06:34 where x follows some N mu hat sigma hat squared.
06:39 So what it means is that by just the translation 06:42 and scaling trick that we typically do 06:44 for Gaussians to turn them into some standard Gaussian, that 06:47 implies that there exists some z, which 06:50 is standard Gaussian this time, so mean 0 and variance 1, 06:54 such that x is equal to sigma hat x-- 06:58 sorry, z plus mu hat. 07:02 Agreed? 07:04 That's basically saying that x has some Gaussian with mean mu 07:08 and variance sigma squared. 07:09 And I'm not going to say the hats every single time, OK? 07:13 So OK, so that's what it means. 07:17 So in particular, maybe I shouldn't use x here, 07:20 because x is going to be my actual data. 07:22 So let me write y. 07:23 07:27 OK? 07:29 So now what is this guy here? 07:32 It's basically-- so phi hat. 07:35 So this implies that phi mu hat sigma hat squared of t 07:42 is equal to the probability that sigma hat z 07:46 plus mu hat is less than t, which 07:50 is equal to the probability that z is less than t 07:53 minus mu hat divided by sigma hat, right? 08:00 But now when z is the standard normal, 08:02 this is really just the cumulative distribution 08:04 function of a standard Gaussian but evaluated 08:07 at a point which is not t, but t minus mu 08:09 hat divided by sigma hat. 08:11 All right? 08:11 So in particular, what I know-- so from this what I get-- well, 08:15 maybe I'll remove that, it's going to be annoying-- 08:17 I know that phi mu hat sigma hat squared-- 08:23 sorry-- phi mu hat sigma hat squared of t 08:27 is simply phi of, say, 0, 1. 08:31 And that's just the notation. 08:32 Usually we don't put those, but here it's more convenient. 08:35 So it's phi 0, 1 of t minus mu hat divided by sigma hat. 08:43 OK? 08:45 That's just something you can quickly check. 08:48 There's this nice way of writing the cumulative distribution 08:51 function for any mean and any variance 08:55 in terms of the cumulative distribution function 08:57 with mean 0 and variance 1. 08:59 All right? 08:59 Not too complicated. 09:00 All right. 09:01 So now what I'm going to say is that, OK, I have this sup 09:04 here. 09:05 So what I can write is that this thing here 09:07 is equal to the sup over t in R of 1/n. 09:12 Let me write what Fn is-- 09:14 sum from i equal 1 to n of the indicator 09:17 that xi is less than t minus phi 0, 1 09:23 of t minus mu hat divided by sigma hat. 09:27 09:30 OK? 09:32 I actually want to make a change of variable 09:34 so that this thing I'm going to call mu-- 09:36 u, sorry. 09:37 OK? 09:38 And so I'm going to make my life easier, 09:40 and I'm going to make it appear here. 09:42 And so I'm just going to replace this by indicator 09:46 that xi minus mu hat divided by sigma hat less than t 09:52 minus mu hat divided by sigma hat, which is 09:56 sort of useless at this point. 09:57 I'm just making my formula more complicated. 10:00 But now I see something here that shows up, 10:02 and I will call it u, and this is another u. 10:06 OK? 10:08 So now what it means is that when I sup over t, as t ranges 10:12 from negative infinity to plus infinity, 10:15 the new variable u also ranges from negative infinity to plus infinity, 10:17 right? 10:20 So this sup, I can actually write-- 10:22 this sup in t I can write as the sup in u, 10:34 of the indicator that xi minus mu hat divided by sigma hat 10:38 is less than u, minus phi 0, 1 of u. 10:47 Now, let's pause for one second. 10:49 Let's see where we're going.
10:51 What we're trying to show is that this thing does not 10:53 depend on the unknown parameters, say, mu and sigma, 10:57 which are the mean and the variance of x under the null. 11:01 To do that, we basically need to make 11:04 appear only quantities that are sort of invariant under these values. 11:09 So I tried to make this thing invariant under anything, 11:11 and it's just really something that depends on nothing. 11:14 It's the CDF. 11:15 It doesn't depend on sigma hat and mu hat anymore. 11:18 But sigma hat and mu hat will depend on mu and sigma, right? 11:22 I mean, they're actually good estimators of those guys, 11:24 so they should be pretty close to them. 11:26 And so I need to make sure that I'm not actually 11:28 doing anything wrong here. 11:30 So the key thing here is going to be to observe that 1/n sum 11:35 from i equal 1 to n of indicator of xi minus mu hat divided 11:40 by sigma hat less than u, which is the first term that I have 11:43 in this absolute value, well, this is what-- well, 11:48 this is equal to 1/n sum from i equal 1 to n of indicator 11:54 that-- 11:55 well, now under the null, which is 12:00 that x follows N mu sigma squared, for some mu and sigma 12:06 squared that are unknown. 12:07 But they are here. 12:08 They exist. 12:08 I just don't know what they are. 12:10 Then xi minus mu hat divided by sigma hat can be written as sigma zi plus mu 12:17 minus mu hat, divided by sigma hat, where 12:23 z is equal to x minus mu divided by sigma, right? 12:29 That's just the same trick that I wrote here. 12:32 OK? 12:33 Everybody agree? 12:34 So I just standardize-- 12:36 sorry, z-- yeah, so zi is xi minus mu divided 12:42 by sigma. 12:42 All right? 12:43 Just a standardization. 12:45 So now once I write this, I can actually 12:47 divide everybody by sigma. 12:49 12:55 Right? 12:55 So I just divided on top here and in the bottom here. 12:59 So now what I need to check is that the distribution 13:02 of this guy does not depend on mu or sigma. 13:08 That's what I claim. 13:10 What is the distribution of this indicator? 13:12 13:16 It's a Bernoulli, right? 13:19 And so if I want to understand its distribution, 13:21 all I need to do is to compute its expectation, 13:23 which is just the probability that this thing happens. 13:26 But the probability that this thing happens 13:27 is actually not depending on mu and sigma. 13:29 And the reason is that mu is what? 13:33 Well, it's x bar-- sorry, yeah, so mu hat-- sorry, is xn bar. 13:44 So mu hat, which under the null 13:50 follows N mu sigma squared over n, right? 13:54 That's the property of the average. 13:57 So when I do mu hat minus mu divided by sigma, 14:00 this thing is what distribution? 14:04 It's still a normal. 14:05 It's a linear transformation of a normal. 14:07 What are the parameters? 14:11 AUDIENCE: 0, 1/n. 14:11 PHILIPPE RIGOLLET: Yeah, 0, 1/n. 14:13 14:16 But this does not depend on mu or sigma, right? 14:26 14:29 Now, I need to check that this guy does not 14:31 depend on mu or sigma. 14:34 What is the distribution of sigma hat over sigma? 14:37 14:40 AUDIENCE: It's a chi-square, right? 14:41 PHILIPPE RIGOLLET: Yeah, it is a chi-square. 14:43 So this is actually-- 14:45 sorry, n times sigma hat squared divided by sigma squared 14:48 is a chi-square with n minus 1 degrees of freedom. 14:54 Does not depend on mu or sigma. 14:55 15:00 AUDIENCE: [INAUDIBLE] 15:02 AUDIENCE: [INAUDIBLE] 15:03 AUDIENCE: Or sigma hat squared over sigma squared? 15:05 PHILIPPE RIGOLLET: Yeah, thank you. 15:07 So this is actually divided by it.
15:10 So maybe this guy. 15:11 Let's write it like that. 15:12 This is the proper way of writing it. 15:14 Thank you. 15:14 15:20 Right? 15:21 So now I have those two things. 15:22 Neither of them depends on mu or sigma. 15:25 I have these two things. 15:28 There's just one more thing to check. 15:29 15:32 What is it? 15:32 15:35 AUDIENCE: That they're independent? 15:36 PHILIPPE RIGOLLET: That they're independent, right? 15:37 Because the dependence in mu and sigma 15:39 could be hidden in the covariance. 15:41 It could be the case that the marginal distribution of mu-- 15:44 of mu hat does not depend on mu or sigma, and that the marginal distribution 15:47 of sigma-- 15:48 of sigma hat does not depend on mu and sigma either, 15:49 so that neither marginal distribution 15:51 depends on mu or sigma, but their correlation 15:54 could depend on mu and sigma. 15:56 But we also have that if I look at-- 15:59 so if I look at-- 15:59 16:02 so since mu hat is independent of sigma hat, 16:10 it means that the joint distribution of mu hat minus mu divided 16:33 by sigma and sigma hat divided by sigma 16:38 does not depend on blah, blah, blah, on mu and sigma. 16:46 OK? 16:47 16:50 Agree? 16:52 It's not in the individual ones, and it's not 16:54 in the way they interact with each other. 16:57 It's nowhere. 16:59 AUDIENCE: [INAUDIBLE] independence be [INAUDIBLE] 17:01 theorem? 17:02 PHILIPPE RIGOLLET: Yeah, Cochran's theorem, right. 17:03 So that's something we've been using over and over again. 17:06 That's all under the null. 17:07 If my data is not Gaussian, nothing actually holds. 17:12 I just use the fact that under the null 17:14 I'm Gaussian for some mean mu and variance sigma squared. 17:17 But that's all I care about. 17:18 When I'm designing a test, I only 17:21 care about the distribution under the null, at least 17:24 to control the type I error. 17:26 Then to control the type II error, 17:28 then I cross my fingers pretty hard. 17:31 OK? 17:32 17:34 So now this basically implies what's written on the board, 17:41 that this distribution, this test statistic, 17:45 does not depend on any unknown parameters. 17:48 It's just something that's pivotal. 17:50 In particular, I could go at the back of a book 17:53 and check if there's a table for the quantiles of these things, 17:56 and indeed there are. 17:58 This is the table that you see. 18:00 So actually, this is not even in a book. 18:02 This is in Lilliefors' original paper, 1967, 18:09 as you can tell from the typewriting. 18:13 And he actually probably was rolling some dice 18:17 from his office back in the day and was checking 18:19 that this was-- he simulated it, and this is 18:22 how he computed those numbers. 18:24 And here you also have some limiting distribution, 18:28 which is not the sup of a Brownian motion over 0, 18:31 1 of-- sorry, of a Brownian bridge over 0, 18:35 1, which is the one that you would 18:36 see for the Kolmogorov-Smirnov test, 18:38 but it's something that's slightly different. 18:41 And as I said, these numbers are actually typically much smaller 18:45 than the numbers you would get, right? 18:47 Remember, we got something that was about 0.5, I think, 18:50 or maybe 0.41, for the Kolmogorov-Smirnov test 18:54 at the same entry, which means 18:56 that using the Kolmogorov-Lilliefors test 18:58 it's going to be harder for you not 18:59 to reject for the same data. 19:02 It might be the case that in one case you reject, 19:04 and in the other one you fail to reject.
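In the spirit of Lilliefors' simulation, a minimal R sketch (not the original 1967 computation) that reproduces the effect just described: the null quantiles of the plug-in statistic come out markedly smaller than the classical Kolmogorov-Smirnov ones.

```r
# A sketch in the spirit of Lilliefors' tables: simulate, under the null, both
# the plug-in statistic (Kolmogorov-Lilliefors) and the fixed-parameter one
# (Kolmogorov-Smirnov), and compare their 95% quantiles.
set.seed(42)
ks_dist <- function(x, cdf_at_sorted_x) {        # sup_t |F_n(t) - F(t)|
  n <- length(x)
  max(abs((1:n) / n - cdf_at_sorted_x), abs((0:(n - 1)) / n - cdf_at_sorted_x))
}
n <- 50
B <- 2000
stat_plugin <- stat_fixed <- numeric(B)
for (b in 1:B) {
  x      <- rnorm(n)   # N(0, 1) sample; by pivotality any mu and sigma give the same laws
  xs     <- sort(x)
  muhat  <- mean(x)
  sighat <- sqrt(mean((x - muhat)^2))
  stat_plugin[b] <- ks_dist(x, pnorm((xs - muhat) / sighat))  # estimated mu and sigma
  stat_fixed[b]  <- ks_dist(x, pnorm(xs))                     # true mu and sigma (0 and 1 here)
}
quantile(stat_plugin, 0.95)  # markedly smaller than the next one, as in Lilliefors' table
quantile(stat_fixed, 0.95)
```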
19:06 But the ordering is always that if you 19:09 fail to reject with Kolmogorov-Lilliefors, 19:12 you will fail to reject with Kolmogorov-Smirnov, right? 19:17 One direction always holds. 19:18 So that's why people tend to close their eyes 19:20 and prefer Kolmogorov-Smirnov because it just 19:23 makes their life easier. 19:25 OK? 19:27 So this is called Kolmogorov-Lilliefors. 19:29 I think there's actually an E here-- 19:33 sorry, an I before the E. Doesn't matter too much. 19:41 OK? 19:42 Are there any questions? 19:43 Yes? 19:43 AUDIENCE: Is there like a place you 19:45 can point to like [INAUDIBLE] 19:59 PHILIPPE RIGOLLET: Yeah. 20:00 AUDIENCE: [INAUDIBLE]. 20:01 PHILIPPE RIGOLLET: So the fact that it's actually 20:03 a different distribution is that here-- 20:07 so if I actually knew what mu and sigma were, 20:11 I would do exactly the same thing. 20:13 But then, rather than having this average with mu hat and sigma hat, 20:16 I would just have the-- 20:17 I would just 20:19 have the average with mu and sigma. 20:20 OK? 20:21 So what it means is that the key thing 20:23 is that what I would compare is the 1/n sum of some Bernoullis 20:29 with some parameter. 20:30 And the parameter here would be the probability that mu-- 20:34 that xi minus mu over sigma is less than t, 20:37 which is just-- 20:40 sorry, it's a Bernoulli with parameter phi 0, 1 of t. 20:44 Well, let me write what it is, right? 20:49 So it's that average minus phi 0, 1 of t. 20:57 OK? 20:57 So that's for the K-S test, and then I sup over t, right? 21:04 That's what I would have had, because this is actually 21:06 exactly the right thing. 21:08 Here I would remove the true mean. 21:10 I would divide by the true standard deviation. 21:12 So that would actually end up being a standard Gaussian, 21:15 and that's why I'm allowed to use phi 0, 1 here. 21:18 Agreed? 21:19 And these are Bernoullis because they're just indicators. 21:22 What happens in the Kolmogorov-Lilliefors test? 21:26 Well, here the Bernoulli, the only thing 21:28 that's going to change is this guy, right? 21:30 I still have a Bernoulli. 21:31 It's just that the parameters of the Bernoulli are weird. 21:34 The parameter of the Bernoulli looks like it's-- 21:37 it becomes the probability that some N(0, 1) plus some N(0, 21:47 1/n), right, divided by the square root of some chi-squared with n 22:02 minus 1 degrees of freedom divided by n, is less than t. 22:07 And those things are independent, 22:09 but those guys are not necessarily independent, right? 22:12 And so why is this probability changing? 22:14 Well, because this denominator is actually fluctuating a lot. 22:17 So that actually makes this probability different. 22:20 And so that's basically where it comes from, right? 22:23 So you could probably convince yourself 22:26 very quickly that this only makes those guys closer. 22:32 And why does it make those guys closer? 22:38 22:40 No, sorry. 22:41 It makes those guys farther, right? 22:43 And it makes those guys farther for a very clear reason, 22:46 which is that the expectation of this Bernoulli is exactly that guy. 22:51 Here I think it's going to be true 22:52 as well that the expectation of this Bernoulli 22:54 is going to be that guy, but the fluctuations 22:56 are going to be much bigger than just those of the Bernoulli. 22:58 Because the first thing I do is I 22:59 have a random parameter for my Bernoulli, 23:01 and then I flip the Bernoulli. 23:02 So fluctuations are going to be bigger than a Bernoulli.
23:04 And so when I take the sup, I'm going 23:06 to have to [INAUDIBLE] them. 23:07 So it makes things farther apart, 23:09 which makes it more likely for you to reject. 23:11 Yeah? 23:12 AUDIENCE: You also said that if you compare the same-- if you 23:16 compare the table and you set at the same level, 23:19 the Lilliefors is like 0.2, and for the Smirnov is at 0.4. 23:24 PHILIPPE RIGOLLET: Yeah. 23:25 AUDIENCE: OK. 23:26 So it means that Lilliefors is harder not to reject? 23:30 PHILIPPE RIGOLLET: It means that Lilliefors is harder 23:32 not to reject, yes, because we reject when 23:35 we're larger than the number. 23:36 So the number being smaller with the same data, we might be, 23:39 right? 23:40 So basically, it looks like this. 23:43 What we run-- so here we have the distribution for the-- 23:55 so let's say this is the density for K-S. 24:05 And then we have the density for Kolmogorov-Lilliefors, K-L. OK? 24:11 And what the density of K-L looks like, 24:13 it looks like this, right? 24:22 And so if I want to squeeze in alpha here, 24:27 I'm going to have to squeeze in-- and I squeeze in alpha 24:30 here, then this is the quantile of order 1 minus alp-- 24:34 well, let's say alpha of the K-L. 24:38 And this is the quantile alpha of K-S. 24:41 So now you give me data, and what I do with it, 24:44 I check whether they're larger than this number. 24:46 So if I apply K-S, I check whether I'm larger or smaller 24:48 than this thing. 24:49 But if I apply Kolmogorov-Lilliefors, 24:51 I check whether I'm larger or smaller than this thing. 24:53 So over this entire range of values for my test statistic-- 24:56 because it is the same test statistic, 24:58 I just plugged in mu hat and sigma hat-- 25:00 for this entire range, the two tests have different outcomes. 25:04 And this is a big range in practice, right? 25:06 I mean, it's between-- 25:08 I mean, it's pretty much at scale here. 25:10 25:13 OK? 25:14 25:18 Any other-- yeah? 25:18 AUDIENCE: [INAUDIBLE] when n goes to infinity, the two tests 25:21 become the same now, right? 25:24 PHILIPPE RIGOLLET: Hmmm. 25:25 AUDIENCE: Looking at that formula-- 25:27 PHILIPPE RIGOLLET: Yeah, They should become the same 25:29 very far. 25:30 25:32 Let me see, though, because-- 25:34 right. 25:35 So here we have 8-- 25:38 so here we have, say, for 0.5, we get 0.886. 25:44 And for-- oh, I don't have it. 25:45 25:49 Yeah, actually, sorry. 25:50 So you're right. 25:51 You're totally right. 25:52 This is the Brownian bridge values. 25:56 Because in the limit by, say, Slutsky-- 26:00 sorry, I'm lost. 26:02 Yeah, these are the values that you 26:03 get for the Brownian bridge. 26:04 Because in the limit by Slutsky, this thing 26:07 is going to have no fluctuation, and this thing 26:09 is going to have no fluctuation. 26:11 So they're just going to be pinned down, 26:12 and it's going to look like as if I did not replace anything. 26:15 Because in the limit, I know those guys much faster-- 26:18 the mu hat and sigma hat converge 26:20 much faster to mu and sigma than the distribution itself, right? 26:25 So those are actually going to be negligible. 26:27 You're right. 26:29 Actually even, I didn't have-- 26:31 these are actually the numbers I showed you 26:32 for the bridge, the Brownian bridge, 26:34 last time, because I didn't have it for the Kolmogorov-Smirnov 26:36 one. 26:38 OK? 26:38 26:41 So there's actually-- so those are numerical ways of checking 26:44 things, right? 26:45 I give you data. 26:47 You just crank the Kolmogorov-Smirnov test. 
26:50 Usually you press F5 on MATLAB. 26:52 But let's say you actually compute this entire thing, 26:55 and there's a number that comes out, 26:57 and you decide whether it's large enough or small enough. 27:00 Of course, statistical software is going to make your life even 27:02 simpler by spitting out a p-value, because you can-- 27:05 I mean, if you can compute quantiles, you can also 27:07 compute p-values. 27:09 And so your life is just fairly easy. 27:12 You just have red is bad, green is good, and then you can go. 27:18 The problem is whether those are numbers you want to rely on. 27:21 But let's say you actually reject. 27:23 Let's say you reject. 27:23 Your p-value is actually just like slightly below 5%. 27:29 So you can say, well, maybe I'm just going to change 27:33 my p-value-- 27:34 my threshold to 1%, but you might 27:36 want to see what's happening. 27:38 And for that you need a visual diagnostic. 27:40 Like, how do I check if something departs 27:42 from being normal, for example? 27:44 How do I check if a distribution-- 27:46 why is a distribution not a uniform distribution? 27:49 Why is a distribution not an exponential distribution? 27:51 There's many, many ways, right? 27:53 If I have an exponential distribution 27:54 and half of my values are negative, 27:57 for example, well, there's like pretty obvious reasons 27:59 why it should not be exponential. 28:01 But it could be the case that it's 28:03 just that the tails are a little heavier 28:05 or there's more concentration at some point. 28:08 Maybe it has two modes. 28:10 There's things like this. 28:11 But the real thing is, we don't believe 28:13 that the Gaussian is so important because of what it 28:16 looks like close to 0. 28:19 What we like about the Gaussian is that the tails here 28:22 decay at this rate-- exponential of minus x 28:24 squared over 2 that we described in maybe the first lecture. 28:28 And in particular, if there were like kinks around here, 28:31 it wouldn't matter too much. 28:33 This is not what causes issues for the Gaussian. 28:36 And so what we want is to have a visual diagnostic that tells us 28:41 if the tails of my distribution are 28:44 comparable to the tails of a Gaussian one, for example. 28:48 And those are what's called quantile-quantile plots, 28:51 and in particular-- or QQ plots. 28:54 And the basic QQ plots we're going to be using 28:58 are the ones that are called normal QQ plots that 29:00 are comparing your data to a Gaussian distribution, 29:03 or a normal distribution. 29:05 But in general, you could be comparing your data 29:07 to any distribution you want. 29:09 And the way you do this is by comparing 29:11 the quantiles of your data, the empirical quantiles, 29:14 to the quantiles of the actual distribution 29:16 you're trying to compare yourself to. 29:19 So this, in a way, is a visual way 29:22 of performing these goodness-of-fit tests. 29:25 And what's nice about visual is that there's room for debate. 29:29 You can see something that somebody else cannot see, 29:31 and you can always-- because you want to say that things are 29:33 Gaussian. 29:34 And we'll see some examples where you can actually say it 29:36 if you are good at debate, but it's actually 29:41 going to be clearly not true. 29:44 All right. 29:44 So this is a quick and easy check. 29:46 That's something I do all the time. 29:48 You give me data, I'm just going to run this. 29:49 It's one of the first things I do, so I 29:51 can check if I can start entering the Gaussian 29:54 world without compromising myself too much.
29:57 And the idea is to say, well, if F is close to-- if F-- 30:04 if my data comes from an F, and if I 30:07 know that Fn is close to F, then rather 30:10 than computing some norm, some number that tells me 30:12 how far they are, summarizing how far they are, 30:14 I could actually plot the two functions 30:16 and see if they're far apart. 30:17 So let's think for one second what this kind of a plot 30:21 would look like. 30:23 Well, I would go between 0 and 1. 30:25 That's where everything would happen. 30:26 Let's say my distribution is the Gaussian distribution. 30:29 So this is the CDF of N(0, 1). 30:35 And now I have this guy that shows up, and remember 30:37 we had this piecewise constant. 30:39 30:44 Well, OK, let's say we get something like this. 30:46 We get a piecewise constant distribution for Fn, right? 30:51 30:54 Just from this, and even despite my bad skills at drawing, 31:00 it's clear that it's going to be hard 31:01 for you to distinguish those two things, 31:03 even for a fairly large amount of points. 31:05 Because the problem is going to happen here, 31:08 and those guys look pretty much the same everywhere 31:11 you are here. 31:11 You're going to see differences maybe in the middle, 31:14 but we don't care too much about those differences. 31:17 And so what's going to happen is that you're 31:19 going to want to compare those two things. 31:20 And this is basically you have the information you want, 31:23 but visually it just doesn't render very well because you're 31:26 not scaling things properly. 31:28 And the way we actually do it is by flipping things around. 31:32 And rather than comparing the plot of F to the plot of Fn, 31:36 we compare the plot of Fn inverse 31:38 to the plot of F inverse. 31:41 Now, if F goes from the real line to the interval 0, 1, 31:47 F inverse goes from 0, 1 to the whole real line. 31:52 So what's going to happen is that I'm 31:53 going to compare things on some intervals, which is the-- 31:57 which are the entire real line. 31:59 And then what values should I be looking at those things at? 32:02 Well, technically for F, if F is continuous I 32:05 could look at F inverse for any value that I please, right? 32:09 So I have F. And if I want to look at F inverse, 32:14 I pick a point here and I look at the value that it gives me, 32:17 and that's F inverse of, say, u, right, if this is u. 32:23 And I could pick any value I want, 32:24 I'm going to be able to find it. 32:25 The problem is that when I start to have 32:27 this piecewise constant thing, I need 32:30 to decide what value I assign for anything that's 32:33 in between two jumps, right? 32:35 And so I can choose whatever I want, 32:38 but in practice it's just going to be things 32:40 that I myself decide. 32:42 Maybe I can decide that this is the value. 32:44 Maybe I can decide that the value is here. 32:46 But for all these guys, I'm going to pretty much decide 32:49 always the same value, right? 32:51 If I'm in between-- 32:52 for this value u, for this jump the jump is here. 32:56 So for this value, I'm going to be 32:59 able to decide whether I want to go above or below, 33:02 but it's always this value that's going to come out. 33:05 So rather than picking values that are in between, 33:07 I might as well just pick only values 33:08 for which this is the value that it's going to get. 33:11 And those values are exactly 1/n, 2/n, 3/n, 4/n. 33:15 It's all the way to n/n, right? 33:17 That's exactly where the flat parts are. 33:19 We know we jump from 1/n every time. 
33:23 And so that's exactly the recipe. 33:25 It says look at those values, 1/n, 2/n, 3/n 33:29 until, say, n minus 1 over n. 33:32 And for those values, compute the inverse 33:35 of both the empirical CDF and the true CDF. 33:40 Now, for the empirical CDF, it's actually easy. 33:43 I just told you this is basically where the points-- 33:45 where the jumps occur. 33:47 And the jumps occur where? 33:49 Well, exactly at my observations. 33:53 Now, remember I need to sort those observations to talk 33:56 about them. 33:57 So the one that occurs for the i-th jump 34:00 is the i-th smallest observation, which we denoted by X sub (i). 34:07 Remember? 34:07 We had this formula that we said, well, we have x1, xn. 34:11 These are my data. 34:13 And what I'm going to sort them into 34:14 is x sub (1), which is less than or equal to x 34:18 sub (2), and so on, up to x sub (n). 34:23 OK? 34:24 So we just ordered them from smallest to largest. 34:26 And now that we've done that, we just 34:28 put this parenthesis notation. 34:30 So in particular, Fn inverse of i/n 34:34 is the location where the i-th jump occurs, 34:38 which is the i-th smallest observation. 34:40 OK? 34:42 So for this guy, these values, the y-values 34:47 are actually fairly easy. 34:49 I know it's basically my ordered observations. 34:53 The x-values are-- well, that depends on the function 34:58 F I'm trying to test. 34:59 If it's the Gaussian, it's just the corresponding quantile 35:01 of the standard Gaussian, right? 35:05 It's this phi inverse of i/n here that I need to compute. 35:08 It's the inverse of the cumulative distribution 35:11 function, which, given the formula for F, 35:13 you can actually compute or maybe estimate fairly well. 35:16 But it's something that you can find in tables. 35:18 Those are basically quantiles. 35:20 Inverses of CDFs are quantiles, right? 35:23 And so that's basically the things we're interested in. 35:28 That's why it's called quantile-quantile. 35:30 Those are sometimes referred to as theoretical quantiles, 35:34 the ones we're trying to test, and empirical quantiles, 35:37 the ones that correspond to the empirical CDF. 35:39 And so I'm plotting a plot where the x-axis is quantile. 35:44 The y-axis is quantile. 35:45 And so I call this plot a quantile-quantile plot, or QQ 35:49 plot, because, well, just say 10 times quantile-quantile, 35:54 and then you'll see why. 35:55 Yeah? 35:56 AUDIENCE: [INAUDIBLE] have to have the [INAUDIBLE]?? 35:59 PHILIPPE RIGOLLET: Well, that's just-- 36:01 we're back to the-- 36:03 we're back to the goodness-of-fit test, right? 36:06 So if you look-- 36:08 so you don't do it yourself. 36:10 That's the simple answer. 36:11 You don't-- I'm just telling you how those plots, as they 36:14 are spit out from a software, are going to look. 36:17 Now, depending on the software, there's 36:19 a different thing that's happening. 36:21 Some software is actually plotting F with the right-- 36:25 let's say you want to do normal, as you asked. 36:27 So some software is just going to use F 36:30 with mu hat and sigma hat plugged in, and that's fine. 36:33 Some software is actually not going to do this. 36:36 It's just going to use a standard Gaussian. 36:39 But then it's going to actually have 36:41 a different reference point. 36:43 So what do we want to see here? 36:45 What should happen if all these points-- 36:48 if all my points actually come from F, 36:51 from a distribution that has CDF F? 36:53 What should happen? 36:54 What should I see?
36:55 36:58 Well, since Fn should be close to F, 37:01 Fn inverse should be close to F inverse, which 37:04 means that this point should be close to that point. 37:07 This point should be close to that point. 37:08 So ideally, if I actually pick the right F, 37:13 I should see a plot that looks like this, something where 37:19 all my points are very close to the line y 37:24 is equal to x, right? 37:26 And I'm going to have some fluctuations, 37:28 but something very close to this. 37:31 Now, that's if F is exactly the right one. 37:34 If F is not exactly the right one, in particular, 37:36 in the case of a Gaussian one, if I actually 37:40 plotted here the quantiles-- 37:43 so if I plotted phi 0, 1 of t, right? 37:52 So let's say those are the ones I actually plot, 37:54 but I really don't know what-- mu hat is not 0 37:57 and sigma hat is not 1. 37:59 And so this is not the one I should be getting. 38:01 Since we actually know that phi of mu hat sigma hat 38:06 squared t is equal to phi 0, 1 of t minus mu hat divided 38:12 by sigma hat, there's just this change 38:16 of axis, which is actually very simple. 38:19 This change of axis is just a simple translation scaling, 38:22 which means that this line here is 38:26 going to be transformed into another line 38:28 with a different slope and a different intercept. 38:31 And so some software will actually decide 38:34 to go with this curve and just show you 38:37 what the reference curve should be, 38:39 rather than actually putting everything back 38:41 onto the 45-degree curve. 38:43 AUDIENCE: So if you get any straight line? 38:45 PHILIPPE RIGOLLET: Any straight line, you're happy. 38:47 I mean, depending on the software. 38:49 Because if the software actually really rescaled this thing 38:53 to have mu hat and sigma hat squared and you find a different line, 38:56 a different straight line, this is 38:58 bad news, which is not going to happen actually. 39:01 It's impossible that happens, because you actually-- well, 39:05 it could. 39:06 If it's crazy, it could. 39:07 It shouldn't be very crazy. 39:09 OK. 39:10 So let's see what R does for us, for example. 39:14 So here in R, R actually does this funny trick where-- 39:20 so here I did not actually plot the lines. 39:22 I should actually add the lines. 39:23 So the command is like qqnorm of my sample, right? 39:27 And that's really simple. 39:28 I just stack all my data into some vector, say, x. 39:33 And I say qqnorm of x, and it just spits this thing out. 39:40 OK? 39:40 Very simple. 39:42 But I could actually add another command, 39:44 which I can't remember. 39:45 I think it's like qqline, and it's just going 39:50 to add the line on top of it. 39:52 But if you see, actually what R does for us, 39:55 it's actually doing the translation and scaling 39:58 on the axes themselves. 40:01 So it actually changes the x and y-axis in such a 40:05 way that when you look at your picture 40:07 and you forget about what the meanings of the axes are, 40:09 the relevant straight line is actually 40:11 still the 45-degree line. 40:13 Because it's actually done the change of units for you. 40:17 So you don't have to even see the line. 40:19 You know that, in your mind, that this is basically-- 40:21 the reference line is still 45 degrees because that's 40:25 the way the axes are made. 40:27 But if I actually put my axes, right-- so here, for example, 40:29 it goes from-- 40:31 let's look at some-- 40:32 well, OK, those are all square.
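A minimal sketch of those R commands, together with the same picture assembled by hand from the recipe above; x is just a simulated sample here, and the grid (i - 1/2)/n is one common convention, slightly different from the plain i/n grid, to avoid evaluating the inverse CDF at 1.

```r
# A sketch of the built-in commands, plus the same plot assembled by hand.
set.seed(2)
x <- rnorm(50, mean = 1, sd = 2)   # a made-up sample; pretend it is the data

qqnorm(x)   # empirical quantiles of x against theoretical N(0, 1) quantiles
qqline(x)   # reference line (not the 45-degree line here, since mu and sigma are not 0 and 1)

# By hand: sorted observations against Phi^{-1} on a regular grid.
# (i - 1/2)/n instead of i/n is one common way to avoid Phi^{-1}(1) = +Inf.
n  <- length(x)
th <- qnorm(((1:n) - 0.5) / n)     # theoretical quantiles
plot(th, sort(x), xlab = "Theoretical quantiles", ylab = "Empirical quantiles")
abline(lm(sort(x) ~ th))           # a rough reference line through the points
```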
40:36 Yeah, and that's probably because they actually have-- 40:38 the samples are actually from a standard normal. 40:41 So I did not make my life very easy 40:43 to illustrate your question, but of course, I 40:45 didn't know you were going to ask it. 40:46 Next time, let's just prepare. 40:49 Let's script more. 40:50 We'll see another one in the next plot. 40:52 But so here what you expect to see 40:54 is that all the plots should be on the 45-degree line, right? 40:58 This should be the right one. 40:59 And if you see, when I start having 10,000 samples, 41:02 this is exactly what's happening. 41:04 So this is as good as it gets. 41:05 This is an N(0, 1) plotted against the theoretical 41:08 quantile of an N(0, 1). 41:10 As good as it gets. 41:12 And if you see, for the second one, which is 50, 41:15 sample size of size-- 41:16 sample of size 50, there is some fudge factor, right? 41:19 I mean, those things-- 41:20 doesn't look like there's a straight line, right? 41:22 It sort of appears that there are some weird things happening 41:24 here at the lower tail. 41:27 And the reason why this is happening 41:29 is because we're trying to compare the tails, right? 41:32 When I look at this picture, the only thing that goes wrong 41:34 somehow is always at the tip, because those 41:37 are sort of rare and extreme values, 41:39 and they're sort of all over the place. 41:41 And so things are never really super smooth and super clean. 41:44 So this is what your best shot is. 41:46 This is what you will ever hope to get. 41:49 So size 10, right, so you have 10 points. 41:52 Remember, we actually-- well, I didn't really 41:54 tell you how to deal with the extreme cases. 41:56 Because the problem is that F inverse of 1 for the true F 41:59 is plus infinity. 42:01 So you have to make some sort of weird boundary choices 42:04 to decide what F inverse of 1 is, and it's something 42:07 that's like somewhere. 42:09 But you still want to put like 10 dots, right? 42:11 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 dots. 42:15 So I have 10 observations, you will see 10 dots. 42:17 I have 50 observations, you will see 50 dots, right, 42:21 because I have-- 42:22 there are 1/n, 2/n, 3/n all the way to n/n. 42:26 I didn't tell you the last one. 42:29 OK. 42:29 So this is when things go well, and this is 42:31 when things should not go well. 42:32 OK? 42:33 So here, actually, the distribution 42:35 is a Student's t with 15 degrees of freedom, 42:37 which should depart somewhat from a Gaussian distribution. 42:41 The tails should be heavier. 42:44 And what you can see is basically the following, 42:47 is that for 10 you actually see something that's crazy, right, 42:51 if I do 10 observations. 42:52 But if I do 50 observations, honestly, it's 42:55 kind of hard to say that it's different 42:56 from the standard normal. 42:58 So you could still be happy with this for 100. 43:01 And then this is what's happening for 10,000. 43:03 And even here it's not the beautiful straight line, 43:06 but it feels like you would be still tempted 43:08 to conclude that it's a beautiful straight line. 43:11 So let's try to guess. 43:13 So basically, there's-- for each of those sides there's two 43:18 phenomena. 43:18 Either it goes like this or it goes like this, 43:22 and then it goes like this or it goes like this. 43:24 Each side corresponds to the left tail, all the smallest 43:28 values. 43:29 So that's the left side. 43:30 And that's the right side-- corresponds 43:31 to the large values. 43:33 OK? 
43:33 And so basically you can actually 43:35 think of some sort of a table that tells you 43:40 what your QQ plot looks like. 43:41 43:47 And so let's say it looks-- 43:48 so we have our reference 45-degree line. 43:50 So let's say this is the QQ plot. 43:52 That could be one thing. 43:54 This could be the QQ plot where I have another thing. 43:59 Then I can do this guy, and then I do this guy. 44:08 So this is like this. 44:10 OK? 44:11 So those are the four cases. 44:13 OK? 44:14 And here what's changing is the right tail, 44:19 and here what's changing is the-- 44:20 and when I go from here to here, what changes is the left tail. 44:24 Is that true? 44:26 No, sorry. 44:27 What changes here is the right tail, right? 44:29 It's this part that changes from top to bottom. 44:34 So here it's something about right tail, 44:38 and here that's something about left tail. 44:40 44:44 Everybody understands what I mean when I talk about tails? 44:46 OK. 44:48 And so here it's just going to be 44:50 a question of whether the tails are heavier 44:52 or lighter than the Gaussian. 44:54 Everybody understand what I mean when I say 44:56 heavy tails and light tails? 44:58 OK. 44:59 So right, so heavy tails just means 45:01 that basically here the tails of this guy 45:04 are heavier than the tails of this guy. 45:06 So it means that if I draw them, they're going to be above. 45:08 Actually, I'm going to keep this picture because it's 45:10 going to be very useful for me. 45:11 45:16 When I plug the quantiles at the same-- so let's 45:19 look at the right tail, for example. 45:21 Right here my picture is for right tails. 45:23 When I look at the quantiles of my theoretical distribution-- 45:26 so here you can see the bottom curve 45:28 we have the theoretical quantiles, 45:31 and those are the empirical quantiles. 45:34 If I look to the right here, are the theoretical quantiles 45:39 larger or smaller than the empirical quantiles? 45:41 45:47 Let me phrase it the other-- 45:48 are the empirical quantiles larger or smaller 45:50 than the theoretical quantiles? 45:53 AUDIENCE: This is a graph of quantiles, right? 45:56 So if it's [INAUDIBLE] it should be smaller. 45:59 PHILIPPE RIGOLLET: It should be smaller, right? 46:01 On this line, they are equal. 46:04 So if I see the empirical quantile showing up here, 46:07 it means that here the empirical quantile is less 46:10 than the theoretical quantile. 46:12 Agree? 46:13 So that means that if I look at this thing-- 46:16 and that's for the same values, right? 46:18 So the quantiles are computed for the same values i/n. 46:22 So it means that the empirical quantiles should be looking-- 46:25 so that should be the empirical quantile, 46:29 and that should be the theoretical quantile. 46:32 Agreed? 46:34 Those are the smaller values for the same alpha. 46:37 So that implies that the tails-- 46:41 the right tail, is it heavy or lighter-- 46:43 heavier or lighter than the Gaussian? 46:45 46:50 AUDIENCE: Lighter. 46:51 PHILIPPE RIGOLLET: Lighter, right? 46:52 Because those are the tails of the Gaussian. 46:54 Those are my theoretical quantiles. 46:55 That means that this is the tail of my empirical distribution. 46:59 So they are actually lighter. 47:00 47:08 OK? 47:09 So here, if I look at this thing, 47:11 this means that the right tail is actually light. 47:18 And by light, I mean lighter than Gaussian. 47:20 Heavy, I mean heavier than Gaussian. 47:22 OK? 47:23 OK, now we can probably do the entire thing. 
47:27 Well, if this is light, this is going to be heavy, right? 47:31 That's when I'm above the curve. 47:33 47:36 Exercise-- is this light or is this heavy, the first column? 47:40 47:46 And it's OK. 47:47 It should take you at least 30 seconds. 47:51 AUDIENCE: [INAUDIBLE] different column? 47:53 PHILIPPE RIGOLLET: Yeah, this column, right? 47:54 So this is something that pertains-- 47:56 this entire column is going to tell me whether the fact 47:59 that this guy is above, does this 48:01 mean that I have lighter or heavier left tails? 48:06 AUDIENCE: Well, on the left, it's heavier. 48:09 PHILIPPE RIGOLLET: On the left, it's heavier. 48:11 OK. 48:12 I don't know. 48:12 Actually, I need to draw a picture. 48:14 You guys are probably faster than I am. 48:17 AUDIENCE: [INTERPOSING VOICES]. 48:19 PHILIPPE RIGOLLET: Actually, let me 48:21 check how much randomness is-- 48:23 who says it's lighter? 48:26 Who says it's heavier? 48:27 AUDIENCE: Yeah, but we're biased. 48:29 AUDIENCE: [INAUDIBLE] 48:30 PHILIPPE RIGOLLET: Yeah, OK. 48:32 AUDIENCE: [INAUDIBLE] 48:33 PHILIPPE RIGOLLET: All right. 48:34 So let's see if it's heavier. 48:36 So we're on the left tail, and so we have one looks like this, 48:40 one looks like that, right? 48:41 48:45 So we know here that I'm looking at this part here. 48:49 So it means that here my empirical quantile is larger 48:52 than the theoretical quantile. 48:53 48:58 OK? 49:00 So are my tails heavier or lighter? 49:02 49:06 They're lighter. 49:07 That was a bad bias. 49:08 AUDIENCE: [INAUDIBLE] 49:10 PHILIPPE RIGOLLET: Right? 49:11 It's below, so it's lighter. 49:14 Because the problem is that larger for the negative ones 49:19 means that it's smaller [INAUDIBLE],, right? 49:22 Yeah? 49:23 AUDIENCE: Sorry but, what exactly are these [INAUDIBLE]?? 49:26 If this is the inverse-- 49:28 if this is the inverse CDF, shouldn't everything-- 49:32 well, if this is the inverse CDF, 49:34 then you should only be inputting 49:36 values between 0 and 1 in it. 49:38 And-- 49:40 PHILIPPE RIGOLLET: Oh, did I put the inverse CDF? 49:42 AUDIENCE: Like on the previous slide, I think. 49:46 PHILIPPE RIGOLLET: No, the inverse 49:48 CDF, yeah, so I'm inputting-- 49:49 AUDIENCE: Oh, you're [INAUDIBLE].. 49:51 PHILIPPE RIGOLLET: Yeah, so it's a scatter plot, right? 49:53 So each point is attached-- each point 49:56 is attached 1/n, 2/n, 3/n. 49:59 Now, for each point I'm plotting, 50:01 that's my x-value, which maps a number between 0 and 1 50:05 back onto the entire real line, and my y-value is the same. 50:09 OK? 50:10 So what it means is that those two numbers, this is in the-- 50:14 this lives on the entire real line, not on the interval. 50:17 This lives on the entire real line, not in the interval. 50:20 And so my QQ plots take values on the entire real line, 50:26 entire real line, right? 50:28 So you think of it as a parameterized curve, where 50:31 the time steps are 1/n, 2/n, 3/n, 50:34 and I'm just like putting a dot every time I'm making one step. 50:38 OK? 50:41 OK, so what did we say? 50:43 That was lighter, right? 50:46 AUDIENCE: [INAUDIBLE] 50:51 PHILIPPE RIGOLLET: OK? 50:54 One of my favorite exercises is, here's a bunch of densities. 50:58 Here's a bunch of QQ plots. 51:00 Map the correct QQ plot to its own density. 51:04 All right? 51:05 And there won't be mingled lines that allow you to do that, 51:09 then you just have to follow, like at the back of cereal 51:11 boxes. 51:13 All right. 51:15 Are there any questions? 
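For practicing that kind of exercise, a minimal R sketch that draws samples from a few of the distributions discussed in this lecture and puts their normal QQ plots side by side.

```r
# A sketch for building practice pictures: normal QQ plots for a few
# of the distributions discussed here, to read off heavy versus light tails.
set.seed(3)
n <- 1000
samples <- list(
  "N(0,1)"        = rnorm(n),        # reference: essentially the straight line
  "Student t(15)" = rt(n, df = 15),  # slightly heavier tails on both sides
  "Cauchy"        = rcauchy(n),      # very heavy tails on both sides
  "Uniform"       = runif(n),        # light on both sides: the S shape
  "Exponential"   = rexp(n)          # no left tail, heavy right tail
)
op <- par(mfrow = c(2, 3))           # 2-by-3 grid of panels
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm)
  qqline(samples[[nm]])
}
par(op)
```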
51:17 So one thing-- there's two things 51:18 I'm trying to communicate here: 51:19 if you see a QQ plot, now you should understand, 51:22 one, how it was built, and two, whether it means that you have 51:28 heavier tails or lighter tails. 51:30 Now, let's look at this guy. 51:32 What should we see? 51:34 We should see heavy on the left and heavy on the right, right? 51:37 We know that this should be the case. 51:39 So this thing actually looks like this, and it sort of does, 51:45 right? 51:46 If I take this line going through here, 51:48 I can see that this guy's tipping here, 51:50 and this guy's dipping here. 51:52 But honestly-- actually, I can't remember exactly, but t 15, 51:57 if I plotted the density on top of the Gaussian, 52:01 you can see a difference. 52:02 But if I just gave it to you, it would be very hard 52:04 for you to tell me if there's an actual difference between t 52:07 15 and Gaussian, right? 52:08 Those things are actually very close. 52:11 And so in particular, here we're really 52:12 trying to recognize what the shape is, in fact-- 52:15 right? 52:16 So t 15 compared to a standard Gaussian was different, 52:20 but t 15 compared to a Gaussian with a slightly larger variance 52:26 is not going to actually-- you're not going 52:27 to see much of a difference. 52:29 So in a way, such distributions are actually not 52:33 too far from the Gaussian, and it's not too-- 52:35 it's still pretty benign to conclude that this was actually 52:38 a Gaussian distribution because you can just use the variance 52:42 as a little bit of a buffer. 52:43 I'm not going to get really into how 52:45 you would use a t-distribution in a t-test, 52:50 because it's kind of like Inception, right? 52:54 So but you could pretend that your data actually 52:58 is t-distributed and then build a t-test from it, 53:02 but let's not say that. 53:03 Maybe that was a bad example. 53:05 But there's like other heavy-tailed distributions like 53:08 the Cauchy distribution, which doesn't even have a mean-- 53:10 it's not even integrable because that's 53:12 as heavy as the tails get. 53:14 And this you can really tell it's going to look like this. 53:18 It's going to be like pfft. 53:22 What does a uniform distribution look like? 53:24 53:30 Like this? 53:32 It's going to be-- it's going to look like a Gaussian one, 53:37 right? 53:38 So a uniform-- so this is my Gaussian. 53:41 A uniform is basically going to look like this, 53:43 once I take the right mean and the right variance, right? 53:46 So the tails are definitely lighter. 53:48 They're 0. 53:49 That's as light as it gets. 53:51 So the light-light is going to look like this S shape. 53:55 So an S-- light-tailed distribution has this S shape. 53:59 OK? 53:59 What is the exponential going to look like? 54:02 54:06 So the exponential is positively supported. 54:08 It only has positive numbers. 54:10 So there's no left tail. 54:11 This is also as light as it gets. 54:14 But the right tail, is it heavier or lighter 54:16 than the Gaussian? 54:17 AUDIENCE: Heavier. 54:18 PHILIPPE RIGOLLET: It's heavier, right? 54:19 Its tail decays like e to the minus x rather than e to the minus 54:21 x squared. 54:22 So it's heavier. 54:23 So it means that on the left it's going to be light, 54:27 and on the right it's going to be heavy. 54:29 So it's going to be U-shaped. 54:31 OK? 54:32 54:35 That will be fine. 54:37 All right. 54:39 Any other question? 54:41 Again, two messages, like, more technical, 54:44 and you can sort of fiddle with it by looking at it.
54:47 You can definitely conclude that this 54:49 is OK enough to be Gaussian for your purposes. 54:53 Yeah? 54:53 AUDIENCE: So [INAUDIBLE] 54:59 PHILIPPE RIGOLLET: I did not hear the "if" 55:01 at the beginning of your sentence. 55:02 55:06 AUDIENCE: I would want to be lighter tail, right, 55:08 because that'll be-- it's easier to reject? 55:10 Is that correct? 55:11 55:16 PHILIPPE RIGOLLET: So what is your purpose as a-- 55:20 AUDIENCE: I want to-- 55:21 I have some [INAUDIBLE] right? 55:25 I want to be able to say I reject H0 [INAUDIBLE].. 55:28 PHILIPPE RIGOLLET: Yes. 55:29 AUDIENCE: So if you wanted to make it easier 55:32 to reject H0, then-- 55:35 PHILIPPE RIGOLLET: Yeah, in a way that's true, right? 55:37 So once you've actually factored in the mean and the variance, 55:40 the only thing that actually-- 55:43 right. 55:43 So if you have Gaussian tails or lighter-- even lighter tails, 55:47 then it's harder for you to explain deviations 55:51 from randomness only, right? 55:52 If you have a uniform distribution 55:54 and you see something which is-- 55:56 if you're uniform on 0, 1 plus some number and you see 25, 55:59 you know this number is not going to be 0, right? 56:01 So that's basically as good as it gets. 56:04 And there's basically some smooth interpolation 56:06 if you have lighter tails. 56:07 Now, if you start having something that has heavy tails, 56:10 then it's more likely that pure noise 56:12 will generate large observations and therefore discovery. 56:15 So yes, lighter tails is definitely 56:19 the better-behaved noise. 56:21 Let's put it this way. 56:22 The lighter it is, the better behaved it is. 56:24 Now, this is good-- 56:27 this is good for some purposes, but when you want to compute 56:30 actual quantiles, like exact quantiles, 56:35 then it is true in general that the quantiles of lighter-tail 56:40 distributions are going to be dominated by the-- are going 56:42 to be dominated by the-- 56:46 let's say on the right tails, are 56:47 going to be dominated by those of a heavy distribution. 56:51 That is true. 56:52 But that's not always the case. 56:54 And in particular, there's going to be 56:54 some like sort of weird points where things are actually 56:57 changing depending on what level you're actually looking 56:59 at those things, maybe 5% or 10%, 57:01 in which case things might be changing a little bit. 57:04 But if you started going really towards the tail, 57:06 if you start looking at levels alpha which are 1% or 0.1%, 57:10 it is true that it's always-- 57:13 if you can actually-- so if you see something 57:14 that looks light tail, you definitely 57:16 do not want to conclude that it's Gaussian. 57:18 You want to actually change your modeling so that it 57:21 makes your life even easier. 57:23 And you actually factor in the fact 57:25 that you can see that the noise is actually more benign 57:27 than you would like it to be. 57:30 OK? 57:31 57:34 Stretching fingers, that's it? 57:35 All right. 57:37 OK. 57:38 So I want to-- 57:40 I mentioned at some point that we had this chi-square test 57:43 that was showing up. 57:45 And I do not know what I did-- 57:47 let's just-- oh, yeah. 57:49 So we have this chi-square test that we worked on last time, 57:53 right? 57:54 So the way I introduced the chi-square test is by saying, 57:57 I am fascinated by this question. 57:59 Let's check if it's correct, OK? 58:01 Or something maybe slightly deeper-- 58:04 let's check if juries in this country 58:06 are representative of racial distribution. 
58:10 But you could actually-- those numbers here 58:14 come from a very specific thing. 58:16 That was the uniform. 58:16 That was our benchmark. 58:17 Here's the uniform. 58:19 And there was this guy, which was a benchmark, which 58:21 was the actual benchmark that we need to have for this problem. 58:24 And those things basically came out of my hat, right? 58:27 Those are numbers that exist. 58:29 But in practice, you actually make those numbers yourself. 58:33 And the way you do it is by saying, well, 58:36 if I have a binomial distribution 58:39 and I want to test if my data comes 58:41 from a binomial distribution, you 58:42 could ask this question, right? 58:44 You have a bunch of data. 58:45 I did not promise to you that this 58:48 was the sum of independent Bernoullis and [INAUDIBLE].. 58:50 And then you can actually check that it's a binomial indeed, 58:53 and you have binomial. 58:55 If you think about where you've encountered binomials, 58:57 it was mostly when you were drawing balls 58:59 from urns, which you probably don't do that much in practice. 59:02 OK? 59:02 And so maybe one day you want to model things as a binomial, 59:05 or maybe you want to model it as a Poisson, 59:07 as a limiting binomial, right? 59:08 People tell you photons arrive-- 59:11 the rate of a photon hitting some surface 59:13 is actually a Poisson distribution, right? 59:15 That's where they arise a lot in imaging. 59:18 So if I have a colleague who's taking pictures 59:21 of the skies over night, and he's like following stars 59:23 and it's just like moving around with the rotation of the Earth. 59:26 And he has to do this for like eight hours 59:28 because he needs to get enough photons over this picture 59:30 to actually arise. 59:32 And he knows they arrive at like a Poisson process, 59:35 and you know, chapter 7 of your probability class, I guess. 59:39 And 59:40 And there's all these distributions 59:43 outside the classroom you probably 59:44 want to check that they're actually correct. 59:46 And so the first one you might want to check, for example, 59:49 is a binomial. 59:49 So I give you a distribution, a binomial distribution 59:52 on, say, K trials, and you have some number p. 59:56 And here, I don't know typically what p should be, 59:59 but let's say I know it or estimate it from my data. 60:01 And here, since we're only going to deal with asymptotics, 60:04 just like it was the case for the Kolmogorov-Smirnov one, 60:07 in the asymptotic we're going to be 60:08 able to think of the estimated p as being a true p, OK, 60:13 under the null at least. 60:15 So therefore, each outcome, I can actually tell you what 60:19 the probability of a binomial-- 60:20 is this outcome. 60:21 For a given K and a given p, I can tell you 60:23 exactly what a binomial should give you 60:25 as the probability for the outcome. 60:27 And that's what I actually use to replace the numbers 1/12, 60:33 1/12, 1/12, 1/12 or the numbers 0.72, 0.7, 0.12, 0.9. 60:41 All these numbers I can actually compute 60:43 using the probabilities of a binomial, right? 60:45 So I know, for example, that the probability that a binomial np 60:52 is equal to, say, K is n choose K p to the K 1 minus p 61:02 to the n minus K. OK? 61:05 I mean, so these are numbers. 61:07 If you give me p and you give me n, 61:08 I can compute those numbers for all K from 0 to n. 61:12 And from this I can actually build a table. 61:14 61:22 All right? 61:22 So for each K-- 61:25 0. 
61:26 So K is here, and from 0, 1, et cetera, 61:31 all the way to n, I can compute the true probability, which 61:35 is the probability that my binomial np is equal to 0, 61:40 the probability that my binomial is equal to 1, et cetera, 61:45 all the way to n. 61:46 I can compute those numbers. 61:47 Those are actually going to be exact numbers, right? 61:50 I just plug in the formula that I had. 61:52 And then I'm going to have some observed. 61:54 62:01 So that's going to be p hat 0, and that's basically 62:05 the proportion of 0's, right? 62:12 So here you have to remember it's not a one-time experiment 62:16 like you do in probability where you say, 62:18 I'm going to draw n balls from an urn, 62:22 and I'm counting how many-- 62:24 how many I have. 62:25 This is statistics. 62:25 I need to be able to do this experiment many times 62:28 so I can actually, in the end, get an idea of what 62:31 the proportion of p's is. 62:33 So you have not just one binomial, 62:36 but you have n binomials. 62:38 Well, maybe I should not use n twice. 62:40 So that's why it's the K here, right? 62:42 So I have a binomial [INAUDIBLE] at Kp 62:44 and I just see n of those guys. 62:46 And with this n of those guys, I can actually 62:48 estimate those probabilities. 62:50 And what I'm going to want to check 62:51 is if those two probabilities are actually 62:53 close to each other. 62:54 But I already know how to do this. 62:57 All right? 62:58 So here I'm going to test whether P 63:00 is in some parametric family, for example, 63:02 binomial or not binomial. 63:06 And testing-- if I know that it's a binomial [INAUDIBLE], 63:09 and I basically just have to test if P is the right thing. 63:12 OK? 63:14 Oh, sorry, I'm actually lying to you here. 63:17 OK. 63:18 I don't want to test if it's binomial. 63:19 I want to test the parameter of the binomial here. 63:24 OK? 63:24 So I know-- no, sorry, [INAUDIBLE] sorry. 63:28 OK. 63:28 So I want to know if I'm in some family, 63:30 the family of binomials, or not in the family of binomials. 63:34 OK? 63:35 Well, that's what I want to do. 63:36 And so here H0 is basically equivalent to testing 63:39 if the pj's are the pj's that come from the binomial. 63:42 And the pj's here are the probabilities that I get. 63:46 This is the probability that I get j successes. 63:50 That's my pj. 63:51 That's j's value here. 63:54 OK? 63:54 So this is the example, and we know how to do this. 63:57 We construct p hat, which is the estimated 64:00 proportion of successes from the observations. 64:03 So here now I have n trials. 64:05 This is the actual maximum likelihood estimator. 64:08 This becomes a multinomial experiment, right? 64:12 So it's kind of confusing. 64:13 We have a multinomial experiment for a binomial distribution. 64:17 The binomial here is just a recipe 64:19 to create some test probabilities. 64:21 That's all it is. 64:22 The binomial here doesn't really matter. 64:24 It's really to create the test probabilities. 64:26 And then I'm going to define this test statistic, which 64:28 is known as the chi-square statistic, right? 64:36 This was the chi-square test. 64:37 We just looked at the sum of the squares of the differences. 64:41 Inverting the covariance matrix or using the Fisher information 64:45 after removing the part that was not invertible 64:46 led us to actually use this particular value here, 64:50 and then we had to multiply by n. 64:54 OK? 64:55 And that, we know, converges to what? 64:59 A chi-square distribution.
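To make the recipe on the board concrete, here is a minimal Python sketch: it builds the "true" row of the table from the binomial pmf formula, builds the "observed" row from simulated data, and forms the chi-square statistic T_n = n * sum_k (p_hat_k - p_k(theta_hat))^2 / p_k(theta_hat). The numbers K = 5, p = 0.3 and the sample size n = 500 are invented purely for illustration.

    import numpy as np
    from math import comb

    rng = np.random.default_rng(0)
    K, p_true, n = 5, 0.3, 500
    x = rng.binomial(K, p_true, size=n)        # n observations, each Bin(K, p) under the null

    p_hat = x.mean() / K                       # maximum likelihood estimator of p
    expected = np.array([comb(K, k) * p_hat**k * (1 - p_hat)**(K - k)
                         for k in range(K + 1)])            # "true" row: p_k(theta_hat)
    observed = np.bincount(x, minlength=K + 1) / n          # "observed" row: proportions p_hat_k

    T_n = n * np.sum((observed - expected) ** 2 / expected)
    print("chi-square statistic T_n =", T_n)

The division by the expected probabilities is the particular scaling that comes out of inverting the covariance, or Fisher information, mentioned just above.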
65:01 So I'm not going to go through this again. 65:03 I'm just telling you you can use the chi-square 65:05 that we've seen, where we just came up with the numbers we 65:08 were testing. 65:09 Those numbers that were in this row for the true probabilities, 65:12 we came up with them out of thin air. 65:14 And now I'm telling you you can actually 65:15 come up with those guys from a binomial distribution 65:19 or a Poisson distribution or whatever 65:20 distribution you're happy with. 65:22 65:26 Any question? 65:26 65:30 So now I'm creating this thing, and I 65:31 can apply the entire theory that I have for the chi-square 65:34 and, in particular, that this thing converges 65:36 to a chi-square. 65:38 But if you see, there's something that's different. 65:40 What is different? 65:42 65:45 The degrees of freedom. 65:47 And if you think about it, again, the meaning of degrees 65:51 of freedom. 65:52 What does this word-- 65:54 these words actually mean? 65:55 It means, well, to which extent can I 65:57 play around with those values? 65:59 What are the possible values that I can get? 66:01 If I'm not equal to this particular value I'm testing, 66:03 how many directions can I be different from this guy? 66:07 And when we had a given set of values, 66:10 we could be any other set of values, right? 66:13 So here, I had this-- 66:16 I'm going to represent-- this is the set of all probability 66:19 distributions of vectors of size K. So here, 66:23 if I look at one point in this set, 66:25 this is something that looks like p1 through pK such that 66:29 their sum-- 66:30 such that they're non-negative, and the sum p1 through pK 66:36 is equal to 1. 66:37 OK? 66:37 So I have all those points here. 66:40 OK? 66:41 So this is basically the set that I had before. 66:44 I was testing whether I was equal to this one guy, 66:47 or if I was anything else. 66:48 And there's many ways I can be anything else. 66:51 What matters, of course, is what's around this guy 66:53 that I could actually confuse myself with. 66:55 But there's many ways I can move around this guy. 66:58 Agreed? 67:00 Now I'm actually just testing something very specific. 67:04 I'm saying, well, now the p's that I 67:06 have have to come from this-- have 67:09 to be constructed from this formula, this parametric family 67:13 P of theta. 67:14 And there's a fixed way for-- let's say this is theta, 67:20 so I have a theta here. 67:23 There's not that many ways this can actually give me 67:26 a set of probabilities, right? 67:28 I have to move to another theta to actually start 67:31 being confused. 67:32 And so here the number of degrees of freedom 67:34 is basically, how can I move along this family? 67:39 And so here, this is all the points, 67:41 but there might be just the subset 67:43 of the points that looks like this, just this curve, 67:45 not the whole of this thing. 67:48 And those guys on this curve are the p thetas, 67:56 and that's for all thetas when theta runs across capital Theta. 68:00 So in a way, this is just a much smaller dimensional thing. 68:03 It's a much smaller object. 68:04 Those are only the ones that I can 68:06 create that are exactly of this very specific parametric form. 68:13 And of course, not all are of this form. 68:15 Not all probability PMFs are of this form. 68:19 And so that is going to have an effect 68:20 on what my PMF is going to be-- 68:24 sorry, on what my-- 68:28 sorry, what my degrees of freedoms are going to be.
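As a concrete version of the picture being drawn here: take a binomial on 3 trials, so a PMF vector with 4 entries. As its single parameter p varies, the family traces out only a one-dimensional curve inside the set of all PMFs on 4 points. The p values below are just an illustrative grid.

    from math import comb

    K = 3   # binomial on K trials: PMF vectors (p_0, ..., p_K) with K + 1 entries
    for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
        pmf = [comb(K, k) * p**k * (1 - p)**(K - k) for k in range(K + 1)]
        print("p =", p, " ->", [round(q, 3) for q in pmf])

Each printed vector is a valid PMF (non-negative entries summing to 1), but moving p only slides you along one curve; most PMFs on 4 points are not of this form.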
68:33 Because when this thing is very small, that means when-- 68:39 that's happening when theta is actually, 68:41 say, a one-dimensional space, then there's still 68:44 many ways I can escape, right? 68:46 I can be different from this guy in pretty 68:48 much every other direction, except for those two 68:50 directions, just when I move from here 68:53 or when I move in this direction. 68:56 But now if this thing becomes bigger, 69:00 your theta is, say, two dimensional, 69:03 then when I'm here it's becoming harder 69:06 for me to not be that guy. 69:07 If I want to move away from it, then I 69:08 have to move away from the board. 69:11 And so that means that the bigger the dimension 69:15 of my theta, the smaller the degrees of freedoms 69:18 that I have, OK, because moving out of this parametric family 69:24 is actually very difficult for me. 69:27 So if you think, for example, as an extreme case, 69:30 the parametric family that I have is basically all PMFs, 69:36 all of them, right? 69:38 So that's a stupid parametric family. 69:39 I'm indexed by the distribution itself, 69:41 but it's still finite dimensional. 69:43 Then here, I have basically no degrees of freedom. 69:46 There's no way I can actually not 69:48 be that guy, because this is everything I have. 69:51 And so you don't have to really understand 69:54 how the computation comes into the numbers of dimension 69:59 and what I mean by dimension of this current space. 70:01 But really, what's important is that as the dimension of theta 70:05 becomes bigger, I have less degrees of freedom 70:09 to be away from this family. 70:11 This family becomes big, and it's very hard for me 70:13 to violate this. 70:14 So it's actually shrinking the number of degrees 70:17 of freedom of my chi-square. 70:18 And that's all you need to understand. 70:20 When d increases, the number of degrees of freedom decreases. 70:23 And I'd like to you to have an idea of why this is somewhat 70:27 true, and this is basically the picture 70:28 you should have in mind. 70:30 70:33 OK. 70:33 So now once I have done this, I can just construct. 70:35 So here I need to check. 70:37 So what is d in the case of the binomial? 70:39 70:42 AUDIENCE: 1. 70:43 PHILIPPE RIGOLLET: 1, right? 70:43 It's just a one-dimensional thing. 70:44 And for most of the examples we're 70:46 going to have it's going to be one dimensional. 70:48 So we have this weird thing. 70:49 We're going to have K minus 2 degrees of freedom. 70:51 70:54 So now I have this thing, and I have this asymptotic. 70:59 And then I can just basically use a test that has-- 71:02 that uses the fact that the asymptotic distribution 71:04 is this. 71:05 So I compute my quantiles out of this. 71:06 Again, I made the same mistake. 71:08 This should be q alpha, and this should be q alpha. 71:11 So that's just the tail probability 71:13 is equal to alpha when I'm on the right of q alpha. 71:16 And so those are the tail probability 71:18 of the appropriate chi-square with the appropriate number 71:20 of degrees of freedom. 71:22 And so I can compute p-values, and I can do whatever I want. 71:24 OK? 71:25 So then I just like [INAUDIBLE] my testing machinery. 71:28 OK? 71:29 So now I know how to test if I'm a binomial distribution or not. 71:34 Again here, testing if I'm a binomial distribution 71:38 is not a simple goodness of fit. 71:40 It's a composite one where I can actually-- 71:43 there's many ways I can be a binomial distribution 71:45 because there's as many as there is theta. 
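Putting numbers on this: once the statistic T_n has been computed, say as in the earlier sketch, the test compares it to the right-tail quantile q_alpha of a chi-square whose degrees of freedom are (number of cells) - 1 - d, with d = 1 for the binomial. The "K minus 2" on the slide counts K possible values; in the sketch below the outcomes 0, ..., K give K + 1 cells, hence (K + 1) - 1 - 1. The value of T_n here is a placeholder, and scipy.stats.chi2 supplies the quantile and the p-value.

    from scipy.stats import chi2

    T_n = 4.2                            # placeholder: the observed value of the statistic
    K, d, alpha = 5, 1, 0.05
    dof = (K + 1) - 1 - d                # K + 1 cells (outcomes 0, ..., K), one estimated parameter

    q_alpha = chi2.ppf(1 - alpha, dof)   # right-tail quantile of chi-square(dof)
    p_value = chi2.sf(T_n, dof)          # P(chi-square(dof) > T_n)
    print("reject H0 at level", alpha, ":", T_n > q_alpha, "  p-value =", round(p_value, 3))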
71:48 And so I'm actually plugging in the theta hat, which is 71:51 estimated from the data, right? 71:54 And here, since everything's happening in the asymptotics, 71:57 I'm not claiming that Tn has a pivotal distribution 72:00 for finite n. 72:01 That's actually not true. 72:02 It's going to depend like crazy on what 72:04 the actual distribution is. 72:06 But asymptotically, I have a chi-square, 72:08 which obviously does not depend on anything [INAUDIBLE].. 72:11 OK? 72:13 Yeah? 72:14 AUDIENCE: So in general, for the binomial [INAUDIBLE] trials. 72:19 But in the general case, the number of-- 72:23 the size of our PMF is the number of [INAUDIBLE].. 72:26 PHILIPPE RIGOLLET: Yeah. 72:27 AUDIENCE: So let's say that I was also 72:29 uncertain about what K was so that I don't 72:32 know how big my [INAUDIBLE] is. 72:37 [INAUDIBLE] 72:48 PHILIPPE RIGOLLET: That is correct. 72:50 And thank you for this beautiful segue into my next slide. 72:54 So we can actually deal with the case 72:56 not only where it's infinite, which 72:57 would be the case of Poisson. 72:58 I mean, nobody believes I'm going 73:00 to get an infinite number of photons 73:02 in a finite amount of time. 73:04 But we just don't want to have to say there's got to be a-- 73:08 this is the largest possible number. 73:09 We don't want to have to do that. 73:10 Because if you start doing this and the probabilities 73:13 become close to 0, things become degenerate and it's an issue. 73:16 So what we do is we bin. 73:18 We just bin stuff. 73:19 OK? 73:20 And so maybe if I have a binomial distribution 73:23 with, say, 200,000 possible values, 73:28 then it's actually maybe not the level of precision 73:32 I want to look at this. 73:33 Maybe I want to bin. 73:33 Maybe I want to say, let's just think 73:35 of all things that are between 0 and 100 73:37 to be the same thing, between 100 and 200 the same thing, 73:40 et cetera. 73:41 And so in fact, I'm actually going to bin. 73:44 I don't even have to think about things that are discrete. 73:46 I can even think about continuous cases. 73:49 And so if I want to test if I have a Gaussian distribution, 73:51 for example, I can just approximate that by some, 73:55 say, piecewise constant function that just says that, 73:59 well, if I have a Gaussian distribution like this, 74:03 I'm going to bin it like this. 74:06 And I'm going to say, well, the probability that I'm 74:08 less than this value is this. 74:10 The probability that I'm between this and this value is this. 74:12 The probability I'm between this and this value 74:14 is this, and then this and then this, right? 74:18 And now I've turned-- 74:19 I've discretized, effectively, my Gaussian into a PMF. 74:24 The value-- this is p1. 74:26 The value here is p1. 74:28 This is p2. 74:30 This is p3. 74:32 This is p4. 74:35 This is p5 and p6, right? 74:39 I have discretized my Gaussian into six possible values. 74:41 That's just the probability that they fall into a certain bin. 74:46 And we can do this-- 74:47 if you don't know what K is, just stop at 10. 74:51 You look at your data quickly and you say, well, you know, 74:54 I have so few of them that are-- like I see maybe one 8, one 11, 75:00 and one 15. 75:01 Well, everything that's between 8 and 20 75:03 I'm just going to put it in one bin. 75:05 Because what else are you going to do? 75:07 I mean, you just don't have enough observations. 75:09 And so what we do is we just bin everything. 75:11 So here I'm going to actually be slightly abstract. 
75:14 Our bins are going to be intervals Aj. 75:16 So here-- they don't even have to be intervals. 75:18 I could go crazy and just like call the bin this guy 75:21 and this guy, right? 75:23 That would make no sense, but I could do that. 75:27 And then I'm-- and of course, you can do whatever you want, 75:30 but there's going to be some consequences in the conclusions 75:33 that you can take, right? 75:34 All you're going to be able to say 75:35 is that my distribution does not look like it 75:38 could be binned in this way. 75:40 That's all you're going to be able to say. 75:42 So if you decide to just put all the negative numbers 75:46 and the positive numbers, then it's 75:48 going to be very hard for you to distinguish 75:50 a Gaussian from a random variable that takes values 75:52 of minus 1 and plus 1 only. 75:54 You need to just be reasonable. 75:57 OK? 75:57 So now I have my pj's become the probability 76:00 that my random variable falls into bin j. 76:02 76:06 So that's pj of theta under the parametric distribution. 76:10 For the true one, whether it's parametric or not, I have a pj. 76:14 And then I have p hat j, which is 76:15 the proportion of observations that falls in this bin. 76:19 All right? 76:19 So I have a bunch of observations. 76:21 I count how many of them fall in this bin. 76:23 I divide by n, and that tells me what my estimated 76:26 probability for this bin is. 76:29 And theta hat, well, it's the same as before. 76:31 If I'm in a parametric family, I'm 76:32 just estimating theta hat, maybe the maximum likelihood 76:35 estimator, plug it in, and estimate 76:37 those pj's of theta hat. 76:39 From this, I form my chi-square, and I have exactly 76:43 the same thing as before. 76:45 So the answer to your question is, yes, you bin. 76:48 And it's the answer to even more questions. 76:51 So that's why there you can actually 76:53 use the chi-square test to test for normality. 76:56 Now here it's going to be slightly weaker, 76:58 because there's only an asymptotic theory, 77:00 whereas Kolmogorov-Smirnov and Kolmogorov-Lilliefors work 77:03 actually even for finite samples. 77:06 For the chi-square test, it's only asymptotic. 77:08 So you just pretend you actually know what the parameters are. 77:11 You just stuff them into a theta, a mu hat, 77:15 and sigma square hat. 77:16 And you just go to-- you just cross your finger 77:19 that n is large enough for everything 77:21 to have converged by the time you make your decision. 77:24 OK? 77:24 And then this is a copy/paste, with the same error actually 77:28 as the previous slide, where you just build your test based 77:31 on whether you exceed or not some quantile, 77:34 and you can also compute some p-value. 77:37 OK? 77:38 AUDIENCE: The error? 77:39 PHILIPPE RIGOLLET: I'm sorry? 77:40 AUDIENCE: What's the error? 77:41 PHILIPPE RIGOLLET: What is the error? 77:43 AUDIENCE: You said [INAUDIBLE] copy/paste [INAUDIBLE].. 77:45 PHILIPPE RIGOLLET: Oh, the error is that this 77:47 should be q alpha, right? 77:48 AUDIENCE: OK. 77:49 PHILIPPE RIGOLLET: I've been calling this q alpha. 77:51 I mean, that's my personal choice, 77:53 because I don't want to-- 77:54 I only use q alpha. 77:55 So I only use quantiles where alpha is to the right, so. 77:59 That's what statisticians-- probabilists 78:01 would use this notation. 78:02 78:07 OK. 78:07 And so some questions, right? 78:10 So of course, in practice you're going 78:11 to have some issues which translate. 78:13 I say, well, how do you pick this guy, this K? 
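Before getting to those practical questions, here is a minimal end-to-end sketch of the binned chi-square test for normality just described. The bin edges, the sample size, and the use of data quantiles to place the interior edges are illustrative choices rather than anything prescribed in the lecture; mu and sigma are simply plugged in as described, ignoring finite-sample subtleties.

    import numpy as np
    from scipy.stats import norm, chi2

    rng = np.random.default_rng(1)
    x = rng.normal(loc=2.0, scale=1.5, size=300)        # data whose normality we want to test

    mu_hat, sigma_hat = x.mean(), x.std()               # plug-in estimates (d = 2 parameters)
    inner = np.quantile(x, [1/6, 2/6, 3/6, 4/6, 5/6])   # 5 interior edges -> 6 bins A_1, ..., A_6
    edges = np.concatenate(([-np.inf], inner, [np.inf]))

    p_theta = np.diff(norm.cdf(edges, loc=mu_hat, scale=sigma_hat))  # p_j(theta_hat): CDF differences
    counts = np.bincount(np.digitize(x, inner), minlength=6)
    p_hat = counts / x.size                                          # observed proportions p_hat_j

    T_n = x.size * np.sum((p_hat - p_theta) ** 2 / p_theta)
    dof = len(p_theta) - 1 - 2                           # bins - 1 - (number of estimated parameters)
    print("T_n =", round(T_n, 2), "  p-value =", round(chi2.sf(T_n, dof), 3))

Since the simulated data really are Gaussian, the p-value should typically be large and the test should fail to reject; feeding in clearly non-Gaussian data instead should typically produce a small p-value.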
78:16 So I gave you some sort of a-- 78:17 I mean, the way we discussed, right? 78:19 You have 8 and 10 and 20, then it's ad hoc. 78:23 And so depending on whether you want to stop K at 20 78:27 or if you want to bin those guys is really up to you. 78:29 And there's going to be some considerations 78:31 about the particular problem at hand. 78:32 I mean, is it coarse-- too coarse 78:34 for your problem to decide that the observations between 8 78:38 and 20 are the same? 78:39 It's really up to you. 78:40 Maybe that's actually making a huge difference 78:42 in terms of what phenomenon you're looking at. 78:45 The choice of the bins, right? 78:46 So here there's actually some sort 78:48 of rules, which are don't use only one bin 78:51 and make sure there's actually-- don't use them too small 78:55 so that there's at least one observation per bin, right? 78:57 And it's basically the same kind of rules 78:59 that you would have to build a histogram. 79:00 If you were to build a histogram for your data, 79:02 you still want to make sure that you 79:03 bin in an appropriate fashion. 79:05 OK? 79:05 And there's a bunch of rule of thumbs. 79:08 Every time you ask someone, they're 79:09 going to have a different rule of thumb, 79:11 so just make your own. 79:13 And then there's the computation of pj 79:17 of theta, which might be a bit complicated 79:19 because, in this case, I would have 79:21 to integrate the Gaussian between this number 79:24 and this number. 79:25 So for this case, I could just say, well, 79:27 it's the difference of the CDF in that value and that value 79:30 and then be happy with it. 79:31 But you can imagine that you have some slightly more 79:33 crazy distributions. 79:34 You're going to have to somewhat compute 79:36 some integrals that might be unpleasant for you to compute. 79:39 OK? 79:40 And in particular, I said the difference 79:41 of the PDF between that value and that value of-- sorry, 79:44 the CDF between that value and that value, it is true. 79:47 But it's not like you actually have 79:49 tables that compute the CDF at any value you like, right? 79:52 You have to sort of-- 79:54 well, there might be but at some degree, 79:56 but you are going to have to use a computer typically 79:58 to do that. 80:01 OK? 80:01 And so for example, you could do the Poisson. 80:05 If I had time, if I had more than one minute, 80:07 I would actually do it for you. 80:08 But it's basically the same. 80:10 The Poisson, you are going to have an infinite tail, 80:12 and you just say, at some point I'm 80:14 going to cut everything that's larger than some value. 80:16 All right? 80:17 So you can play around, right? 80:20 I say, well, if you have extra knowledge about what you expect 80:23 to see, maybe you can cut at a certain number 80:26 and then just fold all the largest values from K minus 1 80:30 to infinity so that you actually have-- 80:35 you have everything into one large bin. 80:37 OK? 80:38 That's the entire tail. 80:39 And that's the way people do it in insurance companies, 80:42 for example. 80:42 They assume that the number of accidents you're going to have 80:45 is a Poisson distribution. 80:47 They have to fit it to you. 80:48 They have to know-- 80:49 or at least to your pool of insurance of injured people. 80:52 So they just slice you into what your character-- 80:56 relevant characteristics are, and then they 80:58 want to estimate what the Poisson distribution is. 
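The Poisson recipe sketched above, with the entire tail folded into one last bin, looks like this in code. The cutoff and the rate lambda are made-up values; in practice lambda would be estimated from the data, for instance by the sample mean.

    from scipy.stats import poisson

    lam, cutoff = 2.5, 6
    cell_probs = [poisson.pmf(k, lam) for k in range(cutoff)]   # P(X = 0), ..., P(X = cutoff - 1)
    cell_probs.append(poisson.sf(cutoff - 1, lam))              # P(X >= cutoff): the folded tail
    for k, prob in enumerate(cell_probs[:-1]):
        print(f"P(X = {k}) = {prob:.4f}")
    print(f"P(X >= {cutoff}) = {cell_probs[-1]:.4f}")
    print("sum =", round(sum(cell_probs), 10))                  # sanity check: the cells sum to 1

These cell probabilities then play exactly the same role as the binomial ones in the chi-square statistic above.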
81:00 And basically, they can do a chi-square test 81:03 to check if it's indeed a Poisson distribution. 81:06 All right. 81:07 So that will be it for today. 81:10 And so I'll be-- 81:11 I'll have your homework-- 81:13