00:00 PROFESSOR: The following content is provided under a Creative 00:02 Commons license. 00:03 Your support will help MIT OpenCourseWare 00:06 continue to offer high quality educational resources for free. 00:10 To make a donation or to view additional materials 00:12 from hundreds of MIT courses, visit MIT OpenCourseWare 00:16 at ocw.mit.edu. 00:17 00:21 So welcome back. 00:23 So we are now moving to a new chapter, which 00:26 is going to have a little more of a statistical flavor 00:30 when it comes to designing methods, all right? 00:32 Because if you think about it, OK-- 00:35 some of you have probably attempted problem number two 00:39 in the problem set. 00:39 And you realize that the maximum likelihood estimator does not 00:44 give you super trivial estimators, right? 00:48 I mean, when you have an N(theta, theta), then the thing you get 00:50 is not something you could have guessed before you actually 00:53 attempted to solve that problem. 00:55 And so, in a way, we've seen already sophisticated methods. 00:59 However, in many instances, the maximum likelihood estimator 01:02 was just an average. 01:03 And in a way, even if we had this confirmation 01:07 from maximum likelihood that indeed that was the estimator 01:09 that maximum likelihood would spit out, 01:11 and that our intuition was therefore pretty good, 01:15 most of the statistical analysis or use of the central limit 01:18 theorem, all these things actually 01:20 did not come in the building of the estimator, 01:23 in the design of the estimator, but really 01:25 in the analysis of the estimator. 01:27 And you could say, well, if I know already 01:29 that the best estimator is the average, 01:31 I'm just going to use the average. 01:32 I don't have to, basically, quantify how good it is. 01:35 I just know it's the best I can do. 01:37 We're going to talk about tests. 01:39 And we're going to talk about parametric hypothesis testing. 01:44 So you should view this as-- parametric means, 01:46 well, it's about a parameter, like we did before. 01:49 And hypothesis testing is on the same level as estimation. 01:54 And on the same level as estimator 01:56 will be the word "test," OK? 01:58 And when we're going to devise a test, 02:01 we're going to actually need to understand 02:03 random fluctuations that arise from the central limit theorem 02:06 better, OK? 02:06 It's not just going to be in the analysis. 02:08 It's also going to be in the design. 02:10 And everything we've been doing before in understanding 02:12 the behavior of an estimator is actually 02:14 going to come in and be extremely 02:16 useful in the actual design of tests, OK? 02:21 So as an example, I want to talk to you about some real data. 02:25 02:28 I will not study this data. 02:29 But this data actually exists. 02:31 You can find it in R. And so, it's 02:34 the data from the so-called Credit Union Cherry Blossom 02:37 Run, which is a 10 mile race. 02:38 It takes place every year in D.C. 02:40 It seems that some of the years are pretty nice. 02:42 In 2009, there were about 15,000 participants. 02:45 Pretty big race. 02:47 And the average running time was 103.5 minutes, all right? 02:52 So about an hour and a half or a little bit more. 02:57 And so, you can ask the following question, right? 03:01 This is actual data, right? 03:02 103.5 is actually the average running time over all 15,000 runners. 03:07 Now, in practice, this may not be something very practical to obtain.
03:10 And you might want to just sample a few runners 03:13 and try to understand how they're 03:15 behaving every year without having 03:16 to collect the entire data set. 03:18 And so, you could ask the question, well, 03:20 let's say my budget is to ask maybe 10 runners 03:24 what their running time was. 03:25 I still want to be able to determine 03:27 whether they were running faster in 2012 than in 2009. 03:31 Why do I put 2012, and not 2016? 03:34 Well, because the data set for 2012 is also available. 03:38 So if you are interested and you know how to use R, 03:41 just go and have fun with it. 03:44 So to answer this question, what we do is we select n runners, 03:47 right? 03:47 So n is a moderate number that's more manageable than 15,000. 03:51 From the 2012 race at random. 03:53 That's where the random variable is going to come from, right? 03:56 That's where we actually inject randomness into our problem. 03:58 04:02 So remember, this is an experiment. 04:04 So really in a way, the runners are the omegas. 04:06 And I'm interested in measurements on those guys. 04:10 So this is how I have a random variable. 04:11 And this random variable here is measuring their running time. 04:15 OK. 04:16 If you look at the data set, you have all sorts 04:18 of random variables you could measure 04:19 about those random runners. 04:21 Country of origin. 04:22 I don't know, height, age, a bunch of things. 04:25 OK. 04:25 Here, the random variable of interest 04:27 being the running time. 04:29 OK. 04:30 Everybody understand what the process is? 04:32 OK. 04:33 So now I'm going to have to make some modeling assumptions. 04:36 And here, I'm actually pretty lucky. 04:37 I actually have all the data from a past year. 04:41 I mean, this is not the data from 2012, which I also have, 04:44 but I don't use. 04:45 But I can actually use past data to try to understand what 04:47 distribution I have, right? 04:49 I mean, after all, running time is going 04:51 to be rounded to something. 04:52 Maybe I can think of it as a discrete random variable. 04:55 Maybe I can think of it as an exponential random variable. 04:58 Those are positive numbers. 05:00 I mean, there are many kinds of running times 05:01 that could come to mind. 05:03 Many kinds of distributions I could think 05:04 of for this modeling part. 05:06 But it turns out that if you actually 05:08 plot the histogram of those running times for all 15,000 05:11 runners in 2009, you actually are 05:14 pretty happy to see that it really 05:16 looks like a bell-shaped curve, which suggests 05:18 that this should be a Gaussian. 05:19 So what you go on to do is you estimate the mean 05:25 from past observations, which was actually 103.5, as we said. 05:29 You estimate the variance, which was 373. 05:34 And you just try to superimpose the histogram 05:37 with this curve, which is a Gaussian PDF with mean 103.5 05:43 and variance 373. 05:45 And you see that they actually look very much alike. 05:48 And so here, you're pretty comfortable saying 05:50 that the running time actually has a Gaussian distribution. 05:53 All right? 05:54 So now I know that the x1 to xn, I'm 05:56 going to say they're Gaussian, OK? 05:58 I still need to specify two parameters. 06:01 So what I want to know is, is the distribution the same 06:05 as in past years, right? 06:06 So I want to know if the random variable that I'm looking 06:08 at-- if I, say, pick one. 06:09 Say, x1. 06:10 Does it have the same distribution in 2012 06:12 that it did in 2009? 06:15 OK.
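As a side note, the model-checking step just described — plot the 2009 histogram and superimpose the fitted Gaussian density — is only a few lines in practice. A minimal sketch, assuming the 2009 times sit in a hypothetical file `cherry2009.csv` with a column `time_minutes` (the real data ships with R packages); the 103.5 and 373 are the estimates quoted above:

```python
import numpy as np
import pandas as pd
from scipy.stats import norm
import matplotlib.pyplot as plt

# Hypothetical file and column names, just for illustration.
times = pd.read_csv("cherry2009.csv")["time_minutes"]

mu_hat = times.mean()     # about 103.5 minutes in 2009
var_hat = times.var()     # about 373

# Histogram of the ~15,000 running times, on the density scale.
plt.hist(times, bins=60, density=True, alpha=0.5, label="2009 running times")

# Superimpose the Gaussian PDF with the estimated mean and variance.
grid = np.linspace(times.min(), times.max(), 400)
plt.plot(grid, norm.pdf(grid, loc=mu_hat, scale=np.sqrt(var_hat)),
         label="N(103.5, 373) fit")
plt.xlabel("running time (minutes)")
plt.legend()
plt.show()
```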
06:16 And so, the question is, is x1 has 06:19 a Gaussian distribution with mean 103.5 and variance 373? 06:23 Is that clear? 06:24 OK. 06:25 So this question that calls for a yes or no answer 06:30 is a hypothesis testing problem. 06:31 I am testing a hypothesis. 06:34 And this is the basis of basically all 06:36 of data-driven scientific inquiry. 06:39 You just ask questions. 06:40 You formulate a scientific hypothesis. 06:43 Knocking down this gene is going to cure melanoma, is this true? 06:48 I'm going to collect. 06:49 I'm going to try. 06:50 I'm to observe some patients on which I knock down this gene. 06:52 I'm going to collect some measurements. 06:54 And I'm going to try to answer this yes/no question, OK? 07:00 It's different from the question, 07:02 what is the mean running time for this year? 07:08 OK. 07:10 So this hypothesis testing is testing 07:12 if this hypothesis is true. 07:13 The hypothesis in common English we just said, 07:20 were runners running faster? 07:22 All right? 07:22 Anybody could formulate this hypothesis. 07:24 Now, you go to a statistician. 07:25 And he's like, oh, what you're really 07:27 asking me is x1 has a Gaussian distribution with mean 07:32 less than 103.5 and variance 373, right? 07:35 That's really the question that you ask in statistical terms. 07:38 And so, if you're asking if this was the same as before, 07:42 there's many ways it could not be the same as before. 07:44 There's basically three ways it could not 07:46 be the same as before. 07:47 It could be the case that x1 is in expectation to 103.5 07:54 So the expectation has changed. 07:56 Or the variance has changed. 07:58 Or the distribution has changed. 08:00 I mean, who knows? 08:01 Maybe runners are now just all running holding their hands. 08:05 And it's like now a point mass at 1 given point. 08:08 OK. 08:08 So you never know what could [INAUDIBLE].. 08:11 Now of course, if you allow for any change, 08:14 you will find change. 08:16 And so what you have to do is to factor in as much knowledge 08:18 as you can. 08:19 Make as many modeling assumptions, 08:20 so that you can let the data speak 08:22 about your particular question. 08:23 Here, your particular question is, are they running faster? 08:27 So you're only really asking a question about the expectation. 08:30 You really want to know if the expectation has changed. 08:33 So as far as you're concerned, you're 08:35 happy to make the assumption that the rest has 08:37 been unchanged. 08:38 OK. 08:39 And so, this is the question we're asking. 08:42 Is the expectation now less than 103.5? 08:45 Because you specifically asked whether runners were 08:48 going faster this year, right? 08:50 They tend to go faster rather than slower, all right? 08:55 OK. 08:56 So this is the question we're asking in mathematical terms. 09:00 So first, when I did that, I need to basically fix the rest. 09:03 And fixing the rest is actually part 09:05 of the modeling assumptions. 09:07 So I fixed my variance to be 373. 09:10 OK? 09:11 I assume that the variance has not 09:13 changed between 2009 and 2012. 09:17 Now, this is an assumption. 09:18 It turns out it's wrong. 09:21 So if you look at the data from 2012, 09:22 this is not the correct assumption. 09:24 But I'm just going to make it right now 09:26 for the sake of argument, OK? 09:29 And also the fact that it's Gaussian. 09:31 Now, this is going to be hard to violate, right? 09:34 I mean, where did this bell-shaped curve come from? 
09:37 Well, it's just natural when you just measure a bunch of things. 09:42 The central limit theorem appears 09:44 in the small things of nature. 09:45 I mean, that's the bedtime story you get about the central limit 09:48 theorem. 09:48 And that's why the bell-shaped curve is everywhere in nature. 09:50 It's the sum of little independent things 09:52 that are going on. 09:53 And this Gaussian assumption, even if I wanted to relax it, 09:57 there's not much else I can do. 09:58 It is pretty robust across the years. 10:02 All right. 10:02 So the only thing that we did not fix 10:04 is the expectation of x1, which now I want to know what it is. 10:08 And since I don't know what it is, I'm going to call it mu. 10:11 And it's going to be a variable of interest, all right? 10:13 So it's just a number mu. 10:14 Whatever this is I can try to estimate it, maybe using 10:17 maximum likelihood estimation. 10:19 Probably using the average, because this is Gaussian. 10:21 And we know that the maximum likelihood 10:22 estimator for a Gaussian is just the average. 10:26 And now we only want to test if mu is equal to 103.5, 10:30 like it was in 2009. 10:34 Or on the contrary, if mu is not equal to 103.5. 10:37 And more specifically, if mu is actually 10:39 strictly less than 103.5. 10:41 That's the question you ask. 10:42 Now, why am I in writing mu equal to 103.5 or is 10:49 less than 103.5 and equal to 103.5 10:53 versus not equal to 103.5? 10:55 It's because since you asked me a more precise question, 10:58 I'm going to be able to give you a more precise answer. 11:01 And so, if your question is very specific-- 11:03 are they running faster? 11:05 I'm going to factor that in what I write. 11:08 If you just ask me, is it the same? 11:10 I'm going to have to write, or is it different than 103.5? 11:13 And that's less information about what 11:15 you're looking for, OK? 11:19 So by making all these modeling assumptions-- 11:23 the fact that the variance doesn't change, 11:25 the fact that it's still Gaussian-- 11:26 I've actually reduced the number of. 11:31 And I put numbers in quotes, because this is still 11:34 an infinite of them. 11:35 But I'm limiting the number of ways 11:38 the hypothesis can be violated. 11:40 11:43 The number of possible alternative realities 11:46 for this hypothesis, all right? 11:48 For example, I'm saying there's no way 11:50 mu can be larger than 103.5. 11:53 I've already factored that in, OK? 11:55 It could be. 11:56 But I'm actually just going to say that if it's larger, 11:59 all I'm going to be able to tell you is that it's not smaller. 12:02 I'm not going to be able to tell you 12:06 that it's actually larger, OK? 12:07 12:12 And the only way it can be rejected now. 12:15 The only way I can reject my hypothesis 12:17 is if x belongs to very specific family of distributions. 12:22 If it has a distribution which is Gaussian 12:24 with mean mu and variance of 373 for mu, which is less 103.5. 12:29 All right? 12:30 So we started with basically was x1-- 12:40 so that's the reality. 12:41 x1 follows n 103.5 373, OK? 12:49 And this is everything else, right? 12:53 So for example, here is x follows 12:55 some exponential, 0.1, OK? 13:02 This is just another distribution here. 13:04 Those are all the possible distributions. 13:06 What we said is we said, OK, first of all, let's just 13:09 keep only those Gaussian distributions, right? 
13:13 And second, we said, well, among those Gaussian distributions, 13:18 let's only look at those that have-- well, 13:20 maybe this one should be at the boundary-- 13:24 let's only look at the Gaussians here. 13:26 So this guy here are all the Gaussians 13:33 with mean mu and variance 373 for mu less than 103.5, OK? 13:43 So when you're going to give me data, 13:45 I'm going to be able to say, well, am I this guy? 13:48 Or am I one of those guys? 13:49 Rather than searching through everything. 13:51 And the more you search the easier for you 13:53 to find something that fits better the data, right? 13:56 And so, if I allow everything possible, 14:00 then there's going to be something 14:01 that just by pure randomness is actually going to look better 14:04 for the data, OK? 14:06 14:09 So for example, if I draw 10 random variables, right? 14:12 If n is equal to 10. 14:15 And let's say they take 10 different values. 14:18 Then it's actually more likely that those guys 14:20 come from a discrete distribution that 14:23 takes each of these values with probability 1 over 10, 14:27 than actually some Gaussian random variable, right? 14:30 That would be perfect. 14:31 I can actually explain it. 14:32 If the 10 numbers I got were say-- 14:36 let's say I collect 3, 90, 95, and 102. 14:41 Then the most likely distribution for those guys 14:44 is the discrete distribution that 14:46 takes three values, 91 with probability 1/3, 95 14:51 with probability 1/3, and 102 with probably 1/3, right? 14:57 That's definitely the most likely distribution for this. 14:59 So if I allowed this, I would say, oh no. 15:02 This is not distributed according to that. 15:04 It's distributed according to this very specific 15:06 distribution, which is somewhere in the realm 15:09 of all possible distributions, OK? 15:12 So now we're just going to try to carve out all this stuff 15:15 by making our assumptions. 15:18 OK. 15:19 So here in this particular example, 15:20 just make a mental note that what we're doing 15:23 is that I actually-- 15:25 a little birdie told me that the reference number is 103.5, OK? 15:31 That was the thing I'm actually looking for. 15:34 In practice, it's actually seldom the case 15:36 that you have this reference for yourself to think of, right? 15:40 Maybe here, I just happen to have a full data 15:43 set of all the runners of 2009. 15:46 But if I really just asked you, I said, 15:50 were runners faster in 2012 than in 2009? 15:55 Here's $10 to perform your statistical analysis. 15:59 What you're probably going to do is called maybe 10 runners 16:01 from 2012, maybe 15 runners from 2009, 16:05 ask them and try to compare their mean. 16:07 There's no standard reference. 16:09 You would not be able to come up with this 103.5, 16:11 because these data maybe is expensive to get or something. 16:14 OK. 16:15 So this is really more of the standard case, all right? 16:18 Where you really compare two things with each other, 16:21 but there's no actual ground truth number 16:23 that you're comparing it to. 16:26 OK. 16:26 So we'll come back to that in a second. 16:28 I'll tell you what the other example looks like. 16:32 So let's just stick to this example. 16:34 I tell you it's 103.5, OK? 16:36 Let's try to have our intuition work the same way. 16:39 We said, well, averages worked well. 16:42 The average, tell me, of over these 10 guys 16:46 should tell me what the mean should be. 16:49 So I can just say, well x bar is going 16:52 to be close to the true mean by the law of large number. 
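Written out in symbols, the setup accumulated so far for the runner example is the following (n is around 10 in the running example, and the variance is pinned at the 2009 value):

```latex
X_1,\dots,X_n \ \overset{\text{iid}}{\sim}\ \mathcal{N}(\mu,\ 373),
\qquad \text{question: is } \mu = 103.5, \text{ or is } \mu < 103.5\,?
```

The naive rule suggested by the law of large numbers is: conclude that mu is less than 103.5 whenever x bar n is less than 103.5 — which is exactly the rule about to be examined.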
16:55 So I'm going to decide whether x bar is less than 103.5. 17:00 And conclude that in this case, indeed mu is less than 103.5, 17:04 because those two quantities are close, right? 17:06 I could do that. 17:08 The problem is that this could go pretty wrong. 17:10 Because if n is small, then I know 17:13 that xn bar is not equal to mu. 17:17 I know that xn bar is close to mu. 17:19 But I also know that there's pretty high chance 17:21 that it's not equal to mu. 17:23 In particular, I know it's going to be somewhere at 1 over root 17:26 n away from mu, right? 17:28 1 over root n being the root coming from what? 17:31 17:34 CLT, right? 17:35 That's the root n that comes from CLT. In blunt words, 17:40 CLT tells me the mean is at distance 17:44 1 over root n from the expectation, pretty much. 17:47 That's what it's telling. 17:48 So 1 over root n. 17:49 17:52 If I have 10 people in there, 1 over root 10 17:55 is not a huge number, right? 17:57 It's like 1/3 pretty much. 18:00 So 1/3 103.5. 18:02 If the true mean was actually 103.4, 18:07 but my average was telling me it's 103.4 plus 1/3, 18:12 I would actually come to two different conclusions, right? 18:15 18:22 So let's say that mu is equal to 103.4, OK? 18:29 So you're not supposed to know this, right? 18:32 That's the hidden truth. 18:34 18:37 OK. 18:38 Now I have n is equal to 10. 18:40 So I know that x bar n minus 103.4 18:49 is something of the order of 1 over the square root of 10, 18:52 which is of the order of, say, 0.3. 18:57 OK. 18:58 So here, this is all hand wavy, OK? 19:01 But that's what the central limit theorem tells me. 19:05 What it means is that it is possible 19:13 that x bar n is actually equal to is actually 19:20 equal to 103.4 plus 0.3, which is equal to 103.7. 19:30 Which means that while the truth is that mu is less than 103.5, 19:40 then I would conclude that mu is larger than 103.5, OK? 19:47 And that's because I have not been very cautious, OK? 19:49 19:52 So what we want to do is to have a little buffer 19:56 to account for the fact that xn bar is not 19:58 a precise value for the true mu. 20:01 It's something that's 1 over root n away from you. 20:05 And so, what we want is the better heuristic that 20:07 says, well, if I want to conclude that I'm 20:09 less than 103.5, maybe I need to be less than 103.5 20:14 minus a little buffer that goes to 0 as my sample size goes 20:17 to infinity. 20:19 And actually, that's what the law of large number tells me. 20:22 The central limit theorem actually 20:23 tells me that this should be true, 20:26 something that goes to 0 as n goes to infinity 20:30 and the rate 1 over root n, right? 20:32 That's basically what the central limit theorem tells me. 20:36 So to make this intuition more precise, 20:39 we need to understand those fluctuations. 20:41 We need to actually put in something 20:43 that's more precise than these little wiggles here, OK? 20:47 We need to actually have the central limit theorem come in. 20:49 20:53 So here is the example of comparing two groups. 20:57 So pharmaceutical companies use hypothesis 21:00 testing to test if a drug is efficient, right? 21:03 That's what they do. 21:04 They want to know, does my new drug work? 21:06 And that's what the Federal Drug Administration office 21:09 is doing on a daily basis. 21:11 They ask for extremely well regulated clinical trials 21:18 on a thousand people, and check, does this drug 21:22 make a difference? 21:23 Did everybody die? 21:24 Does it make no difference? 
21:27 Should people pay $200 for a pill of sugar, right? 21:30 So that's what people are actually asking. 21:33 So to do so, of course, there is no ground truth about-- 21:36 so there's actually a placebo effect. 21:38 So it's not like actually giving a drug that does not work 21:41 is going to have no effect on patients. 21:44 It will have a small effect, but it's very hard to quantify. 21:47 We know that it's there, but we don't know what it is. 21:50 And so rather than saying, oh the ground truth 21:52 is no improvement, the ground truth is the placebo effect. 21:56 And we need to measure what the placebo effect is. 22:00 So what we're going to do is we're 22:01 going to split our patients into two groups. 22:04 And there's going to be what's called a test 22:06 group and a control group. 22:10 So the word test here is used in a different way 22:13 than hypothesis testing. 22:14 So we'll just call it typically the drug group. 22:17 And so, I will refer to mu drug for this guy, OK? 22:22 Now, this let's say this is a cough syrup, OK? 22:26 And when you have a cough syrup, the way 22:29 you measure the efficacy of a cough syrup 22:34 is to measure how many times you cough per minute, OK? 22:40 And so, if I define mu control the number 22:42 of expectoration per hour. 22:48 So just the expected number, right? 22:50 This is the number I don't know, because I don't have access 22:53 to the entire population of people that will ever 22:55 take this cough syrup. 22:57 And so, I will call it mu control for the control group. 23:00 So those are the people who have been actually given just 23:02 like sugar, like maple syrup. 23:05 And mu drug are those people who are given the actual syrup, OK? 23:09 And you can imagine that maybe maple syrup will have an effect 23:12 on expectorations per hour just because, well, it's just sweet 23:18 and it helps, OK? 23:19 And so, we don't know what this effect is going to be. 23:21 We just want to measure if the drug is actually 23:24 having just a better impact on expectorations 23:28 per hour than the just pure maple syrup, OK? 23:34 So what we want to know is if mu drug is less than mu control. 23:38 That would be enough. 23:39 If we had access to all the populations 23:41 that will ever take the syrup for all ages, 23:44 then we would just measure, did this have an impact? 23:46 And even if it's a slightly ever so small impact, 23:49 then it's good to release this cough syrup, 23:52 assuming that it has no side effects or anything like this, 23:55 because it's just better than maple syrup, OK? 23:58 The problem is that we don't have access to this. 24:00 And we're going to have to make this decision based on samples 24:03 that give me imprecise knowledge about mu drug and mu control. 24:09 So in this case, unlike the first case 24:10 where we compared an unknown expected value 24:13 to have a fixed number, which was one of the 103.5, here, 24:17 we're just comparing two unknown numbers with each other, OK? 24:20 So there's two sources of randomness. 24:22 Trying to estimate the first one. 24:23 And trying to estimate the second one. 24:25 24:29 Before I move on, I just wanted to tell you I apologize. 24:31 One of the graders was not able to finish grading his problem 24:34 sets for today. 24:35 So for those of you who are here just to pick up their homework, 24:39 feel free to leave now. 24:41 Even if you have a name tag, I will pretend I did not read it. 24:45 OK. 24:45 So I'm sorry. 24:47 You'll get it on Tuesday. 24:49 And this will not happen again. 
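Before the clinical-trial data collection, here is a small simulation of the buffer idea from the runner example. The choices are hypothetical: true mean 103.4 and n = 10 as on the board, variance 373 from 2009, and a buffer of 1.645 sigma over root n (the one-sided 5% Gaussian choice, which the lecture has not formalized yet). It shows the naive rule is essentially a coin flip at this noise level, while the buffered rule rarely declares a discovery by chance.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, reps = np.sqrt(373), 10, 100_000   # 2009 variance; n = 10 as on the board

def sample_means(mu):
    """Sample means of n runners' times, repeated many times."""
    return rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# Under the status quo (mu = 103.5, runners did NOT get faster):
xbar_null = sample_means(103.5)
naive = np.mean(xbar_null < 103.5)            # naive rule: xbar below 103.5
buffer = 1.645 * sigma / np.sqrt(n)           # buffer shrinking like 1/sqrt(n)
buffered = np.mean(xbar_null < 103.5 - buffer)
print(f"declare 'faster' under the status quo: naive {naive:.1%}, buffered {buffered:.1%}")
# naive ~ 50% (a coin flip), buffered ~ 5%

# The board example: even if the truth is mu = 103.4 (barely faster),
# xbar lands above 103.5 almost half the time, since sd(xbar) ~ 6.1 minutes.
xbar_alt = sample_means(103.4)
print(f"xbar above 103.5 when mu = 103.4: {np.mean(xbar_alt >= 103.5):.1%}")
```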
24:53 OK. 24:54 So for the clinical trials, now I'm 24:56 going to collect information. 24:57 I'm going to collect the data from the control group. 25:00 And I'm going to collect data from the test group, all right? 25:03 So my control group here. 25:05 I don't have to collect the same number of people in the control 25:08 group than in the drug group. 25:09 Actually, for cough syrup, maybe it's not that important. 25:12 But you can imagine that if you think 25:14 you have the cure to a really annoying disease, 25:20 it's actually hard to tell half of the people you 25:23 will get a pill of nothing, OK? 25:26 People tend to want to try the drug. 25:28 They're desperate. 25:28 And so, you have to have this sort of imbalance 25:30 between who is getting the drug and who's not getting the drug. 25:35 And people have to qualify for the clinical trials. 25:37 There's lots of fluctuations that 25:39 affect what the final numbers of people who are actually 25:42 going to get the drug and are going to get the control 25:44 is going to be. 25:45 And so, it's not easy for you to make those two numbers equal. 25:49 You'd like to have those numbers equal if you can, 25:51 but not necessarily. 25:55 And by the way, this is all part of some mystical science called 25:59 "design of experiments." 26:00 And in particular, you can imagine 26:02 that if one of the series had higher variants, 26:04 you would want to like more people in this group 26:07 than the other group. 26:08 Yeah? 26:10 STUDENT: So when we're subtracting [INAUDIBLE] 26:13 something that [INAUDIBLE] 0 [INAUDIBLE] to be satisfied. 26:20 So that's on purpose [INAUDIBLE].. 26:22 PROFESSOR: Yeah, that's on purpose. 26:23 And I'll come to that in a second, all right? 26:25 So basically, we're going to make it 26:31 if your answer is, is this true? 26:34 We're going to make it as hard as possible, but no harder 26:39 for you to say yes to this answer. 26:41 Because, well, we'll see why. 26:43 26:45 OK, so now we have two set of data, the x's and the y's. 26:50 The x's are the ones for the drug. 26:51 And the y's are the data that I collected from the people, who 26:55 were just given a placebo, OK? 26:57 And so, they're all IID random variables. 26:59 And here, since it's the number of expectorations, 27:02 I'm making a blunt modeling assumption. 27:07 I'm just going to say it's Poisson. 27:08 And it's characterized only by the mean mu drug or the mean mu 27:11 control, OK? 27:13 I've just made an assumption here. 27:15 It could be something different. 27:16 But let's say it's a Poisson distribution. 27:19 So now what I want to know is to test whether mu drug is 27:21 less than mu control. 27:22 We said that already. 27:23 But the way we said it before was not as mathematical 27:26 as it is now. 27:27 Now we're actually making a test on the parameters 27:29 of Poisson distribution. 27:30 Whereas before, we were just making test 27:32 on expected numbers, OK? 27:36 So the heuristic-- again, if we try to apply the heuristic now. 27:39 Rather than comparing mu x bar drug to some fixed number, 27:42 I'm actually comparing x bar drug to some control. 27:46 But now here, I need to have something that accounts for, 27:48 not only the fluctuations of x bar drug, 27:51 but also for the fluctuations of x bar control, OK? 27:55 And so, now I need something that 27:56 goes to 0 when all those two things go to infinity. 27:59 And typically, it should go to zero with 1 over root of n drug 28:02 and 1 over square root of n control, OK? 
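In symbols, one standard way to build such a vanishing buffer for the two-group comparison — a sketch under the Poisson assumption (where a Poisson with mean lambda also has variance lambda), not the exact procedure derived later in the course — is to reject when

```latex
\frac{\bar{X}_{\text{drug}} - \bar{X}_{\text{control}}}
     {\sqrt{\ \bar{X}_{\text{drug}}/n_{\text{drug}} \;+\; \bar{X}_{\text{control}}/n_{\text{control}}\ }}
\ <\ -\,q_{\alpha},
```

where q alpha is a standard Gaussian quantile; the denominator shrinks at exactly the 1 over root n drug and 1 over root n control rates just mentioned.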
28:06 That's what the central limit theorem for both x bar 28:08 drug and x bar control. 28:11 Two central limit theorems are actually telling. 28:15 OK. 28:15 And then we can conclude that this happens. 28:17 And as you said, we're trying to make it 28:19 a bit harder to conclude this. 28:21 Because let's face it. 28:23 If we were actually using two simple heuristic, right? 28:26 28:30 For simplicity, right? 28:31 28:35 So I can rewrite x bar drug less than x bar control 28:43 minus this something that goes to 0. 28:46 I can write it as x bar drug minus x bar control less 28:54 than something negative, OK? 28:57 This little something, OK? 29:00 So now let's look at those guys. 29:02 This is the difference of two random variables. 29:06 From the central limit theorem, they 29:08 should be approximately Gaussian each. 29:12 And actually, we're going to think 29:14 of them as being independent. 29:15 There's no reason why the people in the control group 29:18 should have any effect on what's happening 29:20 to the people in the test group. 29:21 Those people probably don't even know each other. 29:23 And so, when I look at this, this should look like n 0 29:27 with some mean and some variants, 29:28 let's say I don't know what it is, OK? 29:30 The mean I actually know. 29:31 It's mu drug minus mu control, OK? 29:37 So if they were to plot the PDF of this guy, 29:39 it would look like this. 29:41 I would have something which is centered 29:42 at mu drug minus mu control. 29:45 29:48 And it would look like this, OK? 29:51 Now let's say that mu drug is actually equal to mu control. 29:55 That this pharmaceutical company is a huge scam. 29:59 And they really are trying to sell bottled corn 30:04 syrup for $200 a pop, OK? 30:07 So this is a huge scam. 30:08 And the true things are actually equal to 0. 30:12 So this thing is really centered about 0, OK? 30:15 Now, if were not to do this, then basically, half 30:18 of the time I would actually come up 30:20 with a distribution that's above this value. 30:22 And half of the time I would have something that's 30:24 below this value, which would mean that half of the scams 30:27 would actually go through FDA if I did not do this. 30:31 So what I'm trying to do is to say, well, OK. 30:33 You have to be here, so that there is actually 30:35 a very low probability that just by chance 30:37 you end up being here. 30:40 And we'll make all the statements extremely precise 30:42 later on. 30:43 But I think the drug thing makes it 30:46 interesting to see why you're making it hard, 30:49 because You don't want to allow people 30:51 to sell a thing like that. 30:52 30:55 Before we go more into the statistical thinking associated 30:58 to tests, let's just see how we would 31:01 do this quantification, right? 31:02 I mean after all, this is what we probably 31:04 are the most comfortable with at this point. 31:07 So let's just try to understand this. 31:10 And I'm going to make the statisticians favorite test, 31:16 which is the thing that obviously you do at home all 31:19 the time every time you get a new quarter, 31:21 is testing whether it's a fair coin or not. 31:23 All right? 31:24 So this test, of course, exists only in textbooks. 31:27 And I actually did not write this slide. 31:30 I was lazy to just replace all this stuff 31:32 by the Cherry Blossom Run. 31:37 So you have a coin. 31:38 Now you have 80 observations, x1 to x80. 31:42 So n is equal to 80. 31:45 I have x1, xn, IID, Bernoulli p. 31:53 And I want to know if I have a fair coin. 
31:55 So in mathematical language, I want 31:57 to know if p is equal to 1/2. 32:00 32:04 Let's say this is just the heads, OK? 32:07 And a biased coin? 32:09 Well, maybe you would potentially 32:10 be interested whether it's biased 32:11 one direction or the other. 32:13 But not being a fair coin is already somewhat 32:15 of a discovery, OK? 32:17 And so, you just want to know whether p is equal to 1/2 32:20 or p is not equal to 1/2, OK? 32:25 Now, if I were to apply the very naive first example 32:29 to not reject this hypothesis. 32:32 If I run this thing 80 times, I need 32:35 to see exactly 40 heads and 40 tales. 32:40 Now this is very unlikely to happen exactly. 32:43 You're going to have close to 40 heads and close to 40 tails, 32:47 but how close should those things be? 32:49 OK? 32:50 And so, the little something is going 32:52 to be quantified by exactly this, OK? 32:55 So now here, let's say that my experiment gave me 54 heads. 33:06 That's 54? 33:07 Yeah. 33:08 33:10 Which means that my xn bar is 54 over 80, which is 0.68. 33:21 All right? 33:21 So I have this estimator. 33:24 Looks pretty large, right? 33:26 It's much larger than 0.5, so it does look like, 33:29 and my mom would certainly conclude, 33:32 that this is a biased coin for sure, 33:34 because she thinks I'm tricky. 33:35 All right. 33:36 So the question is, can this be due to chance? 33:40 Can this be due to chance alone? 33:42 Like what is the likelihood that a fair coin would actually 33:45 end up being 54 times on heads rather than 40? 33:51 OK? 33:52 And so, what we do is we say, OK, I 33:55 need to understand, what is the distribution of the number 33:58 of times it comes on heads? 33:59 And this is going to be a binomial, 34:01 but it's a little annoying to play with. 34:02 So we're going to use the central limit theorem that 34:05 tells me that xn bar minus p divided 34:10 by square root of p1 minus p is approximately distributed 34:15 as an n01. 34:17 And here, since n is equal to 80, 34:18 I'm pretty safe that this is actually going to work. 34:21 34:28 And I can actually use [INAUDIBLE],, 34:33 and put xn bar here. 34:34 34:38 [INAUDIBLE] tells me that this is OK to do. 34:40 All right. 34:41 So now I'm actually going to compute this. 34:43 So here, I know this. 34:44 This is square root of 80. 34:46 This is a 0.68. 34:48 What is this value here? 34:50 We'll talk about it. 34:51 Well, we're trying to understand what happens 34:53 if it is a fair coin, right? 34:55 So if fair, then p is equal to 0.5, right? 35:02 So what I want to know is, what is the likelihood 35:05 that a fair coin would give me 0.68? 35:09 Let me finish. 35:10 All right. 35:11 What is the likelihood that a fair coin will 35:14 allow me to do this, so I'm actually allowed to plug-in p 35:17 to be 0.5 here? 35:19 Now, your question is, why do I not plug-in p to be 0.5? 35:25 But you can. 35:25 All right. 35:26 I just want to make you plug-in p at one specific point, 35:29 but you're absolutely right. 35:30 35:34 OK. 35:34 Let's forget about your question for one second. 35:37 So now I'm going to have to look at xn bar minus 0.5 divided 35:41 by xn bar 1 minus xn bar. 35:45 Then this thing is approximately Gaussian and 0,1 35:51 if the coin is fair. 35:52 35:56 Otherwise, I'm going to have a mean which is not zero here. 36:01 If the coin is something else, whatever I get here, right? 36:04 36:07 Let's just write it for one second. 36:09 36:23 Let's do it. 36:25 So what is the distribution of this if p-- 36:27 so that's p is equal to 0.5. 
36:33 OK? 36:33 Now if p is equal to 0.6, then this thing is just, well, 36:39 I know that this is equal to square root of n xn 36:43 bar minus 0.6, divided by xn bar 1 36:52 minus xn in the bar squared root, plus-- 36:55 well, now the difference. 36:57 Is So square root of n, 0.6 minus 37:00 0.5, divided by square root of xn bar 1 minus xn bar, right? 37:07 Now if p is equal to 0.6, then this guy is n 0,1, 37:13 but this guy is something different. 37:17 This is just a number that depends on square root of n. 37:22 It's actually pretty large. 37:24 So if I want to use the fact that this guy has 37:28 a normal distribution, I need to plug-in the true value here. 37:33 Now, the implicit question that I got was the following. 37:38 It says, well, if you know what p is, then what's 37:43 actually true is also this. 37:46 If p is equal to 0.5, then since I 37:51 know that root n xn bar minus p divided by square root of p 1 37:57 minus p is some n 0, 1, it's also true 38:01 that square root of n xn bar minus 0.5 38:06 divided by square root of 0.5 1 minus 0.5 is n 0,1, right? 38:14 I know what p is. 38:15 I'm just going to make it appear. 38:18 OK. 38:19 And so, what's actually nice about this particular 38:22 [INAUDIBLE] experiment is that I can check if my assumption is 38:27 valid by checking whether I'm actually-- 38:31 so what I'm going to do right now 38:32 is check whether this is likely to be a Gaussian or not, right? 38:36 And there's two ways I can violate it. 38:38 By violating mean, but also by violating the variance. 38:42 And here, what I did in the first case, 38:44 I said, well I'm not allowing you to check whether you've 38:46 violated the variance. 38:47 I'm just plugging whatever variance you're getting. 38:49 Whereas here, I'm saying, well, there's 38:51 two ways you can violate it. 38:52 And I'm just going to factor everything in. 38:55 So now I can plug-in this number. 38:58 So this is 80. 39:00 This is 0.68. 39:02 So I can compute all this stuff. 39:04 I can compute all this stuff here as well. 39:06 And what I get in this case, if I put the xn bar 1, 39:10 I get 3.45, OK? 39:15 And now I claim that this makes it 39:17 reasonable to reject the hypothesis that p 39:19 is equal to 0.5. 39:21 Can somebody tell me why? 39:22 39:27 STUDENT: It's pretty big. 39:28 PROFESSOR: Yeah, 3 is pretty big. 39:30 So it's very unlikely. 39:31 So this number that I should see should 39:33 look like the number I would get if I asked a computer to draw 39:39 one random Gaussian for me. 39:42 This number, when I draw one random Gaussian, 39:45 is actually a number with 99.9% this number will 39:49 be between negative 3 and 3. 39:52 With 78% it's going to be between negative 2 and 2. 39:55 40:01 68% is between minus 1 and 1. 40:04 And with like 90% it's between minus 2 and 2. 40:07 So getting a 3.45 when you do this 40:10 is extremely unlikely to happen, which 40:13 means that you would have to be extremely unlucky for this 40:17 to ever happen. 40:17 Now, it can happen, right? 40:19 It could be the case that you flip 80 coins and 80 of them 40:25 are heads. 40:27 With what probability does this happen? 40:29 40:32 1 over 2 to the 80, right? 40:34 Which is probably better off playing the lottery 40:39 with this kind of odds, right? 40:41 I mean, this is just not going to happen, but it might happen. 40:43 So we cannot remove completely the uncertainty, right? 40:48 It's still possible that this is due to noise. 
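For the record, the arithmetic behind the numbers on the board is easy to reproduce — a quick check using the rounded x bar = 0.68 from the slide (54 out of 80 gives 0.675 exactly):

```python
import numpy as np
from scipy.stats import norm

n, xbar = 80, 54 / 80          # 54 heads out of 80 tosses

# Plug-in version: xbar in the denominator (what the slide computes).
z_plugin = np.sqrt(n) * (xbar - 0.5) / np.sqrt(xbar * (1 - xbar))
# Null version: p = 0.5 in the denominator (the student's suggestion).
z_null = np.sqrt(n) * (xbar - 0.5) / 0.5

print(z_plugin, z_null)                    # about 3.34 and 3.13 with the exact 0.675;
                                           # about 3.45 and 3.22 if xbar is rounded to 0.68
print(2 * (1 - norm.cdf(abs(z_plugin))))   # two-sided tail probability, well under 1%
```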
40:53 But we're just trying to make all the cases that 40:55 are very unlikely go away, OK? 40:58 And so, now I claim that 3.45 is very unlikely for a Gaussian. 41:03 So if I were to draw the PDF of a standard Gaussian, right? 41:07 So n 0, 1, right? 41:09 So that's PDF of n 0, 1. 41:12 41:16 3.73 is basically here, OK? 41:21 So it's just too far in the tails. 41:25 Understood? 41:26 Now I cannot say that the probability that the Gaussian 41:30 is equal to 373 is small, right? 41:33 I just cannot say that, because it's 0. 41:35 And it's also 0 for the probability that it's 0, 41:37 even though the most likely values are around 0. 41:41 It's the continuous random variable. 41:44 Any value you give me, it's going 41:45 to happen with probability zero. 41:47 So what we're going to say is, well, the fluctuations 41:51 are larger than this number. 41:52 The probability that I get anything worse 41:55 than this is actually extremely small, right? 41:57 Anything worse than this is just like farther than 3.73. 42:00 And this is going to be what we control. 42:03 All right? 42:04 So in this case, I claim that it's quite reasonable 42:06 to reject the hypothesis. 42:07 Is everybody OK with this? 42:10 Everybody find this shocking? 42:12 Or everybody has no idea what's going on? 42:14 Do you have any questions? 42:16 Yeah? 42:17 STUDENT: Regarding the case of p, where 42:19 minus p isn't close to xn. 42:21 If you use 1 minus p as 0.5, then you're 42:24 dividing by a larger number than you would if you used xn. 42:28 So it feels like our true number is not 3.45. 42:32 It's something a little bit smaller 42:34 than 3.45 for the distribution to actually be like 1/2. 42:39 Because it seems like we're adding 42:40 an unnecessary extra error by using xn bar. 42:43 And we're adding an error that makes 42:45 it seem that our result was less likely than it actually was. 42:50 43:00 PROFESSOR: That's correct. 43:03 And you're right. 43:05 I didn't want to plug-in the p everywhere, 43:07 but you should plug it in everywhere you can. 43:09 That's for sure, OK? 43:11 So let's agree on that. 43:12 And that's true that it makes the number a little bigger. 43:15 You compute how much you would get, 43:16 we would get if we 0.5 there. 43:18 43:20 Well, I don't know what the square root of 80 is. 43:23 Can somebody compute quickly? 43:26 I'm not asking you to do it. 43:27 But what I want is two times square root of 80 times 0.18. 43:46 3.22 43:48 OK. 43:49 I can make the same cartoon picture with 3.22. 43:55 But you're right. 43:56 This is definitely more accurate. 43:57 And I should have done this. 43:58 I didn't want to get the confused message, OK? 44:02 All right. 44:02 So now here's a second example that you can think of. 44:07 So now I toss it 30 times. 44:11 Still in the realm of the central limit theorem. 44:17 I get 13 heads rather than 15. 44:23 So I'm actually much closer to being exactly at half. 44:27 So let's see if this is actually going 44:28 to give me a plausible value. 44:29 44:32 So I get 0.33 in average. 44:34 If the truth was 0.5, I would get something like 0.77. 44:40 And now I claim that 0.77 is a plausible realization 44:44 for some standard Gaussian, OK? 44:46 Now, 0.77 is going to look like it's here. 44:49 44:55 So that could very well be something that just 44:57 comes because of randomness. 44:59 And again, if you think about it. 45:01 If I told you, you were expecting 15, you saw 13, 45:06 you're happy to put that on the account of randomness. 
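The same arithmetic for the second example, 13 heads in 30 tosses (note x bar = 13/30, about 0.43; the 0.77 on the slide comes from that rounding):

```python
import numpy as np
from scipy.stats import norm

n, xbar = 30, 13 / 30                      # xbar ~ 0.433

z = np.sqrt(n) * (xbar - 0.5) / np.sqrt(xbar * (1 - xbar))
print(z)                                   # about -0.74 (about -0.77 with xbar rounded to 0.43)
print(2 * (1 - norm.cdf(abs(z))))          # about 0.46: entirely plausible for a fair coin
```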
45:09 Now of course, the question is going to be, 45:11 where do I draw the line? 45:12 Right? 45:13 Is 12 the right number? 45:15 Is 11? 45:16 Is 10? 45:17 What is it? 45:18 45:21 So basically, the answer is it's whatever you want to be. 45:24 The problem it's hard to think on the scale, right? 45:28 What does it mean to think on the scale? 45:30 If I can't think in this scale, I'm 45:31 going to have to think on the scale of 80 of them. 45:33 I'm going to have to think on the scale of running 100 coin 45:38 flips. 45:39 And so, this scale is a moving target all the time. 45:43 Every time you have a new problem, 45:44 you have to have a new skill in mind. 45:45 And it's very difficult. 45:47 The purpose of statistical analysis, 45:50 and in particular this process that content 45:53 that takes your x bar and turns it 45:55 into something that should be standard Gaussian, 45:58 allows you to map the value of x bar 46:01 into a scale that is the standard scale of the Gaussian. 46:06 All right? 46:07 Now, all you need to have in mind 46:09 is, what is a large number or an unusually large number 46:13 for a Gaussian? 46:14 That's all you need to know. 46:15 46:18 So here, by the way, 0.77 is not this one, 46:21 because it was actually negative 0.77. 46:26 So this one. 46:28 OK. 46:29 So I can be on the right or I can be on the left of zero. 46:34 But they are still plausible. 46:36 So understand you could actually have in mind 46:40 all the values that are plausible for a Gaussian 46:42 and those that are not plausible, 46:43 and draw the line based on what you think is the right number. 46:46 So how large should a positive value of a Gaussian to become 46:49 unreasonable for you? 46:52 Is it 1? 46:54 Is it 1.5? 46:56 Is it 2? 46:56 Stop me when I get there. 46:57 Is it 2.5? 46:59 Is it 3? 47:00 STUDENT: I think 2.5 is definitely too big. 47:02 PROFESSOR: What? 47:03 STUDENT: Doesn't it depend on our prior? 47:04 Let's say we already have really good evidence 47:06 at this point [INAUDIBLE] 47:09 PROFESSOR: Yeah, so this is not Bayesian statistics. 47:12 So there's no such thing as a prior right now. 47:14 We'll get there. 47:15 You'll have your moment during one short chapter. 47:18 47:23 So there's no prior here, right? 47:25 It's really a matter of whether you think 47:27 is a Gaussian large or not. 47:28 It's not a matter of coins. 47:30 It's not a matter of anything. 47:31 Now I've just reduced it to just one question. 47:33 So forget about everything we just said. 47:36 And I'm asking you, when do you decide 47:38 that a number is too large to be reasonably drawn 47:43 from a Gaussian? 47:44 And this number is 2 or 1.96. 47:50 And that's basically the number that you get from this quintel. 47:53 We've seen the 1.96 before, right? 47:55 It's actually q alpha over 2, where alpha is equal to 5%. 47:59 That's a quintel of a Gaussian. 48:01 So actually, what we do is we map it again. 48:05 So are now at the Gaussians. 48:06 And then we map it again into some probabilities, 48:08 which is the probability of being farther than this thing. 48:10 And now probabilities, we can think. 48:12 Probability is something that quantifies my error. 48:15 And the question is what percentage of error 48:17 am I willing to tolerate. 48:18 And if I tell you 5%, that's something 48:20 you can really envision. 48:21 What it means is that if I were to do this test a million 48:24 times, 5% of the time I would expose myself 48:28 to making a mistake. 48:30 All right. 48:30 That's all it would say. 
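These thresholds are just Gaussian quantiles, which scipy can tabulate directly (the 1.96 is q alpha over 2 for alpha = 5%, as just said):

```python
from scipy.stats import norm

# Two-sided thresholds: reject when |Z| exceeds the (1 - alpha/2) quantile.
for alpha in (0.10, 0.05, 0.01, 0.0001):
    print(f"alpha = {alpha}: threshold {norm.ppf(1 - alpha / 2):.2f}")
# alpha = 0.1: threshold 1.64
# alpha = 0.05: threshold 1.96
# alpha = 0.01: threshold 2.58
# alpha = 0.0001: threshold 3.89

# For reference, a standard Gaussian lands in (-1, 1), (-2, 2), (-3, 3)
# with probability about 68%, 95%, and 99.7% respectively.
```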
48:31 If you said, well, I don't want to account for 5%, 48:36 maybe I want 1%, then you have to move from 1.94 to 2.5. 48:42 And then if you say at I want 0.01%, 48:44 then you have to move to an even larger number. 48:47 So it depends. 48:48 But stating this number 1%, 5%, 10% 48:51 is much easier than seeing those numbers 1.96, 2.5, et cetera. 48:57 So we're just putting everything back on the scale. 49:00 All right. 49:01 To conclude, this, again, as we said, 49:03 does not suggest that the coin is unfair. 49:05 Now, it might be that the coin is unfair. 49:08 We just don't have enough evidence to say that. 49:10 And that goes back to your question about, 49:12 why are we siding with the fact that we're 49:17 making it harder to conclude that the runners were faster? 49:22 And this is the same thing. 49:23 We're making it harder to conclude 49:24 that the coin is biased. 49:26 Because there is a status quo. 49:28 And we're trying to see if we have evidence 49:30 against the status quo. 49:31 The status quo for the runners is they ran the same speed. 49:35 The status quo for the coin, we can probably all agree 49:37 is that the coin is fair. 49:39 The status quo for a drug? 49:41 I mean, again, unless you prove me 49:43 that you're actually not a scammer 49:45 is that the status quo is that this is maple syrup. 49:48 There's nothing in there. 49:50 Why would you? 49:51 I mean, if I let you get away with it, 49:53 you would put corn syrup. 49:55 It's cheaper. 49:58 OK. 49:59 So now let's move on to math. 50:01 All right. 50:01 So when I started doing mathematics, 50:04 I'm going to have to talk about random variables 50:06 and statistical models. 50:08 And here, there is actually a very simple thing, 50:13 which actually goes back to this picture. 50:15 50:18 A test is really asking me if my parameter 50:27 is in some region of the parameter set 50:28 or another region of the parameter set, right? 50:30 Yes/no. 50:32 And so, what I'm going to be given is a sample, x1, xn. 50:37 I have a model. 50:38 50:41 And again, those can be braces depending on the day. 50:46 And so, now I'm going to give myself theta 0 and theta 1 50:54 to this joint subset. 50:55 50:58 OK. 51:01 So capital theta here is the space 51:02 in which my parameter can live. 51:05 To make two disjoint subsets, I could just 51:06 split this guy in half, right? 51:11 I'm going to say, well, maybe it's this guy and this guy. 51:13 OK. 51:14 So this is theta 0. 51:16 And this is theta 1. 51:18 What it means when I split those two guys, in test, 51:22 I'm actually going to focus only on theta 0 or theta 1. 51:25 And so, it means that a priori I've already 51:28 removed all the possibilities of theta being in this region. 51:32 What does it mean? 51:33 Go back to the example of runners. 51:37 This region here for the Cherry Blossom Run 51:41 is the set of parameters, where mu was larger 51:44 than 103.5, right? 51:47 We removed that. 51:48 We didn't even consider this possibility. 51:49 We said either it's less-- 51:52 sorry. 51:53 That's mu equal to 103.5. 51:55 And this was mu less than 103.5, OK? 51:59 But these guys were like if it happens, it happens. 52:03 I'm not making any statement about that case. 52:06 All right? 52:07 So now I take those two subsets. 52:09 And now I'm going to give them two different names, 52:11 because they're going to have an asymmetric role. 52:15 h0 is the null hypothesis. 52:18 And h1 is the alternative hypothesis. 52:23 h0 is the status quo. 
52:27 h1 is what is considered typically 52:29 as scientific discovery. 52:32 So if you're a regulator, you're going to push towards h0. 52:36 If you're a scientist, you're going to push towards h1. 52:39 If you're a pharmaceutical company, 52:41 you're going to push towards h1. 52:42 OK? 52:43 And so, depending on whether you want to be conservative-- oh, 52:47 I can find evidence in a lot of data. 52:49 As soon as you give me three data points, 52:50 I'm going to be able to find evidence. 52:52 That means I'm going to tend to say, oh, it's h1. 52:55 But if you say you need a lot of data 52:58 before you can actually move away from the status quo, 53:00 that's age h0, OK? 53:01 So think of h0 as being status quo, 53:03 h1 being some discovery that goes against the status quo. 53:08 All right? 53:08 So if we believe that the truth theta is either 53:12 in one of those, what we say is we want to test h0 against h1. 53:17 OK. 53:17 This is actually wording. 53:19 So remember, because this is how your questions are 53:22 going to be formulated. 53:23 And this is how you want to probably communicate 53:26 as a statistician. 53:27 So you're going to say I have the null 53:29 and I have an alternative. 53:30 I want to test h0 against h1. 53:32 I want to test the null hypothesis 53:34 against the alternative hypothesis, OK? 53:36 53:39 Now, the two hypotheses I forgot to say are actually this. 53:42 h0 is that the theta belongs to theta 0. 53:46 And h1 is that it theta belongs to theta 1. 53:50 OK. 53:51 So here, for example, theta was mu. 53:53 And that was mu equal to 103.5. 53:57 And this was mu less than 103.5. 54:01 OK? 54:02 So typically, they're not going to look like thetas and things 54:06 like that. 54:06 They're going to look like very simple things, where you take 54:09 your usual notation for your usual parameter 54:11 and you just say in mathematical terms what relationship this 54:15 should be satisfying, right? 54:16 For example, in the drug example, 54:18 that would be mu drug is equal to mu control. 54:25 And here, that would be mu drug less than mu control. 54:30 The number of expectorations for people 54:34 who take the drug for the cough syrup 54:35 is less than the number of expectoration of people 54:38 who take the corn syrup, OK? 54:42 So now what we want to do. 54:45 54:47 We've set up our hypothesis testing problem. 54:51 You're a scientist. 54:52 You've set up your problem. 54:55 Now what you're going to do is collect data. 54:58 And what you're going to try to find on this data 55:00 is evidence against h0. 55:04 And the alternative is going to guide you 55:06 into which direction you should be looking 55:08 for evidence against this guy. 55:10 All right? 55:11 And so, of course, the narrower the alternative, 55:13 the easier it is for you, because you just 55:15 have to look at the one possible candidate, right? 55:19 But typically, h1 is a big group, like less than. 55:22 Nobody tells you it's either it's 103.5 and 103. 55:27 People tell you it's either 103.5 or less than 103.5. 55:32 OK. 55:33 And so, what we want to do is to decide whether we reject h0. 55:37 So we look for evidence against h0 in the data, OK? 55:40 55:44 So as I said, h0 and h1 do not play a symmetric role. 55:48 It's very important to know which one you're 55:51 going to place as h0 and which one you're 55:53 going to place at h1. 55:54 55:59 If it's a close call, you're always 56:01 going to side with h0, OK? 56:04 So you have to be careful about those. 
56:05 You have to keep that in mind that if it's 56:07 a close call, if data does not carry a lot of evidence, 56:10 you're going to side with h0. 56:12 And so, you're actually never saying that h0 is true. 56:15 You're just saying I did not find evidence against h0. 56:18 You don't say I accept that h0. 56:21 You say I failed to reject h0. 56:25 OK. 56:26 And so one of the things that you 56:28 want to keep in mind when you're doing this 56:29 is this innocent until proven guilty. 56:32 So if you come from a country, like America, 56:37 there's such a thing. 56:38 And in particular, lack of evidence 56:41 does not mean that you are not guilty, all right? 56:45 OJ Simpson was found not guilty. 56:47 It was not found innocent, OK? 56:50 And so, this is basically what happens 56:52 is like the prosecutor brings their evidence. 56:55 And then the jury has to decide whether they 56:58 were convinced that this person was guilty of anything. 57:07 And the question is, do you have enough evidence? 57:11 But if you don't have evidence, it's 57:13 not the burden of the defender to prove that they're innocent. 57:17 Nobody's proving their innocent. 57:18 I mean, sometimes it helps. 57:20 But you just have to make sure that there's not 57:22 enough evidence against you, OK? 57:24 And that's basically what it's doing. 57:26 You're h0 until proven h1. 57:28 57:31 So how are we going to do this? 57:32 Well, as I said, the role of estimators 57:37 in hypothesis testing is played by something called tests. 57:40 And a test is a statistic. 57:42 Can somebody remind me what a statistic is? 57:44 57:47 Yep? 57:48 STUDENT: The measure [INAUDIBLE] 57:50 PROFESSOR: Yeah, that's actually just one step more. 57:53 So it's a function of the observations. 57:54 And we require it to be measurable. 57:56 And as a rule of thumb, measurable 57:58 means if I give you data, you can actually compute it, OK? 58:00 If you don't see a [INAUDIBLE] or an [INAUDIBLE],, 58:02 you don't have to think about it. 58:04 All right. 58:04 58:08 And so, what we do is we just have this test. 58:11 But now I'm actually asking only from this test 58:14 a yes/no answer, which I can code as 0, 1, right? 58:18 So as a rule of thumb, you say that, well, the test 58:21 is equal to 0 then h0. 58:23 The test is equal to 1 at h1. 58:25 And as we said, is that if the test is equal to 0, 58:27 it doesn't mean that a 0 is truth. 58:29 It means that I feel to rejected h0. 58:31 And if the test is equal to 1, I reject h0. 58:33 58:36 So I have two possibilities. 58:38 I look at my data. 58:39 I turn it into a yes/no answer. 58:41 And yes/no answer is really h0 or h1, OK? 58:45 Which one is the most likely basically. 58:49 All right. 58:50 So in the coin flip example, our test statistic 58:57 is actually something that takes value 0, 1. 59:00 And anything, any function that takes value at 0, 59:04 1 is an indicator function, OK? 59:07 So an indicator function is just a function. 59:11 So there's many ways you can write it. 59:13 59:18 So it's a 1 with a double bar. 59:20 If you aren't comfortable with this, 59:21 it's totally OK to write i of something, like i of a. 59:27 OK. 59:28 And that's what? 59:29 So a, here, is a statement, like an inequality, an equality, 59:34 some mathematical statement, OK? 59:38 Or not mathematical. 59:39 I mean, "a" can be, you know, my grandma is 20 years old, OK? 59:43 And so, this is basically 1 if a is true, and 0 if a is false. 59:50 59:54 That's the way you want to think about it. 
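Concretely, for the coin example the test is just such an indicator function of the data through x bar. A minimal sketch, using c = 1.96 (the 5% choice discussed earlier) as the threshold; the constant is a choice, not part of the definition of a test:

```python
import numpy as np

def psi(x, c=1.96):
    """Test of p = 1/2 from Bernoulli observations x: returns 1 (reject) or 0."""
    n, xbar = len(x), np.mean(x)
    z = np.sqrt(n) * (xbar - 0.5) / np.sqrt(xbar * (1 - xbar))
    return int(abs(z) > c)     # indicator of the rejection region

# 54 heads out of 80 tosses: psi = 1, i.e. reject "fair coin".
print(psi(np.array([1] * 54 + [0] * 26)))
```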
59:56 60:02 This function takes only two values, and that's it. 60:05 60:10 So here's the example that we had. 60:12 We looked at whether the standardized xn 60:17 bar, the one that actually is approximately n 0,1 60:20 was larger than something in absolute value, 60:22 either very large or very small, but negative. 60:27 I'm going back to this picture. 60:29 We wanted to know if this guy was 60:31 either to the left of something or to the right of something, 60:35 right? 60:36 Was it in these regions? 60:37 60:42 Now this indicator, I can view this as a function of x bar. 60:49 What it does, it really splits the possible values 60:52 of x bar, which is just a real number, right? 60:54 In two groups. 60:56 The groups on which they lead to a value, which is 1. 60:59 And the groups on which they lead 61:00 to value, which is 0, right? 61:02 So what it does is that I can actually think 61:05 of it as the real line, x bar. 61:09 And there's basically some values here, 61:13 where I'm going to get a 1. 61:14 Maybe I'm going to get a 0 here. 61:16 Maybe I'm going to get a 0. 61:17 Maybe I'm going to get a 1. 61:18 I'm just splitting all possible values of x bar. 61:22 And I see whether to spit out the side which is 0 61:25 or which is 1. 61:26 In this case, it's not clear, right? 61:29 I mean, the function is very nonlinear. 61:31 It's x bar minus 0.5 divided by the square root of x bar 1 61:34 minus x bar. 61:35 If we put the p in the denominator, 61:36 that would be clear. 61:38 That would just be exactly something that looks like this. 61:40 61:45 The function would be like this. 61:46 It would be 1 if it's smaller than some value. 61:49 Less than 0 if it's in between two values. 61:52 And then 1 again. 61:54 So that's psi, OK? 62:00 So this is 1, right? 62:02 This is 1. 62:03 And this is 0. 62:04 So if x bar is too small or if x bar is too large, 62:07 then I'm getting a value 1. 62:09 But if it's somewhere in between, I'm getting a value 0. 62:12 Now, if I have this weird function, 62:14 it's not clear how this happened. 62:18 So the picture here that I get is 62:20 that I have a weird non-linear function, right? 62:27 So that's x bar. 62:28 That's square root of n x bar n 0.5 62:32 divided by the square root of x bar n 1 minus x bar n, right? 62:36 That's this function. 62:38 A priori, I have no idea what this function looks like. 62:40 62:43 We can probably analyze this function, 62:45 but let's pretend we don't know. 62:46 So it's like some crazy stuff like this. 62:49 And all I'm asking is whether in absolute value 62:56 it's larger than c, which means that is this function larger 62:59 than c or less than minus c? 63:01 63:05 The intervals on which I'm going to say 1 63:07 are this guy, this guy, this guy, and this guy. 63:17 OK. 63:18 And everywhere else, I'm seeing 0. 63:20 Everybody agree with this? 63:21 This is what I'm doing. 63:24 Now of course, it's probably easier for you 63:27 to just package it into this nice thing that's 63:29 just either larger than c, an absolute value, 63:31 or less Than C. I want to have to plot this function. 63:33 In practice, you don't have to. 63:36 Now, this is where I am actually claiming. 63:40 So here, I actually defined to you a test. 63:42 And I promised, starting this lecture, by saying, 63:44 oh, now we're going to do something better 63:46 than computing the averages. 63:47 Now I'm telling you it's just computing an average. 63:50 And the thing is the test is not just 63:52 the specification of this x bar. 
It's also the specification of this constant c, all right? And the constant c is exactly where our belief about what counts as a large value for a Gaussian came in. That's exactly where it entered. So this choice of c is basically a threshold: above this threshold, we decide the value isn't likely to come from a Gaussian; below this threshold, we decide that it is likely to come from a Gaussian. So we have to choose what this threshold is based on what we think "likely" means.

Just a little bit more of those things. So now we're going to have to characterize what makes a good test, right? Well, I'll come back to it in a second. But you could have a test that rejects all the time, and that's going to be a bad test, right? The FDA is not implementing a test that says, yes, all drugs work, now let's just go to Aruba, OK? So people are trying to have something that does better than that. The FDA is not saying either, let's just decide that no drugs work, and let's go to Aruba, all right? They're just trying to say the right thing as often as possible. And so, we're going to have to measure this.

So the first thing associated to a test is the rejection region. If you look at the set of x in E to the n such that psi of x is equal to 1, this is exactly this guy that I drew. So here, I summarized the values of the sample into their average, but it's the values of the sample that I collect that lead to a test that says 1, all right? So this is the rejection region. If I collect a data point, technically I have E to the n, which is a big space like this. So that's E to the n. Think of it as the space where the sample lives. And I have a function on it that takes only the values 0 and 1. So I can decompose the space into the part where the function takes the value 0 and the part where it takes the value 1. And those parts can be super complicated, right? I can have a thing like this. I can have some weird little islands where it takes the value 1, some islands where it takes the value 0, some weird stuff going on. But I can always partition the space into the part where it takes the value 0 and the part where it takes the value 1. And the part where it takes the value 1, where psi is equal to 1, is called the rejection region of the test, OK? So it's just the samples that would lead me to rejecting. And notice that the test is the indicator of the rejection region.

So there's two ways you can make an error with a test. Either the truth is in h0, and you say it's h1. Or the truth is in h1, and you say it's h0. And that's how we build in the asymmetry between h0 and h1: we control only one of the two errors, and we hope for the best for the second one. So the type 1 error is the one that says, well, if it is actually the status quo, but I claim that there is a discovery, if it's actually h0 but I claim that I'm in h1, then I commit a type 1 error. And the probability of type 1 error is this function alpha of psi, which is the probability that psi is equal to 1 when theta is in theta 0.
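To connect the rejection region with the probability of type 1 error, here is a hedged sketch in R; the sample size, the threshold 1.96, and the number of simulations are assumptions made for illustration, not values fixed in this lecture:

# Monte Carlo estimate of alpha(psi) for the two-sided coin-flip test
# psi = indicator{ |sqrt(n) (xbar - 0.5) / sqrt(xbar (1 - xbar))| > thr },
# with the data generated under H0: p = 0.5.
set.seed(2)
n   <- 100
thr <- 1.96      # threshold taken from a standard Gaussian quantile (an assumption here)
B   <- 10000     # number of simulated data sets

reject <- replicate(B, {
  x    <- rbinom(n, size = 1, prob = 0.5)
  xbar <- mean(x)
  Tn   <- sqrt(n) * (xbar - 0.5) / sqrt(xbar * (1 - xbar))
  as.integer(abs(Tn) > thr)   # 1 exactly when the sample falls in the rejection region
})

mean(reject)     # estimate of the probability of type 1 error; roughly 5% here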
Now, the problem is that this is not just a number, because theta is moving all over theta 0, right? There's many values that theta can take. So theta is somewhere here. I erased it, OK.

All right. For simplicity, we're going to think of theta as being mu, with 103.5 as the value of interest, OK? So I know that this region here is theta 1, and just this point here was theta 0, OK? Agreed? This is with the Cherry Blossom Run. Now, in this case it's actually easy. I need to compute this function alpha of psi, which maps theta in theta 0 to P theta of psi equals 1, the probability that I reject when theta is in theta 0. There's only one of them to compute, because theta can only take this one value, which is 103.5. So that's the probability that I reject when the true mean was 103.5. Now, if h0 was this entire guy here, all the values larger than 103.5, then I would have to compute this function for all possible values of theta in there. And guess what? The worst case is when theta is going to be here, because it's so close to the alternative that that's where I'm making the most error possible.

And then there's the type 2 error, which is defined in basically a symmetric way. It's the function that maps theta to the probability of a type 2 error: the probability that I fail to reject h0, right? If psi is equal to 0, I fail to reject h0, but the data actually came from h1, OK? So in this example, let's be clear: if I'm here, if the true mean was 100, I'm looking at the probability that, while the true mean is actually 100, I'm saying it was 103.5, or at least that it's not less than 103.5. Yeah?

STUDENT: I'm just still confused by the notation. When you say that [INAUDIBLE] theta sub 1 arrow r, I'm not sure what that notation means.

PROFESSOR: Well, this just means it's a function that maps theta 0 to R. You've seen functions, right? OK. So that's just the way you write it. That means it's a function f that goes from, say, R to R, and that maps x to x squared. OK. So here, I'm just saying I don't have to consider all possible values. I'm only considering the values in theta 0. I put R, actually; I could restrict myself to the interval [0, 1], because those are probabilities. So it's just telling me where my function comes from and where my function goes to.

And beta is a function, right? So beta psi of theta is just the probability under theta that psi is equal to 0. And I could define that for all thetas. But the only ones that lead to an error are the thetas that are in theta 1. I mean, I can define this function everywhere; it's just not going to correspond to an error, OK?

And the power of a test is the smallest, well, the power is basically 1 minus an error, 1 minus the probability of an error. So it's the probability of making a correct decision, OK? The probability of making a correct decision under h1, that's what the power is. But again, this could be a function.
Because there's many ways theta can be in h1 if h1 is an entire set of numbers, for example all the numbers that are less than 103.5. And so, what I'm doing when I define the power of a test is looking at the smallest possible one of those values, OK? So I'm looking at this function. Maybe I should actually expand a little more on this.

OK. So beta psi of theta is the probability under theta that psi is equal to 0, right? That's the probability, for theta in theta 1, which means under the alternative, that I fail to reject. And I really should have rejected, because theta was actually in theta 1, OK? So this thing here is the probability of type 2 error. Now, this is 1 minus the probability that I did reject, and I should have rejected. That's just passing to the complement, because if psi is not equal to 0, then it's equal to 1. So if I rearrange this, it tells me that the probability that psi is equal to 1 is actually 1 minus beta psi of theta.

So that's true for all thetas in theta 1. And what I'm saying is, well, this is now a good thing, right? This number being large is a good thing: it means I should have rejected, and I rejected. I want this to happen with large probability. And so, what I'm going to look at is the most conservative choice of this number, right? Rather than being super optimistic and saying, oh, but indeed if theta was actually equal to zero, I mean, if mu is equal to 0, everybody runs in 0 seconds, then with high probability I'm actually going to make no mistake. Really, I should look at the worst possible case, OK? So what I'm looking at is basically the smallest value this can take on theta 1, and that is called the power of psi, the power of the test psi, OK? So that's the smallest possible value it can take.

All right. So I'm sorry, this is a lot of definitions that you have to let sink in, and it's not super pleasant. But that's what testing is: there's a lot of jargon. Those are actually fairly simple things. Maybe you should just make a sheet for yourself and say, these are the new terms that I learned: what is a test, a rejection region, the probability of type 1 error, the probability of type 2 error, and the power. Just make sure you know what those guys are. Oh, and the null and alternative hypotheses, OK? And once you know all these things, you know what I'm talking about, you know what I'm referring to. And this is just jargon. In the end, those are just probabilities. I mean, these are natural quantities. Just, for some reason, people have been used to using different terminology.

So just to illustrate: when do I make a type 1 error, and when do I not make a type 1 error? I make a type 1 error if h0 is true and I reject h0, right? So the off-diagonal blocks are when I make an error. When I'm on the diagonal terms, h1 is true and I reject h0, that's a correct decision. When h0 is true and I fail to reject h0, that's also the correct decision to make. So I only make errors when I'm in one of the red blocks.
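As a sketch of why the power is defined through a worst case, here is a small R computation. The one-sided form of the test, the sample size n = 10, the known standard deviation of 20 minutes, and the 5% level are all assumptions made for illustration, not values specified in this part of the lecture:

# One-sided test of H0: mu = 103.5 against H1: mu < 103.5, rejecting when
# sqrt(n) (xbar - 103.5) / sigma < -qnorm(0.95). With sigma treated as known,
# the probability of rejecting has a closed form for every value of mu.
n     <- 10
sigma <- 20
mu0   <- 103.5
z     <- qnorm(0.95)

prob_reject <- function(mu) {
  # the standardized statistic is Gaussian with mean sqrt(n) (mu - mu0) / sigma
  pnorm(-z - sqrt(n) * (mu - mu0) / sigma)
}

mu_grid <- seq(80, 103.4, by = 0.1)   # a grid of alternatives mu < 103.5
min(prob_reject(mu_grid))             # the worst case over the grid stays close to 5%,
                                      # because alternatives near 103.5 are the hardest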
In that table, one red block is the type 1 error and the other red block is the type 2 error. That's all it means, OK? So you just have to know which one we call which. I mean, the naming was chosen in a pretty ad hoc way.

So to conclude this lecture, let me ask you a few questions. In a US court, the defendant is found either, let's just say for the sake of discussion, innocent or guilty, all right? It's really guilty or not guilty, but let's say innocent or guilty. When does the jury make a type 1 error? Yep?

When they say he's guilty and he's actually innocent, right? The status quo is that everybody is innocent until proven guilty. So our h0 is that the person is innocent. And so, for the probability of type 1 error, we're looking at the case where we reject the hypothesis that the person is innocent, so we conclude that this person is guilty, OK? So a type 1 error is when this person is innocent and we conclude they're guilty. What is the type 2 error?

Letting a guilty person go free, which actually, according to the constitution, is the better of the two, all right? So what we're going to try to do is control the first one, and hope for the best for the second one. How could the jury make sure that they never make a type 1 error? Always let the guy go free, right? What is the effect on the type 2 error? Yeah, it's the worst possible, right? Basically, for every guy that's guilty, you let them go. That's the worst you can do. And same thing, right? How can the jury make sure that there's no type 2 error? Always convict. What is the effect on the American budget? What is the effect on the type 1 error? Right. The effect is that basically the type 1 error is maximized.

So there's this trade-off between type 1 and type 2 error that's inherent, and that's why we have this sort of multi-objective thing: we're trying to minimize two things at the same time. And you can find many ad hoc ways to do that, right? If you've taken any optimization, you know that when you're trying to optimize two things, and one is going up while the other one is going down, the only thing you can do is make ad hoc heuristics. Maybe you try to minimize the sum of those two guys. Maybe you try to minimize 1/3 of the first guy plus 2/3 of the second guy. Maybe you try to minimize the first guy plus the square of the second guy. You can think of many ways, but none of them is more justified than the others. However, for statistical hypothesis testing, there's one that's very well justified, which is: constrain your type 1 error to be below a level that you deem acceptable, say 5%. I want to convict at most 5% of innocent people; that's what I deem reasonable. And based on that, I'm going to try to convict as many people as I can, all right? So that's called the Neyman-Pearson paradigm, and we'll talk about it next time.

All right. Thank you.