00:00 PROFESSOR: The following content is provided under a Creative 00:02 Commons license. 00:03 Your support will help MIT OpenCourseWare 00:06 continue to offer high quality educational resources for free. 00:10 To make a donation or to view additional materials 00:12 from hundreds of MIT courses, visit MIT OpenCourseWare 00:16 at ocw.mit.edu. 00:17 00:21 So welcome back. 00:23 So we are now moving to a new chapter, which 00:26 is going to have a little more of a statistical flavor 00:30 when it comes to designing methods, all right? 00:32 Because if you think about it, OK-- 00:35 some of you have probably attempted problem number two 00:39 in the problem set. 00:39 And you realize that the maximum likelihood estimator does not 00:44 give you super trivial estimators, right? 00:48 I mean, when you have an N(theta, theta), then the thing you get 00:50 is not something you could have guessed before you actually 00:53 attempted to solve that problem. 00:55 And so, in a way, we've seen already sophisticated methods. 00:59 However, in many instances, the maximum likelihood estimator 01:02 was just an average. 01:03 And in a way, even if we had this confirmation 01:07 from maximum likelihood that indeed that was the estimator 01:09 that maximum likelihood would spit out, 01:11 and that our intuition was therefore pretty good, 01:15 most of the statistical analysis or use of the central limit 01:18 theorem, all these things actually 01:20 did not come in the building of the estimator, 01:23 in the design of the estimator, but really 01:25 in the analysis of the estimator. 01:27 And you could say, well, if I know already 01:29 that the best estimator is the average, 01:31 I'm just going to use the average. 01:32 I don't have to, basically, quantify how good it is. 01:35 I just know it's the best I can do. 01:37 We're going to talk about tests. 01:39 And we're going to talk about parametric hypothesis testing. 01:44 So you should view this as-- parametric means, 01:46 well, it's about a parameter, like we did before. 01:49 And hypothesis testing is on the same level as estimation. 01:54 And on the same level as estimator 01:56 will be the word "test," OK? 01:58 And when we're going to devise a test, 02:01 we're going to actually need to understand 02:03 random fluctuations that arise from the central limit theorem 02:06 better, OK? 02:06 It's not just going to be in the analysis. 02:08 It's also going to be in the design. 02:10 And everything we've been doing before in understanding 02:12 the behavior of an estimator is actually 02:14 going to come in and be extremely 02:16 useful in the actual design of tests, OK? 02:21 So as an example, I want to talk to you about some real data. 02:25 02:28 I will not study this data. 02:29 But this data actually exists. 02:31 You can find it in R. And so, it's 02:34 the data from the so-called Credit Union Cherry Blossom 02:37 Run, which is a 10 mile race. 02:38 It takes place every year in D.C. 02:40 It seems that some of the years are pretty nice. 02:42 In 2009, there were about 15,000 participants. 02:45 Pretty big race. 02:47 And the average running time was 103.5 minutes, all right? 02:52 So about an hour and a half or a little bit more. 02:57 And so, you can ask the following question, right? 03:01 This is actual data, right? 03:02 103.5 is actually the average running time over all 15,000 runners. 03:07 Now, in practice, this may not be something very practical to obtain.
03:10 And you might want to just sample a few runners 03:13 and try to understand how they're 03:15 behaving every year without having 03:16 to collect the entire data set. 03:18 And so, you could ask the question, well, 03:20 let's say my budget is to ask maybe 10 runners 03:24 what their running time was. 03:25 I still want to be able to determine 03:27 whether they were running faster in 2012 than in 2009. 03:31 Why do I put 2012, and not 2016? 03:34 Well, because the data set for 2012 is also available. 03:38 So if you are interested and you know how to use R, 03:41 just go and have fun with it. 03:44 So to answer this question, what we do is we select n runners, 03:47 right? 03:47 So n is a moderate number that's more manageable than 15,000. 03:51 From the 2012 race at random. 03:53 That's where the random variable is going to come from, right? 03:56 That's where we actually inject randomness into our problem. 03:58 04:02 So remember, this is an experiment. 04:04 So really in a way, the runners are the omegas. 04:06 And I'm interested in measurements on those guys. 04:10 So this is how I have a random variable. 04:11 And this random variable here is measuring their running time. 04:15 OK. 04:16 If you look at the data set, you have all sorts 04:18 of random variables you could measure 04:19 about those random runners. 04:21 Country of origin. 04:22 I don't know, height, age, a bunch of things. 04:25 OK. 04:25 Here, the random variable of interest 04:27 being the running time. 04:29 OK. 04:30 Everybody understand what the process is? 04:32 OK. 04:33 So now I'm going to have to make some modeling assumptions. 04:36 And here, I'm actually pretty lucky. 04:37 I actually have all the data from a past year. 04:41 I mean, this is not the data from 2012, which I also have, 04:44 but I don't use. 04:45 But I can actually use past data to try to understand what 04:47 distribution I have, right? 04:49 I mean, after all, running time is going 04:51 to be rounded to something. 04:52 Maybe I can think of it as a discrete random variable. 04:55 Maybe I can think of it as an exponential random variable. 04:58 Those are positive numbers. 05:00 I mean, there are many kinds of running times 05:01 that could come to mind. 05:03 Many kinds of distributions I could think 05:04 of for this modeling part. 05:06 But it turns out that if you actually 05:08 plot the histogram of those running times for all 15,000 05:11 runners in 2009, you actually are 05:14 pretty happy to see that it really 05:16 looks like a bell-shaped curve, which suggests 05:18 that this should be a Gaussian. 05:19 So what you go on to do is you estimate the mean 05:25 from past observations, which was actually 103.5, as we said. 05:29 You estimate the variance, which was 373. 05:34 And you just try to superimpose the histogram 05:37 with this curve, which is a Gaussian PDF with mean 103.5 05:43 and variance 373. 05:45 And you see that they actually look very much alike. 05:48 And so here, you're pretty comfortable saying 05:50 that the running time actually has a Gaussian distribution. 05:53 All right? 05:54 So now I know that the x1 to xn, I'm 05:56 going to say they're Gaussian, OK? 05:58 I still need to specify two parameters. 06:01 So what I want to know is, is the distribution the same 06:05 as in past years, right? 06:06 So I want to know if the random variable that I'm looking 06:08 at-- if I, say, pick one. 06:09 Say, x1. 06:10 Does it have the same distribution in 2012 06:12 that it did in 2009? 06:15 OK.
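As a side note, the model-checking step just described — plot the 2009 histogram and superimpose the fitted Gaussian density — is only a few lines in practice. A minimal sketch, assuming the 2009 times sit in a hypothetical file `cherry2009.csv` with a column `time_minutes` (the real data ships with R packages); the 103.5 and 373 are the estimates quoted above:

```python
import numpy as np
import pandas as pd
from scipy.stats import norm
import matplotlib.pyplot as plt

# Hypothetical file and column names, just for illustration.
times = pd.read_csv("cherry2009.csv")["time_minutes"]

mu_hat = times.mean()     # about 103.5 minutes in 2009
var_hat = times.var()     # about 373

# Histogram of the ~15,000 running times, on the density scale.
plt.hist(times, bins=60, density=True, alpha=0.5, label="2009 running times")

# Superimpose the Gaussian PDF with the estimated mean and variance.
grid = np.linspace(times.min(), times.max(), 400)
plt.plot(grid, norm.pdf(grid, loc=mu_hat, scale=np.sqrt(var_hat)),
         label="N(103.5, 373) fit")
plt.xlabel("running time (minutes)")
plt.legend()
plt.show()
```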
06:16 And so, the question is, is x1 has 06:19 a Gaussian distribution with mean 103.5 and variance 373? 06:23 Is that clear? 06:24 OK. 06:25 So this question that calls for a yes or no answer 06:30 is a hypothesis testing problem. 06:31 I am testing a hypothesis. 06:34 And this is the basis of basically all 06:36 of data-driven scientific inquiry. 06:39 You just ask questions. 06:40 You formulate a scientific hypothesis. 06:43 Knocking down this gene is going to cure melanoma, is this true? 06:48 I'm going to collect. 06:49 I'm going to try. 06:50 I'm to observe some patients on which I knock down this gene. 06:52 I'm going to collect some measurements. 06:54 And I'm going to try to answer this yes/no question, OK? 07:00 It's different from the question, 07:02 what is the mean running time for this year? 07:08 OK. 07:10 So this hypothesis testing is testing 07:12 if this hypothesis is true. 07:13 The hypothesis in common English we just said, 07:20 were runners running faster? 07:22 All right? 07:22 Anybody could formulate this hypothesis. 07:24 Now, you go to a statistician. 07:25 And he's like, oh, what you're really 07:27 asking me is x1 has a Gaussian distribution with mean 07:32 less than 103.5 and variance 373, right? 07:35 That's really the question that you ask in statistical terms. 07:38 And so, if you're asking if this was the same as before, 07:42 there's many ways it could not be the same as before. 07:44 There's basically three ways it could not 07:46 be the same as before. 07:47 It could be the case that x1 is in expectation to 103.5 07:54 So the expectation has changed. 07:56 Or the variance has changed. 07:58 Or the distribution has changed. 08:00 I mean, who knows? 08:01 Maybe runners are now just all running holding their hands. 08:05 And it's like now a point mass at 1 given point. 08:08 OK. 08:08 So you never know what could [INAUDIBLE].. 08:11 Now of course, if you allow for any change, 08:14 you will find change. 08:16 And so what you have to do is to factor in as much knowledge 08:18 as you can. 08:19 Make as many modeling assumptions, 08:20 so that you can let the data speak 08:22 about your particular question. 08:23 Here, your particular question is, are they running faster? 08:27 So you're only really asking a question about the expectation. 08:30 You really want to know if the expectation has changed. 08:33 So as far as you're concerned, you're 08:35 happy to make the assumption that the rest has 08:37 been unchanged. 08:38 OK. 08:39 And so, this is the question we're asking. 08:42 Is the expectation now less than 103.5? 08:45 Because you specifically asked whether runners were 08:48 going faster this year, right? 08:50 They tend to go faster rather than slower, all right? 08:55 OK. 08:56 So this is the question we're asking in mathematical terms. 09:00 So first, when I did that, I need to basically fix the rest. 09:03 And fixing the rest is actually part 09:05 of the modeling assumptions. 09:07 So I fixed my variance to be 373. 09:10 OK? 09:11 I assume that the variance has not 09:13 changed between 2009 and 2012. 09:17 Now, this is an assumption. 09:18 It turns out it's wrong. 09:21 So if you look at the data from 2012, 09:22 this is not the correct assumption. 09:24 But I'm just going to make it right now 09:26 for the sake of argument, OK? 09:29 And also the fact that it's Gaussian. 09:31 Now, this is going to be hard to violate, right? 09:34 I mean, where did this bell-shaped curve come from? 
09:37 Well, it's just natural when you just measure a bunch of things. 09:42 The central limit theorem appears 09:44 in the small things of nature. 09:45 I mean, that's the bedtime story you get about the central limit 09:48 theorem. 09:48 And that's why the bell-shaped curve is everywhere in nature. 09:50 It's the sum of little independent things 09:52 that are going on. 09:53 And this Gaussian assumption, even if I wanted to relax it, 09:57 there's not much else I can do. 09:58 It is pretty robust across the years. 10:02 All right. 10:02 So the only thing that we did not fix 10:04 is the expectation of x1, which now I want to know what it is. 10:08 And since I don't know what it is, I'm going to call it mu. 10:11 And it's going to be a variable of interest, all right? 10:13 So it's just a number mu. 10:14 Whatever this is I can try to estimate it, maybe using 10:17 maximum likelihood estimation. 10:19 Probably using the average, because this is Gaussian. 10:21 And we know that the maximum likelihood 10:22 estimator for a Gaussian is just the average. 10:26 And now we only want to test if mu is equal to 103.5, 10:30 like it was in 2009. 10:34 Or on the contrary, if mu is not equal to 103.5. 10:37 And more specifically, if mu is actually 10:39 strictly less than 103.5. 10:41 That's the question you ask. 10:42 Now, why am I in writing mu equal to 103.5 or is 10:49 less than 103.5 and equal to 103.5 10:53 versus not equal to 103.5? 10:55 It's because since you asked me a more precise question, 10:58 I'm going to be able to give you a more precise answer. 11:01 And so, if your question is very specific-- 11:03 are they running faster? 11:05 I'm going to factor that in what I write. 11:08 If you just ask me, is it the same? 11:10 I'm going to have to write, or is it different than 103.5? 11:13 And that's less information about what 11:15 you're looking for, OK? 11:19 So by making all these modeling assumptions-- 11:23 the fact that the variance doesn't change, 11:25 the fact that it's still Gaussian-- 11:26 I've actually reduced the number of. 11:31 And I put numbers in quotes, because this is still 11:34 an infinite of them. 11:35 But I'm limiting the number of ways 11:38 the hypothesis can be violated. 11:40 11:43 The number of possible alternative realities 11:46 for this hypothesis, all right? 11:48 For example, I'm saying there's no way 11:50 mu can be larger than 103.5. 11:53 I've already factored that in, OK? 11:55 It could be. 11:56 But I'm actually just going to say that if it's larger, 11:59 all I'm going to be able to tell you is that it's not smaller. 12:02 I'm not going to be able to tell you 12:06 that it's actually larger, OK? 12:07 12:12 And the only way it can be rejected now. 12:15 The only way I can reject my hypothesis 12:17 is if x belongs to very specific family of distributions. 12:22 If it has a distribution which is Gaussian 12:24 with mean mu and variance of 373 for mu, which is less 103.5. 12:29 All right? 12:30 So we started with basically was x1-- 12:40 so that's the reality. 12:41 x1 follows n 103.5 373, OK? 12:49 And this is everything else, right? 12:53 So for example, here is x follows 12:55 some exponential, 0.1, OK? 13:02 This is just another distribution here. 13:04 Those are all the possible distributions. 13:06 What we said is we said, OK, first of all, let's just 13:09 keep only those Gaussian distributions, right? 
13:13 And second, we said, well, among those Gaussian distributions, 13:18 let's only look at those that have-- well, 13:20 maybe this one should be at the boundary-- 13:24 let's only look at the Gaussians here. 13:26 So this guy here are all the Gaussians 13:33 with mean mu and variance 373 for mu less than 103.5, OK? 13:43 So when you're going to give me data, 13:45 I'm going to be able to say, well, am I this guy? 13:48 Or am I one of those guys? 13:49 Rather than searching through everything. 13:51 And the more you search the easier for you 13:53 to find something that fits better the data, right? 13:56 And so, if I allow everything possible, 14:00 then there's going to be something 14:01 that just by pure randomness is actually going to look better 14:04 for the data, OK? 14:06 14:09 So for example, if I draw 10 random variables, right? 14:12 If n is equal to 10. 14:15 And let's say they take 10 different values. 14:18 Then it's actually more likely that those guys 14:20 come from a discrete distribution that 14:23 takes each of these values with probability 1 over 10, 14:27 than actually some Gaussian random variable, right? 14:30 That would be perfect. 14:31 I can actually explain it. 14:32 If the 10 numbers I got were say-- 14:36 let's say I collect 3, 90, 95, and 102. 14:41 Then the most likely distribution for those guys 14:44 is the discrete distribution that 14:46 takes three values, 91 with probability 1/3, 95 14:51 with probability 1/3, and 102 with probably 1/3, right? 14:57 That's definitely the most likely distribution for this. 14:59 So if I allowed this, I would say, oh no. 15:02 This is not distributed according to that. 15:04 It's distributed according to this very specific 15:06 distribution, which is somewhere in the realm 15:09 of all possible distributions, OK? 15:12 So now we're just going to try to carve out all this stuff 15:15 by making our assumptions. 15:18 OK. 15:19 So here in this particular example, 15:20 just make a mental note that what we're doing 15:23 is that I actually-- 15:25 a little birdie told me that the reference number is 103.5, OK? 15:31 That was the thing I'm actually looking for. 15:34 In practice, it's actually seldom the case 15:36 that you have this reference for yourself to think of, right? 15:40 Maybe here, I just happen to have a full data 15:43 set of all the runners of 2009. 15:46 But if I really just asked you, I said, 15:50 were runners faster in 2012 than in 2009? 15:55 Here's $10 to perform your statistical analysis. 15:59 What you're probably going to do is called maybe 10 runners 16:01 from 2012, maybe 15 runners from 2009, 16:05 ask them and try to compare their mean. 16:07 There's no standard reference. 16:09 You would not be able to come up with this 103.5, 16:11 because these data maybe is expensive to get or something. 16:14 OK. 16:15 So this is really more of the standard case, all right? 16:18 Where you really compare two things with each other, 16:21 but there's no actual ground truth number 16:23 that you're comparing it to. 16:26 OK. 16:26 So we'll come back to that in a second. 16:28 I'll tell you what the other example looks like. 16:32 So let's just stick to this example. 16:34 I tell you it's 103.5, OK? 16:36 Let's try to have our intuition work the same way. 16:39 We said, well, averages worked well. 16:42 The average, tell me, of over these 10 guys 16:46 should tell me what the mean should be. 16:49 So I can just say, well x bar is going 16:52 to be close to the true mean by the law of large number. 
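Written out in symbols, the setup accumulated so far for the runner example is the following (n is around 10 in the running example, and the variance is pinned at the 2009 value):

```latex
X_1,\dots,X_n \ \overset{\text{iid}}{\sim}\ \mathcal{N}(\mu,\ 373),
\qquad \text{question: is } \mu = 103.5, \text{ or is } \mu < 103.5\,?
```

The naive rule suggested by the law of large numbers is: conclude that mu is less than 103.5 whenever x bar n is less than 103.5 — which is exactly the rule about to be examined.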
16:55 So I'm going to decide whether x bar is less than 103.5. 17:00 And conclude that in this case, indeed mu is less than 103.5, 17:04 because those two quantities are close, right? 17:06 I could do that. 17:08 The problem is that this could go pretty wrong. 17:10 Because if n is small, then I know 17:13 that xn bar is not equal to mu. 17:17 I know that xn bar is close to mu. 17:19 But I also know that there's pretty high chance 17:21 that it's not equal to mu. 17:23 In particular, I know it's going to be somewhere at 1 over root 17:26 n away from mu, right? 17:28 1 over root n being the root coming from what? 17:31 17:34 CLT, right? 17:35 That's the root n that comes from CLT. In blunt words, 17:40 CLT tells me the mean is at distance 17:44 1 over root n from the expectation, pretty much. 17:47 That's what it's telling. 17:48 So 1 over root n. 17:49 17:52 If I have 10 people in there, 1 over root 10 17:55 is not a huge number, right? 17:57 It's like 1/3 pretty much. 18:00 So 1/3 103.5. 18:02 If the true mean was actually 103.4, 18:07 but my average was telling me it's 103.4 plus 1/3, 18:12 I would actually come to two different conclusions, right? 18:15 18:22 So let's say that mu is equal to 103.4, OK? 18:29 So you're not supposed to know this, right? 18:32 That's the hidden truth. 18:34 18:37 OK. 18:38 Now I have n is equal to 10. 18:40 So I know that x bar n minus 103.4 18:49 is something of the order of 1 over the square root of 10, 18:52 which is of the order of, say, 0.3. 18:57 OK. 18:58 So here, this is all hand wavy, OK? 19:01 But that's what the central limit theorem tells me. 19:05 What it means is that it is possible 19:13 that x bar n is actually equal to is actually 19:20 equal to 103.4 plus 0.3, which is equal to 103.7. 19:30 Which means that while the truth is that mu is less than 103.5, 19:40 then I would conclude that mu is larger than 103.5, OK? 19:47 And that's because I have not been very cautious, OK? 19:49 19:52 So what we want to do is to have a little buffer 19:56 to account for the fact that xn bar is not 19:58 a precise value for the true mu. 20:01 It's something that's 1 over root n away from you. 20:05 And so, what we want is the better heuristic that 20:07 says, well, if I want to conclude that I'm 20:09 less than 103.5, maybe I need to be less than 103.5 20:14 minus a little buffer that goes to 0 as my sample size goes 20:17 to infinity. 20:19 And actually, that's what the law of large number tells me. 20:22 The central limit theorem actually 20:23 tells me that this should be true, 20:26 something that goes to 0 as n goes to infinity 20:30 and the rate 1 over root n, right? 20:32 That's basically what the central limit theorem tells me. 20:36 So to make this intuition more precise, 20:39 we need to understand those fluctuations. 20:41 We need to actually put in something 20:43 that's more precise than these little wiggles here, OK? 20:47 We need to actually have the central limit theorem come in. 20:49 20:53 So here is the example of comparing two groups. 20:57 So pharmaceutical companies use hypothesis 21:00 testing to test if a drug is efficient, right? 21:03 That's what they do. 21:04 They want to know, does my new drug work? 21:06 And that's what the Federal Drug Administration office 21:09 is doing on a daily basis. 21:11 They ask for extremely well regulated clinical trials 21:18 on a thousand people, and check, does this drug 21:22 make a difference? 21:23 Did everybody die? 21:24 Does it make no difference? 
21:27 Should people pay $200 for a pill of sugar, right? 21:30 So that's what people are actually asking. 21:33 So to do so, of course, there is no ground truth about-- 21:36 so there's actually a placebo effect. 21:38 So it's not like actually giving a drug that does not work 21:41 is going to have no effect on patients. 21:44 It will have a small effect, but it's very hard to quantify. 21:47 We know that it's there, but we don't know what it is. 21:50 And so rather than saying, oh the ground truth 21:52 is no improvement, the ground truth is the placebo effect. 21:56 And we need to measure what the placebo effect is. 22:00 So what we're going to do is we're 22:01 going to split our patients into two groups. 22:04 And there's going to be what's called a test 22:06 group and a control group. 22:10 So the word test here is used in a different way 22:13 than hypothesis testing. 22:14 So we'll just call it typically the drug group. 22:17 And so, I will refer to mu drug for this guy, OK? 22:22 Now, this let's say this is a cough syrup, OK? 22:26 And when you have a cough syrup, the way 22:29 you measure the efficacy of a cough syrup 22:34 is to measure how many times you cough per minute, OK? 22:40 And so, if I define mu control the number 22:42 of expectoration per hour. 22:48 So just the expected number, right? 22:50 This is the number I don't know, because I don't have access 22:53 to the entire population of people that will ever 22:55 take this cough syrup. 22:57 And so, I will call it mu control for the control group. 23:00 So those are the people who have been actually given just 23:02 like sugar, like maple syrup. 23:05 And mu drug are those people who are given the actual syrup, OK? 23:09 And you can imagine that maybe maple syrup will have an effect 23:12 on expectorations per hour just because, well, it's just sweet 23:18 and it helps, OK? 23:19 And so, we don't know what this effect is going to be. 23:21 We just want to measure if the drug is actually 23:24 having just a better impact on expectorations 23:28 per hour than the just pure maple syrup, OK? 23:34 So what we want to know is if mu drug is less than mu control. 23:38 That would be enough. 23:39 If we had access to all the populations 23:41 that will ever take the syrup for all ages, 23:44 then we would just measure, did this have an impact? 23:46 And even if it's a slightly ever so small impact, 23:49 then it's good to release this cough syrup, 23:52 assuming that it has no side effects or anything like this, 23:55 because it's just better than maple syrup, OK? 23:58 The problem is that we don't have access to this. 24:00 And we're going to have to make this decision based on samples 24:03 that give me imprecise knowledge about mu drug and mu control. 24:09 So in this case, unlike the first case 24:10 where we compared an unknown expected value 24:13 to have a fixed number, which was one of the 103.5, here, 24:17 we're just comparing two unknown numbers with each other, OK? 24:20 So there's two sources of randomness. 24:22 Trying to estimate the first one. 24:23 And trying to estimate the second one. 24:25 24:29 Before I move on, I just wanted to tell you I apologize. 24:31 One of the graders was not able to finish grading his problem 24:34 sets for today. 24:35 So for those of you who are here just to pick up their homework, 24:39 feel free to leave now. 24:41 Even if you have a name tag, I will pretend I did not read it. 24:45 OK. 24:45 So I'm sorry. 24:47 You'll get it on Tuesday. 24:49 And this will not happen again. 
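Before the clinical-trial data collection, here is a small simulation of the buffer idea from the runner example. The choices are hypothetical: true mean 103.4 and n = 10 as on the board, variance 373 from 2009, and a buffer of 1.645 sigma over root n (the one-sided 5% Gaussian choice, which the lecture has not formalized yet). It shows the naive rule is essentially a coin flip at this noise level, while the buffered rule rarely declares a discovery by chance.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, reps = np.sqrt(373), 10, 100_000   # 2009 variance; n = 10 as on the board

def sample_means(mu):
    """Sample means of n runners' times, repeated many times."""
    return rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# Under the status quo (mu = 103.5, runners did NOT get faster):
xbar_null = sample_means(103.5)
naive = np.mean(xbar_null < 103.5)            # naive rule: xbar below 103.5
buffer = 1.645 * sigma / np.sqrt(n)           # buffer shrinking like 1/sqrt(n)
buffered = np.mean(xbar_null < 103.5 - buffer)
print(f"declare 'faster' under the status quo: naive {naive:.1%}, buffered {buffered:.1%}")
# naive ~ 50% (a coin flip), buffered ~ 5%

# The board example: even if the truth is mu = 103.4 (barely faster),
# xbar lands above 103.5 almost half the time, since sd(xbar) ~ 6.1 minutes.
xbar_alt = sample_means(103.4)
print(f"xbar above 103.5 when mu = 103.4: {np.mean(xbar_alt >= 103.5):.1%}")
```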
24:53 OK. 24:54 So for the clinical trials, now I'm 24:56 going to collect information. 24:57 I'm going to collect the data from the control group. 25:00 And I'm going to collect data from the test group, all right? 25:03 So my control group here. 25:05 I don't have to collect the same number of people in the control 25:08 group than in the drug group. 25:09 Actually, for cough syrup, maybe it's not that important. 25:12 But you can imagine that if you think 25:14 you have the cure to a really annoying disease, 25:20 it's actually hard to tell half of the people you 25:23 will get a pill of nothing, OK? 25:26 People tend to want to try the drug. 25:28 They're desperate. 25:28 And so, you have to have this sort of imbalance 25:30 between who is getting the drug and who's not getting the drug. 25:35 And people have to qualify for the clinical trials. 25:37 There's lots of fluctuations that 25:39 affect what the final numbers of people who are actually 25:42 going to get the drug and are going to get the control 25:44 is going to be. 25:45 And so, it's not easy for you to make those two numbers equal. 25:49 You'd like to have those numbers equal if you can, 25:51 but not necessarily. 25:55 And by the way, this is all part of some mystical science called 25:59 "design of experiments." 26:00 And in particular, you can imagine 26:02 that if one of the series had higher variants, 26:04 you would want to like more people in this group 26:07 than the other group. 26:08 Yeah? 26:10 STUDENT: So when we're subtracting [INAUDIBLE] 26:13 something that [INAUDIBLE] 0 [INAUDIBLE] to be satisfied. 26:20 So that's on purpose [INAUDIBLE].. 26:22 PROFESSOR: Yeah, that's on purpose. 26:23 And I'll come to that in a second, all right? 26:25 So basically, we're going to make it 26:31 if your answer is, is this true? 26:34 We're going to make it as hard as possible, but no harder 26:39 for you to say yes to this answer. 26:41 Because, well, we'll see why. 26:43 26:45 OK, so now we have two set of data, the x's and the y's. 26:50 The x's are the ones for the drug. 26:51 And the y's are the data that I collected from the people, who 26:55 were just given a placebo, OK? 26:57 And so, they're all IID random variables. 26:59 And here, since it's the number of expectorations, 27:02 I'm making a blunt modeling assumption. 27:07 I'm just going to say it's Poisson. 27:08 And it's characterized only by the mean mu drug or the mean mu 27:11 control, OK? 27:13 I've just made an assumption here. 27:15 It could be something different. 27:16 But let's say it's a Poisson distribution. 27:19 So now what I want to know is to test whether mu drug is 27:21 less than mu control. 27:22 We said that already. 27:23 But the way we said it before was not as mathematical 27:26 as it is now. 27:27 Now we're actually making a test on the parameters 27:29 of Poisson distribution. 27:30 Whereas before, we were just making test 27:32 on expected numbers, OK? 27:36 So the heuristic-- again, if we try to apply the heuristic now. 27:39 Rather than comparing mu x bar drug to some fixed number, 27:42 I'm actually comparing x bar drug to some control. 27:46 But now here, I need to have something that accounts for, 27:48 not only the fluctuations of x bar drug, 27:51 but also for the fluctuations of x bar control, OK? 27:55 And so, now I need something that 27:56 goes to 0 when all those two things go to infinity. 27:59 And typically, it should go to zero with 1 over root of n drug 28:02 and 1 over square root of n control, OK? 
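In symbols, one standard way to build such a vanishing buffer for the two-group comparison — a sketch under the Poisson assumption (where a Poisson with mean lambda also has variance lambda), not the exact procedure derived later in the course — is to reject when

```latex
\frac{\bar{X}_{\text{drug}} - \bar{X}_{\text{control}}}
     {\sqrt{\ \bar{X}_{\text{drug}}/n_{\text{drug}} \;+\; \bar{X}_{\text{control}}/n_{\text{control}}\ }}
\ <\ -\,q_{\alpha},
```

where q alpha is a standard Gaussian quantile; the denominator shrinks at exactly the 1 over root n drug and 1 over root n control rates just mentioned.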
28:06 That's what the central limit theorem for both x bar 28:08 drug and x bar control. 28:11 Two central limit theorems are actually telling. 28:15 OK. 28:15 And then we can conclude that this happens. 28:17 And as you said, we're trying to make it 28:19 a bit harder to conclude this. 28:21 Because let's face it. 28:23 If we were actually using two simple heuristic, right? 28:26 28:30 For simplicity, right? 28:31 28:35 So I can rewrite x bar drug less than x bar control 28:43 minus this something that goes to 0. 28:46 I can write it as x bar drug minus x bar control less 28:54 than something negative, OK? 28:57 This little something, OK? 29:00 So now let's look at those guys. 29:02 This is the difference of two random variables. 29:06 From the central limit theorem, they 29:08 should be approximately Gaussian each. 29:12 And actually, we're going to think 29:14 of them as being independent. 29:15 There's no reason why the people in the control group 29:18 should have any effect on what's happening 29:20 to the people in the test group. 29:21 Those people probably don't even know each other. 29:23 And so, when I look at this, this should look like n 0 29:27 with some mean and some variants, 29:28 let's say I don't know what it is, OK? 29:30 The mean I actually know. 29:31 It's mu drug minus mu control, OK? 29:37 So if they were to plot the PDF of this guy, 29:39 it would look like this. 29:41 I would have something which is centered 29:42 at mu drug minus mu control. 29:45 29:48 And it would look like this, OK? 29:51 Now let's say that mu drug is actually equal to mu control. 29:55 That this pharmaceutical company is a huge scam. 29:59 And they really are trying to sell bottled corn 30:04 syrup for $200 a pop, OK? 30:07 So this is a huge scam. 30:08 And the true things are actually equal to 0. 30:12 So this thing is really centered about 0, OK? 30:15 Now, if were not to do this, then basically, half 30:18 of the time I would actually come up 30:20 with a distribution that's above this value. 30:22 And half of the time I would have something that's 30:24 below this value, which would mean that half of the scams 30:27 would actually go through FDA if I did not do this. 30:31 So what I'm trying to do is to say, well, OK. 30:33 You have to be here, so that there is actually 30:35 a very low probability that just by chance 30:37 you end up being here. 30:40 And we'll make all the statements extremely precise 30:42 later on. 30:43 But I think the drug thing makes it 30:46 interesting to see why you're making it hard, 30:49 because You don't want to allow people 30:51 to sell a thing like that. 30:52 30:55 Before we go more into the statistical thinking associated 30:58 to tests, let's just see how we would 31:01 do this quantification, right? 31:02 I mean after all, this is what we probably 31:04 are the most comfortable with at this point. 31:07 So let's just try to understand this. 31:10 And I'm going to make the statisticians favorite test, 31:16 which is the thing that obviously you do at home all 31:19 the time every time you get a new quarter, 31:21 is testing whether it's a fair coin or not. 31:23 All right? 31:24 So this test, of course, exists only in textbooks. 31:27 And I actually did not write this slide. 31:30 I was lazy to just replace all this stuff 31:32 by the Cherry Blossom Run. 31:37 So you have a coin. 31:38 Now you have 80 observations, x1 to x80. 31:42 So n is equal to 80. 31:45 I have x1, xn, IID, Bernoulli p. 31:53 And I want to know if I have a fair coin. 
31:55 So in mathematical language, I want 31:57 to know if p is equal to 1/2. 32:00 32:04 Let's say this is just the heads, OK? 32:07 And a biased coin? 32:09 Well, maybe you would potentially 32:10 be interested whether it's biased 32:11 one direction or the other. 32:13 But not being a fair coin is already somewhat 32:15 of a discovery, OK? 32:17 And so, you just want to know whether p is equal to 1/2 32:20 or p is not equal to 1/2, OK? 32:25 Now, if I were to apply the very naive first example 32:29 to not reject this hypothesis. 32:32 If I run this thing 80 times, I need 32:35 to see exactly 40 heads and 40 tales. 32:40 Now this is very unlikely to happen exactly. 32:43 You're going to have close to 40 heads and close to 40 tails, 32:47 but how close should those things be? 32:49 OK? 32:50 And so, the little something is going 32:52 to be quantified by exactly this, OK? 32:55 So now here, let's say that my experiment gave me 54 heads. 33:06 That's 54? 33:07 Yeah. 33:08 33:10 Which means that my xn bar is 54 over 80, which is 0.68. 33:21 All right? 33:21 So I have this estimator. 33:24 Looks pretty large, right? 33:26 It's much larger than 0.5, so it does look like, 33:29 and my mom would certainly conclude, 33:32 that this is a biased coin for sure, 33:34 because she thinks I'm tricky. 33:35 All right. 33:36 So the question is, can this be due to chance? 33:40 Can this be due to chance alone? 33:42 Like what is the likelihood that a fair coin would actually 33:45 end up being 54 times on heads rather than 40? 33:51 OK? 33:52 And so, what we do is we say, OK, I 33:55 need to understand, what is the distribution of the number 33:58 of times it comes on heads? 33:59 And this is going to be a binomial, 34:01 but it's a little annoying to play with. 34:02 So we're going to use the central limit theorem that 34:05 tells me that xn bar minus p divided 34:10 by square root of p1 minus p is approximately distributed 34:15 as an n01. 34:17 And here, since n is equal to 80, 34:18 I'm pretty safe that this is actually going to work. 34:21 34:28 And I can actually use [INAUDIBLE],, 34:33 and put xn bar here. 34:34 34:38 [INAUDIBLE] tells me that this is OK to do. 34:40 All right. 34:41 So now I'm actually going to compute this. 34:43 So here, I know this. 34:44 This is square root of 80. 34:46 This is a 0.68. 34:48 What is this value here? 34:50 We'll talk about it. 34:51 Well, we're trying to understand what happens 34:53 if it is a fair coin, right? 34:55 So if fair, then p is equal to 0.5, right? 35:02 So what I want to know is, what is the likelihood 35:05 that a fair coin would give me 0.68? 35:09 Let me finish. 35:10 All right. 35:11 What is the likelihood that a fair coin will 35:14 allow me to do this, so I'm actually allowed to plug-in p 35:17 to be 0.5 here? 35:19 Now, your question is, why do I not plug-in p to be 0.5? 35:25 But you can. 35:25 All right. 35:26 I just want to make you plug-in p at one specific point, 35:29 but you're absolutely right. 35:30 35:34 OK. 35:34 Let's forget about your question for one second. 35:37 So now I'm going to have to look at xn bar minus 0.5 divided 35:41 by xn bar 1 minus xn bar. 35:45 Then this thing is approximately Gaussian and 0,1 35:51 if the coin is fair. 35:52 35:56 Otherwise, I'm going to have a mean which is not zero here. 36:01 If the coin is something else, whatever I get here, right? 36:04 36:07 Let's just write it for one second. 36:09 36:23 Let's do it. 36:25 So what is the distribution of this if p-- 36:27 so that's p is equal to 0.5. 
36:33 OK? 36:33 Now if p is equal to 0.6, then this thing is just, well, 36:39 I know that this is equal to square root of n xn 36:43 bar minus 0.6, divided by xn bar 1 36:52 minus xn in the bar squared root, plus-- 36:55 well, now the difference. 36:57 Is So square root of n, 0.6 minus 37:00 0.5, divided by square root of xn bar 1 minus xn bar, right? 37:07 Now if p is equal to 0.6, then this guy is n 0,1, 37:13 but this guy is something different. 37:17 This is just a number that depends on square root of n. 37:22 It's actually pretty large. 37:24 So if I want to use the fact that this guy has 37:28 a normal distribution, I need to plug-in the true value here. 37:33 Now, the implicit question that I got was the following. 37:38 It says, well, if you know what p is, then what's 37:43 actually true is also this. 37:46 If p is equal to 0.5, then since I 37:51 know that root n xn bar minus p divided by square root of p 1 37:57 minus p is some n 0, 1, it's also true 38:01 that square root of n xn bar minus 0.5 38:06 divided by square root of 0.5 1 minus 0.5 is n 0,1, right? 38:14 I know what p is. 38:15 I'm just going to make it appear. 38:18 OK. 38:19 And so, what's actually nice about this particular 38:22 [INAUDIBLE] experiment is that I can check if my assumption is 38:27 valid by checking whether I'm actually-- 38:31 so what I'm going to do right now 38:32 is check whether this is likely to be a Gaussian or not, right? 38:36 And there's two ways I can violate it. 38:38 By violating mean, but also by violating the variance. 38:42 And here, what I did in the first case, 38:44 I said, well I'm not allowing you to check whether you've 38:46 violated the variance. 38:47 I'm just plugging whatever variance you're getting. 38:49 Whereas here, I'm saying, well, there's 38:51 two ways you can violate it. 38:52 And I'm just going to factor everything in. 38:55 So now I can plug-in this number. 38:58 So this is 80. 39:00 This is 0.68. 39:02 So I can compute all this stuff. 39:04 I can compute all this stuff here as well. 39:06 And what I get in this case, if I put the xn bar 1, 39:10 I get 3.45, OK? 39:15 And now I claim that this makes it 39:17 reasonable to reject the hypothesis that p 39:19 is equal to 0.5. 39:21 Can somebody tell me why? 39:22 39:27 STUDENT: It's pretty big. 39:28 PROFESSOR: Yeah, 3 is pretty big. 39:30 So it's very unlikely. 39:31 So this number that I should see should 39:33 look like the number I would get if I asked a computer to draw 39:39 one random Gaussian for me. 39:42 This number, when I draw one random Gaussian, 39:45 is actually a number with 99.9% this number will 39:49 be between negative 3 and 3. 39:52 With 78% it's going to be between negative 2 and 2. 39:55 40:01 68% is between minus 1 and 1. 40:04 And with like 90% it's between minus 2 and 2. 40:07 So getting a 3.45 when you do this 40:10 is extremely unlikely to happen, which 40:13 means that you would have to be extremely unlucky for this 40:17 to ever happen. 40:17 Now, it can happen, right? 40:19 It could be the case that you flip 80 coins and 80 of them 40:25 are heads. 40:27 With what probability does this happen? 40:29 40:32 1 over 2 to the 80, right? 40:34 Which is probably better off playing the lottery 40:39 with this kind of odds, right? 40:41 I mean, this is just not going to happen, but it might happen. 40:43 So we cannot remove completely the uncertainty, right? 40:48 It's still possible that this is due to noise. 
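For the record, the arithmetic behind the numbers on the board is easy to reproduce — a quick check using the rounded x bar = 0.68 from the slide (54 out of 80 gives 0.675 exactly):

```python
import numpy as np
from scipy.stats import norm

n, xbar = 80, 54 / 80          # 54 heads out of 80 tosses

# Plug-in version: xbar in the denominator (what the slide computes).
z_plugin = np.sqrt(n) * (xbar - 0.5) / np.sqrt(xbar * (1 - xbar))
# Null version: p = 0.5 in the denominator (the student's suggestion).
z_null = np.sqrt(n) * (xbar - 0.5) / 0.5

print(z_plugin, z_null)                    # about 3.34 and 3.13 with the exact 0.675;
                                           # about 3.45 and 3.22 if xbar is rounded to 0.68
print(2 * (1 - norm.cdf(abs(z_plugin))))   # two-sided tail probability, well under 1%
```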
40:53 But we're just trying to make all the cases that 40:55 are very unlikely go away, OK? 40:58 And so, now I claim that 3.45 is very unlikely for a Gaussian. 41:03 So if I were to draw the PDF of a standard Gaussian, right? 41:07 So n 0, 1, right? 41:09 So that's PDF of n 0, 1. 41:12 41:16 3.73 is basically here, OK? 41:21 So it's just too far in the tails. 41:25 Understood? 41:26 Now I cannot say that the probability that the Gaussian 41:30 is equal to 373 is small, right? 41:33 I just cannot say that, because it's 0. 41:35 And it's also 0 for the probability that it's 0, 41:37 even though the most likely values are around 0. 41:41 It's the continuous random variable. 41:44 Any value you give me, it's going 41:45 to happen with probability zero. 41:47 So what we're going to say is, well, the fluctuations 41:51 are larger than this number. 41:52 The probability that I get anything worse 41:55 than this is actually extremely small, right? 41:57 Anything worse than this is just like farther than 3.73. 42:00 And this is going to be what we control. 42:03 All right? 42:04 So in this case, I claim that it's quite reasonable 42:06 to reject the hypothesis. 42:07 Is everybody OK with this? 42:10 Everybody find this shocking? 42:12 Or everybody has no idea what's going on? 42:14 Do you have any questions? 42:16 Yeah? 42:17 STUDENT: Regarding the case of p, where 42:19 minus p isn't close to xn. 42:21 If you use 1 minus p as 0.5, then you're 42:24 dividing by a larger number than you would if you used xn. 42:28 So it feels like our true number is not 3.45. 42:32 It's something a little bit smaller 42:34 than 3.45 for the distribution to actually be like 1/2. 42:39 Because it seems like we're adding 42:40 an unnecessary extra error by using xn bar. 42:43 And we're adding an error that makes 42:45 it seem that our result was less likely than it actually was. 42:50 43:00 PROFESSOR: That's correct. 43:03 And you're right. 43:05 I didn't want to plug-in the p everywhere, 43:07 but you should plug it in everywhere you can. 43:09 That's for sure, OK? 43:11 So let's agree on that. 43:12 And that's true that it makes the number a little bigger. 43:15 You compute how much you would get, 43:16 we would get if we 0.5 there. 43:18 43:20 Well, I don't know what the square root of 80 is. 43:23 Can somebody compute quickly? 43:26 I'm not asking you to do it. 43:27 But what I want is two times square root of 80 times 0.18. 43:46 3.22 43:48 OK. 43:49 I can make the same cartoon picture with 3.22. 43:55 But you're right. 43:56 This is definitely more accurate. 43:57 And I should have done this. 43:58 I didn't want to get the confused message, OK? 44:02 All right. 44:02 So now here's a second example that you can think of. 44:07 So now I toss it 30 times. 44:11 Still in the realm of the central limit theorem. 44:17 I get 13 heads rather than 15. 44:23 So I'm actually much closer to being exactly at half. 44:27 So let's see if this is actually going 44:28 to give me a plausible value. 44:29 44:32 So I get 0.33 in average. 44:34 If the truth was 0.5, I would get something like 0.77. 44:40 And now I claim that 0.77 is a plausible realization 44:44 for some standard Gaussian, OK? 44:46 Now, 0.77 is going to look like it's here. 44:49 44:55 So that could very well be something that just 44:57 comes because of randomness. 44:59 And again, if you think about it. 45:01 If I told you, you were expecting 15, you saw 13, 45:06 you're happy to put that on the account of randomness. 
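The same arithmetic for the second example, 13 heads in 30 tosses (note x bar = 13/30, about 0.43; the 0.77 on the slide comes from that rounding):

```python
import numpy as np
from scipy.stats import norm

n, xbar = 30, 13 / 30                      # xbar ~ 0.433

z = np.sqrt(n) * (xbar - 0.5) / np.sqrt(xbar * (1 - xbar))
print(z)                                   # about -0.74 (about -0.77 with xbar rounded to 0.43)
print(2 * (1 - norm.cdf(abs(z))))          # about 0.46: entirely plausible for a fair coin
```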
45:09 Now of course, the question is going to be, 45:11 where do I draw the line? 45:12 Right? 45:13 Is 12 the right number? 45:15 Is 11? 45:16 Is 10? 45:17 What is it? 45:18 45:21 So basically, the answer is it's whatever you want to be. 45:24 The problem it's hard to think on the scale, right? 45:28 What does it mean to think on the scale? 45:30 If I can't think in this scale, I'm 45:31 going to have to think on the scale of 80 of them. 45:33 I'm going to have to think on the scale of running 100 coin 45:38 flips. 45:39 And so, this scale is a moving target all the time. 45:43 Every time you have a new problem, 45:44 you have to have a new skill in mind. 45:45 And it's very difficult. 45:47 The purpose of statistical analysis, 45:50 and in particular this process that content 45:53 that takes your x bar and turns it 45:55 into something that should be standard Gaussian, 45:58 allows you to map the value of x bar 46:01 into a scale that is the standard scale of the Gaussian. 46:06 All right? 46:07 Now, all you need to have in mind 46:09 is, what is a large number or an unusually large number 46:13 for a Gaussian? 46:14 That's all you need to know. 46:15 46:18 So here, by the way, 0.77 is not this one, 46:21 because it was actually negative 0.77. 46:26 So this one. 46:28 OK. 46:29 So I can be on the right or I can be on the left of zero. 46:34 But they are still plausible. 46:36 So understand you could actually have in mind 46:40 all the values that are plausible for a Gaussian 46:42 and those that are not plausible, 46:43 and draw the line based on what you think is the right number. 46:46 So how large should a positive value of a Gaussian to become 46:49 unreasonable for you? 46:52 Is it 1? 46:54 Is it 1.5? 46:56 Is it 2? 46:56 Stop me when I get there. 46:57 Is it 2.5? 46:59 Is it 3? 47:00 STUDENT: I think 2.5 is definitely too big. 47:02 PROFESSOR: What? 47:03 STUDENT: Doesn't it depend on our prior? 47:04 Let's say we already have really good evidence 47:06 at this point [INAUDIBLE] 47:09 PROFESSOR: Yeah, so this is not Bayesian statistics. 47:12 So there's no such thing as a prior right now. 47:14 We'll get there. 47:15 You'll have your moment during one short chapter. 47:18 47:23 So there's no prior here, right? 47:25 It's really a matter of whether you think 47:27 is a Gaussian large or not. 47:28 It's not a matter of coins. 47:30 It's not a matter of anything. 47:31 Now I've just reduced it to just one question. 47:33 So forget about everything we just said. 47:36 And I'm asking you, when do you decide 47:38 that a number is too large to be reasonably drawn 47:43 from a Gaussian? 47:44 And this number is 2 or 1.96. 47:50 And that's basically the number that you get from this quintel. 47:53 We've seen the 1.96 before, right? 47:55 It's actually q alpha over 2, where alpha is equal to 5%. 47:59 That's a quintel of a Gaussian. 48:01 So actually, what we do is we map it again. 48:05 So are now at the Gaussians. 48:06 And then we map it again into some probabilities, 48:08 which is the probability of being farther than this thing. 48:10 And now probabilities, we can think. 48:12 Probability is something that quantifies my error. 48:15 And the question is what percentage of error 48:17 am I willing to tolerate. 48:18 And if I tell you 5%, that's something 48:20 you can really envision. 48:21 What it means is that if I were to do this test a million 48:24 times, 5% of the time I would expose myself 48:28 to making a mistake. 48:30 All right. 48:30 That's all it would say. 
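These thresholds are just Gaussian quantiles, which scipy can tabulate directly (the 1.96 is q alpha over 2 for alpha = 5%, as just said):

```python
from scipy.stats import norm

# Two-sided thresholds: reject when |Z| exceeds the (1 - alpha/2) quantile.
for alpha in (0.10, 0.05, 0.01, 0.0001):
    print(f"alpha = {alpha}: threshold {norm.ppf(1 - alpha / 2):.2f}")
# alpha = 0.1: threshold 1.64
# alpha = 0.05: threshold 1.96
# alpha = 0.01: threshold 2.58
# alpha = 0.0001: threshold 3.89

# For reference, a standard Gaussian lands in (-1, 1), (-2, 2), (-3, 3)
# with probability about 68%, 95%, and 99.7% respectively.
```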
48:31 If you said, well, I don't want to account for 5%, 48:36 maybe I want 1%, then you have to move from 1.94 to 2.5. 48:42 And then if you say at I want 0.01%, 48:44 then you have to move to an even larger number. 48:47 So it depends. 48:48 But stating this number 1%, 5%, 10% 48:51 is much easier than seeing those numbers 1.96, 2.5, et cetera. 48:57 So we're just putting everything back on the scale. 49:00 All right. 49:01 To conclude, this, again, as we said, 49:03 does not suggest that the coin is unfair. 49:05 Now, it might be that the coin is unfair. 49:08 We just don't have enough evidence to say that. 49:10 And that goes back to your question about, 49:12 why are we siding with the fact that we're 49:17 making it harder to conclude that the runners were faster? 49:22 And this is the same thing. 49:23 We're making it harder to conclude 49:24 that the coin is biased. 49:26 Because there is a status quo. 49:28 And we're trying to see if we have evidence 49:30 against the status quo. 49:31 The status quo for the runners is they ran the same speed. 49:35 The status quo for the coin, we can probably all agree 49:37 is that the coin is fair. 49:39 The status quo for a drug? 49:41 I mean, again, unless you prove me 49:43 that you're actually not a scammer 49:45 is that the status quo is that this is maple syrup. 49:48 There's nothing in there. 49:50 Why would you? 49:51 I mean, if I let you get away with it, 49:53 you would put corn syrup. 49:55 It's cheaper. 49:58 OK. 49:59 So now let's move on to math. 50:01 All right. 50:01 So when I started doing mathematics, 50:04 I'm going to have to talk about random variables 50:06 and statistical models. 50:08 And here, there is actually a very simple thing, 50:13 which actually goes back to this picture. 50:15 50:18 A test is really asking me if my parameter 50:27 is in some region of the parameter set 50:28 or another region of the parameter set, right? 50:30 Yes/no. 50:32 And so, what I'm going to be given is a sample, x1, xn. 50:37 I have a model. 50:38 50:41 And again, those can be braces depending on the day. 50:46 And so, now I'm going to give myself theta 0 and theta 1 50:54 to this joint subset. 50:55 50:58 OK. 51:01 So capital theta here is the space 51:02 in which my parameter can live. 51:05 To make two disjoint subsets, I could just 51:06 split this guy in half, right? 51:11 I'm going to say, well, maybe it's this guy and this guy. 51:13 OK. 51:14 So this is theta 0. 51:16 And this is theta 1. 51:18 What it means when I split those two guys, in test, 51:22 I'm actually going to focus only on theta 0 or theta 1. 51:25 And so, it means that a priori I've already 51:28 removed all the possibilities of theta being in this region. 51:32 What does it mean? 51:33 Go back to the example of runners. 51:37 This region here for the Cherry Blossom Run 51:41 is the set of parameters, where mu was larger 51:44 than 103.5, right? 51:47 We removed that. 51:48 We didn't even consider this possibility. 51:49 We said either it's less-- 51:52 sorry. 51:53 That's mu equal to 103.5. 51:55 And this was mu less than 103.5, OK? 51:59 But these guys were like if it happens, it happens. 52:03 I'm not making any statement about that case. 52:06 All right? 52:07 So now I take those two subsets. 52:09 And now I'm going to give them two different names, 52:11 because they're going to have an asymmetric role. 52:15 h0 is the null hypothesis. 52:18 And h1 is the alternative hypothesis. 52:23 h0 is the status quo. 
52:27 h1 is what is considered typically 52:29 as scientific discovery. 52:32 So if you're a regulator, you're going to push towards h0. 52:36 If you're a scientist, you're going to push towards h1. 52:39 If you're a pharmaceutical company, 52:41 you're going to push towards h1. 52:42 OK? 52:43 And so, depending on whether you want to be conservative-- oh, 52:47 I can find evidence in a lot of data. 52:49 As soon as you give me three data points, 52:50 I'm going to be able to find evidence. 52:52 That means I'm going to tend to say, oh, it's h1. 52:55 But if you say you need a lot of data 52:58 before you can actually move away from the status quo, 53:00 that's age h0, OK? 53:01 So think of h0 as being status quo, 53:03 h1 being some discovery that goes against the status quo. 53:08 All right? 53:08 So if we believe that the truth theta is either 53:12 in one of those, what we say is we want to test h0 against h1. 53:17 OK. 53:17 This is actually wording. 53:19 So remember, because this is how your questions are 53:22 going to be formulated. 53:23 And this is how you want to probably communicate 53:26 as a statistician. 53:27 So you're going to say I have the null 53:29 and I have an alternative. 53:30 I want to test h0 against h1. 53:32 I want to test the null hypothesis 53:34 against the alternative hypothesis, OK? 53:36 53:39 Now, the two hypotheses I forgot to say are actually this. 53:42 h0 is that the theta belongs to theta 0. 53:46 And h1 is that it theta belongs to theta 1. 53:50 OK. 53:51 So here, for example, theta was mu. 53:53 And that was mu equal to 103.5. 53:57 And this was mu less than 103.5. 54:01 OK? 54:02 So typically, they're not going to look like thetas and things 54:06 like that. 54:06 They're going to look like very simple things, where you take 54:09 your usual notation for your usual parameter 54:11 and you just say in mathematical terms what relationship this 54:15 should be satisfying, right? 54:16 For example, in the drug example, 54:18 that would be mu drug is equal to mu control. 54:25 And here, that would be mu drug less than mu control. 54:30 The number of expectorations for people 54:34 who take the drug for the cough syrup 54:35 is less than the number of expectoration of people 54:38 who take the corn syrup, OK? 54:42 So now what we want to do. 54:45 54:47 We've set up our hypothesis testing problem. 54:51 You're a scientist. 54:52 You've set up your problem. 54:55 Now what you're going to do is collect data. 54:58 And what you're going to try to find on this data 55:00 is evidence against h0. 55:04 And the alternative is going to guide you 55:06 into which direction you should be looking 55:08 for evidence against this guy. 55:10 All right? 55:11 And so, of course, the narrower the alternative, 55:13 the easier it is for you, because you just 55:15 have to look at the one possible candidate, right? 55:19 But typically, h1 is a big group, like less than. 55:22 Nobody tells you it's either it's 103.5 and 103. 55:27 People tell you it's either 103.5 or less than 103.5. 55:32 OK. 55:33 And so, what we want to do is to decide whether we reject h0. 55:37 So we look for evidence against h0 in the data, OK? 55:40 55:44 So as I said, h0 and h1 do not play a symmetric role. 55:48 It's very important to know which one you're 55:51 going to place as h0 and which one you're 55:53 going to place at h1. 55:54 55:59 If it's a close call, you're always 56:01 going to side with h0, OK? 56:04 So you have to be careful about those. 
56:05 You have to keep that in mind that if it's 56:07 a close call, if data does not carry a lot of evidence, 56:10 you're going to side with h0. 56:12 And so, you're actually never saying that h0 is true. 56:15 You're just saying I did not find evidence against h0. 56:18 You don't say I accept that h0. 56:21 You say I failed to reject h0. 56:25 OK. 56:26 And so one of the things that you 56:28 want to keep in mind when you're doing this 56:29 is this innocent until proven guilty. 56:32 So if you come from a country, like America, 56:37 there's such a thing. 56:38 And in particular, lack of evidence 56:41 does not mean that you are not guilty, all right? 56:45 OJ Simpson was found not guilty. 56:47 It was not found innocent, OK? 56:50 And so, this is basically what happens 56:52 is like the prosecutor brings their evidence. 56:55 And then the jury has to decide whether they 56:58 were convinced that this person was guilty of anything. 57:07 And the question is, do you have enough evidence? 57:11 But if you don't have evidence, it's 57:13 not the burden of the defender to prove that they're innocent. 57:17 Nobody's proving their innocent. 57:18 I mean, sometimes it helps. 57:20 But you just have to make sure that there's not 57:22 enough evidence against you, OK? 57:24 And that's basically what it's doing. 57:26 You're h0 until proven h1. 57:28 57:31 So how are we going to do this? 57:32 Well, as I said, the role of estimators 57:37 in hypothesis testing is played by something called tests. 57:40 And a test is a statistic. 57:42 Can somebody remind me what a statistic is? 57:44 57:47 Yep? 57:48 STUDENT: The measure [INAUDIBLE] 57:50 PROFESSOR: Yeah, that's actually just one step more. 57:53 So it's a function of the observations. 57:54 And we require it to be measurable. 57:56 And as a rule of thumb, measurable 57:58 means if I give you data, you can actually compute it, OK? 58:00 If you don't see a [INAUDIBLE] or an [INAUDIBLE],, 58:02 you don't have to think about it. 58:04 All right. 58:04 58:08 And so, what we do is we just have this test. 58:11 But now I'm actually asking only from this test 58:14 a yes/no answer, which I can code as 0, 1, right? 58:18 So as a rule of thumb, you say that, well, the test 58:21 is equal to 0 then h0. 58:23 The test is equal to 1 at h1. 58:25 And as we said, is that if the test is equal to 0, 58:27 it doesn't mean that a 0 is truth. 58:29 It means that I feel to rejected h0. 58:31 And if the test is equal to 1, I reject h0. 58:33 58:36 So I have two possibilities. 58:38 I look at my data. 58:39 I turn it into a yes/no answer. 58:41 And yes/no answer is really h0 or h1, OK? 58:45 Which one is the most likely basically. 58:49 All right. 58:50 So in the coin flip example, our test statistic 58:57 is actually something that takes value 0, 1. 59:00 And anything, any function that takes value at 0, 59:04 1 is an indicator function, OK? 59:07 So an indicator function is just a function. 59:11 So there's many ways you can write it. 59:13 59:18 So it's a 1 with a double bar. 59:20 If you aren't comfortable with this, 59:21 it's totally OK to write i of something, like i of a. 59:27 OK. 59:28 And that's what? 59:29 So a, here, is a statement, like an inequality, an equality, 59:34 some mathematical statement, OK? 59:38 Or not mathematical. 59:39 I mean, "a" can be, you know, my grandma is 20 years old, OK? 59:43 And so, this is basically 1 if a is true, and 0 if a is false. 59:50 59:54 That's the way you want to think about it. 
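Concretely, for the coin example the test is just such an indicator function of the data through x bar. A minimal sketch, using c = 1.96 (the 5% choice discussed earlier) as the threshold; the constant is a choice, not part of the definition of a test:

```python
import numpy as np

def psi(x, c=1.96):
    """Test of p = 1/2 from Bernoulli observations x: returns 1 (reject) or 0."""
    n, xbar = len(x), np.mean(x)
    z = np.sqrt(n) * (xbar - 0.5) / np.sqrt(xbar * (1 - xbar))
    return int(abs(z) > c)     # indicator of the rejection region

# 54 heads out of 80 tosses: psi = 1, i.e. reject "fair coin".
print(psi(np.array([1] * 54 + [0] * 26)))
```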
59:56 60:02 This function takes only two values, and that's it. 60:05 60:10 So here's the example that we had. 60:12 We looked at whether the standardized xn 60:17 bar, the one that actually is approximately n 0,1 60:20 was larger than something in absolute value, 60:22 either very large or very small, but negative. 60:27 I'm going back to this picture. 60:29 We wanted to know if this guy was 60:31 either to the left of something or to the right of something, 60:35 right? 60:36 Was it in these regions? 60:37 60:42 Now this indicator, I can view this as a function of x bar. 60:49 What it does, it really splits the possible values 60:52 of x bar, which is just a real number, right? 60:54 In two groups. 60:56 The groups on which they lead to a value, which is 1. 60:59 And the groups on which they lead 61:00 to value, which is 0, right? 61:02 So what it does is that I can actually think 61:05 of it as the real line, x bar. 61:09 And there's basically some values here, 61:13 where I'm going to get a 1. 61:14 Maybe I'm going to get a 0 here. 61:16 Maybe I'm going to get a 0. 61:17 Maybe I'm going to get a 1. 61:18 I'm just splitting all possible values of x bar. 61:22 And I see whether to spit out the side which is 0 61:25 or which is 1. 61:26 In this case, it's not clear, right? 61:29 I mean, the function is very nonlinear. 61:31 It's x bar minus 0.5 divided by the square root of x bar 1 61:34 minus x bar. 61:35 If we put the p in the denominator, 61:36 that would be clear. 61:38 That would just be exactly something that looks like this. 61:40 61:45 The function would be like this. 61:46 It would be 1 if it's smaller than some value. 61:49 Less than 0 if it's in between two values. 61:52 And then 1 again. 61:54 So that's psi, OK? 62:00 So this is 1, right? 62:02 This is 1. 62:03 And this is 0. 62:04 So if x bar is too small or if x bar is too large, 62:07 then I'm getting a value 1. 62:09 But if it's somewhere in between, I'm getting a value 0. 62:12 Now, if I have this weird function, 62:14 it's not clear how this happened. 62:18 So the picture here that I get is 62:20 that I have a weird non-linear function, right? 62:27 So that's x bar. 62:28 That's square root of n x bar n 0.5 62:32 divided by the square root of x bar n 1 minus x bar n, right? 62:36 That's this function. 62:38 A priori, I have no idea what this function looks like. 62:40 62:43 We can probably analyze this function, 62:45 but let's pretend we don't know. 62:46 So it's like some crazy stuff like this. 62:49 And all I'm asking is whether in absolute value 62:56 it's larger than c, which means that is this function larger 62:59 than c or less than minus c? 63:01 63:05 The intervals on which I'm going to say 1 63:07 are this guy, this guy, this guy, and this guy. 63:17 OK. 63:18 And everywhere else, I'm seeing 0. 63:20 Everybody agree with this? 63:21 This is what I'm doing. 63:24 Now of course, it's probably easier for you 63:27 to just package it into this nice thing that's 63:29 just either larger than c, an absolute value, 63:31 or less Than C. I want to have to plot this function. 63:33 In practice, you don't have to. 63:36 Now, this is where I am actually claiming. 63:40 So here, I actually defined to you a test. 63:42 And I promised, starting this lecture, by saying, 63:44 oh, now we're going to do something better 63:46 than computing the averages. 63:47 Now I'm telling you it's just computing an average. 63:50 And the thing is the test is not just 63:52 the specification of this x bar. 
It's also the specification of this constant c, all right? And the constant c is exactly where our belief about what counts as a large value for a Gaussian came in. That's exactly where it entered. So this choice of c is basically a threshold: above this threshold, we decide the value isn't likely to come from a Gaussian; below this threshold, we decide that it is likely to come from a Gaussian. So we have to choose what this threshold is based on what we think "likely" means.

Just a little bit more of those things. So now we're going to have to characterize what makes a good test, right? Well, I'll come back to it in a second. But you could have a test that rejects all the time, and that's going to be a bad test, right? The FDA is not implementing a test that says, yes, all drugs work, now let's just go to Aruba, OK? So people are trying to have something that does better than that. The FDA is not saying either, let's just decide that no drugs work, and let's go to Aruba, all right? They're just trying to say the right thing as often as possible. And so, we're going to have to measure this.

So the first thing associated to a test is the rejection region. If you look at the set of x in E to the n such that psi of x is equal to 1, this is exactly this guy that I drew. So here, I summarized the values of the sample into their average, but it's the values of the sample that I collect that lead to a test that says 1, all right? So this is the rejection region. If I collect a data point, technically I have E to the n, which is a big space like this. So that's E to the n. Think of it as the space where the sample lives. And I have a function on it that takes only the values 0 and 1. So I can decompose the space into the part where the function takes the value 0 and the part where it takes the value 1. And those parts can be super complicated, right? I can have a thing like this. I can have some weird little islands where it takes the value 1, some islands where it takes the value 0, some weird stuff going on. But I can always partition the space into the part where it takes the value 0 and the part where it takes the value 1. And the part where it takes the value 1, where psi is equal to 1, is called the rejection region of the test, OK? So it's just the samples that would lead me to rejecting. And notice that the test is the indicator of the rejection region.

So there's two ways you can make an error with a test. Either the truth is in h0, and you say it's h1. Or the truth is in h1, and you say it's h0. And that's how we build in the asymmetry between h0 and h1: we control only one of the two errors, and we hope for the best for the second one. So the type 1 error is the one that says, well, if it is actually the status quo, but I claim that there is a discovery, if it's actually h0 but I claim that I'm in h1, then I commit a type 1 error. And the probability of type 1 error is this function alpha of psi, which is the probability that psi is equal to 1 when theta is in theta 0.
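To connect the rejection region with the probability of type 1 error, here is a hedged sketch in R; the sample size, the threshold 1.96, and the number of simulations are assumptions made for illustration, not values fixed in this lecture:

# Monte Carlo estimate of alpha(psi) for the two-sided coin-flip test
# psi = indicator{ |sqrt(n) (xbar - 0.5) / sqrt(xbar (1 - xbar))| > thr },
# with the data generated under H0: p = 0.5.
set.seed(2)
n   <- 100
thr <- 1.96      # threshold taken from a standard Gaussian quantile (an assumption here)
B   <- 10000     # number of simulated data sets

reject <- replicate(B, {
  x    <- rbinom(n, size = 1, prob = 0.5)
  xbar <- mean(x)
  Tn   <- sqrt(n) * (xbar - 0.5) / sqrt(xbar * (1 - xbar))
  as.integer(abs(Tn) > thr)   # 1 exactly when the sample falls in the rejection region
})

mean(reject)     # estimate of the probability of type 1 error; roughly 5% here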
Now, the problem is that this is not just a number, because theta is moving all over theta 0, right? There's many values that theta can take. So theta is somewhere here. I erased it, OK.

All right. For simplicity, we're going to think of theta as being mu, with 103.5 as the value of interest, OK? So I know that this region here is theta 1, and just this point here was theta 0, OK? Agreed? This is with the Cherry Blossom Run. Now, in this case it's actually easy. I need to compute this function alpha of psi, which maps theta in theta 0 to P theta of psi equals 1, the probability that I reject when theta is in theta 0. There's only one of them to compute, because theta can only take this one value, which is 103.5. So that's the probability that I reject when the true mean was 103.5. Now, if h0 was this entire guy here, all the values larger than 103.5, then I would have to compute this function for all possible values of theta in there. And guess what? The worst case is when theta is going to be here, because it's so close to the alternative that that's where I'm making the most error possible.

And then there's the type 2 error, which is defined in basically a symmetric way. It's the function that maps theta to the probability of a type 2 error: the probability that I fail to reject h0, right? If psi is equal to 0, I fail to reject h0, but the data actually came from h1, OK? So in this example, let's be clear: if I'm here, if the true mean was 100, I'm looking at the probability that, while the true mean is actually 100, I'm saying it was 103.5, or at least that it's not less than 103.5. Yeah?

STUDENT: I'm just still confused by the notation. When you say that [INAUDIBLE] theta sub 1 arrow r, I'm not sure what that notation means.

PROFESSOR: Well, this just means it's a function that maps theta 0 to R. You've seen functions, right? OK. So that's just the way you write it. That means it's a function f that goes from, say, R to R, and that maps x to x squared. OK. So here, I'm just saying I don't have to consider all possible values. I'm only considering the values in theta 0. I put R, actually; I could restrict myself to the interval [0, 1], because those are probabilities. So it's just telling me where my function comes from and where my function goes to.

And beta is a function, right? So beta psi of theta is just the probability under theta that psi is equal to 0. And I could define that for all thetas. But the only ones that lead to an error are the thetas that are in theta 1. I mean, I can define this function everywhere; it's just not going to correspond to an error, OK?

And the power of a test is the smallest, well, the power is basically 1 minus an error, 1 minus the probability of an error. So it's the probability of making a correct decision, OK? The probability of making a correct decision under h1, that's what the power is. But again, this could be a function.
Because there's many ways theta can be in h1 if h1 is an entire set of numbers, for example all the numbers that are less than 103.5. And so, what I'm doing when I define the power of a test is looking at the smallest possible one of those values, OK? So I'm looking at this function. Maybe I should actually expand a little more on this.

OK. So beta psi of theta is the probability under theta that psi is equal to 0, right? That's the probability, for theta in theta 1, which means under the alternative, that I fail to reject. And I really should have rejected, because theta was actually in theta 1, OK? So this thing here is the probability of type 2 error. Now, this is 1 minus the probability that I did reject, and I should have rejected. That's just passing to the complement, because if psi is not equal to 0, then it's equal to 1. So if I rearrange this, it tells me that the probability that psi is equal to 1 is actually 1 minus beta psi of theta.

So that's true for all thetas in theta 1. And what I'm saying is, well, this is now a good thing, right? This number being large is a good thing: it means I should have rejected, and I rejected. I want this to happen with large probability. And so, what I'm going to look at is the most conservative choice of this number, right? Rather than being super optimistic and saying, oh, but indeed if theta was actually equal to zero, I mean, if mu is equal to 0, everybody runs in 0 seconds, then with high probability I'm actually going to make no mistake. Really, I should look at the worst possible case, OK? So what I'm looking at is basically the smallest value this can take on theta 1, and that is called the power of psi, the power of the test psi, OK? So that's the smallest possible value it can take.

All right. So I'm sorry, this is a lot of definitions that you have to let sink in, and it's not super pleasant. But that's what testing is: there's a lot of jargon. Those are actually fairly simple things. Maybe you should just make a sheet for yourself and say, these are the new terms that I learned: what is a test, a rejection region, the probability of type 1 error, the probability of type 2 error, and the power. Just make sure you know what those guys are. Oh, and the null and alternative hypotheses, OK? And once you know all these things, you know what I'm talking about, you know what I'm referring to. And this is just jargon. In the end, those are just probabilities. I mean, these are natural quantities. Just, for some reason, people have been used to using different terminology.

So just to illustrate: when do I make a type 1 error, and when do I not make a type 1 error? I make a type 1 error if h0 is true and I reject h0, right? So the off-diagonal blocks are when I make an error. When I'm on the diagonal terms, h1 is true and I reject h0, that's a correct decision. When h0 is true and I fail to reject h0, that's also the correct decision to make. So I only make errors when I'm in one of the red blocks.
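As a sketch of why the power is defined through a worst case, here is a small R computation. The one-sided form of the test, the sample size n = 10, the known standard deviation of 20 minutes, and the 5% level are all assumptions made for illustration, not values specified in this part of the lecture:

# One-sided test of H0: mu = 103.5 against H1: mu < 103.5, rejecting when
# sqrt(n) (xbar - 103.5) / sigma < -qnorm(0.95). With sigma treated as known,
# the probability of rejecting has a closed form for every value of mu.
n     <- 10
sigma <- 20
mu0   <- 103.5
z     <- qnorm(0.95)

prob_reject <- function(mu) {
  # the standardized statistic is Gaussian with mean sqrt(n) (mu - mu0) / sigma
  pnorm(-z - sqrt(n) * (mu - mu0) / sigma)
}

mu_grid <- seq(80, 103.4, by = 0.1)   # a grid of alternatives mu < 103.5
min(prob_reject(mu_grid))             # the worst case over the grid stays close to 5%,
                                      # because alternatives near 103.5 are the hardest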
In that table, one red block is the type 1 error and the other red block is the type 2 error. That's all it means, OK? So you just have to know which one we call which. I mean, the naming was chosen in a pretty ad hoc way.

So to conclude this lecture, let me ask you a few questions. In a US court, the defendant is found either, let's just say for the sake of discussion, innocent or guilty, all right? It's really guilty or not guilty, but let's say innocent or guilty. When does the jury make a type 1 error? Yep?

When they say he's guilty and he's actually innocent, right? The status quo is that everybody is innocent until proven guilty. So our h0 is that the person is innocent. And so, for the probability of type 1 error, we're looking at the case where we reject the hypothesis that the person is innocent, so we conclude that this person is guilty, OK? So a type 1 error is when this person is innocent and we conclude they're guilty. What is the type 2 error?

Letting a guilty person go free, which actually, according to the constitution, is the better of the two, all right? So what we're going to try to do is control the first one, and hope for the best for the second one. How could the jury make sure that they never make a type 1 error? Always let the guy go free, right? What is the effect on the type 2 error? Yeah, it's the worst possible, right? Basically, for every guy that's guilty, you let them go. That's the worst you can do. And same thing, right? How can the jury make sure that there's no type 2 error? Always convict. What is the effect on the American budget? What is the effect on the type 1 error? Right. The effect is that basically the type 1 error is maximized.

So there's this trade-off between type 1 and type 2 error that's inherent, and that's why we have this sort of multi-objective thing: we're trying to minimize two things at the same time. And you can find many ad hoc ways to do that, right? If you've taken any optimization, you know that when you're trying to optimize two things, and one is going up while the other one is going down, the only thing you can do is make ad hoc heuristics. Maybe you try to minimize the sum of those two guys. Maybe you try to minimize 1/3 of the first guy plus 2/3 of the second guy. Maybe you try to minimize the first guy plus the square of the second guy. You can think of many ways, but none of them is more justified than the others. However, for statistical hypothesis testing, there's one that's very well justified, which is: constrain your type 1 error to be below a level that you deem acceptable, say 5%. I want to convict at most 5% of innocent people; that's what I deem reasonable. And based on that, I'm going to try to convict as many people as I can, all right? So that's called the Neyman-Pearson paradigm, and we'll talk about it next time.

All right. Thank you.