https://www.youtube.com/watch?v=vMaKx9fmJHE&list=PLUl4u3cNGP60uVBMaoNERc6knT_MgPKS0&index=11 Transcript 00:00 The following content is provided under a Creative 00:02 Commons license. 00:03 Your support will help MIT OpenCourseWare 00:06 continue to offer high-quality educational resources for free. 00:10 To make a donation or to view additional materials 00:12 from hundreds of MIT courses, visit MIT OpenCourseWare 00:16 at ocw.mit.edu. 00:17 00:19 PHILIPPE RIGOLLET: We're talking about goodness-of-fit tests. 00:22 Goodness-of-fit tests are, does my data 00:25 come from a particular distribution? 00:27 And why would we want to know this? 00:28 Well, maybe we're interested in, for example, 00:32 knowing if the zodiac signs of the Fortune 500 CEOs 00:36 are uniformly distributed. 00:38 Or maybe we actually have slightly more-- 00:41 slightly deeper endeavors, such as understanding 00:44 if you can actually apply the t-test by testing normality 00:48 of your sample. 00:49 All right? 00:49 So we saw that there's the main result-- 00:51 the main standard test for this. 00:53 It's called the Kolmogorov-Smirnov test 00:55 that people use quite a bit. 00:57 It's probably one of the most used tests out there. 01:00 And there's other versions of it that I mentioned in passing. 01:05 There's the Cramer-von Mises, and there's 01:08 the Anderson-Darling test. 01:09 Now, how would you pick one of such tests? 01:12 Well, they're always going to-- they're always 01:14 going to have their advantages and disadvantages. 01:17 And Kolmogorov-Smirnov is definitely the most widely used 01:22 because-- 01:23 well, I guess because it's a natural notion 01:24 of distance between functions. 01:26 You just look for each point how far they can be, 01:28 and you just look at the farthest 01:30 they can be everywhere. 01:31 Now, Cramer-von Mises involves an L2 distance. 01:34 So if you're not used to Hilbert spaces or notions 01:39 of Euclidean spaces, it's at least a little more complicated. 01:43 And then Anderson-Darling is definitely 01:44 even more complicated. 01:45 Now, each of these tests is going 01:47 to be more powerful against different alternatives. 01:49 So unless you can really guess which alternative 01:52 you're expecting to see, which you probably 01:54 don't, because, again, you're in a case where you want 01:56 to typically declare H0 to be the correct one, 02:00 then it's really a matter of tossing a coin. 02:04 Maybe you can run all three of them 02:06 and just sleep better at night, because all three of them 02:09 have failed to reject, for example. 02:11 All right? 02:12 So as I mentioned, one of the maybe primary goals 02:15 of testing goodness of fit is to be able to check 02:19 whether we can apply Student's test, right, 02:22 and if the Student distribution is actually 02:24 a valid distribution. 02:25 And for that, we need to have normally distributed data. 02:28 Now, as I said several times, normally distributed, 02:32 it's not a specific distribution. 02:34 It's a family of distributions that's 02:35 indexed by means and variances. 02:38 And the way I would want to test if a distribution is normally 02:41 distributed is, well, I would just 02:42 look at the most natural normal distribution 02:45 or Gaussian distribution that my data could follow. 02:48 That means that's the Gaussian distribution that 02:50 has the same mean as my data and the same empirical variance 02:53 as my data, right?
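A minimal R sketch of that plug-in idea, assuming only a numeric sample x; it is an illustration of the statistic that gets formalized next, not code from the lecture.

```r
# A sketch, not the lecture's code: the Kolmogorov-Smirnov distance between
# the empirical CDF of x and the Gaussian CDF with the same mean and the same
# empirical variance as the data.
plugin_ks_stat <- function(x) {
  n      <- length(x)
  muhat  <- mean(x)                       # xbar_n
  sighat <- sqrt(mean((x - muhat)^2))     # square root of S_n (the 1/n convention)
  xs     <- sort(x)
  Phi    <- pnorm((xs - muhat) / sighat)  # Phi_{muhat, sighat^2} at each ordered point
  # the sup of |F_n(t) - Phi(t)| is attained at a jump of F_n, from one side or the other
  max(abs((1:n) / n - Phi), abs((0:(n - 1)) / n - Phi))
}

set.seed(1)
plugin_ks_stat(rnorm(100, mean = 3, sd = 2))  # a hypothetical sample
```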
02:55 And so I'm going to be given some points x1, xn, 03:00 and I'm going to be asking, are those Gaussian? 03:02 03:04 That means this is equivalent to, say, 03:07 are they N mu sigma squared for some mu, sigma squared? 03:15 And of course, the natural choice 03:17 is to take mu hat to be-- 03:20 mu to be equal to mu hat, which is xn bar. 03:23 And sigma squared to be sigma squared hat to be, 03:30 well, Sn hat-- 03:32 Sn-- what we wrote Sn, which is 1/n sum from i equal 1 to n 03:37 of xi minus xn bar squared. 03:40 OK? 03:41 So this is definitely the natural one 03:44 you would want to test. 03:45 And maybe you could actually just close your eyes 03:47 and just stuff that in a Kolmogorov-Smirnov test. 03:52 OK? 03:53 So here, there's a few things that don't work. 03:55 The first one is that Donsker's theorem does not 03:57 work anymore, right? 03:58 Donsker's theorem was the one that 03:59 told us that, properly normalized, 04:02 this thing would actually converge 04:04 to the supremum of a Brownian bridge, which is not true anymore. 04:07 So that's one problem. 04:08 But there's actually an even bigger problem. 04:10 This statistic, we will check in a second, 04:13 actually is-- 04:15 is pivotal itself, right, the statistic is pivotal. 04:19 It does not have a distribution that 04:20 depends on the unknown parameters, which 04:22 is sort of nice, at least under the null. 04:24 However, the distribution is not the same 04:27 as the one that had fixed mu and sigma. 04:31 The fact that they come from some random variables 04:34 is actually distorting the distribution itself. 04:36 And in particular, the quantiles are going to be distorted, 04:39 and we hinted at that last time. 04:41 So one other thing I need to tell you, though, 04:44 is that this thing actually-- so I know there's some-- 04:48 oh, yeah, that's where there's a word missing. 04:51 So we compute the quantiles for this test statistic. 04:54 And so what I need to promise to you 04:56 is that these quantiles do not depend 04:59 on any unknown parameter, right? 05:01 I mean, it's not clear, right? 05:06 So I want to test whether my data has some Gaussian 05:09 distribution. 05:09 So under the null, all I know is that my xi's are 05:15 Gaussian with some mean mu and some variance sigma squared, 05:18 which I don't know. 05:19 So it could be the case that when 05:20 I try to understand the distribution of this quantity 05:23 under the null, it depends on mu and sigma, which I don't know. 05:28 So we need to check that this is not the case. 05:30 And what's actually our redemption 05:33 here is actually going to be the supremum. 05:37 The supremum is going to basically allow 05:39 us to, say, sup out mu and sigma squared. 05:43 So let's check that, right? 05:44 So what I'm interested in is this quantity, the supremum 05:48 over t in R of the difference between Fn of t 05:54 and, what I write, phi mu hat sigma squared of t. 06:02 So phi mu hat sigma hat squared-- 06:07 sorry, sigma hat squared-- 06:09 is the CDF of some Gaussian with mean mu hat and variance sigma 06:15 hat squared. 06:16 And so in particular, this thing here, phi hat of mu hat-- 06:24 sorry, phi of mu hat sigma hat squared of t 06:30 is the probability that some x is less than t, 06:34 where x follows some N mu hat sigma hat squared.
06:39 So what it means is that by just the translation 06:42 and scaling trick that we typically do 06:44 for Gaussians to turn them into some standard Gaussian, that 06:47 implies that there exists some z, which 06:50 is standard Gaussian this time, so mean 0 and variance 1, 06:54 such that x is equal to sigma hat x-- 06:58 sorry, z plus mu hat. 07:02 Agreed? 07:04 That's basically saying that x has some Gaussian with mean mu 07:08 and variance sigma squared. 07:09 And I'm not going to say the hats every single time, OK? 07:13 So OK, so that's what it means. 07:17 So in particular, maybe I shouldn't use x here, 07:20 because x is going to be my actual data. 07:22 So let me write y. 07:23 07:27 OK? 07:29 So now what is this guy here? 07:32 It's basically-- so phi hat. 07:35 So this implies that phi mu hat sigma hat squared of t 07:42 is equal to the probability that sigma hat z 07:46 plus mu hat is less than t, which 07:50 is equal to the probability that z is less than t 07:53 minus mu hat divided by sigma hat, right? 08:00 But now when z is the standard normal, 08:02 this is really just the cumulative distribution 08:04 function of a standard Gaussian but evaluated 08:07 at a point which is not t, but t minus mu 08:09 hat divided by sigma hat. 08:11 All right? 08:11 So in particular, what I know-- so from this what I get-- well, 08:15 maybe I'll remove that, it's going to be annoying-- 08:17 I know that phi mu hat sigma hat squared-- 08:23 sorry-- phi mu hat sigma hat squared of t 08:27 is simply phi of, say, 0, 1. 08:31 And that's just the notation. 08:32 Usually we don't put those, but here it's more convenient. 08:35 So it's phi 0, 1 of t minus mu hat divided by sigma hat. 08:43 OK? 08:45 That's just something you can quickly check. 08:48 There's this nice way of writing the cumulative distribution 08:51 function for any mean and any variance 08:55 in terms of the cumulative distribution function 08:57 with mean 0 and variance 1. 08:59 All right? 08:59 Not too complicated. 09:00 All right. 09:01 So now what I'm going to say is that, OK, I have this sup 09:04 here. 09:05 So what I can write is that this thing here 09:07 is equal to the sup over t in R of 1/n. 09:12 Let me write what Fn is-- 09:14 sum from i equal 1 to n of the indicator 09:17 that xi is less than t minus phi 0, 1 09:23 of t minus mu hat divided by sigma hat. 09:27 09:30 OK? 09:32 I actually want to make a change of variable 09:34 so that this thing I'm going to call mu-- 09:36 u, sorry. 09:37 OK? 09:38 And so I'm going to make my life easier, 09:40 and I'm going to make it appear here. 09:42 And so I'm just going to replace this by indicator 09:46 that xi minus mu hat divided by sigma hat less than t 09:52 minus mu hat divided by sigma hat, which is 09:56 sort of useless at this point. 09:57 I'm just making my formula more complicated. 10:00 But now I see something here that shows up, 10:02 and I will call it u, and this is another u. 10:06 OK? 10:08 So now what it means is that when I sup over t, as t ranges 10:12 from negative infinity to plus infinity, 10:15 the new variable u also ranges from negative infinity to plus infinity, 10:17 right? 10:20 So this sup, I can actually write-- 10:22 this sup in t I can write as the sup in u, 10:34 of the indicator that xi minus mu hat divided by sigma hat 10:38 is less than u, minus phi 0, 1 of u. 10:47 Now, let's pause for one second. 10:49 Let's see where we're going.
10:51 What we're trying to show is that this thing does not 10:53 depend on the unknown parameters, say, mu and sigma, 10:57 which are the mean and the variance of x under the null. 11:01 To do that, we basically need to make 11:04 appear only quantities that are sort of invariant under these values. 11:09 So I tried to make this thing invariant under anything, 11:11 and it's just really something that depends on nothing. 11:14 It's the CDF. 11:15 It doesn't depend on sigma hat and mu hat anymore. 11:18 But sigma hat and mu hat will depend on mu and sigma, right? 11:22 I mean, they're actually good estimators of those guys, 11:24 so they should be pretty close to them. 11:26 And so I need to make sure that I'm not actually 11:28 doing anything wrong here. 11:30 So the key thing here is going to be to observe that 1/n sum 11:35 from i equal 1 to n of indicator of xi minus mu hat divided 11:40 by sigma hat less than u, which is the first term that I have 11:43 in this absolute value, well, this is what-- well, 11:48 this is equal to 1/n sum from i equal 1 to n of indicator 11:54 that-- 11:55 well, now under the null, which is 12:00 that x follows N mu sigma squared, for some mu and sigma 12:06 squared that are unknown. 12:07 But they are here. 12:08 They exist. 12:08 I just don't know what they are. 12:10 Then xi minus mu hat divided by sigma hat can be written as sigma zi plus mu 12:17 minus mu hat, divided by sigma hat, where 12:23 z is equal to x minus mu divided by sigma, right? 12:29 That's just the same trick that I wrote here. 12:32 OK? 12:33 Everybody agree? 12:34 So I just standardize-- 12:36 sorry, z-- yeah, so zi is xi minus mu divided 12:42 by sigma. 12:42 All right? 12:43 Just a standardization. 12:45 So now once I write this, I can actually 12:47 divide everybody by sigma. 12:49 12:55 Right? 12:55 So I just divided on top here and in the bottom here. 12:59 So now what I need to check is that the distribution 13:02 of this guy does not depend on mu or sigma. 13:08 That's what I claim. 13:10 What is the distribution of this indicator? 13:12 13:16 It's a Bernoulli, right? 13:19 And so if I want to understand its distribution, 13:21 all I need to do is to compute its expectation, 13:23 which is just the probability that this thing happens. 13:26 But the probability that this thing happens 13:27 is actually not depending on mu and sigma. 13:29 And the reason is that mu is what? 13:33 Well, it's x bar-- sorry, yeah, so mu hat-- sorry, is xn bar. 13:44 So mu hat, which under the null 13:50 follows N mu sigma squared over n, right? 13:54 That's the property of the average. 13:57 So when I do mu hat minus mu divided by sigma, 14:00 this thing is what distribution? 14:04 It's still a normal. 14:05 It's a linear transformation of a normal. 14:07 What are the parameters? 14:11 AUDIENCE: 0, 1/n. 14:11 PHILIPPE RIGOLLET: Yeah, 0, 1/n. 14:13 14:16 But this does not depend on mu or sigma, right? 14:26 14:29 Now, I need to check that this guy does not 14:31 depend on mu or sigma. 14:34 What is the distribution of sigma hat over sigma? 14:37 14:40 AUDIENCE: It's a chi-square, right? 14:41 PHILIPPE RIGOLLET: Yeah, it is a chi-square. 14:43 So this is actually-- 14:45 sorry, n times sigma hat squared divided by sigma squared 14:48 is a chi-square with n minus 1 degrees of freedom. 14:54 Does not depend on mu or sigma. 14:55 15:00 AUDIENCE: [INAUDIBLE] 15:02 AUDIENCE: [INAUDIBLE] 15:03 AUDIENCE: Or sigma hat squared over sigma squared? 15:05 PHILIPPE RIGOLLET: Yeah, thank you. 15:07 So this is actually divided by it.
15:10 So maybe this guy. 15:11 Let's write it like that. 15:12 This is the proper way of writing it. 15:14 Thank you. 15:14 15:20 Right? 15:21 So now I have those two things. 15:22 Neither of them depends on mu or sigma. 15:25 I have these two things. 15:28 There's just one more thing to check. 15:29 15:32 What is it? 15:32 15:35 AUDIENCE: That they're independent? 15:36 PHILIPPE RIGOLLET: That they're independent, right? 15:37 Because the dependence in mu and sigma 15:39 could be hidden in the covariance. 15:41 It could be the case that the marginal distribution of mu-- 15:44 of mu hat does not depend on mu or sigma, and that the marginal distribution 15:47 of sigma-- 15:48 of sigma hat does not depend on mu and sigma either, 15:49 so that neither marginal distribution 15:51 depends on mu or sigma, but their correlation 15:54 could depend on mu and sigma. 15:56 But we also have that if I look at-- 15:59 so if I look at-- 15:59 16:02 so since mu hat is independent of sigma hat, 16:10 it means that the joint distribution of mu hat minus mu divided 16:33 by sigma and sigma hat divided by sigma 16:38 does not depend on blah, blah, blah, on mu and sigma. 16:46 OK? 16:47 16:50 Agree? 16:52 It's not in the individual ones, and it's not 16:54 in the way they interact with each other. 16:57 It's nowhere. 16:59 AUDIENCE: [INAUDIBLE] independence be [INAUDIBLE] 17:01 theorem? 17:02 PHILIPPE RIGOLLET: Yeah, Cochran's theorem, right. 17:03 So that's something we've been using over and over again. 17:06 That's all under the null. 17:07 If my data is not Gaussian, nothing actually holds. 17:12 I just use the fact that under the null 17:14 I'm Gaussian for some mean mu and variance sigma squared. 17:17 But that's all I care about. 17:18 When I'm designing a test, I only 17:21 care about the distribution under the null, at least 17:24 to control the type I error. 17:26 Then to control the type II error, 17:28 then I cross my fingers pretty hard. 17:31 OK? 17:32 17:34 So now this basically implies what's written on the board, 17:41 that this distribution, this test statistic, 17:45 does not depend on any unknown parameters. 17:48 It's just something that's pivotal. 17:50 In particular, I could go at the back of a book 17:53 and check if there's a table for the quantiles of these things, 17:56 and indeed there are. 17:58 This is the table that you see. 18:00 So actually, this is not even in a book. 18:02 This is in Lilliefors' original paper, 1967, 18:09 as you can tell from the typewriting. 18:13 And he actually probably was rolling some dice 18:17 from his office back in the day and was checking 18:19 that this was-- he simulated it, and this is 18:22 how he computed those numbers. 18:24 And here you also have some limiting distribution, 18:28 which is not the sup of a Brownian motion over 0, 18:31 1 of-- sorry, of a Brownian bridge over 0, 18:35 1, which is the one that you would 18:36 see for the Kolmogorov-Smirnov test, 18:38 but it's something that's slightly different. 18:41 And as I said, these numbers are actually typically much smaller 18:45 than the numbers you would get, right? 18:47 Remember, we got something that was about 0.5, I think, 18:50 or maybe 0.41, for the Kolmogorov-Smirnov test 18:54 at the same entry, which means 18:56 that using the Kolmogorov-Lilliefors test 18:58 it's going to be harder for you not 18:59 to reject for the same data. 19:02 It might be the case that in one case you reject, 19:04 and in the other one you fail to reject.
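In the spirit of Lilliefors' simulation, a minimal R sketch (not the original 1967 computation) that reproduces the effect just described: the null quantiles of the plug-in statistic come out markedly smaller than the classical Kolmogorov-Smirnov ones.

```r
# A sketch in the spirit of Lilliefors' tables: simulate, under the null, both
# the plug-in statistic (Kolmogorov-Lilliefors) and the fixed-parameter one
# (Kolmogorov-Smirnov), and compare their 95% quantiles.
set.seed(42)
ks_dist <- function(x, cdf_at_sorted_x) {        # sup_t |F_n(t) - F(t)|
  n <- length(x)
  max(abs((1:n) / n - cdf_at_sorted_x), abs((0:(n - 1)) / n - cdf_at_sorted_x))
}
n <- 50
B <- 2000
stat_plugin <- stat_fixed <- numeric(B)
for (b in 1:B) {
  x      <- rnorm(n)   # N(0, 1) sample; by pivotality any mu and sigma give the same laws
  xs     <- sort(x)
  muhat  <- mean(x)
  sighat <- sqrt(mean((x - muhat)^2))
  stat_plugin[b] <- ks_dist(x, pnorm((xs - muhat) / sighat))  # estimated mu and sigma
  stat_fixed[b]  <- ks_dist(x, pnorm(xs))                     # true mu and sigma (0 and 1 here)
}
quantile(stat_plugin, 0.95)  # markedly smaller than the next one, as in Lilliefors' table
quantile(stat_fixed, 0.95)
```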
19:06 But the ordering is always that if you 19:09 fail to reject with Kolmogorov-Lilliefors, 19:12 you will fail to reject with Kolmogorov-Smirnov, right? 19:17 One direction always holds. 19:18 So that's why people tend to close their eyes 19:20 and prefer Kolmogorov-Smirnov because it just 19:23 makes their life easier. 19:25 OK? 19:27 So this is called Kolmogorov-Lilliefors. 19:29 I think there's actually an E here-- 19:33 sorry, an I before the E. Doesn't matter too much. 19:41 OK? 19:42 Are there any questions? 19:43 Yes? 19:43 AUDIENCE: Is there like a place you 19:45 can point to like [INAUDIBLE] 19:59 PHILIPPE RIGOLLET: Yeah. 20:00 AUDIENCE: [INAUDIBLE]. 20:01 PHILIPPE RIGOLLET: So the fact that it's actually 20:03 a different distribution is that here-- 20:07 so if I actually knew what mu and sigma were, 20:11 I would do exactly the same thing. 20:13 But then, rather than having this average with mu hat and sigma hat, 20:16 I would just have the-- 20:17 I would just 20:19 have the average with mu and sigma. 20:20 OK? 20:21 So what it means is that the key thing 20:23 is that what I would compare is the 1/n sum of some Bernoullis 20:29 with some parameter. 20:30 And the parameter here would be the probability that mu-- 20:34 that xi minus mu over sigma is less than t, 20:37 which is just-- 20:40 sorry, it's a Bernoulli with parameter phi 0, 1 of t. 20:44 Well, let me write what it is, right? 20:49 So it's that average minus phi 0, 1 of t. 20:57 OK? 20:57 So that's for the K-S test, and then I sup over t, right? 21:04 That's what I would have had, because this is actually 21:06 exactly the right thing. 21:08 Here I would remove the true mean. 21:10 I would divide by the true standard deviation. 21:12 So that would actually end up being a standard Gaussian, 21:15 and that's why I'm allowed to use phi 0, 1 here. 21:18 Agreed? 21:19 And these are Bernoullis because they're just indicators. 21:22 What happens in the Kolmogorov-Lilliefors test? 21:26 Well, here the Bernoulli, the only thing 21:28 that's going to change is this guy, right? 21:30 I still have a Bernoulli. 21:31 It's just that the parameters of the Bernoulli are weird. 21:34 The parameter of the Bernoulli looks like it's-- 21:37 it becomes the probability that some N(0, 1) plus some N(0, 21:47 1/n), right, divided by the square root of some chi-squared with n 22:02 minus 1 degrees of freedom divided by n, is less than t. 22:07 And those things are independent, 22:09 but those guys are not necessarily independent, right? 22:12 And so why is this probability changing? 22:14 Well, because this denominator is actually fluctuating a lot. 22:17 So that actually makes this probability different. 22:20 And so that's basically where it comes from, right? 22:23 So you could probably convince yourself 22:26 very quickly that this only makes those guys closer. 22:32 And why does it make those guys closer? 22:38 22:40 No, sorry. 22:41 It makes those guys farther, right? 22:43 And it makes those guys farther for a very clear reason, 22:46 which is that the expectation of this Bernoulli is exactly that guy. 22:51 Here I think it's going to be true 22:52 as well that the expectation of this Bernoulli 22:54 is going to be that guy, but the fluctuations 22:56 are going to be much bigger than just those of the Bernoulli. 22:58 Because the first thing I do is I 22:59 have a random parameter for my Bernoulli, 23:01 and then I flip the Bernoulli. 23:02 So fluctuations are going to be bigger than a Bernoulli.
23:04 And so when I take the sup, I'm going 23:06 to have to [INAUDIBLE] them. 23:07 So it makes things farther apart, 23:09 which makes it more likely for you to reject. 23:11 Yeah? 23:12 AUDIENCE: You also said that if you compare the same-- if you 23:16 compare the table and you set at the same level, 23:19 the Lilliefors is like 0.2, and for the Smirnov is at 0.4. 23:24 PHILIPPE RIGOLLET: Yeah. 23:25 AUDIENCE: OK. 23:26 So it means that Lilliefors is harder not to reject? 23:30 PHILIPPE RIGOLLET: It means that Lilliefors is harder 23:32 not to reject, yes, because we reject when 23:35 we're larger than the number. 23:36 So the number being smaller with the same data, we might be, 23:39 right? 23:40 So basically, it looks like this. 23:43 What we run-- so here we have the distribution for the-- 23:55 so let's say this is the density for K-S. 24:05 And then we have the density for Kolmogorov-Lilliefors, K-L. OK? 24:11 And what the density of K-L looks like, 24:13 it looks like this, right? 24:22 And so if I want to squeeze in alpha here, 24:27 I'm going to have to squeeze in-- and I squeeze in alpha 24:30 here, then this is the quantile of order 1 minus alp-- 24:34 well, let's say alpha of the K-L. 24:38 And this is the quantile alpha of K-S. 24:41 So now you give me data, and what I do with it, 24:44 I check whether they're larger than this number. 24:46 So if I apply K-S, I check whether I'm larger or smaller 24:48 than this thing. 24:49 But if I apply Kolmogorov-Lilliefors, 24:51 I check whether I'm larger or smaller than this thing. 24:53 So over this entire range of values for my test statistic-- 24:56 because it is the same test statistic, 24:58 I just plugged in mu hat and sigma hat-- 25:00 for this entire range, the two tests have different outcomes. 25:04 And this is a big range in practice, right? 25:06 I mean, it's between-- 25:08 I mean, it's pretty much at scale here. 25:10 25:13 OK? 25:14 25:18 Any other-- yeah? 25:18 AUDIENCE: [INAUDIBLE] when n goes to infinity, the two tests 25:21 become the same now, right? 25:24 PHILIPPE RIGOLLET: Hmmm. 25:25 AUDIENCE: Looking at that formula-- 25:27 PHILIPPE RIGOLLET: Yeah, They should become the same 25:29 very far. 25:30 25:32 Let me see, though, because-- 25:34 right. 25:35 So here we have 8-- 25:38 so here we have, say, for 0.5, we get 0.886. 25:44 And for-- oh, I don't have it. 25:45 25:49 Yeah, actually, sorry. 25:50 So you're right. 25:51 You're totally right. 25:52 This is the Brownian bridge values. 25:56 Because in the limit by, say, Slutsky-- 26:00 sorry, I'm lost. 26:02 Yeah, these are the values that you 26:03 get for the Brownian bridge. 26:04 Because in the limit by Slutsky, this thing 26:07 is going to have no fluctuation, and this thing 26:09 is going to have no fluctuation. 26:11 So they're just going to be pinned down, 26:12 and it's going to look like as if I did not replace anything. 26:15 Because in the limit, I know those guys much faster-- 26:18 the mu hat and sigma hat converge 26:20 much faster to mu and sigma than the distribution itself, right? 26:25 So those are actually going to be negligible. 26:27 You're right. 26:29 Actually even, I didn't have-- 26:31 these are actually the numbers I showed you 26:32 for the bridge, the Brownian bridge, 26:34 last time, because I didn't have it for the Kolmogorov-Smirnov 26:36 one. 26:38 OK? 26:38 26:41 So there's actually-- so those are numerical ways of checking 26:44 things, right? 26:45 I give you data. 26:47 You just crank the Kolmogorov-Smirnov test. 
26:50 Usually you press F5 on MATLAB. 26:52 But let's say you actually compute this entire thing, 26:55 and there's a number that comes out, 26:57 and you decide whether it's large enough or small enough. 27:00 Of course, statistical software is going to make your life even 27:02 simpler by spitting out a p-value, because you can-- 27:05 I mean, if you can compute quantiles, you can also 27:07 compute p-values. 27:09 And so your life is just fairly easy. 27:12 You just have red is bad, green is good, and then you can go. 27:18 The problem is whether those are numbers you want to rely on. 27:21 But let's say you actually reject. 27:23 Let's say you reject. 27:23 Your p-value is actually just like slightly below 5%. 27:29 So you can say, well, maybe I'm just going to change 27:33 my p-value-- 27:34 my threshold to 1%, but you might 27:36 want to see what's happening. 27:38 And for that you need a visual diagnostic. 27:40 Like, how do I check if something departs 27:42 from being normal, for example? 27:44 How do I check if a distribution-- 27:46 why is a distribution not a uniform distribution? 27:49 Why is a distribution not an exponential distribution? 27:51 There's many, many ways, right? 27:53 If I have an exponential distribution 27:54 and half of my values are negative, 27:57 for example, well, there's like pretty obvious reasons 27:59 why it should not be exponential. 28:01 But it could be the case that it's 28:03 just that the tails are a little heavier 28:05 or there's more concentration at some point. 28:08 Maybe it has two modes. 28:10 There's things like this. 28:11 But the real thing is, we don't believe 28:13 that the Gaussian is so important because of what it 28:16 looks like close to 0. 28:19 What we like about the Gaussian is that the tails here 28:22 decay at this rate-- exponential of minus x 28:24 squared over 2 that we described in maybe the first lecture. 28:28 And in particular, if there were like kinks around here, 28:31 it wouldn't matter too much. 28:33 This is not what causes issues for the Gaussian. 28:36 And so what we want is to have a visual diagnostic that tells us 28:41 if the tails of my distribution are 28:44 comparable to the tails of a Gaussian one, for example. 28:48 And those are what's called quantile-quantile plots, 28:51 and in particular-- or QQ plots. 28:54 And the basic QQ plots we're going to be using 28:58 are the ones that are called normal QQ plots that 29:00 are comparing your data to a Gaussian distribution, 29:03 or a normal distribution. 29:05 But in general, you could be comparing your data 29:07 to any distribution you want. 29:09 And the way you do this is by comparing 29:11 the quantiles of your data, the empirical quantiles, 29:14 to the quantiles of the actual distribution 29:16 you're trying to compare yourself to. 29:19 So this, in a way, is a visual way 29:22 of performing these goodness-of-fit tests. 29:25 And what's nice about visual is that there's room for debate. 29:29 You can see something that somebody else cannot see, 29:31 and you can always-- because you want to say that things are 29:33 Gaussian. 29:34 And we'll see some examples where you can actually say it 29:36 if you are good at debate, but it's actually 29:41 going to be clearly not true. 29:44 All right. 29:44 So this is a quick and easy check. 29:46 That's something I do all the time. 29:48 You give me data, I'm just going to run this. 29:49 It's one of the first things I do, so I 29:51 can check if I can start entering the Gaussian 29:54 world without compromising myself too much.
29:57 And the idea is to say, well, if F is close to-- if F-- 30:04 if my data comes from an F, and if I 30:07 know that Fn is close to F, then rather 30:10 than computing some norm, some number that tells me 30:12 how far they are, summarizing how far they are, 30:14 I could actually plot the two functions 30:16 and see if they're far apart. 30:17 So let's think for one second what this kind of a plot 30:21 would look like. 30:23 Well, I would go between 0 and 1. 30:25 That's where everything would happen. 30:26 Let's say my distribution is the Gaussian distribution. 30:29 So this is the CDF of N(0, 1). 30:35 And now I have this guy that shows up, and remember 30:37 we had this piecewise constant. 30:39 30:44 Well, OK, let's say we get something like this. 30:46 We get a piecewise constant distribution for Fn, right? 30:51 30:54 Just from this, and even despite my bad skills at drawing, 31:00 it's clear that it's going to be hard 31:01 for you to distinguish those two things, 31:03 even for a fairly large amount of points. 31:05 Because the problem is going to happen here, 31:08 and those guys look pretty much the same everywhere 31:11 you are here. 31:11 You're going to see differences maybe in the middle, 31:14 but we don't care too much about those differences. 31:17 And so what's going to happen is that you're 31:19 going to want to compare those two things. 31:20 And this is basically you have the information you want, 31:23 but visually it just doesn't render very well because you're 31:26 not scaling things properly. 31:28 And the way we actually do it is by flipping things around. 31:32 And rather than comparing the plot of F to the plot of Fn, 31:36 we compare the plot of Fn inverse 31:38 to the plot of F inverse. 31:41 Now, if F goes from the real line to the interval 0, 1, 31:47 F inverse goes from 0, 1 to the whole real line. 31:52 So what's going to happen is that I'm 31:53 going to compare things on some intervals, which is the-- 31:57 which are the entire real line. 31:59 And then what values should I be looking at those things at? 32:02 Well, technically for F, if F is continuous I 32:05 could look at F inverse for any value that I please, right? 32:09 So I have F. And if I want to look at F inverse, 32:14 I pick a point here and I look at the value that it gives me, 32:17 and that's F inverse of, say, u, right, if this is u. 32:23 And I could pick any value I want, 32:24 I'm going to be able to find it. 32:25 The problem is that when I start to have 32:27 this piecewise constant thing, I need 32:30 to decide what value I assign for anything that's 32:33 in between two jumps, right? 32:35 And so I can choose whatever I want, 32:38 but in practice it's just going to be things 32:40 that I myself decide. 32:42 Maybe I can decide that this is the value. 32:44 Maybe I can decide that the value is here. 32:46 But for all these guys, I'm going to pretty much decide 32:49 always the same value, right? 32:51 If I'm in between-- 32:52 for this value u, for this jump the jump is here. 32:56 So for this value, I'm going to be 32:59 able to decide whether I want to go above or below, 33:02 but it's always this value that's going to come out. 33:05 So rather than picking values that are in between, 33:07 I might as well just pick only values 33:08 for which this is the value that it's going to get. 33:11 And those values are exactly 1/n, 2/n, 3/n, 4/n. 33:15 It's all the way to n/n, right? 33:17 That's exactly where the flat parts are. 33:19 We know we jump from 1/n every time. 
33:23 And so that's exactly the recipe. 33:25 It says look at those values, 1/n, 2/n, 3/n 33:29 until, say, n minus 1 over n. 33:32 And for those values, compute the inverse 33:35 of both the empirical CDF and the true CDF. 33:40 Now, for the empirical CDF, it's actually easy. 33:43 I just told you this is basically where the points-- 33:45 where the jumps occur. 33:47 And the jumps occur where? 33:49 Well, exactly at my observations. 33:53 Now, remember I need to sort those observations to talk 33:56 about them. 33:57 So the one that occurs for the i-th jump 34:00 is the i-th smallest observation, which we denoted by X sub (i). 34:07 Remember? 34:07 We had this formula that we said, well, we have x1, xn. 34:11 These are my data. 34:13 And what I'm going to sort them into 34:14 is x sub (1), which is less than or equal to x 34:18 sub (2), and so on, up to x sub (n). 34:23 OK? 34:24 So we just ordered them from smallest to largest. 34:26 And now that we've done that, we just 34:28 put this parenthesis notation. 34:30 So in particular, Fn inverse of i/n 34:34 is the location where the i-th jump occurs, 34:38 which is the i-th smallest observation. 34:40 OK? 34:42 So for this guy, these values, the y-values 34:47 are actually fairly easy. 34:49 I know it's basically my ordered observations. 34:53 The x-values are-- well, that depends on the function 34:58 F I'm trying to test. 34:59 If it's the Gaussian, it's just the corresponding quantile 35:01 of the standard Gaussian, right? 35:05 It's this phi inverse of i/n here that I need to compute. 35:08 It's the inverse of the cumulative distribution 35:11 function, which, given the formula for F, 35:13 you can actually compute or maybe estimate fairly well. 35:16 But it's something that you can find in tables. 35:18 Those are basically quantiles. 35:20 Inverses of CDFs are quantiles, right? 35:23 And so that's basically the things we're interested in. 35:28 That's why it's called quantile-quantile. 35:30 Those are sometimes referred to as theoretical quantiles, 35:34 the ones we're trying to test, and empirical quantiles, 35:37 the ones that correspond to the empirical CDF. 35:39 And so I'm plotting a plot where the x-axis is quantile. 35:44 The y-axis is quantile. 35:45 And so I call this plot a quantile-quantile plot, or QQ 35:49 plot, because, well, just say 10 times quantile-quantile, 35:54 and then you'll see why. 35:55 Yeah? 35:56 AUDIENCE: [INAUDIBLE] have to have the [INAUDIBLE]?? 35:59 PHILIPPE RIGOLLET: Well, that's just-- 36:01 we're back to the-- 36:03 we're back to the goodness-of-fit test, right? 36:06 So if you look-- 36:08 so you don't do it yourself. 36:10 That's the simple answer. 36:11 You don't-- I'm just telling you how those plots, as they 36:14 are spit out from a software, are going to look. 36:17 Now, depending on the software, there's 36:19 a different thing that's happening. 36:21 Some software is actually plotting F with the right-- 36:25 let's say you want to do normal, as you asked. 36:27 So some software is just going to use F 36:30 with mu hat and sigma hat plugged in, and that's fine. 36:33 Some software is actually not going to do this. 36:36 It's just going to use a standard Gaussian. 36:39 But then it's going to actually have 36:41 a different reference point. 36:43 So what do we want to see here? 36:45 What should happen if all these points-- 36:48 if all my points actually come from F, 36:51 from a distribution that has CDF F? 36:53 What should happen? 36:54 What should I see?
36:55 36:58 Well, since Fn should be close to F, 37:01 Fn inverse should be close to F inverse, which 37:04 means that this point should be close to that point. 37:07 This point should be close to that point. 37:08 So ideally, if I actually pick the right F, 37:13 I should see a plot that looks like this, something where 37:19 all my points are very close to the line y 37:24 is equal to x, right? 37:26 And I'm going to have some fluctuations, 37:28 but something very close to this. 37:31 Now, that's if F is exactly the right one. 37:34 If F is not exactly the right one, in particular, 37:36 in the case of a Gaussian one, if I actually 37:40 plotted here the quantiles-- 37:43 so if I plotted phi 0, 1 of t, right? 37:52 So let's say those are the ones I actually plot, 37:54 but I really don't know what-- mu hat is not 0 37:57 and sigma hat is not 1. 37:59 And so this is not the one I should be getting. 38:01 Since we actually know that phi of mu hat sigma hat 38:06 squared t is equal to phi 0, 1 of t minus mu hat divided 38:12 by sigma hat, there's just this change 38:16 of axis, which is actually very simple. 38:19 This change of axis is just a simple translation scaling, 38:22 which means that this line here is 38:26 going to be transformed into another line 38:28 with a different slope and a different intercept. 38:31 And so some software will actually decide 38:34 to go with this curve and just show you 38:37 what the reference curve should be, 38:39 rather than actually putting everything back 38:41 onto the 45-degree curve. 38:43 AUDIENCE: So if you get any straight line? 38:45 PHILIPPE RIGOLLET: Any straight line, you're happy. 38:47 I mean, depending on the software. 38:49 Because if the software actually really rescaled this thing 38:53 to have mu hat and sigma hat squared and you find a different line, 38:56 a different straight line, this is 38:58 bad news, which is not going to happen actually. 39:01 It's impossible that happens, because you actually-- well, 39:05 it could. 39:06 If it's crazy, it could. 39:07 It shouldn't be very crazy. 39:09 OK. 39:10 So let's see what R does for us, for example. 39:14 So here in R, R actually does this funny trick where-- 39:20 so here I did not actually plot the lines. 39:22 I should actually add the lines. 39:23 So the command is like qqnorm of my sample, right? 39:27 And that's really simple. 39:28 I just stack all my data into some vector, say, x. 39:33 And I say qqnorm of x, and it just spits this thing out. 39:40 OK? 39:40 Very simple. 39:42 But I could actually add another command, 39:44 which I can't remember. 39:45 I think it's like qqline, and it's just going 39:50 to add the line on top of it. 39:52 But if you see, actually what R does for us, 39:55 it's actually doing the translation and scaling 39:58 on the axes themselves. 40:01 So it actually changes the x and y-axis in such a 40:05 way that when you look at your picture 40:07 and you forget about what the meanings of the axes are, 40:09 the relevant straight line is actually 40:11 still the 45-degree line. 40:13 Because it's actually done the change of units for you. 40:17 So you don't have to even see the line. 40:19 You know that, in your mind, that this is basically-- 40:21 the reference line is still 45 degrees because that's 40:25 the way the axes are made. 40:27 But if I actually put my axes, right-- so here, for example, 40:29 it goes from-- 40:31 let's look at some-- 40:32 well, OK, those are all square.
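A minimal sketch of those R commands, together with the same picture assembled by hand from the recipe above; x is just a simulated sample here, and the grid (i - 1/2)/n is one common convention, slightly different from the plain i/n grid, to avoid evaluating the inverse CDF at 1.

```r
# A sketch of the built-in commands, plus the same plot assembled by hand.
set.seed(2)
x <- rnorm(50, mean = 1, sd = 2)   # a made-up sample; pretend it is the data

qqnorm(x)   # empirical quantiles of x against theoretical N(0, 1) quantiles
qqline(x)   # reference line (not the 45-degree line here, since mu and sigma are not 0 and 1)

# By hand: sorted observations against Phi^{-1} on a regular grid.
# (i - 1/2)/n instead of i/n is one common way to avoid Phi^{-1}(1) = +Inf.
n  <- length(x)
th <- qnorm(((1:n) - 0.5) / n)     # theoretical quantiles
plot(th, sort(x), xlab = "Theoretical quantiles", ylab = "Empirical quantiles")
abline(lm(sort(x) ~ th))           # a rough reference line through the points
```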
40:36 Yeah, and that's probably because they actually have-- 40:38 the samples are actually from a standard normal. 40:41 So I did not make my life very easy 40:43 to illustrate your question, but of course, I 40:45 didn't know you were going to ask it. 40:46 Next time, let's just prepare. 40:49 Let's script more. 40:50 We'll see another one in the next plot. 40:52 But so here what you expect to see 40:54 is that all the plots should be on the 45-degree line, right? 40:58 This should be the right one. 40:59 And if you see, when I start having 10,000 samples, 41:02 this is exactly what's happening. 41:04 So this is as good as it gets. 41:05 This is an N(0, 1) plotted against the theoretical 41:08 quantile of an N(0, 1). 41:10 As good as it gets. 41:12 And if you see, for the second one, which is 50, 41:15 sample size of size-- 41:16 sample of size 50, there is some fudge factor, right? 41:19 I mean, those things-- 41:20 doesn't look like there's a straight line, right? 41:22 It sort of appears that there are some weird things happening 41:24 here at the lower tail. 41:27 And the reason why this is happening 41:29 is because we're trying to compare the tails, right? 41:32 When I look at this picture, the only thing that goes wrong 41:34 somehow is always at the tip, because those 41:37 are sort of rare and extreme values, 41:39 and they're sort of all over the place. 41:41 And so things are never really super smooth and super clean. 41:44 So this is what your best shot is. 41:46 This is what you will ever hope to get. 41:49 So size 10, right, so you have 10 points. 41:52 Remember, we actually-- well, I didn't really 41:54 tell you how to deal with the extreme cases. 41:56 Because the problem is that F inverse of 1 for the true F 41:59 is plus infinity. 42:01 So you have to make some sort of weird boundary choices 42:04 to decide what F inverse of 1 is, and it's something 42:07 that's like somewhere. 42:09 But you still want to put like 10 dots, right? 42:11 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 dots. 42:15 So I have 10 observations, you will see 10 dots. 42:17 I have 50 observations, you will see 50 dots, right, 42:21 because I have-- 42:22 there are 1/n, 2/n, 3/n all the way to n/n. 42:26 I didn't tell you the last one. 42:29 OK. 42:29 So this is when things go well, and this is 42:31 when things should not go well. 42:32 OK? 42:33 So here, actually, the distribution 42:35 is a Student's t with 15 degrees of freedom, 42:37 which should depart somewhat from a Gaussian distribution. 42:41 The tails should be heavier. 42:44 And what you can see is basically the following, 42:47 is that for 10 you actually see something that's crazy, right, 42:51 if I do 10 observations. 42:52 But if I do 50 observations, honestly, it's 42:55 kind of hard to say that it's different 42:56 from the standard normal. 42:58 So you could still be happy with this for 100. 43:01 And then this is what's happening for 10,000. 43:03 And even here it's not the beautiful straight line, 43:06 but it feels like you would be still tempted 43:08 to conclude that it's a beautiful straight line. 43:11 So let's try to guess. 43:13 So basically, there's-- for each of those sides there's two 43:18 phenomena. 43:18 Either it goes like this or it goes like this, 43:22 and then it goes like this or it goes like this. 43:24 Each side corresponds to the left tail, all the smallest 43:28 values. 43:29 So that's the left side. 43:30 And that's the right side-- corresponds 43:31 to the large values. 43:33 OK? 
43:33 And so basically you can actually 43:35 think of some sort of a table that tells you 43:40 what your QQ plot looks like. 43:41 43:47 And so let's say it looks-- 43:48 so we have our reference 45-degree line. 43:50 So let's say this is the QQ plot. 43:52 That could be one thing. 43:54 This could be the QQ plot where I have another thing. 43:59 Then I can do this guy, and then I do this guy. 44:08 So this is like this. 44:10 OK? 44:11 So those are the four cases. 44:13 OK? 44:14 And here what's changing is the right tail, 44:19 and here what's changing is the-- 44:20 and when I go from here to here, what changes is the left tail. 44:24 Is that true? 44:26 No, sorry. 44:27 What changes here is the right tail, right? 44:29 It's this part that changes from top to bottom. 44:34 So here it's something about right tail, 44:38 and here that's something about left tail. 44:40 44:44 Everybody understands what I mean when I talk about tails? 44:46 OK. 44:48 And so here it's just going to be 44:50 a question of whether the tails are heavier 44:52 or lighter than the Gaussian. 44:54 Everybody understand what I mean when I say 44:56 heavy tails and light tails? 44:58 OK. 44:59 So right, so heavy tails just means 45:01 that basically here the tails of this guy 45:04 are heavier than the tails of this guy. 45:06 So it means that if I draw them, they're going to be above. 45:08 Actually, I'm going to keep this picture because it's 45:10 going to be very useful for me. 45:11 45:16 When I plug the quantiles at the same-- so let's 45:19 look at the right tail, for example. 45:21 Right here my picture is for right tails. 45:23 When I look at the quantiles of my theoretical distribution-- 45:26 so here you can see the bottom curve 45:28 we have the theoretical quantiles, 45:31 and those are the empirical quantiles. 45:34 If I look to the right here, are the theoretical quantiles 45:39 larger or smaller than the empirical quantiles? 45:41 45:47 Let me phrase it the other-- 45:48 are the empirical quantiles larger or smaller 45:50 than the theoretical quantiles? 45:53 AUDIENCE: This is a graph of quantiles, right? 45:56 So if it's [INAUDIBLE] it should be smaller. 45:59 PHILIPPE RIGOLLET: It should be smaller, right? 46:01 On this line, they are equal. 46:04 So if I see the empirical quantile showing up here, 46:07 it means that here the empirical quantile is less 46:10 than the theoretical quantile. 46:12 Agree? 46:13 So that means that if I look at this thing-- 46:16 and that's for the same values, right? 46:18 So the quantiles are computed for the same values i/n. 46:22 So it means that the empirical quantiles should be looking-- 46:25 so that should be the empirical quantile, 46:29 and that should be the theoretical quantile. 46:32 Agreed? 46:34 Those are the smaller values for the same alpha. 46:37 So that implies that the tails-- 46:41 the right tail, is it heavy or lighter-- 46:43 heavier or lighter than the Gaussian? 46:45 46:50 AUDIENCE: Lighter. 46:51 PHILIPPE RIGOLLET: Lighter, right? 46:52 Because those are the tails of the Gaussian. 46:54 Those are my theoretical quantiles. 46:55 That means that this is the tail of my empirical distribution. 46:59 So they are actually lighter. 47:00 47:08 OK? 47:09 So here, if I look at this thing, 47:11 this means that the right tail is actually light. 47:18 And by light, I mean lighter than Gaussian. 47:20 Heavy, I mean heavier than Gaussian. 47:22 OK? 47:23 OK, now we can probably do the entire thing. 
47:27 Well, if this is light, this is going to be heavy, right? 47:31 That's when I'm above the curve. 47:33 47:36 Exercise-- is this light or is this heavy, the first column? 47:40 47:46 And it's OK. 47:47 It should take you at least 30 seconds. 47:51 AUDIENCE: [INAUDIBLE] different column? 47:53 PHILIPPE RIGOLLET: Yeah, this column, right? 47:54 So this is something that pertains-- 47:56 this entire column is going to tell me whether the fact 47:59 that this guy is above, does this 48:01 mean that I have lighter or heavier left tails? 48:06 AUDIENCE: Well, on the left, it's heavier. 48:09 PHILIPPE RIGOLLET: On the left, it's heavier. 48:11 OK. 48:12 I don't know. 48:12 Actually, I need to draw a picture. 48:14 You guys are probably faster than I am. 48:17 AUDIENCE: [INTERPOSING VOICES]. 48:19 PHILIPPE RIGOLLET: Actually, let me 48:21 check how much randomness is-- 48:23 who says it's lighter? 48:26 Who says it's heavier? 48:27 AUDIENCE: Yeah, but we're biased. 48:29 AUDIENCE: [INAUDIBLE] 48:30 PHILIPPE RIGOLLET: Yeah, OK. 48:32 AUDIENCE: [INAUDIBLE] 48:33 PHILIPPE RIGOLLET: All right. 48:34 So let's see if it's heavier. 48:36 So we're on the left tail, and so we have one looks like this, 48:40 one looks like that, right? 48:41 48:45 So we know here that I'm looking at this part here. 48:49 So it means that here my empirical quantile is larger 48:52 than the theoretical quantile. 48:53 48:58 OK? 49:00 So are my tails heavier or lighter? 49:02 49:06 They're lighter. 49:07 That was a bad bias. 49:08 AUDIENCE: [INAUDIBLE] 49:10 PHILIPPE RIGOLLET: Right? 49:11 It's below, so it's lighter. 49:14 Because the problem is that larger for the negative ones 49:19 means that it's smaller [INAUDIBLE],, right? 49:22 Yeah? 49:23 AUDIENCE: Sorry but, what exactly are these [INAUDIBLE]?? 49:26 If this is the inverse-- 49:28 if this is the inverse CDF, shouldn't everything-- 49:32 well, if this is the inverse CDF, 49:34 then you should only be inputting 49:36 values between 0 and 1 in it. 49:38 And-- 49:40 PHILIPPE RIGOLLET: Oh, did I put the inverse CDF? 49:42 AUDIENCE: Like on the previous slide, I think. 49:46 PHILIPPE RIGOLLET: No, the inverse 49:48 CDF, yeah, so I'm inputting-- 49:49 AUDIENCE: Oh, you're [INAUDIBLE].. 49:51 PHILIPPE RIGOLLET: Yeah, so it's a scatter plot, right? 49:53 So each point is attached-- each point 49:56 is attached 1/n, 2/n, 3/n. 49:59 Now, for each point I'm plotting, 50:01 that's my x-value, which maps a number between 0 and 1 50:05 back onto the entire real line, and my y-value is the same. 50:09 OK? 50:10 So what it means is that those two numbers, this is in the-- 50:14 this lives on the entire real line, not on the interval. 50:17 This lives on the entire real line, not in the interval. 50:20 And so my QQ plots take values on the entire real line, 50:26 entire real line, right? 50:28 So you think of it as a parameterized curve, where 50:31 the time steps are 1/n, 2/n, 3/n, 50:34 and I'm just like putting a dot every time I'm making one step. 50:38 OK? 50:41 OK, so what did we say? 50:43 That was lighter, right? 50:46 AUDIENCE: [INAUDIBLE] 50:51 PHILIPPE RIGOLLET: OK? 50:54 One of my favorite exercises is, here's a bunch of densities. 50:58 Here's a bunch of QQ plots. 51:00 Map the correct QQ plot to its own density. 51:04 All right? 51:05 And there won't be mingled lines that allow you to do that, 51:09 then you just have to follow, like at the back of cereal 51:11 boxes. 51:13 All right. 51:15 Are there any questions? 
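For practicing that kind of exercise, a minimal R sketch that draws samples from a few of the distributions discussed in this lecture and puts their normal QQ plots side by side.

```r
# A sketch for building practice pictures: normal QQ plots for a few
# of the distributions discussed here, to read off heavy versus light tails.
set.seed(3)
n <- 1000
samples <- list(
  "N(0,1)"        = rnorm(n),        # reference: essentially the straight line
  "Student t(15)" = rt(n, df = 15),  # slightly heavier tails on both sides
  "Cauchy"        = rcauchy(n),      # very heavy tails on both sides
  "Uniform"       = runif(n),        # light on both sides: the S shape
  "Exponential"   = rexp(n)          # no left tail, heavy right tail
)
op <- par(mfrow = c(2, 3))           # 2-by-3 grid of panels
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm)
  qqline(samples[[nm]])
}
par(op)
```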
51:17 So one thing-- there's two things 51:18 I'm trying to communicate here: 51:19 if you see a QQ plot, now you should understand, 51:22 one, how it was built, and two, whether it means that you have 51:28 heavier tails or lighter tails. 51:30 Now, let's look at this guy. 51:32 What should we see? 51:34 We should see heavy on the left and heavy on the right, right? 51:37 We know that this should be the case. 51:39 So this thing actually looks like this, and it sort of does, 51:45 right? 51:46 If I take this line going through here, 51:48 I can see that this guy's tipping here, 51:50 and this guy's dipping here. 51:52 But honestly-- actually, I can't remember exactly, but t 15, 51:57 if I plotted the density on top of the Gaussian, 52:01 you can see a difference. 52:02 But if I just gave it to you, it would be very hard 52:04 for you to tell me if there's an actual difference between t 52:07 15 and Gaussian, right? 52:08 Those things are actually very close. 52:11 And so in particular, here we're really 52:12 trying to recognize what the shape is, in fact-- 52:15 right? 52:16 So t 15 compared to a standard Gaussian was different, 52:20 but t 15 compared to a Gaussian with a slightly larger variance 52:26 is not going to actually-- you're not going 52:27 to see much of a difference. 52:29 So in a way, such distributions are actually not 52:33 too far from the Gaussian, and it's not too-- 52:35 it's still pretty benign to conclude that this was actually 52:38 a Gaussian distribution because you can just use the variance 52:42 as a little bit of a buffer. 52:43 I'm not going to get really into how 52:45 you would use a t-distribution in a t-test, 52:50 because it's kind of like Inception, right? 52:54 So but you could pretend that your data actually 52:58 is t-distributed and then build a t-test from it, 53:02 but let's not say that. 53:03 Maybe that was a bad example. 53:05 But there's like other heavy-tailed distributions like 53:08 the Cauchy distribution, which doesn't even have a mean-- 53:10 it's not even integrable because that's 53:12 as heavy as the tails get. 53:14 And this you can really tell it's going to look like this. 53:18 It's going to be like pfft. 53:22 What does a uniform distribution look like? 53:24 53:30 Like this? 53:32 It's going to be-- it's going to look like a Gaussian one, 53:37 right? 53:38 So a uniform-- so this is my Gaussian. 53:41 A uniform is basically going to look like this, 53:43 once I take the right mean and the right variance, right? 53:46 So the tails are definitely lighter. 53:48 They're 0. 53:49 That's as light as it gets. 53:51 So the light-light is going to look like this S shape. 53:55 So an S-- light-tailed distribution has this S shape. 53:59 OK? 53:59 What is the exponential going to look like? 54:02 54:06 So the exponential is positively supported. 54:08 It only has positive numbers. 54:10 So there's no left tail. 54:11 This is also as light as it gets. 54:14 But the right tail, is it heavier or lighter 54:16 than the Gaussian? 54:17 AUDIENCE: Heavier. 54:18 PHILIPPE RIGOLLET: It's heavier, right? 54:19 Its tail decays like e to the minus x rather than e to the minus 54:21 x squared. 54:22 So it's heavier. 54:23 So it means that on the left it's going to be light, 54:27 and on the right it's going to be heavy. 54:29 So it's going to be U-shaped. 54:31 OK? 54:32 54:35 That will be fine. 54:37 All right. 54:39 Any other question? 54:41 Again, two messages, like, more technical, 54:44 and you can sort of fiddle with it by looking at it.
54:47 You can definitely conclude that this 54:49 is OK enough to be Gaussian for your purposes. 54:53 Yeah? 54:53 AUDIENCE: So [INAUDIBLE] 54:59 PHILIPPE RIGOLLET: I did not hear the "if" 55:01 at the beginning of your sentence. 55:02 55:06 AUDIENCE: I would want to be lighter tail, right, 55:08 because that'll be-- it's easier to reject? 55:10 Is that correct? 55:11 55:16 PHILIPPE RIGOLLET: So what is your purpose as a-- 55:20 AUDIENCE: I want to-- 55:21 I have some [INAUDIBLE] right? 55:25 I want to be able to say I reject H0 [INAUDIBLE].. 55:28 PHILIPPE RIGOLLET: Yes. 55:29 AUDIENCE: So if you wanted to make it easier 55:32 to reject H0, then-- 55:35 PHILIPPE RIGOLLET: Yeah, in a way that's true, right? 55:37 So once you've actually factored in the mean and the variance, 55:40 the only thing that actually-- 55:43 right. 55:43 So if you have Gaussian tails or lighter-- even lighter tails, 55:47 then it's harder for you to explain deviations 55:51 from randomness only, right? 55:52 If you have a uniform distribution 55:54 and you see something which is-- 55:56 if you're uniform on 0, 1 plus some number and you see 25, 55:59 you know this number is not going to be 0, right? 56:01 So that's basically as good as it gets. 56:04 And there's basically some smooth interpolation 56:06 if you have lighter tails. 56:07 Now, if you start having something that has heavy tails, 56:10 then it's more likely that pure noise 56:12 will generate large observations and therefore discovery. 56:15 So yes, lighter tails is definitely 56:19 the better-behaved noise. 56:21 Let's put it this way. 56:22 The lighter it is, the better behaved it is. 56:24 Now, this is good-- 56:27 this is good for some purposes, but when you want to compute 56:30 actual quantiles, like exact quantiles, 56:35 then it is true in general that the quantiles of lighter-tail 56:40 distributions are going to be dominated by the-- are going 56:42 to be dominated by the-- 56:46 let's say on the right tails, are 56:47 going to be dominated by those of a heavy distribution. 56:51 That is true. 56:52 But that's not always the case. 56:54 And in particular, there's going to be 56:54 some like sort of weird points where things are actually 56:57 changing depending on what level you're actually looking 56:59 at those things, maybe 5% or 10%, 57:01 in which case things might be changing a little bit. 57:04 But if you started going really towards the tail, 57:06 if you start looking at levels alpha which are 1% or 0.1%, 57:10 it is true that it's always-- 57:13 if you can actually-- so if you see something 57:14 that looks light tail, you definitely 57:16 do not want to conclude that it's Gaussian. 57:18 You want to actually change your modeling so that it 57:21 makes your life even easier. 57:23 And you actually factor in the fact 57:25 that you can see that the noise is actually more benign 57:27 than you would like it to be. 57:30 OK? 57:31 57:34 Stretching fingers, that's it? 57:35 All right. 57:37 OK. 57:38 So I want to-- 57:40 I mentioned at some point that we had this chi-square test 57:43 that was showing up. 57:45 And I do not know what I did-- 57:47 let's just-- oh, yeah. 57:49 So we have this chi-square test that we worked on last time, 57:53 right? 57:54 So the way I introduced the chi-square test is by saying, 57:57 I am fascinated by this question. 57:59 Let's check if it's correct, OK? 58:01 Or something maybe slightly deeper-- 58:04 let's check if juries in this country 58:06 are representative of racial distribution. 
58:10 But you could actually-- those numbers here 58:14 come from a very specific thing. 58:16 That was the uniform. 58:16 That was our benchmark. 58:17 Here's the uniform. 58:19 And there was this guy, which was a benchmark, which 58:21 was the actual benchmark that we need to have for this problem. 58:24 And those things basically came out of my hat, right? 58:27 Those are numbers that exist. 58:29 But in practice, you actually make those numbers yourself. 58:33 And the way you do it is by saying, well, 58:36 if I have a binomial distribution 58:39 and I want to test if my data comes 58:41 from a binomial distribution, you 58:42 could ask this question, right? 58:44 You have a bunch of data. 58:45 I did not promise to you that this 58:48 was the sum of independent Bernoullis and [INAUDIBLE].. 58:50 And then you can actually check that it's a binomial indeed, 58:53 and you have binomial. 58:55 If you think about where you've encountered binomials, 58:57 it was mostly when you were drawing balls 58:59 from urns, which you probably don't do that much in practice. 59:02 OK? 59:02 And so maybe one day you want to model things as a binomial, 59:05 or maybe you want to model it as a Poisson, 59:07 as a limiting binomial, right? 59:08 People tell you photons arrive-- 59:11 the rate of a photon hitting some surface 59:13 is actually a Poisson distribution, right? 59:15 That's where they arise a lot in imaging. 59:18 So if I have a colleague who's taking pictures 59:21 of the skies over night, and he's like following stars 59:23 and it's just like moving around with the rotation of the Earth. 59:26 And he has to do this for like eight hours 59:28 because he needs to get enough photons over this picture 59:30 to actually arise. 59:32 And he knows they arrive at like a Poisson process, 59:35 and you know, chapter 7 of your probability class, I guess. 59:39 And 59:40 And there's all these distributions 59:43 outside the classroom you probably 59:44 want to check that they're actually correct. 59:46 And so the first one you might want to check, for example, 59:49 is a binomial. 59:49 So I give you a distribution, a binomial distribution 59:52 on, say, K trials, and you have some number p. 59:56 And here, I don't know typically what p should be, 59:59 but let's say I know it or estimate it from my data. 60:01 And here, since we're only going to deal with asymptotics, 60:04 just like it was the case for the Kolmogorov-Smirnov one, 60:07 in the asymptotic we're going to be 60:08 able to think of the estimated p as being a true p, OK, 60:13 under the null at least. 60:15 So therefore, each outcome, I can actually tell you what 60:19 the probability of a binomial-- 60:20 is this outcome. 60:21 For a given K and a given p, I can tell you 60:23 exactly what a binomial should give you 60:25 as the probability for the outcome. 60:27 And that's what I actually use to replace the numbers 1/12, 60:33 1/12, 1/12, 1/12 or the numbers 0.72, 0.7, 0.12, 0.9. 60:41 All these numbers I can actually compute 60:43 using the probabilities of a binomial, right? 60:45 So I know, for example, that the probability that a binomial np 60:52 is equal to, say, K is n choose K p to the K 1 minus p 61:02 to the n minus K. OK? 61:05 I mean, so these are numbers. 61:07 If you give me p and you give me n, 61:08 I can compute those numbers for all K from 0 to n. 61:12 And from this I can actually build a table. 61:14 61:22 All right? 61:22 So for each K-- 61:25 0. 
61:26 So K is here, and from 0, 1, et cetera, 61:31 all the way to n, I can compute the true probability, which 61:35 is the probability that my binomial np is equal to 0, 61:40 the probability that my binomial is equal to 1, et cetera, 61:45 all the way to n. 61:46 I can compute those numbers. 61:47 Those are actually going to be exact numbers, right? 61:50 I just plug in the formula that I had. 61:52 And then I'm going to have some observed. 61:54 62:01 So that's going to be p hat 0, and that's basically 62:05 the proportion of 0's, right? 62:12 So here you have to remember it's not a one-time experiment 62:16 like you do in probability where you say, 62:18 I'm going to draw n balls from an urn, 62:22 and I'm counting how many-- 62:24 how many I have. 62:25 This is statistics. 62:25 I need to be able to do this experiment many times 62:28 so I can actually, in the end, get an idea of what 62:31 the proportion of p's is. 62:33 So you have not just one binomial, 62:36 but you have n binomials. 62:38 Well, maybe I should not use n twice. 62:40 So that's why it's the K here, right? 62:42 So I have a binomial [INAUDIBLE] at Kp 62:44 and I just see n of those guys. 62:46 And with this n of those guys, I can actually 62:48 estimate those probabilities. 62:50 And what I'm going to want to check 62:51 is if those two probabilities are actually 62:53 close to each other. 62:54 But I already know how to do this. 62:57 All right? 62:58 So here I'm going to test whether P 63:00 is in some parametric family, for example, 63:02 binomial or not binomial. 63:06 And testing-- if I know that it's a binomial [INAUDIBLE], 63:09 and I basically just have to test if P is the right thing. 63:12 OK? 63:14 Oh, sorry, I'm actually lying to you here. 63:17 OK. 63:18 I don't want to test if it's binomial. 63:19 I want to test the parameter of the binomial here. 63:24 OK? 63:24 So I know-- no, sorry, [INAUDIBLE] sorry. 63:28 OK. 63:28 So I want to know if I'm in some family, 63:30 the family of binomials, or not in the family of binomials. 63:34 OK? 63:35 Well, that's what I want to do. 63:36 And so here H0 is basically equivalent to testing 63:39 if the pj's are the pj's that come from the binomial. 63:42 And the pj's here are the probabilities that I get. 63:46 This is the probability that I get j successes. 63:50 That's my pj. 63:51 That's j's value here. 63:54 OK? 63:54 So this is the example, and we know how to do this. 63:57 We construct p hat, which is the estimated 64:00 proportion of successes from the observations. 64:03 So here now I have n trials. 64:05 This is the actual maximum likelihood estimator. 64:08 This becomes a multinomial experiment, right? 64:12 So it's kind of confusing. 64:13 We have a multinomial experiment for a binomial distribution. 64:17 The binomial here is just a recipe 64:19 to create some test probabilities. 64:21 That's all it is. 64:22 The binomial here doesn't really matter. 64:24 It's really to create the test probabilities. 64:26 And then I'm going to define this test statistic, which 64:28 is known as the chi-square statistic, right? 64:36 This was the chi-square test. 64:37 We just looked at the sum of the squares of the differences. 64:41 Inverting the covariance matrix or using the Fisher information 64:45 after removing the part that was not invertible 64:46 led us to actually use this particular value here, 64:50 and then we had to multiply by n. 64:54 OK? 64:55 And that, we know, converges to what? 64:59 A chi-square distribution.
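To make the recipe on the board concrete, here is a minimal Python sketch: it builds the "true" row of the table from the binomial pmf formula, builds the "observed" row from simulated data, and forms the chi-square statistic T_n = n * sum_k (p_hat_k - p_k(theta_hat))^2 / p_k(theta_hat). The numbers K = 5, p = 0.3 and the sample size n = 500 are invented purely for illustration.

    import numpy as np
    from math import comb

    rng = np.random.default_rng(0)
    K, p_true, n = 5, 0.3, 500
    x = rng.binomial(K, p_true, size=n)        # n observations, each Bin(K, p) under the null

    p_hat = x.mean() / K                       # maximum likelihood estimator of p
    expected = np.array([comb(K, k) * p_hat**k * (1 - p_hat)**(K - k)
                         for k in range(K + 1)])            # "true" row: p_k(theta_hat)
    observed = np.bincount(x, minlength=K + 1) / n          # "observed" row: proportions p_hat_k

    T_n = n * np.sum((observed - expected) ** 2 / expected)
    print("chi-square statistic T_n =", T_n)

The division by the expected probabilities is the particular scaling that comes out of inverting the covariance, or Fisher information, mentioned just above.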
65:01 So I'm not going to go through this again. 65:03 I'm just telling you you can use the chi-square 65:05 that we've seen, where we just came up with the numbers we 65:08 were testing. 65:09 Those numbers that were in this row for the true probabilities, 65:12 we came up with them out of thin air. 65:14 And now I'm telling you you can actually 65:15 come up with those guys from a binomial distribution 65:19 or a Poisson distribution or whatever 65:20 distribution you're happy with. 65:22 65:26 Any question? 65:26 65:30 So now I'm creating this thing, and I 65:31 can apply the entire theory that I have for the chi-square 65:34 and, in particular, that this thing converges 65:36 to a chi-square. 65:38 But if you see, there's something that's different. 65:40 What is different? 65:42 65:45 The degrees of freedom. 65:47 And if you think about it, again, the meaning of degrees 65:51 of freedom. 65:52 What does this word-- 65:54 these words actually mean? 65:55 It means, well, to which extent can I 65:57 play around with those values? 65:59 What are the possible values that I can get? 66:01 If I'm not equal to this particular value I'm testing, 66:03 how many directions can I be different from this guy? 66:07 And when we had a given set of values, 66:10 we could be any other set of values, right? 66:13 So here, I had this-- 66:16 I'm going to represent-- this is the set of all probability 66:19 distributions of vectors of size K. So here, 66:23 if I look at one point in this set, 66:25 this is something that looks like p1 through pK such that 66:29 their sum-- 66:30 such that they're non-negative, and the sum p1 through pK 66:36 is equal to 1. 66:37 OK? 66:37 So I have all those points here. 66:40 OK? 66:41 So this is basically the set that I had before. 66:44 I was testing whether I was equal to this one guy, 66:47 or if I was anything else. 66:48 And there's many ways I can be anything else. 66:51 What matters, of course, is what's around this guy 66:53 that I could actually confuse myself with. 66:55 But there's many ways I can move around this guy. 66:58 Agreed? 67:00 Now I'm actually just testing something very specific. 67:04 I'm saying, well, now the p's that I 67:06 have have to come from this-- have 67:09 to be constructed from this formula, this parametric family 67:13 P of theta. 67:14 And there's a fixed way for-- let's say this is theta, 67:20 so I have a theta here. 67:23 There's not that many ways this can actually give me 67:26 a set of probabilities, right? 67:28 I have to move to another theta to actually start 67:31 being confused. 67:32 And so here the number of degrees of freedom 67:34 is basically, how can I move along this family? 67:39 And so here, this is all the points, 67:41 but there might be just the subset 67:43 of the points that looks like this, just this curve, 67:45 not the whole of this thing. 67:48 And those guys on this curve are the p thetas, 67:56 and that's for all thetas when theta runs across capital Theta. 68:00 So in a way, this is just a much smaller dimensional thing. 68:03 It's a much smaller object. 68:04 Those are only the ones that I can 68:06 create that are exactly of this very specific parametric form. 68:13 And of course, not all are of this form. 68:15 Not all probability PMFs are of this form. 68:19 And so that is going to have an effect 68:20 on what my PMF is going to be-- 68:24 sorry, on what my-- 68:28 sorry, what my degrees of freedoms are going to be.
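As a concrete version of the picture being drawn here: take a binomial on 3 trials, so a PMF vector with 4 entries. As its single parameter p varies, the family traces out only a one-dimensional curve inside the set of all PMFs on 4 points. The p values below are just an illustrative grid.

    from math import comb

    K = 3   # binomial on K trials: PMF vectors (p_0, ..., p_K) with K + 1 entries
    for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
        pmf = [comb(K, k) * p**k * (1 - p)**(K - k) for k in range(K + 1)]
        print("p =", p, " ->", [round(q, 3) for q in pmf])

Each printed vector is a valid PMF (non-negative entries summing to 1), but moving p only slides you along one curve; most PMFs on 4 points are not of this form.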
68:33 Because when this thing is very small, that means when-- 68:39 that's happening when theta is actually, 68:41 say, a one-dimensional space, then there's still 68:44 many ways I can escape, right? 68:46 I can be different from this guy in pretty 68:48 much every other direction, except for those two 68:50 directions, just when I move from here 68:53 or when I move in this direction. 68:56 But now if this thing becomes bigger, 69:00 your theta is, say, two dimensional, 69:03 then when I'm here it's becoming harder 69:06 for me to not be that guy. 69:07 If I want to move away from it, then I 69:08 have to move away from the board. 69:11 And so that means that the bigger the dimension 69:15 of my theta, the smaller the degrees of freedoms 69:18 that I have, OK, because moving out of this parametric family 69:24 is actually very difficult for me. 69:27 So if you think, for example, as an extreme case, 69:30 the parametric family that I have is basically all PMFs, 69:36 all of them, right? 69:38 So that's a stupid parametric family. 69:39 I'm indexed by the distribution itself, 69:41 but it's still finite dimensional. 69:43 Then here, I have basically no degrees of freedom. 69:46 There's no way I can actually not 69:48 be that guy, because this is everything I have. 69:51 And so you don't have to really understand 69:54 how the computation comes into the numbers of dimension 69:59 and what I mean by dimension of this current space. 70:01 But really, what's important is that as the dimension of theta 70:05 becomes bigger, I have less degrees of freedom 70:09 to be away from this family. 70:11 This family becomes big, and it's very hard for me 70:13 to violate this. 70:14 So it's actually shrinking the number of degrees 70:17 of freedom of my chi-square. 70:18 And that's all you need to understand. 70:20 When d increases, the number of degrees of freedom decreases. 70:23 And I'd like to you to have an idea of why this is somewhat 70:27 true, and this is basically the picture 70:28 you should have in mind. 70:30 70:33 OK. 70:33 So now once I have done this, I can just construct. 70:35 So here I need to check. 70:37 So what is d in the case of the binomial? 70:39 70:42 AUDIENCE: 1. 70:43 PHILIPPE RIGOLLET: 1, right? 70:43 It's just a one-dimensional thing. 70:44 And for most of the examples we're 70:46 going to have it's going to be one dimensional. 70:48 So we have this weird thing. 70:49 We're going to have K minus 2 degrees of freedom. 70:51 70:54 So now I have this thing, and I have this asymptotic. 70:59 And then I can just basically use a test that has-- 71:02 that uses the fact that the asymptotic distribution 71:04 is this. 71:05 So I compute my quantiles out of this. 71:06 Again, I made the same mistake. 71:08 This should be q alpha, and this should be q alpha. 71:11 So that's just the tail probability 71:13 is equal to alpha when I'm on the right of q alpha. 71:16 And so those are the tail probability 71:18 of the appropriate chi-square with the appropriate number 71:20 of degrees of freedom. 71:22 And so I can compute p-values, and I can do whatever I want. 71:24 OK? 71:25 So then I just like [INAUDIBLE] my testing machinery. 71:28 OK? 71:29 So now I know how to test if I'm a binomial distribution or not. 71:34 Again here, testing if I'm a binomial distribution 71:38 is not a simple goodness of fit. 71:40 It's a composite one where I can actually-- 71:43 there's many ways I can be a binomial distribution 71:45 because there's as many as there is theta. 
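Putting numbers on this: once the statistic T_n has been computed, say as in the earlier sketch, the test compares it to the right-tail quantile q_alpha of a chi-square whose degrees of freedom are (number of cells) - 1 - d, with d = 1 for the binomial. The "K minus 2" on the slide counts K possible values; in the sketch below the outcomes 0, ..., K give K + 1 cells, hence (K + 1) - 1 - 1. The value of T_n here is a placeholder, and scipy.stats.chi2 supplies the quantile and the p-value.

    from scipy.stats import chi2

    T_n = 4.2                            # placeholder: the observed value of the statistic
    K, d, alpha = 5, 1, 0.05
    dof = (K + 1) - 1 - d                # K + 1 cells (outcomes 0, ..., K), one estimated parameter

    q_alpha = chi2.ppf(1 - alpha, dof)   # right-tail quantile of chi-square(dof)
    p_value = chi2.sf(T_n, dof)          # P(chi-square(dof) > T_n)
    print("reject H0 at level", alpha, ":", T_n > q_alpha, "  p-value =", round(p_value, 3))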
71:48 And so I'm actually plugging in the theta hat, which is 71:51 estimated from the data, right? 71:54 And here, since everything's happening in the asymptotics, 71:57 I'm not claiming that Tn has a pivotal distribution 72:00 for finite n. 72:01 That's actually not true. 72:02 It's going to depend like crazy on what 72:04 the actual distribution is. 72:06 But asymptotically, I have a chi-square, 72:08 which obviously does not depend on anything [INAUDIBLE].. 72:11 OK? 72:13 Yeah? 72:14 AUDIENCE: So in general, for the binomial [INAUDIBLE] trials. 72:19 But in the general case, the number of-- 72:23 the size of our PMF is the number of [INAUDIBLE].. 72:26 PHILIPPE RIGOLLET: Yeah. 72:27 AUDIENCE: So let's say that I was also 72:29 uncertain about what K was so that I don't 72:32 know how big my [INAUDIBLE] is. 72:37 [INAUDIBLE] 72:48 PHILIPPE RIGOLLET: That is correct. 72:50 And thank you for this beautiful segue into my next slide. 72:54 So we can actually deal with the case 72:56 not only where it's infinite, which 72:57 would be the case of Poisson. 72:58 I mean, nobody believes I'm going 73:00 to get an infinite number of photons 73:02 in a finite amount of time. 73:04 But we just don't want to have to say there's got to be a-- 73:08 this is the largest possible number. 73:09 We don't want to have to do that. 73:10 Because if you start doing this and the probabilities 73:13 become close to 0, things become degenerate and it's an issue. 73:16 So what we do is we bin. 73:18 We just bin stuff. 73:19 OK? 73:20 And so maybe if I have a binomial distribution 73:23 with, say, 200,000 possible values, 73:28 then it's actually maybe not the level of precision 73:32 I want to look at this. 73:33 Maybe I want to bin. 73:33 Maybe I want to say, let's just think 73:35 of all things that are between 0 and 100 73:37 to be the same thing, between 100 and 200 the same thing, 73:40 et cetera. 73:41 And so in fact, I'm actually going to bin. 73:44 I don't even have to think about things that are discrete. 73:46 I can even think about continuous cases. 73:49 And so if I want to test if I have a Gaussian distribution, 73:51 for example, I can just approximate that by some, 73:55 say, piecewise constant function that just says that, 73:59 well, if I have a Gaussian distribution like this, 74:03 I'm going to bin it like this. 74:06 And I'm going to say, well, the probability that I'm 74:08 less than this value is this. 74:10 The probability that I'm between this and this value is this. 74:12 The probability I'm between this and this value 74:14 is this, and then this and then this, right? 74:18 And now I've turned-- 74:19 I've discretized, effectively, my Gaussian into a PMF. 74:24 The value-- this is p1. 74:26 The value here is p1. 74:28 This is p2. 74:30 This is p3. 74:32 This is p4. 74:35 This is p5 and p6, right? 74:39 I have discretized my Gaussian into six possible values. 74:41 That's just the probability that they fall into a certain bin. 74:46 And we can do this-- 74:47 if you don't know what K is, just stop at 10. 74:51 You look at your data quickly and you say, well, you know, 74:54 I have so few of them that are-- like I see maybe one 8, one 11, 75:00 and one 15. 75:01 Well, everything that's between 8 and 20 75:03 I'm just going to put it in one bin. 75:05 Because what else are you going to do? 75:07 I mean, you just don't have enough observations. 75:09 And so what we do is we just bin everything. 75:11 So here I'm going to actually be slightly abstract. 
75:14 Our bins are going to be intervals Aj. 75:16 So here-- they don't even have to be intervals. 75:18 I could go crazy and just like call the bin this guy 75:21 and this guy, right? 75:23 That would make no sense, but I could do that. 75:27 And then I'm-- and of course, you can do whatever you want, 75:30 but there's going to be some consequences in the conclusions 75:33 that you can take, right? 75:34 All you're going to be able to say 75:35 is that my distribution does not look like it 75:38 could be binned in this way. 75:40 That's all you're going to be able to say. 75:42 So if you decide to just put all the negative numbers 75:46 and the positive numbers, then it's 75:48 going to be very hard for you to distinguish 75:50 a Gaussian from a random variable that takes values 75:52 of minus 1 and plus 1 only. 75:54 You need to just be reasonable. 75:57 OK? 75:57 So now I have my pj's become the probability 76:00 that my random variable falls into bin j. 76:02 76:06 So that's pj of theta under the parametric distribution. 76:10 For the true one, whether it's parametric or not, I have a pj. 76:14 And then I have p hat j, which is 76:15 the proportion of observations that falls in this bin. 76:19 All right? 76:19 So I have a bunch of observations. 76:21 I count how many of them fall in this bin. 76:23 I divide by n, and that tells me what my estimated 76:26 probability for this bin is. 76:29 And theta hat, well, it's the same as before. 76:31 If I'm in a parametric family, I'm 76:32 just estimating theta hat, maybe the maximum likelihood 76:35 estimator, plug it in, and estimate 76:37 those pj's of theta hat. 76:39 From this, I form my chi-square, and I have exactly 76:43 the same thing as before. 76:45 So the answer to your question is, yes, you bin. 76:48 And it's the answer to even more questions. 76:51 So that's why there you can actually 76:53 use the chi-square test to test for normality. 76:56 Now here it's going to be slightly weaker, 76:58 because there's only an asymptotic theory, 77:00 whereas Kolmogorov-Smirnov and Kolmogorov-Lilliefors work 77:03 actually even for finite samples. 77:06 For the chi-square test, it's only asymptotic. 77:08 So you just pretend you actually know what the parameters are. 77:11 You just stuff them into a theta, a mu hat, 77:15 and sigma square hat. 77:16 And you just go to-- you just cross your finger 77:19 that n is large enough for everything 77:21 to have converged by the time you make your decision. 77:24 OK? 77:24 And then this is a copy/paste, with the same error actually 77:28 as the previous slide, where you just build your test based 77:31 on whether you exceed or not some quantile, 77:34 and you can also compute some p-value. 77:37 OK? 77:38 AUDIENCE: The error? 77:39 PHILIPPE RIGOLLET: I'm sorry? 77:40 AUDIENCE: What's the error? 77:41 PHILIPPE RIGOLLET: What is the error? 77:43 AUDIENCE: You said [INAUDIBLE] copy/paste [INAUDIBLE].. 77:45 PHILIPPE RIGOLLET: Oh, the error is that this 77:47 should be q alpha, right? 77:48 AUDIENCE: OK. 77:49 PHILIPPE RIGOLLET: I've been calling this q alpha. 77:51 I mean, that's my personal choice, 77:53 because I don't want to-- 77:54 I only use q alpha. 77:55 So I only use quantiles where alpha is to the right, so. 77:59 That's what statisticians-- probabilists 78:01 would use this notation. 78:02 78:07 OK. 78:07 And so some questions, right? 78:10 So of course, in practice you're going 78:11 to have some issues which translate. 78:13 I say, well, how do you pick this guy, this K? 
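Before getting to those practical questions, here is a minimal end-to-end sketch of the binned chi-square test for normality just described. The bin edges, the sample size, and the use of data quantiles to place the interior edges are illustrative choices rather than anything prescribed in the lecture; mu and sigma are simply plugged in as described, ignoring finite-sample subtleties.

    import numpy as np
    from scipy.stats import norm, chi2

    rng = np.random.default_rng(1)
    x = rng.normal(loc=2.0, scale=1.5, size=300)        # data whose normality we want to test

    mu_hat, sigma_hat = x.mean(), x.std()               # plug-in estimates (d = 2 parameters)
    inner = np.quantile(x, [1/6, 2/6, 3/6, 4/6, 5/6])   # 5 interior edges -> 6 bins A_1, ..., A_6
    edges = np.concatenate(([-np.inf], inner, [np.inf]))

    p_theta = np.diff(norm.cdf(edges, loc=mu_hat, scale=sigma_hat))  # p_j(theta_hat): CDF differences
    counts = np.bincount(np.digitize(x, inner), minlength=6)
    p_hat = counts / x.size                                          # observed proportions p_hat_j

    T_n = x.size * np.sum((p_hat - p_theta) ** 2 / p_theta)
    dof = len(p_theta) - 1 - 2                           # bins - 1 - (number of estimated parameters)
    print("T_n =", round(T_n, 2), "  p-value =", round(chi2.sf(T_n, dof), 3))

Since the simulated data really are Gaussian, the p-value should typically be large and the test should fail to reject; feeding in clearly non-Gaussian data instead should typically produce a small p-value.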
78:16 So I gave you some sort of a-- 78:17 I mean, the way we discussed, right? 78:19 You have 8 and 10 and 20, then it's ad hoc. 78:23 And so depending on whether you want to stop K at 20 78:27 or if you want to bin those guys is really up to you. 78:29 And there's going to be some considerations 78:31 about the particular problem at hand. 78:32 I mean, is it coarse-- too coarse 78:34 for your problem to decide that the observations between 8 78:38 and 20 are the same? 78:39 It's really up to you. 78:40 Maybe that's actually making a huge difference 78:42 in terms of what phenomenon you're looking at. 78:45 The choice of the bins, right? 78:46 So here there's actually some sort 78:48 of rules, which are don't use only one bin 78:51 and make sure there's actually-- don't use them too small 78:55 so that there's at least one observation per bin, right? 78:57 And it's basically the same kind of rules 78:59 that you would have to build a histogram. 79:00 If you were to build a histogram for your data, 79:02 you still want to make sure that you 79:03 bin in an appropriate fashion. 79:05 OK? 79:05 And there's a bunch of rule of thumbs. 79:08 Every time you ask someone, they're 79:09 going to have a different rule of thumb, 79:11 so just make your own. 79:13 And then there's the computation of pj 79:17 of theta, which might be a bit complicated 79:19 because, in this case, I would have 79:21 to integrate the Gaussian between this number 79:24 and this number. 79:25 So for this case, I could just say, well, 79:27 it's the difference of the CDF in that value and that value 79:30 and then be happy with it. 79:31 But you can imagine that you have some slightly more 79:33 crazy distributions. 79:34 You're going to have to somewhat compute 79:36 some integrals that might be unpleasant for you to compute. 79:39 OK? 79:40 And in particular, I said the difference 79:41 of the PDF between that value and that value of-- sorry, 79:44 the CDF between that value and that value, it is true. 79:47 But it's not like you actually have 79:49 tables that compute the CDF at any value you like, right? 79:52 You have to sort of-- 79:54 well, there might be but at some degree, 79:56 but you are going to have to use a computer typically 79:58 to do that. 80:01 OK? 80:01 And so for example, you could do the Poisson. 80:05 If I had time, if I had more than one minute, 80:07 I would actually do it for you. 80:08 But it's basically the same. 80:10 The Poisson, you are going to have an infinite tail, 80:12 and you just say, at some point I'm 80:14 going to cut everything that's larger than some value. 80:16 All right? 80:17 So you can play around, right? 80:20 I say, well, if you have extra knowledge about what you expect 80:23 to see, maybe you can cut at a certain number 80:26 and then just fold all the largest values from K minus 1 80:30 to infinity so that you actually have-- 80:35 you have everything into one large bin. 80:37 OK? 80:38 That's the entire tail. 80:39 And that's the way people do it in insurance companies, 80:42 for example. 80:42 They assume that the number of accidents you're going to have 80:45 is a Poisson distribution. 80:47 They have to fit it to you. 80:48 They have to know-- 80:49 or at least to your pool of insurance of injured people. 80:52 So they just slice you into what your character-- 80:56 relevant characteristics are, and then they 80:58 want to estimate what the Poisson distribution is. 
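The Poisson recipe sketched above, with the entire tail folded into one last bin, looks like this in code. The cutoff and the rate lambda are made-up values; in practice lambda would be estimated from the data, for instance by the sample mean.

    from scipy.stats import poisson

    lam, cutoff = 2.5, 6
    cell_probs = [poisson.pmf(k, lam) for k in range(cutoff)]   # P(X = 0), ..., P(X = cutoff - 1)
    cell_probs.append(poisson.sf(cutoff - 1, lam))              # P(X >= cutoff): the folded tail
    for k, prob in enumerate(cell_probs[:-1]):
        print(f"P(X = {k}) = {prob:.4f}")
    print(f"P(X >= {cutoff}) = {cell_probs[-1]:.4f}")
    print("sum =", round(sum(cell_probs), 10))                  # sanity check: the cells sum to 1

These cell probabilities then play exactly the same role as the binomial ones in the chi-square statistic above.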
81:00 And basically, they can do a chi-square test 81:03 to check if it's indeed a Poisson distribution. 81:06 All right. 81:07 So that will be it for today. 81:10 And so I'll be-- 81:11 I'll have your homework-- 81:13