The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

So welcome to the second lecture of 6.172, Performance Engineering of Software Systems. Today we're going to be talking about Bentley rules for optimizing work. All right, so work — does anyone know what work means? You're all at MIT, so you should know. In terms of computer programming, there's actually a formal definition of work: the work of a program on a particular input is defined to be the sum total of all of the operations executed by the program. So it's basically a gross measure of how much stuff the program needs to do, and the idea of optimizing work is to reduce the amount of stuff the program needs to do in order to improve its running time — improve its performance.

Algorithm design can produce dramatic reductions in the work of a program. For example, if you want to sort an array of elements, you can use an n log n time quicksort, or you can use an n squared time sort like insertion sort. You've probably seen this before in your algorithms courses, and for large enough values of n, an n log n time sort is going to be much faster than an n squared sort. Today I'm not going to be talking about algorithm design — you'll see more of that in other courses here at MIT, and we'll also talk a little bit about algorithm design later on this semester. We will be talking about many other cool tricks for reducing the work of a program, but I do want to point out that reducing the work of a program doesn't automatically translate to a reduction in running time, and this is because of the complex nature of computer hardware. There are a lot of things going on that aren't captured by this definition of work: instruction-level parallelism, caching, vectorization, speculation and branch prediction, and so on. We'll learn about some of these things throughout the semester. But reducing the work of a program does serve as a good heuristic for reducing its overall running time, at least to a first order.

So today we'll be learning about many ways to reduce the work of your program. The rules we will be looking at, we call Bentley optimization rules, in honor of Jon Louis Bentley. Jon Bentley wrote a nice little book back in 1982 called Writing Efficient Programs, and inside this book are various techniques for reducing the work of a computer program. If you haven't seen this book before, it's very good, so I highly encourage you to read it. Many of the original rules in Bentley's book dealt with the vagaries of computer architecture three and a half decades ago, so today we've created a new set of Bentley rules dealing just with the work of a program. We'll be talking about architecture-specific optimizations later on in the semester, but today we won't be talking about those.

One cool fact is that Jon Bentley is actually my academic great-grandfather. Jon Bentley was one of Charles Leiserson's academic advisors, Charles Leiserson was Guy Blelloch's academic advisor, and Guy Blelloch, who's a professor at Carnegie Mellon, was my advisor when I was a graduate student at CMU. So it's a nice little fact, and I had the honor of meeting Jon Bentley a couple of years ago at a conference, and he told me that he was my academic great-grandfather.
And Charles is my academic grandfather, and all of Charles's students are my academic aunts and uncles, including your TA Helen.

OK, so here's a list of all the work optimizations that we'll be looking at today. They're grouped into four categories: data structures, loops, logic, and functions. There's a list of 22 rules on this slide, and we'll actually be able to look at all of them today. So today's lecture is going to be structured as a series of mini lectures, and I'm going to be spending one to three slides on each one of these optimizations.

All right, so let's start with optimizations for data structures. The first optimization is packing and encoding your data structure. The idea of packing is to store more than one data value in a machine word, and the related idea of encoding is to convert data values into a representation that requires fewer bits. So does anyone know why this could possibly reduce the running time of a program? Yes — right, good answer: it might need fewer memory fetches. And it turns out that's correct, because a computer program spends a lot of time moving stuff around in memory, and if you reduce the number of things that you have to move around in memory, that's a good heuristic for reducing the running time of your program.

So let's look at an example. Let's say we wanted to encode dates — say, the string "September 11, 2018". You can store this using 18 bytes, one byte per character, and this would require more than two double words, because each double word is eight bytes, or 64 bits, and you have 18 bytes. And you have to move around these words every time you want to manipulate the date. But it turns out you can do better than 18 bytes. Let's assume that we only want to store years between 4096 BCE and 4096 CE. There are about 365.25 times 8192 dates in this range, which is approximately 3 million, and you can use the ceiling of log base 2 of 3 million bits to represent all of the dates within this range. The notation lg here means log base 2 — that's the notation I'll be using in this class — and log will mean log base 10. So we take the ceiling of lg of 3 million, and that gives us 22 bits. A good way to remember how to compute the log base 2 of something: the log base 2 of 1 million is about 20, and the log base 2 of 1,000 is about 10, so you can factor 3 million into 1 million times 3 and add in the log base 2 of 3 rounded up, which is 2. That gives you 22 bits, which easily fits within one 32-bit word. So now you only need one word, instead of three words as in the original representation. But with this modified representation, determining the month of a particular date will take more work, because now you're not explicitly storing the month in your representation, whereas with the string representation you are explicitly storing it at the beginning of the string. So this does take more work, but it requires less space. Any questions so far?

OK, so it turns out there's another way to store this which also makes it easy for you to fetch the month, the year, or the day of a particular date. Here we're going to use the bit-field facilities in C. We're going to create a struct called date_t with three fields — a year, a month, and a day — and the integer after the colon specifies how many bits I want to assign to that particular field in the struct. So this says I need 13 bits for the year, 4 bits for the month, and 5 bits for the day. The 13 bits for the year are because I have 8,192 possible years, so I need 13 bits to store that. For the month I have 12 possible months, so I need log base 2 of 12 rounded up, which is 4. And finally, for the day, I need log base 2 of 31 rounded up, which is 5. So in total this still takes 22 bits, but now the individual fields can be accessed much more quickly than if we had encoded the 3 million dates as consecutive integers, because you can extract the month just by writing, say, d.month — whatever you named your struct variable, followed by dot month — and that gives you the month.
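Here's a minimal sketch of that bit-field struct; the field names follow the lecture's description, and the widths come from the ranges above:

    typedef struct {
      int year  : 13;  // 8,192 possible years: 4096 BCE to 4096 CE
      int month : 4;   // 12 possible months
      int day   : 5;   // up to 31 days in a month
    } date_t;

A date declared as date_t d can then be unpacked with ordinary field accesses like d.month, and the compiler generates the shift and mask instructions for you.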
Yes? (A student points out that the struct will be padded.) Yeah, so this will actually pad the struct a little bit at the end, so you do require a little more than 22 bits — that's a good question. But this representation is much easier to access than if you had encoded the dates as consecutive integers.

Another point is that sometimes unpacking and decoding are the optimization, because sometimes it takes a lot of work to encode and decode the values and to extract them. So sometimes you want to unpack the values so that they take more space but are faster to access. It depends on your particular application — you might want to do one thing or the other, and the way to figure it out is just to experiment with it. OK, any other questions?

All right, so the second optimization is data structure augmentation. The idea here is to add information to a data structure to make common operations do less work, so that they're faster. Let's look at an example. Let's say we had two singly linked lists and we wanted to append one to the other, and let's say we only stored the head pointer to each list, and each element in the list has a pointer to the next element. Now, if you want to append one list to another list, that's going to require walking down the first list to find the last element, so that you can change the last element's pointer to point to the beginning of the next list. And this might be very slow if the first list is very long. So does anyone see a way to augment this data structure so that appending two lists can be done much more efficiently? Yes — right, the answer is to store a pointer to the last element, and we call that a tail pointer. So now we have two pointers, the head and the tail: the head points to the beginning of the list, and the tail points to the end of the list. Now you can append two lists in constant time, because you can access the last element of the first list by following the tail pointer, change the successor pointer of that last element to point to the head of the second list, and then update the tail to point to the end of the second list. So that's the idea of data structure augmentation: we added a little bit of extra information to the data structure such that appending two lists is much more efficient than in the original method, where we only had a head pointer.
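As a concrete sketch of this augmentation (the type and function names here are illustrative, not from the lecture slides):

    typedef struct node {
      int value;
      struct node *next;
    } node_t;

    typedef struct {
      node_t *head;
      node_t *tail;  // the augmentation: a pointer to the last element
    } list_t;

    // append list B onto the end of list A in constant time
    void append(list_t *A, list_t *B) {
      if (B->head == NULL) return;   // nothing to append
      if (A->head == NULL) {
        A->head = B->head;           // A was empty
      } else {
        A->tail->next = B->head;     // splice B after A's last element
      }
      A->tail = B->tail;             // update the tail pointer
    }

Without the tail field, the splice would require a walk down A, which takes time proportional to the length of A.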
Questions? OK, so the next optimization is precomputation. The idea of precomputation is to perform some calculations in advance, so as to avoid doing them at mission-critical times — to avoid doing them at runtime. So say we had a program that needed to use binomial coefficients. Here's the definition of a binomial coefficient — it's basically the choose function: you want to count the number of ways you can choose k things from a set of n things, and the formula for computing this is n factorial divided by the product of k factorial and (n minus k) factorial. Computing this choose function can actually be quite expensive, because you have to do a lot of multiplications to compute the factorials, even if the final result is not that big — you have to compute one factorial term in the numerator and two factorial terms in the denominator. And you also might run into integer overflow issues, because n factorial grows very fast — it grows like n to the n, which is even faster than 2 to the n, which is exponential. So doing this computation, you have to be very careful with integer overflow issues.

One idea to speed up a program that uses these binomial coefficients is to precompute a table of coefficients when you initialize the program, and then just perform table lookup on this precomputed table at runtime when you need a binomial coefficient. So does anyone know what the table that stores binomial coefficients is called? Yes — Pascal's triangle, good. Here's what Pascal's triangle looks like. On the vertical axis we have different values of n, and on the horizontal axis we have different values of k, and to get n choose k you just go to the nth row and the kth column and look up that entry. Pascal's triangle has the nice property that every element can be computed as the sum of the element directly above it and the element above and to the left of it — so here, 56 is the sum of 35 and 21. This gives us a nice formula to compute the binomial coefficients. We first check if n is less than k — if so, we just return 0, because we're trying to choose more things than there are in the set. If n is equal to 0, we just return 1, because here k must also be equal to 0, since we already handled the case n less than k above, and there's one way to choose zero things from a set of zero things. Then if k is equal to 0, we also return 1, because there's only one way to choose zero things from a set of any number of things — you just don't pick anything. And finally, we recursively call this choose function: we compute choose of n minus 1, k minus 1, which is the entry above and to the left, and then we add in choose of n minus 1, k, which is the entry directly above. So this is a recursive function for generating Pascal's triangle — but notice that we're still not doing precomputation, because every time we call this choose function we're making two recursive calls, and this can still be pretty expensive.

So how can we actually precompute this table? Here's some C code for precomputing Pascal's triangle — let's say we only wanted coefficients up to a CHOOSE_SIZE of 100. We initialize a 100-by-100 matrix, and then we call this init_choose function. First, it loops n from 0 up to CHOOSE_SIZE minus 1 and sets choose of (n, 0) to 1; it also sets choose of (n, n) to 1. The first line is because there's only one way to choose zero things from any number of things, and the second line is because there's only one way to choose n things from n things, which is just to pick all of them. Then we have a second loop, which goes from n equals 1 up to CHOOSE_SIZE minus 1. First we set choose of (0, n) to 0, because here the number of things we want to pick is greater than the number of things in the set — there's no way to pick more elements than the set contains. Then we loop k from 1 up to n minus 1 and apply the recursive formula: choose of (n, k) equals choose of (n minus 1, k minus 1) plus choose of (n minus 1, k). And we also set choose of (k, n) to 0 — these are all the entries above the diagonal, where you'd be choosing more things than the set contains.
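Here's a reconstruction in C of the recursive function and the precomputed-table code just described; variable names follow the lecture, and the fixed-width integer type is an assumption:

    #include <stdint.h>

    // recursive version: two recursive calls per invocation, so no savings yet
    int64_t choose_rec(int64_t n, int64_t k) {
      if (n < k) return 0;
      if (n == 0) return 1;  // k must also be 0 here
      if (k == 0) return 1;
      return choose_rec(n - 1, k - 1) + choose_rec(n - 1, k);
    }

    // precomputed version: fill in Pascal's triangle once at startup
    #define CHOOSE_SIZE 100
    int64_t choose[CHOOSE_SIZE][CHOOSE_SIZE];

    void init_choose(void) {
      for (int64_t n = 0; n < CHOOSE_SIZE; ++n) {
        choose[n][0] = 1;  // one way to choose 0 things
        choose[n][n] = 1;  // one way to choose n things from n
      }
      for (int64_t n = 1; n < CHOOSE_SIZE; ++n) {
        choose[0][n] = 0;  // choosing more things than the set has
        for (int64_t k = 1; k < n; ++k) {
          choose[n][k] = choose[n - 1][k - 1] + choose[n - 1][k];
          choose[k][n] = 0;  // entries above the diagonal
        }
      }
    }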
And now, inside the program, whenever we need a binomial coefficient less than 100, we can just do a table lookup — we just index into this choose array. Does this make sense? Any questions? It's pretty easy so far, right?

So one thing to note is that we're still computing this table at runtime, because we have to initialize it at runtime, and if we want to run our program many times, then we have to initialize this table many times. So is there a way to initialize this table only once, even though we might want to run the program many times? Yes — good: put it in the source code. So we're going to do compile-time initialization, and if you put the table in the source code, the compiler will compile this code and the table is generated at compile time, so whenever you run the program, you don't have to spend time initializing the table. The idea of compile-time initialization is to store the values of constants during compilation, thereby saving work at runtime. Let's say we wanted choose values up to 10 — this is the 10-by-10 table storing all of the binomial coefficients up to 10, and if you put this in your source code, when you run the program you can just index into this table to get the appropriate constant.

But this was just a 10-by-10 table. What if you wanted a table of 1,000 by 1,000? Does anyone actually want to type in a table of 1,000 by 1,000? Probably not. So is there any way to get around this? Yes — right, the answer is to write a program that writes your program for you, and that's called metaprogramming. Here's a snippet of code that will generate this table for you. It calls the init_choose function that we defined before, and then it just prints out C code: it prints the declaration of this choose array followed by a left brace, and then for each row of the table it prints another left brace, then the value of each entry in that row, followed by a right brace — and we do that for every row. This will give you the C code, and now you can just copy and paste it into your source code. This is a pretty cool technique for getting your computer to do work for you, and you're welcome to use it in your homeworks and projects if you need to generate large tables of constant values. So this is a very good technique to know.

Any questions? Yes — (a student asks how to get the output into the source file) — you can pipe the output of this program to a file. And yes, I think you can also write macros to generate this table, and the compiler will evaluate those macros to generate the table for you. And as Charles says, you don't actually need to copy and paste it by hand, and you can write your metaprogram in any language — you don't have to write it in C. You can write it in Python if you're more familiar with that, and it's often easier to write it using a scripting language like Python.
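A sketch of such a table generator in C — the printf-based approach the lecture describes, assuming the choose table holds 64-bit integers (hence PRId64):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void) {
      init_choose();  // fill in the table when the generator runs
      printf("int64_t choose[%d][%d] = {\n", CHOOSE_SIZE, CHOOSE_SIZE);
      for (int n = 0; n < CHOOSE_SIZE; ++n) {
        printf("  {");
        for (int k = 0; k < CHOOSE_SIZE; ++k) {
          printf("%" PRId64 ", ", choose[n][k]);
        }
        printf("},\n");
      }
      printf("};\n");
      return 0;
    }

Running this and piping its output to a file (for example, ./gen_choose > choose_table.h) gives you source code you can include directly, so the table is baked in at compile time.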
OK, so let's look at the next optimization. We've already gone through a couple of mini lectures, so congratulations to all of you who are still here. The next optimization is caching. The idea of caching is to store results that have been accessed recently, so that you don't need to compute them again in the program. Let's look at an example. Let's say we wanted to compute the hypotenuse of a right triangle with side lengths A and B. The formula for computing this is: take the square root of A times A plus B times B. It turns out that the square root operator is relatively expensive — more expensive than additions and multiplications on modern machines — so you don't want to call the square root function if you don't have to. One way to avoid doing that is to create a cache. Here I have a cache storing just the previous hypotenuse that I calculated, and I also store the values of A and B that were passed to the function. Now, when I call the hypotenuse function, I first check whether A is equal to the cached value of A and B is equal to the cached value of B, and if both of those are true, I've already computed this hypotenuse before, and I can just return the cached result. But if it's not in my cache, now I need to actually compute it — I call the square root function, store the result in the cached hypotenuse value, store A and B in the cached A and cached B respectively, and finally return the cached result.

Now, this example isn't very realistic, because my cache is only of size one, and it's very unlikely that a program is going to repeatedly call some function with the same input arguments. But you can actually make a larger cache — say, a cache of size 1,000 storing the 1,000 most recently computed hypotenuse values — and then when you call the hypotenuse function, you first check whether the answer is in your cache. Checking the larger cache is going to be more expensive, because there are more values to look at, but it can still save you time overall. And hardware also does caching for you, as we'll talk about later in the semester — the point of this optimization is that you can also do caching yourself, in software; you don't have to let the hardware do it for you. It turns out that for this particular program, caching makes it about 30% faster if you hit the cache about two-thirds of the time. So it does actually save you time if you repeatedly compute the same values over and over again. So that's caching — any questions?
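Here's the one-entry cache version in C, following the lecture's description:

    #include <math.h>

    double cached_A = 0.0, cached_B = 0.0, cached_h = 0.0;

    double hypotenuse(double A, double B) {
      if (A == cached_A && B == cached_B) {
        return cached_h;               // cache hit: skip the expensive sqrt
      }
      cached_A = A;                    // cache miss: compute and remember
      cached_B = B;
      cached_h = sqrt(A * A + B * B);
      return cached_h;
    }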
OK, so the next optimization we'll look at is sparsity. The idea of exploiting sparsity in an input is to avoid storing and computing on the zero elements of that input. The fastest way to compute on zeros is not to compute on them at all, because we know that any value plus zero is just that original value, and any value times zero is just zero — so why waste the computation when you already know the result?

So let's look at an example: matrix-vector multiplication. We want to multiply an n-by-n matrix by an n-by-1 vector. We can do dense matrix-vector multiplication by taking the dot product of each row of the matrix with the column vector, and that gives us the output vector. But dense matrix-vector multiplication performs n squared — or 36, in this example — scalar multiplies, and it turns out that only 14 of the entries in this matrix are nonzero. So you've wasted work doing the multiplications with the zero elements, because you know that zero times any other element is just zero. A better way is, instead of doing the multiplication for every element, to first check whether one of the arguments is zero, and if it is zero, you don't have to actually do the multiplication. But this is still kind of slow, because you still have to do a check for every entry in your matrix, even though many of the entries are zero.

So there's actually a pretty cool data structure that won't store these zero entries at all, and this will speed up your matrix-vector multiplication if your matrix is sparse enough. Let me describe how this data structure works. It's called compressed sparse row, or CSR. There's an analogous representation called compressed sparse column, or CSC, but today I'm just going to talk about CSR. So we have three arrays. First we have the rows array: its length is equal to the number of rows in the matrix plus 1, and each entry in the rows array stores an offset into the cols array. Inside the cols array, I'm storing the indices of the nonzero entries in each of the rows. If we take row 1, for example, we have rows of 1 equal to 2 — that means I start looking at position 2 in the cols array, and there I have the indices of the nonzero columns in row 1: they're 1, 2, 4, and 5. Then I have another array called vals, whose length is the same as the cols array, and this array stores the actual values at those indices. So the portion of the vals array for row 1 stores 4, 1, 5, and 9, because those are the nonzero entries in the first row. So the rows array just serves as an index into the cols array: you can get the starting position in the cols array for any row just by looking at the entry stored at the corresponding location in the rows array. For example, row 2 starts at location 6, so it starts there, and you have indices 3 and 5, which are its nonzero column indices.

Does anyone know how to compute the number of nonzeros in a row by looking at the rows array? Yes — right, to get the length of a row, you just take the difference between that row's offset and the next row's offset. We can see that the length of the first row is 4, because its offset is 2 and the offset for row 2 is 6 — you just take the difference between those two entries. We also have an additional entry at the end of the rows array, because we want to be able to compute the length of the last row without running off the end of the array.

So it turns out this representation will save you space if your matrix is sparse. The storage required by the CSR format is order n plus nnz, where nnz is the number of nonzeros in your matrix. The reason you have n plus nnz is that you have two arrays, cols and vals, whose lengths are equal to the number of nonzeros in the matrix, and then you also have the rows array, whose length is n plus 1. If the number of nonzeros is much less than n squared, then this is going to be significantly more compact than the dense matrix representation. However, this isn't always going to be the most compact representation — does anyone see why the dense representation might sometimes take less space? Right — if you have a relatively dense matrix, then it might take more space in the CSR representation, because you have these two length-nnz arrays. If you take the extreme case where all of the entries are nonzero, then both of those arrays are going to be of length n squared — so you already have 2 n squared entries there — and then you also need the rows array, which is of length n plus 1.
OK, so now I've given you this more compact representation for storing the matrix — how do we actually do stuff with it? It turns out that you can still do matrix-vector multiplication using this compressed sparse row format, and here's the code for doing it. We have a struct here, which is the CSR representation: we have the rows array, the cols array, and the vals array, and we also have the number of rows n and the number of nonzeros nnz. Then we call this SpMV, or sparse matrix-vector multiply: we pass in our CSR representation, which is A, and the input vector, which is x, and we store the result in an output vector y. First we loop through all of the rows and set y of i to 0 — this is just initialization. Then, for each row i, I'm going to look at the column indices of the nonzero elements, and I can do that by letting k start at rows of i and go up to rows of i plus 1. For each one of these entries, I look up the column index of the nonzero element, which is cols of k — call that j. Now I know which elements to multiply: I multiply vals of k by x of j, and I add that into y of i. After I finish all of these multiplications and additions, this gives me the same result as if I did the dense matrix-vector multiplication. This is actually a pretty cool program, so I encourage you to look at it offline to convince yourself that it computes the same thing as the dense matrix-vector multiplication version. I'm not going to prove it during lecture today, but feel free to ask me or any of your TAs after class if you have questions about it. And the number of scalar multiplications that you have to do using this code is just nnz, because you're operating only on the nonzero elements — you don't have to touch the zero elements at all. In contrast, the dense matrix-vector multiply algorithm would take n squared multiplications. So this can be significantly faster for sparse matrices.
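Here's the CSR struct and SpMV routine as described; the exact field and type choices are a reasonable guess at the slide's code:

    typedef struct {
      int n;          // number of rows
      int nnz;        // number of nonzero entries
      int *rows;      // length n + 1: offsets into cols and vals
      int *cols;      // length nnz: column indices of nonzeros
      double *vals;   // length nnz: values of nonzeros
    } sparse_matrix_t;

    // y = A * x, touching only the nonzero entries of A
    void spmv(const sparse_matrix_t *A, const double *x, double *y) {
      for (int i = 0; i < A->n; i++) {
        y[i] = 0.0;
        for (int k = A->rows[i]; k < A->rows[i + 1]; k++) {
          int j = A->cols[k];            // column of this nonzero
          y[i] += A->vals[k] * x[j];
        }
      }
    }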
So it turns out that you can also use a similar structure to store a sparse static graph, and run many classic graph algorithms — such as breadth-first search and PageRank — quite efficiently, especially when the graph is sparse. I see many of you have seen graphs in your previous courses. Here's what the sparse graph representation looks like. Again we have two arrays, offsets and edges: the offsets array is analogous to the rows array, and the edges array is analogous to the cols array in the CSR representation. In the offsets array we store, for each vertex, where its neighbors start in the edges array, and in the edges array we just write the indices of its neighbors. So let's take vertex 1, for example: the offset of vertex 1 is 2, so we know that its outgoing neighbors start at position 2 in the edges array, and we see that vertex 1 has outgoing edges to vertices 2, 3, and 4 — and indeed 2, 3, 4 are listed there. You can also get the degree of each vertex, which is analogous to the length of a row, by taking the difference between consecutive offsets: here we see that the degree of vertex 1 is 3, because its offset is 2 and the offset of vertex 2 is 5. Using this representation would be much more efficient than using a dense matrix to represent the graph and running these algorithms on that.

You can also store weights on the edges. One way to do that is to create an additional array called weights, whose length is equal to the number of edges in the graph, and store the weights in that array — this is analogous to the vals array in the CSR representation. But there's actually a more efficient way to store this if you always need to access the weight whenever you access an edge, and that is to interleave the weights with the edges — to store the weight for a particular edge right next to that edge, in a single array of twice the number of edges in the graph. The reason this is more efficient is that it gives you improved cache locality. We'll talk much more about cache locality later in this course, but the high-level idea is that whenever you access an edge, the weight for that edge will also likely be on the same cache line, so you don't need to go to main memory again to fetch the weight of that edge. Later in the semester we'll have a whole lecture on optimizations for graph algorithms — today I'm just talking about one representation of graphs, but we'll talk much more about this later. Any questions?

OK, so that's it for the data structure optimizations. We still have three more categories of optimizations to go over, so it's a pretty fun lecture — we get to learn about many cool tricks for reducing the work of your program. The next class of optimizations we'll look at is logic. The first one is constant folding and propagation. The idea of constant folding and propagation is to evaluate constant expressions and substitute the results into further expressions, all at compile time — you don't have to do it at runtime. So again, let's look at an example. Here we have a function called orrery — does anyone know what an orrery is? You can look it up on Google. An orrery is a model of a solar system, and here we're constructing a digital orrery. In the orrery we have a whole bunch of constants: the radius, the diameter, the circumference, the cross-sectional area, the surface area, and also the volume. If you look at this code, you can notice that all of these constants can be determined at compile time once we fix the radius. Here we set the radius to the constant 6,371,000 — I didn't know where that constant came from, but Charles made these slides, and it's the radius of the Earth in meters. The diameter is just twice the radius, the circumference is pi times the diameter, the cross-sectional area is pi times the radius squared, the surface area is the circumference times the diameter, and finally the volume is four times pi times the radius cubed divided by three. So you can actually evaluate all of these expressions to constants at compile time, and with a sufficiently high level of optimization, the compiler will do exactly that. That's the idea of constant folding and propagation. It's a good idea to know about this even though the compiler is pretty good at doing it, because sometimes the compiler won't do it, and in those cases you can do it yourself — and you can also figure out whether the compiler is actually doing it by looking at the assembly code.
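A sketch of the orrery constants in C — my reconstruction of the slide, with M_PI from math.h and the rest of the function body elided:

    #include <math.h>

    void orrery(void) {
      const double radius        = 6371000.0;  // radius of the Earth, in meters
      const double diameter      = 2 * radius;
      const double circumference = M_PI * diameter;
      const double cross_area    = M_PI * radius * radius;
      const double surface_area  = circumference * diameter;  // 4 * pi * r^2
      const double volume        = 4 * M_PI * radius * radius * radius / 3;
      // ... the rest of the orrery uses these values ...
      // with a sufficiently high optimization level, the compiler
      // folds all of these expressions into constants at compile time
    }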
The next optimization is common-subexpression elimination, and the idea here is to avoid computing the same expression multiple times by evaluating the expression once and storing the result for later use. So let's look at this simple four-line program: we set a equal to b plus c, then we set b equal to a minus d, then we set c equal to b plus c, and finally we set d equal to a minus d. Notice that the second and fourth lines are computing the same expression — they're both computing a minus d — and they evaluate to the same thing. So the idea of common-subexpression elimination is to substitute the result of the first evaluation into the places where you need it in later lines. So here we still evaluate a minus d on the second line, but the second time we need a minus d, we just set d equal to b, instead of recomputing a minus d. Now, in this example, the right-hand sides of the first and third lines also look the same — they're both b plus c. Does anyone see why you can't do common-subexpression elimination there? Yeah — you can't do it for the first and third lines because the value of b changes in between: the value of b changes on the second line, so on the third line, b plus c is not actually computing the same thing as the first evaluation of b plus c. Again, the compiler is usually smart enough to figure this optimization out, so it will do it for you in your code — but it doesn't always, so it's a good idea to know about this optimization so you can do it by hand when the compiler doesn't do it for you. Questions so far?

OK, so next let's look at algebraic identities. The idea of exploiting algebraic identities is to replace more expensive algebraic expressions with expressions that are cheaper to evaluate. Let's look at an example. Let's say we have a whole bunch of balls, and we want to detect whether two balls collide with each other. Each ball has an x-coordinate, a y-coordinate, a z-coordinate, and a radius, and the collision test works as follows. We set d equal to the square root of the sum of the squares of the differences between the corresponding coordinates of the two balls: we take the square of ball 1's x-coordinate minus ball 2's x-coordinate, add the square of ball 1's y-coordinate minus ball 2's y-coordinate, add the square of ball 1's z-coordinate minus ball 2's z-coordinate, and then take the square root of all of that. If the result is less than or equal to the sum of the radii of the two balls, that means there's a collision, and otherwise there's no collision. It turns out that the square root operator, as I mentioned before, is relatively expensive compared to multiplications, additions, and subtractions on modern machines. So how can we do this without using the square root operator? Yes — (a student suggests a quick bounding check) — right, that's actually a good fast-path check, though I don't think it necessarily always gives you the right answer by itself. Anyone else? Right — the answer is that you can take the square of both sides, so now you don't have to take the square root anymore. We're going to use the identity that the square root of u is less than or equal to v exactly when u is less than or equal to v squared, for nonnegative u and v. So we just square both sides, and here's the modified code: I no longer have the square root on the right-hand side when I compute the squared distance; instead, I square the sum of the two radii.
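Here's the squared-comparison version in C, following the lecture's description (the struct layout and names are my guesses at the slide's code):

    #include <math.h>
    #include <stdbool.h>

    typedef struct {
      double x, y, z, r;   // coordinates and radius
    } ball_t;

    static double square(double x) { return x * x; }

    bool collides(ball_t *b1, ball_t *b2) {
      // compare squared distance against squared sum of radii,
      // avoiding the relatively expensive sqrt
      double dsquared = square(b1->x - b2->x)
                      + square(b1->y - b2->y)
                      + square(b1->z - b2->z);
      return dsquared <= square(b1->r + b2->r);
    }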
So this will give you the same answer. However, you do have to be careful with floating-point operations, because they don't work exactly the same way as real numbers — you sometimes might run into overflow issues or rounding issues. So you do have to be careful if you're using algebraic identities in floating-point computations. But the high-level idea is that you can use equivalent algebraic expressions to reduce the work of your program, and we'll come back to this example later in the lecture when we talk about the fast-path optimization one of the students pointed out. (A student asks which comparison changed: before, we were comparing d to the sum of the radii; now we compare the squared distance to the square of that sum. Is that clear? OK.)

So the next optimization is short-circuiting. The idea here is that when we're performing a series of tests, we can stop evaluating the series as soon as we know what the answer is. Here's an example. Let's say we have an array A containing all nonnegative integers, and we want to check whether the sum of the values in A exceeds some limit. The simple way to do this is to sum up all of the values in the array using a for loop, and then at the end check whether the total sum is greater than the limit. Using this approach, you always have to look at all the elements in the array. But there's actually a better way to do this, and the idea is that once you know the partial sum exceeds the limit you're testing against, you can just return true — because at that point you know the sum of all the elements in the array will exceed the limit, since all of the elements are nonnegative. And if you get all the way to the end of the for loop, that means you didn't exceed the limit, and you can just return false. This second program will usually be faster if, most of the time, you exceed the limit pretty early on when you loop through the array. But if you actually end up looking at most of the elements anyway, or even all of the elements, the second program will be a little bit slower, because you have this additional check inside the for loop that has to be done in every iteration. So when you apply this optimization, you should be aware of whether it will actually be faster or slower, based on how frequently you can short-circuit the test. Questions?

OK, and I want to point out that there are short-circuiting logical operators. The double ampersand is a short-circuiting logical AND: if it evaluates the left side to be false, it knows the whole thing has to be false, so it's not even going to evaluate the right side. And the double vertical bar is a short-circuiting OR: if it knows the left side is true, it knows the whole thing has to be true, because OR just requires one of the two sides to be true, and it's going to short-circuit. In contrast, if you just have a single ampersand or a single vertical bar, these are not short-circuiting operators — they're going to evaluate both sides. The single ampersand and single vertical bar turn out to be pretty useful when you're doing bit manipulation, and we'll be talking about those operators more in Thursday's lecture. Yes — (a student asks for an example where short-circuiting matters for correctness) — one example is when you might index an array out of bounds: you can first check whether the index would be out of bounds, and if so, you don't actually do the access.
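Here's a sketch of the short-circuiting version of the limit check described above (names are illustrative):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    // assumes all elements of A are nonnegative
    bool sum_exceeds(const int64_t *A, size_t n, int64_t limit) {
      int64_t sum = 0;
      for (size_t i = 0; i < n; i++) {
        sum += A[i];
        if (sum > limit) {
          return true;   // short-circuit: the rest can only add more
        }
      }
      return false;
    }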
OK, a related idea is to order tests such that the tests that are more often successful come earlier, and the ones that are less frequently successful come later, because you want to take advantage of short-circuiting. Similarly, inexpensive tests should precede expensive tests, because if an inexpensive test makes your test short-circuit, then you don't have to do the more expensive tests. Here's an example: we're checking whether a character is whitespace — whether it's equal to the carriage return character, the tab character, the space character, or the newline character. So which one of these tests do you think should go at the beginning? Yes — why is that? Right: the space and newline characters appear more frequently than the carriage return and the tab, and the space is the most frequent, because you have a lot of spaces in text. So here I've reordered the tests so that the check for space comes first — now, if you have a character that's a space, you can just short-circuit the test and return true. The newline character I have as the second test, because newlines are also pretty frequent — you have one for every line in your text. Less frequent is the tab character, and finally the carriage return character, which isn't used that frequently nowadays. So with this ordering, the most frequently successful tests appear first. Notice that this only actually saves you work if the character is a whitespace character — if it's not, you're going to end up evaluating all of the tests anyway.

OK, so now let's go back to the example of detecting whether two balls collide, and look at the idea of creating a fast path. The idea of creating a fast path is that there might be a cheap check that enables you to exit early, because you already know what the result is. One fast-path check for this particular program is to check whether the bounding boxes of the two balls intersect. If you know the bounding boxes of the two balls don't intersect, then you know the balls cannot collide. If the bounding boxes do intersect, then you have to do the more expensive test, because that doesn't necessarily mean the balls collide. So here's what the fast-path test looks like. We first check whether the bounding boxes intersect, and we can do this by looking at the absolute value of the difference in each of the coordinates and checking whether it's greater than the sum of the two radii. If so, that means that in that particular coordinate the bounding boxes cannot intersect, and therefore the balls cannot collide — so we return false if any one of these tests succeeds. Otherwise, we do the more expensive test of comparing the squared distance to the square of the sum of the two radii. The reason this is a fast path is that the bounding-box test is cheaper to evaluate than the test below it: here we're just doing subtractions, additions, and comparisons, while below we're using the square function, which requires multiplications — and multiplications are usually more expensive than additions on modern machines. So ideally, if we don't need to do the multiplications, we can avoid them by going through our fast path.
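A sketch of the fast-path version, reusing the ball_t struct and square helper from the earlier sketch:

    #include <math.h>
    #include <stdbool.h>

    bool collides_with_fast_path(ball_t *b1, ball_t *b2) {
      // fast path: cheap bounding-box rejection tests, no multiplications
      if (fabs(b1->x - b2->x) > (b1->r + b2->r)) return false;
      if (fabs(b1->y - b2->y) > (b1->r + b2->r)) return false;
      if (fabs(b1->z - b2->z) > (b1->r + b2->r)) return false;
      // slow path: the exact squared-distance test
      double dsquared = square(b1->x - b2->x)
                      + square(b1->y - b2->y)
                      + square(b1->z - b2->z);
      return dsquared <= square(b1->r + b2->r);
    }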
For this example, it probably isn't worth it to do the fast-path check, since it's such a small program, but in practice there are many applications in graphics that benefit greatly from fast-path checks, and the fast-path check will greatly improve the performance of those graphics programs. There's actually another optimization we can do here — I talked about it a couple of slides ago. Does anyone see it? Yes — right, we can apply common-subexpression elimination, because we're computing the sum of the two radii four times. We can just compute it once, store it in a variable, and use it for the subsequent three tests. And similarly, we're taking the difference between each pair of coordinates twice, so again we can store each difference in a variable and just reuse the result the second time. Any questions?

OK, so the next idea is to combine tests together — here we're going to replace a sequence of tests with just one test or switch statement. Here's an implementation of a full adder. A full adder is a hardware device that takes as input three bits and returns a carry bit and a sum bit as output. Here's a table that specifies, for every possible input to the full adder, what the output should be — there are eight possible inputs to the full adder, because it takes three bits, and there are eight possibilities. And this program checks all the possibilities: it first checks whether a is equal to 0; if so, it checks whether b is equal to 0; if so, it checks whether c is equal to 0; and if that's true, it returns 0 and 0 for the two output bits, and otherwise it returns 1 and 0 — and so on. So it's basically a whole bunch of if-else statements nested together. Does anyone think this is a good way to write the program? Who thinks it's a bad way to write the program? OK, most of you think it's a bad way, and hopefully I can convince the rest of you who didn't raise your hands. Here's a better way to write this program: we're going to replace these multiple if-else clauses with a single switch statement. What we do is create a test variable that is a three-bit value: we place the c bit in the least significant position, the b bit shifted over by one — in the second-least-significant position — and the a bit in the third-least-significant position. Now the value of this test variable ranges from 0 to 7, and for each possibility we can just specify what the sum and carry bits should be. This requires just a single switch statement instead of a whole nest of if-else clauses. There's actually an even better way to do this for this example, which is table lookup: you just precompute all of these answers, store them in a table, and look them up at runtime. But the idea here is that you can combine multiple tests into a single test, and this not only makes your code cleaner, but it can also improve the performance of your program, because you're not doing as many checks and you won't have as many branch misses. Yes — (a student asks about an even faster version) — maybe; I encourage you to see if you can write a faster program for this.
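Here's the combined-test version in C, following the bit-packing the lecture describes (c in the least significant bit, then b, then a):

    void full_add(int a, int b, int c, int *sum, int *carry) {
      int test = (a << 2) | (b << 1) | c;   // pack the three input bits
      switch (test) {
        case 0: *sum = 0; *carry = 0; break;   // 0 + 0 + 0
        case 1: *sum = 1; *carry = 0; break;   // 0 + 0 + 1
        case 2: *sum = 1; *carry = 0; break;   // 0 + 1 + 0
        case 3: *sum = 0; *carry = 1; break;   // 0 + 1 + 1
        case 4: *sum = 1; *carry = 0; break;   // 1 + 0 + 0
        case 5: *sum = 0; *carry = 1; break;   // 1 + 0 + 1
        case 6: *sum = 0; *carry = 1; break;   // 1 + 1 + 0
        case 7: *sum = 1; *carry = 1; break;   // 1 + 1 + 1
      }
    }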
All right, so we're done with two categories of optimizations — two more to go. The third category is about loops. If we didn't have any loops in our programs, there wouldn't be many interesting programs to optimize, because most of our programs wouldn't be very long-running. But with loops, we can actually optimize the loops and realize the benefits of performance engineering.

The first loop optimization I want to talk about is hoisting. The goal of hoisting, which is also called loop-invariant code motion, is to avoid recomputing loop-invariant code each time through the body of a loop. If you have a for loop where every iteration computes the same thing, you can save work by computing it just once. In this example, I'm looping over an array of length N, and I'm setting Y of i equal to X of i times the exponential of the square root of pi over 2. But the exponential of the square root of pi over 2 is actually the same in every iteration, so I don't have to compute it every time. Here's a version of the code that does hoisting: I moved this expression outside of the for loop and stored it in a variable called factor, and now inside the for loop I just multiply by factor, since I've already computed what this expression is. This can save running time, because computing the exponential of the square root of pi over 2 is relatively expensive. It turns out that for this example, the compiler can probably figure it out and do the hoisting for you, but in some cases the compiler might not be able to — especially if the functions involved might change state during the program — so it's a good idea to know what this optimization is, so you can apply it in your code when the compiler doesn't do it for you.
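Here's the hoisted version in C, following the lecture's description:

    #include <math.h>

    void scale(double *X, double *Y, int N) {
      double factor = exp(sqrt(M_PI / 2));   // hoisted loop-invariant expression
      for (int i = 0; i < N; i++) {
        Y[i] = X[i] * factor;                // one multiply per iteration
      }
    }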
OK, sentinels. Sentinels are special dummy values placed in a data structure to simplify the logic of handling boundary conditions, and in particular the handling of loop-exit tests. Here I again have a program that checks whether the sum of the elements in some array A will overflow if I add all of the elements together, and I've specified that all the elements of A are nonnegative. How I do this is, in every iteration I increment sum by A of i, and then I check whether the resulting sum is less than A of i. Does anyone see why this detects an overflow? Right — if the thing I just added in caused an overflow, the result is going to wrap around, and it's going to be less than the thing I added in. That's why the check of whether sum is less than A of i detects an overflow. So I do this check in every iteration; if it's true, I just return true, and otherwise I get to the end of the for loop, where I just return false. But here, in every iteration, I'm doing two checks: I'm first checking whether I should exit the body of the loop, and then I'm checking whether sum is less than A of i.

It turns out I can modify this program so that I only need to do one check in every iteration of the loop. Here's the modified version. I'm going to assume I have two additional entries in my array A — A of n and A of n plus 1 — so I can use those locations. I'm going to set A of n to be the largest possible 64-bit integer, INT64_MAX, and I'm going to set A of n plus 1 to be 1. Now I initialize my loop variable i to 0 and set sum equal to the first element of A, which is A of 0. Then I have a loop that checks whether sum is greater than or equal to A of i, and if so, I add A of i plus 1 to the sum and increment i.

This code does the same thing as the original, because the only way I exit this while loop is if I overflow — and I'll overflow exactly when the sum becomes less than A of i, which is the condition I had in my original program; otherwise I just keep incrementing sum by the next element. And this code will eventually overflow even if the elements of A don't cause it to, because when I get to A of n, I add in that very large integer, and that will cause the sum to overflow, at which point I exit the loop. After I exit the loop, I can check why I overflowed. If I overflowed because of some element of A, then the loop index i will be less than n, and I return true. But if I overflowed because I added in the huge sentinel value, then i will be equal to n, and I know that the elements of A didn't cause the overflow — the A of n value did — so I just return false. Does this make sense? So here, in each iteration, I only have to do one check instead of two: I only have to check whether sum is greater than or equal to A of i.

Does anyone know why I set A of n plus 1 equal to 1? Yes — good, the answer is: if all of the elements in my original array were 0, then even after I add in the huge sentinel value, the sum still isn't going to overflow. But then when I get to A of n plus 1, I add 1 to it, and that causes the sum to overflow, and I can exit there. So that's how I deal with the boundary condition where all the entries in my array are 0.
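Here's a sketch of the sentinel version in C. Note that signed overflow is technically undefined behavior in C, so this sketch assumes wraparound semantics (for example, compiling with -fwrapv), and the array really must have the two extra slots:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    // assumes A[0..n-1] are nonnegative and A has extra slots A[n], A[n+1]
    bool overflow(int64_t *A, size_t n) {
      A[n] = INT64_MAX;         // sentinel: guarantees an eventual overflow
      A[n + 1] = 1;             // handles the all-zero boundary case
      size_t i = 0;
      int64_t sum = A[0];
      while (sum >= A[i]) {     // the single check per iteration
        sum += A[i + 1];
        i++;
      }
      if (i < n) return true;   // a real element of A caused the overflow
      return false;             // the sentinel caused it
    }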
OK, so next, loop unrolling. Loop unrolling attempts to save work by combining several consecutive iterations of a loop into a single iteration, thereby reducing the total number of iterations of the loop and, consequently, the number of times the instructions that control the loop have to be executed. There are two types of loop unrolling. There's full loop unrolling, where I unroll all of the iterations of the for loop and get rid of the control-flow logic entirely. Then there's partial loop unrolling, where I only unroll some of the iterations, but not all of them, so I still have some control-flow code in my loop.

Let's first look at full loop unrolling. Say I have a simple program that just loops for 10 iterations. The fully unrolled loop looks like the code on the right-hand side: I just wrote out all of the iterations as straight-line code instead of using a for loop, and now I don't need to check on every iteration whether I need to exit the loop. Full loop unrolling is actually not very common, because most of your loops are going to be much larger than 10 iterations, and often your loop bounds are not going to be determined at compile time — they're determined at runtime, so the compiler can't fully unroll the loop for you. For small loops like this, the compiler will probably unroll the loop for you, but for larger loops it actually doesn't help to unroll fully, because you're going to have a lot of instructions, and that's going to pollute your instruction cache.

The more common form of loop unrolling is partial loop unrolling, and in this example I've unrolled the loop by a factor of four: I reduce the number of iterations of my for loop by a factor of four, and inside the body of each iteration I have four statements. Notice I also changed the logic in the control flow of my for loop — now I'm incrementing the variable j by 4 instead of just by 1. And since n might not be divisible by 4, I have to deal with the remaining elements, and that's what the second for loop here is doing — it's just dealing with the remaining elements. This is the more common form of loop unrolling.

The first benefit of doing this is that you have fewer checks of the loop-exit condition, because you only have to do the check every four iterations instead of every iteration. But the second, and much bigger, benefit is that it allows more compiler optimizations, because it increases the size of the loop body, and that gives the compiler more freedom to play around with the code and find ways to optimize its performance. That's usually the bigger benefit. If you unroll the loop by too much, though, that actually isn't very good, because now you're going to be polluting your instruction cache — every time you fetch an instruction, it's likely to miss in your instruction cache, and that's going to decrease the performance of your program. Furthermore, if your loop body is already very big, you don't really get additional improvement from having the compiler do more optimizations, because it already has enough code to work with — giving it more code doesn't actually gain you much. So, as I just said, the benefits of loop unrolling are a lower number of instructions in loop control code, and enabling more compiler optimizations — and the second benefit here is usually the much more important one. We'll talk more about compiler optimizations in a couple of lectures. OK, any questions?
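Here's a sketch of a loop partially unrolled by a factor of four, with the cleanup loop for the leftover elements (the summing loop body is my illustrative choice):

    #include <stdint.h>

    int64_t sum_array(const int64_t *A, int n) {
      int64_t sum = 0;
      int j;
      for (j = 0; j <= n - 4; j += 4) {   // four iterations' work per pass
        sum += A[j];
        sum += A[j + 1];
        sum += A[j + 2];
        sum += A[j + 3];
      }
      for (; j < n; j++) {                // handle the remaining elements
        sum += A[j];
      }
      return sum;
    }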
OK, so the next optimization is loop fusion — this is also called jamming — and the idea here is to combine multiple loops over the same index range into a single loop, thereby saving the overhead of loop control. Here I have two loops, both looping from i equals 0 up to n minus 1. The first loop computes the minimum of A of i and B of i and stores the result in C of i; the second loop computes the maximum of A of i and B of i and stores the result in D of i. Since these loop over the same index range, I can fuse the two loops together, giving me a single loop that does both of these operations. This reduces the overhead of loop control code, because instead of doing the exit-condition check 2n times, I only have to do it n times. It also gives you better cache locality — again, we'll talk more about cache locality in a future lecture, but at a high level, what it gives you is that once you load A of i and B of i into cache to compute C of i, they're also going to be in cache when you compute D of i; whereas in the original code, by the time you compute D of i, it's very likely that A of i and B of i have been kicked out of cache already, even though you brought them in when computing C of i. For this example, there's again another optimization you can do — common-subexpression elimination — since you're computing the expression "A of i is less than or equal to B of i" twice.

Next, let's look at eliminating wasted iterations. The idea of eliminating wasted iterations is to modify the loop bounds to avoid executing loop iterations over essentially empty loop bodies. Here I have some code to transpose a matrix: I go from i equals 0 to n minus 1 and from j equals 0 to n minus 1, and then I check whether i is greater than j — if so, I swap the entries A of (i, j) and A of (j, i). The reason I have this check here is that I don't want to do the swap twice — otherwise I'd just end up with the same matrix I had before — so I only do the swap when i is greater than j. One disadvantage of this code is that I still loop for n squared iterations, even though only about half of the iterations do useful work, because about half of them fail the check of whether i is greater than j. So here's a modified version of the program where I eliminate these wasted iterations: now I loop i from 1 to n minus 1, and j from 0 up to i minus 1. So instead of going up to n minus 1, j goes just up to i minus 1, and that basically moves the check of whether i is greater than j into the loop control code, which saves me the extra wasted iterations. OK, so that's the last optimization on loops — are there any questions? Yes — so you still have to do the comparison in the loop control code, but here you also had to do it there, and now you just don't have to do it again inside the body of the loop. (Another student asks about the processor pipeline in the first version.) Yes — most of those branches are going to be predicted correctly, so it's still going to be pretty fast, but it's going to be even faster if you just don't do that check at all. Basically, you should just test it in your code to see which one gives you a runtime improvement.

OK, so the last category of optimizations is functions. First, the idea of inlining is to avoid the overhead of a function call by replacing the call to the function with the body of the function itself. Here I have a piece of code that computes the sum of squares of the elements in an array A: I have a for loop that in each iteration calls this square function, and the square function, defined above, just does x times x for its input argument x. It turns out there's actually some overhead to doing a function call, and the idea here is to just put the body of the function into the place that's calling it: instead of calling the square function, I create a variable temp and set sum equal to sum plus temp times temp. So now I don't have to do the additional function call. You don't actually have to do this manually — if you declare your function to be static inline, the compiler is going to try to inline the function for you, by placing the body of the function inside the code that's calling it. Nowadays compilers are pretty good at doing this, so even if you don't declare the function static inline, the compiler will probably still inline it for you — but just to make sure, if you want a function to be inlined, you should declare it static inline. You might ask why you can't just use a macro to do this, but it turns out that inline functions nowadays are just as efficient as macros, and they're better structured, because they evaluate all of their arguments, whereas macros just do a textual substitution. If you have an argument that's very expensive to evaluate, the macro might paste that expression multiple times into your code, and if the compiler isn't good enough to do common-subexpression elimination, you've just wasted a lot of work.
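Here's the inlined version in C, as the lecture describes it:

    static inline double square(double x) {
      return x * x;   // small enough that the call overhead matters
    }

    double sum_of_squares(const double *A, int n) {
      double sum = 0.0;
      for (int i = 0; i < n; i++) {
        sum += square(A[i]);   // the compiler replaces this call with A[i] * A[i]
      }
      return sum;
    }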
OK, so the last category of optimizations is functions. First, the idea of inlining is to avoid the overhead of a function call by replacing a call to the function with the body of the function itself. Here I have a piece of code that computes the sum of the squares of the elements in an array A: I have a for loop that in each iteration calls this square function, and the square function, defined above, just does x times x for an input argument x. It turns out there's actually some overhead to doing a function call, and the idea here is to just put the body of the function inside the function that's calling it. So instead of calling the square function, I create a variable temp and set sum equal to sum plus temp times temp, and now I don't have to do the additional function call. You don't actually have to do this manually: if you declare your function to be static inline, the compiler is going to try to inline the function for you by placing the body of the function inside the code that's calling it. Nowadays compilers are pretty good at doing this, so even if you don't declare it static inline, the compiler will probably still inline the code for you; but just to make sure, if you want to inline a function, you should declare it as static inline. You might ask why you can't just use a macro to do this, but it turns out that inline functions nowadays are just as efficient as macros, and they're better structured, because they evaluate all of their arguments, whereas macros just do a textual substitution. So if you have an argument that's very expensive to evaluate, the macro might actually paste that expression multiple times into your code, and if the compiler isn't good enough to do common-subexpression elimination, then you've just wasted a lot of work.
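Here's a minimal sketch of the sum-of-squares example with the helper declared static inline, assuming an array A of n doubles; the function names are illustrative rather than taken verbatim from the slides.

    // Declared static inline so the compiler can substitute x * x directly
    // at each call site, avoiding the function-call overhead.
    static inline double square(double x) { return x * x; }

    double sum_of_squares(const double *A, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; ++i) {
            sum += square(A[i]);   // after inlining: sum += A[i] * A[i];
        }
        return sum;
    }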
OK, so there are two more optimizations that I'm not going to have time to talk about, tail-recursion elimination and coarsening recursion, but I'm going to post these slides on learning modules, so please take a look at them. Here's a list of most of the rules that we looked at today; for the two function optimizations I didn't get to, please take a look at those offline and ask your TAs if you have any questions. And some closing advice: you should avoid premature optimization. All of the things I talked about today improve the performance of your program, but you first need to make sure that your program is correct; if you have a program that doesn't do the right thing, it doesn't really benefit you to make it faster. To preserve correctness, you should do regression testing: develop a suite of tests to check the correctness of your program every time you change it. And as I said before, reducing the work of a program doesn't necessarily decrease its running time, but it's a good heuristic. Finally, the compiler automates many low-level optimizations, and you can look at the assembly code to see whether the compiler actually did something.
you 22 bits and that easily fits within one 32-bit words you now you only need one word instead of three words as you did in the original representation but with this modified representation now determining the month of a particular date will take more work because now you're not explicitly storing the month in your representation whereas with the string representation you are explicitly storing it at the beginning of the string so this does take more work but it requires less space so any questions so far okay so it turns out that there's another way to store this which also makes it easy for you to fetch the month the year or the day for a particular date so here we're going to use the bit fields facilities in C so we're gonna create a struct called date underscore t with three fields a year the month in the date and the integer after the semicolon specifies how many bits I want to assign to this particular field in the struct so this says I need 13 bits for the year four bits for the month and five bits for the day so the 13 bits for the year is because I have 8,192 possible year so I need 13 bits to store that for the month I have 12 possible months so I need log base 2 of 12 rounded up which is 4 and then finally for the day I need log base 2 of 32 31 rounded up which is 5 so in total this still takes 22 bits but now the individual fields can now be accessed much more quick quickly than if we had just encoded the 3 million dates using sequential integers because now you can just extract a month just by saying whatever whatever you named your struct you can just say that that struck dot Mont and that gives you the month yes yeah so this will actually pad the struct a little bit at the end yes so you actually do require a little bit more than 22 bits that's a good good question but this representation is much more easy to access than if you just had encoded the integers as sequential integers but another point is that sometimes unpacking and decoding or the optimization because sometimes it takes a lot of work to encode and code the values and to extract them so sometimes you want to actually unpack the values so that they take more space but they're faster to access so it depends on your particular application you might want to do one thing or the other and the way to figure this out is just to experiment with it okay so I need other questions all right so the second optimization is data structure augmentation an idea here is to add information to a data structure to make common operations do less work so that they're faster and let's look at an example let's say we had two singly linked lists and we wanted to append them together and let's say we only stored the head corner to the list and then each element in the list has a pointer to the next element in the list now if you want to append one list to another list well that's going to require you walking down the first list to find the last element so that you can change the pointer of the last element to point to the beginning of the next list and this might be very slow if the first list is very long so does anyone see a way to augment this data structure so that appending two lists can be done much more efficiently yes yes so the answer is the store pointer to the last value and we call that a tail pointer so now we have two pointers both the head and the tail the head points the beginning of the list a tail points to the end of the list and now you can just append two lists in Causton time because you can access the last element 
in the list by following the tail pointer and then now you just change the successor pointer of the last element to point to the head of the second list and then now you also have to update the tail to point to the end of the second list okay so that's idea of data structure augmentation we added a little bit of extra information to the data structure such that now appending to lists is much more efficient than in the original method where we only had a head pointer questions okay so the next optimization is pre-computation the idea of pre-computation is to perform some calculations in advance so as to avoid doing these computations at mission critical times to avoid doing them at run time so I say we had a program that needed to use binomial coefficients and here's a definition of a binomial coefficient so it's basically the choose function so you want to choose you want to count the number of ways that you can choose K things from a set of n things and the formula for computing this is n factorial divided by the product of K factorial and n minus K factorial computing this choose function can actually be quite expensive because you have to do a lot of multiplications to compute the factorials even if the final result is not that big because you have to compute one term in the denominator and then two factorial terms in the denominator and then you also might run into integer overflow issues because n factorial grows very fast it grows super exponentially it grows like n to the N which is even faster than 2 to the N which is exponential so doing this computation you have to be very careful with integer overflow issues so one idea to speed up a program that uses these binomial coefficients is to pre compute a table of coefficients when you initialize the program and then just perform table look up on this pre computed table at runtime when you need the binomial coefficient so as anyone know what the table that stores binomial coefficients is called yes yeah Pascal's triangles good so here's what Pascal's triangle looks like so on the vertical axis we have different values of n and then on the horizontal axis we have different values of K and then to get and choose K just go to the and throw in the case column and look up that entry Pascal's triangle has a nice property that for every element it can be computed as a sum of the element directly above it and above it and to the left of it so here 56 is the sum of 35 and 21 and this gives us a nice formula to compute the binomial coefficients so we first check if n is less than K in this choose function if n is less than K then we just return zero because we're trying to choose more things than there are in a set if n is equal to zero then we just return one because here K must also be equal to zero since we had the condition and last and K above and there's one way to choose zero things from a set of zero things and then if K is equal to zero we also return one because there's only one way to choose zero things from a set of any number of things you just don't pick anything and then finally we recursively call this huge function so we call choose of n minus 1 K minus 1 this is essentially the entry above and diagonally diagonal to this and then we add in choose of n minus K m nice one K which is the entry directly above it so this is a recursive function for generating this Pascal's triangle but notice that we're actually still not doing pre computation because every time we call this choose function we're making two recursive calls and this 
can still be pretty expensive so how can we actually pre compute this table so here's some C code for pre computing Pascal's triangle and let's say we only wanted coefficients up to a shoe size of a hundred so we initialize matrix of a hundred by a hundred and then we call this in the netshoes function so first it goes from zero all the way up to two size minus one and then it sets choose of n 0 to be 1 it also choose sets choose of n n to be 1 so the first line is because there's only one way to choose 0 things from any number of things in the second line is because there's only one way to choose n things from N Things which is just to pick all of them and then now we have a second loop which goes from n equals 1 all the way up to choose size minus 1 and then first we set choose of 0 and to be 0 because here n is or K is greater than n so there's no way to pick that there's no way to pick more elements from a set of things that is less than the number of things you want to pick and then now you loop from K equals 1 all the way up to n minus 1 and then you apply this recursive formula so choose of n K is equal to choose of n minus 1 K minus 1 plus choose of n minus 1 K and then you also set choose of K and to be 0 so the this is basically all of the entries above the diagonal here where K is greater than n and then now inside the program whenever we need a binomial coefficient that's less than a hundred we can just do table lookup into this table and we just index into this choose array so does this make sense any questions it's just pretty easy so far right so one thing to note is that we're still computing this table at run time because we have to initialize this table at run time and if we want to run or program many times then we have to initialize this table many times so is there a way to only initialize this table once even though we might want to run the program many times yes yeah so good so put it in the source code and so we're gonna do compile time initialization and if you put the table in the source code then the compiler will compile this code and generate the table for you at compile time so now whenever you run it you don't have to spend time initializing the table so idea of compile time initialization there's a store that values of constants during compilation and therefore saving work at run time so let's say we wanted choose values up to 10 this is the table the 10 by 10 table storing all of the binomial coefficients up to 10 so if you put this in your source code now when you run the program you can just index into this table to get the appropriate appropriate constant here but this table was just a 10 by 10 table what if you wanted a table of a thousand by a thousand does anyone actually want to type this in a table of 1000 by 1000 so probably not so is there any way to get around this yes yeah yeah so the answer is to write a program that writes your program for you and that's called meta programming so here's a snippet of code that will generate this table for you so it's going to call this a net choose function that we defined before and that now it's just gonna print out C code so it's gonna print out that the Declaration of this array choose followed by a left bracket and then for each row of the table we're gonna print another left bracket and then print the value of each entry in that row followed by a right bracket and we do that for every row so this will give you the C code and now you can just copy and paste this and place it into your source code this is 
a pretty pretty cool technique to get your get your computer to do work for you and you're welcome to use this technique and in your homeworks and projects if you if you need to generate large tables of constant values so this is a very good technique to know so any questions yes yeah you can't so you can pipe the you can pipe the output of this program to a file yeah yes so we have yeah so I think you can write macros to actually generate this table and then the compiler will run the run those macros to generate this table for you yes you don't actually need to copy and paste it right so as Charles says you can write your meta program using any language you don't have to write it and see you can write it in Python if you're more familiar with that and it's often easier to write it using a scripting language like Python okay so let's look at the next optimization so if we already gone gone through a couple mini lectures already so congratulations to all of you who who are still here so the next optimization is caching the idea of caching is to store results have been accessed recently so that you don't need to compute them again in the program so let's look at an example so let's say we wanted to compute the hypotenuse of a right triangle with side lengths a and B so the formula for computing this is you take the square root of a times a plus B times B okay so turns out that the square root operator is actually a relatively expensive more expensive than additions and multiplications on modern machines so you don't want to have to call the square root function if you don't have to and one way to avoid doing that is to create a cache so here I have a cache just storing the previous hypotenuse that I calculated and I also store the values of a and B that were passed to the function and then now when I call the hypotenuse function I can first check if a is equal to the cash value of a and if B is equal to the cash value of B and if both of those are true then I already computed I potenuse before and then I can just return cast of age but if it's not in my cache now I need to actually compute it so I need to call this square root function and then I store the result into cash 2h and I also store a and B in the cache a in cash B respectively and then finally I return cash 2h so this this example isn't actually very realistic because my cache is only of size one and it's very unlikely in a program you're going to repeatedly call some function with the same input arguments but you can actually make a larger cache you can make a cache of size 1,000 storing the 1000 most recently computer hypotenuse values and then now when you call that potenuse function you can just check if it's in your cache checking the the larger cache is going to be more expensive because there are more values to look at they can still save you time overall and and hardware also does caching for you as well talk about later on in the semester but the point of this optimization is that you can also do caching yourself you can do it in software you don't have to let Hardware do it for you it turns out for this particular program here actually is about 30% faster if you do hit the cache 2/3 about 2/3 at the time so it does actually save you time if you do repeatedly compute the same values over and over again so that's caching any questions okay so the next optimization will look at is sparsity the idea of exploiting sparsity in an input is to avoid storing and computing on zero elements of that input and the fastest way to 
compute on zeros is not compute on them at all because we know that at any value plus zero is just that original value and any value times zero is just zero so why waste the computation doing that when you already know the result so let's look at an example this is matrix vector multiplication so we want to multiply an N by n matrix by an N by one vector we can do dense matrix vector multiplication by just doing a dot product of each row in the matrix with the column vector and then that will give us the output vector but if you do dense matrix vector multiplication that's gonna perform N squared or 36 in this example scalar multiplies but it turns out only 14 of these entries in this matrix are 0 or are nonzero so you just wasted work doing the multiplication on the zero elements because you know that 0 times any other element is just 0 so a better way to do this is instead of doing the multiplication for every element you first check if one of the arguments is 0 and if it is 0 then you don't have to actually do the multiplication but this is this this is still kind of slow because you still have to do a check for every entry in your matrix even though many of the entries are 0 so there's actually a pretty cool data structure that won't actually store these 0 entries and this will speed up your matrix vector multiplication if your matrix is sparse enough so let me describe how this data structure works it's called compressed sparse Row or CSR there's an analogous representation called compressed sparse column or CSC but today I'm just going to talk about CSR so we have three arrays first we have the rows the length of the rows array is just equal to the number of rows in a matrix plus 1 and then each entry in the rows array just stores an offset into the columns array or the calls array and inside the calls array I'm storing the indices of the nonzero entries in each of the rows so if we take Row 1 for example we have rows of 1 is equal to 2 that means I start looking at the second entry in the calls array and then now I have the indices of the nonzero columns in the first row so it's just 1 2 4 and 5 these are nonzero in this these are the indices for the nonzero entries and then I have another array called vowels the length of this array is the same as the calls array and then this array just stores the actual value in these indices here so the vowels array for Row 1 is going to store 4 1 5 and 9 because these are the nonzero entries in the first row right so the first rows array just serves as an index into this calls array so you can basically get the starting starting index in this calls away for any row just by looking at the entry stored at the corresponding location in the rows array so for example Row 2 starts at location 6 so it starts here and you have indices 3 & 5 which are the nonzero indices does anyone know how to compute the length the number of non-zeros in a row by looking at the rows array yes yes right yes so to get the length of a row you just take the difference between that Rose offset and the next Rose offset so we can see that the length of the first row is 4 because it's offset is 2 and the offset for Row 2 is 6 so you just take the difference between those two entries we have an additional entry here so we have the sixth row here because we want to be able to compute the length of the last row without overflowing in our array so we just created an additional entry in the road's array for that so turns out that this representation will save you space if your 
matrix is sparse so the storage required by the CSR format is order n plus n n Z where n n Z is a number of non-zeros in your matrix and the reason why you have n plus n n Z well you have two arrays here calls in bhau's whose length is equal to the number of non-zeros in the matrix and then you also have this rows array whose length is n plus 1 so that's why we have n plus n n Z and if the number of nonzero is as much less than N squared then this is going to be significantly more compact than the dense matrix representation however this isn't always going to be the most compact representation does anyone see why why might the dense representation sometimes take less space yeah sorry why might the dance representation sometimes take less space right so yeah if you have a relatively dance matrix then it might take more space than storing it it might take more space in the CSR representation because you have these two arrays so if you take the extreme case where all of the entries are non zeros then both of these arrays are going to be of length N squared so you already have 2n squared there and then you also need this Rose array which is of length n plus 1 okay so now I gave you this more compact representation for storing the matrix so how do we actually do stuff with this representation so it turns out that you can still do matrix vector multiplication using this compressed barse row format and here's the code for doing it so we have this struct here which is the CSR representation we have the Rose array that call calls array and then the Val's array and then we also have the number of rows N and the number of non-zeros n n Z and then now what we do we call this SP MV or sparse matrix vector multiply we pass in our CSR representation which is a and then the input array which is X and then we store the result in an output array Y so first we loop through all of the rows and then we set Y if I to be 0 this is just initialization and then for each of my rows I'm going to look at the non 0 the column indices for the non zero elements and I can do that by starting at K equals 2 rows of I and going up 2 rows of I plus 1 and then for each one of these entries I just look up the index the column index for the nonzero element and I can do that with calls of K so that BJ and then now I know which elements to multiply I multiplied vowels of K by ax of J and then now I just add that to Wi-Fi and then this after I finish with all of these multiplications and additions this will give me the same result as if I did the dense matrix vector multiplication so this is actually a pretty pretty cool program so I encourage you to look at this program offline to convince yourself that it's actually computing the same thing as the dense matrix vector multiplication version so I'm not going to prove this during lecture today but you can feel free to ask me or any of your TAS after class if you have questions about this and the number of scalar multiplications that you have to do using this code it's just going to be nnz because you're just operating on the non-zero elements you don't have to touch all of the zero elements and in contrast the dense matrix vector multiply algorithm would take N squared multiplications so this can be significantly faster for sparse matrices so it turns out that you can also use a similar structure to store a sparse static graph so I see many of you have seen graphs in your previous courses see here's what the sparse graph representation looks like so again we have this raised we 
have these two arrays we have offsets and edges the offsets array is analogous to the rows array in the edges array is analogous to the columns array for the CSR representation and then in this offsets array we store for each vertex where its neighbors start in this majus array and then in the edges array we just write the indices of its neighbors there so let's take vertex one for example the offset of vertex one is two so we know that its outgoing neighbors start at position two in this edges array and then we see that vertex one has outgoing edges two vertices two three and four and we see in the edges array two three four are listed and you can also get the degree of each vertex which is analogous to the length of each row by taking the difference between consecutive offsets so here we see that the degree of vertex 1 is 3 because it's offset is 1 it's offset is 2 in the offset of vertex 2 is 5 it turns out that using this representation you can run many classic graph algorithms such as breadth-first search and PageRank quite efficiently especially when the graph is sparse so this would be much more efficient than using a dense matrix to represent the graph and running these algorithms you can also store weight on the edges and one way to do that is to just create an additional array called weights whose length is equal to the number of edges in the graph and then you just store the weights in that array and this is analogous to the values array in the CSR representation but there's actually a more efficient way to store this if you always need to access the weight whenever you acts as an edge and the way to do this is to interleave the weights with the edges so to store the weight for a particular edge right next to that edge and create a array of twice number of edges in the graph and the reason why this is more efficient is because it gives you improved cache locality weight and we'll talk much more about cache locality later on in this course but the high-level idea is that whenever you access an edge the weight for that edge will also likely to be on the same cache line so you don't need to go to main memory to access the weight of that edge again and later on in the semester we'll actually have a whole lecture on doing optimizations for graph algorithms and today I'm just going to talk about the representation one representation of graphs but we'll talk much more about this later on any questions okay so that's it for the data structure optimizations we still have three more categories of optimizations to go over so it's a pretty fun lecture we get to learn about many cool tricks for reducing the work of your program so in the next class of optimizations will look at is logic so first thing is constant folding and propagation the idea of constant folding and propagation is to evaluate constant expressions and substitute the result into further expressions all at compilation times you don't have to do it at runtime so again let's look at an example so we're here we have this function called orrery does anyone know what worried means you can look it up on Google okay so so nori is a model of a solar system so here we're constructing a digital orrery and in order we we have these whole bunch of different constants we have the radius the diameter the circumference cross area surface area and also the volume if you look at this code you can notice that actually all these constants can be defined at compile time once we fix the radius so here we set the radius to be this constant here 
six six million three hundred and seventy one thousand I don't know where that constant comes from by the way but Charles made these slides so it probably does sorry okay radius of the earth now that diameter is just twice this radius the circumference is just pi times diameter cross area is pi times the radius squared surface area is circumference times the diameter and finally volume is four times pi times the radius cubed divided by three so you can actually evaluate all of these two constants at compile time so with a sufficiently high level of optimization the compiler will actually evaluate all of these things at compile time and that's the idea of constant folding and propagation it's a good idea to know about this even though the compiler is still pretty is pretty good at doing this because sometimes the compiler won't do it and in those cases you can do it yourself and you can also figure out whether the compiler is actually doing it when you look at the assembly code okay so the next optimization is common sub-expression elimination and the idea here is to avoid computing the same expression multiple times by valuing the expression once and storing the result for later use so let's look at this simple four line program we have a equal to B plus C then we set B equal to a minus D then we set C equal to B plus C and finally we set D equal to a minus D so notice here that the second and the fourth lines are computing the same expression they're both computing a minus D and they evaluate to the same thing so idea of common sub-expression elimination would be to just substitute the result of the first evaluation into the place where you need it in future future lines so here we still evaluate the first line for a minus D but now in the second time we need a minus D we just set the value to B so now D is equal to B instead of a minus D so in this example the first and the third line the right-hand side of those lines actually look the same they're both B plus seed does anyone see why you can't do common sub-expression elimination here yeah so you can't do common sub-expression for the first and the third lines because the value of B changes in between so the value of B changes on the second line so on the third line when you do B plus C is not actually computing the same thing as the first first evaluation of B plus C so again the compiler is usually smart enough to figure this optimization out so it will do this optimization for you in your code but again it doesn't always do it for use as good it's a good idea to know about this optimization so that you can do this optimization by hand when the compiler doesn't do it for you questions so far okay so next let's look at algebra identities the idea of exploiting algebraic identities is to replace more expensive algebraic expressions with expressions that are cheaper to evaluate so let's look at an example let's say we we have a whole bunch of balls and we want to detect whether two balls collide with each other say ball has an x-coordinate a y-coordinate a Z coordinate as well as a radius and the collision test works as follows we set D equal to the square root of the sum of the squares of the differences between each of the three coordinates of the two balls so here we're taking the square of B one's x-coordinate minus B 2 is x-coordinate and then adding the square of B one's y-coordinate minus B 2 is y-coordinate and finally adding the square of B 1 z coordinate minus B 2 z coordinate and now we take the square root of all of that and 
then if if the result is less than or equal to the sum of the two radii of the ball then that means that there's a collision and otherwise that means there's not a collision it turns out that the square root operator as I mentioned before is relatively expensive compared to doing multiplications and additions and subtractions on modern machines so how can we do this without using the square root operator yes right so that's actually a good check to a good fast path check I don't think it necessarily always gives you the right answer sir another yes right right so the answer is that you can actually take the square of both sides so now you don't have to take the square root anymore so we're gonna use the identity that says if the square root of U is less than or equal to V exactly when u is less than or equal to V squared so we're just going to take the square of both sides and here's the modified code so now I don't have this square root anymore on the right hand side when I compute V squared but instead I square the sum of the two radii so this this will give you the same answer however you do have to be careful with floating-point operations because they don't work exactly in the same way as real numbers so you sometimes might run into overflow issues or rounding issues so you do have to be careful if you're using algebraic identities in floating-point computations but the hile of idea is that you can use equivalent algebraic expressions to reduce the work of your program and we'll come back to this example later on in this lecture when we talk about some other optimizations such as a fast path optimization as one of the students pointed out yes which are you talking about this line so before we were comparing D yeah yeah okay is that clear okay okay so the next optimization is short-circuiting the idea here is that when we're performing a series of tests we can actually stop you evaluating this series of tests as soon as we know what the answer is so here's an example let's say we have an array a containing all non-negative integers and we want to check if the sum of the values and a exceeds some limit so the simple way to do this is you just sum up all of the values of the array using a for loop and then at the end you check if the total sum is greater than the limit so using this approach you always have to look at all the elements in the array but there's actually a better way to do this and the idea here is that once you know the partial sum exceeds the limit that you're testing against then you can just return true because at that point you know that the sum of the elements in the array will exceed the limit because all of the elements in the array are non-negative and then if you get all the way to the end of this for loop that means you didn't exceed this limit and you can just return false so this this second program here will usually be faster if most of the time you exceed the limit pretty early on when you loop through the array but if you actually end up looking at most of the elements anyways or even looking at all the elements this second program will actually be a little bit slower because you have this additional check inside this for loop that has to be done for every iteration so when you apply this optimization you should be aware of whether this will actually be faster or slower based on the frequency of of what when you can short-circuit the test questions okay and I want to point out that there are short-circuiting logical operators so if you do double ampersand 
that's a short-circuiting and logical and operator so if it you value so if it evaluates the left side to be false it means that the whole thing has to be false it's not even going to evaluate the right side and then the double vertical bar is going to be a short-circuiting or so if it knows that the left side is true it knows the whole thing has to be true because or just requires one of the two sides to be true and it's gonna short-circuit in contrast if you just have a single ampersand or a single vertical bar these are not short-circuiting operators are going to evaluate both sides of the argument the single ampersand and single vertical bar turn out to be pretty useful when you're doing bit manipulation and we'll be talking about these operators more on Thursday's lecture yes yeah if you use a double ampersand then that would be true yes yeah yes one example is if you might possibly index an array out of balance you can first check whether you would exceed the limit or be out of balance and if if so then you don't actually do the index okay a related idea is to order tests such that the tests that are more often successful or earlier and the ones that are less frequently successful are later in the order because you want to take advantage of short-circuiting and similarly inexpensive tests should precede expensive tests because if you do too inexpensive tests in your test short circuit then you don't have to do the more expensive tests so here's an example here we're checking whether a character this white space so characters white space if it's equal to the carriage return character if it's equal to the tab character is equal to space or if it's equal to the newline character so which one of these tests do you think should go at the beginning yes why is that yeah yeah so so the it turns out that the space and then you line characters appear more frequently than the curious returned in the tab in the the space is the most frequent because you have a lot of spaces in text so here I've reordered the test so that the space the check for space is first and then now if you have a character that's a space you can just short-circuit this test and return true next the newline character I have I have it as the second test because these are also pretty frequent you have a new line for every new line in your text and then less frequent is a tab character and finally the carriage return free character isn't that frequently used nowadays so now now with this or during the most frequently successful tests are going to appear first notice that this only actually saves you work if the character is a whitespace character if it's not a whitespace character then you're gonna end up evaluating all of these tests anyways okay so now let's go back to this example of detecting collision of balls so we're gonna look at the idea of creating a fast path and the idea of creating a fast path is that there might possibly be a check that will enable you to exit the program early because you already know what the result is and one one fast path check for this particular program here is you can check whether the bounding boxes of the two balls intersect if you know the bounding boxes of the two balls don't intersect then you know that the balls cannot collide if the bounding boxes of the two balls do intersect well then you have to do the more expensive test because that doesn't necessarily mean that the balls will collide so here's what the fast path test looks like we're first going to check whether the bounding 
box is intersect and we can do this by looking at the absolute value of the difference on each of the coordinates and checking if that's greater than the sum of the two radii and if so that means that for that particular coordinate the bounding boxes cannot intersect and therefore the balls cannot collide and then we can return false of any one of these tests return true and otherwise we'll do the more expensive test of comparing D squared to the square of the sum of the two radii and the reason why this is a fast path is because this test here is actually cheaper to evaluate than this test below here we're just doing subtractions additions and comparisons and below we're using the square operator which requires a multiplication and multiplications are usually more expensive than additions on modern machines so ideally if we don't need to do the multiplication we can avoid it by going through our fast path so for this example it probably isn't worth it to do the fast path checks since there's such a small program but in practice there are many applications and graphics that benefit greatly from doing fast path checks and the fast path check will greatly improve the performance of these graphics programs there's actually another optimization that we can do here I talked about this optimization couple slides ago does anyone see it yes right so we can apply common sub-expression elimination here because we're computing the sum of the two radii four times we can actually just compute it once stored in a variable and then use it for the subsequent three calls and then similarly what we're taking the difference between each of the coordinates we're also doing it twice so again we can store that in a variable and then just use the result in the second time any questions okay so the next idea is to combine tests together so here we're gonna replace a sequence of tests with just one test or switch statement here's an implementation of a full adder so a full adder there's a hardware device that takes us input three bits and then it returns the carry bit and the sum bit as output so here's a table that specifies for every possible input to the full adder what the output should be and are eight possible inputs to the full adder because it takes three bits and there eight possibilities there and this program here it's going to check all the possibilities its first going to check if a is equal to zero if so it checks that B equals equals to zero if so it checks if C is equal to zero and if that's true it returns zero and zero for the two bits and otherwise it returns one and zero and so on so this is basically a whole bunch of if-else statements nested together does anyone think this is a good way to write the program who thinks this is a bad way to write the program okay so most of you think it's a bad way to write the program and hopefully I can convince the rest of you who didn't raise your hand so here's a better way to write this program so we're gonna replace these multiple if-else clauses with a single switch statement and what we're gonna do is we're going to create this test variable that is a 3-bit variable so we're gonna place the C bit in the least significant digit the B bit we're gonna shift it over by one so in the second least significant digit and then the a bit in the third least significant digit and then now the value of this test variable is going to range from 0 to 7 and then for each possibility we can just specify what the sum and the carry bits should be and this requires just a 
single switch statement instead of a whole bunch of is else clauses there's actually an even better way to do this for this example which is to use table lookups you just pre compute all of these answers stored in a table and then just look it up at runtime but the idea here is that you can combine multiple tasks into a single test and this not only makes your code cleaner but it can also improve the performance of your program because you're not doing so many checks and you won't have an as many branch misses yes maybe yeah I mean I encourage you to see if you can write a faster program for for this all right so we're done with two categories of optimizations stuff two or more to go the third category is going to be about loops so if we didn't have any loops in our in our programs well there wouldn't be many interesting programs to optimize because most of our programs wouldn't be very long running but with loops we can actually optimize these loops and then realize the benefits of performance engineering the first loop optimization I want to talk about is hoisting the Gulf hoisting which is also called loop invariant code motion is to avoid recomputing a loop invariant code each time through the body of a loop so if you have a for loop where each and each iteration are competing the same thing well you can actually save work by just computing it once so in this example here I'm looping over an array of length N and then I'm setting Y if I equal to X of I times the exponential of the square root of pi over 2 but this exponential of square root of PI over 2 is actually the same in every iteration so I don't actually have to compute that every time so here's a version of the code that does hoisting I just moved this expression outside of the for loop and stored it in a variable factor and then now inside the for loop I just have to multiply by factor I already computed what this expression is and this can save a running time because computing the exponential d'squared of pi over two is actually relatively expensive so it turns out that for this example here the compiler can probably figure it out and do this hoisting for you but in some cases the compiler might not be able to figure it out especially if these functions here might change throughout the program so it's good it's a good idea to know what this optimization is so you can apply it in your code when the compiler doesn't do it for you okay sentinels so sentinels are special dummy values placed in a data structure to simplify the logic of handling boundary boundary conditions and in particular the handling of loop exit tests so here I have again have this program that checks whether I have this program that checks whether the sum of the elements in some array a will overflow if I added all of the elements together and here I've specified that all the elements of a are non-negative so how I do this is in every iteration I'm gonna increment sum by a of I and then I'll check if the resulting sum is less than a of I does anyone see why this will detect if I had an overflow yes you know so if so if the thing I added in cause is an overflow then the result is going to wrap around and it's going to be less than the thing I added in so this is why the check here that checks whether some is last native I will detect an overflow okay so I'm gonna do this check in every iteration if it's true I'll just return true and otherwise I get to the end of this for loop where I just return false but here on every iteration I'm doing two checks I'm 
first checking whether I should exit the body of this loop and then secondly I'm checking whether the sum is less than a of I it turns out that I can actually modify this program so that I only need to do one check in every iteration of the loop there's a modified version of this program so here I'm gonna assume that I have two additional entries in my array a so these are a of N and a of n minus one so I assume I can use these locations and I'm gonna set a of n to be the largest possible 64-bit integer or in 64 max and then I'm gonna set a of n plus 1 to be 1 now I'm going to initialize my loop variable I to be 0 and then I'm going to set the sum equal to the first element in a or a of 0 and then now I have this loop it checks whether the sum is greater than or equal to a of I and if so I'm gonna add the sum I'm going to add a of I plus 1 to the sum and then I also increment I okay and this code here does the same thing as a thing on the left because the only way I'm going to exit this while loop is if I overflow and I'll overflow if a of I becomes greater than sum or if the sum becomes less than a of I which is what I had in my original program and then otherwise I'm gonna just increment sum by AFI and then this code here is going to eventually overflow because if the elements in my array a don't cause the program to overflow I'm gonna get to a of N and a if n is a very large integer and if I add that what I have is going to cause the program to overflow and at that point I'm gonna exit this for loop or this while loop and then after I exit this loop I can check I can check why I overflowed if I overflowed because I because of some element of a then the loop index I is going to be less than n and I return true but if I overflowed because I added in this huge integer well then I is going to be equal to n and then I know that the elements of a didn't cause me to overflow the a of n value here did so then I just return false does this make sense so here in each iteration I only have to do one check instead of two texts as in my original code I only have to check whether the sum is greater than or equal to a if I does anyone know why I set a of n plus one equal to one yes yeah so good so the answer is because if all of my elements were zero in my original array that even though I add in this huge integer it's still not gonna overflow but now I'm going to when I get to a of n plus 1 I'm gonna add 1 to it and then that will cause the sum to overflow and then I can exit there so this is a steal deal with the boundary condition when all the entries in my array are 0 okay so next loop and rolling so loop unrolling attempts a safe work by combining several consecutive iterations of a loop into a single iteration thereby reducing the total number of iterations of the loop and consequently the number of times that the instructions that control the loop have to be executed there are two types of loop unrolling there's full loop and rolling why unroll all of the iterations of the for loop and I'll just get rid of the control flow logic entirely then there's partial loop unrolling where I only ate only unroll some of the iterations but not all of the iterations so I still have some control flow code in my loop let's first look at full loop unrolling say here I have a simple program that just loops for 10 iterations the fully unrolled loop just looks like the code on the right hand side I just wrote out all the lines of code that I have to do in straight line code instead of using a for loop and now I 
don't need to check on every iteration whether I need to exit the for loop so this is full loop unrolling this is actually not very common because most your loops are going to be much larger than 10 and often times many of your loop bounds are not going to be determined at compile time they're determined at runtime so the compiler can't fully unload that loop for you for small loops like this the compiler will probably unroll the loop for you but for larger loops it actually doesn't benefit to unroll the loop fully because you're gonna have a lot of structions and that's going to pollute your instruction cache so the more common form of Lupin trolling is partial loop and rolling and here in this example here I've enrolled the loop by a factor of four so I reduce a number of iterations of my for loop by a factor of four and then inside the body of each of each iteration I have four instructions and then notice I also changed the logic in my in the control flow of my for loop so now I'm incrementing the variable J by four instead of just by one and then since n might not necessarily be divisible by four I have I have to deal with the remaining elements and this is what the second for loop is doing here is just dealing with the remaining elements and this is the more common form of loop unrolling so the first benefit of doing this is that you have fewer fewer checks to this to the exit condition for the loop because you only have to do this check every four iterations instead of every iteration but the second and much bigger benefit is that it allows more compiler optimizations because it increases the size of the loop body and it gives the compiler more freedom to play around with code and to find ways to optimize the performance of that code so that's usually the bigger benefit if you unroll the loop by too much that actually isn't very good because because now you're going to be polluting your instruction cache and every time you fetch an instruction it's likely going to be a miss in your instruction cache and that's gonna decrease the performance of your program and furthermore if your loop body is already very big you don't really get additional improvements from having the compiler do more optimizations because it already has enough code to work with so giving it more code doesn't actually give you much there okay so I just said this the benefits of loop unrolling a lower number of instructions and loop control code and it also enables bohr compiler optimizations and the second benefit here is usually the much more important benefit and we'll talk more about compiler optimizations in a couple lectures okay any questions okay so the next optimization is loop fusion this is also called jamming and the idea here is to combine multiple loops over the same index range into a single loop thereby saving the overhead of loop control say here I have two loops they're both looping from I equals 0 all the way up to n minus 1 the first loop I'm computing the the minimum of a of I and V of I and storing the result in CFI the second loop I'm comparing the I'm computing the maximum of a of I and B of I and storing the result in D of I so since these are going over the same index range I can fuse together the two loops giving me a single loop that does both of these both of these lines here and this reduces the overhead of loop control code because now instead of doing this exit condition check two and times I only have to do it end times this also gives you better cash flow Cal D again we'll talk more 
about cache locality in a future lecture but at a lot high level here what it what it gives you is that once you load a of I and B of I into cache when you compute C of I it's also going to be in cache when you compute D of I whereas in the original code when you compute DF i it's very likely that a of I and B if I are going to be kicked out of cache already even though you brought it in when you compute at C of I for this example here again there's another optimization you can do common sub-expression elimination since you're computing this expression a if I is less than or equal to B of I twice okay next let's look at eliminating wasted iterations the idea of eliminating waste generation is to modify the loop bounds to avoid executing loop iterations over essentially empty loop bodies so here I have some code to transpose a matrix so I go from I equals 0 to n minus 1 from J equals 0 to n minus 1 and then I check if I is greater than J and if so I'll swap the entries in a of IJ and a of ji the reason why I have this check here is because I don't want to do the swab twice otherwise I'll just end up with the same matrix that I had before so I only have I all have to do the swap when I is greater than J one disadvantage of this code here is I still have a loop for N squared iterations even though only about half the iterations are actually doing useful work because about half the iterations are gonna fail this check here that checks whether I is greater than J so here's a modified version of the program where I basically eliminate these wasted iterations so now I'm gonna loop from I equals 1 to n minus 1 and then from J equals 0 all the way up to I minus 1 so now instead of going up to n minus 1 I'm going just up to I minus 1 and that basically puts this check whether I is greater than J into the loop control code and that saves me the extra wasted iterations okay so that's the last optimization on loops are there any questions yes so the check is so you still have to do the check in the loop control code but here you also had to do it and now you just don't have to do it again inside the body of the loop yes in the first example and be able to the processor pipeline yeah so so branch yes so most of these are gonna be branch hits so it's still gonna be pretty fast but it's gonna be even faster if you just don't do that check at all so I mean basically you should just test it out in your code to see whether it will give you a runtime improvement okay so last category of optimizations as functions so first the idea of inlining is to avoid the overhead of a function call by replacing a call to the function with the body of the function itself so here I have a piece of code that's computing the sum of squares of elements in an array a and then four so I have this for loop that in each iteration is calling this square function and the square function is defined above here it just does x times X for a input argument X but it turns out that there is actually some overhead to doing a function call and the idea here is to just put the body of the function inside the function that's calling it so instead of calling the square function I'm just gonna create a variable temp and then I call I call I set some equal to some plus 10 times temp so now I don't have to do the additional function call it you don't actually have to do this manually so if you declare your function to be static inline that the compiler is going to try to inline this function for you by placing the body of the function inside the 
Okay, next let's look at eliminating wasted iterations. The idea of eliminating wasted iterations is to modify the loop bounds to avoid executing loop iterations over essentially empty loop bodies. So here I have some code to transpose a matrix: I go from i equals 0 to n minus 1, and from j equals 0 to n minus 1, and then I check if i is greater than j, and if so I swap the entries A[i][j] and A[j][i]. The reason why I have this check here is because I don't want to do the swap twice; otherwise I'd just end up with the same matrix that I had before. So I only have to do the swap when i is greater than j. One disadvantage of this code is that I still loop for n squared iterations, even though only about half the iterations are actually doing useful work, because about half the iterations are going to fail this check of whether i is greater than j.

So here's a modified version of the program where I basically eliminate these wasted iterations: now I loop from i equals 1 to n minus 1, and then from j equals 0 all the way up to i minus 1. So now, instead of going up to n minus 1, j goes just up to i minus 1, and that basically moves the check of whether i is greater than j into the loop control code, and that saves me the extra wasted iterations.

Okay, so that's the last optimization on loops. Are there any questions? Yes? [STUDENT asks about the remaining check.] So, you still have to do the check in the loop control code, but before you had to do it there and again inside the body of the loop, and now you just don't have to do it inside the body. Yes? [STUDENT asks whether the branch in the first example would be predicted by the processor pipeline.] Yeah, so, branch prediction: yes, most of these are going to be branch hits, so it's still going to be pretty fast, but it's going to be even faster if you just don't do that check at all. So basically you should just test it out in your code to see whether it will give you a runtime improvement.
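A minimal sketch of this transformation (the VLA-style C99 signature is my own choice):

    // Before: n*n iterations, about half of which fail the i > j test.
    void transpose_naive(int n, double A[n][n]) {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (i > j) {                  // wasted iteration when i <= j
                    double tmp = A[i][j];
                    A[i][j] = A[j][i];
                    A[j][i] = tmp;
                }
    }

    // After: the i > j test is folded into the loop bounds, so the body
    // runs only over the strictly lower triangle.
    void transpose(int n, double A[n][n]) {
        for (int i = 1; i < n; ++i)
            for (int j = 0; j < i; ++j) {
                double tmp = A[i][j];
                A[i][j] = A[j][i];
                A[j][i] = tmp;
            }
    }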
Okay, so the last category of optimizations is functions. First, the idea of inlining is to avoid the overhead of a function call by replacing a call to the function with the body of the function itself. So here I have a piece of code that's computing the sum of squares of the elements in an array A, and I have this for loop that in each iteration is calling this square function, and the square function is defined above; it just does x times x for an input argument x. But it turns out that there is actually some overhead to doing a function call, and the idea here is to just put the body of the function inside the function that's calling it. So instead of calling the square function, I'm just going to create a variable temp, and then I set sum equal to sum plus temp times temp, so now I don't have to do the additional function call.

You don't actually have to do this manually: if you declare your function to be static inline, the compiler is going to try to inline this function for you by placing the body of the function inside the code that's calling it, and nowadays compilers are pretty good at doing this, so even if you don't declare static inline, the compiler will probably still inline the code for you. But just to make sure, if you want to inline a function, you should declare it as static inline. And you might ask, why can't you just use a macro to do this? It turns out that inline functions nowadays are just as efficient as macros, but they're better structured, because they evaluate all of their arguments, whereas macros just do a textual substitution. So if you have an argument that's very expensive to evaluate, the macro might actually paste that expression multiple times in your code, and if the compiler isn't good enough to do common sub-expression elimination, then you've just wasted a lot of work.
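A minimal sketch of the sum-of-squares example; the names follow the lecture's description, but the exact slide code may differ:

    // The function to be inlined; static inline asks the compiler to
    // substitute the body at each call site.
    static inline double square(double x) {
        return x * x;
    }

    double sum_of_squares(const double *A, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; ++i) {
            sum += square(A[i]);   // no call overhead once the compiler inlines this
        }
        return sum;
    }

    // The manual version described above, equivalent after inlining:
    double sum_of_squares_manual(const double *A, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; ++i) {
            double temp = A[i];
            sum += temp * temp;    // body of square() pasted in place of the call
        }
        return sum;
    }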
Okay, so there are two more optimizations that I'm not going to have time to talk about, but I'm going to post these slides on Learning Modules, so please take a look at them: tail-recursion elimination and coarsening recursion. So here is a list of most of the rules that we looked at today; there are two of the function optimizations I didn't get to talk about, so please take a look at those offline and ask your TAs if you have any questions.

And some closing advice: you should avoid premature optimization. So all the things I talked about today improve the performance of your program, but you first need to make sure that your program is correct; if you have a program that doesn't do the right thing, then it doesn't really benefit you to make it faster. And to preserve correctness you should do regression testing: develop a suite of tests to check the correctness of your program every time you change the program. And as I said before, reducing the work of a program doesn't necessarily decrease its running time, but it's a good heuristic. And finally, the compiler automates many low-level optimizations, and you can look at the assembly code to see whether the compiler did something.