The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

It's my great pleasure to introduce again TB Schardl. TB is not only a fabulous world-class performance engineer, he is a world-class performance meta-engineer; in other words, he builds the tools and such that make it possible for people to engineer fast code. He's the author of the technology that we're using in our compiler, the Tapir technology that's in the OpenCilk compiler for parallelism. He implemented all of that and all the optimizations and so forth, which has greatly improved the quality of the programming environment. So today he's going to talk about something near and dear to his heart, which is compilers and what they can and cannot do.

Great, thank you very much for that introduction. Can everyone hear me in the back? Yes? Great. All right. As I understand it, last lecture you talked about multithreaded algorithms, and you spent the lecture studying those algorithms and analyzing them in a theoretical sense, essentially analyzing their asymptotic running time, their work and span complexity. This lecture is not that at all; we're not going to do that kind of math anywhere in the course of this lecture. Instead, this lecture is going to take a look at compilers, as Professor Leiserson mentioned, and what they can and cannot do.

The last time you saw me standing up here was back in lecture 5. During that lecture we talked about LLVM IR and x86-64 assembly, and how C code gets translated into assembly code via LLVM IR. In this lecture we're going to talk more about what happens between the LLVM IR and assembly stages: essentially, what happens when the compiler is allowed to edit and optimize the code in its IR representation while it's producing the assembly. Last time we talked about the IR and the assembly, and this time they called the compiler guy back, I suppose, to tell you about the boxes in the middle.

Now, even though you're predominantly dealing with C code within this class, I hope that some of the lessons from today's lecture are ones you'll be able to take away into any job you pursue in the future, because a lot of languages today end up being compiled: C and C++, Rust, Swift, even Haskell, Julia, Halide; the list goes on and on. Those languages all get compiled for a variety of different backends, that is, different machine architectures, not just x86-64. And in fact, a lot of those languages get compiled using very similar compilation technology to what you have in the Clang/LLVM compiler that you're using in this class. Many of those languages today are optimized by LLVM itself; LLVM is the internal engine within the compiler that actually does all of the optimization. So my hope is that the lessons you learn here today don't just apply to 6.172; they'll apply to software you use and develop for many years down the road.

But let's take a step back and ask ourselves: why bother studying compiler optimizations at all? Why should we take a look at what's going on within this, up to this point, black box of software? Any ideas? Any suggestions? In back? "You can avoid manually trying to optimize things that the compiler will do for you." Great answer. Any other answers?
"You can learn how to write your code to take advantage of the compiler optimizations, how to suggest to the compiler what it should or should not do as you're constructing your program." Great answer as well. Very good. In the front? "It can help for debugging, when the compiler itself has bugs." It absolutely can. The compiler is a big piece of software, and you may have noticed that a lot of software contains bugs; the compiler is no exception. It helps to understand where the compiler might have made a mistake, or where the compiler simply didn't do what you thought it should be able to do. Understanding more of what happens in the compiler can demystify some of those oddities. Good answer. Any other thoughts? "It's fun." Well, OK. In my completely biased opinion, I would agree that it's fun to understand what the compiler does. You may have different opinions; that's OK, I won't judge.

So I put together a list of reasons why, in general, we may care about what goes on inside the compiler. (I elided that last point from the list; my bad.) Compilers can have a really big impact on software. It's kind of like this: imagine that you're working on some software project, and you have a teammate on your team who's pretty quiet but extremely smart, and whenever that teammate gets access to some code, they jump in and immediately start trying to make that code work faster. That's really cool, because that teammate does good work, and oftentimes what the teammate produces is indeed much faster code than what you wrote. In other industries you might just sit back and say, "Teammate does fantastic work. Maybe they don't talk very often, but that's OK. Teammate, you do you." But in this class we're performance engineers. We want to understand what that teammate did to the software. How did that teammate get so much performance out of the code? The compiler is kind of like that teammate, and so understanding what the compiler does is valuable in that sense.

As mentioned before, compilers can save you performance-engineering work. If you understand that the compiler can do some optimization for you, then you don't have to do it yourself, and that means you can continue writing simple, readable, maintainable code without sacrificing performance. You can also understand the differences between the source code and whatever you might see show up in either the LLVM IR or the assembly, if you have to look at what was produced for your executable. And compilers can make mistakes. Sometimes that's because of a genuine bug in the compiler, and other times it's because the compiler just couldn't understand something about what was going on. Having some insight into how the compiler reasons about code can help you understand why those mistakes were made, figure out ways to work around those mistakes, or let you write meaningful bug reports to the compiler developers. And of course, understanding compilers can help you use them more effectively. Plus, I think it's fun.

So the first thing to understand about a compiler is its basic anatomy. The compiler takes LLVM IR as input, and up until this point we've thought of it as just a big black box that does stuff to the IR, and out pops more LLVM IR, somehow optimized. What actually goes on within that black box? The compiler executes a sequence of what we call transformation passes on the code. Each transformation pass takes a look at its input and analyzes
that code, and then tries to edit the code in an effort to improve its performance. Now, a transformation pass might end up running multiple times, and those passes run in some order. That order ends up being a predetermined order that the compiler writers found to work pretty well on their tests; that's about the level of insight that went into picking it. It seems to work well.

Now, some good news in terms of trying to understand what the compiler does: you can actually just ask the compiler, "What did you do?" You've already used this functionality, as I understand it, in some of your assignments: you've asked the compiler to give you a report specifically about whether or not it could vectorize some code. But in fact, the Clang/LLVM compiler you have access to can produce reports not just for vectorization but for a lot of the different transformation passes that it tries to perform. There's some syntax you have to pass to the compiler, some compiler flags, in order to get those reports; those are described on the slide, so I won't walk you through that text, and you can look at the slides afterwards. (For Clang, these are the -Rpass=<regex>, -Rpass-missed=<regex>, and -Rpass-analysis=<regex> flags.) At the end of the day, the string that you're passing is actually a regular expression. If you know what regular expressions are, great; you can use one to narrow down the search for your report. If you don't, and you just want to see the whole report, provide ".*" as the string and you're good to go.

That's the good news: you can get the compiler to tell you exactly what it did. The bad news is that when you ask the compiler what it did, the report you get back looks something like this. In fact, I've elided most of the report for this particular piece of code, because the report ends up being very long. As you might notice just from reading some of the text, there are definitely English words in there, and there are pointers to the pieces of code that you compiled, but it is very jargony and hard to understand. This isn't the easiest report to make sense of.

OK, so that's some good news and some bad news about these compiler reports. The good news: you can ask, and the compiler will happily tell you all about the things it did. It can tell you which transformation passes were successfully able to transform the code, and it can tell you conclusions that it drew from its analysis of the code. The bad news: these reports are kind of complicated. They can be long, and they use a lot of internal compiler jargon which, if you're not familiar with it, makes them hard to understand. It also turns out that not all of the transformation passes in the compiler give you these nice reports, so you don't get to see the whole picture, and in general the reports don't really tell you the whole story about what the compiler did or did not do. We'll see an example of that later on. So part of the goal of today's lecture is to give you some context for understanding the reports you might see when you pass these flags to the compiler.

The structure of today's lecture is basically two parts. First, I want to give you some examples of compiler optimizations, just simple examples, so you get a sense of how a compiler mechanically reasons about the code it's given and tries to optimize that code. We'll take a look at how the compiler optimizes operations on a single scalar value, how it can optimize a structure, how it can optimize function calls, and how it can optimize loops, just
simple examples to give some flavor. Then in the second half of the lecture I have a few case studies for you, which get into diagnosing ways in which the compiler fails, not due to bugs per se, but where it simply didn't do an optimization you might have expected it to do. Now, to be frank, I think all of those case studies are really cool, but it's not totally crucial that we get through every single one during today's lecture. The slides will be available afterwards, so when we get to that part, we'll just see how many case studies we can cover. Sound good? Any questions so far? All right, let's get to it.

Let's start with a quick overview of compiler optimizations. Here is a summary of the various... oh. I forgot that I just copied this slide from a previous lecture given in this class; you might recognize it, I think from lecture 2. Sorry about that. That's OK, we can fix this. We'll just go ahead and amend this slide right now: we need to change the title, so we'll cross that out and put in our new title. OK, great. And now we should double-check these lists and make sure that they're accurate.

Data structures: we'll come back to data structures. Loops: hoisting, yes, the compiler can do hoisting. Sentinels, not really; compilers are not good at sentinels. Loop unrolling, yes, it absolutely does loop unrolling. Loop fusion, yes it can, but there are some restrictions that apply; your mileage may vary. Eliminating wasted iterations, again, some restrictions may apply. OK. Logic: constant folding and propagation, yes, it's good at that. Common-subexpression elimination, yes, it can find common subexpressions; you're fine there. Algebraic identities: it knows algebra, good. Short-circuiting, yes, absolutely. Ordering tests: depends on the tests. I'll give it to the compiler, but restrictions apply. Creating a fast path: compilers aren't that smart about fast paths; they come up with really boring fast paths. I'm going to take that off the list. Combining tests: again, it kind of depends on the tests. Functions: compilers are pretty good at functions. Inlining, yes, it can absolutely do that. Tail-recursion elimination, yes, absolutely. Coarsening, not so much. OK, great. Let's come back to data structures, which we skipped before: packing, augmentation... honestly, the compiler does a lot with data structures, but really none of those things. The compiler isn't smart about data structures in that particular way.

The way that the compiler is smart about data structures is shown here, if we expand this list to include even more compiler optimizations. Bottom line with data structures: the compiler knows a lot about the architecture, and it has put a lot of effort into figuring out how to use registers really effectively. Reading and writing a register is super fast; touching memory is not so fast. So the compiler works really hard to allocate registers, to put things that would ordinarily live in memory into registers, to manipulate aggregate types so they can use registers (as we'll see in a couple of slides), and to align data that has to live in memory. Compilers are good at that.

The compiler is also good at loops; we already saw some evidence of that on the previous slide. It can vectorize. It does a lot of other cool stuff: loop unswitching is a cool optimization that I won't cover here; idiom replacement finds common patterns and does something smart with them; fission, skewing, tiling, and interchange all try to process the iterations of a loop in some clever way to make stuff go fast, and some restrictions apply. Those are still really in
development in LLVM. For logic, it does a lot more than what we saw before: it can eliminate instructions that aren't necessary; it can do strength reduction (I think we saw that one in the Bentley-rules slides) and other cool optimizations; it gets rid of dead code; it can do more idiom replacement; branch reordering is kind of like reordering tests; and global value numbering is another cool optimization that we won't talk about today. For functions, it can do more unswitching, and it can eliminate arguments that aren't necessary.

So the compiler can do a lot of stuff for you, and at the end of the day, writing down this whole list is kind of a futile activity, because it changes over time. Compilers are a moving target. Compiler developers are software engineers like you and me; they're clever, and they're applying all their clever software-engineering practice to the compiler code base to make it do more stuff. They're constantly adding new optimizations and new clever analyses to the compiler, all the time. So really, what we're going to look at today is just a couple of examples, to get a flavor for what the compiler does internally.

Now, if you want to follow along with how the compiler works, the good news is that by and large you can look at the LLVM IR to see what happens as the compiler processes your code; you don't need to look at the assembly. That's generally true, but there are some exceptions. For example, if we have these three snippets of C code on the left and we look at what your LLVM compiler generates in terms of the IR, we can see that there are some optimizations reflected, but not too many interesting ones. The multiply by 8 turns into a shift-left by 3, because 8 is a power of 2; that's straightforward, and we can see it in the IR. The multiply by 15 still looks like a multiply by 15, no change there. The divide by 71 still looks like a divide by 71, again no change.

Now, with arithmetic ops, the difference between what you see in the LLVM IR and what you see in the assembly is where things are most pronounced, at least in my experience. Because if we take these same snippets of C code and look at the corresponding x86 assembly, we get the stuff on the right, and this looks different. Let's pick through what this assembly code does, one example at a time.

The first one in the C code takes the argument n and multiplies it by 8, and in the assembly we have this lea instruction. Anyone remember what the lea instruction does? I see one person shaking their head; that's a perfectly reasonable response. Yeah, go for it. "Load effective address." And what does that mean? "Load the address, but don't actually access memory." Another way to phrase that: do the address calculation and give me the result of the calculation, but don't read or write memory at that address. That's essentially what "load effective address" means. You're exactly right: the lea instruction does an address calculation and stores the result in the register on the right.

Anyone remember enough about x86 address calculations to tell me how that lea in particular works, the first lea on the slide? Yeah? "The part before the first comma gets added in; in this case, nothing gets added to the product of the two arguments in the parens." You're exactly right. So this lea takes the value 8, multiplies it by whatever is in register %rdi, which holds the value n, and stores the result into %eax. Perfect: it does a multiply by 8. The address-calculation unit is only capable of a small range of operations: it can do additions, and it can multiply
by 1, 2, 4, or 8; that's it. But it's a really simple circuit in the hardware, it's fast, and it's optimized heavily by modern processors, so when the compiler can use these lea instructions, it tends to try to use them. So, good job.

How about the next one? The multiply by 15 turns into these two lea instructions; can anyone tell me how these work? "You're basically multiplying by 5 and then multiplying by 3." Exactly right. We can step through this as well. The first lea takes %rdi, which stores the value n, multiplies it by 4, and adds the original value of %rdi; that computes 4n + n, which is 5n, and the result gets stored into %rax. Good, we've effectively multiplied by 5. The next instruction takes whatever is in %rax, which is now 5n, multiplies it by 2, and adds whatever is currently in %rax, which is once again 5n. That computes 2(5n) + 5n = 3(5n) = 15n. So just like that, we've done our multiply with two lea instructions.

How about the last one? In this last piece of code, we take the argument in %rdi and move it into %eax. We then move the value 3,871,519,817 into %ecx, as you do, multiply those two values together, and shift the product right by 38. So obviously this divides by 71. Any guesses as to how it performs the division operation? Both of you who answered already, I might still call on you, but I'll give a little more time for someone else to raise their hand. Go for it. "It has a lot to do with 2 to the 38." Very good. Any further guesses before I give the answer away? In the back? "It's kind of a magic number." Yes, this is what's technically called a magic number, and this magic number is equal to 2^38 / 71 + 1, where the plus one deals with some rounding effects. What this code does is compute n / 71 by first computing n times (2^38 / 71), and then shifting off the lower 38 bits with that shift-right operation. By converting the operation this way, it's able to replace the division with a multiply. And if you remember from the architecture lecture: multiply operations aren't the cheapest things in the world, but they're not too bad; division is really expensive. If you want fast code, never divide. Also, never compute modulus, and never access memory.

Yeah, question? "Why did it choose 38?" I think it chose 38 because 38 works; there's actually a formula for this. Roughly speaking, it doesn't want to choose a value that's too large, or the product will overflow, and it doesn't want to choose a value that's too small, or it loses precision, so it finds a balancing point. If you want to know more about magic numbers, I recommend checking out the book Hacker's Delight, for any of you not familiar with it. It is a book full of bit tricks. Seriously, that's the entire book; it's just a book full of bit tricks. There's a whole section in there describing how you do division by various constants using multiplication, either signed or unsigned. It's very cool. But yeah: a magic number converts a division into a multiply. That's the kind of thing you might only see in the assembly; arithmetic operations are really optimized at that very last step.
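To make those two tricks concrete, here is a small sketch in C that I've added to check the arithmetic; the function names are mine, not from the lecture, but the constants match the assembly described above.

```c
#include <stdint.h>
#include <stdio.h>

// n*15 using only lea-style operations (add plus scale by 1/2/4/8).
static uint64_t mul15(uint64_t n) {
    uint64_t t = n + n * 4;  // first lea:  4n + n = 5n
    return t + t * 2;        // second lea: 2*(5n) + 5n = 15n
}

// n/71 via the magic number 2^38/71 + 1 = 3871519817: multiply,
// then shift off the low 38 bits. Valid for 32-bit unsigned n.
static uint32_t div71(uint32_t n) {
    return (uint32_t)(((uint64_t)n * 3871519817ull) >> 38);
}

int main(void) {
    for (uint32_t n = 0; n < 1000000; ++n) {
        if (mul15(n) != n * 15ull || div71(n) != n / 71) {
            printf("mismatch at %u\n", n);
            return 1;
        }
    }
    printf("spot check passed\n");
    return 0;
}
```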
For the rest of the optimizations, though, we can focus on the IR. Any questions about that so far? Cool.

OK, so for the next part of the lecture I want to show you a couple of example optimizations in terms of the LLVM IR, and to show you these optimizations we'll have a little bit of code to work through, a running example if you will. This running example is some code I stole from, I believe, a serial program that simulates the behavior of n massive bodies in 2-D space under the law of gravitation. We've got a whole bunch of point masses, those point masses have varying masses, and we just want to simulate what happens due to gravity as these masses interact in the plane.

At a high level, the n-body code is pretty simple. We have a top-level simulate routine, which just loops over all the time steps during which we want to perform this simulation; at each time step, it calculates the various forces acting on the different bodies, and then it updates the position of each body based on those forces. In order to do that calculation, it has some internal data structures: one to represent each body, which contains a couple of vector-typed fields, and our own vector type, defined to store two double-precision floating-point values.

Now, we won't see the entire program in order to look at some compiler optimizations, but one routine that we will take a look at is the one that updates the positions. This is a simple loop that takes each body one at a time, computes the new velocity of that body based on the forces acting on it, using vector operations, then updates the position of that body, again using these vector operations we've defined, and finally stores the result back into the structure for that body. All of the methods in this code make use of basic routines on 2-D vectors, points in x-y, points in 2-D space, and these routines are pretty simple: there's one to add two vectors, another to scale a vector by a scalar value, and a third to compute the length, which we won't actually look at too much. Everyone good so far?
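To make the example concrete as we go, here is a hedged reconstruction of that vector type and two of its routines; the lecture shows these only on slides, so the exact names and bodies here are my guesses based on the description.

```c
// Hedged reconstruction of the N-body vector type (names assumed).
typedef struct vec_t {
    double x, y;
} vec_t;

// Add two 2-D vectors componentwise.
static vec_t vec_add(vec_t a, vec_t b) {
    vec_t sum = { a.x + b.x, a.y + b.y };
    return sum;
}

// Scale a 2-D vector by a scalar.
static vec_t vec_scale(vec_t v, double a) {
    vec_t scaled = { v.x * a, v.y * a };
    return scaled;
}
```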
OK, so let's try to start simple. Let's take a look at just one of these one-line vector operations, vec_scale. All vec_scale does is take one of these vector inputs and a scalar value a, multiply x by a and y by a, store the results into a vector type, and return it. Great, couldn't be simpler. If we compile this with no optimizations whatsoever and take a look at the LLVM IR, we get something a little more complicated than you might imagine. The good news, though, is that if you turn on optimizations, just the first level of optimization, -O1, then where we got that code before, now we get code that is far, far simpler, so simple I can blow up the font size so you can actually read it on the slide. So, to see it again: no optimizations, then optimizations. A lot of stuff happened to get from one to the other, and we're going to see what those optimizations actually were.

But first, let's pick apart what's going on in the optimized function. We have our vec_scale routine in LLVM IR. It takes the structure as its first argument, represented using two doubles, and it takes the scalar as the second argument. What the operation does is multiply those two fields by the third argument, the double a, pack those values into a struct for the return, and finally return that struct. That's what the optimized code does. Let's see how we actually get there, one step at a time.

Let's start by optimizing the operations on the single scalar value; that's why I picked this example. If we go back to the -O0 code and pick out just the operations that deal with that scalar value, we narrow our scope down to these lines. The double a is the final argument in the list, and what we see is that within the vec_scale routine compiled at -O0, we allocate some local storage and store the double a into that local storage; later on, we load the value out of local storage before the first multiply, and then we load it again before the other multiply. OK, any ideas how we could make this code faster? "Don't store it in memory." What a great idea. How would we get around storing it in memory? "Keep it in a register." In particular, what property of LLVM IR makes that really easy? "There are infinitely many registers." Right, and in fact the argument is already in a register; it's already there when the function begins, so we don't even need to move it into one.

So how do we go about optimizing the code? Well, let's find the places where the value loaded from memory is being used, and what we're going to do is just replace those loads with the original argument. We know exactly what operation we're trying to do: all we're trying to do is multiply by the original parameter. So we find the two uses, cross them out, and put the input parameter in their place. That makes sense? Questions so far? Cool. So now those multiplies aren't using the values returned by the loads. How can we optimize this code further? "Delete the loads." What else can we delete? (There's no address calculation here, just because the code is so simple, but good insight.) "The allocation and the store." Great. Those loads are dead code, the store is dead code, the allocation is dead code. We eliminate all that dead code, we just use the value living in the register, and we've already eliminated a bunch of instructions. The net effect of that was to turn the code optimized at -O0, in the background, into the code in the foreground, which is slightly shorter, but not that much: a little bit faster, but not that much faster yet.
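Expressed in C rather than LLVM IR, the before-and-after of just that scalar part looks roughly like this; it's a sketch of the idea, not literal compiler output.

```c
// What the -O0 IR does for the scalar argument, written as C:
// the argument gets a stack slot, and each use reloads from it.
static double scale_x_unoptimized(double x, double a) {
    double a_slot = a;   // alloca + store at -O0
    return x * a_slot;   // a load from the slot feeds the multiply
}

// After store-to-load forwarding and dead-code elimination, the
// slot, the store, and the loads are gone; the multiply uses the
// argument's register directly.
static double scale_x_optimized(double x, double a) {
    return x * a;
}
```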
How do we optimize this function further? "Do it for every variable we have." Right, and in particular the only other variable we have is the structure that we're passing in, so we want to do this kind of optimization on the structure. Make sense? Now, the problem is that structures are harder to handle than individual scalar values, because in general you can't store a whole structure in just a single register; it's more complicated to juggle all the data within a structure. But nevertheless, let's take a look at the code that operates on the structure we pass in to the function. When we eliminate all the other code, we see that we've got an allocation (let's see if I have animations here; yes, I do), an allocation so we can store the structure onto the stack. Then we have an address calculation that lets us store the first field of the structure onto the stack, and a second address calculation to store the second field. Later on, when we need those values, we load the first field out of memory and we load the second field out of memory. This is a very similar pattern to what we had before, except we've got more going on in this case.

So how do we go about optimizing this structure? Any high-level ideas? Ultimately we'll want to get rid of all the memory references and all the storage for the structure, but how do we reason through eliminating all that in a mechanical fashion, based on what we've seen so far? Go for it. "The fields are passed in as separate parameters, separate registers if you will, as a quirk of how LLVM does it. So cross out %12 and %6 and put in the relevant field." Cool. Let me phrase that a little bit differently: let's do this one field at a time. We've got a structure with multiple fields; let's just take it one step at a time.

So let's look at the first field, and the operations that deal with it. We have in our LLVM IR some address calculations that refer to the same field of the structure, in this case, I believe, the first field, and ultimately we end up loading from that location in local memory. "How do we know that both address calculations refer to the same field?" Good question. What the compiler does in this case is a very careful analysis of the math that's going on. The alloca, the location in local memory, is just a fixed location, and from that the compiler can interpret what each instruction computes as an address, and determine that the two calculations produce the same value.

So we have this location in memory that we operate on, and before we ultimately do the multiply, we load from that location. What value do we know is going to be loaded by that load instruction? Go for it. "The value that was stored there; nothing put anything else back." Correct, we don't even have to worry about anything being put back. So what are we multiplying in that multiply? "The first element of the struct." It's %0, the value that we stored right there. That makes sense? Everyone see that? Any questions about that? All right. We're storing the first element of the struct into this location; later we load it out of that same location; nothing else happened to that location in between. So let's go ahead and optimize it the same way we optimized the scalar. We see that the result of the load is used right there, but we know that the load is going to return the first field of our struct input, so we just cross out the use and replace it with that field. Now we're not using the result of that load; what do we get to do as the compiler? You know the answer. "Delete the dead code." Delete all of it: remove the now-dead code, which is all those address calculations as well as the load operation and the store operation. And that's pretty much it. We replaced one operation, and we got rid of a bunch of other code from our function.

We've now optimized one of the two fields in our struct. What do we do next? "Optimize the next one." That happens similarly; I won't walk you through it a second time. We find where the result of that load is used, cross it out, replace it with the appropriate input, and delete all of the newly dead code. And then we get to delete the original allocation, because nothing is stored to that memory anymore. Make sense?
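Putting the scalar and the struct pieces together, here is the whole transformation sketched in C; again, this is my reconstruction for illustration. The real transformation happens on the IR, not on C source.

```c
typedef struct { double x, y; } vec_t;

// The -O0 pattern: the struct argument is spilled field by field
// to a stack slot, and each multiply reloads a field from memory.
vec_t vec_scale_O0_style(vec_t v, double a) {
    vec_t v_slot = v;              // alloca + address calcs + stores
    double a_slot = a;             // alloca + store
    vec_t result;
    result.x = v_slot.x * a_slot;  // two loads feed this multiply
    result.y = v_slot.y * a_slot;  // two more loads feed this one
    return result;
}

// After forwarding the stored fields and deleting the dead code,
// all the stack traffic disappears; this matches the shape of the
// -O1 IR described above.
vec_t vec_scale_O1_style(vec_t v, double a) {
    vec_t result = { v.x * a, v.y * a };
    return result;
}
```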
Any questions about that? Yeah? "When we first compile to LLVM IR, do we unpack the struct and pass the fields in as separate parameters?" Yeah, LLVM IR in this case decided to pass it as separate parameters; that's just its representation, so in that sense, yes. But it was still doing the standard thing: create some local storage, store the parameters out to local storage, and then have all operations read out of local storage. That's the standard code the compiler generates when it's asked to compile C code with no other optimizations; that's what you get. Make sense? Yeah? "What are all the 'align 8's doing?" The align 8s are attributes that specify the alignment of that location in memory. This is alignment information that the compiler either determines by analysis or implements as part of a standard; it specifies how values are aligned in memory. That matters a lot more for the ultimate code generation, unless we're able to just delete the memory references altogether. Make sense? Cool, any other questions?

All right. So we optimized the first field, and we can optimize the second field in a similar way. It turns out there are additional operations that need to happen in order to return a structure from this function; those can be optimized in a similar way, and they're shown here, but we're not going to go through exactly how that works. At the end of the day, after we've optimized all of that code, we end up with this: our function compiled at -O1, and it's far simpler. I think it's far more intuitive; this is what I would imagine the code should look like from the C code I wrote in the first place. Take your input, do a couple of multiplications, do some operations to create the return value, and ultimately return that value. So in summary: the compiler works hard to transform data structures and scalar values so that as much as possible lives purely within registers, avoiding any local storage if possible. Everyone good with that so far? Cool.

Let's move on to another optimization. Let's talk about function calls, and take a look at how the compiler can optimize them; by and large, these optimizations will occur if you pass optimization level 2 or higher, FYI. So, from our original C code, we had some lines that performed a bunch of vector operations. We had a vec_add that added two vectors together, one of which was the result of a vec_scale, which scaled the result of another vec_add by some scalar value. So we had this chain of calls in our code, and if we take a look at the code compiled at -O0, what we end up with is the snippet shown on the bottom, which performs some operations on these vector structures, does this multiply operation, and then calls the vec_scale routine, the routine we've decided to focus on first.

So how do we go about optimizing this? Any ideas? To give a little bit of a hint: what the compiler sees when it looks at that call is this snippet containing the call instruction, and, in our example, it also sees the code for the vec_scale function that we were just looking at. We're going to suppose that it has already optimized vec_scale as best it can and produced this code for the routine. So it sees that call instruction, and it sees the code for the function being called. What could the compiler do at this point to try to make the calling code even faster? You're exactly right: remove the call, and just put the body of the vec_scale code right there in place of the call.
It takes a little bit of effort to pull that off, but roughly speaking, yeah, we're just going to copy and paste the code of the function into the place where we're calling it. If we do that simple copy-paste, we end up with some garbage code as an intermediate; we had to do a little bit of renaming to make everything work out, but at this point we have the code from our function in the place of that call. Now we can observe that, to restore correctness, we don't want to do the call, and we don't want to do the return that we just pasted in place, so we'll just go ahead and remove both that call and the return. That is called function inlining: we identify some function call, take the body of the called function, and paste it right in place of that call. Make sense? Anyone confused? Raise your hand if you're confused. All right.

Now, once you've done some amount of function inlining, you can actually do some more optimizations. Here we have the code after we got rid of the unnecessary call and return, and we have a couple of multiply operations sitting in its place. That looks fine, but if we expand our scope just a little bit, we see some operations that were already sitting there after the original call. What the compiler can do is take a look at these instructions and, long story short, realize that all these instructions do is pack some data into a structure and then immediately unpack the structure. It's like you put a bunch of stuff into a bag and then immediately dumped out the bag. That was kind of a waste of time, and it's kind of a waste of code, so let's get rid of it: those operations are useless, let's delete them. The compiler has a great time deleting dead code; it's what it lives to do.

All right. Now, in fact, in the original code we didn't just have one function call, we had a whole sequence of function calls, and if we expand our LLVM IR snippet even a little further, we can include those two other calls: the original call to vec_add, followed by the code that we've now optimized by inlining, ultimately followed by yet another call to vec_add. Minor spoiler: the vec_add routine, once it's optimized, looks pretty similar to the vec_scale routine, and in particular is of comparable size. So what's the compiler going to do with those two call sites? "Inline them." Do more inlining; inlining is great. We'll inline these functions as well, and then remove all the additional now-useless instructions. I won't walk through that process, but the result looks something like this. In the original C code we had this vec_add, which took the result of vec_scale as one of its arguments, which took the result of another vec_add as one of its arguments, and what we ended up with in the optimized IR is just a bunch of straight-line code that performs floating-point operations. It's almost as if the compiler took the original C code and transformed it into the equivalent C code shown on the bottom, where it just operates on a whole bunch of doubles using primitive operations.
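In C terms, the flattening looks something like the following sketch, using the vec_t reconstruction from earlier; the variable names are mine, and the lecture's actual chain may differ in its details.

```c
typedef struct { double x, y; } vec_t;

// What you write: a chain of vector-routine calls, e.g.
//   vec_t r = vec_add(vec_scale(vec_add(u, v), a), w);
// What the compiler effectively sees after inlining everything and
// deleting the pack/unpack steps: straight-line double arithmetic.
vec_t chain_inlined(vec_t u, vec_t v, vec_t w, double a) {
    vec_t r = { (u.x + v.x) * a + w.x,
                (u.y + v.y) * a + w.y };
    return r;
}
```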
So function inlining, together with the additional transformations it enabled, eliminated all of those function calls; it was able to completely eliminate the costs associated with the function-call abstraction, at least in this code. Make sense? I think that's pretty cool. You write code that has a bunch of function calls, because that's how you've constructed your interfaces, but you're not really paying for those function calls. Function calls aren't the cheapest operation in the world, especially if you think about everything that goes on in terms of the registers and the stack, but the compiler is able to avoid all that overhead and just perform the floating-point operations we care about.

OK, well, if function inlining is so great and enables so many other optimizations, why doesn't the compiler just inline every function call? Go for it. "Recursion." It's really hard to inline a recursive call; in general, you can't inline a function into itself, although it turns out there are some exceptions. So yes, recursion creates problems for function inlining. Any other thoughts? In the back? "All of that renaming." You're definitely onto something. We had to do a bunch of that renaming stuff when we inlined the first time, and we would every single time, and even though LLVM IR has an infinite number of registers, the machine doesn't, so all of that renaming does create a problem. And there are other problems of a similar nature when you start inlining all those functions. For example, you've copy-pasted a bunch of code, and that made the original call site bigger and bigger and bigger. Programs, we generally don't think about the space they take in memory, but they do take space in memory, and that does have an impact on performance. Great answer. Any other thoughts? "If your function becomes too long, it may not fit in the instruction cache, and that can increase the amount of time it takes just to execute the function." Right, because you're no longer getting cache hits. Exactly right; that's one of the problems with the code-size blowup from inlining everything.

So, the three main reasons the compiler won't inline every function, and I think we touched on two of them here. First, for some function calls, like recursive calls, it's impossible to inline them, because you can't inline a function into itself. There are exceptions to that, namely recursive tail calls: if the last thing in a function is a function call, it turns out you can effectively inline that call as an optimization. We're not going to talk too much about how that works; there are corner cases. But in general, you can't inline a recursive call. Second, the compiler has another problem: if the function that you're calling is in a different compilation unit, if it's literally in a different file that's compiled independently, then the compiler can't very well inline that function, because it doesn't have access to that function's code. There is a way to get around that problem with modern compiler technology, involving whole-program optimization, and I think there are some backup slides that describe how to do that with LLVM. But in general, if it's in a different compilation unit, it can't be inlined. And finally, as touched on, function inlining can increase code size, which can hurt performance.

OK, so some functions are OK to inline, and other functions could create this performance problem, because you increase code size. How does the compiler know whether or not inlining any particular function at a call site could hurt performance? Any guesses? "It has a cost model." Yeah, the compiler has some cost model, which gives it some information about how much it will cost to inline that function. Is the cost model always right? It is not. So the real answer to "how does it know?" is: it doesn't. It makes a best guess, using that cost model and other heuristics, to determine when it makes sense to try to inline a function, and because it's making a best guess, sometimes the compiler guesses wrong.

So, to wrap up this part, just a couple of tips for controlling function inlining in your own programs. If there's a function that you know must always be inlined, no matter what happens, you can mark that function with a special attribute, namely the always_inline attribute. For example, if you have a function that does some complex address calculation and it should be inlined rather than called, you may want to mark it with always_inline. Similarly, if you have a function that really should never be inlined, because it's never cost-effective to inline it, you can mark that function with the noinline attribute. And finally, if you want to enable more function inlining in the compiler, you can use link-time optimization, or LTO, to enable whole-program optimization; I won't go into that during these slides.
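For concreteness, here is what those attributes look like in GCC/Clang syntax; the functions themselves are made-up examples to illustrate the two cases, not code from the lecture.

```c
// A small, hot address-calculation helper: force inlining.
static inline __attribute__((always_inline))
double get_cell(const double *grid, int i, int j, int ncols) {
    return grid[i * ncols + j];
}

// A big, rarely executed routine: forbid inlining so callers stay small.
__attribute__((noinline))
void dump_grid(const double *grid, int nrows, int ncols);
```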
Let's move on and talk about loop optimizations. Any questions before we continue? Yeah? "Does 'static inline' guarantee that the compiler will always inline the function?" It actually doesn't. The inline keyword provides a hint to the compiler that it should think about inlining the function, but it doesn't provide any guarantees. If you want a strong guarantee, use the always_inline attribute. Good question, though.

All right, loop optimizations. You've already seen some loop optimizations; you've seen vectorization, for example. It turns out the compiler does a lot of work to try to optimize loops. So first: why is that? Why would compiler engineers invest so much effort into optimizing loops? "Loops are an extremely common control structure." "A loop has a branch." Both things are true. I think there's a higher-level reason, though, a more fundamental reason, if you will. "Most of the time, the loops take up the most time." You got it: loops account for a lot of the execution time of programs.

The way I like to think about this is with a really simple thought experiment. Let's imagine that you've got a machine with a 2 GHz processor and 16 cores; we've chosen these values to be easy to think about using mental math. Each core executes one instruction per cycle. And suppose you've got a program which contains a trillion instructions, roughly 2^40, with ample parallelism for those 16 cores, but all of those instructions are simple straight-line code: there are no branches, there are no loops, there are no complicated operations like I/O. It's just a bunch of really simple straight-line code, each instruction takes a cycle, and the processor executes one instruction per cycle. How long does it take to run this code, to execute the entire terabyte-size binary? It's 2^40 cycles of work for the 2^40 instructions, but you're using a 2 GHz processor, roughly 2^31 cycles per second, and 16 cores, 2^4, with ample parallelism to keep them all saturated: that's about 2^35 instructions per second, so 2^40 / 2^35 = 2^5. "32 seconds." Nice job; this one has mastered power-of-two arithmetic in one's head. It's a good skill to have, especially in Course 6. So if you have just a bunch of simple straight-line code, and you have a terabyte of it (and that's a lot of code, that is a big binary), this relatively simple processor can execute the whole thing in just about 30
seconds. Now, in your experience working with software, you might have noticed that some programs take longer than 30 seconds to run, and some of those programs don't have terabyte-size binaries. The reason those programs take longer to run, by and large, is loops. Loops account for a lot of the execution time in real programs.

Now, you've already seen some loop optimizations; we're just going to take a look at one other loop optimization today, namely code hoisting, also known as loop-invariant code motion. To look at that, we're going to take a look at a different snippet of code from the n-body simulation. This code calculates the forces acting on each of the n bodies, and it does it with a doubly nested loop: for each body i, over the number of bodies, and for each body j, over the number of bodies, as long as you're not looking at the same body, call this add_force routine on the result of calculating the force between those two bodies, and add that force to one of the bodies. That's all that's going on in this code.

If we translate this code into LLVM IR, we end up with, hopefully unsurprisingly, a doubly nested loop. It looks something like this; the body of the innermost loop has been elided just so things can fit on the slide, but we can see the overall structure. On the outside we have some outer-loop control (this should look familiar from lecture 5, hopefully); inside of that outer loop we have an inner loop, with the inner-loop control at its top and bottom; and within that inner loop we have one branch, which can skip a bunch of code if you're looking at the same body, when i equals j, but otherwise we fall into the loop body of the innermost loop.

Now, if we zoom in on the top part of this doubly nested loop, just the topmost three basic blocks, and take a look at more of the code that's going on here, we end up with something that looks like this. If you remember some of the discussion from lecture 5 about loop induction variables and what they look like in LLVM IR, what you find is that for the outer loop, we have an induction variable at the very top; that's the phi instruction. Once again, inside that outer loop we have the loop control for the inner loop, which has its own induction variable, once again a phi node; that's how we can spot it. And then we have the body of the innermost loop, and this is just the start of it: it's just a couple of address calculations. Can anyone tell me some interesting property of a couple of these address calculations that could lead to an optimization? "The first two address calculations only depend on the outermost loop variable, the iteration variable for the outer loop." Exactly right. So what can we do with those instructions? "Bring them out of the inner loop." Why should we keep computing these addresses in the innermost loop when we could just compute them once in the outer loop? That optimization is called code hoisting, or loop-invariant code motion: those instructions are invariant with respect to the code in the innermost loop, so you hoist them out. Once you hoist them out, you end up with a transformed loop that looks something like this: we have the same outer-loop control at the very top, but now we're doing those address calculations there, and we no longer have them on the inside. As a result, those hoisted calculations are performed just once per iteration of the outer loop rather than once per iteration of the inner loop; those instructions run far fewer times, and you save a lot of running time.
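Because LLVM IR can be a little tedious to look at, here is the effect of that transformation sketched in C; the type and function names are reconstructions from the lecture's description, not its exact code, and the force routines are declared but left undefined here.

```c
typedef struct { double x, y; } vec_t;
typedef struct { vec_t pos, vel, force; double mass; } body_t;  // fields assumed

vec_t calculate_force(body_t *a, body_t *b);  // defined elsewhere
void add_force(body_t *b, vec_t f);           // defined elsewhere

void calculate_forces(body_t *bodies, int n) {
    for (int i = 0; i < n; ++i) {
        body_t *bi = &bodies[i];  // hoisted: invariant across the j loop
        for (int j = 0; j < n; ++j)
            if (i != j)
                add_force(bi, calculate_force(bi, &bodies[j]));
    }
}
```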
Now, this was an example of loop-invariant code motion on just a couple of address calculations. In general, the compiler will try to prove that some calculation is invariant across all the iterations of a loop, and whenever it can prove that, it will try to hoist that code out of the loop. If you can get code out of the body of a loop, that reduces the running time of the loop and saves a lot of execution time: huge bang for the buck. Makes sense? Any questions about that so far?

All right, so just to summarize this part: what can the compiler do? The compiler optimizes code by performing a sequence of transformation passes. All those passes are pretty mechanical: the compiler goes through the code and tries to establish some property, like "this address calculation is the same as that address calculation," or "this load will return the same value as that store," and so on and so forth, and based on that analysis, it tries to get rid of dead code, replace certain register values with other register values, and replace things that live in memory with things that just live in registers. A lot of the transformations resemble the Bentley-rule work optimizations you've seen in lecture 2, so as you're studying for your upcoming quiz, you can kind of get two for one by looking at those Bentley-rule optimizations. And as function inlining in particular was a good example of, one transformation can enable other transformations, and together they compound to give you fast code. In general, compilers perform a lot more transformations than just the ones we saw today.

But there are things that the compiler can't do. Here's one very simple example, taking another look at this calculate_forces routine. Although the compiler can optimize the code by moving address calculations out of the loop, one thing it can't do is exploit symmetry in the problem. In this problem, we're computing the forces between pairs of bodies using the law of gravitation, and it turns out that the force acting on one body by another is exactly opposite the force acting on the other body by the one: F(i,j) = -F(j,i). The compiler will not figure that out. The compiler knows algebra; it doesn't know physics. So it won't be able to figure out that there's symmetry in this problem, and it can't avoid the wasted operations. Make sense?

All right, so that was an overview of some simple compiler optimizations. We now have some examples, some case studies, of where the compiler can get tripped up. It doesn't matter if we get through all of these or not; you'll have access to the slides afterwards. But I think these are kind of cool, so shall we take a look?

Simple question: does the compiler vectorize this loop? Just to go over what the loop does: the function takes two arrays as inputs, an array y of length n and an array x of length n, plus a scalar value a. All that this function does is loop over each element, multiply x[i] by the input scalar, and add the product into y[i].
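Here is a hedged reconstruction of that function; the name is mine, but the body matches the description just given.

```c
#include <stdint.h>

// y[i] = a * x[i] + y[i] for each i: a daxpy-like kernel.
void scale_and_add(double *y, double *x, double a, int64_t n) {
    for (int64_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```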
So, does the loop vectorize? "No: y and x could overlap, and there is no information about whether they overlap." So we have a vote for no. Anyone think that it does vectorize? You made a very convincing argument, so everyone believes that this loop does not vectorize. Is that true? Anyone uncertain, anyone unwilling to commit to yes or no? Right here. All right, a bunch of people are unwilling to commit to yes or no. Let's resolve this question.

Let's first ask for the report. We compile it, we pass the flags to get the vectorization report, and the vectorization report says yes, it does vectorize this loop. Which is interesting, because we have this great argument that says: you don't know how these addresses sit in memory, you don't know if x and y overlap with each other, so how can you possibly vectorize? Kind of a mystery. Well, let's take a look at the actual compiled code when we optimize at -O2. It turns out you can pass certain flags to the compiler and get it to print out not just the LLVM IR, but the LLVM IR formatted as a control-flow graph. The control-flow graph for this simple two-line function is the thing on the right, which you obviously can't read, because its text is a little bit small, and there seems to be a lot going on. So I took the liberty of redrawing that control-flow graph with none of the code inside, just to get a picture of what the structure looks like for this compiled function, and structurally speaking, it looks like this. With a bit of practice staring at control-flow graphs, which you might get if you spend way too much time working on compilers, you might look at this and think: this graph looks a little too complicated for the two-line function that we gave as input.

So what's going on here? Well, we've got three different loops in this code, and it turns out that one of those loops is full of vector operations. The other two loops are not full of vector operations; that's unvectorized code. And then there's this basic block right at the top that has a conditional branch at the end of it, branching to either the vectorized loop or the unvectorized loop. Yeah, there's a lot of other control flow going on as well, but we can focus on just these components for the time being.

So what's that conditional branch doing? Well, we can zoom in on just this one basic block, shown at a size that's actually readable on the slide, and it looks like this. Let's just study this LLVM IR code. In this case we have that the address of y is stored in register %0, the address of x is stored in register %2, and register %3 stores the value n. So, one instruction at a time: who can tell me what the first instruction in this code does? "It gets the address of y." It's not quite what we said. It does use the address of y: it's an address calculation that operates on register %0, which stores the address of y. But it's not just computing the address of y; it's getting the address of the n-th element of y. It's adding in whatever is in register %3, which is the value n, into the address of y, so it computes the address y + n. This is testing your memory of pointer arithmetic in C just a little bit, but don't worry, it won't be too rough. So that's what the first address calculation does. What does the
next instruction do? "That computes x + n." Very good. How about the next one? "It compares x + n against y." Right, it does a comparison; we'll take that a little more slowly. It does a comparison of x + n versus y, and checks: is x + n greater than y? How about the next instruction? Yeah: it compares y + n versus x. Is y + n greater than x? And the last instruction before the branch? Yep, go for it. "It ANDs the two results." So this computes the comparison (is x + n greater than y) bitwise-AND (is y + n greater than x).

Fair enough. So what does the result of that condition mean? I think we've pretty much already spoiled the answer, but anyone want to say it one last time? I had this whole setup. "It checks if they overlap." Let's look at this condition in a couple of different situations. If we have x living in one place in memory and y living in another, entirely separate place, then no matter how they're laid out, when we check both "is y + n greater than x" and "is x + n greater than y," at least one of the results will be false. But if we have the situation where x and y overlap in memory for some portion of memory, then it turns out that regardless of whether x or y comes first, x + n will be greater than y, y + n will be greater than x, and the condition will return true. In other words, the condition is true if and only if the portions of memory pointed to by x and y alias.

So, going back to our original loop code: we have a branch based on whether or not the arrays alias, and in one case it executes the vectorized loop, while in the other case it executes the unvectorized code; in particular, it actually uses the vectorized loop only if they don't alias. So, returning to our original question: does this code get vectorized? The answer is yes and no. If you voted yes, you were actually right. If you voted no and you were persuaded, you were right too. And if you didn't commit to an answer, I can't help you. But that's interesting: the compiler actually generated multiple versions of this loop, due to its uncertainty about memory aliasing.

Yeah, question? "Could the compiler figure out this condition statically, while it's compiling the function, because we know the function is going to get called from somewhere?" The answer is: sometimes it can; a lot of times it can't. If it's not capable of inlining this function, for example, then it probably doesn't have enough information to tell whether or not these two pointers will alias. For example, if you're just building a library with a bunch of vector routines, you don't know the code that's going to call this routine eventually.
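In C terms, the multiversioned code the compiler produced behaves like the following sketch. Note that this is an illustration of what happens at the IR level: comparing pointers into potentially unrelated objects like this is formally undefined behavior in C source, which is part of why the compiler, not you, performs this check.

```c
#include <stddef.h>

void scale_and_add_multiversioned(double *y, double *x,
                                  double a, size_t n) {
    if (x + n > y && y + n > x) {
        // The regions [x, x+n) and [y, y+n) overlap: run the safe,
        // unvectorized scalar loop.
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    } else {
        // No overlap: the compiler's SIMD version of the loop runs here.
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}
```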
Now, in general — and this will be the last point before we wrap up — memory aliasing can cause a lot of issues when it comes to compiler optimization, and it can cause the compiler to act very conservatively. In this example we have a simple serial base case for a matrix-multiply routine, but we don't know anything about the pointers to the C, A, or B matrices, and when we try to compile and optimize it, the compiler complains that it can't do loop-invariant code motion, because it doesn't know anything about these pointers: it could be that a pointer changes within the innermost loop, so it can't move some calculation out to an outer loop.

Compilers try to deal with this statically using an analysis technique called alias analysis, and they try very hard to figure out when these pointers are going to alias, or when they're guaranteed not to alias. Now, in general it turns out that alias analysis isn't just hard, it's undecidable. If only it were merely hard, maybe we'd have some hope. But compilers in practice are faced with this undecidable question, and they use a variety of tricks to get useful alias analysis results in practice. For example, based on information in the source code, the compiler might annotate instructions with various metadata to track aliasing information: TBAA, which is aliasing information based on types; some scoping information for aliasing; some information saying this operation is guaranteed not to alias with that other operation — all kinds of metadata.

Now, what can you do as a programmer to avoid these issues with memory aliasing? Always annotate your pointers, kids. Always annotate your pointers. The restrict keyword, which you've seen before, tells the compiler that address calculations based on this pointer won't alias with address calculations based on other pointers. The const keyword provides a little more information: it says these addresses will only be read from, not written to, and that can enable a lot more compiler optimizations.

Now, that's all the time that we have. There are a couple of other cool case studies in the slides; you're welcome to peruse them afterwards. Thanks for listening.
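As a concrete postscript to that last piece of advice, here is what the annotation could look like on the reconstructed routine from the earlier case study (again a sketch: the qualifiers are the point, and the rest of the signature is assumed as before):

    // With restrict, the programmer promises the compiler that accesses
    // through y and x don't alias; const additionally says x is only
    // read. This can let the compiler vectorize the loop without
    // emitting the runtime overlap check shown earlier.
    void scale_add(double *restrict y, const double *restrict x,
                   int n, double a) {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }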
boxes in the middle now even though you're predominantly dealing with C code within this class I hope that some of the lessons from today's lecture you will be able to take away into any job that you pursue in the future because there are a lot of languages today that do end up being compiled C and C++ rust Swift even Haskell julia halide the list goes on and on and those languages all get compiled for a variety of different what we call backends different machine architectures not just x86 64 and in fact a lot of those a lot of those languages get compiled using very similar compilation technology to what you have in the clang LLVM compiler that you're using in this class in fact many of those languages today are optimized by LLVM itself LLVM is the internal engine within the compiler that actually does all of the optimization so that's my hope that the lessons you'll learn here today don't just apply to 172 they'll in fact apply to software they use and develop for many years down the road but let's take a step back and ask ourselves why bother studying the compiler optimizations at all why should we take a look at what's going on within this up to this point black box of software any ideas any suggestions and back [Music] you can avoid manually trying to optimize things that the compiler will do for you great answer great great answer any other answers you can learn how to write your code to take advantage of the compiler optimizations how do you know suggest to the compiler what it should or should not do as you're constructing your program great answer as well very good in the front it can absolutely help for debugging when the compiler itself has bugs the compiler is a big piece of software and you may have noticed that a lot of software contains bugs the compiler is no exception and it helps to understand where the compiler might have made a mistake or where the complier simply just didn't do what you thought it should be able to do understanding more of what happens in the compiler can demystify some of those oddities good answer hey hey yes any other thoughts it's fun well okay so in my completely biased opinion I would agree that it's fun to understand what the compiler does you may have different opinions that's okay I won't judge so I put together a list of reasons why in general we may care about what goes on inside the compiler i alighted that last point from this list my bad compiler is gonna have a really big impact on software it's kind of like it's kind of like this imagine that you're working on some software project and you have a teammate on your team who's pretty quiet but extremely smart and what that teammate does is whenever that teammate gets access to some code they jump in and immediately start trying to make that code work faster and that's really cool because that teammate does good work and oftentimes you see that what the teammate produces is indeed much faster code than what you wrote now in other industries you might just sit back and say teammate does fantastic work maybe they don't talk very often but that's okay teammate you do you but in this class we're performance engineers we want to understand what that teammate did to the software how did that teammate get so much performance out of the code the compiler is kind of like that teammate and so understanding what the compiler does is valuable in that sense as mentioned before compilers can save you performance engineering work if you understand that the compiler can do some optimization for you then 
you don't have to do it yourself and that means that you can continue writing simple and readable and maintainable code without sacrificing performance you can also understand the differences between the source code and whatever you might see show up and either the LLVM ir or the assembly if you have to look at the assembly language produced when you produced for your executable and compilers can make mistakes sometimes that's because of a genuine bug in the compiler and other times it's because the compiler just couldn't understand something about what was going on and having some insight into how the compiler reasons about code can help you understand why those mistakes were made or figure out ways to get a work around those mistakes or let you write meaningful bug reports to the compiler developers and of course understanding compilers can help you use them more effectively plus I think it's fun so the first thing to understand about a compiler is a basic anatomy of how the compiler works the compiler takes as input LLVM ir and up until this point we thought of it as just a big black box that does stuff to the IR and out pops more LVM ir but it's somehow optimized in fact what's going on within that black box the compiler is executing a sequence of what we call transformation passes all the code each transformation pass takes a look takes a look at its input and analyzes that code and then tries to edit the code in an effort to optimize the codes performance now a transformation pass might end up running multiple times and those passes run in some order that order ends up being a predetermined predetermined order that the compiler writers found to work pretty well on their tests that's about the level of insight that went into picking the order they seemed it seems to work well now some good news in terms of trying to understand what the compiler does you can actually just ask the compiler what did you do and you've already used this functionality as I understand in some of your assignments you've already asked the compiler to give you a report specifically about whether or not it could vectorize some code but in fact LLVM the clutter you have access to can produce reports not just for vectorization but for a lot of the different transformation passes that it tries to perform and there's some syntax that you have to pass to the compiler some compiler flags I have to specify in order to get those reports those are described on the slide I won't walk you through that text so you can look at the slides afterwards at the end of the day this string that you're passing is actually a regular expression if you know what regular expressions are great then you can use that to narrow down the search for for your report if you don't and you just want to see the whole report just provide a dot star as a string and you're good to go that's the good news you can get the compiler to tell you exactly what it did the bad news is that when you ask the compiler what it did it'll give you a report and that report looks something like this in fact I've elided most of the report for this particular piece of code because the report ends up being very long and as you might notice just from reading some of the texts there are definitely English words in this text and there are pointers to pieces of code that you've compiled but it is very jargony and hard to understand this isn't the easiest piece of this isn't the easier to report to make sense of okay so that's some good news and some bad news about these 
compiler reports the good news is you can ask the compiler and it'll happily tell you all about the things that it did you can tell you about which transformation passes we're successfully able to transform the code it can tell you conclusions that it drew about its analysis of the code but the bad news is these reports are kind of complicated they're they can be long they use a lot of internal compiler jargon which if you're not familiar with that jargon it makes it hard to understand it also turns out that not all of the transformation passes in the compiler give you these nice reports so you don't get to see the whole picture and in general the reports don't really tell you the whole story about what the compiler did or did not do and we'll see another example of that later on so part of the goal of today's lecture is to get some context for understanding the reports that you might see if you pass it as flags to the compiler and the structure of the today's lecture it's basically divided up into two parts first I want to give you some examples of compiler optimizations just simple examples so you get a sense as to how a compiler mechanically reasons about the code it's given and tries to optimize that code we'll take a look at how the compiler optimizes a single scalar value how it can optimize a structure how can optimize function calls and how it can optimize loops just simple examples to give some flavor and then the second half of lecture I have a few case studies for you which get into diagnosing ways in which the compiler fails I not do two bugs per se but simply didn't do an optimization you might have expected it to do not to be frank I think all of those case studies are really cool but it's not totally crucial that we get through every single case study during today's lecture the slides will be available afterwards so when we get to that part we'll just see how many case studies we can cover sound good any questions so far all right let's get to it let's start with a quick overview of compiler optimizations so here is a summary of the various options oh I forgot that I just copied this slide from a previous lecture given in this class you might recognize a slide I think from lecture two sorry about that that's okay well we can fix this we'll just go ahead and add this slide right now we need to change the title so we'll just cross that out and and put in our new title okay so yeah great and now well we should double check these lists and make sure that they're accurate data structures we'll come back to data structures loops hoisting yeah the compiler can do hoisting sentinels not really compilers not not good at sentinels loop unrolling yeah it absolutely does loop and rolling loop fusion yeah it can but there are some restrictions that apply you know your mileage might vary eliminate waste iterations some restrictions might apply ok logic constant folding and propagation yeah it's good on that it's common sub-expression elimination yeah I can find common sub-expressions you're fine they're algebraic idea knows algebra yeah good short-circuiting yes absolutely ordering tests depends on the tests I'll give it to the compiler but all I'll say restrictions apply creating a fast path compilers aren't that smart about fast paths they come up with really boring fast paths I'm gonna take that off the list combining tests again it kind of depends on the tests functions compilers are pretty good at functions so inlining yep can absolute do that tail recursion elimination yes 
absolutely corseting not so much ok great let's come back to data structures which we skipped before packing augmentee ok honestly the compiler does a lot with data structures but really none of those things the good pottery isn't smart about data structures in that particular way really the way that the compiler is smart about data structures is shown here if we expand this list to include even more compiler optimizations bottom line with data structures the compiler knows a lot about architecture and it really it's it's put a lot of effort into figuring out how to use registers really effectively reading and writing a register is super fast touching memory is not so fast and so the compiler works really hard to allocate registers put anything that lives in memory ordinarily into registers manipulate aggregate types to use registers as we'll see in a couple slides a line data that has to live in memory compilers are good at that compiler is also good at loops we already saw some example observation of the previous slide it can vectorize it does a lot of other cool stuff on switching is a cool optimization that I won't cover here idiom replacement it finds common patterns and does something smart with those fission skewing tiling interchange those all try to process the iterations of the loop in some clever way to make stuff go fast and some restrictions apply those are really in development in LLVM logic it doesn't a lot more with logic than what we saw before it can eliminate instructions that aren't necessary it can do strength reduction and other cool optimization I think we saw that one in the Bentley slides it gets rid of dead code it can do more idiom replacement branch routing is kind like reordering tests global value numbering another cool optimization that we won't talk about today functions that can do more on switching it can eliminate arguments that aren't necessary so the compiler can do a lot of stuff for you and at the end the day writing down this whole list is kind of a futile activity because it changes over time compilers are a moving target compiler developers their software engineers like you and me and they're clever and they're trying to apply all their clever software engineering practice to this compiler code base to make it do more stuff and so they're constantly adding new optimizations the compiler new clever analyses all the time so really what we're gonna look at today is just a couple examples to get a flavor for what the compiler does internally now if you want to follow along with how the compiler works the good news is by and large you can take a look at the LLVM ir to see what happens as a compiler processes your code you don't need to look at the assembly that's generally true but there are some exceptions so for example if we have these three snippets of C code on the left and we look at what the what your LLVM compiler generates in terms of the ir we can see that there are some optimizations reflected but not too many interesting ones the the x eight turns into a shift left operation by three because eight is a power two that's straightforward good we can see that in the ir the x 15 still looks like a x 15 no changes there the / 71 looks like a / 71 again no changes there now with arithmetic ops the difference between what you see in the LVM ir and what you see in the assembly this is where it's most pronounced at least in my experience because if we take a look at these same snippets of C code and we look at the corresponding x86 assembly for it we 
get the stuff on the right and this looks different let's pick through what this assembly code does well my in time so the first one and the C code takes the argument n and multiplies it by 8 and in the assembly we have this le a instruction anyone remember what the Le a instruction does I see one person shaking their head that's a perfectly reasonable response yeah go for it load effective address what does that mean load the address but don't actually access memory another way to phrase that compute the do this address calculation and give me the result of the address calculation don't read or write memory at that address just do the calculation that's what loading and effective address means essentially but you're exactly right the elia instruction does an address calculation and stores the results in the register on the right anyone remember enough about x86 address calculations to tell me how that le a in particular works the first le a on the slide yeah but before the first comma in this case nothing gets added to the product of the second two arguments in those parens you're exactly right so this le a takes the value eight multiplies it by whatever is in register R di which holds the value N and it stores the result into EAX so perfect it does a multiplied by eight the address calculator is only capable of a small range of operations it can do additions and it can multiply by one to four or eight that's it so it's a really simple circuit and the hardware but it's fast it's optimized heavily by modern processors and so if the compiler can use it they tend to try to use these le instructions so good job how about the next one multiplied by 15 turns into these two le a instructions can anyone tell me how these work you're basically multiplying by five and multiplying by three exactly right we can step through this as well if we look at the first Elliot instruction we take our TI which stores the value n we multiply that by four we add it to the original value of r di and so that computes four times n plus n which is five times N and that result gets stored into EAX good we've effectively multiplied by five the next instruction takes whatever is in our a X which is now 5 n multiplies that by 2 adds it to whatever is currently in our X which is once again 5 n so that computes 2 times 5 n + 5 n which is 3 times 5 n which is 15 n so just like that we've done our multiply with two le a instructions how about the last one in this last piece of code we take the argument in our TI and we move it into a X we then move the value three billion eight hundred and seventy 1 million five hundred nineteen thousand eight hundred and seventeen and put that into e CX as you do we multiply those two values together and then we shift the product right by 38 so obviously this divides by 71 any guesses as to how this performs the division operation we want both of you answered I might still call on you I'll give someone I'll give it a little more time for someone else to raise their hand go for it it has a lot to do with two to 38 very good yeah all right any any further guesses before I give the answer away yeah in the back it kinda so this is what's technically called a magic number and yes it's technically called a magic number and this magic number is equal to two to the thirty eight divided by seventy one plus one to deal with some rounding effects what this code does is it says let's compute n divided by 71 by first computing and divided by 71 times two to the thirty eight and then shifting off the 
lower thirty eight bits with that shift right operation and because it can by converting the operation into this it's able to replace the division operation with a multiply and if you remember hopefully from the architecture lecture multiply operations they're not the cheapest things in the world but they're not too bad division is really expensive if you want fast code never divide also never compute modulus or act or access memory yeah question why did I choose 38 I think it shows 38 because 38 works there's actually a formula for pretty much it doesn't want to choose a valley that's too large or else it'll overflow and it doesn't want to choose a value that's too small or else you lose precision so it's able to find a balancing point if you want to know more about magic numbers I recommend checking out this book called hackers delight for any of you who are familiar with this book it is a book full of bit tricks seriously that's the entire book it's just a book full of bit Rick's and there's a whole section in there describing how you do division by various constants using multiplication either signed or unsigned it's very cool but yeah magic number to convert a division into a multiply that's the that's the kind of thing that you might see from the assembly that's one of these examples of arithmetic operations that are really optimized at the very last step but for the rest of the optimizations for say we could focus on the ir any questions about that so far cool okay so for the next part of the lecture I want to show you a couple examples ations in terms of the LLVM ir and to show you these optimizations will have a little bit of code that will work through a running example if you will and this wording example will be some code that I stole from I think it's a serial program that simulates the behavior of and massive bodies in 2d space under the law of gravitation so we've got a whole bunch of point masses those point masses have varying masses and we just want to simulate what happens due to gravity as these masses interacting in the plane at a high level the N body code is pretty simple we have a top level simulate routine which just loops over all the time steps during which we want to perform this simulation and I each time step it calculates the various forces in acting on those different and then it updates the position of each body based on those forces in order to do that calculation it has some internal data structures one to represent each body which contains a couple of vector types and we define our own vector type to store two double precision floating point values now we don't see the entire code in order to look at some compiler optimizations but one routine that we will take a look at is this one to update the positions this is a simple loop that takes each body one at a time computes the new velocity on that body based on the forces acting on that body and uses vector operations to do that and then it updates the position of that body again using these vector operations that we've defined and then it stores the results into video structure for that body so all these methods with this code make use of these basic routines on 2d vectors points in X Y or points in 2d space and these routines are pretty simple there's one to add two vectors there's another to scale a vector by a scalar value and there's a third to compute the length which we won't actually look at too much everyone good so far okay so let's try to start simple let's take a look at just one of these one 
line vector operations vex scale all vex scale does is it takes one of these vector one of these vector inputs and a scalar value a and it multiplies X by a and Y by a and stores the results into a vector type and returns it great couldn't be simpler if we compile this with no opposition's whatsoever and we take a look at the LVM ir we get that which is a little more complicated than you might imagine the good news though is that if you turn on optimizations and you just turn on the first law of optimization just oh one whereas we got this code before now we get this which is far far simpler and so simple I can blow up the font size so you can actually read read the code on this slide so just see again no optimizations optimizations so a lot of stuff happened to get from to optimize this simple function we're gonna see what those optimizations actually were but first let's pick apart what's going on in this function we have our vex scale routine in LV Mir it takes a structure as its first argument and that's represented using two doubles it takes the scalar as the second argument and what the what the operation does is it multiplies those two those two fields by the third argument that double A and then packs those values into ade struct that'll return and finally it returns that struct so that's what the optimized code does let's see actually how we get to this optimized code we'll do this one step at a time let's start by optimizing the operations on a single scalar value that's why I picked this example so if we go back to the o 0 code and we just pick out the operations that dealt with that scalar value we narrow ourselves we narrow our scope down to just these lines so the double the argument double a is the final argument in the list and what we see is that within the VEX scale routine compile that o 0 we allocate some local storage we store that double a into the local storage and then later on we'll load the value out of the local storage before the multiply and then we load it again before the other multiply ok any idea is how we could make this code faster don't store it in memory what a great idea how would we get around not storing in memory saving your register in particular what property of LV Mir makes out really easy there are infinite registers and in fact the argument is already in a register it's already in the register percent to article so we don't need to move it into a register it's already there so how do we go about optimizing that code in this case well let's find the places where we're using the value and we're using the value loaded from memory and what we're gonna do is just replace those those loads for memory with the original argument we know exactly what operation we're trying to do we know we're trying to all we're trying to do a multiplied by the original parameter so we just find those two uses we cross them out and we put in the input parameter in his place that makes sense questions so far cool so now those multiplies aren't using the values returned by the loads how further can we optimize this code sorry delete the loads what else can we delete so there's no address calculation here just because the code is so simple but good insight the allocation and and the store great so those loads are dead code the store is dead code the allocation is dead code we eliminate all that dead code we convert those loads we just use the value of living in the register and we've already eliminated a bunch of instructions so the net effect of that was to turn the code 
optimize at o 0 that we had in the background into the code we have in the foreground which is slightly shorter but not that much so it's a little bit faster but not that much faster how do we optimize this function further do it for every variable we have in particular the only other variable we have is this structure that we're passing in so we want to do this kind of optimization on the structure make sense so let's see how we optimize this structure now the problem is that structures are harder to handle than individual scalar values because in general you can't store the whole structure in just a single register it's a it's more complicated to juggle all the data within a structure but nevertheless let's take a look at the code that operates on the structure or at least operates on the structure that we pass in to the function so when we eliminate all the other code we see that we've got an allocation see if I have animations here yeah I do we have an allocation so we can store the structure onto the stack then we have an address calculation that lets us store the first part of the structure onto the stack we have a second address calculation to store the second field on the stack and later on when we need those values we load the first one out of the first field out of memory and we load the second field out of memory this is a very similar pattern to what we had before except we've got more going on in this case so how do we go about optimizing this structure any ideas high level ideas ultimately we'll want to get rid of all of the memory references and all that storage for the structure how do we reason through how do we reason through eliminating all that stuff in a mechanical fashion based on what we've seen so far go for it they are passed in using separate parameters separate registers if you will as a quirk of how LLVM does it so given that insight what would you how would you optimize it Crossout percent 12 percent 6 and put in their relevance the relevant field cool let me phrase that a little bit differently let's do this one field at a time we've got a structure which has multiple fields let's just let's just take it one step at a time all right so let's look at the first field and let's look at the operations that deal with the first field we have in our code in our LV Mir some address calculations that refer to the same field of the structure in this case I believe it's the first field yes okay and ultimately we end up loading from this from this location in local memory so what value is this load going to retrieve how do we know that both address calculations refer to the same field good question what we do in this case is very careful analysis of the math that's going on we know that the the aleca the the location in local memory that's just a fixed location and from that we can interpret what each of the instructions does in terms of an address calculation and we and we can determine that they're the same value so we had this location in memory that we operate on and before we ultimately use before you do a multiply we end up loading from that location in memory so what value do we know is going to be loaded by that load instruction go for it [Music] not putting it back but we don't even worry about putting it back so correct correct so what are we multiplying of that multiply which value first element of the struct its percent zero it's a value that we stored right there that makes sense everyone see that any questions about that all right so we're storing the first 
element of the struct into this location later we load it out of that same location nothing else happened to that location so let's go ahead and optimize it just the same way we optimize the scaler we see that we use the result of the load right there but we know that that load is going to return the first field of our struct input so we'll just cross out and replace it with that field so now we're not using the result of that load what do we get to do as the compiler you tell you know the answer delete the dead code delete all of us remove the now dead code which is all those address calculations as well as the load operation and the store operation and that's pretty much it yeah good so we replaced that we were placed on operation and we got rid of a bunch of other code from our from our function we've now optimized one of the two fields in our struct what do we do next optimize the next one that happens similarly I won't walk you through that a second time refine where we're using the result of that load we can cross it out replace it with the appropriate input and then delete all of the relevant dead code and now we get to delete the original allocation because nothing is getting stored to that memory that makes sense any questions about that yeah yep when we first compiled LVM our do we unpack the struct and pass in the separate parameters yeah yeah so LLVM I are in this case when we compile that it was zero decided to pass it as separate parameters just as they represent eight just as its representation so in that sense yes but it was still doing the the standard create some local storage destroy the parameters out to local storage and then all operations just read out of local storage it's the standard thing that the compiler generates when when it's asked to compile C code and with no other optimizations that's what you get that makes sense yeah what are all the aligned eights doing the align eights are attributes that does that specify the alignment of that location in memory this is alignment information that the compiler either determines by analysis or implements as part of a standard so they're specifying how values are aligned in memory that matters a lot more for ultimate code generation unless we're able to just delete the memory references all together make sense cool any other questions all right so we optimized the first field we can optimize the second field a similar way turns out there's additional optimizations that need to happen in order to return a structure from this function those operations can be optimized in a similar way they're shown here we're not going to go through exactly how that works but at the end of the day after we've optimized all of that code we end up with this we end up with our function compiled at oh one and it's far simpler I think it's far more intuitive this is what I would imagine the code should look like when I've written the C code when I wrote the C code in the first place take your input do a couple multiplications and then it does some operations to create the return value and ultimately return that value so in summary the compilers the compiler works hard to transform data structures and scalar values to store as much as it possibly can purely within registers and avoid using any local storage if possible everyone good with that so far cool let's move on to another optimization let's talk about function calls let's take a look at how the compiler can optimize function calls by and large these optimizations will occur if you pass 
optimization level two or higher yeah this FYI so from our original C code we had some lines that performed a bunch of vector operations we had a Veck add that added two vectors together one of which was the result of X scale we're scaled the result of a CAD by some scalar value so we had this chain of this chain of calls in our code and if we take a look at the code compile that it was zero well we end up with is the snippet shown on the bottom which performs some operations on these vector structures does this multiply operation and then calls the sveck scalar routine the VEX field routine that we've decided to focus on first so how do we go about any ideas for how we go about optimizing this so to get a little bit of a hint what the compiler sees when it looks at that call is it sees this snippet containing the call instruction and also see in in our example it also sees the code for the VEX scalar function that we were just looking at and we're gonna suppose that it's already optimized of x scale as best as it can it's produced this code for the VEX scalar routine and so it sees that call instruction and it sees this code for the function that's being called so what could the compiler do at this point to try to make the function in the the code and the the code above even faster you're exactly right remove the call I just put the body of the VEX kale code right there in place of the call it takes a little bit of effort to pull that off but roughly speaking yeah we're just gonna copy and paste this code in our function into the place where where we're calling the function and so if we do that simple copy/paste we end up with some garbage code as an intermediate we had to do a little bit of renaming to make everything work out but at this point we have the code from our function in the place of that call and now we can observe that to restore correctness we don't want to do the call and we don't want to do the return that we just paste it in to paste it in place so we'll just go ahead and remove both that call and the return that is called function inlining we identify some function call or goodbye identify some function call and it takes the body of the function and just paste it right in place of that call so good make sense anyone confused raise your hand if you're confused alright now once you've done some amount of function inlining we can actually do some more optimizations so here we have the code after we got rid of the unnecessary call and returned and we have a couple multiply operation sitting in place that looks fine but if we expand our scope just a little bit what we see so we have some operations happening that we're sitting there already after the original call and what the compliant do is it can take a look at these instructions and long story short it realizes that all these instructions do is pack some data into a structure and then immediately unpack the structure so it's like you put a bunch of stuff into a bag and then you immediately dumped out the bag that was kind of a waste of time that's kind of a waste of code let's get rid of it right those operations are useless let's delete them Copiah has a great time deleting dead code but it's like it's what it lives to do all right now in fact in the original code we didn't just have one function call we had a whole sequence of function calls and if we expand our LVM I our snippet even a little further we can include those two function calls the original call to Veck at followed by the code that we've now optimized by 
inlining ultimately followed by yet another call to bec add minor spoiler the Veck add routine once it's optimized looks pretty similar to the VEX scalar routine and in particular as comparable size to the VEX scale routine so what's the compiler going to do to those two call sites and line it do more in lining in lining is great well in line these functions as well and then remove all the additional now useless instructions won't walk through that process but the result of that process looks something like this so the original C code we had this Beck ad which called the VEX scale as one of his arguments which called it Beck ad as one of its arguments and what we ended up with in the optimized IR is just a bunch of straight line code that performs floating-point operations it's almost as if the compiler took the original C code and transformed it into the equivalent C code shown on the bottom where it just operates on a whole bunch of doubles and just as primitive operations so function inlining as well as the additional transformations it was able to perform as a results together those were able to eliminate all of those function calls it was able to completely eliminate any costs associated with the function call abstraction at least in this code make sense I think that's pretty cool you write code that has a bunch of function calls because that's how you've constructed your interfaces but you're not really paying for those function calls function calls aren't the cheapest operation in the world especially if you think about everything that goes on in terms of the registers and the stack but the compiler is able to avoid all that overhead and just perform the floating-point operations we care about okay well a function inlining is so great and enables so many great optimizations why doesn't the compiler just inline every function call go for it recursion it's really hard to inline a recursive call in general you can't inline a function into itself although it turns out there are some exceptions so yes recursion creates problems with function inlining any other thoughts in the back you're you're definitely onto something so we had to do a bunch of this renaming stuff when we inline the first time and when we inline every single time and even though LOV Mir has an infinite number of registers the Machine doesn't and so all of that renaming does create a problem but there are other problems as well of a similar nature when you start inlining all those functions for example you copy pasted a bunch of code and that made the original call site even bigger and bigger and bigger and bigger and in and programs we generally don't think about the space they take in memory but they do take space in memory and that does have an impact on performance so great answer any other thoughts if your function becomes too long then it may not fit in an instruction cache and that can increase the amount of time it takes just to execute the function right because you're now not getting hash hits exactly right that's one of the problems with this code size blow up from inlining everything any other thoughts any final thoughts so the three main reasons why the compiler won't inline every function I think we touched on two of them here for some function calls like recursive calls it's impossible to inline them because you can't inline a function into itself but there are exceptions to that namely recursive tail calls if the last thing in a function is a function call then it turns out you can effectively inline 
that function call as an optimization we're not going to talk too much about how that works but there are there are corner cases but in general you can't inline a recursive call ah the compiler has another problem namely if you've if the function that you're calling is in a different Castle if it's in a different compilation unit then it's literally in a different file like that's compiled independently then the compiler can't very well inline that function because it doesn't know about the function it doesn't have access to that functions code there is a way to get around that problem with modern compiler technology that involves whole program optimization and I think there's some backup size that will tell you how to do that with LVM but in general if it's in a different compilation unit it can't be in line and finally as touched on function airline can increase code size which can hurt performance okay so some functions are okay to inline other functions could create this performance problem because you increase code size so how does the phone how does the compiler know whether or not in lining any particular function at a call site could hurt performance any guesses yeah yeah yeah so the compiler has some cost model which gives it some information about how much will it cost to in line that function is the cost model always right it is not so the answer how does it apply or no is really it doesn't know it makes a best guess using that cost model and other heuristics to determine when does it make sense to try to inline a function and because it's making a best guess sometimes the compiler guesses wrong so to wrap up this part here just a couple of tips for controlling function inlining in your own programs if there's a function that you know must always be inlined no matter what happens you can mark that function with a special attribute namely the always in line attribute for example if you have a function that does some complex address calculation and it should be inlined rather than called you may want to mark that with and always in line attribute similarly if you have a function that really should never be inlined it's never cost-effective to inline that function you can mark that function with the know inline attribute and finally if you want to enable more function inlining in the compiler you can use link time optimization or LTO to enable whole program optimization I won't go into that during these slides let's move on and talk about loop optimizations any questions so far before we before we continue yeah sorry - I keel static does static in line guarantee that the function that the compiler will always inline it it actually doesn't the in line keyword will provide a hint to the compiler that it should think about in lying the function but it doesn't provide any guarantees if you want a strong guarantee use the always in line attribute good question though all right noop optimizations you've already so you've already seen some move authorizations you've seen vectorization for example it turns out that the compiler does a lot of work to try to optimize loops so first why is that why would the compiler engineers invest so much effort into optimizing loops while loops in particular they're an extremely common control structure that also has a branch both things are true I think there's a higher level reason though or a more fundamental reason if you will most of the time the loop takes up the most time you got it loops account for a lot of the execution time of programs the way 
I like to think about this is with a really simple thought experiment let's imagine that you've got a machine with the two gigahertz processor we chosen these values to be easier to think about using mental math suppose you've got a two gigahertz processor with 16 cores each core executes one instruction per cycle and suppose that you've got a program which contains a trillion instructions and ample parallelism for those 16 cores but all of those instructions are simple straight-line code there are no branches there are no loops there are no complicated operations like oh like i/o it's just a bunch of really simple straight line code each instruction takes a cycle that executes the processor excuse one instruction per cycle how long does it take to run this code to execute the entire terabyte binary to the fortieth cycles for to the forty instructions but you're using a two gigahertz processor and 16 cores and you got ample parallelism in the program to keep them all saturated so how much time 32 seconds nice nice job this one has mastered power-of-two arithmetic in one's head it's a good skill to have especially in core sex yeah so if you have just a bunch of simple straight line code and you have a terabyte of it now that's a lot of code that is a big binary and yet the program this processor this relatively simpler processor can execute the whole thing in just just about 30 seconds now in your experience working with software you might have noticed that there are some programs that take longer than 30 seconds to run and some of those programs don't have terabyte size binaries the reason that those programs take longer to run by and large is loops so loops account for a lot of the execution time in real programs now you've already seen some move Altimas asians we're just gonna take a look at one other looped authorization today namely code hoisting also known as loop invariant code motion and to look at that we're gonna take another we're gonna take a look at a different snippet of code from the n-body simulation this code calculates the forces going on each of the end bodies and it does it with a doubly nested loop for all the bodies your number of bodies for all bodies your number of bodies as long as you're not looking at the same body call this add force routine which calculates to calculating the force on that the force between those two bodies and add that force to one of the bodies it's all that's going on in this code if we translate this code into LLVM ir we end up with hopefully unsurprisingly a doubly nested loop it looks something like this the body of the code the body of the innermost loop has been lighted just so things can fit on the slide well we can see the the overall structure on the outside we have some outer loop control this should look familiar from lecture 5 hopefully inside of that outer loop we have an inner loop and at the top and the bottom that inner loop the inner loop control and within that inner loop we do have one branch which can skip a bunch of code if you're looking at the same body pry and Jane but otherwise we have the loop body of the innermost loop basics charge now we just take as if we just zoom in on the top part of this doubly nested loop just the topmost three basic blocks and take a look at more of the code that's going on here we end up with something that looks like this and if you remember some of the discussion from lecture 5 about loop induction variables and what that looks like in LP miR what you find is that for the outer loop we 
have an induction variable at the very top it's that fee that we are fee instruction once again inside that outer loop we have the loop control for the inner loop which has its own induction variable once again we have another fee node that's how we can spot it and then we have the body of the innermost loop and this is just the start of it it's just a couple address calculations but can anyone tell me some interesting property about just a couple of these address calculations that could lead to an optimization the first to address calculations only depend on the outermost loop variable the the iteration variable for the outer loop exactly right so what can we do with those instructions bring them out of the inner loop why should we keep computing these addresses in the innermost loop when we could just compute them once in the outer loop that optimization is called code hoisting or loop invariant code motion those instructions are invariance to the code in the innermost loop so you hoist them out and once you highways them out you end up with a transformed loop that looks something like this what we have is the same outer loop control at the very top but now we're doing some address calculations there and we no longer have those address calculations on the inside and as a results those who stood calculations are performed just once per iteration of the outer loop rather than once per iteration of the inner loop and so those instructions are run far fewer times you get to save a lot of running time so the effect of this optimization in terms of C code because it can be a little tedious to look at LV miR is essentially like this we took this doubly nested loop in C we were calling add force of blah blah blah calculate force blah blah blah and now we just move the address calculation to get the ice body that we care about we move that to the outer loop now this was an example of loop invariant code motion on just a couple address calculations in general the compiler will try to prove that some calculation is invariants across all the iterations of a loop and whenever it can prove that it will try to hoist that code out of the loop if you can get code out of the body of a loop that reduces the running time of the loop saves a lot of execution time huge bang for the buck makes sense any questions about that so far all right so just to summarize this part what can the compiler do the compiler optimizes code by performing a sequence of transformation passes all those passes are pretty mechanical the compiler goes through the code it tries to find some property like this address calculations the same as that address calculation and so this load will return the same value as that store and so and so forth and based on that analysis it tries to get rid of some dead code and replace replace certain register values with other words your values replace things I live in memory with things that just live in registers a lot of the transformations resemble Bentley rule work optimizations so you've seen in lecture two so as you're studying for the your upcoming quiz you can kind of get two for one by looking at those Bentley rule optimizations and one transformation pass in particular function inlining was a good example of this one transformation can enable other transformations and those together can compound to give you fast code in general compilers perform a lot more transformations than just the ones that we saw today but there are things that the compiler can't do here's one very simple example in 
this case we're taking another look at this calculate forces routine although the compiler can optimize the code by moving address calculations out of the loop one thing that it can't do is exploit symmetry in the problem so in this problem what's going on is we're computing the forces on any pair of bodies using the law of gravitation and it turns out that the force acting on this on one body by another is exactly opposite the force acting on the other body by the one so f of 1 2 is equal to minus F of 2 1 the compiler will not figure that out the compiler knows algebra it doesn't know physics so won't be able to figure out that there's symmetry in this problem and it can avoid wasted operations make sense all right so that was an overview of some simple compiler optimizations we now have some failure some examples of some case studies to see where the compiler can I can get tripped up and doesn't matter if we get through all of these or not you'll have access to the slides afterwards but I think these are kind of cool so shall we take a look all right simple question does the compiler vectorize this loop so just to go over what this loop does it's a simple loop the function takes two in two vectors as inputs and or two arrays as inputs I should say an array called Y of like then an array of X or an array X of length N and some scalar value a and all that this function does is it loops over each element of the vector multiplies X of I by the input scalar adds there the product into Y sub I so does the loop vectorize yes why next could overlap and there is no information about whether they overlap so do they vectorize we have a vote for know anyone think that does vectorize you made a very convincing argument so everyone think it everyone believes that this loop does not vectorize it's a true anyone uncertain anyone unwilling to commit to yes or no right here all right a bunch of people are unwilling to commit to yes or no all right let's resolve this question let's first ask for the report let's look at the vectorization report we compile it we pass the flags together vectorization report and the vectorization report says yes it does vectorize this loop which is interesting because we have this great argument that says but we you don't know how these addresses fit in memory you don't know if x and y overlap with each other how can you possibly vectorize kind of a mystery well if we take a look at the actual compiled code when we optimize this at o2 it turns out you can pass certain flags through the compiler and get it to print out not just the LLVM AR but the LLVM are formatted as a control flow graph and the control flow graph for this simple two line function is the thing on the right which you obviously can't read and it's a little bit because it's a little bit small in terms of its text and there seems to have a lot going on so I took the liberty of redrawing that control flow graph with none of the code inside just to get a just get a picture of what the structure looks like for this compiled function and structurally speaking and looks like this and with a bit of practice staring at control flow graphs which you might get if you spend way too much time working on compilers you might look at this control photograph and think this craft looks a little too complicated for the to line function that we gave as input so what's going on here well we've got three different loops in this code and it turns out that one of those loops is full of vector operations okay the other two loops are 
not full of vector operations that's unvectorized code and then there's this basic block right at the top that has a conditional branch at the end of it branching to either the vectorize loop or the unvectorized loop and yeah there's a lot of other control flow going on as well but we can focus on just these components for the time being so what's that conditional branch doing well we can zoom in on just this one basic block and actually show it to be readable on the slide and the basic block looks like this so let's just study this L of U in my our code in this case we have that the address wise for in register 0 the address of X is stored in register 2 and register 3 stores the value of n so one instruction at a time who can tell me what the first instruction in this code does yes gets the address of why it doesn't look is that that we said okay so it does use the address of why it's an address calculation that operates on register 0 which stores the address of why but it's not just computing the address of why it's getting the address of the nth element of why it's adding in whatever is in register 3 which is the value n into the address of Y so that computes the address y plus n this is this is testing your memory of pointer arithmetic and see just a little bit but don't worry it won't be it won't be too rough so that's what the first address calculation does what is the next instruction do that cookies X plus n very good how about the next one it compares whether an expo side inflight Wilson or compares x + 1 + y vs y + n [Music] right so it does a comparison we'll take that a little more slowly does a comparison of X + n versus Y and checks is X plus and greater than willing perfect how about the next instruction yeah compares y+ n versus X is y plus n greater than X how about the last instruction before the branch yep go for it there was and of the results so this computes the comparison his Expo site in grades then why bitwise and he's y plus or his Michels N greater than X fair enough so what does the result of that condition mean I think we've pretty much already spoiled the answer anyone want to area one last time I had this whole set up cohort checks if they overlap so let's look at this condition in a couple of different situations if we have X living in one place in memory and Y living in another place in memory then no matter how we resolve this condition if we check in if we check is both y plus M greater than X and X plus n greater than Y the results will be false but if we have this situation where x and y overlap in memory for some amount of for some portion of memory then it turns out the regardless of whether X or Y is first x plus n will be greater than y y plus 7 will be greater than X and the condition will return true in other words the condition which is true if and only if these portions of memory 20 by x + y alias so going back to our original looping code we have a situation where we have a branch based on whether or not the alias and if they and in one case it executes Spector eyes the vector eyes loop and in another case the execute San on vectorize code X are returning to our original question in particular to actually use a vectorized loop if they don't alias so returning to our original question do they does this code get vectorized the answer is yes and no so if you voted yes you're actually right if you go to no and you were persuaded you were right and if you didn't commit to an answer I can't help you but that's interesting the compiler actually 
yeah question so the question is could the compiler figure out this condition statically while it's compiling the function because we know the function is going to get called from somewhere the answer is sometimes it can but a lot of times it can't if it's not capable of inlining this function for example then it probably doesn't have enough information to tell whether or not these two pointers will alias for example if you're just building a library with a bunch of vector routines you don't know the code that's going to call this routine eventually

now in general memory aliasing this will be the last point before we wrap up in general memory aliasing can cause a lot of issues when it comes to compiler optimization it can cause the compiler to act very conservatively in this example we have a simple serial base case for a matrix multiply routine but we don't know anything about the pointers to the C A or B matrices and when we try to compile and optimize it the compiler complains that it can't do loop invariant code motion because it doesn't know anything about these pointers it could be that a pointer changes within the innermost loop so it can't move some calculation out to an outer loop compilers try to deal with this statically using an analysis technique called alias analysis and they do try very hard to figure out when these pointers are going to alias or when they are guaranteed not to alias now in general it turns out that alias analysis isn't just hard it's undecidable if only it were merely hard maybe we'd have some hope but compilers in practice are faced with this undecidable question and they try a variety of tricks to get useful alias analysis results in practice for example based on information in the source code the compiler might annotate instructions with various metadata to track aliasing information for example TBAA which is aliasing information based on types there's scoping information for aliasing there's information that an operation is guaranteed not to alias with some other operation all kinds of metadata

now what can you do as a programmer to avoid these issues with memory aliasing always annotate your pointers kids always annotate your pointers the restrict keyword you've seen before it tells the compiler that address calculations based off this pointer won't alias with address calculations based off other pointers the const keyword provides a little more information and says these addresses will only be read from they won't be written to and that can enable a lot more compiler optimizations
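to make that concrete here's a sketch of both annotations applied to the earlier loop again with illustrative names

void daxpy(double *restrict y, const double *restrict x, double a, int n) {
  // restrict promises that accesses through y and x never alias,
  // so the compiler can vectorize without the runtime overlap check;
  // const promises that elements of x are only read, never written
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}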
now that's all the time that we have there are a couple of other cool case studies in the slides you're welcome to peruse them afterwards thanks for listening