The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

So welcome to 6.172. My name is Charles Leiserson, and I am one of the two lecturers this term; the other is Professor Julian Shun. We're both in EECS and in CSAIL, on the seventh floor of the Gates Building. If you don't know it, you are in Performance Engineering of Software Systems, so if you've found yourself in the wrong place, now is the time to exit.

I want to start today by talking a little bit about why we do performance engineering, then I'll do a little bit of administration, and then dive into a case study that will give you a good sense of some of the things we're going to do during the term. I put the administration in the middle because if, from my telling you about the course, you don't want to take the course, then why should you listen to the administration? So let's just dive right in.

The first thing to understand, whenever you're doing something, is a perspective on what matters in what you're doing. We're going to spend the whole term on software performance engineering, and this is kind of interesting, because it turns out that performance is usually not at the top of what people care about when they're building software. What are some of the things that are more important than performance? Deadlines, good. Cost. Correctness. Extensibility. We could go on and on; I think you folks could probably make a pretty long list. I made a short list of the kinds of things that are more important than performance.

So if programmers are so willing to sacrifice performance for these properties, why do we study performance? It's a bit of a paradox and a bit of a puzzle: why study something that clearly isn't at the top of the list of what most people care about when they're developing software? I think the answer is that performance is the currency of computing. You use performance to buy these other properties. You'll say something like, gee, I want to make it easy to program, and therefore I'm willing to sacrifice some performance to make something easy to program. Or, I'm willing to sacrifice some performance to make sure that my system is secure. All those things come out of your performance budget, and clearly, if performance degrades too far, your stuff becomes unusable.

When I talk with programmers, people are fond of saying, ah, performance doesn't matter, I never think about it. Then I talk with people who use computers, and I ask, what's your main complaint about the computing systems you use? Answer: too slow. So it's interesting whether you're the producer or the consumer. But the real answer is that performance is like currency; it's something you spend. Would I rather have $100 or a gallon of water? Well, water is indispensable to life, and there are certainly circumstances where I would prefer to have the water rather than the $100. But in our modern society, I can buy water for much less than $100.
So even though water is essential to life and far more important than money, money is a currency, and I prefer to have the money, because I can just buy the things I need. That's the analogy for performance: it has no intrinsic value, but it contributes to things; you can use it to buy things that you care about, like usability or testability or what have you.

Now, in the early days of computing, software performance engineering was common, because machine resources were limited. If you look at machines from 1964 to 1977, look at how many bytes they have on them. In 1964 there is a computer with 524 kilobytes. That was a big machine back then; that's kilobytes, not megabytes, not gigabytes. Many programs would strain the machine's resources. The clock rate for that machine was 33 kilohertz. What's a typical clock rate today? About four gigahertz, three gigahertz, two gigahertz, somewhere in that range. And here they were operating in kilohertz. So many programs would not fit without intense performance engineering.

There are also a lot of sayings that came out of that era. Donald Knuth, one of the Turing Award winners and an absolutely fabulous computer scientist in all respects, wrote, "Premature optimization is the root of all evil." I invite you, by the way, to look that quote up, because it's actually taken out of context; what he was worried about was trying to optimize stuff too early. Bill Wulf, who designed the Bliss language and worked on the PDP-11 and such, said, "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason, including blind stupidity." And Michael Jackson said, "The first rule of program optimization: don't do it. The second rule of program optimization, for experts only: don't do it yet." Everybody was warning you away, because when you start trying to make things fast, your code becomes unreadable. Making code that is readable and fast, that's where the art is, and hopefully we'll learn a little bit about doing that.

And indeed, for many years there was no real point in working too hard on performance engineering. If you look at technology scaling, at how many transistors are on various processor designs, up until about 2004 we had Moore's Law in full throttle, with chip densities doubling every two years, really quite amazing. And along with that, as they shrank the dimensions of chips, the clock speed would go up correspondingly by miniaturization. So if you found something was too slow, wait a couple of years: it'll be faster. If you were going to make your software ugly for performance, there really wasn't a good payoff compared to just waiting around. In that era there was something called Dennard scaling, which, as things shrank, allowed the clock speeds to get larger: by reducing power, you could keep everything fast, and we'll talk about that in a minute.

So if you look at what happened from 1977 to 2004, here are Apple computers with similar price tags, and you can see the clock rate really just skyrocketed: one megahertz, 400 megahertz, 1.8 gigahertz. And the data paths went from 8 bits to 32 to 64.
The memory grew correspondingly, and the cost stayed approximately the same. That's the legacy of Moore's Law and the tremendous advances in semiconductor technology. Until 2004, Moore's Law and the scaling of clock frequency, so-called Dennard scaling, was essentially a printing press for the currency of performance. You didn't have to do anything; the hardware just got faster.

All of that, or at least some of it, came to an end in 2004, when clock speeds plateaued. If you look at the data, around 2005 clock speeds hit 2 to 4 gigahertz, and we have not been able to make chips go faster than that in any practical way since then, even though the densities have kept growing.

Now, the reason the clock speed flattened was power density. This is a slide from Intel from that era looking at the growth of power density, and what they were projecting was that the junction temperatures of the transistors on the chip, if they just kept scaling the way they had been, would start to approach first the temperature of a nuclear reactor, then a rocket nozzle, and then the Sun's surface. We're not going to build technology that can cool that, and even if you could solve it for a little while, the writing was on the wall: we cannot scale clock frequencies any more. The reason is that clock frequency was originally scaled assuming that most of the power was dynamic power, the power spent switching the circuits. What happened as we kept shrinking things is that something that used to be in the noise, namely the leakage, started to become significant, to the point where today the dynamic power is far less of a concern than the static power from the circuits just sitting there leaking. And when you miniaturize, you can't stop that effect from happening.

So what did the vendors do in 2004 and 2005 and since? They said, oh gosh, we've got all these transistors to use, but we can't use the transistors to make stuff run faster. So they introduced parallelism in the form of multicore processors: they put more than one processing core on a chip, and to scale performance they would have multiple cores, with each generation of Moore's Law potentially doubling the number of cores. If you look at what happened with processor cores, around 2004 or 2005 we started to get multiple processing cores per chip, to the extent that today it's basically impossible to find a single-core chip for a laptop or a workstation; everything is multicore. You can't buy just one; you have to buy a parallel processor.

The impact of that was that performance was no longer free. You couldn't just speed up the hardware. Now, if you wanted to use that potential, you had to do parallel programming, and that's not something that anybody in the industry had really done. There are also a lot of other things that happened in that intervening time: we got vector units as common parts of our machines, we got GPUs, we got steeper cache hierarchies, we have configurable logic on some machines, and so forth, and now it's up to the software to adapt to it. So although we don't want to have to deal with performance, today you have to deal with performance, and in your lifetimes you will have to deal with performance in software, if you're going to have effective software.
You can see what happened, also, in a study that we did looking at software bugs in a variety of open-source projects that mention the word "performance." You can see that in 2004 the numbers start going up. It's not as convincing for some projects as for others, but generally there's a trend: after 2004, people started worrying more about performance. If you look at software developer jobs as of the early-to-mid 2000s, you see once again that the mention of performance in jobs is going up. And anecdotally, I had one student who came to me in the spring after he'd taken 6.172, and he said, you know, I applied for five jobs, and at every job interview they asked me a question I couldn't have answered if I hadn't taken 6.172, and I got five offers. When he compared those offers, they tended to be 20 to 30 percent larger than for people who were just web monkeys. That's not to say that you should necessarily take this class, but I want to point out that what we're going to learn is going to be interesting from a practical point of view, i.e., your futures, as well as from theoretical and technical points of view.

So modern processors are really complicated, and the big question is: how do we write software to use that modern hardware efficiently? I want to give you an example of performance engineering of a very well studied problem, namely matrix multiplication. Who has never seen this problem? OK, so we've got some jokers in the class. This takes order n-cubed operations, because you're basically computing n-squared dot products. If you add up the total number of operations, it's about 2n cubed, because there is essentially a multiply and an add for every pair of terms that needs to be accumulated. We're going to look at it assuming, for simplicity, that n is an exact power of two.

Now, the machine we're going to look at is one of the ones you'll have access to on AWS. It's a compute-optimized machine with a Haswell microarchitecture running at 2.9 gigahertz. There are two processor chips per machine and nine processing cores per chip, so a total of 18 cores; that's the amount of parallel processing. It does two-way hyperthreading, which we're actually not going to deal with much; hyperthreading gives you a little more performance, but it also makes it really hard to measure things, so generally we will turn off hyperthreading, although the performance you get tends to be correlated with what you get when you hyperthread. For floating point, it is capable of doing eight double-precision operations, that is, 64-bit floating-point operations, including a fused multiply-add, per core per cycle. That's a vector unit: each of the 18 cores can do eight double-precision operations per cycle, including a fused multiply-add, which is counted as two operations. It has a cache-line size of 64 bytes. The instruction cache is 32 kilobytes and 8-way set associative; we'll talk about some of these things, and if you don't know all the terms, that's OK, we're going to cover most of them later on. It's got a data cache of the same size, an L2 cache of 256 kilobytes, and an L3 cache, or what's sometimes called an LLC, last-level cache, of 25 megabytes, and then it's got 60 gigabytes of DRAM.
So this is a honking big machine; you can get things to sing on it. If you look at the peak performance, it's the clock speed times 2 processor chips times 9 processing cores per chip, each capable of 16 floating-point operations per cycle if you can use both the multiply and the add, and that comes out to just short of a teraflop: 836 gigaflops. That's a lot of power. These are fun machines, actually, especially when we get into things like the game-playing AI that we do for the fourth project; you'll see they're really fun when you have a lot of compute.

Now, here's the basic code. This is the full Python code for doing matrix multiplication. Generally in Python you wouldn't use this code, because you would just call a library subroutine that does matrix multiplication. But sometimes you have a problem, and I'm going to illustrate with matrix multiplication, for which you have to write the code yourself, and I want to give you an idea of what kind of performance you get out of Python. In addition, if there is a library routine, somebody had to write it, and that person was a performance engineer, because they wrote it to be as fast as possible, so this will also give you an idea of what you can do to make code run fast.

When you run this code: you can see that right before the triply nested loop we take a time measurement, then we take another time measurement at the end, and we print the difference. In between is just the classic triply nested loop for matrix multiplication. So when you run this, how long does it run for, do you think? Any guesses? Who thinks six microseconds? How about six milliseconds? Six seconds? Six minutes? Six hours? Six days? Of course, it's important to know what size the matrices are; this is 4096 by 4096, as it shows in the code. And those of you who didn't vote, can you wake up? Let's get active; this is active learning, put yourself out there. It doesn't matter whether you're right or wrong; there will be a bunch of people who got the right answer but have no idea why.

So it turns out it takes about 21,000 seconds, which is about six hours. Is this fast? How do we tell whether this is fast or not? What should we expect from our machine? Let's do a back-of-the-envelope calculation of how many operations there are and how fast we ought to be able to do them; we just went through all the parameters of the machine. There are 2n cubed operations that need to be performed; we're not doing Strassen's algorithm or anything like that, just the straight triply nested loop, so that's 2 to the 37th floating-point operations. The running time is 21,000 seconds, so that says we're getting about 6.25 megaflops out of our machine when we run that code, just by dividing the number of operations by the time. The peak, as you recall, was about 836 gigaflops, and we're getting 6.25 megaflops, so we're getting about 0.00075 percent of peak. This is not fast.
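Spelling out the back-of-the-envelope arithmetic above, using the approximate figures quoted in the lecture (the running time is rounded, so the division is approximate):

    peak  ≈ 2.9 GHz × 2 chips × 9 cores/chip × 16 flops/core/cycle ≈ 836 Gflops
    work  = 2n^3 = 2 × 4096^3 = 2^37 ≈ 1.4 × 10^11 floating-point operations
    rate  ≈ 1.4 × 10^11 flops / 21,000 s, on the order of the 6.25 Mflops quoted above
    fraction of peak ≈ 6.25 × 10^6 / 8.36 × 10^11 ≈ 0.00075%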
So let's do something really similar, but let's code it in Java rather than Python. We take just that loop; the code is almost the same, a triply nested loop, and we run it in Java. The running time now turns out to be just under 3,000 seconds, which is about 46 minutes. Same code, Python versus Java: we got almost a nine-times speedup just from coding it in a different language.

Well, let's try C. That's the language we're going to be using here. What happens when you code exactly the same thing in C? We're going to use the Clang/LLVM 5.0 compiler; I believe we're using 6.0 this term, and I should have rerun these numbers for 6.0, but I didn't. Now it's basically 1,100 seconds, which is about 19 minutes, so it's about twice as fast as Java and about 18 times faster than Python.

So here's where we stand so far. We have the running times of these various versions; the relative speedup is how much faster each is than the previous row, and the absolute speedup is how it compares to the first row. And now we're managing to get 0.014 percent of peak. So we're still slow, but before we go and try to optimize it further: why is Python so slow and C so fast? Anybody know? That's kind of on the right track. Anybody else want to articulate why Python is so slow? Yes, you're right: multiply and add aren't the only instructions Python is executing; it's running lots of other code.

Good. The big reason why Python is slow and C is so fast is that Python is interpreted and C is compiled directly to machine code, and Java is somewhere in the middle, because Java is compiled to bytecode, which is then interpreted and just-in-time compiled into machine code. Let me talk a little about these things. Interpreters, such as in Python, are versatile but slow. It's one of these cases where they said, we're going to take some of our performance and use it to make a more flexible, easier-to-program environment. The interpreter reads, interprets, and performs each program statement and then updates the machine state; it's actually going through, each time, reading your code, figuring out what it does, and then implementing it, so there's all this overhead compared to just doing the operations. Interpreters can easily support high-level programming features, and they can do things like dynamic code alteration, at the cost of performance. Typically the cycle for an interpreter is: read the next statement, interpret the statement, perform the statement, update the state of the machine, and fetch the next statement. You're going through that each time, in software. When you have things compiled to machine code, the hardware goes through a similar cycle, but it's highly optimized for the things machines do, so when you compile, you're able to take advantage of the hardware's interpreter of machine instructions, which has much, much lower overhead than the big software overhead you get with Python.

Now, JIT, which is what's used in Java, is somewhere in the middle, and JIT compilers can recover some of the performance; in fact, it did a pretty good job in this case. The idea is that when the code is first executed, it's interpreted, and the runtime system keeps track of how often the various pieces of code are executed. Whenever it finds some piece of code that it's executing frequently, it calls the compiler to compile that piece of code, and subsequently it runs the compiled code. So it tries to get the big performance advantage by only compiling the things for which it's actually going to pay off to invoke the compiler.
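For concreteness, here is a minimal sketch of the kind of straight triply nested loop being timed in the C version above. The global arrays, the fixed 4096 size, and the gettimeofday-based timing are my assumptions for illustration, not the exact course handout.

    #include <stdio.h>
    #include <sys/time.h>

    #define N 4096

    double A[N][N], B[N][N], C[N][N];   /* assume A and B get initialized, C zeroed */

    int main(void) {
      struct timeval start, end;
      gettimeofday(&start, NULL);
      for (int i = 0; i < N; ++i)        /* the classic i, j, k order */
        for (int j = 0; j < N; ++j)
          for (int k = 0; k < N; ++k)
            C[i][j] += A[i][k] * B[k][j];
      gettimeofday(&end, NULL);
      printf("%.6f seconds\n", (end.tv_sec - start.tv_sec)
                               + 1e-6 * (end.tv_usec - start.tv_usec));
      return 0;
    }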
So anyway, that's the big difference among those kinds of systems. One of the reasons we don't use Python in this class is that its performance model is hard to figure out. C is much closer to the metal, much closer to the silicon, and it's much easier to figure out what's going on in that context. But we will have a guest lecture on performance in managed languages like Python, so it's not that we're going to ignore the topic; we'll just learn how to do performance engineering in a place where it's easier to do.

Now, one of the things a good compiler will do: let's take the C version, which is where we're going to move forward from, because that's the fastest we've got so far. It turns out that you can change the order of loops in this program without affecting its correctness. Here we went "for i, for j, for k, do the update." We could just as well do "for i, for k, for j, do the update," and it computes exactly the same thing, or "for k, for j, for i, do the update." We can change the order without affecting correctness. So, do you think the order of loops matters for performance? I know this is a leading question. Yes, and you're exactly right: cache locality is what it is. When we try it, the loop order affects the running time by a factor of 18. Whoa. Just by switching the order.

What's going on there? We're going to talk about this in more depth, so I'm just going to fly through it here; this is just to show you the kinds of considerations involved. In hardware, each processor reads and writes main memory in contiguous blocks called cache lines. Previously accessed cache lines are stored in a small memory called a cache that sits near the processor. When the processor accesses something, if it's in the cache, you get a hit, and that's very cheap and fast. If you miss, you have to go out either to a deeper level of cache or all the way out to main memory, and that is much, much slower; we'll talk about that kind of thing later.

What happens for this matrix problem is that the matrices are laid out in memory in row-major order. That means you take your two-dimensional matrix and lay it out in the linear order of memory addresses by essentially taking row one, sticking row two after it, then row three after that, and so on, unfolding it. There's another order things could have been laid out in, and in fact they are in Fortran, which is called column-major order. So C and Fortran operate in different orders, and it turns out the layout affects which loop order performs well.
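To make row-major layout concrete: if an n-by-n matrix of doubles is stored as one contiguous array, element (i, j) lives at offset i*n + j, so stepping j walks adjacent addresses within a row, while stepping i jumps n doubles (here 4096 × 8 bytes) at a time. A tiny illustration, with a macro name I've made up for exposition:

    /* Row-major addressing for an n-by-n matrix stored in a flat array a[].   */
    /* Consecutive j values are adjacent in memory (usually on one cache line); */
    /* consecutive i values are n doubles apart (a different cache line).       */
    #define A_ELEM(i, j) a[(size_t)(i) * n + (j)]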
So let's just take a look at the access pattern for loop order i, j, k. Once we fix i and j, we cycle through k, and as we cycle through k, C[i][j] stays the same on every iteration; we get excellent locality for C, because we're accessing the same location every single time, it's always going to be in cache, and it's going to be fast to access. For A, we go through the row in linear order and get good spatial locality. But for B, we're going down a column, and those elements are distributed far apart in memory. The processor brings in 64 bytes to operate on a particular datum and then ignores seven of the eight floating-point words on that cache line before moving to the next one, so it's wasting an awful lot. A has good spatial locality, because its accesses are adjacent and use the cache lines effectively; for B you're going 4096 elements apart, so it's got poor spatial locality.

If we look at the other orders: for order i, k, j it turns out you get good spatial locality for both C and B and excellent locality for A. For yet another order you don't do nearly as well, and one order does the worst on both. You can just measure the different versions, and it turns out you can use a tool to figure this out. The tool we'll be using is called Cachegrind; it's one of the Valgrind suite of tools, and what it does is tell you the miss rates for the various pieces of code. You'll learn how to use that tool and figure out, oh, look at that, we have a high miss rate for some accesses and not for others, and that may be why my code is running slowly. So when you pick the best of those orders, we get a relative speedup of about six and a half.

What other simple changes can we try? There's actually a collection of things we can do that don't even involve touching the code. What else could we do, for people who have played with compilers and such? Hint... yes, change the compiler flags. Clang, the compiler we'll be using, provides a collection of optimization switches, and you can specify a switch to ask it to optimize: you do -O and then a number. If you look at the documentation, 0 says do not optimize, 1 says optimize, 2 says optimize even more, 3 says optimize yet more. In this case, it turns out that even though -O3 optimizes more, -O2 was a better setting. That doesn't happen all the time; usually -O3 does better than -O2, but in this case -O2 actually optimized better, because the optimizations are to some extent heuristic. There are also other kinds of automation: you can do profile-guided optimization, where you look at what the performance was and feed that back into the compiler, and then the compiler can be smarter about how it optimizes, and there are a variety of other things. With this simple technology, choosing a good optimization flag, in this case -O2, we got basically a factor of 3.25 for free, without having to do much work at all. And now we're actually starting to approach one percent of peak: we're at 0.3 percent of peak performance.
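As a rough sketch of the loop interchange described above, the winning order simply swaps the two inner loops so that both C and B are walked along rows while A[i][k] is reused in the inner loop; compiled with an optimization flag along the lines of clang -O2, this is the version the later measurements build on (my reconstruction, not the course handout):

    /* Order i, k, j: C[i][j] and B[k][j] are both traversed along rows,
       and A[i][k] stays fixed in the innermost loop. */
    for (int i = 0; i < N; ++i)
      for (int k = 0; k < N; ++k)
        for (int j = 0; j < N; ++j)
          C[i][j] += A[i][k] * B[k][j];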
So what's causing the low performance? Why aren't we getting most of the performance out of this machine? What do you think? Right: we're not using all the cores. So far we're using just one core, and how many cores do we have? 18. 17 cores are sitting idle while we're trying to optimize on one.

So, multicore. We have nine cores per chip, and there are two of these chips in our test machine, and we're running on just one core; let's use them all. To do that, we're going to use the Cilk infrastructure, and in particular we can use what's called a parallel loop, which in Cilk is called cilk_for. You just take that outer loop, for example, and say cilk_for: that says do all those iterations in parallel, and the compiler and runtime system are free to schedule them. We can also do it for the inner loop; it turns out you can't do it for the middle loop, if you think about it, and I'll leave that to you as a little homework problem: why can't I just put a cilk_for on the k loop?

So the question is, which parallel version works best? We can parallelize the i loop, we can parallelize the j loop, and we can do i and j together; you can't do k with just a parallel loop and expect to get the right answer. And look at the spread of running times: if I parallelize just the i loop, it's 3.18 seconds; if I parallelize the j loop, it actually slows down; and if I do both i and j, it's still bad. You just want to do the outer loop. The other versions lose to scheduling overhead, and we'll learn about scheduling overhead and how to predict it. The rule of thumb here is: parallelize outer loops rather than inner loops. When we do that, we get almost an 18x speedup on 18 cores. Let me assure you, not all code is that easy to parallelize, but this one happens to be. So now we're up to just over 5 percent of peak.
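A minimal sketch of the parallel outer loop, assuming a Cilk-enabled compiler like the one used in the course; the header and keyword come from Cilk, and the rest mirrors the earlier loop nest:

    #include <cilk/cilk.h>

    /* Parallelize only the outer loop; the Cilk runtime schedules the
       iterations across the cores. The inner loops stay serial. */
    cilk_for (int i = 0; i < N; ++i)
      for (int k = 0; k < N; ++k)
        for (int j = 0; j < N; ++j)
          C[i][j] += A[i][k] * B[k][j];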
So where are we losing time here? Why are we getting just 5 percent? Yes, that's one, and there's one other thing we're not using very effectively; those are the two optimizations we're going to do to get really good code here. What's the other one? That's actually related to the same question, but there's another, completely different source of opportunity here. Right: we can actually manage the cache misses better.

So let's go back to hardware caches and restructure the computation to reuse data in the cache as much as possible, because cache misses are slow and hits are fast; we want to make the most of the cache by reusing the data that's already there. Suppose we're going to compute just one row of C. Since it's a 4096-long vector, that's basically 4096 writes, and we get some spatial locality there, which is good, but the processor is doing 4096 writes to compute that row. To compute it, I need 4096 reads from A, and I need all of B, because the first element of C needs the whole first column of B, the second element of C needs the whole second column of B, and so on. Once again, don't worry if you don't fully understand this; right now I'm ripping through it at high speed, we'll go into it in much more depth in the class, and there will be plenty of time to master this stuff. The main thing to understand is that you're going through all of B. Then, to compute another row of C, I do the same thing: one row of A and all of B again. So for each row, we do about 17 million memory accesses. That's a lot of memory accesses.

So what if, instead of doing that, I do things in blocks? What if I want to compute a 64-by-64 block of C rather than a row of C? Remember that number, roughly 17 million, because we're going to compare with it. To compute a 64-by-64 block also takes 4096 writes to C, the same number, but now I have to do about 262,000 reads from A, because I need all 64 of those rows, and for B I need to access 64 columns, which is another roughly 262,000 reads, and that ends up being about half a million memory accesses total. So I end up doing way fewer accesses, provided those blocks fit in my cache. I do much less work to compute the same size footprint if I compute a block rather than a row; it's much more efficient.

That's a scheme called tiling. If you do tiled matrix multiplication, you bust your matrices into, let's say, 64-by-64 submatrices, and then you do two levels of matrix multiply: an outer level that multiplies the blocks, using the same algorithm, and then, when you get to the inner level, a 64-by-64 matrix multiply done with another three nested loops. You end up with six nested loops. And there's a tuning parameter, of course: how big do I make the tile size? If the tile is s by s, should s be 64, should it be 128, what number should I use? How do we find the right value of s, this tuning parameter? Ideas? You could model it, and you might get a number, but who knows what else is going on in the cache while you're doing this. Right: test a bunch of them, experiment, try them and see which one gives you good numbers. When you do that, it turns out 32 gives the best performance for this particular problem. So you can block it and get faster; that gave us a speedup of about 1.7, so we're now up to almost ten percent of peak. The other thing is that if you use Cachegrind or a similar tool, you can figure out how many cache references and misses there are, and you can see that they drop quite considerably with the tiling versus just the straight parallel loops. So once again, you can use tools to help you figure this out and understand the cause of what's going on.
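Here is a rough sketch of one level of tiling with a tile-size parameter S, written to match the six-loop structure described above; S = 32 is the experimentally best value quoted, the outer loop is parallelized as before, and the sketch assumes N is a multiple of S:

    #define S 32   /* tile size, found by experiment */

    /* Outer three loops step over S-by-S tiles; inner three loops do a small
       matrix multiply on one tile, which fits in cache and gets reused. */
    cilk_for (int ih = 0; ih < N; ih += S)
      for (int kh = 0; kh < N; kh += S)
        for (int jh = 0; jh < N; jh += S)
          for (int i = ih; i < ih + S; ++i)
            for (int k = kh; k < kh + S; ++k)
              for (int j = jh; j < jh + S; ++j)
                C[i][j] += A[i][k] * B[k][j];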
Well, it turns out our chips don't have just one cache; they've got three levels of caches. There's the L1 cache, which comes in a data side and an instruction side, and we're thinking about the data side here, for the matrix. Then there's an L2 cache, which is also private to the core, then a shared L3 cache, and then you go out to DRAM; you can also go to your neighboring processor's memory, and so on. They're of different sizes: they grow from 32 kilobytes to 256 kilobytes to 25 megabytes, and then main memory is 60 gigabytes. So what you can do is two-level tiling, with two tuning parameters, s and t. Now you can't do binary search to find them, unfortunately, because it's multidimensional, so you kind of have to search exhaustively, and you end up with nine nested loops. But of course we really have three levels of caching. Can anybody figure out, inductively, how many nested loops we need for three levels of caching? This is a gimme. Twelve, good. And, man, when I say the code gets ugly when you start making things go fast, this is what I mean.

But it turns out there's a trick: you can tile for every power of two simultaneously by just solving the problem recursively. The idea is divide and conquer: you divide each of the matrices into four submatrices, and if you look at the calculations you need to do, you have to solve eight subproblems of half the size and then do an addition. So you have eight multiplications of (n/2)-by-(n/2) matrices and one addition of n-by-n matrices, and that gives you your answer; then, of course, you solve each of those subproblems recursively. Here's the code. I don't expect you to understand it all, but we've written it in parallel, because it turns out you can do four of the multiplications at a time in parallel. The cilk_spawn here says go and do this subroutine, which is basically a subproblem, and while that's running you're allowed to go on and execute the next statement, which does another spawn, and another, and finally the last call; and then the sync statement says don't start the next phase until you've finished the first phase. We'll learn about this.

When we do that, we get a running time of about 93 seconds, which is about 50 times slower than the last version. We're using the cache much better, but nothing is free; nothing is easy in performance engineering, and typically you have to be clever. Why did this get worse? If you look at the caching numbers, we're getting great cache behavior, very few misses, lots of hits, but we're still slower. Why do you suppose that is? Yes: the overhead of the function calls, and the place it matters is at the leaves of the computation. We have a very small base case, so we're paying that overhead all the way down to n equals 1; there's function-call overhead even when we're multiplying one by one. So, hey, let's pick a threshold, and below that threshold let's just use a standard, good algorithm, and above it we do divide and conquer. If we're below the threshold, we call a base case, and the base case looks very much like the ordinary matrix multiply. When you do that, you can once again look for the best value of the threshold; in this case I guess it's 64, which gets us down to 1.95 seconds. I didn't include a base case of one, because I tried that and it's the one that gave us terrible performance. Sorry, 32? Oh yeah, 32 is even better: 1.3 seconds. So we pick 32; I should have highlighted that on the slide. When we do that, we're now getting 12 percent of peak.
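Here is a rough, self-contained sketch of the divide-and-conquer scheme with the spawns, the sync between the two phases, and a coarsened base case. The function name, the flat row-major layout with a stride, and the other details are my own reconstruction rather than the course's tuned code:

    #include <cilk/cilk.h>

    #define THRESHOLD 32   /* coarsen: below this, fall back to a plain loop nest */

    /* C += A * B for n-by-n submatrices embedded in row-major arrays whose
       full row length is `stride`. n is assumed to be a power of two. */
    void mm_dac(double *restrict C, const double *restrict A,
                const double *restrict B, int n, int stride) {
      if (n <= THRESHOLD) {                       /* base case: ordinary loops */
        for (int i = 0; i < n; ++i)
          for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
              C[i*stride + j] += A[i*stride + k] * B[k*stride + j];
        return;
      }
      int h = n / 2;
      /* X(M, r, c) is the (r, c) quadrant of matrix M. */
      #define X(M, r, c) ((M) + (r)*h*stride + (c)*h)
      /* Phase 1: four products that write four distinct quadrants of C. */
      cilk_spawn mm_dac(X(C,0,0), X(A,0,0), X(B,0,0), h, stride);
      cilk_spawn mm_dac(X(C,0,1), X(A,0,0), X(B,0,1), h, stride);
      cilk_spawn mm_dac(X(C,1,0), X(A,1,0), X(B,0,0), h, stride);
                 mm_dac(X(C,1,1), X(A,1,0), X(B,0,1), h, stride);
      cilk_sync;   /* don't start phase 2 until phase 1 has finished */
      /* Phase 2: accumulate the other four products into the same quadrants. */
      cilk_spawn mm_dac(X(C,0,0), X(A,0,1), X(B,1,0), h, stride);
      cilk_spawn mm_dac(X(C,0,1), X(A,0,1), X(B,1,1), h, stride);
      cilk_spawn mm_dac(X(C,1,0), X(A,1,1), X(B,1,0), h, stride);
                 mm_dac(X(C,1,1), X(A,1,1), X(B,1,1), h, stride);
      cilk_sync;
      #undef X
    }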
If you count up the cache misses, you can see that with parallel divide and conquer the L1 data-cache misses are the lowest, and so now are the last-level cache misses, and the total number of references is small as well. So divide and conquer turns out to be a big win here.

Now, the other thing we mentioned is that we're not using the vector hardware. All of these machines have vector units; they have vector hardware that processes data in what's called SIMD fashion, which means single instruction, multiple data: you give one instruction, and it does operations on a whole vector. As we mentioned, we have eight floating-point units per core, which can also do a fused multiply-add. Each vector register holds multiple words; I believe on the machine we're using this term it's four 64-bit words. But it's important, when you use these, that you can't just use them willy-nilly: you have to operate on the data as one chunk of vector data. You can't have one lane of the vector unit doing one thing and a different lane doing something else; they all have to be doing essentially the same thing, the only difference being the indexing into memory.

We've actually already been taking advantage of this, but you can produce a vectorization report by asking for it, and the system will tell you which things are being vectorized and which aren't, and we'll talk about how you vectorize things the compiler doesn't want to vectorize. In particular, most machines don't support the newest sets of vector instructions, so the compiler uses vector instructions conservatively by default. If you're compiling for a particular machine, you can tell it to compile for that particular machine. Here are some of the vectorization flags: you can say use the AVX instructions, if you have AVX2 you can use AVX2, you can use the fused multiply-add vector instructions, you can give a string that names the architecture you're running on, or you can say use whatever machine I'm currently compiling on, and it will figure out which architecture that is.

Now, floating-point numbers, as we'll talk about, turn out to have some undesirable properties, like not being associative: if you compute a times b times c, how you parenthesize it can give you two different numbers. So, given a specification of a code, the compiler will typically not change the order of association, because it wants to produce exactly the same result; but you can give it a flag called fast math, -ffast-math, which allows it to do that kind of reordering, if it's not important to you that the result match the default ordering exactly. When you use that, in particular using the native-architecture flag together with fast math, we get about double the performance out of just having the compiler vectorize.
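Since floating-point arithmetic really is non-associative, here is a tiny self-contained illustration of why the compiler refuses to regroup operations by default and why -ffast-math can change answers:

    #include <stdio.h>

    int main(void) {
      double big = 1e16;
      /* Rounding makes the grouping matter: 1e16 + 1.0 rounds back to 1e16,
         so the two 1.0s vanish on the left but survive on the right. */
      printf("%.1f\n", (big + 1.0) + 1.0);   /* prints 10000000000000000.0 */
      printf("%.1f\n", big + (1.0 + 1.0));   /* prints 10000000000000002.0 */
      return 0;
    }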
There was a question: is that 64 bits? Yes; these days 64-bit floating point is pretty standard, they call that double precision, unless you're doing AI applications, in which case you may want lower-precision arithmetic; a float is 32 bits. Generally, people doing serious linear algebra calculations use 64 bits, but sometimes they can use less, and then you can get more performance if you discover you can use fewer bits in your representation. We'll talk about that too.

The last thing we're going to do: you can actually use the vector instructions yourself, rather than relying on the compiler to do it. There's a whole manual of intrinsic instructions that you can call from C, which let you issue the specific vector instructions you want, so the compiler doesn't have to figure it out. You can also use some more insights to do things like preprocessing: you can transpose the matrices, which turns out to help, and do data alignment, and a lot of other things, and use a clever algorithm for the base case. So you do more performance engineering: you think about what you're doing, you code, and then you run, run, run to test. That's one nice reason to have the cloud: you can do tests in parallel, so it takes you less sitting-around time; if you want to do ten tests, spin up ten machines and do all the tests at the same time. When you do all that, and the main win we're getting here is from the AVX intrinsics, we get up to 0.41 of peak, 41 percent of peak, and about a 50,000-times speedup.

It turns out that's where we quit, because at that point we beat Intel's professionally engineered Math Kernel Library. Now, a good question is why we aren't getting all of peak, and I invite you to figure that out. It turns out, though, that Intel MKL is better than what we did, because we assumed the size is a power of two and Intel doesn't assume that; they're more robust, and although we win on 4096-by-4096 matrices, they win on other sizes. So it's not everything.

But the end of the story is: what have we done? We've just gotten a factor of 50,000. If you took the fuel economy of a jumbo jet and improved it by the kind of factor we just got, in terms of miles per gallon, you would be able to run a jumbo jet on what a little Vespa scooter uses, or whatever type of scooter that is. That's how much we've been able to do. Let me gently caution you: you won't generally see the magnitude of performance improvement that we obtained for matrix multiplication; that turns out to be a really good example because it's so dramatic. But we will see some substantial numbers, and in particular, in 6.172 you'll learn how to print this currency of performance all by yourself, so that you don't have to take somebody else's library; you can say, I'm an engineer.

Let me mention one other thing. In this course we're going to focus on multicore computing. We are not, in particular, going to be doing GPUs or file systems or network performance; in the real world those are hugely important. What we found, however, is that it's better to learn one particular domain well, and people who master multicore performance engineering in fact go on to do these other things and are really good at them, because you've learned the core, the basis, the foundation.
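To give a flavor of the intrinsics step mentioned above, here is a rough sketch of one way an AVX/FMA inner kernel can look in C on a Haswell-class machine. The function name and structure are my own; it ignores alignment, edge cases, and the transposition and other tricks, and it is not the course's tuned kernel:

    #include <immintrin.h>

    /* C[i][j..j+3] += A[i][k] * B[k][j..j+3], four doubles per instruction.
       c and a point at row i of C and A; b is the full row-major B with n columns.
       Assumes n is a multiple of 4; compile with AVX2/FMA enabled (e.g., -march=native). */
    void kernel_row(double *restrict c, const double *restrict a,
                    const double *restrict b, int n) {
      for (int k = 0; k < n; ++k) {
        __m256d aik = _mm256_broadcast_sd(&a[k]);        /* splat A[i][k] into all 4 lanes */
        for (int j = 0; j < n; j += 4) {
          __m256d cij = _mm256_loadu_pd(&c[j]);          /* load C[i][j..j+3] */
          __m256d bkj = _mm256_loadu_pd(&b[k * n + j]);  /* load B[k][j..j+3] */
          cij = _mm256_fmadd_pd(aik, bkj, cij);          /* fused multiply-add */
          _mm256_storeu_pd(&c[j], cij);                  /* store back */
        }
      }
    }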
quality educational resources for free to make a donation or to view additional materials from hundreds of MIT courses visit MIT opencourseware at ocw.mit.edu so welcome to six one seven two my name is charles leiserson and i am one of the two lecturers this term the other is professor Julian shun were both in EECS and in csail on the seventh floor of the gates building if you don't know it you are in performance engineering of software systems so if this is the wrong if you found yourself in the wrong place now is the time to exit I want to start today by talking a little bit about why we do performance engineering and then I'll do some a little bit of administration and then sort of dive into sort of a case study that'll give you a good sense of some of the things that we're going to do during the term I put the administration in the middle because it's like if if you don't if from me telling you about the course you don't want to do the course then it's like why should you listen to the administration right it's like so so let's just dive right in okay so so the first thing to always understand whenever you're doing something is a perspective on your on what matters in what you're doing so we're going to study the whole term we're gonna do software performance engineering and so this is kind of interesting because it turns out that performance is usually not at the top of what people are interested in when they're building software okay what are some of the things that are more important than software that's sorry then performance yeah deadlines good cost correctness extensibility yeah we would go on and on I think that you folks could probably make a pretty long list I made a short list of all the kinds of things that are more important than performance so then if programmers are so willing to sacrifice performance for these properties why do we study performance okay so this is kind of a bit of a paradox in a bit of a puzzle why do you study something that clearly isn't at the top of the list of what most people care about when they're developing software I think the answer to that is that performance is the currency of computing okay you use performance to buy these other properties so you'll say something like gee I want to make it easy to program and so therefore I'm willing to sacrifice some performance to make something easy to program now I'm willing to sacrifice some performance to make sure that my system is secure okay and all those things come out of your performance budget and clearly if performance degrades too far your stuff becomes unusable okay when I talk with people in with programmers and I say you know people are fond of saying ah performance you know yo you do performance performance doesn't matter I never think about it then I talk with with people who use computers and I ask what's your main complaint about the the computing systems you use answer too slow okay so it's interesting whether you're the producer or that whatever but but the real answer is that that performance is is like currency it's something you spend I would rather have if you look you know would I rather have $100 or a gallon of water well water is indispensable to life there's circumstances certainly where I would prefer to have the water okay then $100 okay but in our modern society I can buy water for much less than $100 okay so even though water is essential to life and far more important than money money is a currency and so I prefer to have the money because I can just buy the things I and 
that's the same kind of analogy of performance it has no intrinsic value but it contributes to things you can use it to buy things that you care about like usability or testability or what have you okay now in the early days of computing software performance engineering was common because machine resources were limited if you look at these machines from 1964 to 1977 I mean look at the site look at how many bytes they have on them right the in 64 there is a computer with 524 kilobytes okay that was a big machine back then that's kilobytes that's not megabytes that's not gigabytes that's kilobytes okay and many programs would strain the machine resources okay the clock rate for that machine was 33 kilohertz what's a typical clock rate today about what four gigahertz three gigahertz two gigahertz somewhere up there yeah somewhere in that range okay and here they were operating with kilohertz so many programs would not fit without intense performance engineering and one of the things also that there's a lot of a lot of sayings that came out of that area Donald Knuth who's one of the touring award winner absolutely fabulous computer scientists in all respects wrote premature optimization is the root of all evil and I invite you by the way to look that quote up and because there's actually taken out of context okay so trying to optimize stuff too early he was worried about okay bill Wolfe who built the design the bliss language and worked on the pdp-11 and such said more computing sins are committed in the name of efficiency without necessarily achieving it than for any other single reason including blind stupidity okay and Michael Jackson said the first rule of program optimization don't do it second rule of program optimization for experts only don't do it yet okay so everybody warning away because when you start trying to make things fast your code becomes unreadable okay making code that is readable and fast now that's that's where the art is and hopefully we'll learn a little bit about doing that okay and indeed there was no real point in in working too hard on performance engineering for many years if you look at technology scaling and you look at how many transistors are on various processor designs up until about 2004 we had Moore's law in in full throttle okay with chip densities doubling every two years and really quite amazing and along with that as they shrunk the dimensions of chips because by miniaturization the clock speed would go up correspondingly as well and so if you found something was too slow wait a couple years okay wait a couple years it'll be faster okay and so there wasn't you know if you're going to do something with software and make your software ugly that really wasn't a real you know wasn't a real good payoff compared to just simply waiting around and in that era there was something called Dennard scaling where which allowed things to as things shrunk allowed the clock speeds to get larger basically by reducing power you could reduce power and still keep everything fast and we'll talk about that in a minute so if you look at what happened to from 1977 to 2004 here are Apple computers with similar similar price tags and you can see the the clock rate really just skyrocketed one megahertz 400 megahertz 1.8 gigahertz okay and the data paths went from 8 bits to 32 64 the memory because correspondingly grow cost approximately the same and that was that's the legacy from Moore's Law and the tremendous advances in semiconductor technology and so until 2004 moore's law in 
the scaling of clock frequency so court Dennard scaling was essentially a printing press for the currency of performance okay you didn't have to do anything you just made the hardware go fast or very easy very and all that came to an end well some of it came to an end in 2004 when clock speeds plateaued okay so if you look at this around 2005 you can see all the speeds we hit you know 2 to 4 gigahertz and we have not been able to make chips go faster than that in any practical way since then but the densities have kept growing great now the reason that the clock speed flattened was because of power density and this is a slide from Intel from that era looking at the growth of power density and what they were projecting was that that the junction temperatures of the transistors on the chip if they just keep scaling the way they had been scaling would would start to approach first of all the temperature of a nuclear reactor than the temperature of a rocket nozzle and then the Sun surface okay so that we're not going to build little technology that cools that very well and even if you could solve it for a little bit the writing was on the wall we cannot scale clock frequencies anymore the reason for that is that originally clock frequency was scaled assuming that the most of the power was dynamic power which was going when you switched the circuit and what happened as we kept reducing that and reducing that is something that used to be in the noise namely the Deacon's okay started to become significant to the point where now today the dynamic power is is far less of a concern than the static power from just the circuits sitting there leaking and when you miniaturize you can't stop that effect from happening so what did the vendors do in 2004 and 2005 and since is they said oh gosh we've got all these transistors to use but we can't use the transistors to make stuff run faster so what they did is they introduced parallelism in the form of multi-core processors they put more than one processing core in a chip and to scale performance they would you know have multiple cores and each generation of Moore's law now was potentially doubling the number of of cores and so if you look at what happened for processor cores you see that around 2005 2004 2005 we started to get multiple processing cores per chip to the extent that today it's basically impossible to find a single core chip for a laptop or a workstation or whatever everything is multi-crew you can't buy just one you have to buy a parallel processor and and so the impact of that was that performance was no longer free you couldn't just speed up the hardware now if you wanted to use that potential you had to do parallel program and that's not something that anybody in the industry really had done so today there are a lot of other things that happen in the in that intervening time we got vector units as common parts of our machines we got GPUs we got steeper cache hierarchies we have a configurable logic on some machines and so forth and now it's up to the software to adapt to it and so although we don't want to have to deal with performance today you have to deal with performance and in your lifetimes you will have to deal with performance okay in software if you're gonna have effective software okay you can see what happened also this is a study that we did looking at software bugs and a variety of open source projects where they're mentioning the word performance and you can see that in 2004 the numbers start going up you know some of them 
it's not as as convincing for some things as others but generally there's a trend of after 2004 people started worrying more about performance if you look at software developer jobs you know as of tooth you know early mid-2000s 2000 oh ohs I guess okay the you see once again the mention of performance and jobs is going up and anecdotally I can tell you know I had one student who came to me after the spring after he'd taken six 172 and he said you know I went and I had applied for five jobs and every job asked me every at every job interview they asked me a question I couldn't have answered if I hadn't taken six 172 and I got five offers okay and when I compared those offers they tended to be 20 to 30% larger than people are just web monkeys okay so so anyway that's not to say that you should necessarily take this class okay but I just want to point out that what we're gonna learn is going to be interesting from a practical point of view ie your futures okay as well as theoretical points of view and technical points of view okay so modern processors are really complicated and the big question is how do we write software to use that modern hardware efficiently okay I want to give you a example of performance engineering of a very well studied problem namely matrix multiplication who has never seen this problem okay so there we got some Joker's in the class I can say okay so this is you know it takes n cubed operations because you're basically computing N squared dot products okay so essentially if you add up the total number of operations it's about 2n cubed because there is essentially a multiply and an ADD for every pair of terms that need to be accumulated okay so it's basically 2n cubed we're gonna look at it assuming for simplicity that our n is a an exact power of two okay now the machine that we're gonna look at is going to be one of the one one of the ones that you'll have access to an AWS okay it's a it's a compute optimized machine which has a Haswell microarchitecture running at 2.9 gigahertz there are two processor chips for each of these machines and nine processing cores per chip so a total of 18 cores so that's the amount of parallel processing it does two-way hyper-threading which we're actually going to not deal a lot with it hyper threading gives you a little bit more performance but it also makes it really hard to measure things so generally we will turn off hyper threading but the performance that you get tends to be correlated with what you get when your hyper thread for floating-point unit there it is capable of doing eight double precision operations that 64-bit floating-point operations including a fused multiply add per core per cycle okay so that that's a vector unit so you basically each of these 18 cores can do eight double precision operations and so including a fuse multiply add which is actually two operations okay the way that they count these things okay it has a cache line size of 64 bytes the AI cache is 32 kilobytes which is 8 way set associative we'll talk about some of these things if you don't know all the terms that's okay we're gonna cover most of these terms later on it's got a D cache of the same size it's got an l2 cache of 256 kilobytes and it's got an l3 cache or what's sometimes called an LLC last level cache of 25 megabytes and then it's got 60 gigabytes of DRAM so this is a honking big machine okay this is like you can get things to sing on this okay if you look at the peak performance it's the clock speed times 2 processor chips times 9 
If you look at the peak performance, it's the clock speed times 2 processor chips times 9 processing cores per chip times, if you can use both the multiply and the add, 16 floating-point operations per cycle, and that comes out to just short of a teraflop: 836 gigaflops. That's a lot of power. These are fun machines, actually, especially when we get to things like the game-playing AI that we do for the fourth project; they're really fun when you have that much compute. Now, here's the basic code: this is the full Python code for matrix multiplication. Generally in Python you wouldn't use this code, you'd just call a library subroutine that does matrix multiplication, but sometimes you have a problem, which I'm going to illustrate with matrix multiplication, where you have to write the code yourself, and I want to give you an idea of what kind of performance you get out of Python. Besides, if there is a library routine, somebody had to write it, and that person was a performance engineer, because they wrote it to be as fast as possible, so this will also give you an idea of what you can do to make code run fast. You can see that before the triply nested loop we take a time measurement, then we take another at the end and print the difference; in between is the classic triply nested loop for matrix multiplication. So when you run this, how long does it run? Any guesses? Six microseconds? Six milliseconds? Six seconds? Six minutes? Six hours? Six days? Of course, it matters what size the matrices are: it's 4096 by 4096, as shown in the code. And those of you who didn't vote: wake up, this is active learning, put yourself out there; it doesn't matter whether you're right or wrong, and there will be a bunch of people who get the right answer and have no idea why. It turns out it takes about 21,000 seconds, which is about six hours. Is this fast? How do we tell whether this is fast or not? What should we expect from our machine? Let's do a back-of-the-envelope calculation of how many operations there are and how fast we ought to be able to do them; we just went through all the parameters of the machine. There are 2n-cubed operations to perform; we're not doing Strassen's algorithm or anything like that, just the straight triply nested loop, so that's 2 to the 37th floating-point operations. The running time is 21,000 seconds, so dividing the number of operations by the time, we're getting about 6.25 megaflops out of our machine when we run that code. The peak, as you recall, was about 836 gigaflops, and we're getting 6.25 megaflops, so we're at about 0.00075 percent of peak. This is not fast.
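Just to make the back-of-the-envelope arithmetic explicit (this is my own rounding; the exact figures depend on the measured running time):

```latex
\begin{align*}
\text{peak} &\approx 2.9\ \text{GHz} \times 2\ \text{chips} \times 9\ \tfrac{\text{cores}}{\text{chip}}
             \times 16\ \tfrac{\text{FLOPs}}{\text{core}\cdot\text{cycle}} \approx 835\ \text{GFLOPS},\\
\text{work} &= 2n^{3} = 2 \cdot 4096^{3} = 2^{37} \approx 1.37\times 10^{11}\ \text{FLOPs},\\
\text{rate} &\approx \frac{2^{37}\ \text{FLOPs}}{21{,}000\ \text{s}} \approx 6.5\ \text{MFLOPS},
\qquad \frac{6.5\times 10^{6}}{8.35\times 10^{11}} \approx 8\times 10^{-6}.
\end{align*}
```

That fraction is the same ballpark as the 0.00075 percent of peak quoted above.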
So let's do something really similar: let's code it in Java rather than Python. We take just that loop, the code is almost the same triply nested loop, and we run it in Java. The running time now turns out to be just under 3,000 seconds, which is about 46 minutes. Same code, Python versus Java, and we got almost a nine-times speedup just by coding it in a different language. Well, let's try C; that's the language we're going to be using here. What happens when you code it in C? It's exactly the same thing. We're going to use the Clang/LLVM 5.0 compiler; I believe we're using 6.0 this term, and I should have rerun these numbers for 6.0, but I didn't. Now it's basically 1,100 seconds, which is about 19 minutes, so it's about twice as fast as Java and about 18 times faster than Python. Here's where we stand so far: we have the running times of these various versions, where the relative speedup is how much faster each one is than the previous row and the absolute speedup is how it compares to the first row, and we're now managing to get 0.014 percent of peak. So we're still slow. But before we go and try to optimize it further: why is Python so slow and C so fast? Anybody? That's on the right track; can anybody else articulate it a little more? Right: multiplies and adds aren't the only instructions Python is executing; it's running lots of other code. The big reason Python is slow and C is so fast is that Python is interpreted and C is compiled directly to machine code, and Java is somewhere in the middle, because Java is compiled to bytecode, which is then interpreted and just-in-time compiled into machine code. Let me talk a little about these. Interpreters, such as Python's, are versatile but slow. It's one of those cases where the designers said, we're going to take some of our performance and spend it on a more flexible, easier-to-program environment. The interpreter reads, interprets, and performs each program statement and then updates the machine state; it's actually going through your code each time, figuring out what it does and then implementing it, so there's all this overhead compared to just doing the operations. Interpreters can easily support high-level programming features, and they can do things like dynamic code alteration, at the cost of performance. Typically the interpreter's cycle is: read the next statement, interpret it, perform it, update the state of the machine, and fetch the next statement; it goes through that each time, in software. When your code is compiled to machine code, the processor goes through a similar cycle, but it's highly optimized in hardware for exactly the things machines do; when you compile, you're able to take advantage of the hardware's interpretation of machine instructions, which has much, much lower overhead than the big software overhead you get with Python. JIT, which is what's used in Java, is somewhere in the middle, and JIT compilers can recover some of the performance; in fact Java did a pretty good job here. The idea is that when the code is first executed, it's interpreted, and the runtime system keeps track of how often the various pieces of code run. Whenever it finds a piece of code that's executing frequently, it calls the compiler to compile that piece, and subsequently it runs the compiled code. So it tries to get the big performance advantage by only compiling the things for which invoking the compiler will actually pay off.
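For concreteness, here is a minimal sketch of what the C version looks like; the variable names and the use of gettimeofday for timing are my choices, not necessarily what the course handout uses.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N 4096
double A[N][N], B[N][N], C[N][N];

int main(void) {
    // Fill A and B with something; C starts at zero.
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (double)rand() / RAND_MAX;
            B[i][j] = (double)rand() / RAND_MAX;
            C[i][j] = 0.0;
        }

    struct timeval start, end;
    gettimeofday(&start, NULL);

    // The classic triply nested loop, in the i, j, k order.
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];

    gettimeofday(&end, NULL);
    double seconds = (end.tv_sec - start.tv_sec)
                   + 1e-6 * (end.tv_usec - start.tv_usec);
    printf("%0.6f seconds\n", seconds);
    return 0;
}
```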
Anyway, that's the big difference among those kinds of systems. One of the reasons we don't use Python in this class is that its performance model is hard to figure out; C is much closer to the metal, much closer to the silicon, so it's much easier to figure out what's going on in that context. But we will have a guest lecture about performance in managed languages like Python, so it's not that we're going to ignore the topic; we'll just learn how to do performance engineering in a setting where it's easier to do. Now, one thing a good compiler will exploit: once we're at the C version, which is where we're going to work from since it's the fastest we've got so far, it turns out you can change the order of the loops in this program without affecting correctness. Here we went for i, for j, for k, do the update. We could instead do for i, for k, for j, do the update, and it computes exactly the same thing, or we could do for k, for j, for i. So we can change the order without affecting correctness. Do you think the order of loops matters for performance? I know, this is a leading question. Yes, and you're exactly right: cache locality is what it is. When we try it, the loop order affects the running time by a factor of 18. Whoa. Just by switching the order. What's going on there? We're going to talk about this in more depth later, so I'm just going to fly through it here to show you the kinds of considerations involved. In hardware, each processor reads and writes main memory in contiguous blocks called cache lines. Previously accessed cache lines are stored in a small memory, called a cache, that sits near the processor. When the processor accesses something that's in the cache, you get a hit, which is very cheap and fast; if you miss, you have to go out either to a deeper-level cache or all the way to main memory, which is much, much slower. What happens for this matrix problem is that the matrices are laid out in memory in row-major order. That means your two-dimensional matrix is laid out in the linear order of memory addresses by taking row 1, sticking row 2 after it, then row 3 after that, and so on. There's another order things could have been laid out in, and in fact are in Fortran, called column-major order; C and Fortran use different orders, and it turns out to affect performance which one you use. So let's look at the access pattern for the order i, j, k. Once we've fixed i and j, we cycle through k, and as we do, C[i][j] stays the same on every iteration: we get excellent locality for C, because we're accessing the same location every single time, it's going to be in cache, and it's always going to be fast to access. For A, we go through a row in linear order, so we get good spatial locality. But for B we're going down a column, and those elements are distributed far apart in memory.
So the processor brings in 64 bytes to operate on a particular datum and then ignores seven of the eight floating-point words on that cache line before going to the next one, which wastes an awful lot. A has good spatial locality, since its accesses are adjacent and use the cache lines effectively, while for B you're striding 4096 elements apart, which is poor spatial locality. That's the i, j, k order. If you look at the other orders: with order i, k, j you get good spatial locality for both C and B and excellent locality for A; other orders aren't nearly as good, and one of them does optimally badly on both. You can simply measure the different orders, and you can use a tool to figure this out. The tool we'll be using is called Cachegrind, which is part of the Valgrind suite of tools, and it tells you what the miss rates are for the various pieces of code. You'll learn how to use it and figure out, oh look, we have a high miss rate here and not there, and that may be why my code is running slowly. When we pick the best loop order, we get a relative speedup of about six and a half. What other simple changes can we try? There's actually a collection of things we can do that don't even involve touching the code. Hint, for people who have played with compilers: yes, change the compiler flags. Clang, the compiler we'll be using, provides a collection of optimization switches, and you specify a switch to ask it to optimize: you write -O and then a number. The documentation says 0 means do not optimize, 1 means optimize, 2 means optimize even more, and 3 means optimize yet more. In this case it turns out that even though -O3 is supposed to optimize more, -O2 was the better setting. That doesn't happen all the time; usually -O3 does better than -O2, but here -O2 actually optimized better, because the optimizations are to some extent heuristic. There are other kinds of automation too: you can do profile-guided optimization, where you measure the performance and feed that back in, and the compiler can then be smarter about how it optimizes, and there are a variety of other things. With this simple technology, choosing a good optimization flag, in this case -O2, we got basically a factor of 3.25 for free, without having to do much work at all, and now we're starting to approach one percent of peak: we're at 0.3 percent of peak performance. So what's causing the low performance? Why aren't we getting most of the performance out of this machine? Right: we're not using all the cores. So far we're using just one core, and how many cores do we have? Eighteen. Seventeen of them are sitting idle while we're trying to optimize one. So: multicore. We have nine cores per chip and two of these chips in our test machine, and we're running on just one core, so let's use them all.
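As a sketch, the better loop order is just a matter of swapping the j and k loops; you might compile it with something like clang -O2 mm.c -o mm (the file name is made up).

```c
// Same computation as before, but in the i, k, j order: the innermost loop
// now walks C[i][j..] and B[k][j..] left to right, matching the row-major
// layout, so whole cache lines get used instead of one word per line.
void matmul_ikj(long n, double (*restrict C)[n],
                double (*restrict A)[n], double (*restrict B)[n]) {
    for (long i = 0; i < n; i++)
        for (long k = 0; k < n; k++)
            for (long j = 0; j < n; j++)
                C[i][j] += A[i][k] * B[k][j];
}
```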
To do that we're going to use the Cilk infrastructure, and in particular we can use what's called a parallel loop, which in Cilk is spelled cilk_for. You just replace that outer for, for example, with cilk_for, and it says do all of those iterations in parallel; the compiler and runtime system are free to schedule them however they like. We can also do it for the inner loop, but it turns out you can't do it for the k loop, and I'll leave it as a little homework problem to think about why you can't just put a cilk_for on the k loop. So the question is which parallel version works best. We can parallelize the i loop, we can parallelize the j loop, and we can do i and j together; you can't parallelize the k loop with just a parallel loop and expect to get the right answer. And look at the spread of running times: if I parallelize just the i loop, it's 3.18 seconds; if I parallelize the j loop, it actually slows down; and if I do both i and j, it's still bad. You just want to parallelize the outer loop. This has to do with scheduling overhead, and we'll learn about scheduling overhead and how to predict it. So the rule of thumb here is: parallelize outer loops rather than inner loops. When we parallelize the outer loop, we get almost an 18x speedup on 18 cores. Let me assure you, not all code is that easy to parallelize, but this one happens to be. So now we're up to just over 5 percent of peak. Where are we losing time? Why are we getting just 5 percent? Good, that's one of them: we're not using the vector hardware effectively. Those are the two optimizations we're going to do to get really good code here, so what's the other one? That's actually related to the same question, but there's another, completely different source of opportunity: we can manage the cache misses better. So let's go back to hardware caches and restructure the computation to reuse data in the cache as much as possible, because cache misses are slow and hits are fast; we want to make the most of the cache by reusing the data that's already there. Let's take a look. Suppose we compute one row of C. Since it's a 4096-long vector, that's basically 4096 writes, and we get some spatial locality there, which is good, but the processor is doing 4096 writes to compute that row. To compute it, I need 4096 reads from A, and I need all of B, because I go through every column of B: the first element of C needs the whole first column of B, the second element of C needs the whole second column of B, and so on. Once again, don't worry if you don't fully understand this; right now I'm ripping through it at high speed, we'll go into it in much more depth in the class, and there'll be plenty of time to master this stuff. The main thing to understand is that you're going through all of B. Then, to compute another row of C, I do the same thing: one row of A and all of B again, so that when I'm done I've made about 16 or 17 million memory accesses in total. That's a lot of memory accesses.
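Here is roughly what the parallel loop just described looks like, assuming an OpenCilk-style toolchain (compiled with something like clang -fopencilk); only the outer loop gets the cilk_for.

```c
#include <cilk/cilk.h>

// Parallelize only the outer (i) loop: each iteration writes its own row of
// C, so there are no races. Putting a cilk_for on the k loop would make
// different iterations race on the same C[i][j].
void matmul_parallel(long n, double (*restrict C)[n],
                     double (*restrict A)[n], double (*restrict B)[n]) {
    cilk_for (long i = 0; i < n; i++)
        for (long k = 0; k < n; k++)
            for (long j = 0; j < n; j++)
                C[i][j] += A[i][k] * B[k][j];
}
```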
So what if, instead of doing that, I do things in blocks? What if I compute a 64-by-64 block of C rather than a row of C? Let's look at what happens then, and remember that number, 16 or 17 million, because we're going to compare against it. Computing a 64-by-64 block also takes 4096 writes to C, the same number. But now I only have to do about 262,000 reads from A, because I need 64 full rows of A, and for B I need 64 columns, which is another 262,000 or so reads, and that ends up being about half a million memory accesses total. So I end up doing far fewer accesses, provided those blocks fit in my cache. I do much less work to compute the same size footprint if I compute a block rather than a row: much more efficient. That scheme is called tiling. In tiled matrix multiplication, you bust your matrices into, say, 64-by-64 submatrices and then do two levels of matrix multiply: an outer level that multiplies the blocks, using the same algorithm, and then, when you hit the inner level, a 64-by-64 matrix multiply done with another three nested loops. You end up with six nested loops. And there's a tuning parameter, of course, which is how big to make the tile: if it's s by s, what should s be? 64? 128? How do we find the right value of this tuning parameter s? Ideas? You could compute it from the cache size, and you might get a number, but who knows what else is going on in the cache while you're doing this. Right: test a bunch of them, experiment, and see which gives you good numbers. When you do that, it turns out 32 gives the best performance for this particular problem. So you can block it and get faster, and when we do, we get a speedup of about 1.7, which puts us at almost ten percent of peak. The other thing is that if you use Cachegrind or a similar tool, you can count the cache references and see that they drop quite considerably with tiling versus just the straight parallel loops. Once again, you can use tools to help you figure this out and understand the cause of what's going on. Well, it turns out our chips don't have just one cache; they've got three levels of caches. There's the L1 cache, which comes in data and instruction flavors (we're thinking about data here, for the matrices), then an L2 cache, which is also private to the processing core, then a shared L3 cache, and then you go out to DRAM (you can also go to your neighboring processor's memory, and such). They grow in size: 32 kilobytes, to 256 kilobytes, to 25 megabytes, to main memory at 60 gigabytes. So what you can do is two-level tiling, with two tuning parameters s and t. Unfortunately you can't binary-search for them, because the problem is multidimensional, so you pretty much have to search exhaustively, and when you do that you end up with nine nested loops.
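Here is a sketch of one level of tiling with the tuning parameter S set to 32, the value the experiment favored; in practice the outermost loop would stay a cilk_for, and n is assumed to be a multiple of S.

```c
#define S 32  // tile size, found by experiment

// Six nested loops: the outer three walk over S-by-S tiles, the inner three
// multiply one tile of A by one tile of B into one tile of C, all of which
// can stay resident in cache.
void matmul_tiled(long n, double (*restrict C)[n],
                  double (*restrict A)[n], double (*restrict B)[n]) {
    for (long ih = 0; ih < n; ih += S)
        for (long kh = 0; kh < n; kh += S)
            for (long jh = 0; jh < n; jh += S)
                for (long i = ih; i < ih + S; i++)
                    for (long k = kh; k < kh + S; k++)
                        for (long j = jh; j < jh + S; j++)
                            C[i][j] += A[i][k] * B[k][j];
}
```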
But of course we don't really want to stop there, because we have three levels of caching. Can anybody figure out, inductively, how many levels of tiling we need for three levels of caching? This is a gimme: twelve. And man, when I say the code gets ugly when you start making things go fast, this is what I mean. But it turns out there's a trick: you can tile for every power of two simultaneously by just solving the problem recursively. The idea is divide and conquer: you divide each of the matrices into four submatrices, and if you look at the calculations you need to do, you have to solve eight subproblems of half the size and then do an addition. So you have eight multiplications of (n/2)-by-(n/2) matrices and one addition of n-by-n matrices, and that gives you your answer; and then, of course, you solve each of those subproblems recursively. Here's the code. I don't expect you to understand all of it, but we've written it to run in parallel, because it turns out you can do four of the multiplications at a time in parallel: the cilk_spawn here says go execute this subroutine, which is basically a subproblem, and while that's happening you're allowed to go on and execute the next statement, which does another spawn, and another spawn, and finally the fourth call; then the sync statement says don't start the next phase until you've finished the first phase. We'll learn all about this. When we do that, we get a running time of about 93 seconds, which is about 50 times slower than the previous version. We're using the cache much better, but nothing is free and nothing is easy; typically in performance engineering you have to be clever. Why did this get worse, even though the caching numbers are great, with very few misses and lots of hits, yet we're still slower? Right: the overhead of calling a function, and the place it matters is at the leaves of the computation. We have a very small base case, so we're paying that overhead all the way down to n equals 1; there's function-call overhead even when you're multiplying one-by-one matrices. So let's pick a threshold, and below that threshold just use a standard good algorithm, while above it we do divide and conquer. That is, if we're below the threshold we call a base case that looks very much like an ordinary matrix multiply. Once you do that, you can again look for the best value of that threshold. With a base case of 64 we get down to about 1.95 seconds; I didn't include a base case of 1 because I tried it and that's what gave us the terrible performance; and 32 is even better, about 1.3 seconds, so we picked 32, which I should have highlighted on the slide. With that, we're now getting 12 percent of peak, and if you count up the cache misses, you can see that with parallel divide and conquer the L1 data-cache misses are the lowest.
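Here is a minimal sketch of the recursive scheme just described, assuming row-major storage, n a power of two, and a coarsened base case; the helper names and the THRESHOLD constant are mine, not the staff's code.

```c
#include <cilk/cilk.h>

#define THRESHOLD 32   // coarsen the base case; the lecture settled on 32

// C += A * B for n-by-n submatrices stored inside a row-major matrix of
// width `stride` (element (i,j) lives at M[i*stride + j]).
static void mm_base(double *restrict C, const double *restrict A,
                    const double *restrict B, long n, long stride) {
    for (long i = 0; i < n; i++)
        for (long k = 0; k < n; k++)
            for (long j = 0; j < n; j++)
                C[i*stride + j] += A[i*stride + k] * B[k*stride + j];
}

static void mm_dac(double *restrict C, const double *restrict A,
                   const double *restrict B, long n, long stride) {
    if (n <= THRESHOLD) {
        mm_base(C, A, B, n, stride);
        return;
    }
    long h = n / 2;                                  // n is a power of two
    #define X(M, r, c) (M + (r)*h*stride + (c)*h)    // quadrant (r,c) of M
    // First the four products that touch disjoint quadrants of C...
    cilk_spawn mm_dac(X(C,0,0), X(A,0,0), X(B,0,0), h, stride);
    cilk_spawn mm_dac(X(C,0,1), X(A,0,0), X(B,0,1), h, stride);
    cilk_spawn mm_dac(X(C,1,0), X(A,1,0), X(B,0,0), h, stride);
               mm_dac(X(C,1,1), X(A,1,0), X(B,0,1), h, stride);
    cilk_sync;
    // ...then the other four, which accumulate into the same quadrants.
    cilk_spawn mm_dac(X(C,0,0), X(A,0,1), X(B,1,0), h, stride);
    cilk_spawn mm_dac(X(C,0,1), X(A,0,1), X(B,1,1), h, stride);
    cilk_spawn mm_dac(X(C,1,0), X(A,1,1), X(B,1,0), h, stride);
               mm_dac(X(C,1,1), X(A,1,1), X(B,1,1), h, stride);
    cilk_sync;
    #undef X
}
```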
But so is the last-level cache miss count, and the total number of references is small as well, so divide and conquer turns out to be a big win here. Now, the other thing we mentioned is that we're not using the vector hardware. All of these machines have vector units we can operate on; they process data in what's called SIMD fashion, single instruction, multiple data, which means you issue one instruction and it does operations on an entire vector. As we mentioned, we have eight floating-point lanes per core, which can also do a fused multiply-add. Each vector register holds multiple words; I believe on the machines we're using this term it's four 64-bit words. But it's important to realize you can't use these willy-nilly: you have to operate on the data as one chunk of vector data. You can't have one lane of the vector unit doing one thing and a different lane doing something else; they all have to be doing essentially the same thing, the only difference being the indexing into memory. Now, we've actually already been taking advantage of vectorization, because the compiler vectorizes some things automatically. You can ask for a vectorization report, and the system will tell you which things are being vectorized and which aren't, and we'll talk about how to vectorize things that the compiler doesn't want to vectorize. In particular, most machines don't support the newest sets of vector instructions, so the compiler uses vector instructions conservatively by default. If you're compiling for a particular machine, you can tell it to target that machine. There are vectorization flags: you can say use the AVX instructions if you have AVX, or AVX2, or the fused multiply-add vector instructions; you can give a string naming the architecture you're running on; or you can say use whatever machine I'm currently compiling on, and it figures out which architecture that is. Now, floating-point numbers, as we'll discuss, have some undesirable properties, for instance that they're not associative: for a times b times c, how you parenthesize it can give you two different numbers. So, given a specification of the code, the compiler will normally not change the order of association, because it wants to give you exactly the same result; but you can hand it a flag called fast math, -ffast-math, which allows that kind of reordering, if it's not important to you that the result match the default ordering. Using the native architecture together with fast math, we get about double the performance out of vectorization compared with what the compiler's vectorizer gives us by default. Question: yes, these are 64-bit. These days 64 bits, what's called double precision, is pretty standard, unless you're doing AI applications, in which case you may want lower-precision arithmetic; a float is 32 bits. Generally, people doing serious linear algebra calculations use 64 bits, but sometimes they can use less, and you can get more performance if you discover you can use fewer bits in your representation. We'll talk about that too.
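The non-associativity is easy to see with a tiny example; the constants below are just ones I picked to make the rounding visible. This is why the compiler keeps the written association unless you hand it -ffast-math, and why flags like -mavx2, -mfma, or -march=native are what unlock the wider instructions in the first place.

```c
#include <stdio.h>

int main(void) {
    double a = 1e20, b = -1e20, c = 3.14;
    // Under IEEE 754 rounding these differ: b + c rounds back to -1e20,
    // so the second expression loses c entirely.
    printf("(a + b) + c = %g\n", (a + b) + c);   // prints 3.14
    printf("a + (b + c) = %g\n", a + (b + c));   // prints 0
    return 0;
}
```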
The last thing we're going to do: you can actually use the vector instructions yourself, rather than relying on the compiler, and there's a whole manual of intrinsic instructions you can call from C that let you issue the specific vector instructions you want, so the compiler doesn't have to figure that out. You can also use some other insights: you can do preprocessing, you can transpose the matrices, which turns out to help, you can do data alignment, there are a lot of other things, and you can use a clever algorithm for the base case. So you do more performance engineering: you think about what you're doing, you code, and then you run, run, run to test. That's one nice reason to have the cloud, because you can run tests in parallel, so it takes less of your sitting-around time: if you want to do ten tests, spin up ten machines and run them all at once. When you do all that, with the main win coming from the AVX intrinsics, we get up to 0.41 of peak, 41 percent, and about a 50,000-times speedup overall. And it turns out that's where we quit, because at that point we beat Intel's professionally engineered Math Kernel Library. A good question is why we aren't getting all of peak, and I invite you to figure that out. It turns out, though, that Intel MKL is better than what we did overall, because we assumed the size is a power of two and Intel doesn't assume that; they're more robust, and although we win on 4096-by-4096 matrices, they win on other sizes. So it's not everything. But the end of the story is: what have we done? We've just gotten a factor of 50,000. If you took the fuel economy of a jumbo jet and improved it by the kind of factor we just got, you would be able to run a jumbo jet on the fuel budget of a little Vespa scooter, or whatever type of scooter that is. That's how much we've been able to do. Let me just caution you: you generally won't see the magnitude of performance improvement that we obtained for matrix multiplication. It's a really good example precisely because it's so dramatic, but we will see some substantial numbers, and in particular, in 6.172 you'll learn how to print this currency of performance all by yourself, so that you don't have to take somebody else's library; you can say, no, I'm an engineer. Let me mention one other thing: this course is going to focus on multicore computing. We are not, in particular, going to be doing GPUs or file systems or network performance. In the real world those are hugely important. What we've found, however, is that it's better to learn one particular domain well, and people who master multicore performance engineering in fact go on to do those other things and are really good at them, because you've learned the core, the basis, the foundation.
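Going back to the intrinsics idea mentioned above, here is a minimal sketch of what a hand-vectorized inner step can look like with the AVX/FMA intrinsics from immintrin.h; this is my illustration, not the code the staff actually tuned, and it assumes n is a multiple of 4 and that you compile with FMA support enabled (e.g. -mfma or -march=native).

```c
#include <immintrin.h>

// One (i,k) step of the i,k,j-ordered multiply, vectorized by hand:
// C[i][j..j+3] += A[i][k] * B[k][j..j+3] for j = 0, 4, ..., n-4.
// Unaligned loads are used so no particular alignment is required.
void row_update(double *restrict Ci, const double *restrict Bk,
                double aik, long n) {
    __m256d a = _mm256_set1_pd(aik);              // broadcast A[i][k]
    for (long j = 0; j < n; j += 4) {
        __m256d b = _mm256_loadu_pd(&Bk[j]);      // B[k][j..j+3]
        __m256d c = _mm256_loadu_pd(&Ci[j]);      // C[i][j..j+3]
        c = _mm256_fmadd_pd(a, b, c);             // fused multiply-add
        _mm256_storeu_pd(&Ci[j], c);
    }
}
```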
00:51:40,440 1059 00:51:40,440 --> 00:51:42,329 1060 00:51:42,329 --> 00:51:45,329 1061 00:51:45,329 --> 00:51:48,720 1062 00:51:48,720 --> 00:51:50,880 1063 00:51:50,880 --> 00:51:53,370 1064 00:51:53,370 --> 00:51:55,410 1065 00:51:55,410 --> 00:51:57,240 1066 00:51:57,240 --> 00:52:02,010 1067 00:52:02,010 --> 00:52:06,230 1068 00:52:06,230 --> 00:52:09,809 1069 00:52:09,809 --> 00:52:12,809 1070 00:52:12,809 --> 00:52:18,210 1071 00:52:18,210 --> 00:52:19,950 1072 00:52:19,950 --> 00:52:23,210 1073 00:52:23,210 --> 00:52:26,670 1074 00:52:26,670 --> 00:52:31,920 1075 00:52:31,920 --> 00:52:33,180 1076 00:52:33,180 --> 00:52:36,089 1077 00:52:36,089 --> 00:52:38,760 1078 00:52:38,760 --> 00:52:43,140 1079 00:52:43,140 --> 00:52:46,079 1080 00:52:46,079 --> 00:52:50,039 1081 00:52:50,039 --> 00:52:51,510 1082 00:52:51,510 --> 00:52:52,920 1083 00:52:52,920 --> 00:52:54,510 1084 00:52:54,510 --> 00:52:56,329 1085 00:52:56,329 --> 00:53:03,930 1086 00:53:03,930 --> 00:53:05,640 1087 00:53:05,640 --> 00:53:07,620 1088 00:53:07,620 --> 00:53:11,490 1089 00:53:11,490 --> 00:53:13,920 1090 00:53:13,920 --> 00:53:17,069 1091 00:53:17,069 --> 00:53:18,630 1092 00:53:18,630 --> 00:53:20,490 1093 00:53:20,490 --> 00:53:22,260 1094 00:53:22,260 --> 00:53:24,539 1095 00:53:24,539 --> 00:53:26,609 1096 00:53:26,609 --> 00:53:28,109 1097 00:53:28,109 --> 00:53:29,400 1098 00:53:29,400 --> 00:53:31,320 1099 00:53:31,320 --> 00:53:34,080 1100 00:53:34,080 --> 00:53:35,610 1101 00:53:35,610 --> 00:53:37,980 1102 00:53:37,980 --> 00:53:39,930 1103 00:53:39,930 --> 00:53:42,300 1104 00:53:42,300 --> 00:53:45,270 1105 00:53:45,270 --> 00:53:48,060 1106 00:53:48,060 --> 00:53:50,010 1107 00:53:50,010 --> 00:53:52,230 1108 00:53:52,230 --> 00:53:53,910 1109 00:53:53,910 --> 00:53:56,460 1110 00:53:56,460 --> 00:53:57,990 1111 00:53:57,990 --> 00:54:01,320 1112 00:54:01,320 --> 00:54:05,030 1113 00:54:05,030 --> 00:54:07,740 1114 00:54:07,740 --> 00:54:10,260 1115 00:54:10,260 --> 00:54:12,320 1116 00:54:12,320 --> 00:54:17,910 1117 00:54:17,910 --> 00:54:19,320 1118 00:54:19,320 --> 00:54:23,190 1119 00:54:23,190 --> 00:54:25,470 1120 00:54:25,470 --> 00:54:28,970 1121 00:54:28,970 --> 00:54:32,520 1122 00:54:32,520 --> 00:54:34,950 1123 00:54:34,950 --> 00:54:38,040 1124 00:54:38,040 --> 00:54:40,320 1125 00:54:40,320 --> 00:54:43,020 1126 00:54:43,020 --> 00:54:44,820 1127 00:54:44,820 --> 00:54:47,400 1128 00:54:47,400 --> 00:54:49,730 1129 00:54:49,730 --> 00:54:52,620 1130 00:54:52,620 --> 00:54:54,000 1131 00:54:54,000 --> 00:54:56,850 1132 00:54:56,850 --> 00:55:01,440 1133 00:55:01,440 --> 00:55:01,450 1134 00:55:01,450 --> 00:55:10,380 1135 00:55:10,380 --> 00:55:14,110 1136 00:55:14,110 --> 00:55:16,630 1137 00:55:16,630 --> 00:55:19,180 1138 00:55:19,180 --> 00:55:20,410 1139 00:55:20,410 --> 00:55:22,300 1140 00:55:22,300 --> 00:55:29,920 1141 00:55:29,920 --> 00:55:32,470 1142 00:55:32,470 --> 00:55:35,110 1143 00:55:35,110 --> 00:55:40,660 1144 00:55:40,660 --> 00:55:44,200 1145 00:55:44,200 --> 00:55:46,120 1146 00:55:46,120 --> 00:55:47,680 1147 00:55:47,680 --> 00:55:49,870 1148 00:55:49,870 --> 00:55:52,570 1149 00:55:52,570 --> 00:55:56,500 1150 00:55:56,500 --> 00:55:59,290 1151 00:55:59,290 --> 00:56:01,690 1152 00:56:01,690 --> 00:56:03,820 1153 00:56:03,820 --> 00:56:05,880 1154 00:56:05,880 --> 00:56:10,300 1155 00:56:10,300 --> 00:56:13,950 1156 00:56:13,950 --> 00:56:17,410 1157 00:56:17,410 --> 00:56:19,450 1158 00:56:19,450 --> 00:56:20,740 1159 00:56:20,740 --> 00:56:22,030 1160 
00:56:22,030 --> 00:56:26,260 1161 00:56:26,260 --> 00:56:31,690 1162 00:56:31,690 --> 00:56:33,760 1163 00:56:33,760 --> 00:56:35,740 1164 00:56:35,740 --> 00:56:37,870 1165 00:56:37,870 --> 00:56:40,000 1166 00:56:40,000 --> 00:56:43,470 1167 00:56:43,470 --> 00:56:48,010 1168 00:56:48,010 --> 00:56:49,420 1169 00:56:49,420 --> 00:56:51,400 1170 00:56:51,400 --> 00:56:53,560 1171 00:56:53,560 --> 00:56:55,720 1172 00:56:55,720 --> 00:56:57,670 1173 00:56:57,670 --> 00:57:02,080 1174 00:57:02,080 --> 00:57:03,700 1175 00:57:03,700 --> 00:57:04,960 1176 00:57:04,960 --> 00:57:07,570 1177 00:57:07,570 --> 00:57:09,700 1178 00:57:09,700 --> 00:57:11,800 1179 00:57:11,800 --> 00:57:14,290 1180 00:57:14,290 --> 00:57:16,790 1181 00:57:16,790 --> 00:57:21,590 1182 00:57:21,590 --> 00:57:25,270 1183 00:57:25,270 --> 00:57:30,320 1184 00:57:30,320 --> 00:57:34,370 1185 00:57:34,370 --> 00:57:37,370 1186 00:57:37,370 --> 00:57:39,500 1187 00:57:39,500 --> 00:57:44,630 1188 00:57:44,630 --> 00:57:46,160 1189 00:57:46,160 --> 00:57:51,130 1190 00:57:51,130 --> 00:57:58,130 1191 00:57:58,130 --> 00:58:00,260 1192 00:58:00,260 --> 00:58:01,730 1193 00:58:01,730 --> 00:58:03,800 1194 00:58:03,800 --> 00:58:06,770 1195 00:58:06,770 --> 00:58:09,940 1196 00:58:09,940 --> 00:58:17,720 1197 00:58:17,720 --> 00:58:19,940 1198 00:58:19,940 --> 00:58:25,190 1199 00:58:25,190 --> 00:58:27,650 1200 00:58:27,650 --> 00:58:30,220 1201 00:58:30,220 --> 00:58:34,790 1202 00:58:34,790 --> 00:58:40,040 1203 00:58:40,040 --> 00:58:42,170 1204 00:58:42,170 --> 00:58:44,920 1205 00:58:44,920 --> 00:58:48,500 1206 00:58:48,500 --> 00:58:52,550 1207 00:58:52,550 --> 00:58:54,980 1208 00:58:54,980 --> 00:58:57,710 1209 00:58:57,710 --> 00:58:59,840 1210 00:58:59,840 --> 00:59:01,250 1211 00:59:01,250 --> 00:59:02,810 1212 00:59:02,810 --> 00:59:05,330 1213 00:59:05,330 --> 00:59:08,180 1214 00:59:08,180 --> 00:59:11,720 1215 00:59:11,720 --> 00:59:14,690 1216 00:59:14,690 --> 00:59:16,160 1217 00:59:16,160 --> 00:59:18,320 1218 00:59:18,320 --> 00:59:21,590 1219 00:59:21,590 --> 00:59:23,300 1220 00:59:23,300 --> 00:59:27,170 1221 00:59:27,170 --> 00:59:29,060 1222 00:59:29,060 --> 00:59:30,160 1223 00:59:30,160 --> 00:59:31,420 1224 00:59:31,420 --> 00:59:34,060 1225 00:59:34,060 --> 00:59:36,790 1226 00:59:36,790 --> 00:59:40,000 1227 00:59:40,000 --> 00:59:41,920 1228 00:59:41,920 --> 00:59:45,100 1229 00:59:45,100 --> 00:59:47,890 1230 00:59:47,890 --> 00:59:49,600 1231 00:59:49,600 --> 00:59:53,470 1232 00:59:53,470 --> 00:59:58,120 1233 00:59:58,120 --> 01:00:01,750 1234 01:00:01,750 --> 01:00:03,190 1235 01:00:03,190 --> 01:00:04,960 1236 01:00:04,960 --> 01:00:14,830 1237 01:00:14,830 --> 01:00:14,840 1238 01:00:14,840 --> 01:00:16,900