The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

So welcome to 6.172. My name is Charles Leiserson, and I am one of the two lecturers this term; the other is Professor Julian Shun. We're both in EECS and in CSAIL, on the seventh floor of the Gates Building. If you don't know it, you are in Performance Engineering of Software Systems, so if you've found yourself in the wrong place, now is the time to exit.

I want to start today by talking a little bit about why we do performance engineering, then I'll do a little bit of administration, and then dive into a case study that will give you a good sense of some of the things we're going to do during the term. I put the administration in the middle because if, from my telling you about the course, you don't want to take the course, then why should you listen to the administration? So let's just dive right in.

The first thing to understand, whenever you're doing something, is a perspective on what matters in what you're doing. We're going to spend the whole term on software performance engineering, and this is kind of interesting, because it turns out that performance is usually not at the top of what people care about when they're building software. What are some of the things that are more important than performance? Deadlines, good. Cost. Correctness. Extensibility. We could go on and on; I think you folks could probably make a pretty long list. I made a short list of the kinds of things that are more important than performance.

So if programmers are so willing to sacrifice performance for these properties, why do we study performance? It's a bit of a paradox and a bit of a puzzle: why study something that clearly isn't at the top of the list of what most people care about when they're developing software? I think the answer is that performance is the currency of computing. You use performance to buy these other properties. You'll say something like, gee, I want to make it easy to program, and therefore I'm willing to sacrifice some performance to make something easy to program. Or, I'm willing to sacrifice some performance to make sure that my system is secure. All those things come out of your performance budget, and clearly, if performance degrades too far, your stuff becomes unusable.

When I talk with programmers, people are fond of saying, ah, performance doesn't matter, I never think about it. Then I talk with people who use computers, and I ask, what's your main complaint about the computing systems you use? Answer: too slow. So it's interesting whether you're the producer or the consumer. But the real answer is that performance is like currency; it's something you spend. Would I rather have $100 or a gallon of water? Well, water is indispensable to life, and there are certainly circumstances where I would prefer to have the water rather than the $100. But in our modern society, I can buy water for much less than $100.
So even though water is essential to life and far more important than money, money is a currency, and I prefer to have the money, because I can just buy the things I need. That's the analogy for performance: it has no intrinsic value, but it contributes to things; you can use it to buy things that you care about, like usability or testability or what have you.

Now, in the early days of computing, software performance engineering was common, because machine resources were limited. If you look at machines from 1964 to 1977, look at how many bytes they have on them. In 1964 there is a computer with 524 kilobytes. That was a big machine back then; that's kilobytes, not megabytes, not gigabytes. Many programs would strain the machine's resources. The clock rate for that machine was 33 kilohertz. What's a typical clock rate today? About four gigahertz, three gigahertz, two gigahertz, somewhere in that range. And here they were operating in kilohertz. So many programs would not fit without intense performance engineering.

There are also a lot of sayings that came out of that era. Donald Knuth, one of the Turing Award winners and an absolutely fabulous computer scientist in all respects, wrote, "Premature optimization is the root of all evil." I invite you, by the way, to look that quote up, because it's actually taken out of context; what he was worried about was trying to optimize stuff too early. Bill Wulf, who designed the Bliss language and worked on the PDP-11 and such, said, "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason, including blind stupidity." And Michael Jackson said, "The first rule of program optimization: don't do it. The second rule of program optimization, for experts only: don't do it yet." Everybody was warning you away, because when you start trying to make things fast, your code becomes unreadable. Making code that is readable and fast, that's where the art is, and hopefully we'll learn a little bit about doing that.

And indeed, for many years there was no real point in working too hard on performance engineering. If you look at technology scaling, at how many transistors are on various processor designs, up until about 2004 we had Moore's Law in full throttle, with chip densities doubling every two years, really quite amazing. And along with that, as they shrank the dimensions of chips, the clock speed would go up correspondingly by miniaturization. So if you found something was too slow, wait a couple of years: it'll be faster. If you were going to make your software ugly for performance, there really wasn't a good payoff compared to just waiting around. In that era there was something called Dennard scaling, which, as things shrank, allowed the clock speeds to get larger: by reducing power, you could keep everything fast, and we'll talk about that in a minute.

So if you look at what happened from 1977 to 2004, here are Apple computers with similar price tags, and you can see the clock rate really just skyrocketed: one megahertz, 400 megahertz, 1.8 gigahertz. And the data paths went from 8 bits to 32 to 64.
The memory grew correspondingly, and the cost stayed approximately the same. That's the legacy of Moore's Law and the tremendous advances in semiconductor technology. Until 2004, Moore's Law and the scaling of clock frequency, so-called Dennard scaling, was essentially a printing press for the currency of performance. You didn't have to do anything; the hardware just got faster.

All of that, or at least some of it, came to an end in 2004, when clock speeds plateaued. If you look at the data, around 2005 clock speeds hit 2 to 4 gigahertz, and we have not been able to make chips go faster than that in any practical way since then, even though the densities have kept growing.

Now, the reason the clock speed flattened was power density. This is a slide from Intel from that era looking at the growth of power density, and what they were projecting was that the junction temperatures of the transistors on the chip, if they just kept scaling the way they had been, would start to approach first the temperature of a nuclear reactor, then a rocket nozzle, and then the Sun's surface. We're not going to build technology that can cool that, and even if you could solve it for a little while, the writing was on the wall: we cannot scale clock frequencies any more. The reason is that clock frequency was originally scaled assuming that most of the power was dynamic power, the power spent switching the circuits. What happened as we kept shrinking things is that something that used to be in the noise, namely the leakage, started to become significant, to the point where today the dynamic power is far less of a concern than the static power from the circuits just sitting there leaking. And when you miniaturize, you can't stop that effect from happening.

So what did the vendors do in 2004 and 2005 and since? They said, oh gosh, we've got all these transistors to use, but we can't use the transistors to make stuff run faster. So they introduced parallelism in the form of multicore processors: they put more than one processing core on a chip, and to scale performance they would have multiple cores, with each generation of Moore's Law potentially doubling the number of cores. If you look at what happened with processor cores, around 2004 or 2005 we started to get multiple processing cores per chip, to the extent that today it's basically impossible to find a single-core chip for a laptop or a workstation; everything is multicore. You can't buy just one; you have to buy a parallel processor.

The impact of that was that performance was no longer free. You couldn't just speed up the hardware. Now, if you wanted to use that potential, you had to do parallel programming, and that's not something that anybody in the industry had really done. There are also a lot of other things that happened in that intervening time: we got vector units as common parts of our machines, we got GPUs, we got steeper cache hierarchies, we have configurable logic on some machines, and so forth, and now it's up to the software to adapt to it. So although we don't want to have to deal with performance, today you have to deal with performance, and in your lifetimes you will have to deal with performance in software, if you're going to have effective software.
You can see what happened, also, in a study that we did looking at software bugs in a variety of open-source projects that mention the word "performance." You can see that in 2004 the numbers start going up. It's not as convincing for some projects as for others, but generally there's a trend: after 2004, people started worrying more about performance. If you look at software developer jobs as of the early-to-mid 2000s, you see once again that the mention of performance in jobs is going up. And anecdotally, I had one student who came to me in the spring after he'd taken 6.172, and he said, you know, I applied for five jobs, and at every job interview they asked me a question I couldn't have answered if I hadn't taken 6.172, and I got five offers. When he compared those offers, they tended to be 20 to 30 percent larger than for people who were just web monkeys. That's not to say that you should necessarily take this class, but I want to point out that what we're going to learn is going to be interesting from a practical point of view, i.e., your futures, as well as from theoretical and technical points of view.

So modern processors are really complicated, and the big question is: how do we write software to use that modern hardware efficiently? I want to give you an example of performance engineering of a very well studied problem, namely matrix multiplication. Who has never seen this problem? OK, so we've got some jokers in the class. This takes order n-cubed operations, because you're basically computing n-squared dot products. If you add up the total number of operations, it's about 2n cubed, because there is essentially a multiply and an add for every pair of terms that needs to be accumulated. We're going to look at it assuming, for simplicity, that n is an exact power of two.

Now, the machine we're going to look at is one of the ones you'll have access to on AWS. It's a compute-optimized machine with a Haswell microarchitecture running at 2.9 gigahertz. There are two processor chips per machine and nine processing cores per chip, so a total of 18 cores; that's the amount of parallel processing. It does two-way hyperthreading, which we're actually not going to deal with much; hyperthreading gives you a little more performance, but it also makes it really hard to measure things, so generally we will turn off hyperthreading, although the performance you get tends to be correlated with what you get when you hyperthread. For floating point, it is capable of doing eight double-precision operations, that is, 64-bit floating-point operations, including a fused multiply-add, per core per cycle. That's a vector unit: each of the 18 cores can do eight double-precision operations per cycle, including a fused multiply-add, which is counted as two operations. It has a cache-line size of 64 bytes. The instruction cache is 32 kilobytes and 8-way set associative; we'll talk about some of these things, and if you don't know all the terms, that's OK, we're going to cover most of them later on. It's got a data cache of the same size, an L2 cache of 256 kilobytes, and an L3 cache, or what's sometimes called an LLC, last-level cache, of 25 megabytes, and then it's got 60 gigabytes of DRAM.
So this is a honking big machine; you can get things to sing on it. If you look at the peak performance, it's the clock speed times 2 processor chips times 9 processing cores per chip, each capable of 16 floating-point operations per cycle if you can use both the multiply and the add, and that comes out to just short of a teraflop: 836 gigaflops. That's a lot of power. These are fun machines, actually, especially when we get into things like the game-playing AI that we do for the fourth project; you'll see they're really fun when you have a lot of compute.

Now, here's the basic code. This is the full Python code for doing matrix multiplication. Generally in Python you wouldn't use this code, because you would just call a library subroutine that does matrix multiplication. But sometimes you have a problem, and I'm going to illustrate with matrix multiplication, for which you have to write the code yourself, and I want to give you an idea of what kind of performance you get out of Python. In addition, if there is a library routine, somebody had to write it, and that person was a performance engineer, because they wrote it to be as fast as possible, so this will also give you an idea of what you can do to make code run fast.

When you run this code: you can see that right before the triply nested loop we take a time measurement, then we take another time measurement at the end, and we print the difference. In between is just the classic triply nested loop for matrix multiplication. So when you run this, how long does it run for, do you think? Any guesses? Who thinks six microseconds? How about six milliseconds? Six seconds? Six minutes? Six hours? Six days? Of course, it's important to know what size the matrices are; this is 4096 by 4096, as it shows in the code. And those of you who didn't vote, can you wake up? Let's get active; this is active learning, put yourself out there. It doesn't matter whether you're right or wrong; there will be a bunch of people who got the right answer but have no idea why.

So it turns out it takes about 21,000 seconds, which is about six hours. Is this fast? How do we tell whether this is fast or not? What should we expect from our machine? Let's do a back-of-the-envelope calculation of how many operations there are and how fast we ought to be able to do them; we just went through all the parameters of the machine. There are 2n cubed operations that need to be performed; we're not doing Strassen's algorithm or anything like that, just the straight triply nested loop, so that's 2 to the 37th floating-point operations. The running time is 21,000 seconds, so that says we're getting about 6.25 megaflops out of our machine when we run that code, just by dividing the number of operations by the time. The peak, as you recall, was about 836 gigaflops, and we're getting 6.25 megaflops, so we're getting about 0.00075 percent of peak. This is not fast.
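Spelling out the back-of-the-envelope arithmetic above, using the approximate figures quoted in the lecture (the running time is rounded, so the division is approximate):

    peak  ≈ 2.9 GHz × 2 chips × 9 cores/chip × 16 flops/core/cycle ≈ 836 Gflops
    work  = 2n^3 = 2 × 4096^3 = 2^37 ≈ 1.4 × 10^11 floating-point operations
    rate  ≈ 1.4 × 10^11 flops / 21,000 s, on the order of the 6.25 Mflops quoted above
    fraction of peak ≈ 6.25 × 10^6 / 8.36 × 10^11 ≈ 0.00075%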
So let's do something really similar, but let's code it in Java rather than Python. We take just that loop; the code is almost the same, a triply nested loop, and we run it in Java. The running time now turns out to be just under 3,000 seconds, which is about 46 minutes. Same code, Python versus Java: we got almost a nine-times speedup just from coding it in a different language.

Well, let's try C. That's the language we're going to be using here. What happens when you code exactly the same thing in C? We're going to use the Clang/LLVM 5.0 compiler; I believe we're using 6.0 this term, and I should have rerun these numbers for 6.0, but I didn't. Now it's basically 1,100 seconds, which is about 19 minutes, so it's about twice as fast as Java and about 18 times faster than Python.

So here's where we stand so far. We have the running times of these various versions; the relative speedup is how much faster each is than the previous row, and the absolute speedup is how it compares to the first row. And now we're managing to get 0.014 percent of peak. So we're still slow, but before we go and try to optimize it further: why is Python so slow and C so fast? Anybody know? That's kind of on the right track. Anybody else want to articulate why Python is so slow? Yes, you're right: multiply and add aren't the only instructions Python is executing; it's running lots of other code.

Good. The big reason why Python is slow and C is so fast is that Python is interpreted and C is compiled directly to machine code, and Java is somewhere in the middle, because Java is compiled to bytecode, which is then interpreted and just-in-time compiled into machine code. Let me talk a little about these things. Interpreters, such as in Python, are versatile but slow. It's one of these cases where they said, we're going to take some of our performance and use it to make a more flexible, easier-to-program environment. The interpreter reads, interprets, and performs each program statement and then updates the machine state; it's actually going through, each time, reading your code, figuring out what it does, and then implementing it, so there's all this overhead compared to just doing the operations. Interpreters can easily support high-level programming features, and they can do things like dynamic code alteration, at the cost of performance. Typically the cycle for an interpreter is: read the next statement, interpret the statement, perform the statement, update the state of the machine, and fetch the next statement. You're going through that each time, in software. When you have things compiled to machine code, the hardware goes through a similar cycle, but it's highly optimized for the things machines do, so when you compile, you're able to take advantage of the hardware's interpreter of machine instructions, which has much, much lower overhead than the big software overhead you get with Python.

Now, JIT, which is what's used in Java, is somewhere in the middle, and JIT compilers can recover some of the performance; in fact, it did a pretty good job in this case. The idea is that when the code is first executed, it's interpreted, and the runtime system keeps track of how often the various pieces of code are executed. Whenever it finds some piece of code that it's executing frequently, it calls the compiler to compile that piece of code, and subsequently it runs the compiled code. So it tries to get the big performance advantage by only compiling the things for which it's actually going to pay off to invoke the compiler.
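For concreteness, here is a minimal sketch of the kind of straight triply nested loop being timed in the C version above. The global arrays, the fixed 4096 size, and the gettimeofday-based timing are my assumptions for illustration, not the exact course handout.

    #include <stdio.h>
    #include <sys/time.h>

    #define N 4096

    double A[N][N], B[N][N], C[N][N];   /* assume A and B get initialized, C zeroed */

    int main(void) {
      struct timeval start, end;
      gettimeofday(&start, NULL);
      for (int i = 0; i < N; ++i)        /* the classic i, j, k order */
        for (int j = 0; j < N; ++j)
          for (int k = 0; k < N; ++k)
            C[i][j] += A[i][k] * B[k][j];
      gettimeofday(&end, NULL);
      printf("%.6f seconds\n", (end.tv_sec - start.tv_sec)
                               + 1e-6 * (end.tv_usec - start.tv_usec));
      return 0;
    }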
So anyway, that's the big difference among those kinds of systems. One of the reasons we don't use Python in this class is that its performance model is hard to figure out. C is much closer to the metal, much closer to the silicon, and it's much easier to figure out what's going on in that context. But we will have a guest lecture on performance in managed languages like Python, so it's not that we're going to ignore the topic; we'll just learn how to do performance engineering in a place where it's easier to do.

Now, one of the things a good compiler will do: let's take the C version, which is where we're going to move forward from, because that's the fastest we've got so far. It turns out that you can change the order of loops in this program without affecting its correctness. Here we went "for i, for j, for k, do the update." We could just as well do "for i, for k, for j, do the update," and it computes exactly the same thing, or "for k, for j, for i, do the update." We can change the order without affecting correctness. So, do you think the order of loops matters for performance? I know this is a leading question. Yes, and you're exactly right: cache locality is what it is. When we try it, the loop order affects the running time by a factor of 18. Whoa. Just by switching the order.

What's going on there? We're going to talk about this in more depth, so I'm just going to fly through it here; this is just to show you the kinds of considerations involved. In hardware, each processor reads and writes main memory in contiguous blocks called cache lines. Previously accessed cache lines are stored in a small memory called a cache that sits near the processor. When the processor accesses something, if it's in the cache, you get a hit, and that's very cheap and fast. If you miss, you have to go out either to a deeper level of cache or all the way out to main memory, and that is much, much slower; we'll talk about that kind of thing later.

What happens for this matrix problem is that the matrices are laid out in memory in row-major order. That means you take your two-dimensional matrix and lay it out in the linear order of memory addresses by essentially taking row one, sticking row two after it, then row three after that, and so on, unfolding it. There's another order things could have been laid out in, and in fact they are in Fortran, which is called column-major order. So C and Fortran operate in different orders, and it turns out the layout affects which loop order performs well.
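To make row-major layout concrete: if an n-by-n matrix of doubles is stored as one contiguous array, element (i, j) lives at offset i*n + j, so stepping j walks adjacent addresses within a row, while stepping i jumps n doubles (here 4096 × 8 bytes) at a time. A tiny illustration, with a macro name I've made up for exposition:

    /* Row-major addressing for an n-by-n matrix stored in a flat array a[].   */
    /* Consecutive j values are adjacent in memory (usually on one cache line); */
    /* consecutive i values are n doubles apart (a different cache line).       */
    #define A_ELEM(i, j) a[(size_t)(i) * n + (j)]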
So let's just take a look at the access pattern for loop order i, j, k. Once we fix i and j, we cycle through k, and as we cycle through k, C[i][j] stays the same on every iteration; we get excellent locality for C, because we're accessing the same location every single time, it's always going to be in cache, and it's going to be fast to access. For A, we go through the row in linear order and get good spatial locality. But for B, we're going down a column, and those elements are distributed far apart in memory. The processor brings in 64 bytes to operate on a particular datum and then ignores seven of the eight floating-point words on that cache line before moving to the next one, so it's wasting an awful lot. A has good spatial locality, because its accesses are adjacent and use the cache lines effectively; for B you're going 4096 elements apart, so it's got poor spatial locality.

If we look at the other orders: for order i, k, j it turns out you get good spatial locality for both C and B and excellent locality for A. For yet another order you don't do nearly as well, and one order does the worst on both. You can just measure the different versions, and it turns out you can use a tool to figure this out. The tool we'll be using is called Cachegrind; it's one of the Valgrind suite of tools, and what it does is tell you the miss rates for the various pieces of code. You'll learn how to use that tool and figure out, oh, look at that, we have a high miss rate for some accesses and not for others, and that may be why my code is running slowly. So when you pick the best of those orders, we get a relative speedup of about six and a half.

What other simple changes can we try? There's actually a collection of things we can do that don't even involve touching the code. What else could we do, for people who have played with compilers and such? Hint... yes, change the compiler flags. Clang, the compiler we'll be using, provides a collection of optimization switches, and you can specify a switch to ask it to optimize: you do -O and then a number. If you look at the documentation, 0 says do not optimize, 1 says optimize, 2 says optimize even more, 3 says optimize yet more. In this case, it turns out that even though -O3 optimizes more, -O2 was a better setting. That doesn't happen all the time; usually -O3 does better than -O2, but in this case -O2 actually optimized better, because the optimizations are to some extent heuristic. There are also other kinds of automation: you can do profile-guided optimization, where you look at what the performance was and feed that back into the compiler, and then the compiler can be smarter about how it optimizes, and there are a variety of other things. With this simple technology, choosing a good optimization flag, in this case -O2, we got basically a factor of 3.25 for free, without having to do much work at all. And now we're actually starting to approach one percent of peak: we're at 0.3 percent of peak performance.
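As a rough sketch of the loop interchange described above, the winning order simply swaps the two inner loops so that both C and B are walked along rows while A[i][k] is reused in the inner loop; compiled with an optimization flag along the lines of clang -O2, this is the version the later measurements build on (my reconstruction, not the course handout):

    /* Order i, k, j: C[i][j] and B[k][j] are both traversed along rows,
       and A[i][k] stays fixed in the innermost loop. */
    for (int i = 0; i < N; ++i)
      for (int k = 0; k < N; ++k)
        for (int j = 0; j < N; ++j)
          C[i][j] += A[i][k] * B[k][j];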
So what's causing the low performance? Why aren't we getting most of the performance out of this machine? What do you think? Right: we're not using all the cores. So far we're using just one core, and how many cores do we have? 18. 17 cores are sitting idle while we're trying to optimize on one.

So, multicore. We have nine cores per chip, and there are two of these chips in our test machine, and we're running on just one core; let's use them all. To do that, we're going to use the Cilk infrastructure, and in particular we can use what's called a parallel loop, which in Cilk is called cilk_for. You just take that outer loop, for example, and say cilk_for: that says do all those iterations in parallel, and the compiler and runtime system are free to schedule them. We can also do it for the inner loop; it turns out you can't do it for the middle loop, if you think about it, and I'll leave that to you as a little homework problem: why can't I just put a cilk_for on the k loop?

So the question is, which parallel version works best? We can parallelize the i loop, we can parallelize the j loop, and we can do i and j together; you can't do k with just a parallel loop and expect to get the right answer. And look at the spread of running times: if I parallelize just the i loop, it's 3.18 seconds; if I parallelize the j loop, it actually slows down; and if I do both i and j, it's still bad. You just want to do the outer loop. The other versions lose to scheduling overhead, and we'll learn about scheduling overhead and how to predict it. The rule of thumb here is: parallelize outer loops rather than inner loops. When we do that, we get almost an 18x speedup on 18 cores. Let me assure you, not all code is that easy to parallelize, but this one happens to be. So now we're up to just over 5 percent of peak.
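A minimal sketch of the parallel outer loop, assuming a Cilk-enabled compiler like the one used in the course; the header and keyword come from Cilk, and the rest mirrors the earlier loop nest:

    #include <cilk/cilk.h>

    /* Parallelize only the outer loop; the Cilk runtime schedules the
       iterations across the cores. The inner loops stay serial. */
    cilk_for (int i = 0; i < N; ++i)
      for (int k = 0; k < N; ++k)
        for (int j = 0; j < N; ++j)
          C[i][j] += A[i][k] * B[k][j];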
So where are we losing time here? Why are we getting just 5 percent? Yes, that's one, and there's one other thing we're not using very effectively; those are the two optimizations we're going to do to get really good code here. What's the other one? That's actually related to the same question, but there's another, completely different source of opportunity here. Right: we can actually manage the cache misses better.

So let's go back to hardware caches and restructure the computation to reuse data in the cache as much as possible, because cache misses are slow and hits are fast; we want to make the most of the cache by reusing the data that's already there. Suppose we're going to compute just one row of C. Since it's a 4096-long vector, that's basically 4096 writes, and we get some spatial locality there, which is good, but the processor is doing 4096 writes to compute that row. To compute it, I need 4096 reads from A, and I need all of B, because the first element of C needs the whole first column of B, the second element of C needs the whole second column of B, and so on. Once again, don't worry if you don't fully understand this; right now I'm ripping through it at high speed, we'll go into it in much more depth in the class, and there will be plenty of time to master this stuff. The main thing to understand is that you're going through all of B. Then, to compute another row of C, I do the same thing: one row of A and all of B again. So for each row, we do about 17 million memory accesses. That's a lot of memory accesses.

So what if, instead of doing that, I do things in blocks? What if I want to compute a 64-by-64 block of C rather than a row of C? Remember that number, roughly 17 million, because we're going to compare with it. To compute a 64-by-64 block also takes 4096 writes to C, the same number, but now I have to do about 262,000 reads from A, because I need all 64 of those rows, and for B I need to access 64 columns, which is another roughly 262,000 reads, and that ends up being about half a million memory accesses total. So I end up doing way fewer accesses, provided those blocks fit in my cache. I do much less work to compute the same size footprint if I compute a block rather than a row; it's much more efficient.

That's a scheme called tiling. If you do tiled matrix multiplication, you bust your matrices into, let's say, 64-by-64 submatrices, and then you do two levels of matrix multiply: an outer level that multiplies the blocks, using the same algorithm, and then, when you get to the inner level, a 64-by-64 matrix multiply done with another three nested loops. You end up with six nested loops. And there's a tuning parameter, of course: how big do I make the tile size? If the tile is s by s, should s be 64, should it be 128, what number should I use? How do we find the right value of s, this tuning parameter? Ideas? You could model it, and you might get a number, but who knows what else is going on in the cache while you're doing this. Right: test a bunch of them, experiment, try them and see which one gives you good numbers. When you do that, it turns out 32 gives the best performance for this particular problem. So you can block it and get faster; that gave us a speedup of about 1.7, so we're now up to almost ten percent of peak. The other thing is that if you use Cachegrind or a similar tool, you can figure out how many cache references and misses there are, and you can see that they drop quite considerably with the tiling versus just the straight parallel loops. So once again, you can use tools to help you figure this out and understand the cause of what's going on.
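Here is a rough sketch of one level of tiling with a tile-size parameter S, written to match the six-loop structure described above; S = 32 is the experimentally best value quoted, the outer loop is parallelized as before, and the sketch assumes N is a multiple of S:

    #define S 32   /* tile size, found by experiment */

    /* Outer three loops step over S-by-S tiles; inner three loops do a small
       matrix multiply on one tile, which fits in cache and gets reused. */
    cilk_for (int ih = 0; ih < N; ih += S)
      for (int kh = 0; kh < N; kh += S)
        for (int jh = 0; jh < N; jh += S)
          for (int i = ih; i < ih + S; ++i)
            for (int k = kh; k < kh + S; ++k)
              for (int j = jh; j < jh + S; ++j)
                C[i][j] += A[i][k] * B[k][j];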
Well, it turns out our chips don't have just one cache; they've got three levels of caches. There's the L1 cache, which comes in a data side and an instruction side, and we're thinking about the data side here, for the matrix. Then there's an L2 cache, which is also private to the core, then a shared L3 cache, and then you go out to DRAM; you can also go to your neighboring processor's memory, and so on. They're of different sizes: they grow from 32 kilobytes to 256 kilobytes to 25 megabytes, and then main memory is 60 gigabytes. So what you can do is two-level tiling, with two tuning parameters, s and t. Now you can't do binary search to find them, unfortunately, because it's multidimensional, so you kind of have to search exhaustively, and you end up with nine nested loops. But of course we really have three levels of caching. Can anybody figure out, inductively, how many nested loops we need for three levels of caching? This is a gimme. Twelve, good. And, man, when I say the code gets ugly when you start making things go fast, this is what I mean.

But it turns out there's a trick: you can tile for every power of two simultaneously by just solving the problem recursively. The idea is divide and conquer: you divide each of the matrices into four submatrices, and if you look at the calculations you need to do, you have to solve eight subproblems of half the size and then do an addition. So you have eight multiplications of (n/2)-by-(n/2) matrices and one addition of n-by-n matrices, and that gives you your answer; then, of course, you solve each of those subproblems recursively. Here's the code. I don't expect you to understand it all, but we've written it in parallel, because it turns out you can do four of the multiplications at a time in parallel. The cilk_spawn here says go and do this subroutine, which is basically a subproblem, and while that's running you're allowed to go on and execute the next statement, which does another spawn, and another, and finally the last call; and then the sync statement says don't start the next phase until you've finished the first phase. We'll learn about this.

When we do that, we get a running time of about 93 seconds, which is about 50 times slower than the last version. We're using the cache much better, but nothing is free; nothing is easy in performance engineering, and typically you have to be clever. Why did this get worse? If you look at the caching numbers, we're getting great cache behavior, very few misses, lots of hits, but we're still slower. Why do you suppose that is? Yes: the overhead of the function calls, and the place it matters is at the leaves of the computation. We have a very small base case, so we're paying that overhead all the way down to n equals 1; there's function-call overhead even when we're multiplying one by one. So, hey, let's pick a threshold, and below that threshold let's just use a standard, good algorithm, and above it we do divide and conquer. If we're below the threshold, we call a base case, and the base case looks very much like the ordinary matrix multiply. When you do that, you can once again look for the best value of the threshold; in this case I guess it's 64, which gets us down to 1.95 seconds. I didn't include a base case of one, because I tried that and it's the one that gave us terrible performance. Sorry, 32? Oh yeah, 32 is even better: 1.3 seconds. So we pick 32; I should have highlighted that on the slide. When we do that, we're now getting 12 percent of peak.
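Here is a rough, self-contained sketch of the divide-and-conquer scheme with the spawns, the sync between the two phases, and a coarsened base case. The function name, the flat row-major layout with a stride, and the other details are my own reconstruction rather than the course's tuned code:

    #include <cilk/cilk.h>

    #define THRESHOLD 32   /* coarsen: below this, fall back to a plain loop nest */

    /* C += A * B for n-by-n submatrices embedded in row-major arrays whose
       full row length is `stride`. n is assumed to be a power of two. */
    void mm_dac(double *restrict C, const double *restrict A,
                const double *restrict B, int n, int stride) {
      if (n <= THRESHOLD) {                       /* base case: ordinary loops */
        for (int i = 0; i < n; ++i)
          for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
              C[i*stride + j] += A[i*stride + k] * B[k*stride + j];
        return;
      }
      int h = n / 2;
      /* X(M, r, c) is the (r, c) quadrant of matrix M. */
      #define X(M, r, c) ((M) + (r)*h*stride + (c)*h)
      /* Phase 1: four products that write four distinct quadrants of C. */
      cilk_spawn mm_dac(X(C,0,0), X(A,0,0), X(B,0,0), h, stride);
      cilk_spawn mm_dac(X(C,0,1), X(A,0,0), X(B,0,1), h, stride);
      cilk_spawn mm_dac(X(C,1,0), X(A,1,0), X(B,0,0), h, stride);
                 mm_dac(X(C,1,1), X(A,1,0), X(B,0,1), h, stride);
      cilk_sync;   /* don't start phase 2 until phase 1 has finished */
      /* Phase 2: accumulate the other four products into the same quadrants. */
      cilk_spawn mm_dac(X(C,0,0), X(A,0,1), X(B,1,0), h, stride);
      cilk_spawn mm_dac(X(C,0,1), X(A,0,1), X(B,1,1), h, stride);
      cilk_spawn mm_dac(X(C,1,0), X(A,1,1), X(B,1,0), h, stride);
                 mm_dac(X(C,1,1), X(A,1,1), X(B,1,1), h, stride);
      cilk_sync;
      #undef X
    }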
If you count up the cache misses, you can see that with parallel divide and conquer the L1 data-cache misses are the lowest, and so now are the last-level cache misses, and the total number of references is small as well. So divide and conquer turns out to be a big win here.

Now, the other thing we mentioned is that we're not using the vector hardware. All of these machines have vector units; they have vector hardware that processes data in what's called SIMD fashion, which means single instruction, multiple data: you give one instruction, and it does operations on a whole vector. As we mentioned, we have eight floating-point units per core, which can also do a fused multiply-add. Each vector register holds multiple words; I believe on the machine we're using this term it's four 64-bit words. But it's important, when you use these, that you can't just use them willy-nilly: you have to operate on the data as one chunk of vector data. You can't have one lane of the vector unit doing one thing and a different lane doing something else; they all have to be doing essentially the same thing, the only difference being the indexing into memory.

We've actually already been taking advantage of this, but you can produce a vectorization report by asking for it, and the system will tell you which things are being vectorized and which aren't, and we'll talk about how you vectorize things the compiler doesn't want to vectorize. In particular, most machines don't support the newest sets of vector instructions, so the compiler uses vector instructions conservatively by default. If you're compiling for a particular machine, you can tell it to compile for that particular machine. Here are some of the vectorization flags: you can say use the AVX instructions, if you have AVX2 you can use AVX2, you can use the fused multiply-add vector instructions, you can give a string that names the architecture you're running on, or you can say use whatever machine I'm currently compiling on, and it will figure out which architecture that is.

Now, floating-point numbers, as we'll talk about, turn out to have some undesirable properties, like not being associative: if you compute a times b times c, how you parenthesize it can give you two different numbers. So, given a specification of a code, the compiler will typically not change the order of association, because it wants to produce exactly the same result; but you can give it a flag called fast math, -ffast-math, which allows it to do that kind of reordering, if it's not important to you that the result match the default ordering exactly. When you use that, in particular using the native-architecture flag together with fast math, we get about double the performance out of just having the compiler vectorize.
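Since floating-point arithmetic really is non-associative, here is a tiny self-contained illustration of why the compiler refuses to regroup operations by default and why -ffast-math can change answers:

    #include <stdio.h>

    int main(void) {
      double big = 1e16;
      /* Rounding makes the grouping matter: 1e16 + 1.0 rounds back to 1e16,
         so the two 1.0s vanish on the left but survive on the right. */
      printf("%.1f\n", (big + 1.0) + 1.0);   /* prints 10000000000000000.0 */
      printf("%.1f\n", big + (1.0 + 1.0));   /* prints 10000000000000002.0 */
      return 0;
    }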
There was a question: is that 64 bits? Yes; these days 64-bit floating point is pretty standard, they call that double precision, unless you're doing AI applications, in which case you may want lower-precision arithmetic; a float is 32 bits. Generally, people doing serious linear algebra calculations use 64 bits, but sometimes they can use less, and then you can get more performance if you discover you can use fewer bits in your representation. We'll talk about that too.

The last thing we're going to do: you can actually use the vector instructions yourself, rather than relying on the compiler to do it. There's a whole manual of intrinsic instructions that you can call from C, which let you issue the specific vector instructions you want, so the compiler doesn't have to figure it out. You can also use some more insights to do things like preprocessing: you can transpose the matrices, which turns out to help, and do data alignment, and a lot of other things, and use a clever algorithm for the base case. So you do more performance engineering: you think about what you're doing, you code, and then you run, run, run to test. That's one nice reason to have the cloud: you can do tests in parallel, so it takes you less sitting-around time; if you want to do ten tests, spin up ten machines and do all the tests at the same time. When you do all that, and the main win we're getting here is from the AVX intrinsics, we get up to 0.41 of peak, 41 percent of peak, and about a 50,000-times speedup.

It turns out that's where we quit, because at that point we beat Intel's professionally engineered Math Kernel Library. Now, a good question is why we aren't getting all of peak, and I invite you to figure that out. It turns out, though, that Intel MKL is better than what we did, because we assumed the size is a power of two and Intel doesn't assume that; they're more robust, and although we win on 4096-by-4096 matrices, they win on other sizes. So it's not everything.

But the end of the story is: what have we done? We've just gotten a factor of 50,000. If you took the fuel economy of a jumbo jet and improved it by the kind of factor we just got, in terms of miles per gallon, you would be able to run a jumbo jet on what a little Vespa scooter uses, or whatever type of scooter that is. That's how much we've been able to do. Let me gently caution you: you won't generally see the magnitude of performance improvement that we obtained for matrix multiplication; that turns out to be a really good example because it's so dramatic. But we will see some substantial numbers, and in particular, in 6.172 you'll learn how to print this currency of performance all by yourself, so that you don't have to take somebody else's library; you can say, I'm an engineer.

Let me mention one other thing. In this course we're going to focus on multicore computing. We are not, in particular, going to be doing GPUs or file systems or network performance; in the real world those are hugely important. What we found, however, is that it's better to learn one particular domain well, and people who master multicore performance engineering in fact go on to do these other things and are really good at them, because you've learned the core, the basis, the foundation.
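To give a flavor of the intrinsics step mentioned above, here is a rough sketch of one way an AVX/FMA inner kernel can look in C on a Haswell-class machine. The function name and structure are my own; it ignores alignment, edge cases, and the transposition and other tricks, and it is not the course's tuned kernel:

    #include <immintrin.h>

    /* C[i][j..j+3] += A[i][k] * B[k][j..j+3], four doubles per instruction.
       c and a point at row i of C and A; b is the full row-major B with n columns.
       Assumes n is a multiple of 4; compile with AVX2/FMA enabled (e.g., -march=native). */
    void kernel_row(double *restrict c, const double *restrict a,
                    const double *restrict b, int n) {
      for (int k = 0; k < n; ++k) {
        __m256d aik = _mm256_broadcast_sd(&a[k]);        /* splat A[i][k] into all 4 lanes */
        for (int j = 0; j < n; j += 4) {
          __m256d cij = _mm256_loadu_pd(&c[j]);          /* load C[i][j..j+3] */
          __m256d bkj = _mm256_loadu_pd(&b[k * n + j]);  /* load B[k][j..j+3] */
          cij = _mm256_fmadd_pd(aik, bkj, cij);          /* fused multiply-add */
          _mm256_storeu_pd(&c[j], cij);                  /* store back */
        }
      }
    }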
quality educational resources for free to make a donation or to view additional materials from hundreds of MIT courses visit MIT opencourseware at ocw.mit.edu so welcome to six one seven two my name is charles leiserson and i am one of the two lecturers this term the other is professor Julian shun were both in EECS and in csail on the seventh floor of the gates building if you don't know it you are in performance engineering of software systems so if this is the wrong if you found yourself in the wrong place now is the time to exit I want to start today by talking a little bit about why we do performance engineering and then I'll do some a little bit of administration and then sort of dive into sort of a case study that'll give you a good sense of some of the things that we're going to do during the term I put the administration in the middle because it's like if if you don't if from me telling you about the course you don't want to do the course then it's like why should you listen to the administration right it's like so so let's just dive right in okay so so the first thing to always understand whenever you're doing something is a perspective on your on what matters in what you're doing so we're going to study the whole term we're gonna do software performance engineering and so this is kind of interesting because it turns out that performance is usually not at the top of what people are interested in when they're building software okay what are some of the things that are more important than software that's sorry then performance yeah deadlines good cost correctness extensibility yeah we would go on and on I think that you folks could probably make a pretty long list I made a short list of all the kinds of things that are more important than performance so then if programmers are so willing to sacrifice performance for these properties why do we study performance okay so this is kind of a bit of a paradox in a bit of a puzzle why do you study something that clearly isn't at the top of the list of what most people care about when they're developing software I think the answer to that is that performance is the currency of computing okay you use performance to buy these other properties so you'll say something like gee I want to make it easy to program and so therefore I'm willing to sacrifice some performance to make something easy to program now I'm willing to sacrifice some performance to make sure that my system is secure okay and all those things come out of your performance budget and clearly if performance degrades too far your stuff becomes unusable okay when I talk with people in with programmers and I say you know people are fond of saying ah performance you know yo you do performance performance doesn't matter I never think about it then I talk with with people who use computers and I ask what's your main complaint about the the computing systems you use answer too slow okay so it's interesting whether you're the producer or that whatever but but the real answer is that that performance is is like currency it's something you spend I would rather have if you look you know would I rather have $100 or a gallon of water well water is indispensable to life there's circumstances certainly where I would prefer to have the water okay then $100 okay but in our modern society I can buy water for much less than $100 okay so even though water is essential to life and far more important than money money is a currency and so I prefer to have the money because I can just buy the things I and 
that's the same kind of analogy of performance it has no intrinsic value but it contributes to things you can use it to buy things that you care about like usability or testability or what have you okay now in the early days of computing software performance engineering was common because machine resources were limited if you look at these machines from 1964 to 1977 I mean look at the site look at how many bytes they have on them right the in 64 there is a computer with 524 kilobytes okay that was a big machine back then that's kilobytes that's not megabytes that's not gigabytes that's kilobytes okay and many programs would strain the machine resources okay the clock rate for that machine was 33 kilohertz what's a typical clock rate today about what four gigahertz three gigahertz two gigahertz somewhere up there yeah somewhere in that range okay and here they were operating with kilohertz so many programs would not fit without intense performance engineering and one of the things also that there's a lot of a lot of sayings that came out of that area Donald Knuth who's one of the touring award winner absolutely fabulous computer scientists in all respects wrote premature optimization is the root of all evil and I invite you by the way to look that quote up and because there's actually taken out of context okay so trying to optimize stuff too early he was worried about okay bill Wolfe who built the design the bliss language and worked on the pdp-11 and such said more computing sins are committed in the name of efficiency without necessarily achieving it than for any other single reason including blind stupidity okay and Michael Jackson said the first rule of program optimization don't do it second rule of program optimization for experts only don't do it yet okay so everybody warning away because when you start trying to make things fast your code becomes unreadable okay making code that is readable and fast now that's that's where the art is and hopefully we'll learn a little bit about doing that okay and indeed there was no real point in in working too hard on performance engineering for many years if you look at technology scaling and you look at how many transistors are on various processor designs up until about 2004 we had Moore's law in in full throttle okay with chip densities doubling every two years and really quite amazing and along with that as they shrunk the dimensions of chips because by miniaturization the clock speed would go up correspondingly as well and so if you found something was too slow wait a couple years okay wait a couple years it'll be faster okay and so there wasn't you know if you're going to do something with software and make your software ugly that really wasn't a real you know wasn't a real good payoff compared to just simply waiting around and in that era there was something called Dennard scaling where which allowed things to as things shrunk allowed the clock speeds to get larger basically by reducing power you could reduce power and still keep everything fast and we'll talk about that in a minute so if you look at what happened to from 1977 to 2004 here are Apple computers with similar similar price tags and you can see the the clock rate really just skyrocketed one megahertz 400 megahertz 1.8 gigahertz okay and the data paths went from 8 bits to 32 64 the memory because correspondingly grow cost approximately the same and that was that's the legacy from Moore's Law and the tremendous advances in semiconductor technology and so until 2004 moore's law in 
the scaling of clock frequency so court Dennard scaling was essentially a printing press for the currency of performance okay you didn't have to do anything you just made the hardware go fast or very easy very and all that came to an end well some of it came to an end in 2004 when clock speeds plateaued okay so if you look at this around 2005 you can see all the speeds we hit you know 2 to 4 gigahertz and we have not been able to make chips go faster than that in any practical way since then but the densities have kept growing great now the reason that the clock speed flattened was because of power density and this is a slide from Intel from that era looking at the growth of power density and what they were projecting was that that the junction temperatures of the transistors on the chip if they just keep scaling the way they had been scaling would would start to approach first of all the temperature of a nuclear reactor than the temperature of a rocket nozzle and then the Sun surface okay so that we're not going to build little technology that cools that very well and even if you could solve it for a little bit the writing was on the wall we cannot scale clock frequencies anymore the reason for that is that originally clock frequency was scaled assuming that the most of the power was dynamic power which was going when you switched the circuit and what happened as we kept reducing that and reducing that is something that used to be in the noise namely the Deacon's okay started to become significant to the point where now today the dynamic power is is far less of a concern than the static power from just the circuits sitting there leaking and when you miniaturize you can't stop that effect from happening so what did the vendors do in 2004 and 2005 and since is they said oh gosh we've got all these transistors to use but we can't use the transistors to make stuff run faster so what they did is they introduced parallelism in the form of multi-core processors they put more than one processing core in a chip and to scale performance they would you know have multiple cores and each generation of Moore's law now was potentially doubling the number of of cores and so if you look at what happened for processor cores you see that around 2005 2004 2005 we started to get multiple processing cores per chip to the extent that today it's basically impossible to find a single core chip for a laptop or a workstation or whatever everything is multi-crew you can't buy just one you have to buy a parallel processor and and so the impact of that was that performance was no longer free you couldn't just speed up the hardware now if you wanted to use that potential you had to do parallel program and that's not something that anybody in the industry really had done so today there are a lot of other things that happen in the in that intervening time we got vector units as common parts of our machines we got GPUs we got steeper cache hierarchies we have a configurable logic on some machines and so forth and now it's up to the software to adapt to it and so although we don't want to have to deal with performance today you have to deal with performance and in your lifetimes you will have to deal with performance okay in software if you're gonna have effective software okay you can see what happened also this is a study that we did looking at software bugs and a variety of open source projects where they're mentioning the word performance and you can see that in 2004 the numbers start going up you know some of them 
it's not as as convincing for some things as others but generally there's a trend of after 2004 people started worrying more about performance if you look at software developer jobs you know as of tooth you know early mid-2000s 2000 oh ohs I guess okay the you see once again the mention of performance and jobs is going up and anecdotally I can tell you know I had one student who came to me after the spring after he'd taken six 172 and he said you know I went and I had applied for five jobs and every job asked me every at every job interview they asked me a question I couldn't have answered if I hadn't taken six 172 and I got five offers okay and when I compared those offers they tended to be 20 to 30% larger than people are just web monkeys okay so so anyway that's not to say that you should necessarily take this class okay but I just want to point out that what we're gonna learn is going to be interesting from a practical point of view ie your futures okay as well as theoretical points of view and technical points of view okay so modern processors are really complicated and the big question is how do we write software to use that modern hardware efficiently okay I want to give you a example of performance engineering of a very well studied problem namely matrix multiplication who has never seen this problem okay so there we got some Joker's in the class I can say okay so this is you know it takes n cubed operations because you're basically computing N squared dot products okay so essentially if you add up the total number of operations it's about 2n cubed because there is essentially a multiply and an ADD for every pair of terms that need to be accumulated okay so it's basically 2n cubed we're gonna look at it assuming for simplicity that our n is a an exact power of two okay now the machine that we're gonna look at is going to be one of the one one of the ones that you'll have access to an AWS okay it's a it's a compute optimized machine which has a Haswell microarchitecture running at 2.9 gigahertz there are two processor chips for each of these machines and nine processing cores per chip so a total of 18 cores so that's the amount of parallel processing it does two-way hyper-threading which we're actually going to not deal a lot with it hyper threading gives you a little bit more performance but it also makes it really hard to measure things so generally we will turn off hyper threading but the performance that you get tends to be correlated with what you get when your hyper thread for floating-point unit there it is capable of doing eight double precision operations that 64-bit floating-point operations including a fused multiply add per core per cycle okay so that that's a vector unit so you basically each of these 18 cores can do eight double precision operations and so including a fuse multiply add which is actually two operations okay the way that they count these things okay it has a cache line size of 64 bytes the AI cache is 32 kilobytes which is 8 way set associative we'll talk about some of these things if you don't know all the terms that's okay we're gonna cover most of these terms later on it's got a D cache of the same size it's got an l2 cache of 256 kilobytes and it's got an l3 cache or what's sometimes called an LLC last level cache of 25 megabytes and then it's got 60 gigabytes of DRAM so this is a honking big machine okay this is like you can get things to sing on this okay if you look at the peak performance it's the clock speed times 2 processor chips times 9 
If you look at the peak performance, it's the clock speed times 2 processor chips times 9 processing cores per chip times, if you can use both the multiply and the add, 16 floating-point operations per cycle, and that comes out to just short of a teraflop: 836 gigaflops. That's a lot of power. These are fun machines, actually, especially when we get to things like the game-playing AI that we do for the fourth project; they're really fun when you have that much compute. Now, here's the basic code: this is the full Python code for matrix multiplication. Generally in Python you wouldn't use this code, you'd just call a library subroutine that does matrix multiplication, but sometimes you have a problem, which I'm going to illustrate with matrix multiplication, where you have to write the code yourself, and I want to give you an idea of what kind of performance you get out of Python. Besides, if there is a library routine, somebody had to write it, and that person was a performance engineer, because they wrote it to be as fast as possible, so this will also give you an idea of what you can do to make code run fast. You can see that before the triply nested loop we take a time measurement, then we take another at the end and print the difference; in between is the classic triply nested loop for matrix multiplication. So when you run this, how long does it run? Any guesses? Six microseconds? Six milliseconds? Six seconds? Six minutes? Six hours? Six days? Of course, it matters what size the matrices are: it's 4096 by 4096, as shown in the code. And those of you who didn't vote: wake up, this is active learning, put yourself out there; it doesn't matter whether you're right or wrong, and there will be a bunch of people who get the right answer and have no idea why. It turns out it takes about 21,000 seconds, which is about six hours. Is this fast? How do we tell whether this is fast or not? What should we expect from our machine? Let's do a back-of-the-envelope calculation of how many operations there are and how fast we ought to be able to do them; we just went through all the parameters of the machine. There are 2n-cubed operations to perform; we're not doing Strassen's algorithm or anything like that, just the straight triply nested loop, so that's 2 to the 37th floating-point operations. The running time is 21,000 seconds, so dividing the number of operations by the time, we're getting about 6.25 megaflops out of our machine when we run that code. The peak, as you recall, was about 836 gigaflops, and we're getting 6.25 megaflops, so we're at about 0.00075 percent of peak. This is not fast.
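Just to make the back-of-the-envelope arithmetic explicit (this is my own rounding; the exact figures depend on the measured running time):

```latex
\begin{align*}
\text{peak} &\approx 2.9\ \text{GHz} \times 2\ \text{chips} \times 9\ \tfrac{\text{cores}}{\text{chip}}
             \times 16\ \tfrac{\text{FLOPs}}{\text{core}\cdot\text{cycle}} \approx 835\ \text{GFLOPS},\\
\text{work} &= 2n^{3} = 2 \cdot 4096^{3} = 2^{37} \approx 1.37\times 10^{11}\ \text{FLOPs},\\
\text{rate} &\approx \frac{2^{37}\ \text{FLOPs}}{21{,}000\ \text{s}} \approx 6.5\ \text{MFLOPS},
\qquad \frac{6.5\times 10^{6}}{8.35\times 10^{11}} \approx 8\times 10^{-6}.
\end{align*}
```

That fraction is the same ballpark as the 0.00075 percent of peak quoted above.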
So let's do something really similar: let's code it in Java rather than Python. We take just that loop, the code is almost the same triply nested loop, and we run it in Java. The running time now turns out to be just under 3,000 seconds, which is about 46 minutes. Same code, Python versus Java, and we got almost a nine-times speedup just by coding it in a different language. Well, let's try C; that's the language we're going to be using here. What happens when you code it in C? It's exactly the same thing. We're going to use the Clang/LLVM 5.0 compiler; I believe we're using 6.0 this term, and I should have rerun these numbers for 6.0, but I didn't. Now it's basically 1,100 seconds, which is about 19 minutes, so it's about twice as fast as Java and about 18 times faster than Python. Here's where we stand so far: we have the running times of these various versions, where the relative speedup is how much faster each one is than the previous row and the absolute speedup is how it compares to the first row, and we're now managing to get 0.014 percent of peak. So we're still slow. But before we go and try to optimize it further: why is Python so slow and C so fast? Anybody? That's on the right track; can anybody else articulate it a little more? Right: multiplies and adds aren't the only instructions Python is executing; it's running lots of other code. The big reason Python is slow and C is so fast is that Python is interpreted and C is compiled directly to machine code, and Java is somewhere in the middle, because Java is compiled to bytecode, which is then interpreted and just-in-time compiled into machine code. Let me talk a little about these. Interpreters, such as Python's, are versatile but slow. It's one of those cases where the designers said, we're going to take some of our performance and spend it on a more flexible, easier-to-program environment. The interpreter reads, interprets, and performs each program statement and then updates the machine state; it's actually going through your code each time, figuring out what it does and then implementing it, so there's all this overhead compared to just doing the operations. Interpreters can easily support high-level programming features, and they can do things like dynamic code alteration, at the cost of performance. Typically the interpreter's cycle is: read the next statement, interpret it, perform it, update the state of the machine, and fetch the next statement; it goes through that each time, in software. When your code is compiled to machine code, the processor goes through a similar cycle, but it's highly optimized in hardware for exactly the things machines do; when you compile, you're able to take advantage of the hardware's interpretation of machine instructions, which has much, much lower overhead than the big software overhead you get with Python. JIT, which is what's used in Java, is somewhere in the middle, and JIT compilers can recover some of the performance; in fact Java did a pretty good job here. The idea is that when the code is first executed, it's interpreted, and the runtime system keeps track of how often the various pieces of code run. Whenever it finds a piece of code that's executing frequently, it calls the compiler to compile that piece, and subsequently it runs the compiled code. So it tries to get the big performance advantage by only compiling the things for which invoking the compiler will actually pay off.
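For concreteness, here is a minimal sketch of what the C version looks like; the variable names and the use of gettimeofday for timing are my choices, not necessarily what the course handout uses.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N 4096
double A[N][N], B[N][N], C[N][N];

int main(void) {
    // Fill A and B with something; C starts at zero.
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (double)rand() / RAND_MAX;
            B[i][j] = (double)rand() / RAND_MAX;
            C[i][j] = 0.0;
        }

    struct timeval start, end;
    gettimeofday(&start, NULL);

    // The classic triply nested loop, in the i, j, k order.
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];

    gettimeofday(&end, NULL);
    double seconds = (end.tv_sec - start.tv_sec)
                   + 1e-6 * (end.tv_usec - start.tv_usec);
    printf("%0.6f seconds\n", seconds);
    return 0;
}
```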
Anyway, that's the big difference among those kinds of systems. One of the reasons we don't use Python in this class is that its performance model is hard to figure out; C is much closer to the metal, much closer to the silicon, so it's much easier to figure out what's going on in that context. But we will have a guest lecture about performance in managed languages like Python, so it's not that we're going to ignore the topic; we'll just learn how to do performance engineering in a setting where it's easier to do. Now, one thing a good compiler will exploit: once we're at the C version, which is where we're going to work from since it's the fastest we've got so far, it turns out you can change the order of the loops in this program without affecting correctness. Here we went for i, for j, for k, do the update. We could instead do for i, for k, for j, do the update, and it computes exactly the same thing, or we could do for k, for j, for i. So we can change the order without affecting correctness. Do you think the order of loops matters for performance? I know, this is a leading question. Yes, and you're exactly right: cache locality is what it is. When we try it, the loop order affects the running time by a factor of 18. Whoa. Just by switching the order. What's going on there? We're going to talk about this in more depth later, so I'm just going to fly through it here to show you the kinds of considerations involved. In hardware, each processor reads and writes main memory in contiguous blocks called cache lines. Previously accessed cache lines are stored in a small memory, called a cache, that sits near the processor. When the processor accesses something that's in the cache, you get a hit, which is very cheap and fast; if you miss, you have to go out either to a deeper-level cache or all the way to main memory, which is much, much slower. What happens for this matrix problem is that the matrices are laid out in memory in row-major order. That means your two-dimensional matrix is laid out in the linear order of memory addresses by taking row 1, sticking row 2 after it, then row 3 after that, and so on. There's another order things could have been laid out in, and in fact are in Fortran, called column-major order; C and Fortran use different orders, and it turns out to affect performance which one you use. So let's look at the access pattern for the order i, j, k. Once we've fixed i and j, we cycle through k, and as we do, C[i][j] stays the same on every iteration: we get excellent locality for C, because we're accessing the same location every single time, it's going to be in cache, and it's always going to be fast to access. For A, we go through a row in linear order, so we get good spatial locality. But for B we're going down a column, and those elements are distributed far apart in memory.
So the processor brings in 64 bytes to operate on a particular datum and then ignores seven of the eight floating-point words on that cache line before going to the next one, which wastes an awful lot. A has good spatial locality, since its accesses are adjacent and use the cache lines effectively, while for B you're striding 4096 elements apart, which is poor spatial locality. That's the i, j, k order. If you look at the other orders: with order i, k, j you get good spatial locality for both C and B and excellent locality for A; other orders aren't nearly as good, and one of them does optimally badly on both. You can simply measure the different orders, and you can use a tool to figure this out. The tool we'll be using is called Cachegrind, which is part of the Valgrind suite of tools, and it tells you what the miss rates are for the various pieces of code. You'll learn how to use it and figure out, oh look, we have a high miss rate here and not there, and that may be why my code is running slowly. When we pick the best loop order, we get a relative speedup of about six and a half. What other simple changes can we try? There's actually a collection of things we can do that don't even involve touching the code. Hint, for people who have played with compilers: yes, change the compiler flags. Clang, the compiler we'll be using, provides a collection of optimization switches, and you specify a switch to ask it to optimize: you write -O and then a number. The documentation says 0 means do not optimize, 1 means optimize, 2 means optimize even more, and 3 means optimize yet more. In this case it turns out that even though -O3 is supposed to optimize more, -O2 was the better setting. That doesn't happen all the time; usually -O3 does better than -O2, but here -O2 actually optimized better, because the optimizations are to some extent heuristic. There are other kinds of automation too: you can do profile-guided optimization, where you measure the performance and feed that back in, and the compiler can then be smarter about how it optimizes, and there are a variety of other things. With this simple technology, choosing a good optimization flag, in this case -O2, we got basically a factor of 3.25 for free, without having to do much work at all, and now we're starting to approach one percent of peak: we're at 0.3 percent of peak performance. So what's causing the low performance? Why aren't we getting most of the performance out of this machine? Right: we're not using all the cores. So far we're using just one core, and how many cores do we have? Eighteen. Seventeen of them are sitting idle while we're trying to optimize one. So: multicore. We have nine cores per chip and two of these chips in our test machine, and we're running on just one core, so let's use them all.
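As a sketch, the better loop order is just a matter of swapping the j and k loops; you might compile it with something like clang -O2 mm.c -o mm (the file name is made up).

```c
// Same computation as before, but in the i, k, j order: the innermost loop
// now walks C[i][j..] and B[k][j..] left to right, matching the row-major
// layout, so whole cache lines get used instead of one word per line.
void matmul_ikj(long n, double (*restrict C)[n],
                double (*restrict A)[n], double (*restrict B)[n]) {
    for (long i = 0; i < n; i++)
        for (long k = 0; k < n; k++)
            for (long j = 0; j < n; j++)
                C[i][j] += A[i][k] * B[k][j];
}
```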
To do that we're going to use the Cilk infrastructure, and in particular we can use what's called a parallel loop, which in Cilk is spelled cilk_for. You just replace that outer for, for example, with cilk_for, and it says do all of those iterations in parallel; the compiler and runtime system are free to schedule them however they like. We can also do it for the inner loop, but it turns out you can't do it for the k loop, and I'll leave it as a little homework problem to think about why you can't just put a cilk_for on the k loop. So the question is which parallel version works best. We can parallelize the i loop, we can parallelize the j loop, and we can do i and j together; you can't parallelize the k loop with just a parallel loop and expect to get the right answer. And look at the spread of running times: if I parallelize just the i loop, it's 3.18 seconds; if I parallelize the j loop, it actually slows down; and if I do both i and j, it's still bad. You just want to parallelize the outer loop. This has to do with scheduling overhead, and we'll learn about scheduling overhead and how to predict it. So the rule of thumb here is: parallelize outer loops rather than inner loops. When we parallelize the outer loop, we get almost an 18x speedup on 18 cores. Let me assure you, not all code is that easy to parallelize, but this one happens to be. So now we're up to just over 5 percent of peak. Where are we losing time? Why are we getting just 5 percent? Good, that's one of them: we're not using the vector hardware effectively. Those are the two optimizations we're going to do to get really good code here, so what's the other one? That's actually related to the same question, but there's another, completely different source of opportunity: we can manage the cache misses better. So let's go back to hardware caches and restructure the computation to reuse data in the cache as much as possible, because cache misses are slow and hits are fast; we want to make the most of the cache by reusing the data that's already there. Let's take a look. Suppose we compute one row of C. Since it's a 4096-long vector, that's basically 4096 writes, and we get some spatial locality there, which is good, but the processor is doing 4096 writes to compute that row. To compute it, I need 4096 reads from A, and I need all of B, because I go through every column of B: the first element of C needs the whole first column of B, the second element of C needs the whole second column of B, and so on. Once again, don't worry if you don't fully understand this; right now I'm ripping through it at high speed, we'll go into it in much more depth in the class, and there'll be plenty of time to master this stuff. The main thing to understand is that you're going through all of B. Then, to compute another row of C, I do the same thing: one row of A and all of B again, so that when I'm done I've made about 16 or 17 million memory accesses in total. That's a lot of memory accesses.
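Here is roughly what the parallel loop just described looks like, assuming an OpenCilk-style toolchain (compiled with something like clang -fopencilk); only the outer loop gets the cilk_for.

```c
#include <cilk/cilk.h>

// Parallelize only the outer (i) loop: each iteration writes its own row of
// C, so there are no races. Putting a cilk_for on the k loop would make
// different iterations race on the same C[i][j].
void matmul_parallel(long n, double (*restrict C)[n],
                     double (*restrict A)[n], double (*restrict B)[n]) {
    cilk_for (long i = 0; i < n; i++)
        for (long k = 0; k < n; k++)
            for (long j = 0; j < n; j++)
                C[i][j] += A[i][k] * B[k][j];
}
```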
So what if, instead of doing that, I do things in blocks? What if I compute a 64-by-64 block of C rather than a row of C? Let's look at what happens then, and remember that number, 16 or 17 million, because we're going to compare against it. Computing a 64-by-64 block also takes 4096 writes to C, the same number. But now I only have to do about 262,000 reads from A, because I need 64 full rows of A, and for B I need 64 columns, which is another 262,000 or so reads, and that ends up being about half a million memory accesses total. So I end up doing far fewer accesses, provided those blocks fit in my cache. I do much less work to compute the same size footprint if I compute a block rather than a row: much more efficient. That scheme is called tiling. In tiled matrix multiplication, you bust your matrices into, say, 64-by-64 submatrices and then do two levels of matrix multiply: an outer level that multiplies the blocks, using the same algorithm, and then, when you hit the inner level, a 64-by-64 matrix multiply done with another three nested loops. You end up with six nested loops. And there's a tuning parameter, of course, which is how big to make the tile: if it's s by s, what should s be? 64? 128? How do we find the right value of this tuning parameter s? Ideas? You could compute it from the cache size, and you might get a number, but who knows what else is going on in the cache while you're doing this. Right: test a bunch of them, experiment, and see which gives you good numbers. When you do that, it turns out 32 gives the best performance for this particular problem. So you can block it and get faster, and when we do, we get a speedup of about 1.7, which puts us at almost ten percent of peak. The other thing is that if you use Cachegrind or a similar tool, you can count the cache references and see that they drop quite considerably with tiling versus just the straight parallel loops. Once again, you can use tools to help you figure this out and understand the cause of what's going on. Well, it turns out our chips don't have just one cache; they've got three levels of caches. There's the L1 cache, which comes in data and instruction flavors (we're thinking about data here, for the matrices), then an L2 cache, which is also private to the processing core, then a shared L3 cache, and then you go out to DRAM (you can also go to your neighboring processor's memory, and such). They grow in size: 32 kilobytes, to 256 kilobytes, to 25 megabytes, to main memory at 60 gigabytes. So what you can do is two-level tiling, with two tuning parameters s and t. Unfortunately you can't binary-search for them, because the problem is multidimensional, so you pretty much have to search exhaustively, and when you do that you end up with nine nested loops.
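Here is a sketch of one level of tiling with the tuning parameter S set to 32, the value the experiment favored; in practice the outermost loop would stay a cilk_for, and n is assumed to be a multiple of S.

```c
#define S 32  // tile size, found by experiment

// Six nested loops: the outer three walk over S-by-S tiles, the inner three
// multiply one tile of A by one tile of B into one tile of C, all of which
// can stay resident in cache.
void matmul_tiled(long n, double (*restrict C)[n],
                  double (*restrict A)[n], double (*restrict B)[n]) {
    for (long ih = 0; ih < n; ih += S)
        for (long kh = 0; kh < n; kh += S)
            for (long jh = 0; jh < n; jh += S)
                for (long i = ih; i < ih + S; i++)
                    for (long k = kh; k < kh + S; k++)
                        for (long j = jh; j < jh + S; j++)
                            C[i][j] += A[i][k] * B[k][j];
}
```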
But of course we don't really want to stop there, because we have three levels of caching. Can anybody figure out, inductively, how many levels of tiling we need for three levels of caching? This is a gimme: twelve. And man, when I say the code gets ugly when you start making things go fast, this is what I mean. But it turns out there's a trick: you can tile for every power of two simultaneously by just solving the problem recursively. The idea is divide and conquer: you divide each of the matrices into four submatrices, and if you look at the calculations you need to do, you have to solve eight subproblems of half the size and then do an addition. So you have eight multiplications of (n/2)-by-(n/2) matrices and one addition of n-by-n matrices, and that gives you your answer; and then, of course, you solve each of those subproblems recursively. Here's the code. I don't expect you to understand all of it, but we've written it to run in parallel, because it turns out you can do four of the multiplications at a time in parallel: the cilk_spawn here says go execute this subroutine, which is basically a subproblem, and while that's happening you're allowed to go on and execute the next statement, which does another spawn, and another spawn, and finally the fourth call; then the sync statement says don't start the next phase until you've finished the first phase. We'll learn all about this. When we do that, we get a running time of about 93 seconds, which is about 50 times slower than the previous version. We're using the cache much better, but nothing is free and nothing is easy; typically in performance engineering you have to be clever. Why did this get worse, even though the caching numbers are great, with very few misses and lots of hits, yet we're still slower? Right: the overhead of calling a function, and the place it matters is at the leaves of the computation. We have a very small base case, so we're paying that overhead all the way down to n equals 1; there's function-call overhead even when you're multiplying one-by-one matrices. So let's pick a threshold, and below that threshold just use a standard good algorithm, while above it we do divide and conquer. That is, if we're below the threshold we call a base case that looks very much like an ordinary matrix multiply. Once you do that, you can again look for the best value of that threshold. With a base case of 64 we get down to about 1.95 seconds; I didn't include a base case of 1 because I tried it and that's what gave us the terrible performance; and 32 is even better, about 1.3 seconds, so we picked 32, which I should have highlighted on the slide. With that, we're now getting 12 percent of peak, and if you count up the cache misses, you can see that with parallel divide and conquer the L1 data-cache misses are the lowest.
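Here is a minimal sketch of the recursive scheme just described, assuming row-major storage, n a power of two, and a coarsened base case; the helper names and the THRESHOLD constant are mine, not the staff's code.

```c
#include <cilk/cilk.h>

#define THRESHOLD 32   // coarsen the base case; the lecture settled on 32

// C += A * B for n-by-n submatrices stored inside a row-major matrix of
// width `stride` (element (i,j) lives at M[i*stride + j]).
static void mm_base(double *restrict C, const double *restrict A,
                    const double *restrict B, long n, long stride) {
    for (long i = 0; i < n; i++)
        for (long k = 0; k < n; k++)
            for (long j = 0; j < n; j++)
                C[i*stride + j] += A[i*stride + k] * B[k*stride + j];
}

static void mm_dac(double *restrict C, const double *restrict A,
                   const double *restrict B, long n, long stride) {
    if (n <= THRESHOLD) {
        mm_base(C, A, B, n, stride);
        return;
    }
    long h = n / 2;                                  // n is a power of two
    #define X(M, r, c) (M + (r)*h*stride + (c)*h)    // quadrant (r,c) of M
    // First the four products that touch disjoint quadrants of C...
    cilk_spawn mm_dac(X(C,0,0), X(A,0,0), X(B,0,0), h, stride);
    cilk_spawn mm_dac(X(C,0,1), X(A,0,0), X(B,0,1), h, stride);
    cilk_spawn mm_dac(X(C,1,0), X(A,1,0), X(B,0,0), h, stride);
               mm_dac(X(C,1,1), X(A,1,0), X(B,0,1), h, stride);
    cilk_sync;
    // ...then the other four, which accumulate into the same quadrants.
    cilk_spawn mm_dac(X(C,0,0), X(A,0,1), X(B,1,0), h, stride);
    cilk_spawn mm_dac(X(C,0,1), X(A,0,1), X(B,1,1), h, stride);
    cilk_spawn mm_dac(X(C,1,0), X(A,1,1), X(B,1,0), h, stride);
               mm_dac(X(C,1,1), X(A,1,1), X(B,1,1), h, stride);
    cilk_sync;
    #undef X
}
```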
But so is the last-level cache miss count, and the total number of references is small as well, so divide and conquer turns out to be a big win here. Now, the other thing we mentioned is that we're not using the vector hardware. All of these machines have vector units we can operate on; they process data in what's called SIMD fashion, single instruction, multiple data, which means you issue one instruction and it does operations on an entire vector. As we mentioned, we have eight floating-point lanes per core, which can also do a fused multiply-add. Each vector register holds multiple words; I believe on the machines we're using this term it's four 64-bit words. But it's important to realize you can't use these willy-nilly: you have to operate on the data as one chunk of vector data. You can't have one lane of the vector unit doing one thing and a different lane doing something else; they all have to be doing essentially the same thing, the only difference being the indexing into memory. Now, we've actually already been taking advantage of vectorization, because the compiler vectorizes some things automatically. You can ask for a vectorization report, and the system will tell you which things are being vectorized and which aren't, and we'll talk about how to vectorize things that the compiler doesn't want to vectorize. In particular, most machines don't support the newest sets of vector instructions, so the compiler uses vector instructions conservatively by default. If you're compiling for a particular machine, you can tell it to target that machine. There are vectorization flags: you can say use the AVX instructions if you have AVX, or AVX2, or the fused multiply-add vector instructions; you can give a string naming the architecture you're running on; or you can say use whatever machine I'm currently compiling on, and it figures out which architecture that is. Now, floating-point numbers, as we'll discuss, have some undesirable properties, for instance that they're not associative: for a times b times c, how you parenthesize it can give you two different numbers. So, given a specification of the code, the compiler will normally not change the order of association, because it wants to give you exactly the same result; but you can hand it a flag called fast math, -ffast-math, which allows that kind of reordering, if it's not important to you that the result match the default ordering. Using the native architecture together with fast math, we get about double the performance out of vectorization compared with what the compiler's vectorizer gives us by default. Question: yes, these are 64-bit. These days 64 bits, what's called double precision, is pretty standard, unless you're doing AI applications, in which case you may want lower-precision arithmetic; a float is 32 bits. Generally, people doing serious linear algebra calculations use 64 bits, but sometimes they can use less, and you can get more performance if you discover you can use fewer bits in your representation. We'll talk about that too.
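The non-associativity is easy to see with a tiny example; the constants below are just ones I picked to make the rounding visible. This is why the compiler keeps the written association unless you hand it -ffast-math, and why flags like -mavx2, -mfma, or -march=native are what unlock the wider instructions in the first place.

```c
#include <stdio.h>

int main(void) {
    double a = 1e20, b = -1e20, c = 3.14;
    // Under IEEE 754 rounding these differ: b + c rounds back to -1e20,
    // so the second expression loses c entirely.
    printf("(a + b) + c = %g\n", (a + b) + c);   // prints 3.14
    printf("a + (b + c) = %g\n", a + (b + c));   // prints 0
    return 0;
}
```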
The last thing we're going to do: you can actually use the vector instructions yourself, rather than relying on the compiler, and there's a whole manual of intrinsic instructions you can call from C that let you issue the specific vector instructions you want, so the compiler doesn't have to figure that out. You can also use some other insights: you can do preprocessing, you can transpose the matrices, which turns out to help, you can do data alignment, there are a lot of other things, and you can use a clever algorithm for the base case. So you do more performance engineering: you think about what you're doing, you code, and then you run, run, run to test. That's one nice reason to have the cloud, because you can run tests in parallel, so it takes less of your sitting-around time: if you want to do ten tests, spin up ten machines and run them all at once. When you do all that, with the main win coming from the AVX intrinsics, we get up to 0.41 of peak, 41 percent, and about a 50,000-times speedup overall. And it turns out that's where we quit, because at that point we beat Intel's professionally engineered Math Kernel Library. A good question is why we aren't getting all of peak, and I invite you to figure that out. It turns out, though, that Intel MKL is better than what we did overall, because we assumed the size is a power of two and Intel doesn't assume that; they're more robust, and although we win on 4096-by-4096 matrices, they win on other sizes. So it's not everything. But the end of the story is: what have we done? We've just gotten a factor of 50,000. If you took the fuel economy of a jumbo jet and improved it by the kind of factor we just got, you would be able to run a jumbo jet on the fuel budget of a little Vespa scooter, or whatever type of scooter that is. That's how much we've been able to do. Let me just caution you: you generally won't see the magnitude of performance improvement that we obtained for matrix multiplication. It's a really good example precisely because it's so dramatic, but we will see some substantial numbers, and in particular, in 6.172 you'll learn how to print this currency of performance all by yourself, so that you don't have to take somebody else's library; you can say, no, I'm an engineer. Let me mention one other thing: this course is going to focus on multicore computing. We are not, in particular, going to be doing GPUs or file systems or network performance. In the real world those are hugely important. What we've found, however, is that it's better to learn one particular domain well, and people who master multicore performance engineering in fact go on to do those other things and are really good at them, because you've learned the core, the basis, the foundation.
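Going back to the intrinsics idea mentioned above, here is a minimal sketch of what a hand-vectorized inner step can look like with the AVX/FMA intrinsics from immintrin.h; this is my illustration, not the code the staff actually tuned, and it assumes n is a multiple of 4 and that you compile with FMA support enabled (e.g. -mfma or -march=native).

```c
#include <immintrin.h>

// One (i,k) step of the i,k,j-ordered multiply, vectorized by hand:
// C[i][j..j+3] += A[i][k] * B[k][j..j+3] for j = 0, 4, ..., n-4.
// Unaligned loads are used so no particular alignment is required.
void row_update(double *restrict Ci, const double *restrict Bk,
                double aik, long n) {
    __m256d a = _mm256_set1_pd(aik);              // broadcast A[i][k]
    for (long j = 0; j < n; j += 4) {
        __m256d b = _mm256_loadu_pd(&Bk[j]);      // B[k][j..j+3]
        __m256d c = _mm256_loadu_pd(&Ci[j]);      // C[i][j..j+3]
        c = _mm256_fmadd_pd(a, b, c);             // fused multiply-add
        _mm256_storeu_pd(&Ci[j], c);
    }
}
```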
00:51:40,440 1059 00:51:40,440 --> 00:51:42,329 1060 00:51:42,329 --> 00:51:45,329 1061 00:51:45,329 --> 00:51:48,720 1062 00:51:48,720 --> 00:51:50,880 1063 00:51:50,880 --> 00:51:53,370 1064 00:51:53,370 --> 00:51:55,410 1065 00:51:55,410 --> 00:51:57,240 1066 00:51:57,240 --> 00:52:02,010 1067 00:52:02,010 --> 00:52:06,230 1068 00:52:06,230 --> 00:52:09,809 1069 00:52:09,809 --> 00:52:12,809 1070 00:52:12,809 --> 00:52:18,210 1071 00:52:18,210 --> 00:52:19,950 1072 00:52:19,950 --> 00:52:23,210 1073 00:52:23,210 --> 00:52:26,670 1074 00:52:26,670 --> 00:52:31,920 1075 00:52:31,920 --> 00:52:33,180 1076 00:52:33,180 --> 00:52:36,089 1077 00:52:36,089 --> 00:52:38,760 1078 00:52:38,760 --> 00:52:43,140 1079 00:52:43,140 --> 00:52:46,079 1080 00:52:46,079 --> 00:52:50,039 1081 00:52:50,039 --> 00:52:51,510 1082 00:52:51,510 --> 00:52:52,920 1083 00:52:52,920 --> 00:52:54,510 1084 00:52:54,510 --> 00:52:56,329 1085 00:52:56,329 --> 00:53:03,930 1086 00:53:03,930 --> 00:53:05,640 1087 00:53:05,640 --> 00:53:07,620 1088 00:53:07,620 --> 00:53:11,490 1089 00:53:11,490 --> 00:53:13,920 1090 00:53:13,920 --> 00:53:17,069 1091 00:53:17,069 --> 00:53:18,630 1092 00:53:18,630 --> 00:53:20,490 1093 00:53:20,490 --> 00:53:22,260 1094 00:53:22,260 --> 00:53:24,539 1095 00:53:24,539 --> 00:53:26,609 1096 00:53:26,609 --> 00:53:28,109 1097 00:53:28,109 --> 00:53:29,400 1098 00:53:29,400 --> 00:53:31,320 1099 00:53:31,320 --> 00:53:34,080 1100 00:53:34,080 --> 00:53:35,610 1101 00:53:35,610 --> 00:53:37,980 1102 00:53:37,980 --> 00:53:39,930 1103 00:53:39,930 --> 00:53:42,300 1104 00:53:42,300 --> 00:53:45,270 1105 00:53:45,270 --> 00:53:48,060 1106 00:53:48,060 --> 00:53:50,010 1107 00:53:50,010 --> 00:53:52,230 1108 00:53:52,230 --> 00:53:53,910 1109 00:53:53,910 --> 00:53:56,460 1110 00:53:56,460 --> 00:53:57,990 1111 00:53:57,990 --> 00:54:01,320 1112 00:54:01,320 --> 00:54:05,030 1113 00:54:05,030 --> 00:54:07,740 1114 00:54:07,740 --> 00:54:10,260 1115 00:54:10,260 --> 00:54:12,320 1116 00:54:12,320 --> 00:54:17,910 1117 00:54:17,910 --> 00:54:19,320 1118 00:54:19,320 --> 00:54:23,190 1119 00:54:23,190 --> 00:54:25,470 1120 00:54:25,470 --> 00:54:28,970 1121 00:54:28,970 --> 00:54:32,520 1122 00:54:32,520 --> 00:54:34,950 1123 00:54:34,950 --> 00:54:38,040 1124 00:54:38,040 --> 00:54:40,320 1125 00:54:40,320 --> 00:54:43,020 1126 00:54:43,020 --> 00:54:44,820 1127 00:54:44,820 --> 00:54:47,400 1128 00:54:47,400 --> 00:54:49,730 1129 00:54:49,730 --> 00:54:52,620 1130 00:54:52,620 --> 00:54:54,000 1131 00:54:54,000 --> 00:54:56,850 1132 00:54:56,850 --> 00:55:01,440 1133 00:55:01,440 --> 00:55:01,450 1134 00:55:01,450 --> 00:55:10,380 1135 00:55:10,380 --> 00:55:14,110 1136 00:55:14,110 --> 00:55:16,630 1137 00:55:16,630 --> 00:55:19,180 1138 00:55:19,180 --> 00:55:20,410 1139 00:55:20,410 --> 00:55:22,300 1140 00:55:22,300 --> 00:55:29,920 1141 00:55:29,920 --> 00:55:32,470 1142 00:55:32,470 --> 00:55:35,110 1143 00:55:35,110 --> 00:55:40,660 1144 00:55:40,660 --> 00:55:44,200 1145 00:55:44,200 --> 00:55:46,120 1146 00:55:46,120 --> 00:55:47,680 1147 00:55:47,680 --> 00:55:49,870 1148 00:55:49,870 --> 00:55:52,570 1149 00:55:52,570 --> 00:55:56,500 1150 00:55:56,500 --> 00:55:59,290 1151 00:55:59,290 --> 00:56:01,690 1152 00:56:01,690 --> 00:56:03,820 1153 00:56:03,820 --> 00:56:05,880 1154 00:56:05,880 --> 00:56:10,300 1155 00:56:10,300 --> 00:56:13,950 1156 00:56:13,950 --> 00:56:17,410 1157 00:56:17,410 --> 00:56:19,450 1158 00:56:19,450 --> 00:56:20,740 1159 00:56:20,740 --> 00:56:22,030 1160 
00:56:22,030 --> 00:56:26,260 1161 00:56:26,260 --> 00:56:31,690 1162 00:56:31,690 --> 00:56:33,760 1163 00:56:33,760 --> 00:56:35,740 1164 00:56:35,740 --> 00:56:37,870 1165 00:56:37,870 --> 00:56:40,000 1166 00:56:40,000 --> 00:56:43,470 1167 00:56:43,470 --> 00:56:48,010 1168 00:56:48,010 --> 00:56:49,420 1169 00:56:49,420 --> 00:56:51,400 1170 00:56:51,400 --> 00:56:53,560 1171 00:56:53,560 --> 00:56:55,720 1172 00:56:55,720 --> 00:56:57,670 1173 00:56:57,670 --> 00:57:02,080 1174 00:57:02,080 --> 00:57:03,700 1175 00:57:03,700 --> 00:57:04,960 1176 00:57:04,960 --> 00:57:07,570 1177 00:57:07,570 --> 00:57:09,700 1178 00:57:09,700 --> 00:57:11,800 1179 00:57:11,800 --> 00:57:14,290 1180 00:57:14,290 --> 00:57:16,790 1181 00:57:16,790 --> 00:57:21,590 1182 00:57:21,590 --> 00:57:25,270 1183 00:57:25,270 --> 00:57:30,320 1184 00:57:30,320 --> 00:57:34,370 1185 00:57:34,370 --> 00:57:37,370 1186 00:57:37,370 --> 00:57:39,500 1187 00:57:39,500 --> 00:57:44,630 1188 00:57:44,630 --> 00:57:46,160 1189 00:57:46,160 --> 00:57:51,130 1190 00:57:51,130 --> 00:57:58,130 1191 00:57:58,130 --> 00:58:00,260 1192 00:58:00,260 --> 00:58:01,730 1193 00:58:01,730 --> 00:58:03,800 1194 00:58:03,800 --> 00:58:06,770 1195 00:58:06,770 --> 00:58:09,940 1196 00:58:09,940 --> 00:58:17,720 1197 00:58:17,720 --> 00:58:19,940 1198 00:58:19,940 --> 00:58:25,190 1199 00:58:25,190 --> 00:58:27,650 1200 00:58:27,650 --> 00:58:30,220 1201 00:58:30,220 --> 00:58:34,790 1202 00:58:34,790 --> 00:58:40,040 1203 00:58:40,040 --> 00:58:42,170 1204 00:58:42,170 --> 00:58:44,920 1205 00:58:44,920 --> 00:58:48,500 1206 00:58:48,500 --> 00:58:52,550 1207 00:58:52,550 --> 00:58:54,980 1208 00:58:54,980 --> 00:58:57,710 1209 00:58:57,710 --> 00:58:59,840 1210 00:58:59,840 --> 00:59:01,250 1211 00:59:01,250 --> 00:59:02,810 1212 00:59:02,810 --> 00:59:05,330 1213 00:59:05,330 --> 00:59:08,180 1214 00:59:08,180 --> 00:59:11,720 1215 00:59:11,720 --> 00:59:14,690 1216 00:59:14,690 --> 00:59:16,160 1217 00:59:16,160 --> 00:59:18,320 1218 00:59:18,320 --> 00:59:21,590 1219 00:59:21,590 --> 00:59:23,300 1220 00:59:23,300 --> 00:59:27,170 1221 00:59:27,170 --> 00:59:29,060 1222 00:59:29,060 --> 00:59:30,160 1223 00:59:30,160 --> 00:59:31,420 1224 00:59:31,420 --> 00:59:34,060 1225 00:59:34,060 --> 00:59:36,790 1226 00:59:36,790 --> 00:59:40,000 1227 00:59:40,000 --> 00:59:41,920 1228 00:59:41,920 --> 00:59:45,100 1229 00:59:45,100 --> 00:59:47,890 1230 00:59:47,890 --> 00:59:49,600 1231 00:59:49,600 --> 00:59:53,470 1232 00:59:53,470 --> 00:59:58,120 1233 00:59:58,120 --> 01:00:01,750 1234 01:00:01,750 --> 01:00:03,190 1235 01:00:03,190 --> 01:00:04,960 1236 01:00:04,960 --> 01:00:14,830 1237 01:00:14,830 --> 01:00:14,840 1238 01:00:14,840 --> 01:00:16,900