The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation, or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

All right, so today we're going to talk about cache-oblivious algorithms. Who remembers what a cache-oblivious algorithm is? Yes: oblivious to cache size. A cache-oblivious algorithm is an algorithm that automatically tunes for the cache size on your machine: it achieves good cache efficiency, and the code doesn't need any knowledge of the cache parameters of your machine. In contrast, a cache-aware algorithm would actually know the cache parameters of your machine, and the code would have the sizes of the cache written into it. Last time we saw one cache-oblivious algorithm, for matrix multiplication, and today we're going to talk about some other ones.

The first example I want to talk about is simulation of heat diffusion. Here's a famous equation known as the heat equation. This equation is in two dimensions, and we want to simulate a function u that has three parameters: t, x, and y. t is the time step, x and y are the coordinates of the 2D space, and we want to know the temperature at each (x, y) coordinate for any point in time t. The 2D heat equation can be modeled using a differential equation. How many of you have seen differential equations before? Okay, good, most of you. Here I'm showing the equation in two dimensions, but you get similar equations for higher dimensions. It says that the partial derivative of u with respect to t is equal to alpha (alpha is what's called the thermal diffusivity, a constant) times the sum of the second partial derivative of u with respect to x and the
second partial derivative of u with respect to y. This is a pretty famous equation, and you can see that we have a single derivative on the left side and double derivatives on the right side. So how do we actually write code to simulate this diffusion process? Often in scientific computing, people come up with these differential equations to describe physical processes, and then they want efficient code to actually simulate the physical process.

Here's an example of 2D heat diffusion. Say we start out with the configuration on the left, where the color corresponds to temperature: a brighter color means hotter, yellow is the hottest, and blue is the coldest. On the left we just have "6.172", which is the course number; if you didn't know that, you're probably in the wrong class. Then we run it for a couple of time steps, and the heat diffuses to the neighboring regions of the 2D space, so we might get the configuration on the right, where the heat is more spread out. Often we want to run this simulation for a number of time steps until the distribution of heat converges, that is, it becomes stable and doesn't change by much anymore, and then we stop the simulation.

This is the 1D heat equation. I showed you the 2D one earlier, but we're actually going to write code for the 1D heat equation, since it's simpler, and all the ideas generalize to higher dimensions. Here's the range of colors corresponding to temperatures: the hottest color is on the left and the coldest on the right. If you had a heat source on the left-hand side of this bar, then this might be a stable distribution: if you keep running the simulation, you might get a stable distribution of heat that looks like this.

So how do we actually write code to simulate this differential equation? One commonly used method is known as
finite difference approximation. We're going to approximate the partial derivative of u with respect to each of its coordinates. The partial derivative of u with respect to t is approximately u(t + Δt, x), where Δt is some small value, minus u(t, x), all divided by Δt. How many of you have seen this approximation method before, from your calculus class? Okay, good. As you bring the value of Δt down to zero, the thing on the right-hand side approaches the true partial derivative.

That's the partial derivative with respect to t; we also need the partial derivative with respect to x. Here I'm saying that the partial derivative with respect to x is approximately u(t, x + Δx/2) minus u(t, x - Δx/2), all divided by Δx. Notice that instead of adding Δx in the first term and nothing in the second term, I'm adding Δx/2 in the first term and subtracting Δx/2 in the second term. It turns out the approximation is still valid as long as the difference between the two arguments is Δx; I get to decide how to split up this Δx between the two terms in the numerator. The reason I chose Δx/2 here is that the math is going to work out nicely and give us cleaner code.

That's just the first partial derivative with respect to x, and I actually need the second partial derivative, since the right-hand side of the heat equation has a second partial derivative. So I take the partial derivative with respect to x of each of the terms in the numerator of the equation above, and now I can plug in the value of that partial derivative by applying the equation above with the arguments t and x + Δx
over 2, and similarly for the second term. For the first term, when I plug it into the equation for the partial derivative with respect to x, I get u(t, x + Δx) minus u(t, x); for the second term, I get u(t, x - Δx) minus another factor of u(t, x). That's why I'm subtracting 2·u(t, x) in the numerator here. The partial derivative of each of the terms in the numerator also divides by a Δx, so in the denominator I get (Δx)². Now I have the second partial derivative with respect to x, and I also have the first partial derivative with respect to t, so I can just plug them into the heat equation: the left-hand side is the time derivative, and the right-hand side is alpha times the second space derivative. This is what the 1D heat equation reduces to using the finite difference approximation method. Any questions on this?

Okay, so how do we actually write code to simulate this equation? We're going to use what's called a stencil computation. I'm going to set Δx and Δt equal to 1 just for simplicity, but in general you can set them to whatever you want; you can make them smaller for a more fine-grained simulation. With Δx = Δt = 1, the denominators of these two terms become 1 and I don't need to worry about them. I'm going to represent my space using a 2D matrix, where the horizontal axis represents values of x and the vertical axis represents values of t. I want to fill in all of the entries with a black dot. The ones with an orange dot are my boundaries, which are fixed throughout the computation: I don't do any computation on those, they're given to me as input. They could be heat sources, if we're doing a heat simulation. Now I can actually write code to simulate this equation.
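Collecting the finite-difference derivation above in one place, in the notation used so far:

```latex
% finite-difference approximations of the two derivatives
\frac{\partial u}{\partial t}(t,x) \approx \frac{u(t+\Delta t,\,x)-u(t,\,x)}{\Delta t},
\qquad
\frac{\partial u}{\partial x}(t,x) \approx \frac{u(t,\,x+\Delta x/2)-u(t,\,x-\Delta x/2)}{\Delta x}

% applying the first-derivative approximation twice gives the second derivative
\frac{\partial^2 u}{\partial x^2}(t,x) \approx
  \frac{u(t,\,x+\Delta x)-2\,u(t,\,x)+u(t,\,x-\Delta x)}{(\Delta x)^2}

% so the 1D heat equation  \partial u/\partial t = \alpha\,\partial^2 u/\partial x^2  becomes
\frac{u(t+\Delta t,\,x)-u(t,\,x)}{\Delta t}
  = \alpha\,\frac{u(t,\,x+\Delta x)-2\,u(t,\,x)+u(t,\,x-\Delta x)}{(\Delta x)^2}
```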
To compute u(t+1, x), I go up to the equation, bring the negative u(t, x) term over to the right, and get u(t+1, x) = u(t, x) + α·(u(t, x+1) - 2·u(t, x) + u(t, x-1)). As I said before, we just keep iterating this until the temperatures become stable. I proceed forward in time, which is going up in this picture. To compute one of these points, say u(t+1, x), I need to know the value of u(t, x), which is the point directly below it in the matrix, and I also need u(t, x+1) and u(t, x-1), which are below it and diagonal to the left and the right. So each value depends on three other values; this is called a three-point stencil. That's the pattern this equation represents, and in general a stencil computation updates each point in an array using a fixed pattern, called a stencil.

I'm going to do the same thing for all of the other points: I compute all the values of x for a given time step, then move on to the next time step, and I keep doing this until the distribution of temperatures becomes stable, and then I'm done. These stencil computations are widely used in scientific computing: weather simulations, stock market simulations, fluid dynamics, image processing, probability, and so on. They're used all over the place in science, so this is a very important concept to know.

Now, say I run the code as I showed you in the animation, completing one row at a time before moving on to the next row. How would this code perform with respect to caching? Right: if n is small, this does pretty well, but what if n is much larger than the size of your cache? You would do badly. And why is that? Well, there could be some reuse here, because when I compute the second row I'm actually using some values that were
computed in the first row. But if the row is much larger than the cache size, then by the time I get to the second row, the values that were loaded into cache from the first row will have been evicted, and I'll suffer cache misses again for the values I need to load for the second row, even though they could have been reused if the code had good locality.

Another question: if we only care about the values of x at the most recent time step, do we actually have to keep around this whole 2D matrix, or can we get by with less storage? How many rows would I have to keep around if I only care about the most recent time step? Two. And why is that? Right: I need the row from the previous time step in order to compute the values in the current time step, and after the current time step I can just swap the roles of the two rows and reuse the previous row for the next step. When I'm computing the second row, I need to keep around all of the values computed in the first row, because those values get fed into the computations in the second row. Ah, I see what you're saying; that's actually a good observation: you only need a constant amount of extra storage beyond one row, because you can just overwrite the values as you go through the row. If you keep around one row, some of its values will be from the current time step and some from the previous time step. Good observation.

Okay, so that code as we saw it wasn't very cache-efficient. You could make it cache-efficient using tiling, but we're going to go straight to the cache-oblivious algorithm, because it's much cleaner. First, let's recall the ideal cache model that we talked about in the previous lecture. We have a two-level hierarchy: a cache, and then the main memory.
The cache has a size of M bytes, and a cache line is B bytes, so you can keep M/B cache lines in your cache. If something is in cache and you operate on it, it doesn't incur any cache misses, but if you have to go to main memory to load the cache line in, you incur one cache miss. The ideal cache model assumes that the cache is fully associative, so any cache line can go anywhere in the cache, and it also assumes either an optimal omniscient replacement policy or the LRU policy. The optimal omniscient replacement policy knows the sequence of all future requests to memory, and when it needs to evict something, it picks the thing whose eviction leads to the fewest cache misses overall. The LRU policy just evicts the thing that was least recently used. But we saw in the previous lecture that, in terms of asymptotic costs, these two replacement policies give you cache-miss counts within a constant factor of each other, so you can use either one, depending on what's convenient.

The two performance measures we care about when analyzing an algorithm in the ideal cache model are the work and the number of cache misses. The work is just the total number of operations the algorithm incurs; serially, this is just the ordinary running time. The number of cache misses is the number of cache lines you have to transfer between the cache and main memory.

Now, suppose we're analyzing an algorithm in the ideal cache model and it runs serially. What kinds of cache misses does the ideal cache model not capture? Remember, we talked about several types of cache misses, and there's one type this model doesn't capture when we're running serially. We're assuming no parallelism here, and sharing misses only come about when you have parallelism. Yes: the answer is conflict misses. And why does this model not capture them? Because this is a
fully associative cache: any cache line can go anywhere in the cache. You can only get conflict misses in set-associative schemes, where each cache line can only map to a particular set; if too many cache lines map to the same set, they keep evicting each other, even though the rest of the cache might have space. That's what's called a conflict miss. The ideal cache model does capture capacity misses, so it's still a very good model to use at a high level when you're designing efficient algorithms, because it encourages you to optimize for spatial and temporal locality. Once you have a good algorithm in the ideal cache model, you can start dealing with conflict misses using some of the strategies we talked about last time, such as padding or using temporary memory. Any questions on this?

Okay, here's the code that does the heat simulation we saw earlier. It's just a nested for loop: in the outer loop we loop over the time dimension, and in the inner loop we loop over the space dimension, so we compute all the values of x before moving on to the next time step. We store just two rows, using a trick called the even-odd trick, and here's how it works. The index of the next row we want to compute is just (t + 1) mod 2, and the index of the current row is t mod 2; this implicitly swaps the roles of the two rows as we progress through time. Then we set u[(t + 1) mod 2][x] equal to kernel(&u[t mod 2][x]), where the kernel function is defined up here. Recall that when we pass a pointer to this kernel function, we can treat the pointer as the beginning of an array; that's why we use array notation inside the kernel function. The array w is passed as input, and we return w[0], the element at the current pointer, plus alpha times (w[-1] - 2·w[0] + w[1]), where w[-1] is the element just before the one we're pointing to and w[1] is the element just after it.
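As a concrete reference, here is a minimal, self-contained sketch of the looping code just described. The array size N, step count T_STEPS, diffusivity ALPHA, and the boundary handling are illustrative choices, not taken from the slide; the kernel matches the one described above.

```c
/* Illustrative parameters (not from the lecture's slides). */
#define N 16
#define T_STEPS 100
#define ALPHA 0.1   /* must be <= 0.5 for stability when dx = dt = 1 */

/* The three-point stencil kernel: w points into the current row. */
static double kernel(const double *w) {
    return w[0] + ALPHA * (w[-1] - 2.0 * w[0] + w[1]);
}

/* Advance T_STEPS time steps, keeping only two rows (the even-odd
 * trick): row t % 2 is the current row, row (t + 1) % 2 is the next. */
void simulate(double u[2][N]) {
    for (int t = 0; t < T_STEPS; ++t) {
        for (int x = 1; x < N - 1; ++x)
            u[(t + 1) % 2][x] = kernel(&u[t % 2][x]);
        /* copy the fixed boundary values forward */
        u[(t + 1) % 2][0] = u[t % 2][0];
        u[(t + 1) % 2][N - 1] = u[t % 2][N - 1];
    }
}
```

The stability condition in the comment is the standard one for this explicit scheme; with Δx = Δt = 1 the update is a convex combination of neighbors only when α ≤ 1/2.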
Okay, let's look at the caching behavior of this code. We'll analyze the cache complexity, and we'll assume the LRU replacement policy, because we can. As we said before, the code loops through one entire row at a time before going on to the next row. Assuming n is greater than M, so that the row size is greater than the cache size, the number of cache misses is Θ(nT/B). How do I get this bound? How many cache misses do I incur per row of the 2D space? Right: Θ(n/B) cache misses per row, because I can load in B bytes at a time, so I benefit from spatial locality, and I have n elements to compute. And as we said before, by the time I get to the next row, the values I need from the previous row have already been evicted from cache, so I incur Θ(n/B) cache misses for every row, and the number of rows I compute is T. That gives Θ(nT/B) in total. Any questions on this analysis?

How many of you think we can do better than this? Okay, one person, two, three. It turns out that we can do better. You could do better with tiling, but I'm not going to do the tiling version; I want to do the cache-oblivious version. The cache-oblivious version works on trapezoidal regions of the 2D space. Recall that a trapezoid has a top base and a bottom base; here the top base is at t1 and the bottom base is at t0, so the height is t1 - t0, and the width of the trapezoid is its width at the midpoint between t1 and t0, that is, at (t1 + t0)/2. We're going to compute all of the points inside this
trapezoid that satisfy these inequalities: t has to be greater than or equal to t0 and less than t1, and x has to be greater than or equal to x0 + dx0·(t - t0), where dx0 is the inverse slope on the left side, and less than x1 + dx1·(t - t0), where dx1 is the inverse slope on the other side. dx0 and dx1 each have to be -1, 0, or 1: a value of -1 corresponds to an inverse slope of -1, which is also a slope of -1; a value of 1 is a slope, or inverse slope, of 1; and a value of 0 is just a vertical line.

The nice property of this trapezoid is that we can compute everything inside it without looking outside it, so we can compute everything here independently of any other trapezoids we might generate. We're going to come up with a divide-and-conquer approach to execute this code. The divide-and-conquer algorithm has a base case: when the height of the trapezoid is 1, we just compute all of the values using a simple loop. Any order of computation inside this loop is valid, since we already have all the values in the base of the trapezoid, and the values in the top of the trapezoid don't depend on each other. So that's the base case. Any questions so far?

Now for the recursive cases; it turns out we'll have two different kinds of cuts. The first is called a space cut. I do a space cut if the width of the trapezoid is greater than or equal to twice the height. That means the trapezoid is too wide, and I cut it vertically; more specifically, I cut it with a line of slope -1 going through the center of the trapezoid. Then I traverse the trapezoid on the left side first, and after I'm done with that, I traverse the trapezoid on the right side.
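In symbols, the region and the cut rule are:

```latex
% the trapezoid: all points (t, x) with
t_0 \le t < t_1,
\qquad
x_0 + dx_0\,(t - t_0) \;\le\; x \;<\; x_1 + dx_1\,(t - t_0),
\qquad dx_0, dx_1 \in \{-1, 0, 1\}

% height, and width measured at the midpoint in time
h = t_1 - t_0,
\qquad
w = (x_1 - x_0) + (dx_1 - dx_0)\,\frac{t_1 - t_0}{2}

% choose the cut: space cut when too wide, time cut when too tall
w \ge 2h \;\Rightarrow\; \text{space cut (line of slope } -1\text{)},
\qquad
w < 2h \;\Rightarrow\; \text{time cut (horizontal line)}
```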
Can I actually switch the order, and compute the stuff on the right side before the stuff on the left side? No. Why is that? Right: some points in the right trapezoid depend on values from the left trapezoid. For the left trapezoid, every point we want to compute already has everything it needs, given the values on its base; but for the right trapezoid, some of the values depend on values in the left trapezoid, so we can't execute the right trapezoid until we're done with the left one. This is also the reason I cut the trapezoid with a slope of -1 instead of using a vertical cut: if I did a vertical cut, then each of the two trapezoids would contain points that depend on the other trapezoid. So that's one of the two cuts, the space cut, and it happens when the trapezoid is too wide.

The other cut is the time cut, where I cut with respect to the time dimension. This happens when the trapezoid is too tall: when the width is less than twice the height, I cut it with a horizontal line through the center, and I traverse the bottom trapezoid first; after I'm done with that, I traverse the top trapezoid. Again, the top trapezoid depends on some points from the bottom trapezoid, so I can't switch the order. Any questions?

Okay, let's now look at the code that implements this recursive divide-and-conquer algorithm. Here's the C code. It takes as input t0 and t1, the coordinates of the bottom and the top of the trapezoid; x0, the left side of the bottom base; dx0, the inverse slope on the left; x1, the right side of the bottom base; and dx1, the inverse slope on the right. We first compute the height of the trapezoid. We let lt be the height, and that's
just t1 - t0. If the height is 1, then we just use a for loop over all the points in that height-1 trapezoid, calling the kernel function we defined before. Otherwise the height is greater than 1, and we check whether we should do a space cut or a time cut. We do a space cut if the trapezoid is too wide, and the condition inside the if clause checks exactly that. If so, we make two recursive calls to trapezoid, cutting in the middle with a slope of -1: you can see the -1's in the recursive calls. Otherwise we do a time cut: we cut at the value of t equal to t0 + lt/2, and again we have two recursive calls to trapezoid.

Even though I only ever generate slopes of -1 in the recursive calls, this code works even if I start with slopes of 1 or 0. For example, if I start with a rectangular region, the slopes are just 0, and the code still works. But most of the slopes I deal with will be -1, because those are the only new slopes I generate. Any questions?

So this code is very concise, and it turns out that the even-odd trick still works here: you can still keep around just two rows, because you're guaranteed not to overwrite any value until everything that depends on it has been computed. But when you're using just two rows, the values in a particular row might not all correspond to the same time step, because we're not finishing an entire row before moving on to the next one; we're partially computing rows.
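Here is a self-contained sketch of the trapezoid routine along the lines just described. It follows the structure of the code on the slide, but the array size N and the diffusivity ALPHA are illustrative choices, and u and kernel are set up as in the looping version.

```c
/* Illustrative parameters (not from the lecture's slides). */
#define N 32
#define ALPHA 0.1

static double u[2][N];   /* two rows, indexed by time mod 2 */

static double kernel(const double *w) {
    return w[0] + ALPHA * (w[-1] - 2.0 * w[0] + w[1]);
}

/* Compute all points (t, x) with t0 <= t < t1 and
 * x0 + dx0*(t - t0) <= x < x1 + dx1*(t - t0). */
void trapezoid(int t0, int t1, int x0, int dx0, int x1, int dx1) {
    int lt = t1 - t0;
    if (lt == 1) {
        /* base case: a height-1 trapezoid; any order over x is valid */
        for (int x = x0; x < x1; ++x)
            u[(t0 + 1) % 2][x] = kernel(&u[t0 % 2][x]);
    } else if (lt > 1) {
        /* space cut if width >= 2 * height (scaled by 2 to stay integral) */
        if (2 * (x1 - x0) + (dx1 - dx0) * lt >= 4 * lt) {
            /* xm: where the slope -1 cut through the center meets the bottom */
            int xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * lt) / 4;
            trapezoid(t0, t1, x0, dx0, xm, -1);   /* left trapezoid first */
            trapezoid(t0, t1, xm, -1, x1, dx1);   /* then the right one */
        } else {
            /* time cut: bottom half first, then the top half */
            int halflt = lt / 2;
            trapezoid(t0, t0 + halflt, x0, dx0, x1, dx1);
            trapezoid(t0 + halflt, t1, x0 + dx0 * halflt, dx0,
                      x1 + dx1 * halflt, dx1);
        }
    }
}
```

Calling trapezoid(0, T, 1, 0, N - 1, 0) on a rectangular region of interior points produces the same values as the looping version, as long as the fixed boundary values are present in both rows.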
Now let's analyze the cache complexity of this algorithm, again using the recursion-tree approach that we talked about last time. The code splits itself into two subproblems at every level until it gets to a leaf, and each leaf represents Θ(hw) points, where h is the height, w is the width, and h = Θ(w). We have h = Θ(w) by the nature of the algorithm: when a trapezoid becomes too wide, we make it less wide with a space cut, and when it becomes too tall, we do a horizontal cut, so the height and the width are within a constant factor of each other when we reach the base case.

Each leaf incurs Θ(w/B) cache misses, because we have to load the base of the trapezoid in from main memory; once that's in cache, we can compute all of the other points in the trapezoid without incurring any more cache misses. In the analysis, we'll set w = Θ(M), because that's the point at which the trapezoid fits into cache. The algorithm itself has no knowledge of the parameter M, so it will keep dividing and conquering down to base cases of size 1, but just for the analysis we use a base case where w = Θ(M).

The number of leaves is Θ(nT/(hw)), since each leaf covers Θ(hw) points. The number of internal nodes is the number of leaves minus one, since we have a tree, but the internal nodes don't contribute substantially to the cache complexity: each one incurs only a constant number of cache misses to set up its two recursive calls, and nothing more expensive than that. So we just need to count the cache misses at the leaves: Θ(nT/(hw)) leaves, each incurring Θ(w/B) cache misses. Multiplying these out, the w terms cancel, leaving Θ(nT/(hB)); setting h and w to Θ(M), we're left with Θ(nT/(MB)) as our cache complexity.
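The leaf count and per-leaf cost multiply out as follows:

```latex
% number of leaves, each covering \Theta(hw) points of the nT space
\#\text{leaves} = \Theta\!\left(\frac{nT}{hw}\right),
\qquad
\text{misses per leaf} = \Theta\!\left(\frac{w}{B}\right)

% total cache misses, with h = w = \Theta(M) chosen so a leaf fits in cache
Q = \Theta\!\left(\frac{nT}{hw}\right)\cdot\Theta\!\left(\frac{w}{B}\right)
  = \Theta\!\left(\frac{nT}{hB}\right)
  = \Theta\!\left(\frac{nT}{MB}\right)
```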
Question? Sure: each leaf incurs only Θ(w/B) cache misses because we need to load the values of the base of the trapezoid, and that incurs Θ(w/B) misses since the base is Θ(w) wide. Everything else fits into cache, so when we compute the remaining points, the previous values we need are already in cache, and we incur no further misses. Does that make sense? Okay.

This was the analysis for one dimension, but it generalizes to more than one dimension. In general, with d dimensions, the cache complexity is Θ(nT / (M^(1/d) · B)). If d is 1, that reduces to nT/(MB); if d is 2, it's nT/(B·√M); and so on. It turns out this bound is also optimal. Any questions? So, compared to the looping code, this code has much better cache complexity: it saves a factor of M. The looping code had cache complexity Θ(nT/B); it didn't have an M in the denominator.

Okay, we're now going to do a simulation comparing how the looping code and the trapezoidal code run. In this simulation, the green points correspond to cache hits and the purple points to cache misses. We're simulating a fully associative cache with the LRU replacement policy, where a cache line holds four points and the cache holds 32 points, and we set the cache hit latency to one cycle and the cache miss latency to ten cycles, so an order of magnitude slower for misses. We're doing this on a rectangular region where n is 95, for 87 time steps. So let me pull up the simulation. On the left-hand side we have the looping code, and on the right-hand side the trapezoidal code. Let's start. You can see that the looping code is going over one row at a
time, whereas the trapezoidal code is not doing that: it partially computes one row and then moves on to the next. I can also show the cuts generated by the trapezoidal algorithm; let me remember how to do this... there, those are the cuts generated by the trapezoidal algorithm. I can speed this up, and you can see that the trapezoidal algorithm is incurring far fewer cache misses than the looping code. I'll make this go faster, and the trapezoidal code finishes while the looping code is still slowly making its way up to the top. Okay, it's finally done. Any questions on this?

In which regions? So you get a cache miss for the purple dots here: the cache line size is four, so you get a cache miss for the first point and then hits on the next three points, then another cache miss for the fifth point and hits on the sixth, seventh, and eighth points. For the row above it, we're assuming the values it needs fit in cache already, so we don't need to load them from memory: the points above just depend on values we've already computed, and those are already in cache. The base of the trapezoid is already in cache, and for the row right above it we just look at those values. Does that make sense? Any other questions?

Do you want to see this again? I could let this run for the rest of the lecture, but I have more interesting material I want to talk about, so let's just stop after this finishes. As you can see, the looping code is slowly making its way to the top; it's much slower than the trapezoidal code.

Okay, that was for one dimension; now let's look at what happens in two dimensions. Here I'm going to show another demo, and in this demo I'm actually going to run the code for real. The previous demo was just a simulation, but this is going to happen in real time, and I'm going to
simulate the heat in a 2D space. Recall that the colors correspond to temperatures, so a brighter color means hotter. Let me pull up the other demo. Okay, here my mouse cursor is the source of heat: you can see it making the points around the cursor hot, and then the heat slowly diffuses its way to the other points. I can move the cursor so that the heat source changes, and then the heat I generated earlier slowly goes away.

In the lower left-hand corner I'm showing the number of iterations per second of the code, and we can see that the looping code is doing about 1,560 iterations per second. Let's see what happens when we switch to the trapezoidal code. The trapezoidal code is doing about 1,830 iterations per second, so it's a little bit faster, but not by too much. Does anyone have an idea why we're seeing this behavior? We said that the trapezoidal code incurs many fewer cache misses than the looping code, so we would expect it to be significantly faster, but here it's only a little bit faster.

Right, that's a good point: in 2D you're only saving a factor of √M instead of M. But √M is still pretty big compared to the speedup we're getting here. Any other ideas? Yes, there is a constant factor in the trapezoidal code, but even after accounting for the constant factor, you should still see a better speedup than this. So, even accounting for constant factors, what other problem might be going on here? That's another good observation, but the caches we're using should still behave pretty close to the ideal cache model; you might be off by a small constant factor, but not by too much. Sorry, could you repeat that? Yes, if I move the cursor, it's probably going to get
slower, but only by a little bit, and that doesn't really affect the performance; I can just leave my cursor there, and this is what the iterations per second is. Yes, there is some other factor dominating. Does anyone have an idea what that factor might be? No, it's not the rendering: I ran the code without showing the graphics, and the performance was similar. Yes, there could be other things using the cache, but that would be true for both of the programs, and I don't have anything intensive running; I don't think PowerPoint uses much of my cache.

All right, let's look at why this is the case. It turns out the hardware is actually helping the looping code. The question is: how come the cache-oblivious trapezoidal code incurs so many fewer cache misses, but the advantage it gains over the looping version is so marginal? The answer is that the hardware is helping the looping code by doing hardware prefetching, and hardware prefetching works pretty well for the looping code because its access pattern is very regular: it goes one row at a time. The hardware can predict the memory access pattern of the looping code, and therefore it can bring in the cache lines the code will need before it actually gets to that part of the computation. So prefetching is helping the looping code, and it's not helping the trapezoidal code nearly as much, because the trapezoidal code's access pattern is less regular. Prefetching does use memory bandwidth, but when you're using just a single core, you have more than enough bandwidth to take advantage of the hardware prefetching capabilities of the machine. Later on, we'll see that when we run in parallel, memory bandwidth does become more of an issue, when multiple processors are all using the memory.

Question? Yes, you can do software prefetching; there are instructions to do software prefetching.
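As a toy illustration of what software prefetching looks like (this example is not from the lecture): __builtin_prefetch is a GCC/Clang builtin that hints that a cache line should be fetched early. The lookahead distance of 64 elements here is an arbitrary illustrative choice that would need tuning for a real machine.

```c
#include <stddef.h>

/* Sum an array while prefetching a cache line well ahead of the
 * current position. The prefetch is only a hint: correctness does
 * not depend on it, only (potentially) performance. */
double sum_with_prefetch(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + 64 < n)
            __builtin_prefetch(&a[i + 64], 0, 3);  /* read, high locality */
        s += a[i];
    }
    return s;
}
```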
prefetching. Hardware prefetching is usually more efficient, but it's less flexible than software prefetching. But here we're not actually doing that — we're just taking advantage of hardware prefetching. Yeah — so we didn't actually try that; the trapezoidal code could benefit a little bit if we used some software prefetching, although I think the looping code would probably benefit as well if we did that. Yes — sorry? Uh, yeah, so hardware prefetching is enabled; it's always done by the hardware on the machines that we're using today. So this was a pretty surprising result, but we'll actually see the effect of the memory bandwidth later on when we look at the parallel code. Any other questions before I continue? Okay, so let's now look at the interplay between caching and parallelism. This was a theorem that we proved in the previous lecture, so let's just recall what it says. Let Q_P be the number of cache misses in a deterministic Cilk computation when run on P processors, each with a private cache, and let S_P be the number of successful steals during the computation. In the ideal cache model with a cache size of M and a block size of B, the number of cache misses on P processors is equal to the number of cache misses on one processor plus order the number of successful steals times M over B — that is, Q_P = Q_1 + O(S_P · M/B). And last time we also said that the number of successful steals is upper bounded by the span of the computation times the number of processors, so if you minimize the span of your computation, then you can also minimize the number of successful steals. And then for low-span algorithms, the first term, the Q_1 term, is usually going to dominate. I'm not going to go over the proof — we did that last time. The moral of the story is that minimizing cache misses in the serial execution essentially minimizes them in the parallel execution, assuming that you have a low-span algorithm. So let's see whether our trapezoidal algorithm works in parallel. Does the space cut work in parallel?
Recall that for the space cut, I'm cutting with a slope of negative 1 through the center, because the trapezoid is too wide. So can I execute the two trapezoids in parallel here? No — the reason is that the right trapezoid depends on the result of the left trapezoid, so I can't execute them at the same time. But there is a way that I can execute trapezoids in parallel: instead of just doing the cut through the center, I'm actually going to do a V cut. So now I have three trapezoids. The two trapezoids in black — I can actually execute those in parallel, because they're independent: everything in those two trapezoids just depends on the base of that trapezoid. And after I'm done with the two trapezoids labeled 1, then I can compute the trapezoid labeled 2. This is known as a parallel space cut: it produces two black trapezoids as well as a gray trapezoid; the two black trapezoids execute in parallel, and afterwards the gray trapezoid executes. And this is done recursively as well. Any questions? No? Okay. We also had the time cut — oh, sorry, so if the trapezoid is inverted, then we're going to do this upside-down V cut, and in this case we're going to execute the middle trapezoid before we execute the two trapezoids on the sides. For the time cut, it turns out that we're just going to use the same cut as before, and we're just going to execute the two trapezoids serially. So we do get a little bit of parallelism here, from the parallel space cut. Let's look at how the parallel codes perform now. Okay, so this was the serial looping code; here's the parallel looping code. We had about 1,450 before, about 3,700 now — a little over twice the speed-up, and this is on a four-core machine; it's just my laptop. Sorry? Oh — yeah, sure. Yeah, it's slowing down a little bit, but not by too much. Okay, let's look at the trapezoidal code. As we saw before, the trapezoidal code does about 1,840 iterations per second, and we can parallelize this — so now it's doing about 5,350 iterations per second, so it's getting about a factor of three speed-up.
Okay, I can move it around a little bit more if you want to see — this is the serial trapezoid and the parallel trapezoid. Is everyone happy? Okay. Because I had to do this in real time, the input size wasn't actually that big, so I ran the experiment offline, without the rendering, on a much larger input. This input is a 3,000-by-3,000 grid, and I did this for 1,000 time steps using four processor cores, and my cache size — the last-level cache size — is 8 megabytes, so the input size here is much larger than my last-level cache size. And here are the times that I got. The serial looping code took about 129 seconds, and when we did it in parallel it was about 66 seconds, so we got about a factor of two speed-up, which is consistent with what we saw in the demo. The trapezoidal code actually got a better speed-up when we increased the input size — I got about a factor of four speed-up — and this is because for a larger input size, cache efficiency plays a much larger role, since the cache is so small compared to our input size. So here we see that the parallel looping code achieves less than half of its potential speed-up, even though the parallel looping code has much more potential parallelism than the trapezoidal code — the trapezoidal code only had a little bit of parallelism, just from the space cuts — whereas the trapezoidal code is actually getting pretty good speed-up. This is near-linear speed-up: I'm using four cores, and it's getting a 3.96x speed-up. So what could be going on here? Another thing to look at is to compare the serial trapezoidal code with the serial looping code, as well as the parallel trapezoidal code with the parallel looping code. If you look at the serial trapezoidal code, you see that it's about twice as fast as the serial looping code, but the parallel trapezoidal code is about four times faster than the parallel looping code. And the reason here is that hardware prefetching
can't help the parallel looping code that much: when you're running in parallel, all of the cores are using memory, and there's a memory bandwidth bottleneck, and prefetching actually needs to use memory bandwidth. So in the serial case we had plenty of memory bandwidth we could use for prefetching, but in the parallel case we don't have much memory bandwidth to spare. That's why, in the parallel case, the trapezoidal code gets a better speed-up over the parallel looping code compared to the serial case. The trapezoidal code also gets a better speed-up because it does things more locally, so it needs fewer memory operations — there's a scalability bottleneck at the memory interconnect, but because the trapezoidal code is cache-oblivious, it does a lot of work in cache, whereas the looping code does more accesses to main memory. Any questions on this? Okay, so when we have a parallel speed-up bottleneck, how do we know what's causing it? There are several main things we should look at: whether our algorithm has insufficient parallelism, whether the scheduling overhead is dominating, whether we have a lack of memory bandwidth, or whether there's contention going on — and contention can refer to either locking or true and false sharing, which we talked about in the last lecture. The first two are usually quite easy to diagnose: you can compute the work and the span of your algorithm, and from that you can get the parallelism. You can also use Cilkscale to help you diagnose the first two problems, because Cilkscale can tell you how much parallelism you're getting in your code, and it can also tell you the burdened parallelism, which includes the scheduler overhead. What about memory bandwidth — how can we diagnose that? Does anyone have any ideas? I can tell you one way to do it: I can open up my hardware and take out all of my memory chips except
for one of them, then run my serial code, and if it slows down, that means it was probably memory-bandwidth bound when I ran it in parallel. But that's a pretty heavy-handed way to diagnose this problem. Is there anything we can do with just software? Yes — yeah, so you could use a tool like Valgrind to count the number of memory accesses. Yes — so you can make sure only one processor is being used, but it might be using more memory than just the memory from one chip. There's actually a simpler way to do this: the idea is that we'll just run P identical copies of the serial code, so they'll all be executing in parallel, and if the execution slows down, that means they were probably contending for memory bandwidth. Does that make sense? One caveat is that you can only do this if you have enough physical memory, because when you're running P identical copies, you have to use more DRAM than if you just ran one copy. But oftentimes you can isolate some part of the code that you think has a performance bottleneck and just execute that part of the program with P copies in parallel, and hopefully that will take less memory. There are also hardware counters you can check, if you have root access to your machine, that can measure how much memory bandwidth your program is using — but this is a pretty simple way to do it. So there are ways to diagnose a lack of memory bandwidth. It turns out that contention is much harder to diagnose. There are tools that exist that detect lock contention in an execution, but usually they only detect contention when the contention actually happens — and the contention doesn't have to happen every time you run your code — so these tools don't detect the potential for lock contention. And the potential for true and false sharing is even harder to detect, especially false sharing, because if you're using a bunch of variables in your code, you don't know which of those
map to the same cache line — so this is much harder to detect. Usually when you're trying to debug a speed-up bottleneck in your code, you would first look at the first three things here, and once you've eliminated those, you can start looking at whether contention is causing the problem. Any questions? Okay. So we've talked about stencil computation; I now want to talk about another problem, sorting, and we want to do it cache-efficiently. So let's first analyze the cache complexity of a standard merge sort. We first need to analyze the complexity of merging, because it's used as a subroutine in merge sort. As you recall, in merging we're given two sorted input arrays, and we want to generate an output array that's also sorted, containing all the elements from the two input arrays. The algorithm maintains a pointer to the head of each of the input arrays; it compares the two elements, takes the smaller one, puts it into the output, and increments the pointer for that array. We keep doing this — taking the smaller of the two elements — until the two input arrays become empty, at which point we're done: we have one sorted output array. So to merge n elements, the time to do this is just Θ(n) — here n is the sum of the sizes of my two input arrays — because I'm only doing a constant amount of work for each of my input elements. What about the number of cache misses — how many cache misses will I incur when merging n elements? Yes — yeah, so I'm going to incur Θ(n/B) cache misses, because my two input arrays are stored contiguously in memory, so I can read them B bytes at a time with just one cache miss, and my output array is also stored contiguously, so I can write it out B bytes at a time with just one cache miss. I might waste a cache line at the beginning and end of each of my three arrays, but that only affects the bound by a constant factor, so the cache complexity is Θ(n/B).
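The two-pointer merge just described can be sketched as follows (the lecture's actual code would be in C; this Python version is just illustrative):

```python
def merge(a, b):
    """Merge two sorted lists into one sorted output.

    Work is Theta(n) for n = len(a) + len(b), and since all three
    arrays are scanned sequentially, the merge incurs Theta(n/B)
    cache misses for cache-line size B.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:        # take the smaller of the two heads
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])           # one input is exhausted; copy the rest
    out.extend(b[j:])
    return out
```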
So now let's look at merge sort. Recall that merge sort has two recursive calls to itself on inputs of half the size, and then it does a merge at the end to combine the two sorted outputs of its recursive calls. If you look at how the recursion proceeds, it first divides the input array into two halves, then divides again and again until it gets to the base case of just one element, at which point it returns. Then we start merging pairs of these elements together: now I have arrays of size two in sorted order; I merge pairs of those arrays and get subarrays of size four; and finally I do this one more time to get my sorted output. Okay, so let's review the work of merge sort. What's the recurrence for merge sort if we're computing the work? Yes — yeah, that's correct: I have two subproblems of size n/2, and then I need to do Θ(n) work to do the merge, so W(n) = 2 W(n/2) + Θ(n). This is case two of the master theorem: I compute log base b of a, which is log base 2 of 2, and that's 1, and that's the same as the exponent in the term that I'm adding in; since they're the same, I add in an additional log factor, and my overall work is just Θ(n log n). Okay, so now I'm going to solve this recurrence again using the recursion tree method. I'm still going to get Θ(n log n), but I'm doing this because it's going to be useful when we analyze the cache complexity. At the top level I have a problem of size n, and I branch into two problems of size n/2; when I'm done with them, I have to do a merge, which takes Θ(n) work — I'm just putting n here, ignoring the constants. I branch again, and each of those is going to do n/2 work to merge, and I get all the way down to my base case after log base 2 of n levels. The top level is doing n work; the second level is also doing n work; and it's going to be the same for all levels down to the leaves — the leaves are also doing a linear amount of work in total.
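The recursion just walked through, sketched in Python (again, illustrative only):

```python
def merge_sort(a):
    """Binary merge sort: W(n) = 2 W(n/2) + Theta(n) = Theta(n log n)."""
    if len(a) <= 1:              # base case: one element is already sorted
        return list(a)
    mid = len(a) // 2
    left = merge_sort(a[:mid])   # sort the two halves recursively
    right = merge_sort(a[mid:])
    out, i, j = [], 0, 0         # two-pointer merge of the sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]
```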
So the overall work is just the work per level times the number of levels, which is Θ(n log n). Any questions? Okay, so now let's analyze this with caching. We said earlier that the cache complexity of the merging subroutine is Θ(n/B), and here's the recurrence for the cache complexity of merge sort. My base case is when n is less than or equal to cM for some sufficiently small constant c, because at this point my problem fits into cache, and everything else I do for that problem is still going to be in cache; to load the base case into cache, I need to incur Θ(n/B) cache misses. Otherwise, I have two recursive calls of size n/2, and then I incur Θ(n/B) cache misses to do the merge of my two results. So here my base case is larger than what I did for the work — the algorithm actually still recurses down to a constant-size base case, but just for the analysis I'm stopping the recursion when n is less than or equal to cM. So let's analyze this. Again, I have a problem of size n at the beginning, then I split it into two problems of size n/2, and I have to pay n/B cache misses to merge the results together. Similarly for the next level: now I'm paying n/(2B) cache misses for each of my two subproblems to do the merge. I keep going down until I get to a subproblem of size Θ(cM); at that point it fits in cache, and I don't need to recurse anymore in my analysis. So the number of levels of this recursion tree is just log base 2 of n/(cM) — I'm basically chopping off the bottom of the recursion tree; the number of levels I had below this is log base 2 of cM, so I'm taking log base 2 of n and subtracting log base 2 of cM, which is equivalent to log base 2 of n divided by cM. The number of leaves I have is n/(cM), since each leaf is Θ(cM) large.
The number of cache misses I need to incur to process a leaf is just Θ(M/B), because I just need to load the input for that subproblem into cache, and then everything else fits in cache. So at the top level I'm incurring n/B cache misses; at the next level I'm also incurring n/B cache misses; same with the third level; and for the leaves I'm incurring Θ(M/B) times n/(cM) cache misses — n/(cM) is the number of leaves, and Θ(M/B) is the number of cache misses per leaf — and that also equals Θ(n/B). So overall, I multiply Θ(n/B) by the number of levels, which is log base 2 of n/(cM) — and I can just get rid of the constant, since it doesn't affect the asymptotic bound. So the number of cache misses is Θ(n/B) times log base 2 — or any base for the log — of n/M. Any questions on this analysis? So I am saving a factor of B here in the first term, which gives me a better cache complexity than just the work bound; but the M term is inside the log, and that doesn't actually help me that much. So let's look at how much we actually save. When n is much greater than M, log base 2 of n/M is approximately equal to log base 2 of n, so the M term doesn't contribute much to the asymptotic cost, and therefore, compared to the work, we're only saving a factor of B. When n is approximately equal to M, log base 2 of n/M is constant, and we're saving a factor of B log n. So we save more when the problem size is small, but for large enough problem sizes, we're only saving a factor of B. Does anyone think we can do better than this? I've asked this question several times before, and the answer is always the same — yes, it's a good answer. We're going to do this using multiway merging: instead of just merging two sorted subarrays, we're going to merge together R sorted subarrays.
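Before moving on to multiway merging, the binary merge sort cache analysis just completed can be summarized in one place (c is the sufficiently small constant from the base case):

```latex
Q(n) =
\begin{cases}
\Theta(n/B) & \text{if } n \le cM,\\[4pt]
2\,Q(n/2) + \Theta(n/B) & \text{otherwise,}
\end{cases}
\qquad\Longrightarrow\qquad
Q(n) = \Theta\!\left(\frac{n}{B}\,\log_2\frac{n}{cM}\right)
     = \Theta\!\left(\frac{n}{B}\,\log\frac{n}{M}\right).
```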
So we're going to have R subarrays, each of size n/R, and these are sorted, and I'm going to merge them together using what's called a tournament tree. Here's how the tournament tree works: I compare the heads of each pair of these subarrays and store the winner in a node of the tournament tree; after I do that, I compare those two nodes and get the minimum of those two; and eventually, after all the comparisons, I'm left with the minimum element at the root of the tournament tree, and I can place that into my output. The first time I fill this tournament tree, it's going to take Θ(R) work, because there are R nodes in the tournament tree. So when I compare these two elements, the smaller one is 6; for these two, the smaller one is 2; then I compare 2 and 6 and take the smaller one; and on the other side I have a 7, so I compare 2 and 7 — 2 is smaller, so it appears at the root, and I know that's going to be the minimum element among the heads of all of the R subarrays that I'm merging. So the first time, generating this tournament tree takes Θ(R) work, because I have to fill in R nodes; but once I've generated the tournament tree, for all subsequent rounds I only need to fill in the path from the element that won to the root of the tournament tree, because those are the only values that could have changed. So now I fill in this path here, and this only takes Θ(log R) work, because the height of the tournament tree is log base 2 of R. So I fill this in: 14 goes here, 6 is the smaller of the two, and then 6 is the smaller again — so my next element is 6, and I keep doing this until all of the elements from my R subarrays have been put into the output. The total work for merging is Θ(R) for the first round plus n log R for all the remaining rounds, and that's just equal to Θ(n log R), because we're assuming that n is bigger than R here.
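A sketch of the R-way merge in Python — here a binary heap over the run heads stands in for the tournament tree; both support find-min plus replace in Θ(log R), so merging n total elements still costs Θ(n log R):

```python
import heapq

def multiway_merge(runs):
    """Merge R sorted lists ('runs') into one sorted output list."""
    out = []
    # One entry per nonempty run: (head value, run index, index within run).
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)           # Theta(R), like filling the tournament tree
    while heap:
        val, r, i = heapq.heappop(heap)
        out.append(val)           # emit the overall minimum
        if i + 1 < len(runs[r]):  # replace it with its successor: Theta(log R)
            heapq.heappush(heap, (runs[r][i + 1], r, i + 1))
    return out
```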
Any questions on how the multiway merge works? No? Okay, so let's analyze the work of this multiway merge when it's used inside merge sort. The recurrence is as follows: if n is 1, we just do a constant amount of work; otherwise, we have R subproblems of size n/R, and then we pay Θ(n log R) to do the multiway merge. Here's the recursion tree: at the top level we have a problem of size n, we split into R subproblems of size n/R, and then we have to pay n log R work to merge the results of the recursive calls together, and we keep doing this. It turns out that the work at each level sums up to n log base 2 of R, and the number of levels is log base R of n, because each time we're branching by a factor of R. For the leaves, we have n leaves, and we just pay linear work in total for those, because we don't have to pay for a merge — we're not doing any more recursive calls. So the overall work is Θ(n log R) times log base R of n, plus n for the leaves, but that's just a lower-order term; and if you work out the math, some of the terms cancel out, and you just get Θ(n log n) for the work. So the work is the same as for the binary merge sort. Let's now analyze the cache complexity. Let's assume that R is less than cM/B for a sufficiently small constant c less than or equal to 1, and consider the R-way merging of contiguous arrays of total size n. If R is less than cM/B, then we can fit the entire tournament tree into cache, and we can also fit one block from each of the R subarrays into cache; in that case, the total number of cache misses for the multiway merge is just Θ(n/B), because we just have to scan over the n elements in our input arrays. So the recurrence for the R-way merge sort is as follows: if n is less than cM, then it fits in cache, and the
number of cache misses we pay is Θ(n/B); otherwise, we have R subproblems of size n/R, and we add Θ(n/B) to do the merge of the results of the subproblems. Any questions on the recurrence here? Yes — good question. So we haven't picked the value of R; this is not a cache-oblivious algorithm, and we'll see what to choose for R in a couple of slides. So let's analyze the cache complexity of this algorithm, again using the recursion tree. At the top level, we have R subproblems of size n/R, and we have to pay n/B cache misses to merge them, and it turns out that at each level, the number of cache misses we pay is n/B, if you work out the math. The number of levels of this recursion tree is log base R of n/(cM), because we stop recursing when our problem size is cM, and at every level of recursion we're branching by a factor of R. Our leaf size is cM; therefore the number of leaves is n/(cM), and for each leaf it takes Θ(M/B) cache misses to load it into cache, after which we can do everything in cache. Multiplying the number of leaves by the cost per leaf, we get Θ(n/B) cache misses for the leaves. Therefore the number of cache misses is n/B times the number of levels, and the number of levels is log base R of n/M. So compared to the binary merge sort algorithm, here we actually have a factor of R in the base of the log, whereas before the base of the log was 2. So now the question is what we're going to set R to be. Again, we have a voodoo parameter — this is not a cache-oblivious algorithm — and we said that R has to be at most cM/B for the analysis to work out, so let's just make it as large as possible: set R = Θ(M/B). Now we can see what the complexity works out to be. We have the tall-cache assumption, and we also have the fact that log base M of n/M is equal to Θ(log n / log M).
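The R-way recurrence and its solution, written out, with R = Θ(M/B) substituted in at the end:

```latex
Q(n) =
\begin{cases}
\Theta(n/B) & \text{if } n \le cM,\\[4pt]
R\,Q(n/R) + \Theta(n/B) & \text{otherwise,}
\end{cases}
\qquad\Longrightarrow\qquad
Q(n) = \Theta\!\left(\frac{n}{B}\,\log_R \frac{n}{M}\right)
\;\overset{R=\Theta(M/B)}{=}\;
\Theta\!\left(\frac{n}{B}\,\log_{M/B} \frac{n}{M}\right).
```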
So the cache complexity is Θ((n/B) · log base M/B of n/M); but with the tall-cache assumption, log base M/B is the same as log base M asymptotically — that's how we get to the second line — and then you can rearrange and cancel some terms, and we end up with Θ(n log n / (B log M)). So we're saving a factor of Θ(B log M) compared to the work of the algorithm, whereas for the binary version of merge sort we were only saving a factor of B for large enough inputs — here we get another factor of log M in our savings. So as I said, the binary one's cache misses are n log n over B, whereas the multiway one's are n log n over B log M, and as long as n is much greater than M, we're saving much more than the binary version — a factor of log M in cache misses. Let's just ignore the constants and look at what log M can be in practice. Here are some typical cache sizes: the L1 cache size is 32 kilobytes, which is 2 to the 15th, and log base 2 of that is 15, so we get a 15x savings; and for the larger cache sizes we get an even larger savings. Any questions on this? So the problem with this algorithm is that it's not cache-oblivious: we have to tune the value of R for a particular machine, and even when we're running on the same machine, there could be other jobs running that contend for the cache. It turns out that there are several cache-oblivious sorting algorithms. The first one was developed in a paper coauthored by Charles Leiserson, and it's called funnel sort. The idea is to recursively sort n^(1/3) groups of n^(2/3) elements, and then merge the sorted groups with an n^(1/3)-funnel. This funnel is called a K-funnel more generally, and it merges together K^3 elements from K sorted lists; the cost for doing this merge is as shown here, and if you plug this into the recurrence, you'll get that the asymptotic number of cache misses is the same as that of the multiway merge sort algorithm, while being cache-oblivious — and this bound is actually optimal.
So I'm not going to have time to talk about the details of the funnel sort algorithm, but I do have time to show you a pretty picture of what the funnel looks like. This is what a K-funnel looks like. It's recursively constructed: we have a bunch of square-root-of-K funnels inside, all connected to some buffers — they feed elements into the buffers, and the buffers feed elements into this output square-root-of-K funnel, whose output becomes the output of the K-funnel. So this whole blue thing is the K-funnel, and the small green triangles are square-root-of-K funnels. The number of cache misses incurred by doing this merge is shown here, and the analysis uses the tall-cache assumption. So it's a pretty cool algorithm, and we posted a paper online that describes it — if you're interested in the details, I encourage you to read that paper. There are also many other cache-oblivious algorithms out there — there have been hundreds of papers on cache-oblivious algorithms, and here are some of them; in fact, I think all of these are described in the paper we posted. Some cool cache-oblivious data structures have also been developed, such as for B-trees, ordered-file maintenance, and priority queues. So there's a lot of literature on cache-oblivious algorithms, and there's also a lot of material online if you're interested in learning more.
One commonly used method is known as finite difference approximation: we're going to approximate the partial derivative of u with respect to each of its coordinates. The partial derivative of u with respect to t is approximately equal to u(t + Δt, x) — where Δt is some small value — minus u(t, x), all divided by Δt. How many of you have seen this approximation method before, from your calculus class? Okay, good. So as you bring the value of Δt down to zero, the
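Written out, the forward-difference approximation just described is

```latex
\frac{\partial u}{\partial t}(t, x) \;\approx\; \frac{u(t + \Delta t,\, x) - u(t,\, x)}{\Delta t}.
```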
thing on the right-hand side approaches the true partial derivative. So that's the partial derivative with respect to t; we also need the partial derivative with respect to x. Here I'm saying that the partial derivative with respect to x is approximately equal to u(t, x + Δx/2) minus u(t, x − Δx/2), all divided by Δx. Notice that instead of adding Δx in the first term and not adding anything in the second term, I'm actually adding Δx/2 in the first term and subtracting Δx/2 in the second term. It turns out that I can do this with the approximation method and it's still valid, as long as the difference between the two arguments is Δx — here the difference is Δx, and I can basically decide how to split up this Δx between the two things in the numerator. The reason I chose Δx/2 here is that the math is going to work out nicely and give us cleaner code. This is just the first partial derivative with respect to x; I actually need the second partial derivative, since the right-hand side of the heat equation has a second partial derivative. So this is what the second partial derivative looks like: I just take the partial derivative with respect to x of each of the terms in the numerator from the equation above, and then I can plug in the value of this partial derivative by applying the equation above with the arguments t and x + Δx/2, and similarly for the second term. So for the first term, when I plug it into the equation for the partial derivative with respect to x, I'm going to get u(t, x + Δx) minus u(t, x); and for the second term, I'm going to get u(t, x − Δx), and I subtract another factor of u(t, x) — so that's why I'm subtracting 2 times u(t, x) in the numerator. And the partial derivative of each of the things in the numerator also has to be divided by Δx, so
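Written out, the centered first difference and the second difference it yields are

```latex
\frac{\partial u}{\partial x}(t, x) \;\approx\;
\frac{u\!\left(t,\, x + \tfrac{\Delta x}{2}\right) - u\!\left(t,\, x - \tfrac{\Delta x}{2}\right)}{\Delta x},
\qquad
\frac{\partial^2 u}{\partial x^2}(t, x) \;\approx\;
\frac{u(t,\, x + \Delta x) - 2\,u(t,\, x) + u(t,\, x - \Delta x)}{\Delta x^2}.
```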
on the denominator I get Delta x squared so now I have the second partial derivative with respect to X and I also have the first partial derivative with respect to T so I can just plug them into my equation above so on the left hand side I just have this term here and I'm multiplying by this alpha constant and then on this term just comes from here so this is what the 1d heat equation reduces to using the finite difference approximation method so any questions on this okay so how do we actually write code to simulate this equation here so we're gonna use what's called a stencil computation and here I'm going to set Delta X and delta T equal to one just for simplicity but in general you can set them to whatever you want you can make them smaller to have a more fine-grained simulation so my set Delta X and delta T equal to 1 then the denominators of these two terms just become one and I don't need to worry about them and then I'm going to represent my 2d space using a 2d matrix where the horizontal axis represents values of X and the vertical axis represents values of T and I want to fill in all these entries that have a black dot in it the ones with the orange dot those are my boundaries so those are actually fixed throughout the computation so I'm not gonna do any computation on those those are just given to me as input so they's they could be heat sources if we're doing the heat simulation and then now I can actually write code to simulate simulate this equation so if I want to compute U of T plus 1 X I can just go up here and I see that that's equal to this thing over here and then I bring the negative UT X term to the right so I get UT of X plus alpha times u T X plus 1 minus 2 times u T X plus u T X minus 1 you know as I said before we just want to keep iterating this until the temperatures becomes stable so I'm going to proceed in time which in time is going up here and to compute one of these points so let's say this is UT plus 1x I need to know the value of 
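As a concrete sketch, here's that update rule in C with Δx = Δt = 1. The helper name `stencil_point`, the array name, and the ALPHA value are my illustrative assumptions, not the lecture's actual code:

```c
#include <assert.h>

// Illustrative diffusivity constant; any 0 < ALPHA <= 0.5 keeps this
// explicit 1D scheme stable when dx = dt = 1.
#define ALPHA 0.25

// Compute u(t+1, x) from the three points u(t, x-1), u(t, x), u(t, x+1).
// ut points at row t; x must be an interior index (not a boundary cell).
double stencil_point(const double *ut, int x) {
    return ut[x] + ALPHA * (ut[x + 1] - 2.0 * ut[x] + ut[x - 1]);
}
```

A single hot point flanked by cold neighbors loses heat symmetrically to both sides, which is exactly what the three-point formula encodes.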
To compute one of these points — call it u(t + 1, x) — I need u(t, x), which is just the point below it in the matrix, and I also need u(t, x + 1) and u(t, x − 1), which are below it and diagonal to either side. So each value depends on three other values, and this is called a three-point stencil. In general, a stencil computation updates each point in an array using a fixed pattern, called a stencil. I do the same thing for all the other points: I compute all the values of x for a given time step, then move on to the next time step, and keep going until the distribution of temperatures becomes stable, and then I'm done.

Stencil computations are widely used in scientific computing — weather simulations, stock market simulations, fluid dynamics, image processing, probability, and so on — so this is a very important concept to know.

Let's say I ran the code as in the animation: I completed one row at a time before moving on to the next row. How would this code perform with respect to caching? Yes? Right — if n is small, this does pretty well. But what if n is much larger than the size of your cache? You'd do badly. And why is that? There could be some reuse here, because when I compute the second row I use values computed in the first row. But if the row is much larger than the cache, by the time I get to the second row, the values loaded into cache from the first row will have been evicted, and I suffer a cache miss again for the values I need to load — even though they could have been reused if the code had good locality.

Another question: if we only care about the values of x at the most recent time step, do we have to keep this whole 2D matrix around, or can we get by with less storage? How many rows would I have to keep? Two — because I need the row from the previous time step to compute the values in the current time step. After the current time step, I can swap the roles of the two rows and reuse the previous row for the next step. Yes? When I'm computing the second row, I need to keep around all the values computed in the first row, because those values get fed into computations in the second row. Oh, I see what you're saying — that's a good observation: you actually only need a constant amount of extra storage, because you can overwrite values as you go through the row. If you keep around one row, some of its values are from the current time step and some from the previous one. Good observation.

OK, so that code wasn't very cache efficient. You could make it cache efficient using tiling, but we're going to go straight to the cache-oblivious algorithm because it's much cleaner. Recall the ideal cache model we talked about in the previous lecture: a two-level hierarchy with a cache and main memory. The cache has size M bytes and a cache line is B bytes, so you can keep M/B cache lines in the cache. If something is in cache and you operate on it, it incurs no cache misses; if you have to go to main memory to load a cache line, you incur one cache miss. The ideal cache model assumes the cache is fully associative — any cache line can go anywhere in the cache — and it assumes either an optimal omniscient replacement policy or the
LRU policy. The optimal omniscient replacement policy knows the sequence of all future requests to memory, and when it needs to evict something, it picks the thing that leads to the fewest cache misses overall. The LRU policy just evicts the thing that was least recently used. But as we saw in the previous lecture, in terms of asymptotic cost these two replacement policies give you cache miss counts within a constant factor of each other, so you can use whichever is convenient.

The two performance measures we care about when analyzing an algorithm in the ideal cache model are the work and the number of cache misses. The work is just the total number of operations the algorithm does — serially, this is the ordinary running time. The number of cache misses is the number of cache lines you have to transfer between the cache and main memory.

Suppose we're analyzing an algorithm in the ideal cache model and it runs serially. What kind of cache miss does this model not capture? Remember, we talked about several types, and there's one type this model doesn't capture when running serially — assume there's no parallelism here, since sharing misses only come about with parallelism. Yes — conflict misses. And why doesn't the model capture them? Because this is a fully associative cache: any cache line can go anywhere. You only get conflict misses with set-associative schemes, where each cache line maps to a particular set; if too many cache lines map to the same set, they keep evicting each other even though the rest of the cache may have space. That's a conflict miss. The ideal cache model does capture capacity misses, so it's still a very good model to use at a high level when designing efficient algorithms — it encourages you to optimize for spatial and temporal locality. Once you have a good algorithm in the ideal cache model, you can start dealing with conflict misses using some of the strategies we talked about last time, such as padding or temporary memory. Any questions?

OK, so here's the code that does the heat simulation we saw earlier. It's just a doubly nested loop: in the outer loop we iterate over the time dimension, in the inner loop over the space dimension, so we compute all the values of x before moving on to the next time step. We store only two rows, using what's called the even-odd trick: the next row to compute is indexed by (t + 1) mod 2, and the current row by t mod 2, so the two rows implicitly swap roles as we progress through time. We set u[(t + 1) mod 2][x] equal to kernel applied to a pointer to u[t mod 2][x]. The kernel function is defined up top — recall that we pass it a pointer, and a pointer can be treated as the beginning of an array, so we use array notation inside kernel: it returns W[0], the element at the pointer we passed in, plus alpha times (W[−1] − 2·W[0] + W[1]), where W[−1] is the element just before the one we're pointing to and W[1] is the one just after.

OK, let's look at the caching behavior of this code. We'll analyze the cache complexity assuming the LRU replacement policy — because we can — and as before, we loop through one entire row before going on to the next row.
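Here's a self-contained sketch of that looping code with the even-odd trick — my reconstruction, where the array size N, the ALPHA value, and the boundary handling are illustrative assumptions rather than the lecture's exact source:

```c
#include <assert.h>

#define N 8          // number of spatial points (illustrative)
#define ALPHA 0.25   // illustrative diffusivity constant

// Two rows indexed by t mod 2; u[.][0] and u[.][N-1] are fixed boundaries.
static double u[2][N];

// Run T time steps of the 1D heat stencil; the values for time T
// end up in row u[T % 2].
void heat_loop(int T) {
    for (int t = 0; t < T; ++t) {
        const double *cur = u[t % 2];        // row for time t
        double *next = u[(t + 1) % 2];       // row for time t + 1
        for (int x = 1; x < N - 1; ++x)
            next[x] = cur[x] + ALPHA * (cur[x + 1] - 2.0 * cur[x] + cur[x - 1]);
        next[0] = cur[0];                    // boundary cells stay fixed
        next[N - 1] = cur[N - 1];
    }
}
```

With a heat source pinned at the left boundary and a cold right boundary, repeated steps diffuse heat into the interior, approaching a stable linear profile.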
The number of cache misses, assuming n is greater than M — so the row size is greater than the cache size — is Θ(nT/B). How do we get this bound? How many cache misses per row of this 2D space? Right: Θ(n/B) per row, because we can load B bytes at a time — we benefit from spatial locality there — and we have n elements to compute. But as we said, by the time we get to the next row, the things we need from the previous row have already been evicted from cache, so we incur Θ(n/B) misses for every one of the T rows we compute — Θ(nT/B) total. Any questions on this analysis?

Do you think we can do better than this? OK — one person, two, three. It turns out we can. You could do better with tiling, but I'm not going to do the tiled version; I want to do the cache-oblivious version, which works on trapezoidal regions of the 2D space. Recall that a trapezoid has a top base and a bottom base: here the top base is at t1 and the bottom base at t0, so the height is t1 − t0, and the width of the trapezoid is its width at the midpoint between t1 and t0, i.e., at (t1 + t0)/2. We're going to compute all the points inside the trapezoid satisfying these inequalities: t must be at least t0 and less than t1, and x must be at least x0 + dx0·(t − t0) — where dx0 is the inverse slope of the left edge — and less than x1 + dx1·(t − t0), where dx1 is the inverse slope of the right edge. And dx0 and dx1 must each be −1, 0, or 1.
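Written out, the region a single call works on is:

```latex
\begin{aligned}
\mathrm{Trap}(t_0, t_1, x_0, dx_0, x_1, dx_1)
  &= \bigl\{\, (t, x) \;:\; t_0 \le t < t_1,\;\;
     x_0 + dx_0\,(t - t_0) \;\le\; x \;<\; x_1 + dx_1\,(t - t_0) \,\bigr\},\\[4pt]
dx_0, dx_1 &\in \{-1, 0, 1\}, \qquad
\text{height} = t_1 - t_0, \qquad
\text{width} = \text{width at } t = \tfrac{t_0 + t_1}{2}.
\end{aligned}
```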
An inverse slope of −1 corresponds to a slope of −1; an inverse slope of 1 is a slope of 1; and 0 means a vertical edge. The nice property of this trapezoid is that we can compute everything inside it without looking outside: everything here can be computed independently of any other trapezoids we might be generating. So we're going to come up with a divide-and-conquer approach.

The divide-and-conquer algorithm has a base case: when the height of the trapezoid is 1, we just compute all the values using a simple loop. Any order of computation inside that loop is valid, since we already have all the values along the base of the trapezoid, and the values in the top of the trapezoid can be computed in whatever order — they don't depend on each other. That's the base case. Any questions so far?

Here's one of the recursive cases — it turns out we'll have two different kinds of cuts. The first is called a space cut, which I do when the width of the trapezoid is at least twice the height. That means the trapezoid is too wide, so I cut it with a line of slope −1 through the center of the trapezoid, then traverse the trapezoid on the left side first and, after I'm done with that, the one on the right. Can I switch the order and compute the right side before the left? No. Why not? Because some points in the right trapezoid depend on values from the left trapezoid. For the left trapezoid, every point we want to compute already has all its dependencies once we have the base values; but for the right-hand side, some of the values depend on values in the left trapezoid.
So we can't execute the right trapezoid until we're done with the left one. This is also the reason I cut the trapezoid with a slope of −1 instead of a vertical cut: with a vertical cut, both trapezoids would contain points that depend on the other trapezoid. So that's one of the two cuts — the space cut, used when the trapezoid is too wide.

The other cut is the time cut, with respect to the time dimension, used when the trapezoid is too tall — when the width is less than twice the height. In that case I cut with a horizontal line through the center, traverse the bottom trapezoid first, and after that, the top trapezoid. Again, the top trapezoid depends on some points from the bottom one, so the order can't be switched. Any questions?

OK, let's now look at the C code that implements this recursive divide-and-conquer algorithm. It takes as input t0 and t1 — the coordinates of the bottom and top of the trapezoid — then x0, the left end of the bottom base; dx0, the left inverse slope; x1, the right end of the bottom base; and dx1, the right inverse slope. We first compute the height of the trapezoid, lt = t1 − t0. If the height is 1, we just use a for loop over all the points in that height-1 trapezoid, calling the kernel function we defined before. Otherwise the height is greater than 1, and we check whether to do a space cut or a time cut: the condition inside the if clause checks whether the trapezoid is too wide, and if so we make two recursive calls.
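Here is a sketch of that recursion in C, modeled on Frigo and Strumpen's published trapezoid code, which the lecture's code follows. The kernel, the global arrays, ALPHA, and the exact form of the width test are my reconstruction, not the lecture's verbatim source:

```c
#include <assert.h>

#define N 64         // spatial points (illustrative)
#define ALPHA 0.25   // illustrative diffusivity

// Two rows via the even-odd trick; u[.][0] and u[.][N-1] are fixed boundaries.
static double u[2][N];

// Compute u at time t+1, position x, from the three points at time t.
static void kernel(int t, int x) {
    const double *cur = u[t % 2];
    u[(t + 1) % 2][x] =
        cur[x] + ALPHA * (cur[x + 1] - 2.0 * cur[x] + cur[x - 1]);
}

// Compute all points (t, x) with t0 <= t < t1 and
// x0 + dx0*(t - t0) <= x < x1 + dx1*(t - t0).
void trapezoid(int t0, int t1, int x0, int dx0, int x1, int dx1) {
    int lt = t1 - t0;
    if (lt == 1) {                        // base case: height-1 trapezoid
        for (int x = x0; x < x1; ++x)
            kernel(t0, x);
    } else if (lt > 1) {
        if (2 * (x1 - x0) + (dx1 - dx0) * lt >= 4 * lt) {
            // space cut: too wide, so split with a slope -1 line
            int xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * lt) / 4;
            trapezoid(t0, t1, x0, dx0, xm, -1);   // left trapezoid first
            trapezoid(t0, t1, xm, -1, x1, dx1);   // then the right one
        } else {
            // time cut: split halfway in time, bottom before top
            int halflt = lt / 2;
            trapezoid(t0, t0 + halflt, x0, dx0, x1, dx1);
            trapezoid(t0 + halflt, t1, x0 + dx0 * halflt, dx0,
                      x1 + dx1 * halflt, dx1);
        }
    }
}
```

Calling `trapezoid(0, T, 1, 0, N - 1, 0)` runs T steps over the interior of a rectangular region (initial slopes 0); the recursion generates the slope-(−1) edges internally.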
Those recursive calls to trapezoid cut it in the middle using the slope of −1 — you can see the −1's in the recursive calls. Otherwise we do a time cut: we cut in the middle, at the value of t equal to t0 + lt/2, and again make two recursive calls to trapezoid.

Even though the recursion only ever generates slopes of −1, this code works if I start out with slopes of 1 or 0 — for example, if I had a rectangular region, the initial slopes would just be 0, and the code still works. But most of the slopes it deals with are −1, because those are the only new slopes it generates. Any questions?

So this code is very concise. It turns out the even-odd trick still works here: you can still keep around just two rows, because you're guaranteed not to overwrite any value until everything that depends on it has been computed. But when you use just two rows, the values in a particular row might not all correspond to the same time step, because we're not finishing an entire row before moving on to the next one — we're partially computing rows.

OK, let's analyze the cache complexity of this algorithm, again using the recursion-tree approach we talked about last time. The code splits itself into two subproblems at every level until it gets to a leaf, and each leaf represents Θ(hw) points, where h is the height and w the width of the leaf trapezoid, with h = Θ(w). We have that property because of the nature of the algorithm: when a trapezoid becomes too wide we make it narrower with a space cut, and when it becomes too tall we do a horizontal cut, so the height and width are guaranteed to be
within a constant factor of each other when we reach the base case. Each leaf in the base case incurs Θ(w/B) cache misses, because we have to load the base of the trapezoid from main memory; once that's in cache, we can compute all the other points in the trapezoid without incurring any more misses. So the cost per leaf is Θ(w/B), and for the analysis we set w = Θ(M) — the point at which the trapezoid fits into cache. The algorithm itself has no knowledge of the parameter M — it keeps dividing and conquering until it hits base cases of size 1 — but for the analysis we use a base case of w = Θ(M).

The number of leaves is Θ(NT/(hw)), since each leaf covers Θ(hw) points. The number of internal nodes is the number of leaves minus one, because it's a tree, but the internal nodes don't contribute substantially to the cache complexity: each does only a constant number of cache misses to set up its two recursive calls, and nothing more expensive. So we just count the misses at the leaves: Θ(NT/(hw)) leaves times Θ(w/B) misses each. The w cancels, leaving NT/(hB), and with h and w both Θ(M), we're left with Θ(NT/(MB)) as the cache complexity.

Yes? Sure — each leaf incurs only Θ(w/B) misses because we need to load the values along the base of the trapezoid, which is w wide. Everything else fits into cache: when we compute the remaining points, all the previous values we need are already in cache, so they incur no further misses. Does that make sense? OK.
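Putting the leaf count and per-leaf cost together:

```latex
Q(N, T) \;=\;
\underbrace{\Theta\!\left(\frac{NT}{hw}\right)}_{\text{leaves}}
\cdot\;
\underbrace{\Theta\!\left(\frac{w}{B}\right)}_{\text{misses per leaf}}
\;=\;
\Theta\!\left(\frac{NT}{hB}\right)
\;=\;
\Theta\!\left(\frac{NT}{MB}\right)
\quad\text{when } h = \Theta(w) = \Theta(M).
```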
That was the analysis for one dimension, but it generalizes to more dimensions: in d dimensions, the cache complexity is Θ(NT/(M^(1/d) · B)). For d = 1 that reduces to NT/(MB); for d = 2 it's NT/(B·√M); and so on. It turns out this bound is also optimal. So compared to the looping code, this code has much better cache complexity — it saves a factor of M, since the looping code's Θ(NT/B) had no M in the denominator.

We're now going to run a simulation comparing how the looping code and the trapezoidal code behave. In this simulation, green points correspond to cache hits and purple points to cache misses. We simulate a fully associative cache with the LRU replacement policy, a cache line size of 4 points, and a cache size of 32 points; a cache hit costs 1 cycle and a cache miss 10 cycles — an order of magnitude slower. We do this on a rectangular region with n = 95 for 87 time steps. Let me pull up the simulation.

On the left-hand side is the looping code; on the right, the trapezoidal code. Let's start. You can see the looping code going over one row at a time, while the trapezoidal code is not — it partially computes one row and then moves on to the next. I can also show the cuts generated by the trapezoidal algorithm — let me remember how to do this — there, those are the cuts. And I can speed this up: you can see the trapezoidal algorithm is incurring far fewer cache misses than the looping code. I'll just make this go faster, and
the trapezoidal code finishes while the looping code is still slowly making its way to the top. OK, it's finally done. Any questions? Yes — in which region? So you get a cache miss on the purple dots: the cache line size is 4, so there's a miss on the first point, then hits on the next three, then another miss on the fifth point and hits on points six, seven, and eight. For the row above it: we're assuming the two arrays already fit in cache, so we don't need to load them from memory. The point above just depends on values we've already computed, and those are already in cache — the base of the trapezoid is in cache, and for the row right above it, we just look at those values. Does that make sense? Any other questions?

Do you want to see this again? I could let it run for the rest of the lecture, but I have more interesting material to talk about, so let's stop after this finishes. As you see, the looping code is still slowly making its way to the top — much slower than the trapezoidal code.

That was one dimension; now let's look at what happens in two dimensions. Here I'm going to show another demo, and this time I'm actually running the code for real — the previous demo was just a simulation, but this happens in real time. I'll simulate heat in a 2D space; recall that the colors correspond to temperatures, so a brighter color means hotter. Let me pull up the demo. Here my mouse cursor is the source of heat: it makes the points around the cursor hot, and the heat slowly diffuses to the other points. I can move the cursor so the heat source changes, and the heat I generated earlier slowly goes away.

In the lower-left-hand corner I'm showing the number of iterations per second. The looping code is doing about 1,560 iterations per second. Let's switch to the trapezoidal code: about 1,830 iterations per second. So it's a little faster, but not by much. Does anyone have an idea why we're seeing this behavior? We said the trapezoidal code incurs many fewer cache misses than the looping code, so we'd expect it to be significantly faster, yet it's only a little faster. Yes — that's a good point: in 2D you only save a factor of √M instead of M, but √M is still pretty big compared to the speedup we're getting here. Any other ideas? Yes — there is a constant factor in the trapezoidal code, but even after accounting for the constant factor, you should still see a better speedup than this. What other problem might be going on? Yes — that's another good observation, but the caches we're using should still have cache complexity pretty close to the ideal cache model — you might be off by a small constant factor, but not by this much. Sorry, could you repeat that? If I move the cursor, it gets a little slower, but that doesn't really affect the performance — I can just leave the cursor still, and this is the iterations-per-second we get. Yes — so there's some other factor dominating. Any ideas what it might be? No, it's not the rendering — I ran the code without showing the graphics, and performance was similar. Yes — there could be other things using the cache, but that would be true for both programs, and I don't have anything cache-intensive running — I don't think PowerPoint uses much of my cache.

All right, let's look at why this is the case. It turns out the hardware is actually helping the looping code. The question was: how can the cache-oblivious trapezoidal code have so many fewer cache misses, yet the advantage it gains over the looping version be so marginal? The answer is hardware prefetching. Prefetching works well for the looping code because its access pattern is very regular — one row at a time — so the hardware can predict its memory access pattern and bring in the cache lines it will need before it actually gets to that part of the computation. Prefetching doesn't help the trapezoidal code nearly as much, because its access pattern is less regular. Prefetching does use memory bandwidth, but on a single core you have more than enough bandwidth to take advantage of the machine's hardware prefetching. Later we'll see that when running in parallel, memory bandwidth becomes more of an issue, with multiple processors all using memory.

Question? Yes — you can do software prefetching; there are instructions for it. Hardware prefetching is usually more efficient but less flexible than software prefetching. Here we're only taking advantage of hardware prefetching — we didn't actually try software prefetching; it might help a little, though I think it would probably benefit the looping code as well. Yes — sorry? Hardware prefetching is enabled; it's always done by
the hardware on the machines we use today. So this was a pretty surprising result, but we'll actually see the effect of memory bandwidth later when we look at the parallel code. Any other questions before I continue?

OK, let's now look at the interplay between caching and parallelism. This is a theorem we proved in the previous lecture; let's recall what it says. Let Q_P be the number of cache misses in a deterministic Cilk computation when run on P processors, each with a private cache, and let S_P be the number of successful steals during the computation. In the ideal cache model with cache size M and block size B, the number of cache misses on P processors equals the number of cache misses on one processor plus O(S_P · M/B). Last time we also said the number of successful steals is upper-bounded by the span of the computation times the number of processors — so if you minimize the span of your computation, you also minimize the number of successful steals — and for low-span algorithms, the Q_1 term usually dominates. I won't go over the proof again; the moral of the story is that minimizing cache misses in the serial elision essentially minimizes them in the parallel execution, assuming a low-span algorithm.

So, does our trapezoidal algorithm work in parallel? Does the space cut work in parallel? Recall that the space cut slices with a slope of −1 through the center when the trapezoid is too wide. Can I execute the two trapezoids in parallel? No — the right trapezoid depends on results from the left trapezoid, so they can't run at the same time. But there is a way to execute trapezoids in parallel: instead of one cut through the center, do a V cut. Now I have three trapezoids, and the two trapezoids labeled 1 — the black ones — can execute in parallel, because they're independent: everything in them depends only on their own bases. After both are done, I compute the trapezoid labeled 2. This is known as a parallel space cut: it produces two black trapezoids and a gray trapezoid; the two black trapezoids execute in parallel, and afterwards the gray one executes, and this is done recursively as well. Any questions? Oh, and if the trapezoid is inverted, we do an upside-down V cut, and in that case we execute the middle trapezoid before the two trapezoids on the sides. For the time cut, it turns out we just use the same cut as before and execute the two trapezoids serially. So we do get a little bit of parallelism from the parallel space cut.

Let's look at how the parallel codes perform. That was the serial looping code; here's the parallel looping code — we had about 1,450 iterations per second before, about 3,700 now, so a little over twice the speedup, and this is on a four-core machine — just my laptop. Yes? It slows down a little, but not by too much. Now the trapezoidal code: as we saw before, it does about 1,840 iterations per second serially, and parallelized it does about 5,350 — roughly a factor of three speedup. I can move the cursor around a bit more if you want to see the serial and parallel trapezoids. Is everyone happy? OK.

Because I had to run this in real time, the input size wasn't very big, so I ran the experiment offline, without the rendering, on a much larger input: a 3,000-by-3,000 grid for 1,000 time steps, using four processor cores, and my cache size is eight megabytes
— the last-level cache size. So the input size is much larger than my last-level cache. Here are the times I got: the serial looping code took about 129 seconds, and in parallel about 66 seconds — about a factor of two speedup, consistent with what we saw in the demo. The trapezoidal code actually got a better speedup once we increased the input size — about a factor of four — and this is because for a larger input, cache efficiency plays a much bigger role, since the cache is so small compared to the input. So the parallel looping code achieves less than half its potential speedup, even though it has much more potential parallelism than the trapezoidal code — the trapezoidal code only gets a little parallelism, from the space cuts — yet the trapezoidal code gets near-linear speedup: 3.96x on four cores.

What could be going on here? Another thing to look at is to compare the serial trapezoidal code with the serial looping code, and the parallel trapezoidal code with the parallel looping code. The serial trapezoidal code is about twice as fast as the serial looping code, but the parallel trapezoidal code is about four times faster than the parallel looping code. The reason is that hardware prefetching can't help the parallel looping code that much: when you run in parallel, all the cores are using memory, there's a memory bandwidth bottleneck, and prefetching itself consumes memory bandwidth. In the serial case we had plenty of bandwidth to spend on prefetching, but in the parallel case we don't. That's why, in the parallel case, the trapezoidal code gets a better speedup over the looping code than it does serially. The trapezoidal code also scales better because it works more locally, so it needs fewer memory operations — there's a scalability bottleneck at the memory interconnect, and because the trapezoidal code is cache oblivious, it does most of its work in cache, whereas the looping code makes more accesses to main memory. Any questions on this?

OK, so when we have a parallel speedup bottleneck, how do we know what's causing it? There are several main things to look at: whether the algorithm has insufficient parallelism, whether scheduling overhead dominates, whether we lack memory bandwidth, or whether there's contention — and contention can mean either locking or true and false sharing, which we talked about last lecture. The first two are usually quite easy to diagnose: you can compute the work and span of your algorithm, and from that the parallelism. You can also use Cilkscale to help diagnose them — Cilkscale tells you how much parallelism your code has, and also the burdened parallelism, which includes scheduler overhead.

What about memory bandwidth — how can we diagnose that? Does anyone have any ideas? I can tell you one way: open up the hardware, take out all the memory chips except one, and run the serial code; if it slows down, the code was probably memory-bandwidth bound when run in parallel. But that's a pretty heavy-handed way to diagnose the problem. Is there anything we can do with just software? Yes — you could use a tool like Valgrind to count the number of memory accesses. Yes — you can make sure only one processor is being used, but it might be using more memory than just the
memory from one chip. There's actually a simpler way to do this: the idea is to run P identical copies of the serial code, so they'll all be executing in parallel, and if the execution slows down, that means they were probably contending for memory bandwidth. Does that make sense? One caveat is that you can only do this if you have enough physical memory, because when you're running P identical copies you need more DRAM than if you just ran one copy. But oftentimes you can isolate some part of the code that you think has the performance bottleneck and just execute that part of the program with P copies in parallel, and hopefully that will take less memory. There are also hardware counters you can check, if you have root access to your machine, that can measure how much memory bandwidth your program is using, but running P copies is a pretty simple way to do it. So there are ways to diagnose a lack of memory bandwidth. It turns out that contention is much harder to diagnose. There are tools that detect lock contention in an execution, but usually they only detect contention when it actually happens, and the contention doesn't have to happen every time you run your code, so these tools don't detect the potential for lock contention. The potential for true and false sharing is even harder to detect, especially false sharing, because if you're using a bunch of variables in your code you don't know which of them map to the same cache line. So this is much harder to detect. Usually, when you're trying to debug a speed-up bottleneck in your code, you would first look at the first three things, and once you've eliminated those, you can start looking at whether contention is causing the problem. Any questions? Okay, so that covers stencil computations.
the cache complexity of a standard merge sort. We first need to analyze the complexity of merging, because it's used as a subroutine in merge sort. Recall that in merging we're given two sorted input arrays, and we want to generate an output array, also sorted, containing all the elements from the two input arrays. The algorithm maintains a pointer to the head of each input array, compares the two head elements, takes the smaller one and puts it into the output, and then increments the pointer for that array. We keep doing this, taking the smaller of the two head elements, until both input arrays are empty, at which point we're done and we have one sorted output array. To merge n elements, the time is Θ(n), where n is the sum of the sizes of the two input arrays, because we do only a constant amount of work per input element. What about the number of cache misses; how many cache misses do I incur when merging n elements? I incur Θ(n/B) cache misses: my two input arrays are stored contiguously in memory, so I can read them B bytes at a time with one cache miss per block, and my output array is also stored contiguously, so I can write it out B bytes at a time with one cache miss per block. I might waste a cache line at the beginning and end of each of my three arrays, but that only affects the bound by a constant factor, so it's Θ(n/B). So now let's look at merge sort. Recall that merge sort makes two recursive calls to itself on inputs of half the size, and then does a merge at the end to combine the two sorted outputs of its recursive calls. If you look at how the recursion proceeds, it first divides the input array into two halves, divides each of those into two halves again, and so on until it reaches the base case of just one element, at which point it returns; then we start merging pairs of these elements together. Now we have arrays of size two in sorted order; we merge pairs of those arrays and get subarrays of size four, and finally we do this one more time to get the sorted output. Okay, so let's review the work of merge sort. What's the recurrence for the work? We have two subproblems of size n/2, and then we need Θ(n) work to do the merge, so W(n) = 2W(n/2) + Θ(n). This is case two of the Master Theorem: log base b of a is log base 2 of 2, which is 1, and that's the same as the exponent in the Θ(n) term we're adding in, so since they're the same we pick up an additional log factor, and the overall work is Θ(n log n). Now I'm going to solve this recurrence again using the recursion-tree method; I'll still get Θ(n log n), but I'm doing this because it's going to be useful when we analyze the cache complexity. At the top level I have a problem of size n, which branches into two problems of size n/2, and when they're done I have to do a merge, which takes Θ(n) work; I'm just writing n here and ignoring the constants. I branch again, and each of those nodes does n/2 work to merge, and I go all the way down to the base case after log base 2 of n levels. The top level does n work, the second level also does n work in total, and it's the same for all levels down to the leaves, which also do a linear amount of work in total. So the overall work is the work per level times the number of levels, which is Θ(n log n). Any questions? Okay, so now let's analyze this with caching. We said earlier that the cache complexity of the merging subroutine is Θ(n/B), and here's the recurrence for the cache complexity of merge sort. My base case is when n is at most cM for some sufficiently small constant c, because at that point the problem size fits in cache.
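As a concrete reference point, the binary merge sort just analyzed can be sketched in C. This is a minimal version with names and an `int` element type of my own choosing, not the lecture's actual code; the caller supplies a scratch buffer `tmp` of at least n elements.

```c
#include <stdlib.h>
#include <string.h>

// Merge two sorted runs a[0..na) and b[0..nb) into out[0..na+nb).
// Each input is scanned once, contiguously, so the merge incurs
// Theta(n/B) cache misses for n = na + nb.
static void merge(const int *a, size_t na, const int *b, size_t nb, int *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
}

// Classic binary merge sort: W(n) = 2W(n/2) + Theta(n) = Theta(n log n).
void merge_sort(int *a, size_t n, int *tmp) {
    if (n <= 1) return;
    size_t half = n / 2;
    merge_sort(a, half, tmp);
    merge_sort(a + half, n - half, tmp);
    merge(a, half, a + half, n - half, tmp);
    memcpy(a, tmp, n * sizeof(int));      // copy the merged run back
}
```

Note that both recursive calls can reuse the same scratch buffer because each call finishes before its sibling's result is merged.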
Everything else I do for that problem will then also be in cache, and to load the base case into cache I incur Θ(n/B) cache misses. Otherwise, I have two recursive calls of size n/2, and then Θ(n/B) cache misses to do the merge of my two results, so Q(n) = 2Q(n/2) + Θ(n/B). Here my base case is larger than the one I used for the work: the algorithm actually keeps recursing down to a constant-size base case, but just for the analysis I stop the recursion when n is at most cM. So let's analyze this. Again I have a problem of size n at the beginning, which splits into two problems of size n/2, and I pay n/B cache misses to merge the results together. Similarly, at the next level I pay n/(2B) cache misses for each of my two subproblems to do their merges. I keep going down until I get to a subproblem of size Θ(cM); at that point it fits in cache and I don't need to recurse any further in my analysis. So the number of levels of this recursion tree is log base 2 of n/(cM): I'm basically chopping off the bottom of the recursion tree, and the number of levels I would have had below this point is log base 2 of cM, so I take log base 2 of n, subtract log base 2 of cM, and that's equivalent to log base 2 of n/(cM). The number of leaves is n/(cM), since each leaf is Θ(cM) large, and the number of cache misses I incur to process a leaf is Θ(M/B), because I just need to load the input for that subproblem into cache and then everything else fits in cache. So the top level incurs n/B cache misses, the next level also incurs n/B cache misses in total, same for the third level, and for the leaves I incur Θ(M/B) times n/(cM) cache misses, where n/(cM) is the number of leaves and Θ(M/B) is the cost per leaf; that product also equals Θ(n/B). So overall I multiply Θ(n/B) by the number of levels, log base 2 of n/(cM), and dropping the constant, since it doesn't affect the asymptotic bound, the number of cache misses is Θ((n/B) log(n/M)), in any base for the log. Any questions on this analysis? Okay, so I'm saving a factor of B here in the first term, which gives me a better cache complexity than just the work bound, but the M term is inside the log, and that doesn't actually help me much. Let's look at how much we actually save. When n is much greater than M, log base 2 of n/M is approximately log base 2 of n, so the M term doesn't contribute much to the asymptotic cost, and compared to the work we're only saving a factor of B. When n is approximately equal to M, log base 2 of n/M is constant, and we're saving a factor of B log n. So we save more when the problem size is small relative to the memory size, but for large enough problem sizes we're only saving a factor of B. Does anyone think we can do better than this? I've asked this question several times before, and the answer is always the same: yes. We're going to do this using multi-way merging. Instead of merging just two sorted subarrays, we're going to merge R sorted subarrays, each of size n/R, using what's called a tournament tree. Here's how the tournament tree works: I compare the heads of each pair of these subarrays and store the winner in a node of the tournament tree, then I compare pairs of those nodes and take the minimum of each pair, and so on up the tree; eventually, after all the comparisons, I'm left with the minimum element at the root of the tournament tree, and I can place it into the output.
The first time I fill in this tournament tree, it takes Θ(R) work, because there are Θ(R) nodes in the tree. So in the example, when I compare these two elements the smaller one is 6; for these two, the smaller one is 2; then I compare 2 and 6 and take the smaller one; on the other side I have a 7, so I compare 2 and 7, 2 is smaller, and it appears at the root. Then I know that's the minimum element among the heads of all the R subarrays I'm merging. So the first round, generating this tournament tree, takes Θ(R) work because I have to fill in Θ(R) nodes, but once I've generated it, for all subsequent rounds I only need to fill in the path from the element that won to the root of the tournament tree, because those are the only values that could have changed. Now I fill in this path, and it takes only Θ(log R) work, because the height of the tournament tree is log base 2 of R. So I fill this in: 14 goes here, 6 is the smaller of the two, and 6 again wins at the root, so my next output element is 6. I keep doing this until all the elements from the R subarrays have been placed into the output. The total work for merging is Θ(R) for the first round plus Θ(n log R) for all the remaining rounds, and that's just Θ(n log R), because we're assuming n is at least R here. Any questions on how the multi-way merge works? No? Okay, so let's analyze the work of this multi-way merge when it's used inside merge sort. The recurrence is as follows: if n is 1, we just do a constant amount of work; otherwise we have R subproblems of size n/R, and then we pay Θ(n log R) to do the multi-way merge. Here's the recursion tree. At the top level we have a problem of size n, we split it into R subproblems of size n/R, and we have to pay n log R work to merge the results of the recursive calls together, and we keep doing this. It turns out that the work at each level sums to Θ(n log base 2 of R), and the number of levels is log base R of n, because each time we're branching by a factor of R. For the leaves, we have n leaves and we just pay linear work in total for them, because we don't have to do a merge there; we're not making any more recursive calls. So the overall work is Θ(n log R) times log base R of n, plus n for the leaves, which is a lower-order term, and if you work out the math some of the terms cancel and you get Θ(n log n) for the work. So the work is the same as for binary merge sort. Let's now analyze the cache complexity. Assume that R < cM/B for a sufficiently small constant c at most 1, and consider the R-way merging of contiguous arrays of total size n. If R < cM/B, then we can fit the entire tournament tree into cache, and we can also fit one block from each of the R subarrays into cache, and in that case the total number of cache misses to do the multi-way merge is just Θ(n/B), because we just have to scan over the n elements in the input arrays. So the recurrence for the R-way merge sort is as follows: if n < cM, the problem fits in cache, and the number of cache misses we pay is Θ(n/B); otherwise we have R subproblems of size n/R, and we add Θ(n/B) to merge the results of the subproblems. Any questions on the recurrence? Yes, good question: we didn't pick the value of R yet, so this is not a cache-oblivious algorithm; we'll see what to choose for R in a couple of slides.
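To make the tournament-tree mechanics concrete, here's a sketch of an R-way merge built on a simple "winner" tree. The names, the `int` element type, and the power-of-two padding of the tree are my own choices, not from the lecture; exhausted runs are treated as having head +infinity.

```c
#include <limits.h>
#include <stdlib.h>

// One sorted input run: a[pos..len) is the part not yet consumed.
typedef struct { const int *a; size_t len, pos; } Run;

// Head of a run, or INT_MAX (acting as +infinity) once it's exhausted.
static int head(const Run *r) { return r->pos < r->len ? r->a[r->pos] : INT_MAX; }

// Merge r sorted runs (n elements total) into out using a winner tree:
// tree[i] holds the index of the run winning at internal node i, and the
// leaves live at tree[R..2R). The initial fill is Theta(R) work; each
// later extraction replays one leaf-to-root path, Theta(log R) work.
void multiway_merge(Run *runs, size_t r, int *out, size_t n) {
    size_t R = 1;
    while (R < r) R <<= 1;                 // pad run count to a power of two
    Run dummy = { NULL, 0, 0 };            // padding runs are always empty
    size_t *tree = malloc(2 * R * sizeof *tree);
    for (size_t i = 0; i < R; i++)
        tree[R + i] = i;                   // leaf i represents run i
    #define HEAD(idx) ((idx) < r ? head(&runs[idx]) : head(&dummy))
    for (size_t i = R - 1; i >= 1; i--)    // initial fill, bottom-up
        tree[i] = HEAD(tree[2*i]) <= HEAD(tree[2*i+1]) ? tree[2*i] : tree[2*i+1];
    for (size_t k = 0; k < n; k++) {
        size_t win = tree[1];              // run whose head is the minimum
        out[k] = runs[win].a[runs[win].pos++];
        for (size_t i = (R + win) / 2; i >= 1; i /= 2)  // replay the path
            tree[i] = HEAD(tree[2*i]) <= HEAD(tree[2*i+1]) ? tree[2*i] : tree[2*i+1];
    }
    #undef HEAD
    free(tree);
}
```

The cache-friendliness in the analysis comes from the fact that when R < cM/B, the tree and one block per run all fit in cache at once, so the merge streams over the input with Θ(n/B) misses.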
So let's analyze the cache complexity of this algorithm, again using the recursion-tree method. At the top level we have R subproblems of size n/R, and we pay n/B cache misses to merge them, and it turns out that at every level the number of cache misses we pay is n/B, if you work out the math. The number of levels of this recursion tree is log base R of n/(cM), because we stop recursing when the problem size reaches cM, and at every level of recursion we're branching by a factor of R. The leaf size is cM, so the number of leaves is n/(cM), and each leaf takes Θ(M/B) cache misses to load into cache, after which we can do everything in cache; multiplying the number of leaves by the cost per leaf gives Θ(n/B) cache misses for the leaves. Therefore the total number of cache misses is n/B times the number of levels, which is log base R of n/M. Compared to the binary merge sort, we now have a factor of R in the base of the log, where before the base of the log was 2. So now the question is what to set R to. Again we have a voodoo parameter, so this is not a cache-oblivious algorithm, and we said R has to be at most cM/B for the analysis to work out, so let's just make it as large as possible: set R = Θ(M/B). Now we can see what the complexity works out to be. We have the tall-cache assumption, and we also have the fact that log base M of n/M is Θ(log n / log M). The cache complexity is Θ((n/B) log base M/B of n/M), but with the tall-cache assumption, log base M/B is the same as log base M asymptotically; that's how we get to the second line. Then you can rearrange and cancel some terms, and you end up with Θ(n log n / (B log M)). So we're saving a factor of Θ(B log M) compared to the work of the algorithm, whereas for the binary version of merge sort we were only saving a factor of B for large enough inputs. Here we get another factor of log M in our savings. As I said, the binary version incurs about n log n over B cache misses, whereas the multi-way version incurs n log n over B log M, so as long as n is much greater than M we're saving a factor of log M in cache misses. Let's ignore the constants and look at what log M can be in practice. Here are some typical cache sizes: the L1 cache size is 32 kilobytes, which is 2 to the 15th, and log base 2 of that is 15, so we get a 15x savings, and for the larger cache sizes we get even larger savings. Any questions on this? So the problem with this algorithm is that it's not cache-oblivious: we have to tune the value of R for a particular machine, and even when we're running on the same machine there could be other jobs running that contend for the cache. It turns out there are several cache-oblivious sorting algorithms. The first one was developed in a paper coauthored by Charles Leiserson, and it's called funnel sort. The idea is to recursively sort n to the 1/3 groups of n to the 2/3 elements each, and then merge the sorted groups with an n-to-the-1/3 funnel. This funnel is called a K-funnel more generally, and it merges together K sorted lists containing K cubed elements in total. The cost for doing this merge is as shown here, and if you plug it into the recurrence you'll find that the asymptotic number of cache misses is the same as that of the multi-way merge sort algorithm, while being cache-oblivious, and this bound is actually optimal. I'm not going to have time to talk about the details of the funnel sort algorithm, but I do have time to show you a pretty picture of what the funnel looks like. This is what a K-funnel looks like: it's recursively constructed, with a bunch of square-root-of-K funnels inside, all connected to buffers, so the square-root-of-K funnels feed elements into the buffers, and then the buffers feed
elements into the output square-root-of-K funnel, which produces the output of the whole K-funnel. So this whole blue triangle is the K-funnel, and the small green triangles are square-root-of-K funnels. The number of cache misses incurred by doing this merge is shown here, and the analysis uses the tall-cache assumption. It's a pretty cool algorithm, and we posted a paper online that describes it, so if you're interested in the details I encourage you to read that paper. There are also many other cache-oblivious algorithms out there; there have been hundreds of papers on cache-oblivious algorithms, and here are some of them. Some of these are also described in the paper; in fact, I think all of these are described in the paper we posted. Some cool cache-oblivious data structures have been developed as well, such as for B-trees, ordered-file maintenance, and priority queues. So there's a lot of literature on cache-oblivious algorithms, and there's also a lot of material online if you're interested in learning more.
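To summarize the sorting bounds from this part of the lecture in one place (with cache size M, line size B, and the tall-cache assumption M = Ω(B²); the recurrences restate the ones analyzed above):

```latex
% Binary merge sort: Q(n) = 2\,Q(n/2) + \Theta(n/B), with Q(n) = \Theta(n/B) for n \le cM
Q_{\mathrm{binary}}(n) \;=\; \Theta\!\left(\frac{n}{B}\,\log_2\frac{n}{M}\right)

% R-way merge sort with R = \Theta(M/B): Q(n) = R\,Q(n/R) + \Theta(n/B)
Q_{\mathrm{multiway}}(n) \;=\; \Theta\!\left(\frac{n}{B}\,\log_{M/B}\frac{n}{M}\right)
\;=\; \Theta\!\left(\frac{n \log n}{B \log M}\right)
```

Funnel sort matches the multi-way bound while being cache-oblivious, and that bound is optimal for sorting in this model.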
00:24:57,140 --> 00:24:59,030 504 00:24:59,030 --> 00:25:01,700 505 00:25:01,700 --> 00:25:02,990 506 00:25:02,990 --> 00:25:04,910 507 00:25:04,910 --> 00:25:07,130 508 00:25:07,130 --> 00:25:08,870 509 00:25:08,870 --> 00:25:14,900 510 00:25:14,900 --> 00:25:17,600 511 00:25:17,600 --> 00:25:20,240 512 00:25:20,240 --> 00:25:22,190 513 00:25:22,190 --> 00:25:25,130 514 00:25:25,130 --> 00:25:28,580 515 00:25:28,580 --> 00:25:32,060 516 00:25:32,060 --> 00:25:34,700 517 00:25:34,700 --> 00:25:38,540 518 00:25:38,540 --> 00:25:41,240 519 00:25:41,240 --> 00:25:43,130 520 00:25:43,130 --> 00:25:44,720 521 00:25:44,720 --> 00:25:49,070 522 00:25:49,070 --> 00:25:56,330 523 00:25:56,330 --> 00:25:58,940 524 00:25:58,940 --> 00:25:59,840 525 00:25:59,840 --> 00:26:02,150 526 00:26:02,150 --> 00:26:05,960 527 00:26:05,960 --> 00:26:08,390 528 00:26:08,390 --> 00:26:09,830 529 00:26:09,830 --> 00:26:11,720 530 00:26:11,720 --> 00:26:15,280 531 00:26:15,280 --> 00:26:20,330 532 00:26:20,330 --> 00:26:22,370 533 00:26:22,370 --> 00:26:24,890 534 00:26:24,890 --> 00:26:28,340 535 00:26:28,340 --> 00:26:30,740 536 00:26:30,740 --> 00:26:32,600 537 00:26:32,600 --> 00:26:35,000 538 00:26:35,000 --> 00:26:39,920 539 00:26:39,920 --> 00:26:41,840 540 00:26:41,840 --> 00:26:43,430 541 00:26:43,430 --> 00:26:52,529 542 00:26:52,529 --> 00:26:54,959 543 00:26:54,959 --> 00:26:56,729 544 00:26:56,729 --> 00:27:01,229 545 00:27:01,229 --> 00:27:03,599 546 00:27:03,599 --> 00:27:05,339 547 00:27:05,339 --> 00:27:07,159 548 00:27:07,159 --> 00:27:10,709 549 00:27:10,709 --> 00:27:12,389 550 00:27:12,389 --> 00:27:14,999 551 00:27:14,999 --> 00:27:16,979 552 00:27:16,979 --> 00:27:18,719 553 00:27:18,719 --> 00:27:21,509 554 00:27:21,509 --> 00:27:23,609 555 00:27:23,609 --> 00:27:25,559 556 00:27:25,559 --> 00:27:27,089 557 00:27:27,089 --> 00:27:29,129 558 00:27:29,129 --> 00:27:30,930 559 00:27:30,930 --> 00:27:35,399 560 00:27:35,399 --> 00:27:37,079 561 00:27:37,079 --> 
00:27:40,229 562 00:27:40,229 --> 00:27:42,209 563 00:27:42,209 --> 00:27:44,519 564 00:27:44,519 --> 00:27:46,439 565 00:27:46,439 --> 00:27:49,169 566 00:27:49,169 --> 00:27:51,569 567 00:27:51,569 --> 00:27:53,579 568 00:27:53,579 --> 00:27:55,889 569 00:27:55,889 --> 00:27:58,709 570 00:27:58,709 --> 00:28:00,989 571 00:28:00,989 --> 00:28:02,399 572 00:28:02,399 --> 00:28:04,229 573 00:28:04,229 --> 00:28:05,909 574 00:28:05,909 --> 00:28:07,589 575 00:28:07,589 --> 00:28:16,510 576 00:28:16,510 --> 00:28:19,730 577 00:28:19,730 --> 00:28:22,010 578 00:28:22,010 --> 00:28:25,780 579 00:28:25,780 --> 00:28:30,290 580 00:28:30,290 --> 00:28:32,030 581 00:28:32,030 --> 00:28:33,920 582 00:28:33,920 --> 00:28:37,580 583 00:28:37,580 --> 00:28:39,410 584 00:28:39,410 --> 00:28:42,350 585 00:28:42,350 --> 00:28:44,990 586 00:28:44,990 --> 00:28:47,390 587 00:28:47,390 --> 00:28:50,930 588 00:28:50,930 --> 00:28:52,370 589 00:28:52,370 --> 00:28:54,920 590 00:28:54,920 --> 00:28:58,190 591 00:28:58,190 --> 00:29:00,830 592 00:29:00,830 --> 00:29:05,270 593 00:29:05,270 --> 00:29:08,240 594 00:29:08,240 --> 00:29:09,620 595 00:29:09,620 --> 00:29:13,130 596 00:29:13,130 --> 00:29:15,770 597 00:29:15,770 --> 00:29:17,540 598 00:29:17,540 --> 00:29:20,090 599 00:29:20,090 --> 00:29:22,520 600 00:29:22,520 --> 00:29:24,440 601 00:29:24,440 --> 00:29:26,210 602 00:29:26,210 --> 00:29:29,750 603 00:29:29,750 --> 00:29:32,170 604 00:29:32,170 --> 00:29:34,550 605 00:29:34,550 --> 00:29:36,740 606 00:29:36,740 --> 00:29:38,690 607 00:29:38,690 --> 00:29:41,660 608 00:29:41,660 --> 00:29:43,970 609 00:29:43,970 --> 00:29:46,790 610 00:29:46,790 --> 00:29:51,020 611 00:29:51,020 --> 00:29:53,870 612 00:29:53,870 --> 00:30:00,200 613 00:30:00,200 --> 00:30:02,480 614 00:30:02,480 --> 00:30:04,160 615 00:30:04,160 --> 00:30:07,130 616 00:30:07,130 --> 00:30:09,170 617 00:30:09,170 --> 00:30:11,780 618 00:30:11,780 --> 00:30:13,940 619 00:30:13,940 --> 00:30:15,950 620 
00:30:15,950 --> 00:30:17,540 621 00:30:17,540 --> 00:30:19,010 622 00:30:19,010 --> 00:30:20,780 623 00:30:20,780 --> 00:30:24,620 624 00:30:24,620 --> 00:30:33,810 625 00:30:33,810 --> 00:30:37,090 626 00:30:37,090 --> 00:30:40,780 627 00:30:40,780 --> 00:30:44,170 628 00:30:44,170 --> 00:30:46,030 629 00:30:46,030 --> 00:30:48,220 630 00:30:48,220 --> 00:30:51,640 631 00:30:51,640 --> 00:30:55,180 632 00:30:55,180 --> 00:30:57,280 633 00:30:57,280 --> 00:30:59,470 634 00:30:59,470 --> 00:31:01,270 635 00:31:01,270 --> 00:31:03,190 636 00:31:03,190 --> 00:31:04,750 637 00:31:04,750 --> 00:31:10,020 638 00:31:10,020 --> 00:31:12,760 639 00:31:12,760 --> 00:31:18,070 640 00:31:18,070 --> 00:31:19,630 641 00:31:19,630 --> 00:31:24,030 642 00:31:24,030 --> 00:31:28,180 643 00:31:28,180 --> 00:31:30,520 644 00:31:30,520 --> 00:31:34,290 645 00:31:34,290 --> 00:31:39,580 646 00:31:39,580 --> 00:31:41,290 647 00:31:41,290 --> 00:31:45,370 648 00:31:45,370 --> 00:31:47,890 649 00:31:47,890 --> 00:31:49,840 650 00:31:49,840 --> 00:31:51,610 651 00:31:51,610 --> 00:31:55,180 652 00:31:55,180 --> 00:31:57,220 653 00:31:57,220 --> 00:31:59,260 654 00:31:59,260 --> 00:32:00,820 655 00:32:00,820 --> 00:32:02,350 656 00:32:02,350 --> 00:32:06,610 657 00:32:06,610 --> 00:32:08,920 658 00:32:08,920 --> 00:32:12,340 659 00:32:12,340 --> 00:32:14,380 660 00:32:14,380 --> 00:32:17,080 661 00:32:17,080 --> 00:32:19,870 662 00:32:19,870 --> 00:32:21,490 663 00:32:21,490 --> 00:32:23,290 664 00:32:23,290 --> 00:32:25,720 665 00:32:25,720 --> 00:32:30,010 666 00:32:30,010 --> 00:32:32,230 667 00:32:32,230 --> 00:32:35,200 668 00:32:35,200 --> 00:32:37,720 669 00:32:37,720 --> 00:32:40,900 670 00:32:40,900 --> 00:32:42,820 671 00:32:42,820 --> 00:32:45,130 672 00:32:45,130 --> 00:32:47,710 673 00:32:47,710 --> 00:32:49,660 674 00:32:49,660 --> 00:32:57,070 675 00:32:57,070 --> 00:32:59,260 676 00:32:59,260 --> 00:33:03,610 677 00:33:03,610 --> 00:33:09,430 678 00:33:09,430 --> 
00:33:11,320 679 00:33:11,320 --> 00:33:13,240 680 00:33:13,240 --> 00:33:15,970 681 00:33:15,970 --> 00:33:17,980 682 00:33:17,980 --> 00:33:20,560 683 00:33:20,560 --> 00:33:22,510 684 00:33:22,510 --> 00:33:24,370 685 00:33:24,370 --> 00:33:26,560 686 00:33:26,560 --> 00:33:30,820 687 00:33:30,820 --> 00:33:33,760 688 00:33:33,760 --> 00:33:37,210 689 00:33:37,210 --> 00:33:39,520 690 00:33:39,520 --> 00:33:41,950 691 00:33:41,950 --> 00:33:44,230 692 00:33:44,230 --> 00:33:49,360 693 00:33:49,360 --> 00:33:52,570 694 00:33:52,570 --> 00:33:56,320 695 00:33:56,320 --> 00:33:56,330 696 00:33:56,330 --> 00:34:02,280 697 00:34:02,280 --> 00:34:06,000 698 00:34:06,000 --> 00:34:09,240 699 00:34:09,240 --> 00:34:11,700 700 00:34:11,700 --> 00:34:13,950 701 00:34:13,950 --> 00:34:16,110 702 00:34:16,110 --> 00:34:18,629 703 00:34:18,629 --> 00:34:20,790 704 00:34:20,790 --> 00:34:22,409 705 00:34:22,409 --> 00:34:24,210 706 00:34:24,210 --> 00:34:27,240 707 00:34:27,240 --> 00:34:29,669 708 00:34:29,669 --> 00:34:42,300 709 00:34:42,300 --> 00:34:45,750 710 00:34:45,750 --> 00:34:48,000 711 00:34:48,000 --> 00:34:49,889 712 00:34:49,889 --> 00:34:52,110 713 00:34:52,110 --> 00:34:54,210 714 00:34:54,210 --> 00:34:57,750 715 00:34:57,750 --> 00:35:00,540 716 00:35:00,540 --> 00:35:04,170 717 00:35:04,170 --> 00:35:07,050 718 00:35:07,050 --> 00:35:08,460 719 00:35:08,460 --> 00:35:15,360 720 00:35:15,360 --> 00:35:17,100 721 00:35:17,100 --> 00:35:19,500 722 00:35:19,500 --> 00:35:22,830 723 00:35:22,830 --> 00:35:25,140 724 00:35:25,140 --> 00:35:26,670 725 00:35:26,670 --> 00:35:33,540 726 00:35:33,540 --> 00:35:35,580 727 00:35:35,580 --> 00:35:37,890 728 00:35:37,890 --> 00:35:42,560 729 00:35:42,560 --> 00:35:45,660 730 00:35:45,660 --> 00:35:48,600 731 00:35:48,600 --> 00:35:50,940 732 00:35:50,940 --> 00:35:52,920 733 00:35:52,920 --> 00:35:55,440 734 00:35:55,440 --> 00:35:58,800 735 00:35:58,800 --> 00:36:03,480 736 00:36:03,480 --> 00:36:05,490 737 
00:36:05,490 --> 00:36:08,220 738 00:36:08,220 --> 00:36:09,780 739 00:36:09,780 --> 00:36:12,360 740 00:36:12,360 --> 00:36:14,500 741 00:36:14,500 --> 00:36:17,560 742 00:36:17,560 --> 00:36:19,840 743 00:36:19,840 --> 00:36:23,890 744 00:36:23,890 --> 00:36:25,390 745 00:36:25,390 --> 00:36:26,620 746 00:36:26,620 --> 00:36:30,730 747 00:36:30,730 --> 00:36:33,040 748 00:36:33,040 --> 00:36:35,050 749 00:36:35,050 --> 00:36:36,940 750 00:36:36,940 --> 00:36:39,310 751 00:36:39,310 --> 00:36:44,470 752 00:36:44,470 --> 00:36:47,440 753 00:36:47,440 --> 00:36:51,030 754 00:36:51,030 --> 00:36:56,710 755 00:36:56,710 --> 00:37:02,050 756 00:37:02,050 --> 00:37:03,580 757 00:37:03,580 --> 00:37:12,070 758 00:37:12,070 --> 00:37:13,750 759 00:37:13,750 --> 00:37:16,060 760 00:37:16,060 --> 00:37:24,010 761 00:37:24,010 --> 00:37:26,380 762 00:37:26,380 --> 00:37:29,200 763 00:37:29,200 --> 00:37:31,030 764 00:37:31,030 --> 00:37:41,140 765 00:37:41,140 --> 00:37:44,079 766 00:37:44,079 --> 00:37:53,799 767 00:37:53,799 --> 00:37:57,279 768 00:37:57,279 --> 00:38:02,380 769 00:38:02,380 --> 00:38:05,559 770 00:38:05,559 --> 00:38:06,940 771 00:38:06,940 --> 00:38:09,249 772 00:38:09,249 --> 00:38:11,470 773 00:38:11,470 --> 00:38:13,569 774 00:38:13,569 --> 00:38:18,730 775 00:38:18,730 --> 00:38:22,269 776 00:38:22,269 --> 00:38:24,940 777 00:38:24,940 --> 00:38:26,109 778 00:38:26,109 --> 00:38:29,170 779 00:38:29,170 --> 00:38:30,400 780 00:38:30,400 --> 00:38:34,749 781 00:38:34,749 --> 00:38:37,960 782 00:38:37,960 --> 00:38:41,259 783 00:38:41,259 --> 00:38:43,809 784 00:38:43,809 --> 00:38:50,609 785 00:38:50,609 --> 00:38:59,109 786 00:38:59,109 --> 00:39:20,190 787 00:39:20,190 --> 00:39:22,440 788 00:39:22,440 --> 00:39:24,299 789 00:39:24,299 --> 00:39:26,339 790 00:39:26,339 --> 00:39:33,329 791 00:39:33,329 --> 00:39:34,890 792 00:39:34,890 --> 00:39:37,230 793 00:39:37,230 --> 00:39:49,770 794 00:39:49,770 --> 00:39:52,799 795 00:39:52,799 --> 
00:39:54,569 796 00:39:54,569 --> 00:39:56,460 797 00:39:56,460 --> 00:39:59,609 798 00:39:59,609 --> 00:40:01,380 799 00:40:01,380 --> 00:40:03,329 800 00:40:03,329 --> 00:40:05,460 801 00:40:05,460 --> 00:40:07,740 802 00:40:07,740 --> 00:40:10,380 803 00:40:10,380 --> 00:40:11,789 804 00:40:11,789 --> 00:40:15,359 805 00:40:15,359 --> 00:40:25,460 806 00:40:25,460 --> 00:40:31,069 807 00:40:31,069 --> 00:40:34,099 808 00:40:34,099 --> 00:40:36,170 809 00:40:36,170 --> 00:40:38,540 810 00:40:38,540 --> 00:40:42,140 811 00:40:42,140 --> 00:40:44,329 812 00:40:44,329 --> 00:40:47,359 813 00:40:47,359 --> 00:40:52,730 814 00:40:52,730 --> 00:41:00,260 815 00:41:00,260 --> 00:41:02,329 816 00:41:02,329 --> 00:41:05,359 817 00:41:05,359 --> 00:41:08,720 818 00:41:08,720 --> 00:41:11,240 819 00:41:11,240 --> 00:41:13,849 820 00:41:13,849 --> 00:41:15,859 821 00:41:15,859 --> 00:41:26,050 822 00:41:26,050 --> 00:41:28,609 823 00:41:28,609 --> 00:41:32,270 824 00:41:32,270 --> 00:41:40,670 825 00:41:40,670 --> 00:41:42,859 826 00:41:42,859 --> 00:41:45,950 827 00:41:45,950 --> 00:41:47,780 828 00:41:47,780 --> 00:41:51,290 829 00:41:51,290 --> 00:41:53,329 830 00:41:53,329 --> 00:42:00,990 831 00:42:00,990 --> 00:42:03,730 832 00:42:03,730 --> 00:42:06,130 833 00:42:06,130 --> 00:42:09,130 834 00:42:09,130 --> 00:42:11,799 835 00:42:11,799 --> 00:42:13,510 836 00:42:13,510 --> 00:42:16,120 837 00:42:16,120 --> 00:42:18,160 838 00:42:18,160 --> 00:42:20,260 839 00:42:20,260 --> 00:42:22,900 840 00:42:22,900 --> 00:42:27,760 841 00:42:27,760 --> 00:42:29,910 842 00:42:29,910 --> 00:42:32,440 843 00:42:32,440 --> 00:42:37,660 844 00:42:37,660 --> 00:42:42,250 845 00:42:42,250 --> 00:42:45,549 846 00:42:45,549 --> 00:42:48,099 847 00:42:48,099 --> 00:42:50,559 848 00:42:50,559 --> 00:42:52,270 849 00:42:52,270 --> 00:42:55,120 850 00:42:55,120 --> 00:43:02,559 851 00:43:02,559 --> 00:43:04,590 852 00:43:04,590 --> 00:43:04,600 853 00:43:04,600 --> 00:43:07,340 854 
00:43:07,340 --> 00:43:10,830 855 00:43:10,830 --> 00:43:13,350 856 00:43:13,350 --> 00:43:15,390 857 00:43:15,390 --> 00:43:17,609 858 00:43:17,609 --> 00:43:19,260 859 00:43:19,260 --> 00:43:21,930 860 00:43:21,930 --> 00:43:28,940 861 00:43:28,940 --> 00:43:28,950 862 00:43:28,950 --> 00:43:32,210 863 00:43:32,210 --> 00:43:36,940 864 00:43:36,940 --> 00:43:39,370 865 00:43:39,370 --> 00:43:41,200 866 00:43:41,200 --> 00:43:45,850 867 00:43:45,850 --> 00:43:48,070 868 00:43:48,070 --> 00:43:50,110 869 00:43:50,110 --> 00:44:05,940 870 00:44:05,940 --> 00:44:08,640 871 00:44:08,640 --> 00:44:11,039 872 00:44:11,039 --> 00:44:13,859 873 00:44:13,859 --> 00:44:15,630 874 00:44:15,630 --> 00:44:18,240 875 00:44:18,240 --> 00:44:22,890 876 00:44:22,890 --> 00:44:27,390 877 00:44:27,390 --> 00:44:31,799 878 00:44:31,799 --> 00:44:34,890 879 00:44:34,890 --> 00:44:36,390 880 00:44:36,390 --> 00:44:39,359 881 00:44:39,359 --> 00:44:41,069 882 00:44:41,069 --> 00:44:42,809 883 00:44:42,809 --> 00:44:45,630 884 00:44:45,630 --> 00:44:47,760 885 00:44:47,760 --> 00:44:49,940 886 00:44:49,940 --> 00:44:52,829 887 00:44:52,829 --> 00:44:54,359 888 00:44:54,359 --> 00:44:56,400 889 00:44:56,400 --> 00:44:58,890 890 00:44:58,890 --> 00:45:01,440 891 00:45:01,440 --> 00:45:03,809 892 00:45:03,809 --> 00:45:05,760 893 00:45:05,760 --> 00:45:07,500 894 00:45:07,500 --> 00:45:09,089 895 00:45:09,089 --> 00:45:12,900 896 00:45:12,900 --> 00:45:14,819 897 00:45:14,819 --> 00:45:16,859 898 00:45:16,859 --> 00:45:19,220 899 00:45:19,220 --> 00:45:22,559 900 00:45:22,559 --> 00:45:24,539 901 00:45:24,539 --> 00:45:26,520 902 00:45:26,520 --> 00:45:29,460 903 00:45:29,460 --> 00:45:31,470 904 00:45:31,470 --> 00:45:35,970 905 00:45:35,970 --> 00:45:38,130 906 00:45:38,130 --> 00:45:39,750 907 00:45:39,750 --> 00:45:43,620 908 00:45:43,620 --> 00:45:50,770 909 00:45:50,770 --> 00:45:57,500 910 00:45:57,500 --> 00:45:57,510 911 00:45:57,510 --> 00:46:00,430 912 00:46:00,430 --> 
00:46:02,410 913 00:46:02,410 --> 00:46:04,720 914 00:46:04,720 --> 00:46:06,700 915 00:46:06,700 --> 00:46:08,890 916 00:46:08,890 --> 00:46:10,480 917 00:46:10,480 --> 00:46:12,160 918 00:46:12,160 --> 00:46:13,870 919 00:46:13,870 --> 00:46:18,609 920 00:46:18,609 --> 00:46:21,010 921 00:46:21,010 --> 00:46:23,109 922 00:46:23,109 --> 00:46:26,220 923 00:46:26,220 --> 00:46:28,359 924 00:46:28,359 --> 00:46:30,579 925 00:46:30,579 --> 00:46:37,930 926 00:46:37,930 --> 00:46:40,960 927 00:46:40,960 --> 00:46:42,730 928 00:46:42,730 --> 00:46:48,160 929 00:46:48,160 --> 00:46:50,920 930 00:46:50,920 --> 00:46:53,079 931 00:46:53,079 --> 00:46:55,720 932 00:46:55,720 --> 00:46:59,440 933 00:46:59,440 --> 00:47:04,210 934 00:47:04,210 --> 00:47:05,560 935 00:47:05,560 --> 00:47:08,050 936 00:47:08,050 --> 00:47:10,000 937 00:47:10,000 --> 00:47:12,640 938 00:47:12,640 --> 00:47:14,109 939 00:47:14,109 --> 00:47:16,560 940 00:47:16,560 --> 00:47:19,089 941 00:47:19,089 --> 00:47:21,430 942 00:47:21,430 --> 00:47:23,230 943 00:47:23,230 --> 00:47:26,130 944 00:47:26,130 --> 00:47:28,810 945 00:47:28,810 --> 00:47:30,940 946 00:47:30,940 --> 00:47:32,710 947 00:47:32,710 --> 00:47:36,220 948 00:47:36,220 --> 00:47:38,829 949 00:47:38,829 --> 00:47:41,470 950 00:47:41,470 --> 00:47:45,099 951 00:47:45,099 --> 00:47:47,349 952 00:47:47,349 --> 00:47:48,880 953 00:47:48,880 --> 00:47:51,040 954 00:47:51,040 --> 00:47:52,450 955 00:47:52,450 --> 00:47:56,020 956 00:47:56,020 --> 00:47:58,150 957 00:47:58,150 --> 00:48:01,540 958 00:48:01,540 --> 00:48:02,800 959 00:48:02,800 --> 00:48:05,140 960 00:48:05,140 --> 00:48:07,329 961 00:48:07,329 --> 00:48:09,339 962 00:48:09,339 --> 00:48:11,170 963 00:48:11,170 --> 00:48:14,910 964 00:48:14,910 --> 00:48:17,499 965 00:48:17,499 --> 00:48:21,819 966 00:48:21,819 --> 00:48:24,009 967 00:48:24,009 --> 00:48:26,739 968 00:48:26,739 --> 00:48:28,420 969 00:48:28,420 --> 00:48:32,499 970 00:48:32,499 --> 00:48:38,940 971 
00:48:38,940 --> 00:48:41,529 972 00:48:41,529 --> 00:48:43,299 973 00:48:43,299 --> 00:48:45,729 974 00:48:45,729 --> 00:48:48,700 975 00:48:48,700 --> 00:48:50,979 976 00:48:50,979 --> 00:48:54,220 977 00:48:54,220 --> 00:48:56,289 978 00:48:56,289 --> 00:49:01,359 979 00:49:01,359 --> 00:49:03,519 980 00:49:03,519 --> 00:49:05,440 981 00:49:05,440 --> 00:49:07,450 982 00:49:07,450 --> 00:49:09,970 983 00:49:09,970 --> 00:49:11,979 984 00:49:11,979 --> 00:49:14,319 985 00:49:14,319 --> 00:49:16,359 986 00:49:16,359 --> 00:49:20,109 987 00:49:20,109 --> 00:49:23,799 988 00:49:23,799 --> 00:49:25,870 989 00:49:25,870 --> 00:49:28,599 990 00:49:28,599 --> 00:49:31,150 991 00:49:31,150 --> 00:49:34,029 992 00:49:34,029 --> 00:49:41,620 993 00:49:41,620 --> 00:49:44,340 994 00:49:44,340 --> 00:49:48,130 995 00:49:48,130 --> 00:49:51,700 996 00:49:51,700 --> 00:49:55,720 997 00:49:55,720 --> 00:49:58,480 998 00:49:58,480 --> 00:50:01,270 999 00:50:01,270 --> 00:50:07,960 1000 00:50:07,960 --> 00:50:10,300 1001 00:50:10,300 --> 00:50:12,940 1002 00:50:12,940 --> 00:50:15,130 1003 00:50:15,130 --> 00:50:19,750 1004 00:50:19,750 --> 00:50:21,460 1005 00:50:21,460 --> 00:50:25,630 1006 00:50:25,630 --> 00:50:46,470 1007 00:50:46,470 --> 00:50:52,140 1008 00:50:52,140 --> 00:50:56,790 1009 00:50:56,790 --> 00:51:03,510 1010 00:51:03,510 --> 00:51:06,690 1011 00:51:06,690 --> 00:51:08,220 1012 00:51:08,220 --> 00:51:12,300 1013 00:51:12,300 --> 00:51:19,840 1014 00:51:19,840 --> 00:51:22,750 1015 00:51:22,750 --> 00:51:31,990 1016 00:51:31,990 --> 00:51:36,310 1017 00:51:36,310 --> 00:51:40,210 1018 00:51:40,210 --> 00:51:44,950 1019 00:51:44,950 --> 00:51:49,720 1020 00:51:49,720 --> 00:51:52,870 1021 00:51:52,870 --> 00:51:54,609 1022 00:51:54,609 --> 00:52:01,510 1023 00:52:01,510 --> 00:52:03,400 1024 00:52:03,400 --> 00:52:06,510 1025 00:52:06,510 --> 00:52:11,910 1026 00:52:11,910 --> 00:52:26,950 1027 00:52:26,950 --> 00:52:28,690 1028 00:52:28,690 --> 00:52:31,690 
1029 00:52:31,690 --> 00:52:34,089 1030 00:52:34,089 --> 00:52:38,530 1031 00:52:38,530 --> 00:52:41,220 1032 00:52:41,220 --> 00:52:43,690 1033 00:52:43,690 --> 00:52:46,810 1034 00:52:46,810 --> 00:52:49,089 1035 00:52:49,089 --> 00:52:51,970 1036 00:52:51,970 --> 00:52:54,070 1037 00:52:54,070 --> 00:52:58,150 1038 00:52:58,150 --> 00:53:00,880 1039 00:53:00,880 --> 00:53:04,900 1040 00:53:04,900 --> 00:53:08,109 1041 00:53:08,109 --> 00:53:10,089 1042 00:53:10,089 --> 00:53:12,760 1043 00:53:12,760 --> 00:53:14,740 1044 00:53:14,740 --> 00:53:16,930 1045 00:53:16,930 --> 00:53:18,940 1046 00:53:18,940 --> 00:53:21,070 1047 00:53:21,070 --> 00:53:23,109 1048 00:53:23,109 --> 00:53:25,630 1049 00:53:25,630 --> 00:53:27,490 1050 00:53:27,490 --> 00:53:31,150 1051 00:53:31,150 --> 00:53:34,390 1052 00:53:34,390 --> 00:53:36,310 1053 00:53:36,310 --> 00:53:37,570 1054 00:53:37,570 --> 00:53:39,609 1055 00:53:39,609 --> 00:53:41,470 1056 00:53:41,470 --> 00:53:42,760 1057 00:53:42,760 --> 00:53:45,720 1058 00:53:45,720 --> 00:53:48,610 1059 00:53:48,610 --> 00:53:51,250 1060 00:53:51,250 --> 00:53:53,860 1061 00:53:53,860 --> 00:53:56,080 1062 00:53:56,080 --> 00:54:00,820 1063 00:54:00,820 --> 00:54:07,360 1064 00:54:07,360 --> 00:54:09,550 1065 00:54:09,550 --> 00:54:11,200 1066 00:54:11,200 --> 00:54:13,120 1067 00:54:13,120 --> 00:54:15,520 1068 00:54:15,520 --> 00:54:17,140 1069 00:54:17,140 --> 00:54:19,690 1070 00:54:19,690 --> 00:54:22,330 1071 00:54:22,330 --> 00:54:24,730 1072 00:54:24,730 --> 00:54:29,790 1073 00:54:29,790 --> 00:54:32,590 1074 00:54:32,590 --> 00:54:34,570 1075 00:54:34,570 --> 00:54:37,050 1076 00:54:37,050 --> 00:54:41,260 1077 00:54:41,260 --> 00:54:44,980 1078 00:54:44,980 --> 00:54:46,630 1079 00:54:46,630 --> 00:54:48,760 1080 00:54:48,760 --> 00:54:52,300 1081 00:54:52,300 --> 00:54:54,130 1082 00:54:54,130 --> 00:54:56,260 1083 00:54:56,260 --> 00:54:58,330 1084 00:54:58,330 --> 00:55:00,730 1085 00:55:00,730 --> 00:55:05,890 1086 
00:55:05,890 --> 00:55:08,170 1087 00:55:08,170 --> 00:55:10,770 1088 00:55:10,770 --> 00:55:13,450 1089 00:55:13,450 --> 00:55:15,700 1090 00:55:15,700 --> 00:55:19,660 1091 00:55:19,660 --> 00:55:21,670 1092 00:55:21,670 --> 00:55:24,730 1093 00:55:24,730 --> 00:55:27,040 1094 00:55:27,040 --> 00:55:28,990 1095 00:55:28,990 --> 00:55:31,570 1096 00:55:31,570 --> 00:55:34,120 1097 00:55:34,120 --> 00:55:38,400 1098 00:55:38,400 --> 00:55:46,109 1099 00:55:46,109 --> 00:55:52,120 1100 00:55:52,120 --> 00:55:53,920 1101 00:55:53,920 --> 00:55:57,460 1102 00:55:57,460 --> 00:56:00,910 1103 00:56:00,910 --> 00:56:03,730 1104 00:56:03,730 --> 00:56:05,819 1105 00:56:05,819 --> 00:56:08,349 1106 00:56:08,349 --> 00:56:10,270 1107 00:56:10,270 --> 00:56:12,549 1108 00:56:12,549 --> 00:56:14,920 1109 00:56:14,920 --> 00:56:16,839 1110 00:56:16,839 --> 00:56:18,970 1111 00:56:18,970 --> 00:56:23,109 1112 00:56:23,109 --> 00:56:25,539 1113 00:56:25,539 --> 00:56:27,280 1114 00:56:27,280 --> 00:56:29,140 1115 00:56:29,140 --> 00:56:32,020 1116 00:56:32,020 --> 00:56:33,720 1117 00:56:33,720 --> 00:56:36,039 1118 00:56:36,039 --> 00:56:37,809 1119 00:56:37,809 --> 00:56:39,490 1120 00:56:39,490 --> 00:56:41,260 1121 00:56:41,260 --> 00:56:46,299 1122 00:56:46,299 --> 00:56:50,680 1123 00:56:50,680 --> 00:56:55,900 1124 00:56:55,900 --> 00:56:58,450 1125 00:56:58,450 --> 00:56:59,950 1126 00:56:59,950 --> 00:57:01,780 1127 00:57:01,780 --> 00:57:04,390 1128 00:57:04,390 --> 00:57:06,670 1129 00:57:06,670 --> 00:57:08,109 1130 00:57:08,109 --> 00:57:10,599 1131 00:57:10,599 --> 00:57:12,339 1132 00:57:12,339 --> 00:57:13,720 1133 00:57:13,720 --> 00:57:24,150 1134 00:57:24,150 --> 00:57:27,600 1135 00:57:27,600 --> 00:57:27,610 1136 00:57:27,610 --> 00:57:28,200 1137 00:57:28,200 --> 00:57:31,220 1138 00:57:31,220 --> 00:57:31,230 1139 00:57:31,230 --> 00:57:38,240 1140 00:57:38,240 --> 00:57:38,250 1141 00:57:38,250 --> 00:57:42,020 1142 00:57:42,020 --> 00:57:44,450 1143 
00:57:44,450 --> 00:57:48,520 1144 00:57:48,520 --> 00:57:51,230 1145 00:57:51,230 --> 00:57:53,870 1146 00:57:53,870 --> 00:57:57,560 1147 00:57:57,560 --> 00:58:01,520 1148 00:58:01,520 --> 00:58:03,380 1149 00:58:03,380 --> 00:58:06,680 1150 00:58:06,680 --> 00:58:09,920 1151 00:58:09,920 --> 00:58:11,300 1152 00:58:11,300 --> 00:58:14,720 1153 00:58:14,720 --> 00:58:19,520 1154 00:58:19,520 --> 00:58:20,870 1155 00:58:20,870 --> 00:58:23,060 1156 00:58:23,060 --> 00:58:24,470 1157 00:58:24,470 --> 00:58:27,470 1158 00:58:27,470 --> 00:58:28,700 1159 00:58:28,700 --> 00:58:30,650 1160 00:58:30,650 --> 00:58:32,450 1161 00:58:32,450 --> 00:58:33,890 1162 00:58:33,890 --> 00:58:36,350 1163 00:58:36,350 --> 00:58:38,900 1164 00:58:38,900 --> 00:58:41,750 1165 00:58:41,750 --> 00:58:43,700 1166 00:58:43,700 --> 00:58:45,770 1167 00:58:45,770 --> 00:58:47,210 1168 00:58:47,210 --> 00:58:51,680 1169 00:58:51,680 --> 00:58:56,450 1170 00:58:56,450 --> 00:58:58,460 1171 00:58:58,460 --> 00:59:00,890 1172 00:59:00,890 --> 00:59:04,250 1173 00:59:04,250 --> 00:59:06,800 1174 00:59:06,800 --> 00:59:09,020 1175 00:59:09,020 --> 00:59:11,060 1176 00:59:11,060 --> 00:59:13,280 1177 00:59:13,280 --> 00:59:14,870 1178 00:59:14,870 --> 00:59:17,930 1179 00:59:17,930 --> 00:59:20,660 1180 00:59:20,660 --> 00:59:23,720 1181 00:59:23,720 --> 00:59:25,790 1182 00:59:25,790 --> 00:59:28,490 1183 00:59:28,490 --> 00:59:30,020 1184 00:59:30,020 --> 00:59:32,660 1185 00:59:32,660 --> 00:59:34,520 1186 00:59:34,520 --> 00:59:37,130 1187 00:59:37,130 --> 00:59:39,710 1188 00:59:39,710 --> 00:59:41,270 1189 00:59:41,270 --> 00:59:43,310 1190 00:59:43,310 --> 00:59:45,500 1191 00:59:45,500 --> 00:59:46,820 1192 00:59:46,820 --> 00:59:49,370 1193 00:59:49,370 --> 00:59:49,380 1194 00:59:49,380 --> 00:59:52,240 1195 00:59:52,240 --> 00:59:59,030 1196 00:59:59,030 --> 01:00:01,520 1197 01:00:01,520 --> 01:00:04,700 1198 01:00:04,700 --> 01:00:09,530 1199 01:00:09,530 --> 01:00:11,930 1200 
01:00:11,930 --> 01:00:15,349 1201 01:00:15,349 --> 01:00:17,390 1202 01:00:17,390 --> 01:00:19,640 1203 01:00:19,640 --> 01:00:22,970 1204 01:00:22,970 --> 01:00:25,370 1205 01:00:25,370 --> 01:00:27,770 1206 01:00:27,770 --> 01:00:29,420 1207 01:00:29,420 --> 01:00:31,490 1208 01:00:31,490 --> 01:00:34,849 1209 01:00:34,849 --> 01:00:37,250 1210 01:00:37,250 --> 01:00:39,500 1211 01:00:39,500 --> 01:00:41,390 1212 01:00:41,390 --> 01:00:43,220 1213 01:00:43,220 --> 01:00:44,900 1214 01:00:44,900 --> 01:00:48,740 1215 01:00:48,740 --> 01:00:50,390 1216 01:00:50,390 --> 01:00:52,730 1217 01:00:52,730 --> 01:00:53,890 1218 01:00:53,890 --> 01:00:56,839 1219 01:00:56,839 --> 01:00:59,180 1220 01:00:59,180 --> 01:01:08,480 1221 01:01:08,480 --> 01:01:13,640 1222 01:01:13,640 --> 01:01:15,559 1223 01:01:15,559 --> 01:01:17,359 1224 01:01:17,359 --> 01:01:19,160 1225 01:01:19,160 --> 01:01:24,680 1226 01:01:24,680 --> 01:01:25,670 1227 01:01:25,670 --> 01:01:29,240 1228 01:01:29,240 --> 01:01:36,520 1229 01:01:36,520 --> 01:01:36,530 1230 01:01:36,530 --> 01:01:38,799 1231 01:01:38,799 --> 01:01:41,559 1232 01:01:41,559 --> 01:01:43,959 1233 01:01:43,959 --> 01:01:45,969 1234 01:01:45,969 --> 01:01:48,489 1235 01:01:48,489 --> 01:01:51,069 1236 01:01:51,069 --> 01:01:52,539 1237 01:01:52,539 --> 01:01:54,759 1238 01:01:54,759 --> 01:01:57,519 1239 01:01:57,519 --> 01:01:59,890 1240 01:01:59,890 --> 01:02:01,929 1241 01:02:01,929 --> 01:02:03,999 1242 01:02:03,999 --> 01:02:06,039 1243 01:02:06,039 --> 01:02:11,429 1244 01:02:11,429 --> 01:02:14,679 1245 01:02:14,679 --> 01:02:17,620 1246 01:02:17,620 --> 01:02:19,359 1247 01:02:19,359 --> 01:02:22,179 1248 01:02:22,179 --> 01:02:26,019 1249 01:02:26,019 --> 01:02:30,699 1250 01:02:30,699 --> 01:02:32,439 1251 01:02:32,439 --> 01:02:35,829 1252 01:02:35,829 --> 01:02:39,399 1253 01:02:39,399 --> 01:02:41,640 1254 01:02:41,640 --> 01:02:44,049 1255 01:02:44,049 --> 01:02:45,759 1256 01:02:45,759 --> 01:02:49,239 1257 