The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation, or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

All right, so we've talked a little bit about caching before, but today we're going to talk in much more detail about caching and how to design cache-efficient algorithms. First, let's look at the caching hardware on modern machines. Here's what the cache hierarchy looks like for a multi-core chip: we have a whole bunch of processors, and they all have their own private L1 caches, for data as well as for instructions. They also each have a private L2 cache, and then they share a last-level cache, or L3 cache, also called the LLC. They're all connected to a memory controller that can access DRAM, and oftentimes you'll have multiple chips on the same server, connected through a network. So here we have a bunch of multi-core chips connected together.

We can see there are different levels of memory here, and the sizes of the levels differ — the sizes tend to go up as you move up the memory hierarchy. The L1 caches tend to be about 32 kilobytes; in fact, these are the specifications for the machines you're using in this class: 32 kilobytes for both the L1 data cache and the L1 instruction cache, and 256 kilobytes for the L2 cache, so the L2 cache tends to be about 8 to 10 times larger than the L1 cache. The last-level cache here is 30 megabytes — typically it's on the order of tens of megabytes — and then DRAM is on the order of gigabytes; we have 128 gigabytes of DRAM, and nowadays you can get machines with terabytes of DRAM. The associativity also tends to go up as you move up the cache hierarchy, and I'll talk more about associativity in the next couple of slides. The time to access the memory also tends to go up, so the latency increases as you move up the memory hierarchy. The L1 caches are the quickest to access, about 2 nanoseconds — these are just rough numbers. The L2 cache is a little slower, say 4 nanoseconds; the last-level cache maybe 6 nanoseconds; and then when you have to go to DRAM it's about an order of magnitude slower, 50 nanoseconds in this example. The reason the memories lower down in the cache hierarchy are faster is that they're manufactured with more expensive materials, but since they're more expensive, we can't fit as much of them on the machine — that's why the faster memories are smaller than the slower memories. But if we're able to take advantage of locality in our programs, we can make use of the fast memory as much as possible, and we'll talk about ways to do that in this lecture. There's also the latency across the network, which tends to be cheaper than going to main memory but slower than a last-level cache access.

And there's a lot of work in trying to get the cache coherence protocols right, as we mentioned before: since these processors all have private caches, we need to make sure they all see a consistent view of memory when they access the same memory addresses in parallel. We talked about the MSI cache protocol before, and there are many other protocols out there — you can read more about them online — but these are very hard to get right, and there's a lot of verification involved in trying to prove that the cache coherence protocols are correct.
Any questions so far? OK, so let's talk about the associativity of a cache. Here I'm showing you a fully associative cache: in a fully associative cache, a cache block can reside anywhere in the cache. The basic unit of movement here is a cache block; in this example the cache block size is 4 bytes, but on the machines we're using for this class the cache block size is 64 bytes. For this example I'm going to use a 4-byte cache line, and each row here corresponds to one cache line. Fully associative means each line can go anywhere in the cache. We're also assuming a cache size of 32 bytes, so in total it can store 8 cache lines, since each cache line is 4 bytes. To find a block in a fully associative cache, you actually have to search the entire cache, because a cache line can appear anywhere in it. There's a tag associated with each cache line that specifies which memory address in the virtual memory space it corresponds to. For the fully associative cache we use most of the bits of the address as the tag: we don't need the two lowest-order bits, because things are moved at the granularity of cache lines, which are 4 bytes here, so the two lowest-order bits are always the same; we just use the rest of the bits to store the tag. If our address space is 64 bits, then we use 62 bits to store the tag in a fully associative scheme. When the cache becomes full, a block has to be evicted to make room for a new block, and there are various ways you can decide which block to evict — this is known as the replacement policy. One common replacement policy is LRU, or least recently used: you basically kick out the block that was used furthest in the past. There are other schemes, such as second chance and clock replacement; we're not going to talk much about the different replacement schemes today, but feel free to read about them online. So what's a disadvantage of this scheme? Yes — it's slow. Why is it slow? Right: the disadvantage is that searching for a cache line in the cache can be pretty slow, because in the worst case you have to search the entire cache, since a cache block can reside anywhere. Even though the search can go on in parallel in hardware, it's still expensive in terms of power and performance to search most of the cache every time.

So let's look at the other extreme: a direct-mapped cache. In a direct-mapped cache, each cache block can go in only one place in the cache. I've color-coded the cache blocks here: the red blocks can only go in the first row of this cache, the orange ones only in the second row, and so on. The position a cache block can go into is known as that cache block's set — the set determines the location in the cache for each particular block. Let's look at how the virtual memory address is divided up, and which bits we use to figure out where a cache block should go in the cache. We have the offset, the set, and the tag fields. The offset tells us which position we want to access within a cache block: since a cache block has B bytes, we only need log base 2 of B bits for the offset. The reason we have an offset at all is that we're not always accessing something at the beginning of a cache block — we might want to access something in the middle — and that's why we need the offset, to specify where in the cache block we want to access.
Then there's the set field, and the set field determines which position in the cache the cache block can go into. There are eight possible positions for each cache block here, so we only need log base 2 of 8 bits — three bits — for the set in this example; more generally it's log base 2 of M over B, and here M over B is eight. Finally, we use the remaining bits as the tag, w minus log base 2 of M bits, and the tag gets stored along with the cache block in the cache; it uniquely identifies which memory block in virtual memory the cache block corresponds to. You can verify that the sum of all these quantities is w bits: in total we have a w-bit address space, and the sum of the three fields is w. So what's the advantage and the disadvantage of this scheme? First, what's a good thing about this scheme compared to the previous one we saw? Yes — it's fast, because you only have to check one place: each cache block can go in only one place in the cache, and that's the only place you have to check when you do a lookup. If the cache block is there, you find it; if it's not, you know it's not in the cache. What's the downside? Yeah, good answer: the downside is that you might, for example, just be accessing the red cache blocks and not accessing any of the other cache blocks. They all get mapped to the same location in the cache, and they keep evicting each other even though there's a lot of empty space in the cache. This is known as a conflict miss, and these can be very bad for performance and very hard to debug. So that's one downside of a direct-mapped cache: you can get conflict misses, where you have to evict things from the cache even though there's empty space in the cache. As we said, finding a block is very fast — only a single location in the cache has to be searched — but you might suffer conflict misses if you keep accessing things in the same set repeatedly without accessing the things in the other sets. Any questions?

OK, so those are the two extremes of cache design. There's actually a hybrid solution called a set-associative cache. In a set-associative cache you still have sets, but each set contains more than one line now. All the red blocks still map to the red set, but there are two possible locations for a red block now, so this is known as a 2-way associative cache, since there are two possible locations within each set. In general, a cache block's set determines k possible cache locations for that block: within a set it's fully associative, but each block can go in only one set. Let's look again at how the bits of the address are divided. We still have the tag, set, and offset fields. The offset field is still log base 2 of B bits. The set field takes log base 2 of M over kB bits: the number of sets is M over kB, so we need log base 2 of that number to represent a block's set. Finally, we use the remaining bits as the tag, so w minus log base 2 of M over k bits. Now, to find a block in the cache, only the k locations of its set must be searched: you basically find which set the cache block maps to, and then you check all k locations within that set to see whether the cache block is there.
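To make the bit arithmetic concrete, here's a minimal C sketch of how an address decomposes into offset, set, and tag for a set-associative cache. The parameter values are illustrative assumptions (a 32 KB, 8-way cache with 64-byte lines), not a description of any particular machine:

```c
// Sketch: splitting a 64-bit address into offset, set, and tag fields.
// B, M, and K are assumed example parameters, not fixed hardware facts.
#include <stdint.h>
#include <stdio.h>

#define B 64                /* cache line size in bytes           */
#define M (32 * 1024)       /* total cache size in bytes          */
#define K 8                 /* associativity (lines per set)      */
#define NSETS (M / (K * B)) /* number of sets = M / (k*B)         */

int main(void) {
    uint64_t addr = 0x7ffe12345678;                  /* arbitrary example  */
    uint64_t offset = addr % B;                      /* low log2(B) bits   */
    uint64_t set    = (addr / B) % NSETS;            /* next log2(M/kB) bits */
    uint64_t tag    = addr / ((uint64_t)B * NSETS);  /* remaining high bits  */
    printf("offset=%llu set=%llu tag=0x%llx\n",
           (unsigned long long)offset,
           (unsigned long long)set,
           (unsigned long long)tag);
    return 0;
}
```

With these numbers, the offset is the low 6 bits, the set is the next 6 bits, and the remaining 52 bits form the tag.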
And whenever you try to put something into the cache because it's not there, you have to evict something, and you evict it from the same set as the block you're placing into the cache. For this example I showed a 2-way associative cache, but in practice the associativity is usually bigger — say 8-way, 16-way, or sometimes 20-way. As you keep increasing the associativity, it looks more and more like a fully associative cache, and if you have a 1-way associative cache, it's just a direct-mapped cache. So this is a hybrid in between a direct-mapped cache and a fully associative cache. Any questions on set-associative caches?

OK, let's go over a taxonomy of the different types of cache misses you can incur. The first type of cache miss is a cold miss, which is a cache miss you have to incur the first time you access a cache block. If you need to access that piece of data, there's no way to avoid a cold miss, because your cache starts out not having the block, and the first time you access it you have to bring it into the cache. Then there are capacity misses: cache misses you get because the cache is full and can't fit all the cache blocks you want to access. You get a capacity miss when the previous cached copy would have been evicted even with a fully associative scheme — even if all the possible locations in your cache could be used for that cache line, the line still has to be evicted because there isn't enough space. You can deal with capacity misses by introducing more locality into your code, both spatial and temporal, and we'll look at ways to reduce the capacity misses of algorithms later in this lecture. Then there are conflict misses. Conflict misses happen in set-associative caches when you have too many blocks from the same set wanting to go into the cache, and some of them have to be evicted because the set can't fit all of the blocks — and these blocks would not have been evicted had you had a fully associative scheme. Those are what's called conflict misses: for example, if you have 16 slots in a set and you keep accessing 17 things that all belong in that set, something gets kicked out every time you access something, and those evictions might not have happened with a fully associative cache. Finally, there are sharing misses, which only happen in a parallel context. We talked a little about true sharing and false sharing misses in prior lectures, so let's review this briefly. Sharing misses can happen if multiple processors are accessing the same cache line and at least one of them is writing to it. If all the processors are just reading from the cache line, the cache coherence protocol knows how to make it work so you don't get cache misses — they can all access the same cache line at the same time if nobody modifies it. But if at least one processor is modifying it, you can get either true sharing misses or false sharing misses. A true sharing miss is when two processors are accessing the same data on the same cache line: as you recall from a previous lecture, if one of the two processors writes to the cache line, then whenever it does a write it needs to acquire the cache line in exclusive mode and invalidate that cache line in all the other caches.
Then, when another processor tries to access the same memory location, it has to bring the line back into its own cache, and you get a cache miss there. A false sharing miss happens if two processors are accessing different data that just happen to reside on the same cache line — because the basic unit of movement in the architecture is a cache line, even if you're accessing different things, if they're on the same cache line you still get a sharing miss. False sharing is pretty hard to deal with, because in general you don't know what data gets placed on what cache line. There are certain heuristics you can use: for example, if you're malloc'ing a big memory region, you know that memory region is contiguous, so you can space the accesses by different processors far enough apart that they don't touch the same cache line. But if you're just declaring local variables on the stack, you don't know where the compiler will decide to place those variables in the virtual memory address space. So those are four types of cache misses you should know about. There are many models out there for analyzing the cache performance of algorithms, and some of the models ignore some of these types of misses — just be aware of this when you're looking at an algorithm analysis, because not all models capture all of these different types of cache misses.

So let's look at a bad case for conflict misses. Here I want to access a submatrix within a larger matrix, and recall that matrices are stored in row-major order. Let's say our matrix is 4096 columns by 4096 rows, and it stores doubles. Each row therefore contains 2 to the 15th bytes: 4096 is 2 to the 12th, and each double takes 8 bytes, so 2 to the 12th times 2 to the 3rd is 2 to the 15th. We'll assume the address width w is 64, which is standard; a cache size of 32K; a cache block size of 64, which again is standard; and a 4-way associative cache. So let's look at how the bits are divided up. Again we have the offset, which takes log base 2 of B bits — how many bits do we have for the offset in this example? Right, six bits: just log base 2 of 64. What about the set — how many bits do we have for that? Seven — who said seven? Yes, it is seven: M is 32K, which is 2 to the 15th, k is 2 squared, and B is 2 to the 6th, so M over kB is 2 to the 15th divided by 2 to the 8th, which is 2 to the 7th, and log base 2 of that is 7. And finally, what about the tag field? 51 — why is that? Yes, it's just 64 minus 7 minus 6, which is 51.
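Written out, the field widths for this example are:

```latex
% Field widths for a 32 KB, 4-way cache with 64-byte lines and w = 64.
\begin{align*}
\text{offset} &= \log_2 B = \log_2 64 = 6 \text{ bits},\\
\text{set}    &= \log_2 \tfrac{M}{kB}
               = \log_2 \tfrac{2^{15}}{2^{2}\cdot 2^{6}}
               = \log_2 2^{7} = 7 \text{ bits},\\
\text{tag}    &= w - 7 - 6 = 64 - 13 = 51 \text{ bits}.
\end{align*}
```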
Now let's say we want to access a 32-by-32 submatrix A within this larger matrix — this is pretty common in matrix algorithms, especially divide-and-conquer algorithms — and let's say we want to access a column of A. The addresses of the elements we're going to access are as follows. The first element in the column is stored at some address x. The second element in the column is stored at x plus 2 to the 15th, because each row has 2 to the 15th bytes and we're skipping over an entire row to get to the element in the next row of the submatrix. To get the third element we add 2 times 2 to the 15th, and so on, until the last element, which is at x plus 31 times 2 to the 15th. So which fields of the address are changing as we go through one column of this submatrix? Yeah — it's just the tag that changes. The set and the offset stay the same, because they live in the lower 13 bits, and when we increment by 2 to the 15th we don't touch the set or the offset. So all of these addresses fall into the same set, and that's a problem, because our cache is only 4-way associative: we can only fit four cache lines in each set, and here we're accessing 32 of them. By the time we get to the next column of A, everything we accessed in the current column of A has already been evicted from the cache. This is known as a conflict miss, because if you had a fully associative cache this might not have happened — you could use any location in the cache to store these cache blocks. Does anybody have questions on why we get conflict misses here? Does anybody have ideas on how to fix this — what can I do so that I'm not incrementing by exactly 2 to the 15th every time? Yeah — one solution is to pad the matrix. You add some constant amount of space to the end of each row, so each row is longer than 2 to the 15th bytes; maybe you add some small constant like 17 bytes to the end of each row. Now when you access a column of this submatrix, you're not just incrementing by 2 to the 15th, you're also adding a small integer, and that causes the set and offset fields to change as well, so you won't get as many conflict misses. That's one way to solve the problem. It also turns out that if you're doing a matrix multiplication algorithm — a cubic-work algorithm — you can afford to copy the submatrix into a temporary 32-by-32 matrix, do all the operations on the temporary matrix, and then copy it back out to the original matrix. The copying takes only quadratic work across the whole algorithm, and since the whole algorithm takes cubic work, the quadratic work is a lower-order term. So you can also use temporary space to make sure you don't get conflict misses. Any questions? OK, so that was conflict misses. Conflict misses are important, but usually we're first concerned with getting good spatial and temporal locality, because those are usually the higher-order factors in the performance of a program. Once we get good spatial and temporal locality in our program, we can start worrying about conflict misses — for example, by using temporary space, or by padding our data by some small constant so we don't incur as many conflict misses.
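Here's a minimal sketch of the padding fix in C. The lecture suggests padding each row by a small constant like 17; in this sketch I pad by 17 doubles rather than 17 bytes just to keep the array type simple — the exact constant is illustrative, not tuned:

```c
// Padded matrix: each row is N + PAD doubles instead of N, so the row
// stride is no longer exactly 2^15 bytes. Walking down a column now
// changes the set bits from row to row, spreading the accesses across
// many sets instead of piling them all into one.
#define N   4096
#define PAD 17            /* small illustrative constant, not tuned */

static double A[N][N + PAD];

double column_sum(int j) {    /* touch one column of the matrix */
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += A[i][j];         /* stride = (N + PAD) * 8 bytes */
    return s;
}
```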
OK, so now I want to talk about a model we can use to analyze the cache performance of algorithms, called the ideal-cache model. In this model we have a two-level memory hierarchy: a cache and main memory. The cache has size M and cache lines of B bytes, so we can fit M over B cache lines in the cache. This model assumes the cache is fully associative — any cache block can go anywhere in the cache — and it also assumes an optimal, omniscient replacement policy. This means that when we want to evict a cache block from the cache, we pick the victim that gives us the best performance overall — the lowest number of cache misses throughout the entire algorithm — so we're assuming we know the whole sequence of memory requests in advance; that's why it's called the omniscient replacement policy. If something is in cache, you can operate on it for free; if it's in main memory, you have to bring it into the cache, and you incur a cache miss. There are two performance measures we care about. First, the work, which is just the ordinary running time of the program — the same as before when we were analyzing algorithms: the total number of operations the program does. Second, the number of cache misses, which is the number of lines we have to transfer between main memory and the cache. So the number of cache misses just counts cache transfers, whereas the work counts all the operations in the algorithm. Ideally we'd like to come up with algorithms that have a low number of cache misses without increasing the work over the traditional standard algorithm. Sometimes we can do that and sometimes we can't, and then there's a trade-off between the work and the number of cache misses — a trade-off you have to weigh as a performance engineer. Today we're going to look at an algorithm where you actually can reduce the number of cache misses without increasing the work, so you get the best of both worlds. Any questions on this ideal-cache model? This model is just used for analyzing algorithms — you can't actually buy one of these caches at the store; it's a very idealized cache, and they don't exist. But it turns out the optimal omniscient replacement policy has nice theoretical properties. There's a very important lemma, proved by Sleator and Tarjan in 1985, called the LRU lemma. The lemma says: suppose an algorithm incurs Q cache misses on an ideal cache of size M; then on a fully associative cache of size 2M that uses the LRU (least recently used) replacement policy, it incurs at most 2Q cache misses. So if I can bound the number of cache misses of an algorithm on the ideal cache, then on a fully associative cache twice the size running LRU — which is a pretty practical policy — the algorithm incurs at most twice the number of cache misses. The implication of this lemma is that for asymptotic analysis you can assume either the optimal replacement policy or the LRU replacement policy, as convenient, because the miss counts are within a constant factor of each other. This is a very important lemma — it makes it much easier for us to analyze the cache misses of algorithms. And here's a software engineering principle I want to point out. When you're trying to get good performance, first come up with a theoretically good algorithm that has good bounds on the work and the cache complexity; only after that do you start engineering for detailed performance — worrying about details such as real-world caches not being fully associative, or loads and stores having different costs with respect to bandwidth and latency. Coming up with a theoretically good algorithm is the first-order bit of getting good performance. Questions? OK, so let's start analyzing the number of cache misses in programs.
Here's a lemma — the cache-miss lemma. Suppose a program reads a set of R data segments, where the i-th segment consists of s_i bytes, and suppose the sum of the sizes of all the segments is N. We're going to assume N is less than M over 3 — the total size of the segments is less than a third of the cache size — and we'll also assume N over R is greater than or equal to B. Recall that R is the number of data segments and N is their total size, so what does N over R mean semantically? Yes — N over R is just the average size of a segment, and here we're saying the average segment is at least B bytes, at least the size of a cache line. If these two assumptions hold, then all the segments fit into cache, and the number of cache misses to read them all is at most 3N over B. For comparison, if you had just a single array of size N, the number of cache misses needed to read that array into cache would be about N over B; this lemma says that even if our data is divided into a bunch of segments, as long as the average length of the segments is large enough, the number of cache misses is just a constant factor worse than reading a single array. Let's try to prove the cache-miss lemma. A single segment s_i incurs at most s_i over B plus 2 cache misses — does anyone want to tell me where the s_i over B plus 2 comes from? Say this is the segment we're analyzing, and this is how it's aligned in virtual memory. Yes — s_i over B plus 2 is the number of blocks the segment could overlap in the worst case: you need s_i over B misses just to load the s_i bytes, but the beginning and the end of the segment might not be perfectly aligned with cache line boundaries, so you can waste at most one block on each side of the segment; that's where the plus 2 comes from. To get the total number of cache misses, we just sum this quantity from i equals 1 to R. Summing s_i over B from i equals 1 to R gives N over B, by definition, and summing 2 from i equals 1 to R gives 2R. Now multiply the top and bottom of that second term by B, giving 2RB over B, and that's less than or equal to 2N over B. Where does that inequality come from — why is 2RB at most 2N? Yes — we know N is greater than or equal to BR by the assumption up there, so RB is at most N. Then N over B plus 2N over B sums to 3N over B, so in the worst case we incur 3N over B cache misses.
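Here's the whole chain of bounds from the proof, written out; N is the total size of the segments, and the middle inequality uses the assumption N ≥ BR:

```latex
\[
  \text{misses} \;\le\; \sum_{i=1}^{R}\Big(\frac{s_i}{B} + 2\Big)
  \;=\; \frac{N}{B} + 2R
  \;=\; \frac{N}{B} + \frac{2RB}{B}
  \;\le\; \frac{N}{B} + \frac{2N}{B}
  \;=\; \frac{3N}{B}.
\]
```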
Any questions on this cache-miss lemma? The important thing to remember here is that if you have a whole bunch of data segments, and the average length of your segments is large enough — bigger than a cache block — then you can access all of these segments just as if they were a single array; it only increases the number of cache misses by a constant factor, and if you're doing asymptotic analysis it doesn't matter. We're going to use this cache-miss lemma later on when we analyze algorithms. OK, another assumption we're going to need is called the tall-cache assumption. The tall-cache assumption basically says the cache is taller than it is wide: B squared is less than cM for some sufficiently small constant c less than or equal to 1. In other words, it says the number of cache lines you have, M over B, is bigger than B. This tall-cache assumption is usually satisfied in practice. Here are the cache line sizes and cache sizes on the machines we're using: the cache line size is 64 bytes, and the L1 cache size is 32 kilobytes. 64 bytes squared is 2 to the 12th, and 32 kilobytes is 2 to the 15th bytes; 2 to the 12th is less than 2 to the 15th, so it satisfies the tall-cache assumption. And as we go up the memory hierarchy the cache size increases while the cache line length stays the same, so the caches become even taller as we move up the hierarchy. To see why the tall-cache assumption is useful, let's look at what's wrong with a short cache. In a short cache, the lines are very wide — wider than the number of lines we can have in the cache. Say we're working with an n-by-n submatrix stored in row-major order. If you have a short cache, then even if n squared is less than cM — meaning all the bytes of the submatrix fit in the cache — you might still not be able to fit it into a short cache, and this picture illustrates why: we have n rows here, but we can only fit M over B of the rows in the cache, because the cache lines are so long, and we're actually wasting a lot of space on each cache line — we're using only a small fraction of each cache line to store a row of this submatrix. If this were the entire matrix it would actually be OK, because consecutive rows are placed together consecutively in memory; but if this is a submatrix, we have no guarantee that the next row is placed right after the current row, and we often have to deal with submatrices when we're doing recursive matrix algorithms. So that's what's wrong with short caches, and that's why we want to assume the tall-cache assumption — and we can, because it's usually satisfied in practice. The TLB actually tends to be short — it only has a couple of entries — so it might not satisfy the tall-cache assumption, but all the other caches will. Any questions?

OK, here's another lemma that will be useful, called the submatrix caching lemma. Suppose we have an n-by-n matrix A, and it's read into a tall cache satisfying B squared less than cM for some constant c less than or equal to 1, and suppose n squared is less than M over 3 but greater than or equal to cM. Then A fits into cache, and the number of cache misses required to read all of A's elements into cache is at most 3 n squared over B. Let's see why this is true. We'll let big N denote the total number of bytes we need to access, so big N equals n squared, and we'll use the cache-miss lemma, which says that if the average length of our segments is large enough, we can read all the segments in just as if they were a single contiguous array. The segments here are the rows: R equals little n is the number of segments, and each segment's length is also little n, since we're working with a square submatrix. We also have that the cache block size B is less than or equal to n, which equals big N over R. Where do we get this property that B is less than or equal to n? I made some assumptions up here that I can use to infer that B is less than or equal to n — does anybody see where?
Yeah — I know B squared is less than cM, and cM is less than or equal to n squared, so B squared is less than n squared, and B is less than n. I also have that n squared is less than M over 3, just by assumption, so I can use the cache-miss lemma, and the cache-miss lemma tells me I only need a total of 3 n squared over B cache misses to read this whole thing in. Any questions on the submatrix caching lemma?

OK, now let's analyze matrix multiplication. How many of you have seen matrix multiplication before? A couple of you — OK. Here's what the code looks like for the standard cubic-work matrix multiplication algorithm. We have two input matrices, A and B, and we're going to store the result in C; the height and width of our matrices is n. We're just dealing with square matrices here, but what I'm going to talk about also extends to non-square matrices. We have three loops: i from 0 to n minus 1, j from 0 to n minus 1, and k from 0 to n minus 1, and we increment C[i·n + j] by A[i·n + k] times B[k·n + j]. This is just the standard code for matrix multiply. So what's the work of this algorithm? This should be review for all of you — n cubed.
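For reference, here is that loop nest written out in C, exactly as described — row-major n-by-n matrices with the i, j, k loop order:

```c
// Standard cubic-work matrix multiply: C += A * B, with all three
// matrices n x n and stored in row-major order.
void matmul(const double *A, const double *B, double *C, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                C[i*n + j] += A[i*n + k] * B[k*n + j];
}
```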
OK, now let's analyze the number of cache misses this algorithm incurs. Again, we assume the matrices are in row-major order and that we satisfy the tall-cache assumption. We're going to analyze the number of cache misses on matrix B, because the misses incurred by B turn out to dominate the overall miss count. There are three cases to consider. The first case is when n is greater than cM over B for some constant c. We analyze matrix B assuming LRU — recall that the LRU lemma says whatever we analyze using LRU is within a constant factor of what we'd analyze using the ideal cache. To do this matrix multiplication, I go through one row of A and one column of B and take the dot product; that's what happens in the innermost loop. How many cache misses do I incur when I go down one column of B? In this case n is greater than cM over B, so I can't fit one block from each row of B into the cache. How many rows of B are there? n. And how many cache misses per row? One. So in total I need n cache misses for the first column of B. What about the second column? Recall that I'm assuming the LRU replacement policy here: when the cache is full, I evict the block that was used furthest in the past. It's still going to be n — why? Because I can't fit one cache block from each row in my cache, so by the time I get back to the top of matrix B, the top block has already been evicted from the cache and I have to load it back in — and the same is true for every other block I access. So I again need n misses for the second column of B, and the same for all the columns of B, and I have to do all of this again for the second row of A. In total, I need Θ(n³) cache misses — one cache miss per entry of B that I access. That's not very good, because the total work was also Θ(n³): I'm gaining nothing from locality in this algorithm. Any questions on this analysis? That was case one; let's look at case two. In this case n is less than cM over B, so I can fit one block from each row of B into cache, but n is greater than another constant c′ times the square root of M, so I can't fit the whole matrix into cache. Again, analyze the misses on B assuming LRU. How many misses do I incur for the first column of B? The same as before — n. What about the second column? By the time I get back to the beginning of the matrix, is the top block still in cache? Who thinks the block is still in cache when I get back to the beginning? A couple of people. Who thinks it's out of cache? OK — it turns out it is still in cache, because I can fit one block for every row of B, since n is less than cM over B. So when I get to the beginning of the second column, that block is still in cache — I loaded it in while accessing the first column — and I incur no cache misses for the second column. In general, if I can fit some constant times B columns' worth of blocks into cache, I reduce my number of cache misses by a factor of B: I incur a miss only on the first access to each block, not on the subsequent accesses. The same is true for each row of A, and since A has n rows and I incur Θ(n²/B) misses per row of A, the overall number of cache misses is Θ(n³/B). Inside matrix B I'm exploiting spatial locality: once I load a block, I can reuse it the next time I traverse a nearby column. Any questions on this analysis? OK, the third case: here n is less than c′ times the square root of M, which means the entire matrix fits into cache. Analyzing the misses on matrix B under LRU again: how many misses do I have now — let's count the total misses for each pass through a row of A. For the first row of A, the analysis is the same as before: I need Θ(n²/B) misses to load B into the cache. What about the second row of A — how many misses do I incur? For the second row of A I incur no misses at all, because once I load B into cache it stays in cache — the entire matrix fits, since n is less than c′ times the square root of M. So the total number of cache misses for matrix B is Θ(n²/B), which is just the submatrix caching lemma from before. Overall, though, this is not a very good algorithm, because as you recall, in case one I needed a cubic number of cache misses. What happens if I swap the order of the inner two loops? Recall this was one of the optimizations in Lecture 1, when Charles talked about matrix multiplication and how to speed it up. If I swap the order of the two inner loops, then in every innermost iteration I'm actually going over a row of C and a row of B, while the entry of A stays fixed. Now when I analyze the misses on matrix B assuming LRU, I benefit from spatial locality, since I'm going row by row and the matrix is stored in row-major order: across all the rows I require only Θ(n²/B) misses, and I do this n times for the outer loop, so in total I get Θ(n³/B) cache misses. Swapping the order of the inner two loops significantly improves the locality of the algorithm, and that's why we saw a significant performance improvement in the first lecture when we swapped the order of the loops. Any questions?
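Here's the swapped version — the only change from the code above is the order of the j and k loops:

```c
// Same computation with the inner loops swapped to i, k, j order.
// The innermost loop now scans a row of C and a row of B, so both
// get spatial locality in row-major storage; A[i*n + k] stays fixed.
void matmul_swapped(const double *A, const double *B, double *C, int n) {
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                C[i*n + j] += A[i*n + k] * B[k*n + j];
}
```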
Does anybody think we can do better than n cubed over B cache misses, or do you think that's the best you can do? How many people think you can do better? How many think this is the best you can do? And how many people don't care? It turns out you can do better, and we're going to do better using an optimization called tiling. Here's how it works: instead of just three for loops, I'll have six for loops, and I'll loop over tiles — I'll loop over s-by-s submatrices, and within each submatrix I'll do all the computation I need for that submatrix before moving on to the next one. The three innermost loops loop inside a submatrix, and the three outermost loops step through the larger matrix, one submatrix at a time. Let's analyze the work of this algorithm. The work for one s-by-s submatrix multiplication is Θ(s³), since that's the bound for matrix multiplication, and the number of times I operate on submatrices is (n/s)³ — you can see this by considering each submatrix to be a single element and applying the same cubic-work analysis to the smaller matrix. So the work is (n/s)³ times s³, which is Θ(n³): the work of this tiled matrix multiply is the same as the version without tiling. Now let's analyze the number of cache misses. We tune s so that the submatrices just fit into cache, so we set s equal to Θ(√M) — we actually need to make this something like one-third of √M, because we need to fit three submatrices in the cache, but it's some constant times √M. The submatrix caching lemma implies that for each submatrix we need Θ(s²/B) misses to load it in, and once it's loaded it fits entirely in cache, so we can do all the computation within cache without incurring any more misses. Therefore the total number of cache misses is the number of subproblems, (n/s)³, times the misses per subproblem, Θ(s²/B), and if you multiply that out with s = Θ(√M) you get Θ(n³ / (B√M)). This is a pretty cool result, because it says you can actually do better than the n³/B bound — you can improve it by a factor of √M — and in practice √M is not insignificant: for example, the last-level cache is on the order of megabytes, so √M is on the order of thousands. So this significantly improves the performance of the matrix multiplication code, if you tune s so that the submatrices just fit in the cache.
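Here's a sketch of the six-loop tiled multiply. S is the voodoo tuning parameter the lecture describes — it should be on the order of √(M/3) so three tiles of doubles fit in cache; S = 64 below is purely illustrative, and for brevity the sketch assumes n is a multiple of S:

```c
// Tiled matrix multiply: the three outer loops walk over S x S tiles,
// and the three inner loops do all the work for one tile while it is
// resident in cache.
#define S 64   /* voodoo parameter: tune so three S x S tiles of */
               /* doubles fit in the target cache                */

void matmul_tiled(const double *A, const double *B, double *C, int n) {
    for (int ih = 0; ih < n; ih += S)
        for (int jh = 0; jh < n; jh += S)
            for (int kh = 0; kh < n; kh += S)
                for (int i = ih; i < ih + S; i++)
                    for (int j = jh; j < jh + S; j++)
                        for (int k = kh; k < kh + S; k++)
                            C[i*n + j] += A[i*n + k] * B[k*n + j];
}
```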
It turns out this bound is optimal — that was shown in 1981 — so for cubic-work matrix multiplication this is the best you can do; if you use another matrix multiplication algorithm, like Strassen's algorithm, you can do better. I want you to remember this bound — it's a very important bound to know. It says that for matrix multiplication you can benefit both from spatial locality and from temporal locality: the B term in the denominator comes from spatial locality, and the √M term comes from temporal locality, since I do all the work inside the submatrix before evicting it from cache. Any questions on this analysis? So, what's one issue with this algorithm? Yes — the problem is that I've tuned s for my particular machine. I call s a voodoo parameter: it's sort of a magic number I put into my program so the tiles fit the cache of the particular machine I'm running on, and this makes the code not portable — if I try to run it on another machine, the cache sizes might be different there, and I won't get the same performance I did on my machine. This is also an issue even if you're running on the same machine, because you might have other programs running at the same time using part of the cache, so you don't actually know how much of the cache your program gets to use in a multiprogrammed environment. And this was just for one level of cache. If we want to optimize for two levels of cache, we need two voodoo parameters, s and t — we'll have submatrices and sub-submatrices — and we have to tune both parameters to get the best performance on our machine. Multidimensional tuning optimization can't be done simply with binary search: if you're just tuning for one level of cache, you can binary-search on the parameter s, but here you can't, so it's much more expensive to optimize, and the code becomes messier — nine nested for loops instead of six. And how many levels of cache do the machines we're using today have? Three. For a three-level cache you have three voodoo parameters and twelve nested for loops; the code becomes very ugly, you have to tune the parameters for your particular machine, and that makes the code not very portable, as one student pointed out. In a multiprogrammed environment you don't actually know the effective cache size your program has access to, because other jobs are running at the same time, so it's very easy to mistune the parameters. Questions? Yes — you can auto-tune your program so it's optimized for the cache sizes of your particular machine. Is there an instruction to get the size of your cache? I'm not actually sure — do you know? /proc/cpuinfo? Yeah, so you can probably get it from there as well. But even if you do that, when you run this program while other jobs are running, you still don't know how much cache your program actually has access to. Yes — no, the techniques are actually general-purpose; today we're just looking at matrix multiply, but in Thursday's lecture we'll look at many other problems and how to optimize them for the cache hierarchy. Other questions?
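As a side note on querying cache sizes programmatically: on Linux with glibc you can also ask sysconf, in addition to reading /proc/cpuinfo or the files under /sys/devices/system/cpu/cpu0/cache/. This is a glibc-specific sketch — the _SC_LEVEL* names are a glibc extension, and the calls may return 0 where the value is unknown:

```c
// Query cache sizes at run time (glibc extension; values may be 0).
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("L1d = %ld bytes\n", sysconf(_SC_LEVEL1_DCACHE_SIZE));
    printf("L2  = %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
    printf("L3  = %ld bytes\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
    return 0;
}
```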
OK, so that was a good algorithm in terms of cache performance, but it wasn't very portable. Let's see if we can do better — a simpler design where we still get pretty good cache performance. We're going to turn to divide and conquer and look at the recursive matrix multiplication algorithm we saw before. Again we'll deal with square matrices, but the results generalize to non-square matrices. How this works: we split each input matrix into four submatrices, or quadrants, and then each quadrant of the output matrix is the sum of two matrix multiplies on (n/2)-by-(n/2) matrices. So C11 is A11 times B11 plus A12 times B21, and we do this recursively — at every level of recursion we get eight multiply-adds of (n/2)-by-(n/2) matrices. Here's what the recursive code looks like; you can see we have eight recursive calls. The base case here is of size 1; in practice you want to coarsen the base case to overcome function-call overheads. Let's also look at what these highlighted values correspond to — I've color-coded them so they correspond to particular elements of the submatrix I'm looking at on the right. These values correspond to the index of the first element in each of the quadrants. The first element of the first quadrant just has offset zero. The first element of quadrant 12 is on the same row as the first element of quadrant 11, so I just add the width of the quadrant, which is n/2. To get the first element of quadrant 21, I jump over n/2 rows, each of length row size, so that's n/2 times row size. And the first element of quadrant 22 is the first element of quadrant 21 plus n/2, so that's n/2 times row size plus n/2.
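Here's a compact sketch of that recursion in C, in the spirit of the code on the slide (the slide writes out the eight calls explicitly; this sketch enumerates them with small loops). rs is the row size of the full underlying matrix, and the offsets {0, n/2, (n/2)·rs, (n/2)·rs + n/2} pick out the four quadrants. The base case is size 1 for clarity — a real implementation would coarsen it:

```c
// Recursive multiply-add C += A * B on n x n submatrices of a matrix
// whose full row length is rs.
void mm_rec(double *C, const double *A, const double *B, int n, int rs) {
    if (n == 1) {
        C[0] += A[0] * B[0];   /* coarsen this base case in practice */
        return;
    }
    int h = n / 2;
    int off[4] = {0, h, h * rs, h * rs + h};   /* quadrant offsets */
    for (int i = 0; i < 2; i++)          /* quadrant row of C and A  */
        for (int j = 0; j < 2; j++)      /* quadrant column of C, B  */
            for (int k = 0; k < 2; k++)  /* eight recursive calls    */
                mm_rec(C + off[2*i + j],
                       A + off[2*i + k],
                       B + off[2*k + j], h, rs);
}
```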
So let's analyze the work of this algorithm. What's the recurrence for the work? How many subproblems do we have? Eight. What's the size of each subproblem? n over 2. And how much work do we do to set up the recursive calls? A constant amount. So the recurrence is W(n) = 8W(n/2) + Θ(1). What does that solve to? n cubed — it's one of the three cases of the master theorem. We're actually going to analyze this in more detail by drawing out the recursion tree, which gives more intuition about why the master theorem is true. At the top level of the recursion tree I have a problem of size n, and it branches into 8 subproblems of size n/2; I do a constant amount of work to set up the recursive calls, which I'll just label 1 — I'm ignoring the constants, but that doesn't matter for asymptotic analysis. Then I branch again into 8 subproblems of size n/4, and eventually I get down to the leaves. How many levels until the leaves? Log n — what's the base of the log? It's log base 2 of n, because I'm dividing the problem size by two every time. Therefore the number of leaves is 8 to the log base 2 of n, because I branch 8 ways every time, and 8 to the log base 2 of n is the same as n to the log base 2 of 8, which is n cubed. The amount of work at the top level is constant, so I'll just say 1; the next level is 8 times that, then 64, and when I get to the leaves it's Θ(n³), since I have n³ leaves all doing constant work. The work is geometrically increasing as I go down the recursion tree, so the overall work is dominated by the work at the leaves: Θ(n³), the same as the looping versions of matrix multiply — they're all cubic work. Now let's analyze the number of cache misses of this divide-and-conquer algorithm. My recurrence is going to be different now: the base case is when the submatrix fits in cache, that is, when n² < cM. When that's true, I just load the submatrix into cache and incur no more cache misses, so I need Θ(n²/B) misses when n² < cM, for some sufficiently small constant c ≤ 1 — the Θ(n²/B) term comes from the submatrix caching lemma, which says I can load the entire submatrix into cache with that many misses. Otherwise, I recurse into 8 subproblems of size n/2 and add Θ(1), since I do a constant amount of work to set up the recursive calls. So the difference between this cache analysis and the work analysis is the base case: in all the analyses you've seen before, the base case was of constant size, but here we're working with a base case that is not of constant size. Let's analyze this using the recursion tree approach. At the top level I have a problem of size n; it branches into eight problems of size n/2, and I incur a constant number of cache misses, say 1. I branch again, and eventually I get to the base case, where n² < cM, and there the number of cache misses I incur is Θ(cM/B) — I can just plug in cM for n². The number of levels of recursion in this tree is no longer just log base 2 of n: it's log base 2 of n minus log base 2 of the square root of cM, which is the same as log base 2 of n minus one-half log base 2 of cM. The number of leaves is 8 to that number of levels — 8 to the (log base 2 of n minus one-half log base 2 of cM) — which is Θ(n³ / M^(3/2)): the n³ comes from the 8 to the log base 2 of n term, and 8 to the negative one-half log base 2 of cM gives the M^(3/2) in the denominator. Any questions on how I computed the number of levels of this recursion tree? I'm basically dividing my problem size by two until I get to a problem size that fits into cache — that means n less than the square root of cM — so I can subtract that many levels from the recursion tree; then, since I'm branching eight ways, I take 8 to the power of the number of levels, and that gives me the total number of leaves.
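Collecting this into one place — the cache-miss recurrence and its solution, under the tall-cache assumption:

```latex
\[
Q(n) =
\begin{cases}
\Theta(n^2/B) & \text{if } n^2 < cM \quad\text{(submatrix caching lemma)},\\
8\,Q(n/2) + \Theta(1) & \text{otherwise,}
\end{cases}
\]
\[
\#\text{leaves} = 8^{\log_2 n - \frac{1}{2}\log_2 cM}
              = \Theta\!\big(n^3 / M^{3/2}\big),
\qquad
Q(n) = \Theta\!\Big(\frac{n^3}{M^{3/2}}\Big)\cdot
       \Theta\!\Big(\frac{cM}{B}\Big)
     = \Theta\!\Big(\frac{n^3}{B\sqrt{M}}\Big).
\]
```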
Now let's count the number of cache misses at each level of the recursion tree. At the top level I have a constant number of misses, say 1; the next level 8, then 64; and at the leaves I have Θ(n³ / (B√M)) cache misses. I got that quantity by multiplying the number of leaves by the number of misses per leaf: the number of leaves is n³/M^(3/2), and the misses per leaf are Θ(cM/B), so one factor of M cancels, leaving √M at the bottom, and I also divide by the block size B. Again this is a geometric series, and the number of cache misses at the leaves dominates all the other levels, so the total number of cache misses is Θ(n³ / (B√M)). Notice that I'm getting the same number of cache misses as I did with the tiled version of the code, but here I never tuned my code for a particular cache size. So which cache sizes does this code work for — is it going to get good cache performance on your machine? This code works for all cache sizes, because I didn't tune it for any particular one. This is what's known as a cache-oblivious algorithm: it has no voodoo tuning parameters, it has no explicit knowledge of the caches, and it's essentially passively auto-tuning itself for the particular cache size of your machine. It can also work for multi-level caches automatically, because I never specified which level of cache I was analyzing — I can analyze it for any level and it still gives good cache complexity. And it's good in multiprogrammed environments, where you might have other jobs running and you don't know your effective cache size — it just passively auto-tunes for whatever cache is available. It turns out that the best cache-oblivious codes today work on arbitrary rectangular matrices — I've only talked about square matrices, but the best codes work on rectangular matrices — and they perform binary splitting instead of eight-way splitting: you split on the largest of i, j, and k. That's what the best cache-oblivious matrix multiplication algorithm does. Any questions? OK, so far I've only talked about the serial setting — I was assuming these algorithms run on just a single thread. What happens if I go to multiple processors? It turns out the results generalize to a parallel context. This is the recursive parallel matrix-multiply code we saw before: notice that we execute four sub-calls in parallel, do a sync, and then do four more sub-calls in parallel.
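Here's a sketch of that structure in Cilk-style C (using the cilk_spawn and cilk_sync keywords, so it needs a Cilk-enabled compiler such as OpenCilk); the quadrant offsets are as in the serial sketch above. The first four products can run in parallel because they write to distinct quadrants of C; the sync is needed before the second four, which accumulate into the same quadrants:

```c
#include <cilk/cilk.h>

// Parallel recursive multiply-add C += A * B on n x n submatrices of
// a matrix with full row length rs.
void mm_par(double *C, const double *A, const double *B, int n, int rs) {
    if (n == 1) { C[0] += A[0] * B[0]; return; }
    int h = n / 2;
    int o[4] = {0, h, h * rs, h * rs + h};      /* quadrant offsets */
    /* round 1: C11+=A11*B11, C12+=A11*B12, C21+=A21*B11, C22+=A21*B12 */
    cilk_spawn mm_par(C + o[0], A + o[0], B + o[0], h, rs);
    cilk_spawn mm_par(C + o[1], A + o[0], B + o[1], h, rs);
    cilk_spawn mm_par(C + o[2], A + o[2], B + o[0], h, rs);
               mm_par(C + o[3], A + o[2], B + o[1], h, rs);
    cilk_sync;   /* round 2 writes the same quadrants of C */
    /* round 2: C11+=A12*B21, C12+=A12*B22, C21+=A22*B21, C22+=A22*B22 */
    cilk_spawn mm_par(C + o[0], A + o[1], B + o[2], h, rs);
    cilk_spawn mm_par(C + o[1], A + o[1], B + o[3], h, rs);
    cilk_spawn mm_par(C + o[2], A + o[3], B + o[2], h, rs);
               mm_par(C + o[3], A + o[3], B + o[3], h, rs);
    cilk_sync;
}
```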
To analyze the number of cache misses in this parallel code, we use the following theorem. Let Q_P be the number of cache misses in a deterministic Cilk computation when run on P processors, each with a private cache of size M, and let S_P be the number of successful steals during the computation. In the ideal-cache model, Q_P = Q_1 + O(S_P · M/B): the number of cache misses in the parallel execution equals the number of misses in the serial execution plus a term of order the number of steals times M over B. The proof goes as follows. Recall that in the Cilk runtime system, workers steal tasks from other workers when they have no work to do. After a worker steals a task, its cache is completely cold in the worst case, because it wasn't working on that subproblem before; but after M/B cold cache misses, its cache becomes identical to what it would be in the serial execution — we just pay M/B misses to make the cache look the same as if we were executing serially. The same is true when a worker resumes a stolen computation after a cilk_sync. These two situations can happen at most 2 · S_P times — twice the number of steals — and each time we pay M/B misses; that's where the additive O(S_P · M/B) term comes from. We also know that the number of steals in a Cilk program is upper-bounded by O(P · T_infinity) in expectation, where P is the number of processors and T_infinity is the span of the computation, so if you minimize the span of your computation, this also gives you good cache bounds. The moral of the story: minimizing the number of cache misses in the serial execution essentially minimizes them in the parallel execution, for a low-span algorithm. For this recursive matrix multiplication algorithm, the span recurrence is T_infinity(n) = 2 T_infinity(n/2) + Θ(1) — since we do a sync, we pay the critical path length of two sub-calls — and that solves to Θ(n). Applying the theorem gives a cache-miss bound of Θ(n³/(B√M)), the same as the serial execution, plus an additive term of order P · n · M/B — P times the span times M over B. So that was a parallel algorithm for matrix multiply, and we saw that we can also get good cache bounds there. Here's a summary of what we talked about today: we talked about associativity in caches and different ways you can design a cache; we talked about the ideal-cache model, which is useful for analyzing algorithms; we talked about cache-aware algorithms that have explicit knowledge of the cache, with tiled matrix multiply as the example; and then we came up with a much simpler algorithm that was cache-oblivious, using divide and conquer. In Thursday's lecture we'll see much more on cache-oblivious algorithm design, and you'll have an opportunity to analyze the cache efficiency of some algorithms in the next homework.
lot of verification involved in trying to prove that cache coherence protocols are correct. Any questions so far?

OK, so let's talk about the associativity of a cache. Here I'm showing a fully associative cache: a cache block can reside anywhere in the cache. The basic unit of movement is a cache block; in this example the cache block size is 4 bytes, but on the machines we're using for this class the cache block size is 64 bytes. Each row here corresponds to one cache line, and fully associative means that each line can go anywhere in the cache. We're also assuming a cache of size 32 bytes, so in total it can store 8 cache lines, since each cache line is 4 bytes.

To find a block in a fully associative cache, you have to search the entire cache, because a cache line can appear anywhere. There's a tag associated with each cache line that specifies which address in the virtual memory space it corresponds to. For a fully associative cache, we use most of the bits of the address as the tag. We don't actually need the two lower-order bits, because data moves at the granularity of cache lines, which are 4 bytes here, so the two lower-order bits are always the same; the rest of the bits store the tag. If our address space is 64 bits, we use 62 bits for the tag in a fully associative scheme.
When the cache becomes full, a block has to be evicted to make room for a new one, and there are various ways to decide which block to evict; this is known as the replacement policy. One common replacement policy is LRU, least recently used: you kick out the block that was used furthest in the past. There are other schemes, such as second chance and clock replacement; we won't talk much about the different replacement schemes today, but you can read about them online.

So what's a disadvantage of this scheme? Right: searching for a cache line can be pretty slow, because in the worst case you have to search the entire cache, since a cache block can reside anywhere. Even though the search can go on in parallel in hardware, it's still expensive in terms of power and performance to search most of the cache on every access.

Let's look at the other extreme: a direct-mapped cache. In a direct-mapped cache, each cache block can go in only one place in the cache. I've color-coded the blocks here: the red blocks can only go in the first row of the cache, the orange ones only in the second row, and so on. The position a cache block can go to is known as that block's set; the set determines the location in the cache for each particular block.

Let's look at how the virtual memory address is divided up, and which bits determine where a cache block goes. We have the offset, set, and tag fields. The offset tells us which position we want within a cache block: since a cache block has B bytes, we need log2(B) offset bits. We need the offset because we're not always accessing the beginning of a cache block; we might want something in the middle. The set field determines which position in the cache the block can go to: there are eight possible positions for each cache block here, so we need log2(8) = 3 set bits in this example; more generally it's log2(M/B) bits, and here M/B is 8. Finally, the remaining bits form the tag, w minus log2(M) bits, which is stored along with the cache block and uniquely identifies which memory block in virtual memory the cache block corresponds to. You can verify that these three quantities sum to w, the total number of address bits.
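To make the field arithmetic concrete, here's a small C sketch (mine, not from the lecture) that extracts the three fields for a hypothetical direct-mapped cache with M = 32 KB and B = 64 bytes: 6 offset bits, log2(M/B) = 9 set bits, and 64 - 15 = 49 tag bits.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical direct-mapped parameters: M = 32 KB = 2^15, B = 64 = 2^6,
// 64-bit addresses, so offset = log2(B) = 6 bits, set = log2(M/B) = 9 bits,
// and the tag is the remaining 64 - log2(M) = 49 bits.
enum { OFFSET_BITS = 6, SET_BITS = 9 };

int main(void) {
    uint64_t addr = 0x7f12a4567ab3;  // made-up address, purely for illustration
    uint64_t offset = addr & ((1ULL << OFFSET_BITS) - 1);
    uint64_t set    = (addr >> OFFSET_BITS) & ((1ULL << SET_BITS) - 1);
    uint64_t tag    = addr >> (OFFSET_BITS + SET_BITS);
    printf("offset=%llu set=%llu tag=%llu\n",
           (unsigned long long)offset, (unsigned long long)set,
           (unsigned long long)tag);
    return 0;
}
```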
So what's the advantage and disadvantage of this scheme compared to the previous one? First, the good part: lookups are fast, because each cache block can go in only one place, so that's the only place you have to check. If the block is there you find it, and if not, you know it's not in the cache. The downside is that you might, for example, keep accessing only the red cache blocks: they all map to the same location in the cache, so they keep evicting each other even though there's plenty of empty space elsewhere. This is known as a conflict miss, and conflict misses can be very bad for performance and very hard to debug. So in a direct-mapped cache, finding a block is very fast, since only a single location has to be searched, but you can suffer conflict misses if you repeatedly access things in the same set without touching the other sets. Any questions?

OK, so those are the two extremes of cache design. There's a hybrid solution called a set-associative cache. You still have sets, but each set now contains more than one line. All the red blocks still map to the red set, but there are now two possible locations for them: this is a 2-way associative cache, since there are two possible locations within each set. In general, a cache block's set determines k possible cache locations for that block: within a set it's fully associative, but each block can go in only one set.

Let's look again at how the address bits divide up. We still have tag, set, and offset fields. The offset field is still log2(B) bits. The set field takes log2(M/(kB)) bits, since the number of sets is M/(kB). The remaining bits form the tag: w minus log2(M/k) bits. To find a block in the cache, only the k locations of its set must be searched: you find which set the block maps to, then check all k locations within that set. And whenever you bring something into a full cache, you evict something from the same set as the incoming block. I showed a 2-way associative cache here, but in practice the associativity is usually bigger, say 8-way, 16-way, or sometimes 20-way. As you keep increasing the associativity, it looks more and more like a fully associative cache; and a 1-way associative cache is just a direct-mapped cache. So set associativity is a hybrid between a direct-mapped cache and a fully associative cache. Any questions on set-associative caches?
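As a sketch of why a lookup touches only k locations, here's some illustrative C (hypothetical parameters and names; real hardware does this with parallel comparators, not a loop): compute the set from the address, then compare the tag against just the k ways of that set.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical parameters: M = 32 KB cache, B = 64-byte lines, K = 8 ways,
// so there are M / (K * B) = 64 sets.
#define CACHE_M (32 * 1024)
#define LINE_B  64
#define WAYS_K  8
#define NSETS   (CACHE_M / (WAYS_K * LINE_B))

struct line { bool valid; uint64_t tag; };
static struct line cache[NSETS][WAYS_K];

// Returns true on a hit; only the K lines of one set are searched.
bool cache_hit(uint64_t addr) {
    uint64_t set = (addr / LINE_B) % NSETS;                // log2(M/(K*B)) set bits
    uint64_t tag = addr / ((uint64_t)LINE_B * NSETS);      // w - log2(M/K) tag bits
    for (int way = 0; way < WAYS_K; way++)
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;
    return false;
}
```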
OK, let's go over a taxonomy of the different types of cache misses you can incur.

The first type is a cold miss: the miss you necessarily incur the first time you access a cache block. If you need that piece of data, there's no way around it; the cache starts out without the block, and the first access has to bring it in.

Then there are capacity misses: misses you get because the cache is full and can't fit all the cache blocks you want to access. You get a capacity miss when the previously cached copy would have been evicted even with a fully associative scheme; that is, even if every location in the cache were available to that cache line, it would still have to be evicted because there isn't enough space. You can deal with capacity misses by introducing more locality into your code, both spatial and temporal, and we'll look at ways to reduce the capacity misses of algorithms later in this lecture.

Then there are conflict misses. Conflict misses happen in set-associative caches when too many blocks map to the same set, and some of them have to be evicted because the set can't fit all of them, even though these blocks would not have been evicted under a fully associative scheme. For example, if a set holds 16 lines and you keep cycling through 17 blocks that all belong to that set, something gets kicked out on every access, and those evictions might not have happened with a fully associative cache.

Finally, there are sharing misses, which only happen in a parallel context; we talked about true and false sharing misses in prior lectures, so let's review briefly. Sharing misses can happen when multiple processors access the same cache line and at least one of them is writing to it. If all the processors are just reading, the cache coherence protocol lets them all hold the line at the same time without misses; but if at least one processor is modifying it, you can get either true or false sharing misses. A true sharing miss occurs when two processors access the same data on the same cache line: as you recall from a previous lecture, when one processor writes, it must acquire the line in exclusive mode and invalidate that line in all other caches, so when another processor then accesses the same memory location, it has to bring the line back into its own cache, incurring a miss. A false sharing miss occurs when two processors access different data that just happen to reside on the same cache line; since the basic unit of movement is a cache line, you still pay sharing misses even though you're accessing different things.

False sharing is pretty hard to deal with, because in general you don't know what data gets placed on which cache line. There are heuristics: for example, if you malloc a big memory region, you know that region is contiguous, so you can space the accesses of different processors far enough apart that they never touch the same cache line, as sketched below. But if you're declaring local variables on the stack, you don't know where the compiler will place them in the virtual memory address space.
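Here's a sketch in C with pthreads of that spacing idea (the struct, thread count, and iteration count are all made up for illustration): each thread's counter is padded and aligned to occupy its own 64-byte line, so concurrent writers never share a line.

```c
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 4

// Pad (and align) each counter to a full 64-byte line, so two threads
// incrementing different counters never write to the same cache line.
struct padded {
    _Alignas(64) volatile long count;
    char pad[64 - sizeof(long)];
};
static struct padded counters[NTHREADS];

static void *work(void *arg) {
    long id = (long)(intptr_t)arg;
    for (long i = 0; i < 10000000; i++)
        counters[id].count++;  // each thread touches only its own line
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, work, (void *)(intptr_t)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```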
So those are the four types of cache misses you should know about. There are many models out there for analyzing the cache performance of algorithms, and some of them ignore some of these types of misses; be aware of this when you look at algorithm analyses, because not every model captures every kind of miss.

Now let's look at a bad case for conflict misses. Suppose I want to access a submatrix within a larger matrix, and recall that matrices are stored in row-major order. Say the matrix is 4096 columns by 4096 rows and stores doubles. Each row then contains 2^15 bytes, since 4096 is 2^12 and each double takes 2^3 = 8 bytes. We'll assume the word width is 64 bits, the cache size is 32 KB, and the cache block size is 64 bytes, all standard, and say we have a 4-way associative cache.

Let's see how the address bits divide up. How many offset bits? Six: log2(64). How many set bits? Seven: M = 32 KB = 2^15, k = 2^2, and B = 2^6, so the number of sets is M/(kB) = 2^15 / 2^8 = 2^7, and log2 of that is 7. And the tag? 51 bits: 64 minus 7 minus 6.

Now suppose we want to access a 32 x 32 submatrix A within this larger matrix. This is pretty common in matrix algorithms, especially divide-and-conquer ones. Say we want to access a column of A. The addresses of the elements we access are as follows: if the first element of the column is at address x, the second is at x + 2^15, since each row has 2^15 bytes and we skip over an entire row to reach the element in the next row of the submatrix; the third is at x + 2 * 2^15; and so on, up to x + 31 * 2^15.

Which fields of the address change as we go down the column? Just the tag. The set and offset stay the same, because they live in the lower 13 bits, and incrementing by multiples of 2^15 never touches those bits. So all of these addresses fall into the same set, and that's a problem, because our cache is only 4-way associative: each set holds only four cache lines, and we're accessing 32 of them. By the time we get to the next column of A, everything we accessed in the current column has already been evicted. This is a conflict miss: with a fully associative cache it might not have happened, since any location in the cache could have held these blocks. Any questions on why we get conflict misses here?
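You can check this claim with a few lines of C (the base address x here is arbitrary; only the 2^15 stride matters): the set field, bits 6 through 12, comes out the same for every element of the column.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t x = 0x100000;  // arbitrary base address for the column
    for (int i = 0; i < 32; i++) {
        uint64_t addr = x + (uint64_t)i * (1ULL << 15);  // next row, same column
        uint64_t set  = (addr >> 6) & 0x7F;              // bits 6..12: set field
        printf("row %2d -> set %llu\n", i, (unsigned long long)set);
    }
    return 0;  // every row prints the same set index
}
```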
Does anybody have ideas on how to fix this, so that we're not incrementing by exactly 2^15 every time? One solution is to pad the matrix: add a small constant amount of space, say 17 bytes, to the end of each row, so that each row is longer than 2^15 bytes. Now when you walk down a column of the submatrix you're not incrementing by exactly 2^15; you're also adding that small constant, which causes the set and offset fields to change as well, and you don't get as many conflict misses.

It turns out that if you're doing matrix multiplication, a cubic-work algorithm, you can instead afford to copy the submatrix into a temporary 32 x 32 matrix, do all the operations on the temporary, and then copy the result back into the original matrix. Across the whole algorithm the copying takes only quadratic work, which is a lower-order term relative to the cubic work, so you can use temporary space to make sure you don't get conflict misses. Any questions?
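Here's a sketch of the copy-to-temporary fix (assuming, as in the example above, a 4096-wide row-major matrix of doubles and a 32 x 32 submatrix; the function names are mine):

```c
#include <string.h>

#define N 4096  // full matrix is N x N doubles, row-major
#define S 32    // submatrix side length

// Gather the S x S submatrix with top-left corner (i0, j0) into contiguous
// temporary storage; column accesses within tmp now stride by S doubles
// instead of 2^15 bytes, so they no longer all map to one set.
void copy_in(const double *A, double tmp[S][S], int i0, int j0) {
    for (int i = 0; i < S; i++)
        memcpy(tmp[i], &A[(size_t)(i0 + i) * N + j0], S * sizeof(double));
}

// Scatter the temporary back after operating on it.
void copy_out(double *A, double tmp[S][S], int i0, int j0) {
    for (int i = 0; i < S; i++)
        memcpy(&A[(size_t)(i0 + i) * N + j0], tmp[i], S * sizeof(double));
}
```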
OK, so that was conflict misses. Conflict misses matter, but usually we worry first about getting good spatial and temporal locality, since those are the higher-order factors in a program's performance; once we have them, we can start addressing conflict misses, for example with temporary storage or by padding our data by small constants.

Now I want to talk about a model we can use to analyze the cache performance of algorithms: the ideal cache model. In this model we have a two-level hierarchy, a cache and main memory. The cache has size M and cache lines of B bytes, so it holds M/B cache lines. The model assumes the cache is fully associative, and it assumes an optimal, omniscient replacement policy: when we evict a cache block, we pick the one whose eviction gives the lowest total number of cache misses over the entire execution, as if we knew the whole sequence of memory requests in advance, which is why it's called omniscient. If something is in cache you can operate on it for free; if it's in main memory, you have to bring it into cache, incurring a cache miss.

There are two performance measures we care about. First, the work, which is just the ordinary running time of the program: the total number of operations it does, the same as in our earlier algorithm analyses. Second, the cache complexity: the number of cache misses, which counts just the transfers between main memory and the cache. Ideally we'd like algorithms with a low number of cache misses without increasing the work over the standard algorithm. Sometimes we can do that; sometimes we can't, and there's a trade-off between the work and the number of cache misses that you have to weigh as a performance engineer. Today we're going to look at an algorithm where you can actually reduce the number of cache misses without increasing the work, so you get the best of both worlds. Any questions on the ideal cache model? It's used only for analyzing algorithms; you can't buy one of these caches at the store. Ideal caches don't exist.

But the optimal omniscient replacement policy has nice theoretical properties. Here is a very important lemma, proved in 1985 by Sleator and Tarjan, called the LRU lemma: suppose an algorithm incurs Q cache misses on an ideal cache of size M; then on a fully associative cache of size 2M that uses the LRU (least recently used) replacement policy, it incurs at most 2Q cache misses. In other words, if I can bound the cache misses of an algorithm on the ideal cache, then on a fully associative cache of twice the size running LRU, a quite practical policy, the algorithm incurs at most twice as many misses. The implication is that for asymptotic analysis you can assume either the optimal or the LRU replacement policy, whichever is convenient, because the miss counts are within a constant factor of each other. This lemma makes it much easier to analyze the cache misses of algorithms.

Here's a software engineering principle I want to point out. When you're going for performance, first come up with a theoretically good algorithm, with good bounds on the work and the cache complexity; only then start engineering for detailed performance, worrying about details such as real-world caches not being fully associative, or loads and stores having different costs with respect to bandwidth and latency. Coming up with a theoretically good algorithm is the first-order bit of getting good performance. Questions?

OK, let's start analyzing the number of cache misses in programs. Here's a lemma, the cache miss lemma. Suppose a program reads a set of r data segments, where the i-th segment consists of s_i bytes, and suppose the sum of the segment sizes is N, with N < M/3, less than a third of the cache size. Also assume N/r >= B. What does N/r mean semantically? It's the average size of a segment, so we're assuming the average segment is at least as large as a cache line. If these two assumptions hold, then all the segments fit into cache, and the number of cache misses to read them all is at most 3N/B. For comparison, if you had a single array of N bytes, reading it into cache would take about N/B misses; the lemma says that even with the data divided into segments, as long as the average segment length is large enough, the number of cache misses is only a constant factor worse than reading a single contiguous array.
Let's try to prove the cache miss lemma. A single segment s_i incurs at most s_i/B + 2 cache misses. Can anyone tell me where that comes from? Consider how the segment is aligned in virtual memory: s_i/B + 2 is the number of blocks that can overlap it in the worst case. You need s_i/B misses just to load the s_i bytes, but the beginning and end of the segment might not align with cache line boundaries, so you can waste at most one block on each side; that's where the plus two comes from.

To get the total number of cache misses, sum this quantity from i = 1 to r. Summing s_i/B gives N/B by definition, and summing 2 gives 2r. Now multiply the second term's top and bottom by B to get 2rB/B, which is at most 2N/B. Why do we know 2rB <= 2N? Because N >= rB by the assumption that the average segment size N/r is at least B. Then N/B + 2N/B sums to 3N/B, so in the worst case we incur 3N/B cache misses.
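Restating the computation in symbols, using r B <= N:

```latex
\sum_{i=1}^{r}\left(\frac{s_i}{B} + 2\right)
  = \frac{N}{B} + 2r
  = \frac{N}{B} + \frac{2rB}{B}
  \le \frac{N}{B} + \frac{2N}{B}
  = \frac{3N}{B}.
```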
Any questions on the cache miss lemma? The important thing to remember: if you have a whole bunch of data segments whose average length is large enough, bigger than a cache block, then you can access all of them almost as if they were a single array; it increases the number of cache misses by only a constant factor, which doesn't matter in an asymptotic analysis. We'll use this lemma later when we analyze algorithms.

OK, another assumption we'll need is the tall cache assumption, which says the cache is taller than it is wide: B^2 <= cM for some sufficiently small constant c <= 1. In other words, the number of cache lines, M/B, is larger than B. The tall cache assumption is usually satisfied in practice. On our machines the cache line size is 64 bytes and the L1 cache size is 32 KB: 64^2 = 2^12 bytes, and 32 KB = 2^15 bytes, so 2^12 < 2^15 and the assumption holds. As we go up the memory hierarchy, the cache sizes increase while the line length stays the same, so the caches only become taller.

To see why this assumption is useful, consider what's wrong with a short cache. In a short cache the lines are very wide, wider than the number of lines in the cache. Say we're working with an n x n submatrix stored in row-major order. With a short cache, even if n^2 < cM, meaning all the bytes of the submatrix would fit, you still might not be able to fit it: the submatrix has n rows, but only M/B of the rows fit in the cache, because the lines are so long, and we'd be wasting most of each line, using only a small fraction of it to store one row of the submatrix. If this were the entire matrix it would actually be OK, since consecutive rows are placed contiguously in memory; but for a submatrix we can't guarantee the next row comes right after the current one, and we often have to deal with submatrices in recursive matrix algorithms. That's what's wrong with short caches, and why we assume the tall cache assumption, which we may, since it's usually satisfied in practice. (The TLB actually tends to be short, with only a few entries, so it might not satisfy the assumption, but all the other caches will.) Any questions?

Here's another useful lemma, the submatrix caching lemma. Suppose an n x n matrix A is read into a tall cache satisfying B^2 <= cM for some constant c <= 1, and suppose cM <= n^2 < M/3. Then A fits into cache, and the number of cache misses required to read all of A's elements is at most 3n^2/B.

Let's see why this is true. Let N denote the total number of bytes to access, so N = n^2. We'll apply the cache miss lemma, which says that if the average segment length is large enough, we can read all the segments as though they formed one contiguous array. Here the segments are the rows, so r = n and each segment has length n, since we're working with a square submatrix. We need B <= n = N/r. Where does that come from? From the assumptions: B^2 <= cM and cM <= n^2, so B^2 <= n^2 and B <= n. We also have N = n^2 < M/3 by assumption, so the cache miss lemma applies and tells us we need only 3n^2/B cache misses to read the whole thing in. Any questions on the submatrix caching lemma?

OK, now let's analyze matrix multiplication. Here's the code for the standard cubic-work algorithm: two input matrices A and B, with the result stored in C; the matrices are n x n. We'll deal with square matrices here, but everything extends to non-square ones. We loop over i from 0 to n-1, j from 0 to n-1, and k from 0 to n-1, and increment C[i*n + j] by A[i*n + k] * B[k*n + j].
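Written out in C, the loop nest just described looks like this (a sketch; the exact declarations on the slide may differ):

```c
#include <stdint.h>

// Standard cubic-work matrix multiply: C += A * B for square n x n
// matrices stored in row-major order.
void matmul(double *restrict C, const double *restrict A,
            const double *restrict B, int64_t n) {
    for (int64_t i = 0; i < n; i++)
        for (int64_t j = 0; j < n; j++)
            for (int64_t k = 0; k < n; k++)
                C[i*n + j] += A[i*n + k] * B[k*n + j];
}
```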
What's the work of this algorithm? This should be review for all of you: Theta(n^3).

Now let's analyze the number of cache misses this algorithm incurs. Again, assume the matrices are in row-major order and the tall cache assumption holds. We'll analyze the misses on matrix B, because B's misses turn out to dominate the total. There are three cases to consider, and throughout we'll assume LRU: recall that by the LRU lemma, whatever we count under LRU is within a constant factor of the count on the ideal cache.

Case 1: n > cM/B for some constant c. To do the multiplication, we go through one row of A and one column of B and compute the dot product; that's what happens in the innermost loop. How many misses do we incur going down one column of B? In this case we can't fit even one block from each row of B in the cache. B has n rows, and each row costs one miss, so in total the first column costs n misses. What about the second column? Still n: since one block per row doesn't fit, by the time we get back to the top of B, the top block has already been evicted and must be reloaded, and the same holds for every other block we access. So every column costs n misses, and we do this again for every row of A: Theta(n^3) misses in total, one per entry of B that we access. This is not very good: the total work is also Theta(n^3), so we're gaining nothing from locality.

Case 2: c'sqrt(M) < n < cM/B. Now we can fit one block from each row of B into cache, but not the whole matrix. The first column of B costs n misses, as before. What about the second column: by the time we return to the top of the matrix, is the top block still in cache? It turns out it is, because we can fit one block from every row of B; the block we loaded while walking the first column is still there when we reach it in the second column. So we incur no misses on the second column.
In general, if we can fit some constant times B columns' worth of accesses in cache, we reduce the number of misses by a factor of B: we pay a miss only the first time we access a block, not on any subsequent access. The same is true for each row of A, and since A has n rows, each incurring Theta(n^2/B) misses over matrix B, the overall count is Theta(n^3/B). The point is that within B we can now exploit spatial locality: once a block is loaded, we reuse it the next time we traverse a nearby column. Any questions on this analysis?

Case 3: n < c'sqrt(M), so the entire matrix fits into cache. How many misses now? For the first row of A the analysis is the same as before: n^2/B misses to load B in. For the second row of A, though, we incur no misses at all: once B is loaded it stays in cache, because the whole matrix fits. So the total number of misses for matrix B is Theta(n^2/B): everything fits, and we just apply the submatrix caching lemma from before.

Overall, this is not a very good algorithm, because in case 1 we needed a cubic number of cache misses. What happens if we swap the order of the inner two loops? Recall that this was one of the optimizations in lecture 1, when Charles talked about speeding up matrix multiplication. If we interchange the two inner loops, then each innermost iteration walks along a row of C and a row of B, while the element of A stays fixed. Now the analysis of matrix B under LRU benefits from spatial locality, since we go row by row and the matrix is stored in row-major order: covering all the rows takes Theta(n^2/B) misses, and we do that n times for the outer loop, giving Theta(n^3/B) misses in total. Swapping the order of the inner two loops significantly improves the locality of the algorithm, and that's why we saw such a significant performance improvement in the first lecture. Any questions?
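For reference, the interchange just swaps the two inner loop headers of the earlier sketch:

```c
#include <stdint.h>

// With the j and k loops swapped, the innermost loop walks along a row
// of C and a row of B while A[i*n + k] stays fixed, so both scans get
// spatial locality in row-major order.
void matmul_interchanged(double *restrict C, const double *restrict A,
                         const double *restrict B, int64_t n) {
    for (int64_t i = 0; i < n; i++)
        for (int64_t k = 0; k < n; k++)
            for (int64_t j = 0; j < n; j++)
                C[i*n + j] += A[i*n + k] * B[k*n + j];
}
```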
Does anybody think we can do better than Theta(n^3/B) cache misses, or is that the best possible? It turns out you can do better, using an optimization called tiling. Instead of three for loops we'll have six, looping over s x s tiles: the three inner loops iterate within a submatrix, doing all the computation for that tile before moving on to the next, and the three outer loops iterate over the larger matrix, one submatrix at a time.

Let's analyze the work. Multiplying one s x s tile pair costs Theta(s^3), the usual bound for matrix multiplication, and the number of tile operations is (n/s)^3; you can see this by treating each tile as a single element and applying the same cubic-work analysis to the smaller matrix of tiles. So the work is (n/s)^3 * s^3 = Theta(n^3): tiled matrix multiply does the same work as the version without tiling.

Now the cache misses. We tune s so the submatrices just fit into cache: s = Theta(sqrt(M)). Really it needs to be about one third of sqrt(M), because we have to fit three submatrices in cache at once, but it's some constant times sqrt(M). The submatrix caching lemma implies each tile costs Theta(s^2/B) misses to load; once loaded, it fits entirely in cache, so all the computation on it incurs no further misses. Therefore the total number of misses is the number of subproblems, (n/s)^3, times the misses per subproblem, s^2/B, which with s = Theta(sqrt(M)) multiplies out to Theta(n^3 / (B sqrt(M))).

This is a pretty cool result: you can beat the n^3/B bound by a factor of sqrt(M), and in practice sqrt(M) is not insignificant. For the last-level cache, M is on the order of megabytes, so sqrt(M) is on the order of thousands; tuning s so the submatrices just fit in cache significantly improves the performance of the matrix multiplication code. It turns out this bound is optimal for cubic-work matrix multiplication; that was shown in 1981. (If you use another algorithm, such as Strassen's, you can do better.) I want you to remember this bound: it says matrix multiplication can benefit from both spatial locality, the B in the denominator, and temporal locality, the sqrt(M) term, which comes from doing all the work inside a submatrix before evicting it from cache. Any questions on this analysis?
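Here's a sketch of the six-loop tiled version (assuming for simplicity that s divides n; s is the tuning parameter discussed next):

```c
#include <stdint.h>

// Tiled matrix multiply: the three outer loops walk over s x s tiles,
// the three inner loops do all the work for one tile triple before
// moving on. Tune s so that three tiles fit in cache: s = Theta(sqrt(M)).
void matmul_tiled(double *restrict C, const double *restrict A,
                  const double *restrict B, int64_t n, int64_t s) {
    for (int64_t ih = 0; ih < n; ih += s)
        for (int64_t jh = 0; jh < n; jh += s)
            for (int64_t kh = 0; kh < n; kh += s)
                for (int64_t il = 0; il < s; il++)
                    for (int64_t kl = 0; kl < s; kl++)
                        for (int64_t jl = 0; jl < s; jl++)
                            C[(ih+il)*n + jh+jl] +=
                                A[(ih+il)*n + kh+kl] * B[(kh+kl)*n + jh+jl];
}
```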
So what's one issue with this algorithm? Right: I've tuned s for my particular machine. I call s a voodoo parameter: a magic number I put into the program so that a tile fits in the cache of the specific machine I'm running on. That makes the code non-portable: if I run it on another machine with different cache sizes, I won't get the same performance. It's an issue even on the same machine, because other programs may be running at the same time and using part of the cache, so you don't actually know how much of the cache your program gets to use in a multiprogrammed environment.

And this was just for one level of cache. To optimize for two levels of cache, we need two voodoo parameters, s and t: submatrices and sub-submatrices, both of which must be tuned for the best performance on our machine. Multidimensional tuning can't be done simply with binary search (with one level of cache you can binary-search on s), so optimizing becomes much more expensive, and the code gets messier: nine nested for loops instead of six. How many levels of cache do our machines have? Three. So for a three-level cache you have three voodoo parameters and twelve nested for loops. The code becomes very ugly, the parameters must be tuned for your particular machine, the result isn't portable, as one student pointed out, and in a multiprogrammed environment you don't know the effective cache size your program has access to, so it's very easy to mistune the parameters.

Questions? One student asked about auto-tuning: yes, you can auto-tune the program for the cache sizes of your particular machine, and you can look those sizes up, for example in /proc/cpuinfo. But even then, when other jobs are running at the same time, you don't actually know how much cache your program has access to. Another question was whether this is specific to matrix multiply: no, these techniques are general-purpose. Today we're just looking at matrix multiply, but in Thursday's lecture we'll look at many other problems and how to optimize them for the cache hierarchy. Other questions?

OK, so tiling gives good cache performance but isn't very portable. Let's see if we can come up with a simpler design that still gets pretty good cache performance, turning to divide and conquer. We'll look at the recursive matrix multiplication algorithm we saw before; again square matrices, though the results generalize to non-square ones. We split each input matrix into four quadrants, and each quadrant of the output matrix is the sum of two matrix multiplications of (n/2) x (n/2) matrices: C11 = A11 * B11 + A12 * B21, and so on. We do this recursively, so at every level of recursion we perform eight multiply-adds of (n/2) x (n/2) matrices. In the recursive code there are eight recursive calls; the base case here is size 1, though in practice you want to coarsen the base case to overcome function call overheads.

Let's also look at the offset values in the code; I've color-coded them to match the quadrants of the submatrix on the right. They correspond to the index of the first element of each quadrant: the first element of quadrant 11 has offset 0; the first element of quadrant 12 is on the same row, so I just need to add the width of the quadrant, n/2; the first element of quadrant 21 is n/2 rows down, each row of length row_size, so its offset is (n/2) * row_size; and the first element of quadrant 22 is that plus n/2, that is, (n/2) * row_size + n/2.
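Here's a sketch of the recursive code (with a hypothetical coarsened base case; row_size is the row length of the underlying matrices, so the quadrant offsets match the ones just described, and for simplicity it assumes all three matrices share that row size and n is a power of 2):

```c
#include <stdint.h>

#define THRESHOLD 32  // hypothetical coarsening; the slide's base case is 1

// C += A * B on n x n submatrices of matrices whose rows hold row_size elements.
void matmul_rec(double *restrict C, const double *restrict A,
                const double *restrict B, int64_t n, int64_t row_size) {
    if (n <= THRESHOLD) {  // coarsened base case: plain loops
        for (int64_t i = 0; i < n; i++)
            for (int64_t k = 0; k < n; k++)
                for (int64_t j = 0; j < n; j++)
                    C[i*row_size + j] += A[i*row_size + k] * B[k*row_size + j];
        return;
    }
    int64_t h = n / 2;
    // Quadrant offsets: 0, h, h*row_size, h*row_size + h, as described above.
    int64_t off[2][2] = {{0, h}, {h*row_size, h*row_size + h}};
    // Eight recursive multiply-adds: C_ij += A_i0 * B_0j + A_i1 * B_1j.
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 2; k++)
                matmul_rec(C + off[i][j], A + off[i][k], B + off[k][j],
                           h, row_size);
}
```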
Let's analyze the work of this algorithm. What's the recurrence? How many subproblems do we have? Eight. What size? n/2. And how much work to set up the recursive calls? A constant amount. So the recurrence is W(n) = 8 W(n/2) + Theta(1), and it solves to Theta(n^3); it's one of the three cases of the master theorem. But let's analyze it in more detail by drawing the recursion tree, which gives some intuition for why the master theorem is true. At the top level we have a problem of size n, branching into 8 subproblems of size n/2, doing a constant amount of work to set up the recursive calls; I'll just label that 1, ignoring the constants, which doesn't matter for asymptotic analysis. Each of those branches again into 8 subproblems of size n/4, and so on down to the leaves. How many levels until we reach the leaves? log2(n), base 2 because we divide the problem size by two each time. The number of leaves is therefore 8^(log2 n), since we branch 8 ways at every level, and 8^(log2 n) = n^(log2 8) = n^3. The work per level is 1 at the top, then 8, then 64, and so on, geometrically increasing as we go down the tree, with Theta(n^3) at the leaves, since there are n^3 of them each doing constant work. So the overall work is dominated by the leaves: Theta(n^3), the same as the looping versions of matrix multiply. They're all cubic work.

Now let's analyze the number of cache misses of this divide-and-conquer algorithm. The recurrence is different: the base case is now when the submatrix fits in the cache, that is, when n^2 < cM for some sufficiently small constant c <= 1. At that point we just load the submatrix into cache, needing Theta(n^2/B) misses by the submatrix caching lemma, and incur no further misses. Otherwise we recurse into 8 subproblems of size n/2 and add Theta(1), the constant work to set up the calls. So the difference between this cache analysis and
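Side by side, the two recurrences are (with c <= 1 a sufficiently small constant):

```latex
W(n) = 8\,W(n/2) + \Theta(1) = \Theta(n^3)

Q(n) =
\begin{cases}
\Theta(n^2/B) & \text{if } n^2 < cM,\\
8\,Q(n/2) + \Theta(1) & \text{otherwise,}
\end{cases}
\qquad\text{which solves to}\qquad
Q(n) = \Theta\!\left(\frac{n^3}{B\sqrt{M}}\right).
```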
the work analysis before is the base case: in all the analyses you've seen before, the base case was of constant size, but here it isn't.

Let's analyze this using the recursion tree approach. At the top level we have a problem of size n, branching into 8 problems of size n/2, incurring a constant number of cache misses at each internal node; call it 1. We branch again, and eventually we reach the base case, where n^2 < cM; there the number of misses per leaf is Theta(cM/B), just plugging cM in for n^2. The number of levels in this recursion tree is no longer log2(n): it's log2(n) - log2(sqrt(cM)), which is log2(n) - (1/2) log2(cM), because we keep dividing the problem size by two only until it fits into cache, that is, until n < sqrt(cM), so we subtract that many levels. The number of leaves is 8 raised to the number of levels: 8^(log2(n) - (1/2) log2(cM)) = Theta(n^3 / M^(3/2)). The n^3 comes from the 8^(log2 n) term, and 8^(-(1/2) log2(cM)) contributes the M^(3/2) in the denominator. Any questions on how I computed the number of levels? Since we branch eight ways, the leaf count is just 8 to the power of the number of levels.

Now the cache misses per level: a constant at the top, say 1; then 8; then 64; and at the leaves, Theta(n^3 / (B sqrt(M))), obtained by multiplying the number of leaves by the misses per leaf: (n^3 / M^(3/2)) * Theta(cM/B). One factor of M cancels, leaving sqrt(M) in the denominator together with the block size B. Again this is a geometric series dominated by the leaves, so the total number of cache misses is Theta(n^3 / (B sqrt(M))).

Notice that this is the same number of cache misses as the tiled version of the code, but here there is no tuning parameter for any particular cache size anywhere in the code. So which cache sizes does this code work for; will it get good cache performance on your machine? It works for all cache sizes, because it was never tuned for any particular one. This is what's known as a cache-oblivious algorithm: it has no voodoo tuning parameters and no explicit knowledge of the caches, and it's essentially passively auto-tuning itself to the cache size of your machine.
It also works for multi-level caches automatically: I never specified which level of cache I was analyzing this for, so the analysis applies at any level and still gives good cache complexity at each one. And it's good in multiprogrammed environments, where other jobs may be running and you don't know your effective cache size; it just passively adapts to whatever cache is available. It turns out that the best cache-oblivious codes today work on arbitrary rectangular matrices, not just the square ones I've discussed, and they perform binary splitting instead of eight-way splitting: you split on the largest of i, j, and k. That's what the best cache-oblivious matrix multiplication algorithm does. Any questions?

OK, so far I've only talked about the serial setting; I was assuming these algorithms ran on a single thread. What happens on multiple processors? It turns out the results generalize to the parallel context. Here is the recursive parallel matrix multiply code we saw before: notice that we execute four of the recursive sub-calls in parallel, do a sync, and then do the remaining four sub-calls in parallel. The sync is needed because the second group of multiplies accumulates into the same output quadrants as the first.
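Here's a sketch of that parallel structure in Cilk (reusing the quadrant offsets from the serial sketch above; the base case is left uncoarsened for brevity):

```c
#include <cilk/cilk.h>
#include <stdint.h>

// Four independent quadrant multiplies run in parallel, then a sync,
// then the four multiplies that accumulate into the same C quadrants.
void matmul_par(double *C, double *A, double *B,
                int64_t n, int64_t row_size) {
    if (n == 1) { C[0] += A[0] * B[0]; return; }
    int64_t h = n / 2;
    int64_t off[2][2] = {{0, h}, {h*row_size, h*row_size + h}};
    cilk_spawn matmul_par(C + off[0][0], A + off[0][0], B + off[0][0], h, row_size);
    cilk_spawn matmul_par(C + off[0][1], A + off[0][0], B + off[0][1], h, row_size);
    cilk_spawn matmul_par(C + off[1][0], A + off[1][0], B + off[0][0], h, row_size);
               matmul_par(C + off[1][1], A + off[1][0], B + off[0][1], h, row_size);
    cilk_sync;  // the next group writes the same C quadrants, hence the sync
    cilk_spawn matmul_par(C + off[0][0], A + off[0][1], B + off[1][0], h, row_size);
    cilk_spawn matmul_par(C + off[0][1], A + off[0][1], B + off[1][1], h, row_size);
    cilk_spawn matmul_par(C + off[1][0], A + off[1][1], B + off[1][0], h, row_size);
               matmul_par(C + off[1][1], A + off[1][1], B + off[1][1], h, row_size);
    cilk_sync;
}
```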
To analyze the cache misses of this parallel code, we'll use the following theorem. Let Q_P be the number of cache misses of a deterministic Cilk computation when run on P processors, each with a private cache of size M, and let S_P be the number of successful steals during the computation. Then in the ideal cache model, Q_P = Q_1 + O(S_P * M/B): the number of cache misses in the parallel execution equals the number in the serial execution plus a term of order the number of steals times M/B.

The proof goes as follows. Recall that in the Cilk runtime system, workers steal tasks from other workers when they have no work to do. After a worker steals a task, its cache is completely cold in the worst case, because it wasn't working on that subproblem before; but after M/B cold misses, its cache becomes identical to what it would be at that point in the serial execution. So we pay M/B cache misses to make the cache look the same as if we were executing serially, and the same is true when a worker resumes a stolen computation after a cilk_sync. These two situations can happen at most 2 S_P times, twice the number of steals, and each time we pay at most M/B misses; that's where the additive O(S_P * M/B) term comes from. We also know that the number of steals in a Cilk program is at most O(P * T_infinity) in expectation, where P is the number of processors and T_infinity is the span of the computation. So if you can minimize the span of your computation, this also gives you good cache bounds. The moral of the story: minimizing the number of cache misses in the serial execution essentially minimizes them in the parallel execution, for a low-span algorithm.

For this recursive matrix multiplication algorithm, the span is T_infinity(n) = 2 T_infinity(n/2) + Theta(1); because we sync, we have to pay the critical path length of two sub-calls in sequence. This solves to Theta(n). Applying the theorem, the cache miss bound is Theta(n^3 / (B sqrt(M))), the same as the serial execution, plus an additive term of order P * n * M/B, that is, P times the span times M/B. So that was a parallel algorithm for matrix multiply, and we saw that we also get good cache bounds there.

Here's a summary of what we talked about today. We talked about associativity in caches and the different ways you can design a cache. We covered the ideal cache model, which is useful for analyzing algorithms. We looked at cache-aware algorithms that have explicit knowledge of the cache, with tiled matrix multiply as the example, and then came up with a much simpler cache-oblivious algorithm using divide and conquer. In Thursday's lecture we'll see much more on cache-oblivious algorithm design, and you'll have an opportunity to analyze the cache efficiency of some algorithms in the next homework.