The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

Today we're going to talk about multicore programming, and, as I was just informed by Charles, it's 2018; I had 2017 on the slide. OK, so first, congratulations to all of you: you turned in the first project's beta. Here's a plot showing the tiers that different groups reached for the beta, in sorted order. We set the beta cutoff to be tier 45; the final cutoff is tier 48. We did set the final cutoff a little bit aggressively, but keep in mind that you don't necessarily have to reach the final cutoff in order to get an A on this project.

OK, so we're going to talk about multicore processing today. That's going to be the topic of the next project, after you finish the first project. In a multicore processor we have a whole bunch of cores that are all placed on the same chip, and they have access to shared memory. They usually also have some sort of private cache, and then a shared last-level cache, L3 in this case. They all have access to the same memory controller, which goes out to main memory, and they also have access to I/O.

But for a very long time, chips only had a single core on them. So why do we have multicore processors nowadays? Why did semiconductor vendors start producing chips with multiple processor cores on them? The answer comes down to two things. First, there's Moore's law, which says that we get more transistors every year: the number of transistors that you can fit on a chip doubles approximately every two years. And secondly, there's the end of scaling of clock frequencies. For a very long time we could just keep increasing the frequency of the single core on the chip, but at around 2004 to 2005 that was no longer the case; we couldn't scale the clock frequency anymore.

So here's a plot showing both the number of transistors you can fit on a chip over time and the clock frequency of processors over time, and notice that the y-axis is in log scale. The blue line is basically Moore's law, which says that the number of transistors on a chip doubles approximately every two years, and that's been growing pretty steadily. This plot goes up to 2010, but in fact it's been growing even up to the present, and it will continue to grow for a couple more years before Moore's law ends. However, if you look at the clock frequency line, you see that it was growing quite steadily until about the early 2000s, and at that point it sort of flattened out. We couldn't increase the clock frequencies anymore, and the clock speed was bounded at about 4 gigahertz. Nowadays, if you go buy a processor, it's usually still bounded by around 4 gigahertz; it's usually a little less than 4 gigahertz, because it doesn't really make sense to push it all the way, but you might find some processors around 4 gigahertz.

So what happened around 2004 to 2005? Does anyone know? OK, so Moore's law basically says that we can fit more transistors on a chip because the transistors become smaller, and when the transistors become smaller you can reduce the voltage that's needed to operate them, and as a result you could increase the clock frequency while maintaining the same power density.
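To make that reasoning concrete, the standard approximation for the dynamic power dissipated by a chip (this equation isn't on the slide, but it's the relation the argument rests on) is

    P \approx C \, V^2 \, f

where C is the switched capacitance, V is the supply voltage, and f is the clock frequency. As long as shrinking transistors let vendors lower V, they could raise f without increasing the power per unit area; once V could no longer be reduced, raising f would directly raise the power density.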
And that's what manufacturers did until about 2004 to 2005: they just kept increasing the clock frequency to take advantage of Moore's law. But it turns out that once transistors become small enough, and the voltage used to operate them becomes small enough, there's something called leakage current. Current leaks, and we're unable to keep reducing the voltage while still having reliable switching. And if you can't reduce the voltage anymore, then you can't increase the clock frequency if you want to keep the same power density.

So here's a plot from Intel from back around 2004, when they first started producing multicore processors, and this is plotting power density versus time; again, the y-axis is in log scale. The green data points are actual data points, and the orange ones are projections: they projected what the power density would be if we kept increasing the clock frequency at a trend of about 25 to 30 percent per year, which is what happened up until around 2004. Because we couldn't reduce the voltage anymore, the power density would go up, and you can see that eventually it reaches the power density of a nuclear reactor, which is pretty hot; then it reaches the power density of a rocket nozzle; and eventually you get to the power density of the sun's surface. If you have a chip with a power density equal to the sun's surface, well, you don't really have a chip anymore, right? So basically, if you get into this orange region, you have a fire, and you can't really do anything interesting in terms of performance engineering at that point.

So to solve this problem, semiconductor vendors didn't increase the clock frequency anymore, but we still had Moore's law giving us more and more transistors every year. What they decided to do with these extra transistors was to put them into multiple cores, and then put multiple cores on the same chip. We can see that starting at around 2004, the number of cores per chip becomes more than one, and each generation of Moore's law will potentially double the number of cores that you can fit on a chip, because it's doubling the number of transistors. We've seen this trend up until about today, and again, it's going to continue for a couple more years before Moore's law ends. So that's why we have chips with multiple cores today.

So today we're going to look at multicore processing, and I first want to introduce the abstract multicore architecture. This is a very simplified version, but I can fit it on this slide, and it's a good example for illustration. Here we have a whole bunch of processors. They each have a cache, indicated with the dollar sign, and usually they have a private cache as well as a shared cache — a shared last-level cache, like the L3 cache. Then they're all connected to a network, and through the network they can connect to main memory; they can all access the same shared memory. Usually there's a separate network for the I/O as well, even though I've drawn them as a single network here, so they can access the I/O interface, and potentially the network will also connect to other multiprocessors on the same system. This abstract multicore architecture is known as a chip multiprocessor, or CMP, and that's the architecture we'll be looking at today.

So here's an outline of today's lecture. First I'm going to go over some hardware challenges with shared-memory multicore machines.
We're going to look at the cache coherence protocol, and then, after looking at hardware, we're going to look at some software solutions for writing parallel programs on these multicore machines to take advantage of the extra cores. We're going to look at several concurrency platforms, listed here. We're going to look at Pthreads — this is basically a low-level API for running your code in parallel, and if you program on Microsoft products, WinAPI threads is pretty similar. Then there's Intel Threading Building Blocks, which is a library solution to concurrency, and then there are two linguistic solutions that we'll be looking at, OpenMP and Cilk Plus. Cilk Plus is actually the concurrency platform that we'll be using for most of this class.

All right, so let's look at how caches work. Let's say that we have a value in memory at some location, say x equals 3. If one processor wants to load x, what happens is that the processor reads this value from main memory, brings it into its own cache, and then also loads it into one of its registers. It keeps this value in cache so that if it wants to access the value again in the near future, it doesn't have to go all the way out to main memory; it can just look at the value in its cache. Now, what happens if another processor wants to load x? It just does the same thing: it reads the value from main memory, brings it into its cache, and also loads it into one of its registers. And then the same thing happens with another processor. It turns out that you don't always have to go out to main memory to get the value: if the value resides in one of the other processors' caches, you can also get it through the other processor's cache, and sometimes that's cheaper than going all the way out to main memory.

OK, so now the second processor loads x again, and it's in cache, so it doesn't have to go to main memory or anybody else's cache. So what happens now if we want to store x — if we want to set the value of x to something else? Let's say this processor wants to set x equal to 5, so it writes x equals 5 and stores that result in its own cache. That's all well and good. Now what happens when the first processor wants to load x? Well, it sees that the value of x is in its own cache, so it's just going to read the value of x there, and it gets a value of 3. So what's the problem? Yes — right, the problem is that the value of x in the first processor's cache is stale, because another processor updated it, so this value of x in the first processor's cache is now invalid.

So that's the problem, and one of the main challenges of multicore hardware is to solve this problem of cache coherence: making sure that the values in different processors' caches are consistent across updates. One basic protocol for solving this problem is known as the MSI protocol. In this protocol each cache line is labeled with a state, and there are three possible states: M, S, and I. This is done at the granularity of cache lines, because it turns out that storing this information is relatively expensive, so you don't want to store it for every memory location; you do it on a per-cache-line basis. Does anyone know the size of a cache line on the machines that we're using? Yeah, it's 64 bytes, and that's typically what you see today on most Intel and AMD machines. There are some architectures that have different cache line sizes, such as 128 bytes, but for our class, the machines we're using have 64-byte cache lines. That's important to remember, so that when you're doing back-of-the-envelope calculations you can get accurate estimates.
So the three states in the MSI protocol are M, S, and I. M stands for modified, and when a cache block is in the modified state, that means no other cache can contain this block in the M or S states. The S state means that the block is shared, so other caches can also have this block in the shared state. And finally, I means the cache block is invalid, which is essentially the same as the cache block not being in the cache at all. To solve the problem of cache coherence, when one cache modifies a location, it has to inform all the other caches that their values are now stale, because this cache modified the value. So it's going to invalidate all of the other copies of that cache line in other caches by changing their state from S to I.

Let's see how this works. Let's say that the second processor wants to store y equals 5. Previously the value of y was 17, and the cache line containing y equals 17 was in the shared state. Now when I do y equals 5, I'm going to set that cache line in the second processor's cache to the modified state, and then I'm going to invalidate that cache line in all of the other caches that contain it. So now the first cache and the fourth cache each have a state of I for the line with y equals 17, because that value is stale. Any questions? Yes — yeah, so there are actually some protocols that do that. This is just the most basic protocol, so this protocol doesn't do it, but there are some protocols used in practice that actually do do that, so it's a good point; I just want to present the most basic protocol for now.

OK, and then when you load a value, you can first check whether your cache line is in the M or S state, and if it is, then you can just read that value directly. But if it's in the I state, or if it's not there, then you have to fetch that block from either another processor's cache or from main memory. It turns out that there are many other protocols out there — there's something known as MESI, there's also MOESI, and many other protocols, some of them proprietary — and they all do different things. It turns out that all of these protocols are quite complicated, and it's very hard to get them right; in fact, one of the earliest successes of formal verification was proving some of these cache coherence protocols correct.

Yes, question? Yeah, so if two processors try to modify the value, one of them has to happen first, and the hardware is going to take care of that. The first one that actually modifies it will invalidate all the other copies, and then the second one that modifies the value will again invalidate all the other copies. When a lot of processors try to modify the same value, you get something known as an invalidation storm: you have a bunch of invalidation messages going throughout the hardware, and that can lead to a big performance bottleneck, because each processor, when it modifies its value, has to inform all the other processors, and if all the processors are modifying the same value, you get this sort of quadratic behavior. The hardware is still going to guarantee that one of the processors ends up writing the value there, but you should be aware of this performance issue when you're writing parallel code.
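To make the protocol concrete, here's a minimal sketch in C of the per-cache-line state transitions just described. This is only an illustration of the basic MSI idea — the real controllers are implemented in hardware and handle many more details:

    // Per-cache-line states in the basic MSI protocol described above.
    typedef enum { INVALID, SHARED, MODIFIED } line_state_t;

    // When this cache's own processor reads the line:
    line_state_t on_local_read(line_state_t s) {
      // I -> S: fetch the line from another cache or from main memory.
      // S and M already hold a valid copy, so nothing changes.
      return (s == INVALID) ? SHARED : s;
    }

    // When this cache's own processor writes the line:
    line_state_t on_local_write(line_state_t s) {
      // Any write moves the line to M, after broadcasting an invalidation
      // so that every other cache holding the line drops it to I.
      (void)s;
      return MODIFIED;
    }

    // When another cache writes the line:
    line_state_t on_remote_write(line_state_t s) {
      (void)s;
      return INVALID;               // our copy is now stale
    }

    // When another cache reads the line:
    line_state_t on_remote_read(line_state_t s) {
      // If we hold the only modified copy, supply the data (writing it
      // back) and downgrade to shared; otherwise nothing changes.
      return (s == MODIFIED) ? SHARED : s;
    }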
Yes? Yes, so this is all implemented in hardware. If you take a computer architecture class, you'll learn much more about these protocols and all of their variants. For our purposes, we don't actually need to understand all the details of the hardware; we just need to understand what it's doing at a high level, so that we can understand when we have a performance bottleneck and why. That's why I'm just introducing the most basic protocol here. Any other questions?

All right, so I talked a little bit about the shared-memory hardware. Let's now look at some concurrency platforms. These are the four platforms that we'll be looking at today. So first, what is a concurrency platform? Writing parallel programs is very difficult: it's very hard to get these programs to be correct, and if you want to optimize their performance it becomes even harder, so it's very painful and error-prone. A concurrency platform abstracts the processor cores, handles synchronization and communication protocols, and also performs load balancing for you, so it makes your lives much easier. Today we're going to talk about some of these different concurrency platforms.

To illustrate these concurrency platforms, I'm going to use the Fibonacci numbers example. Does anybody not know what Fibonacci is? Good, everybody knows what Fibonacci is. So it's a sequence where each number is the sum of the previous two numbers; the recurrence is shown in this brown box here. The sequence is named after Leonardo of Pisa, who was also known as Fibonacci, which is a contraction of "filius Bonacci," son of Bonacci — that's where the name Fibonacci came from. In Fibonacci's 1202 book, Liber Abaci, he introduced the sequence to Western mathematics, although it had previously been known to Indian mathematicians for several centuries. But this is what we call the sequence nowadays: the Fibonacci numbers.

All right, so here's a Fibonacci program. Has anyone seen this algorithm before? A couple of people — probably more, but people didn't raise their hands. OK, so it's a recursive program, and it basically implements the recurrence from the previous slide. If n is less than 2, we just return n; otherwise, we compute fib of n minus 1 and store that value in x, compute fib of n minus 2 and store that value in y, and then return the sum of x and y.

I do want to make a disclaimer to the algorithms police that this is actually a very bad algorithm: it takes exponential time, and there are much better ways to compute the nth Fibonacci number. There's a linear-time algorithm, which just computes the Fibonacci numbers bottom-up. The algorithm here is redoing a lot of the work, because it's computing the same Fibonacci numbers multiple times, whereas if you do a linear scan from the smallest numbers up, you only have to compute each one once. And there's actually an even better algorithm that takes logarithmic time, based on squaring matrices. Has anyone seen that algorithm before? Yeah, a couple of people. If you're interested in learning more about it, I encourage you to look at your favorite textbook, Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein. So even though this is a pretty bad algorithm, it's still a good educational example, because I can fit it on one slide and illustrate all the concepts of parallelism that we want to cover today.
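For reference, the sequential program on the slide is essentially the following (the 64-bit integer type is a choice made here, since the Fibonacci numbers grow quickly):

    #include <stdint.h>

    int64_t fib(int64_t n) {
      if (n < 2) return n;       // base case: fib(0) = 0, fib(1) = 1
      int64_t x = fib(n - 1);    // these two recursive calls are
      int64_t y = fib(n - 2);    // completely independent of each other
      return x + y;
    }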
So here's the execution tree for fib of 4. We see that fib of 4 is going to call fib of 3 and fib of 2; fib of 3 is going to call fib of 2 and fib of 1; and so on. You can see the repeated computations here: fib of 2 is being computed twice, and so on, and if you have a much larger tree — say you ran this on fib of 40 — then you'll have many more overlapping computations. It turns out that the two recursive calls can actually be parallelized, because they're completely independent calculations. So the key idea for parallelization is to simultaneously execute the two recursive subcalls to fib, and in fact you can do this recursively: the two subcalls of fib of 3 can also be executed in parallel, and the two subcalls of fib of 2 can also be executed in parallel, and so on. So you have all of these calls that can be executed in parallel, and that's the key idea for extracting parallelism from this algorithm.

OK, so let's now look at how we can use Pthreads to implement this simple Fibonacci algorithm. Pthreads is a standard API for threading, and it's supported on all Unix-based machines; if you're programming using Microsoft products, then the equivalent is WinAPI threads. Pthreads is actually an ANSI/IEEE standard — there's a number here that specifies the standard — but nowadays we just call it Pthreads. It's basically a do-it-yourself concurrency platform; it's like the assembly language of parallel programming. It's built as a library of functions with special non-C semantics, because if you're just writing code in C, you can't really say which parts of the code should be executed in parallel, so Pthreads provides you a library of functions that allow you to specify concurrency in your program. Each thread implements an abstraction of a processor, and these threads are then multiplexed onto the actual machine resources. The number of threads that you create doesn't necessarily have to match the number of processors on your machine; if you have more threads than processors, they'll just be multiplexed. So you can actually run a Pthreads program on a single core even though you have multiple threads in the program; they would just be time-sharing. All the threads communicate through shared memory, so they all have access to the same view of the memory, and the library functions that Pthreads provides mask the protocols involved in interthread coordination, so you don't have to do it yourself — it turns out that this is quite difficult to do correctly by hand.

So now I want to look at the key Pthreads functions. The first is pthread_create, and it takes four arguments. The first argument is of type pthread_t; this is basically going to store an identifier for the new thread that pthread_create will create, so that we can use that thread in our computations. The second, of type pthread_attr_t, sets some thread attributes, and for our purposes we can just set it to NULL and use the default attributes. The third argument is the function that's going to be executed after we create the thread, so we're going to need to define this function that we want the thread to execute. And finally we have this void-star arg argument, which stores the arguments that are going to be passed to the function that we're executing. pthread_create also returns an error status: an integer specifying whether the thread creation was successful or not.
Then there's another function called pthread_join. pthread_join basically says that we want to block at this point in our code until the specified thread finishes. It takes as an argument a pthread_t — one of the thread identifiers that was filled in when we called pthread_create — and a second argument, status, which is going to store the status of the terminating thread; pthread_join also returns an error status. So essentially what this does is say: wait until this thread finishes before we continue on in our program. Any questions so far?

OK, so here's what the implementation of Fibonacci looks like using Pthreads. On the left we see the original fib function that we had — that's just the sequential code — and then we have all this other stuff to enable it to run in parallel. First we have this struct on the left, thread_args; this struct is used to store the arguments that are passed to the function that the thread is going to execute. Then we have this thread_func: what that does is read the input argument from the thread_args struct, set that to i, call fib of i, store the result into the output field of the struct, and then return NULL.

Over on the right-hand side, we have the main function that will actually call the fib function on the left. We initialize a whole bunch of variables that we need to execute these threads, and then we first check whether n is less than 30. If n is less than 30, it turns out that it's actually not worth creating threads to execute this program in parallel, because of the overhead of thread creation, so we'll just execute the program sequentially. This idea is known as coarsening; you saw a similar example a couple of lectures ago when we did coarsening for sorting, but this is in the context of parallel programming — because there are some overheads to running a function in parallel, if the input size is small enough, sometimes you want to just execute it sequentially.

OK, so let me just walk through this code, since I have an animation. The next thing it's going to do is marshal the input argument to the thread: it's going to store the input argument, n minus 1, in this args struct. Then we're going to call pthread_create with the thread variable, NULL for the thread attributes, the thread_func that we defined on the left, and the args struct — and inside this args struct the input is set to n minus 1, which we did on the previous line. pthread_create is going to give us a return value, and if the thread creation was successful, the status is going to be zero and we can continue. When we continue, we execute fib of n minus 2 and store its result in our result variable, and this is done at the same time that fib of n minus 1 is executing, because we created this Pthread and told it to call the thread_func function that we defined on the left. So fib of n minus 1 and fib of n minus 2 are executing in parallel now. Then we have this pthread_join, which says we're going to wait until the thread that we created finishes before we move on, because we need to know the result of both of the subcalls before we can finish this function. Once that's done, we first check the status to see if it was successful, and if so, we add the output field of the args struct to the result — args.output will store the output of fib of n minus 1. So that's the Pthreads code. Any questions on how this works?
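Here's a compilable sketch along the lines of the code on the slides. The struct and function names (thread_args, thread_func) follow the lecture's description, but details like the command-line handling and treating any nonzero status as failure are just illustrative choices here:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int64_t fib(int64_t n) {          // ordinary sequential Fibonacci
      if (n < 2) return n;
      return fib(n - 1) + fib(n - 2);
    }

    typedef struct {
      int64_t input;                         // argument marshaled to the thread
      int64_t output;                        // result marshaled back
    } thread_args;

    static void *thread_func(void *ptr) {    // what the created thread runs
      thread_args *args = (thread_args *)ptr;
      args->output = fib(args->input);
      return NULL;
    }

    int main(int argc, char *argv[]) {
      int64_t n = (argc > 1) ? atoll(argv[1]) : 40;
      int64_t result;
      if (n < 30) {                          // coarsening: below this size the
        result = fib(n);                     // thread-creation overhead isn't worth it
      } else {
        pthread_t thread;
        thread_args args = { n - 1, 0 };     // marshal the argument n - 1
        int status = pthread_create(&thread, NULL, thread_func, &args);
        if (status != 0) return 1;
        result = fib(n - 2);                 // runs concurrently with fib(n - 1)
        status = pthread_join(thread, NULL); // wait for the spawned thread
        if (status != 0) return 1;
        result += args.output;               // add in fib(n - 1)
      }
      printf("fib(%lld) = %lld\n", (long long)n, (long long)result);
      return 0;
    }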
Yes? Yeah, so this is because the pthread_create function takes as input a void-star pointer — it's actually a generic function, so it doesn't know what the data type is; it has to work for all data types, and that's why we need to cast our argument to void star when we pass it to pthread_create. Then, inside the thread_func, we actually do know what type of pointer it is, so we cast it back.

So does this code seem very parallel? How many parallel calls am I doing here? Yeah — so I'm only creating one thread, so I'm executing two things in parallel. If I ran this code on four processors, what's the maximum speedup I could get? The maximum speedup I can get is just two, because I'm only running two things in parallel. This code doesn't recursively create threads; it only creates one thread at the top level. And if you wanted to make it so that this code actually recursively created threads, it would become much more complicated. That's one of the disadvantages of implementing this code in Pthreads, and we'll look at other solutions that make this task much easier.

Some of the issues with Pthreads are shown on this slide. First, there's a high overhead to creating a thread: creating a thread typically takes over 10^4 cycles, and this leads to very coarse-grained concurrency, because your tasks have to do a lot of work in order to amortize the cost of creating that thread. There are things called thread pools, which can help; the idea is to create a whole bunch of threads at the same time to amortize the cost of thread creation, and then when you need a thread you just take one from the thread pool, which contains threads that are just waiting to do work. Second, there's a scalability issue with the code that I showed on the previous slide: the Fibonacci code gets at most about a 1.5x speedup on two cores. Why is it 1.5? Does anyone know? Yeah — so it turns out that the two calls I'm executing in parallel aren't doing the same amount of work: one is computing fib of n minus 1, and one is computing fib of n minus 2. Does anyone know what the ratio between these two values is? Yeah, it's the golden ratio, about 1.6. If you could get a speedup of 1.6, that would be great, but there are some overheads, so this code gets about a 1.5x speedup. And if you want to take advantage of more cores, you need to rewrite this code, and it becomes more complicated. Third, there's the issue of modularity. If you look at this code, you see that the Fibonacci logic is not nicely encapsulated within one function: we have that logic in the function on the left, but we also have some of the fib logic on the right, in our main function. This makes the code not modular, and if we want to build programs on top of this, it makes it very hard to maintain — if we want to change the logic of the Fibonacci function a little bit, we now have to change it in multiple places instead of just one. So it's not a good idea to write code that's not modular; please don't do that in your projects. And then finally, the code becomes complicated because you have to actually move these arguments around.
That's known as argument marshaling, and then you have to engage in error-prone protocols in order to do load balancing. If you recall, here we have to actually place the argument n minus 1 into args.input, and we have to extract the value out of args.output, and that makes the code very messy. So why do I say "shades of 1958" here? What happened in 1958? Who was around in 1958? Just Charles. So there was a first something in 1958 — what was it? It turns out that in 1958 we had the first compiler, the Fortran compiler. Before we had the Fortran compiler, programmers were writing things in assembly, and when you write things in assembly, you have to do argument marshaling, because you have to place things into the appropriate registers before calling a function, and also move things around when you return from a function. The nice thing about the first compiler is that it did all of this argument marshaling for you: you can just pass arguments to a function, and the compiler will generate code that does the argument marshaling for us. So having to do this in Pthreads is similar to having to write code in assembly, because you have to manually marshal these arguments. Hopefully there are better ways to do this, and indeed we'll look at some other solutions that make it easier on the programmer. Any questions before I continue?

All right, so we looked at Pthreads; next let's look at Threading Building Blocks. Threading Building Blocks, or TBB, is a library solution developed by Intel. It's implemented as a C++ library that runs on top of native threads, so the underlying implementation uses threads, but the programmer doesn't deal with threads; instead, the programmer specifies tasks, and these tasks are automatically load-balanced across the threads using a work-stealing algorithm inspired by research at MIT — Charles Leiserson's research. The focus of Intel TBB is on performance, and as we'll see, the code written using TBB is simpler than what you would have to write if you used Pthreads.

So let's look at how we can implement Fibonacci using TBB. In TBB, we have to create these tasks, so in the Fibonacci code we create this FibTask class, and inside the task we have to define this execute function. The execute function is the function that performs the computation when we start the task, and this is where we define the Fibonacci logic. The task also takes these arguments: n is the input, and sum is the output. In TBB, we can easily create a recursive program that extracts more parallelism. Here what we're doing is recursively creating two child tasks, a and b — that's the syntax for creating a task — and here we can just pass the two arguments to FibTask instead of marshaling the arguments ourselves. Then we have this set_ref_count, and this is basically the number of tasks that we have to wait for, plus one for ourselves: in this case we created two child tasks and we have ourselves, so that's two plus one. After that, we start task b using the spawn(b) call, and then we call spawn_and_wait_for_all with a as the argument, which says we're going to start task a and then wait for both a and b to finish before we proceed. This spawn_and_wait_for_all call is going to look at the ref count that we set above and wait for that many tasks to finish before it continues. And then, after both a and b have completed, we can just sum up the results and store them in the sum variable.

Here, these tasks are created recursively, so unlike the Pthreads implementation, which was only creating one thread at the top level, we're recursively creating more and more tasks, so we can actually get more parallelism from this code and scale to more processors. We also need a main function to start up the program: we create a root task, which just computes fib of n, and then we call spawn_root_and_wait(a), where a is the task for the root, and it will just run the root task. So that's what Fibonacci looks like in TBB. This is much simpler than the Pthreads implementation, and it also gets better performance, because we can extract more parallelism from the computation. Any questions?
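For concreteness, here's roughly what that task class looks like written against the classic tbb::task interface the lecture describes. This low-level interface has since been deprecated in newer TBB releases, and the exact slide code may differ in details, so treat this as a sketch of the lecture's version rather than the definitive code:

    #include <tbb/task.h>
    #include <cstdint>
    #include <cstdio>

    class FibTask : public tbb::task {
    public:
      const int64_t n;       // input
      int64_t* const sum;    // where to write the output
      FibTask(int64_t n_, int64_t* sum_) : n(n_), sum(sum_) {}

      tbb::task* execute() {               // runs when the task starts
        if (n < 2) {
          *sum = n;
        } else {
          int64_t x, y;
          // Recursively create two child tasks.
          FibTask& a = *new (allocate_child()) FibTask(n - 1, &x);
          FibTask& b = *new (allocate_child()) FibTask(n - 2, &y);
          set_ref_count(3);                // two children + one for the wait
          spawn(b);                        // start b ...
          spawn_and_wait_for_all(a);       // ... start a, then wait for both
          *sum = x + y;
        }
        return NULL;
      }
    };

    int main() {
      int64_t result;
      FibTask& root = *new (tbb::task::allocate_root()) FibTask(30, &result);
      tbb::task::spawn_root_and_wait(root);  // run the root task to completion
      std::printf("%lld\n", (long long)result);
      return 0;
    }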
OK, so TBB also has many other features in addition to tasks. TBB provides many C++ templates to express common patterns, and you can use these templates on different data types. They have a parallel_for, which is used to express loop parallelism, so you can loop over a bunch of iterations in parallel. They also have a parallel_reduce for data aggregation — for example, if you want to sum together a whole bunch of values, you can use a parallel_reduce to do that in parallel. They also have pipeline and filter, which are used for software pipelining. TBB provides many concurrent container classes, which allow multiple threads to safely access and update the items in the container concurrently. For example, they have hash tables, trees, priority queues, and so on, and you can just use these out of the box and they work in parallel — you can do concurrent updates and reads to these data structures. TBB also has a variety of mutual exclusion library functions, such as locks and atomic operations. So there are a lot of features in TBB, which is why it's one of the more popular concurrency platforms; because of all these features, you don't have to implement many of these things yourself, and you can still get pretty good performance.

So TBB was a library solution to the concurrency problem. Now we're going to look at two linguistic solutions: OpenMP and Cilk. Let's start with OpenMP. OpenMP is a specification by an industry consortium, and there are several compilers available that support OpenMP, both open source and proprietary; nowadays GCC, ICC, and Clang all support OpenMP, as well as Visual Studio. OpenMP provides linguistic extensions to C and C++, as well as Fortran, in the form of compiler pragmas: you use these compiler pragmas in your code to specify which pieces of code should run in parallel. OpenMP also runs on top of native threads, but the programmer isn't exposed to these threads. OpenMP supports loop parallelism, so you can do parallel for loops, and it has task parallelism as well as pipeline parallelism.

So let's look at how we can implement Fibonacci in OpenMP. This is the entire code — I want you to compare this to the Pthreads implementation that we saw ten minutes ago. This code is much cleaner than the Pthreads implementation, and it also performs better. So let's see how this code works. We have these compiler pragmas, or compiler directives, and the compiler pragma for creating a parallel task is omp task. We're going to create an OpenMP task for fib of n minus 1, as well as for fib of n minus 2. There's also this shared clause, which specifies that the two variables in the arguments are shared across different threads — you have to specify whether variables are private or shared. And then the pragma omp taskwait just says we're going to wait for the preceding tasks to complete before we continue, so here it's going to wait for fib of n minus 1 and fib of n minus 2 to finish before we return the result, which is what we want. After that, we just return x plus y. So that's the entire code.
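Here's a compilable sketch of that idea (built with a flag like -fopenmp). The fib function is essentially what the lecture describes; the main wrapper with the parallel and single pragmas is added here, as one common way to give the tasks a team of threads to run on, and as in the lecture, real code would also coarsen small n:

    #include <stdint.h>
    #include <stdio.h>

    int64_t fib(int64_t n) {
      if (n < 2) return n;
      int64_t x, y;
      #pragma omp task shared(x)       // child task computes fib(n - 1) into x
      x = fib(n - 1);
      #pragma omp task shared(y)       // second child task computes fib(n - 2) into y
      y = fib(n - 2);
      #pragma omp taskwait             // wait for both tasks before summing
      return x + y;
    }

    int main(void) {
      int64_t result;
      #pragma omp parallel             // create the team of worker threads ...
      #pragma omp single               // ... but let just one of them start fib
      result = fib(30);
      printf("%lld\n", (long long)result);
      return 0;
    }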
OpenMP also provides many other pragma directives in addition to task. You can use a parallel for to do loop parallelism; there's reduction; and there are also directives for scheduling and data sharing, so you can specify how you want a particular loop to be scheduled. OpenMP has many different scheduling policies — static scheduling, dynamic scheduling, and so on — and these scheduling directives also have different grain sizes. The data-sharing directives specify whether variables are private or shared. OpenMP also supplies a variety of synchronization constructs, such as barriers, atomic updates, and mutual exclusion, or mutex, locks. So OpenMP also has many features, and it's also one of the more popular solutions for writing parallel programs. As you saw in the previous example, the code is much simpler than if you were to write it using Pthreads, or even TBB — this is a much simpler solution. Any questions?

Yeah, so this code here is actually independent of the number of processors. There's a scheduling algorithm that will determine how the tasks get mapped to different processors, so if you spawn a new task it doesn't necessarily go onto a different processor, and you can have more tasks than the number of processors available. The scheduling algorithm takes care of how these tasks get mapped to different processors, and that's hidden from the programmer, although you can use these scheduling pragmas to give hints to the compiler about how it should schedule things. Yes? So, underneath, this is implemented using Pthreads, which has to make operating system calls to basically talk to the processor cores directly and do the multiplexing and so forth, so the operating system is involved at a very low level.

OK, so the last concurrency platform that we'll be looking at today is Cilk — we're actually going to look at Cilk Plus. The Cilk part of Cilk Plus is a small set of linguistic extensions to C and C++ that support fork-join parallelism; for example, the Fibonacci example uses fork-join parallelism, so you can use Cilk to implement that. The Plus part of Cilk Plus supports vector parallelism, which you had experience working with in your homeworks. Cilk Plus was initially developed by Cilk Arts, which was an MIT spin-off, and Cilk Arts was acquired by Intel in July 2009. The Cilk Plus implementation is based on the award-winning Cilk multithreaded language that was developed two decades ago here at MIT by Charles Leiserson's research group, and it features a provably efficient work-stealing scheduler. This scheduler is provably efficient — you can actually prove theoretical bounds on it — and this allows you to implement theoretically efficient algorithms, which we'll talk more about in another lecture on algorithm design. Charles Leiserson has a very famous paper that proves this scheduler is optimal, so if you're interested in reading about it, you can talk to us offline. Cilk Plus also provides a hyperobject library for parallelizing code with global variables, and you'll have a chance to play around with hyperobjects in the homework.
The Cilk Plus ecosystem also includes useful programming tools, such as the Cilkscreen race detector — this allows you to detect determinacy races in your program, to help you isolate bugs and performance bottlenecks. It also has a scalability analyzer called Cilkview, and Cilkview will basically analyze the amount of work that your program is doing, as well as the maximum amount of parallelism that your code could possibly extract from the hardware.

So that's Intel Cilk Plus, but it turns out that we're not actually going to be using Intel Cilk Plus in this class. We're going to be using a better compiler, and this compiler is based on Tapir/LLVM; it supports the Cilk subset of Cilk Plus. Tapir/LLVM was actually recently developed at MIT by T.B. Schardl, who gave a lecture last week, William Moses, who is a grad student working with Charles, and Charles Leiserson — so we're talking a lot about Charles's work today. Tapir/LLVM generally produces better code, relative to its base compiler, than all other implementations of Cilk out there, so it's the best Cilk compiler available today. They actually wrote a very nice paper on this last year — Charles Leiserson and his group — and that paper received the Best Paper Award at the annual Symposium on Principles and Practice of Parallel Programming, or PPoPP, so you should look at that paper as well. Right now Tapir/LLVM uses the Intel Cilk Plus runtime system, but I believe Charles's group has plans to implement a better runtime system. Tapir/LLVM also supports more general features than existing Cilk compilers: in addition to spawning functions, you can also spawn code blocks that are not separate functions, and this makes writing programs more flexible — you don't have to create a separate function if you want to execute a code block in parallel. Any questions?

So this is the Cilk code for Fibonacci. It's also pretty simple; it looks very similar to the sequential program, except we have these cilk_spawn and cilk_sync statements in the code. So what do these statements do? cilk_spawn says that the named child function — the function right after the cilk_spawn statement — may execute in parallel with the parent caller, where the parent caller is the function that is calling cilk_spawn. So this says that fib of n minus 1 can execute in parallel with the function that called it, and then this function goes on to call fib of n minus 2, so fib of n minus 2 and fib of n minus 1 can now be executing in parallel. Then cilk_sync says that control cannot pass this point until all of the spawned children have returned, so it's going to wait for fib of n minus 1 to return before we go to the return statement, where we add up x and y.

One important thing to note is that the Cilk keywords grant permission for parallel execution, but they don't actually force or command parallel execution. So even though I said cilk_spawn here, the runtime system doesn't necessarily have to run fib of n minus 1 in parallel with fib of n minus 2; I'm just saying that I could run these two things in parallel, and it's up to the runtime system to decide whether or not to run them in parallel, based on its scheduling policy.
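Written out, the code on the slide is essentially this (modulo the exact integer types):

    #include <cilk/cilk.h>
    #include <stdint.h>

    int64_t fib(int64_t n) {
      if (n < 2) return n;
      int64_t x = cilk_spawn fib(n - 1);  // child may run in parallel with the caller
      int64_t y = fib(n - 2);             // the parent continues with fib(n - 2)
      cilk_sync;                          // wait for the spawned child to return
      return x + y;
    }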
So let's look at another example of Cilk: loop parallelism. Here we want to do a matrix transpose, and we want to do it in place. The idea is that we want to swap each element below the diagonal with its mirror image above the diagonal, and here's some code to do this. We have a cilk_for — this is basically a parallel for loop — that goes from i equals 1 to n minus 1, and then the inner for loop goes from j equals 0 up to i minus 1, and then we just swap a[i][j] with a[j][i] using the three statements inside the body of the loop. So to execute a for loop in parallel, you just have to add cilk_ to the for keyword, and that's as simple as it gets. This code is actually going to run in parallel and get a pretty good speedup for this particular problem.

Internally, cilk_for loops are transformed into nested cilk_spawn and cilk_sync calls. The compiler is going to get rid of the cilk_for and change it into cilk_spawn and cilk_sync: it recursively divides the iteration space in half, spawns off one half, and executes the other half in parallel with that, and it recursively does that until the iteration range becomes small enough that it doesn't make sense to execute it in parallel anymore, at which point it just executes that range sequentially. So that's loop parallelism in Cilk. Any questions?

Yes — if the loop increment is something other than one, can I still do that? Yeah, so the compiler can actually figure out what the iteration space is, so you don't necessarily have to be incrementing by one; you can do something else. You just have to guarantee that all of the iterations are independent — if you have a determinacy race across the different iterations of your cilk_for loop, then your result might not necessarily be correct, so you have to make sure that the iterations are indeed independent. Yes? Yeah, so you can nest cilk_for loops. It turns out that for this example, you usually already have enough parallelism in the outer loop for large enough values of n, so it doesn't make sense to put a cilk_for loop inside, because using a cilk_for loop adds some additional overhead. But you can actually do nested cilk_for loops, and in some cases it does make sense, especially if there's not enough parallelism in the outermost for loop. It's a good question. Yes? So it has a bunch of calls to the Cilk runtime system — I don't know all the details, because I haven't looked at this recently, but I think you can actually generate the assembly code using a flag in the Clang compiler, so that's a good exercise. Yeah, you probably want to look at the LLVM IR rather than the assembly to begin with, to understand what's going on. It has three instructions that are not in standard LLVM, which were added to support parallelism. When it's lowered into assembly, each of those instructions becomes a bunch of assembly language instructions, so you don't want to mess with looking at it in the assembler until you see what it looks like in the LLVM IR first. Good question. Any other questions about this code?
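Here's a sketch of that loop; the element type (double here) and the exact matrix representation are illustrative, since the slide may declare the matrix differently:

    #include <cilk/cilk.h>
    #include <stdint.h>

    // In-place transpose of an n-by-n matrix a: swap each element below
    // the diagonal with its mirror image above the diagonal.
    void transpose(double** a, int64_t n) {
      cilk_for (int64_t i = 1; i < n; ++i) {   // outer-loop iterations may run in parallel
        for (int64_t j = 0; j < i; ++j) {      // inner loop stays serial
          double temp = a[i][j];
          a[i][j] = a[j][i];
          a[j][i] = temp;
        }
      }
    }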
OK, so let's look at another example. Let's say we had this for loop where on each iteration i we just increment a variable sum by i, so this is essentially going to compute the summation of everything from i equals 0 up to n minus 1, and then print out the result. One straightforward way to try to parallelize this code is to just change the for to a cilk_for. So does this code work? Who thinks that this code doesn't work, or doesn't compute the correct result? About half of you. And who thinks this code does work? A couple of people, and I guess the rest of the people don't care. So it turns out that it's not necessarily going to give you the right answer, because the cilk_for loop says you can execute these iterations in parallel, but they're all updating the same shared variable, sum. So you have what's called a determinacy race, where multiple processors can be writing to the same memory location. We'll talk much more about determinacy races in the next lecture, but for this example, it's not necessarily going to work if you run it on more than one processor.

Cilk actually has a nice way to deal with this. In Cilk we have something known as a reducer — this is one example of a hyperobject, which I mentioned earlier. With a reducer, instead of declaring this sum variable just as an unsigned long, you use this macro called CILK_C_REDUCER_OPADD, which specifies that we want to create a reducer with the addition function; then we have the variable name, sum, the data type, unsigned long, and the initial value, zero. Then we have a macro to register this reducer, CILK_C_REGISTER_REDUCER. Now, inside the cilk_for loop, we increment REDUCER_VIEW of sum — which is another macro — by i, and you can actually execute this in parallel and it will give you the same answer that you would get if you ran it sequentially. The reducer takes care of this determinacy race for you, and at the end, when you print out the result, you'll see that the sum is equal to the sum you expect. Then, after you finish using the reducer, you use this other macro, CILK_C_UNREGISTER_REDUCER of sum, which tells the system that you're done using this reducer. So this is one way to deal with this problem, when you want to do a reduction.

It turns out that there are many other interesting reduction operators that you might want to use, and in general you can create reducers for monoids — algebraic structures that have an associative binary operation as well as an identity element. The addition operator is a monoid because it's associative, it's binary, and the identity element is zero. Cilk also has several other predefined reducers, including multiplication, min, max, and, or, xor, and so on — these are all monoids as well — and you can also define your own reducers. In fact, in the next homework you'll have the opportunity to play around with reducers and write a reducer for lists. So that's reducers.
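Putting those macros together, the summation example looks roughly like this; the value of n and the exact printout are illustrative, not necessarily what's on the slide:

    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>
    #include <stdio.h>

    int main(void) {
      long n = 1000000;
      // An addition reducer named sum, of type unsigned long, initialized to 0.
      CILK_C_REDUCER_OPADD(sum, ulong, 0);
      CILK_C_REGISTER_REDUCER(sum);
      cilk_for (long i = 0; i < n; ++i) {
        // Each worker increments its own local view of sum, so there is
        // no determinacy race on the shared variable.
        REDUCER_VIEW(sum) += i;
      }
      printf("The sum is %lu\n", REDUCER_VIEW(sum));
      CILK_C_UNREGISTER_REDUCER(sum);
      return 0;
    }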
Another nice thing about Cilk is that there's always a valid serial interpretation of the program: the serial elision of a Cilk program is always a legal interpretation. For the Cilk source code on the left, the serial elision is basically the code you get if you get rid of the cilk_spawn and cilk_sync statements; this looks just like the sequential code. Remember that the Cilk keywords grant permission for parallel execution, but they don't necessarily command parallel execution, so if you ran this Cilk code using a single core, it wouldn't actually create these parallel tasks, and you would get the same answer as the sequential program — the serial elision is also a correct interpretation. This is unlike other solutions such as TBB and Pthreads, where it's actually difficult to get a program that does what the sequential program does, because they're doing a lot of additional work to set up these parallel calls and create these argument structures and other scheduling constructs. Whereas in Cilk it's very easy to get the serial elision: you just define cilk_spawn and cilk_sync to be null, and you define cilk_for to be for, and then this gives you a valid sequential program. So when you're debugging code, you might first want to check whether the serial elision of your Cilk program is correct, and you can easily do that by using these macros — or actually, there's a compiler flag that will do that for you and give you the equivalent C program. This is a nice way to debug, because you don't have to start with the parallel program; you can first check whether the serial program is correct before you go on to debug the parallel program. Questions?

Yes — so it turns out that by default it groups a bunch of iterations together into a single task, because it doesn't make sense to break the loop down into such small chunks, due to the overheads of parallelism. But there's actually a setting you can use to change the grain size of the cilk_for loop, so you could make it so that each iteration is its own task, and then, as you said, the scheduler will decide how to map these different tasks onto different processors, or even whether it wants to execute any of these tasks in parallel. It's a good question.

OK, so the idea in Cilk is to allow the programmer to express the logical parallelism in an application: the programmer just has to identify which pieces of the code could be executed in parallel, but doesn't necessarily have to determine which pieces of code should be executed in parallel. Cilk has a runtime scheduler that will automatically map the executing program onto the available processor cores at runtime, and it does this dynamically using a work-stealing scheduling algorithm; the work-stealing scheduler is used to balance the tasks evenly across the different processors, and we'll talk more about it in a future lecture. But I want to emphasize that, unlike the other concurrency platforms that we looked at today, Cilk's work-stealing scheduling algorithm is theoretically efficient, whereas the OpenMP and TBB schedulers are not theoretically efficient. This is a nice property, because it guarantees that the algorithms you write on top of Cilk will also be theoretically efficient.

OK, so here's a high-level illustration of the Cilk ecosystem. It's a very simplified view, but I did this to fit it on a single slide. What you do is take the Cilk source code and pass it to your favorite Cilk compiler — the Tapir/LLVM compiler — and this gives you a binary that you can run on multiple processors. Then you pass a program input to the binary, run it on however many processors you have, and this allows you to benchmark the parallel performance of your program. You can also do serial testing: to do this, you just obtain the serial elision of the Cilk program and pass it to an ordinary C or C++ compiler, which generates a binary that can only run on a single processor, and you run your suite of serial regression tests on this single-threaded binary. This allows you to benchmark the performance of your serial code and also debug any issues that might have arisen when running the program sequentially. Another way to do this is to compile the original Cilk code but run it on a single processor: there's a command-line argument that tells the runtime system how many processors you want to use, and if you set that parameter to 1, it will only use a single processor. This allows you to benchmark the single-threaded performance of your code as well, and the parallel program executing on a single core should behave exactly the same way as the execution of the serial elision.
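As a concrete note on that knob: in the Intel Cilk Plus runtime that Tapir/LLVM currently links against, the number of workers is usually controlled by the CILK_NWORKERS environment variable, or from inside the program with __cilkrts_set_param — this is an assumption about the particular runtime in use, so check its documentation for the exact interface on your setup. A minimal sketch:

    #include <cilk/cilk_api.h>

    int main(void) {
      // Equivalent to running the program with CILK_NWORKERS=1 set in the
      // environment: use one worker, i.e., effectively serial execution.
      __cilkrts_set_param("nworkers", "1");
      /* ... run the parallel computation as usual ... */
      return 0;
    }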
So that's one of the advantages of using Cilk: because you can easily do serial testing using the Cilk platform, you can separate out the serial correctness from the parallel correctness. As I said earlier, you can first debug the serial correctness, as well as any performance issues, before moving on to the parallel version. Another point I want to make is that because Cilk actually uses the serial program inside its tasks, it's good to optimize the serial program even when you're writing a parallel program — optimizing the serial program for performance will also translate to better parallel performance.

Another nice feature of Cilk is that it has this tool called Cilksan, which stands for Cilk sanitizer. Cilksan will detect any determinacy races that you have in your code, which will significantly help you in debugging the correctness, as well as the performance, of your code. If you compile the Cilk code using the Cilksan flag, it will generate an instrumented binary that, when you run it, will find and localize all of the determinacy races in your program — it will tell you where the determinacy races occur, so that you can go inspect that part of your code and fix it if necessary. So this is a very useful tool for debugging your parallel programs. Cilk also has another nice tool called Cilkscale. Cilkscale is a performance analyzer: it will analyze how much parallelism is available in your program, as well as the total amount of work that it's doing. Again, you pass a flag to the compiler that turns on Cilkscale, and it will generate an instrumented binary, and when you run this code it will give you a scalability report. You'll find these tools very useful when you're doing the next project, and we'll talk a little bit more about these two tools in the next lecture. As I said, Cilkscale will analyze how well your program will scale to larger machines — it will basically tell you the maximum number of processors that your code could possibly take advantage of. Any questions? Yes — so the Cilk runtime scheduler does the scheduling of the different tasks when you're running the program, so it's linked with the binary; it's not stored in the same place. Other questions?

All right, so let me summarize what we looked at today. First, we saw that most processors today have multiple cores, and probably all of your laptops have more than one core on them. Who has a laptop that only has one core? OK, when did you buy it? Yeah, probably a long time ago, I see. So nowadays, obtaining high performance on your machines requires you to write parallel programs, but parallel programming can be very hard, especially if you have to program directly on the processor cores and interact with the operating system yourself. Cilk is very nice because it abstracts the processor cores from the programmer, handles synchronization and communication protocols, and also performs provably good load balancing. In the next project, you'll have a chance to play around with Cilk: you'll be implementing your own parallel screensaver, so that's a very fun project to do, and possibly in one of the future lectures we'll post some of the nicest screensavers that students developed, for everyone to see. OK, so that's all.
support will help MIT OpenCourseWare continue to offer high quality educational resources for free to make a donation or to view additional materials from hundreds of MIT courses visit MIT opencourseware at ocw.mit.edu today we're gonna talk about multi-core programming and I as I was just informed by Charles it's 2018 I had 2017 on the slide okay so first congratulations to all of you you turned in the first projects beta here's a here's a plot showing the tiers that different groups reach for the beta and this is in sorted order and we set the beta caught off to be tier 45 the final cut off is tier 48 so the final cut off we did set a little bit aggressively but keep in mind that you don't necessarily have to get to the final cut off in order to get an A on this project okay so we're going to talk about multi-core processing today that's gonna be a topic of the next project after you finish the first project so so in a multi-core processor we have a whole bunch of cores that are all placed on the same chip and they have access to shared memory they usually also have some sort of private cache and then a shared last level cache so l3 in this case and then they they all have access the same memory controller which goes out to main memory and then they also have access to IO but for a very long time chips only had a single core on them so why do we have multi-core processors nowadays why did semiconductor vendors start producing chips that had multiple processor cores on them so the answer is because of two things so first there's Moore's law which says that we get more trans sisters every year so the number of transistors that you can fit on a chip doubles approximately every two years and secondly there's an end of scaling of clock frequencies so for a very long time we could just keep increasing the frequency of the single core on the chip but at around 2004 to 2005 that was no longer the case we couldn't scale the clock frequency anymore so here's a plot showing both the number of transistors you can fit on the chip over time as well as a clock frequency of the processors over time and notice that the y-axis is in log scale here and the blue line is basically Moore's law which says that the number of transistors you can fit on the chip doubles approximately every two years and that's been growing pretty steadily so this plot goes up to 2010 but in fact it's been growing even up until the president it will continue to grow for a couple more years before Moore's law ends however if you look at the clock frequency line you see that it was growing quite steadily until about the early 2000s and then at that point is sort of flattened out so at that point we couldn't increase the clock frequencies anymore and the clock speed was bounded at about 4 gigahertz so nowadays if you go buy a processor is usually still bounded by around 4 gigahertz it's usually a little bit less than 4 gig Hertz because it doesn't really make sense to push it all the way but you might find some processors that are around 4 gigahertz nowadays so so what happened at around 2004 to 2005 does anyone know okay so so Moore's law basically says that we can fit more transistors on a chip because the transistors become smaller and when the transistors become smaller you can reduce the voltage that's needed to operate the transistors and as a result you could increase the clock frequency while maintaining the same power density and that's what manufacturers did until about 2004 to 2005 they just kept increasing the clock 
So we're going to look at the cache coherence protocol, and then, after looking at the hardware, we're going to
look at some software solutions for writing parallel programs on these multi-core machines to take advantage of the extra cores. We're going to look at several concurrency platforms, listed here: pthreads, which is basically a low-level API for running your code in parallel (and if you program on Microsoft products, WinAPI threads is pretty similar); Intel Threading Building Blocks, which is a library solution to concurrency; and then two linguistic solutions, OpenMP and Cilk Plus. Cilk Plus is actually the concurrency platform that we'll be using for most of this class.

All right, so let's look at how caches work. Say we have a value at some location in memory, say x = 3. If one processor wants to load x, it reads the value from main memory, brings it into its own cache, and also loads it into one of its registers. It keeps the value in cache so that if it wants to access it again in the near future, it doesn't have to go all the way out to main memory; it can just look at the value in its cache. Now what happens if another processor wants to load x? It just does the same thing: it reads the value from main memory, brings it into its cache, and loads it into one of its registers. And the same thing for another processor. It turns out that you don't actually always have to go out to main memory to get the value: if the value resides in one of the other processors' caches, you can also get it through the other processor's cache, and sometimes that's cheaper than going all the way out to main memory. OK, so now the second processor loads x again, and it's in its cache, so it doesn't have to go to main memory or to anybody else's cache.

So what happens now if we want to store x, to set its value to something else? Say this processor wants to set x equal to 5: it writes x = 5 and stores that result in its own cache. That's all well and good. Now what happens when the first processor wants to load x? It sees that x is in its own cache, so it just reads the value there and gets 3. So what's the problem? Yes: the problem is that the value of x in the first processor's cache is stale, because another processor updated it, so that cached value of x is now invalid. That's the problem, and one of the main challenges of multi-core hardware is to solve this problem of cache coherence: making sure that the values in different processors' caches are consistent across updates.

One basic protocol for solving this problem is known as the MSI protocol. In this protocol each cache line is labeled with a state, and there are three possible states: M, S, and I. This is done at the granularity of cache lines, because it turns out that storing this information is relatively expensive, so you don't want to store it for every memory location; it's done on a per-cache-line basis. Does anyone know the size of a cache line on the machines that we're using? Yes, it's 64 bytes, and that's typically what you see today on most Intel and AMD machines. There are some architectures with different cache line sizes, like 128 bytes, but for our class the machines we're using have 64-byte cache lines. That's important to remember, so that your back-of-the-envelope calculations give accurate estimates.
So, the three states in the MSI protocol are M, S, and I. M stands for modified: when a cache block is in the modified state, no other cache can contain this block in the M or S states. The S state means the block is shared, so other caches can also have this block in the shared state. Finally, I means the cache block is invalid, which is essentially the same as the block not being in the cache at all. To solve the problem of cache coherence, when one cache modifies a location it has to inform all the other caches that their values are now stale, because this cache modified the value; so it invalidates all of the other copies of that cache line in the other caches by changing their state from S to I.

Let's see how this works. Say the second processor wants to store y = 5. Previously the value of y was 17, and the cache line containing y = 17 was in the shared state. Now, when it does y = 5, the second processor's copy of that cache line is set to the modified state, and the cache line is invalidated in all of the other caches that contain it; so the first cache and the fourth cache each now have state I for the line that still holds y = 17. Any questions? Yes: there are actually some protocols that do that. This is just the most basic protocol, so it doesn't, but there are some protocols used in practice that actually update the other copies. It's a good point, but I just want to present the most basic protocol for now.

When you load a value, you can first check whether your cache line is in the M or S state, and if it is, you can just read that value directly. But if it's in the I state, or if the line isn't there, you have to fetch the block from either another processor's cache or from main memory. It turns out there are many other protocols out there: there's MESI, there's MOESI, and many others, and some of them are proprietary. They all do different things, and all of these protocols are quite complicated and very hard to get right; in fact, one of the earliest successes of formal verification was in proving some of these cache coherence protocols correct. Yes, question? If two processors try to modify the value, one of them has to happen first; the hardware takes care of that. The first one that modifies it will invalidate all the other copies, and then the second one that modifies the value will again invalidate all the other copies. When a lot of processors try to modify the same value, you get something known as an invalidation storm: a bunch of invalidation messages going throughout the hardware, and that can become a big performance bottleneck, because each processor, when it modifies the value, has to inform all the other processors, and if all the processors are modifying the same value you get sort of quadratic behavior. The hardware is still going to guarantee that one of the processors ends up writing the value, but you should be aware of this performance issue when you're writing parallel code. Yes: this is all implemented in hardware.
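Just to make the write path concrete, here is a toy software model of the invalidation step described above. This is my own illustration, not how real hardware implements it; the real protocol lives in the cache controllers.

```cpp
#include <cstddef>

enum class State { M, S, I };   // Modified, Shared, Invalid

// Toy model: conceptually, when cache `writer` writes a line, its own copy
// goes to M and every other cache's copy of that line is invalidated.
void write_line(std::size_t writer, State line_state[], std::size_t num_caches) {
  for (std::size_t c = 0; c < num_caches; ++c) {
    if (c == writer) {
      line_state[c] = State::M;        // writer's copy: S (or I) -> M
    } else if (line_state[c] != State::I) {
      line_state[c] = State::I;        // all other valid copies: -> I
    }
  }
}
```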
If you take a computer architecture class, you'll learn much more about these protocols and all of their variants. For our purposes we don't actually need to understand all the details of the hardware; we just need to understand what it's doing at a high level, so that we can tell when we have a performance bottleneck and why. That's why I'm just introducing the most basic protocol here. Any other questions?

All right, so I talked a little bit about the shared-memory hardware. Let's now look at some concurrency platforms; these are the four platforms we'll be looking at today. First, what is a concurrency platform? Writing parallel programs is very difficult: it's very hard to get these programs correct, and if you want to optimize their performance it becomes even harder, so it's very painful and error-prone. A concurrency platform abstracts the processor cores, handles synchronization and communication protocols, and also performs load balancing for you, so it makes your life much easier.

To illustrate these concurrency platforms, I'm going to use the Fibonacci numbers as an example. Does anybody not know what Fibonacci is? Good, everybody knows. It's a sequence where each number is the sum of the previous two numbers; the recurrence is shown in the brown box here. The sequence is named after Leonardo di Pisa, who was also known as Fibonacci, which is a contraction of filius Bonacci, "son of Bonacci"; that's where the name came from. In his 1202 book Liber Abaci, Fibonacci introduced the sequence to Western mathematics, although it had previously been known to Indian mathematicians for several centuries; but nowadays we call these the Fibonacci numbers.

So here's a Fibonacci program. Has anyone seen this algorithm before? A couple of people; probably more, but people didn't raise their hands. It's a recursive program that implements the recurrence from the previous slide: if n is less than 2 we just return n; otherwise we compute fib(n-1) and store that value in x, compute fib(n-2) and store that value in y, and return the sum of x and y. I do want to make a disclaimer to the algorithms police that this is actually a very bad algorithm: it takes exponential time, and there are much better ways to compute the n-th Fibonacci number. There's a linear-time algorithm that computes the Fibonacci numbers bottom-up. The algorithm here redoes a lot of work, computing the same Fibonacci numbers multiple times, whereas if you do a linear scan from the smallest numbers up, you only have to compute each one once. There's an even better algorithm that takes logarithmic time, based on squaring matrices; has anyone seen that one? A couple of people. If you're interested in learning more about it, I encourage you to look at your favorite textbook, Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein. Even though this is a pretty bad algorithm, it's still a good educational example, because I can fit it on one slide and use it to illustrate all the concepts of parallelism that we want to cover today.
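For concreteness, the recursive program just described looks roughly like this (my transcription of the slide), together with the linear-time bottom-up version mentioned in passing:

```cpp
#include <cstdint>

// The recursive Fibonacci from the slide (roughly): exponential time,
// but a convenient running example for parallelization.
int64_t fib(int64_t n) {
  if (n < 2) return n;
  int64_t x = fib(n - 1);
  int64_t y = fib(n - 2);
  return x + y;
}

// The linear-time alternative: build the sequence bottom-up, so that
// each Fibonacci number is computed only once.
int64_t fib_linear(int64_t n) {
  int64_t a = 0, b = 1;              // fib(0), fib(1)
  for (int64_t i = 0; i < n; ++i) {
    int64_t next = a + b;
    a = b;
    b = next;
  }
  return a;                          // fib(n)
}
```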
So here's the execution tree for fib(4). fib(4) calls fib(3) and fib(2); fib(3) calls fib(2) and fib(1); and so on. You can see the repeated computations here: fib(2) is being computed twice, and so on, and if you have a much larger tree, for a bigger input, you'll have many more overlapping computations. It turns out that the two recursive calls can actually be parallelized, because they're completely independent calculations. So the key idea for parallelization is to simultaneously execute the two recursive subcalls to fib, and in fact you can do this recursively: the two subcalls of fib(3) can also be executed in parallel, the two subcalls of fib(2) can also be executed in parallel, and so on. That's the key idea for extracting parallelism from this algorithm.

So let's now look at how we can use pthreads to implement this simple Fibonacci algorithm. Pthreads is a standard API for threading, and it's supported on all Unix-based machines; if you're programming using Microsoft products, the equivalent is WinAPI threads. Pthreads is actually an ANSI/IEEE standard (there's a number that specifies the standard), but nowadays we just call it pthreads. It's basically a do-it-yourself concurrency platform, like the assembly language of parallel programming. It's built as a library of functions with special non-C semantics, because if you're just writing code in C you can't really say which parts of the code should be executed in parallel, so pthreads provides a library of functions that allows you to specify concurrency in your program. Each thread implements an abstraction of a processor, and these threads are multiplexed onto the actual machine resources, so the number of threads you create doesn't have to match the number of processors on your machine: if you have more threads than processors, they'll just be multiplexed. You can actually run a pthreads program on a single core even though the program has multiple threads; they'll just time-share. All the threads communicate through shared memory, so they all have the same view of the memory, and the library functions that pthreads provides mask the protocols involved in inter-thread coordination, so you don't have to do that yourself, because it turns out that's quite difficult to do correctly by hand.

Now let's look at the key pthread functions. The first is pthread_create, which takes four arguments. The first argument is a pthread_t: this is basically going to store an identifier for the new thread that pthread_create creates, so we can refer to that thread later. The second is a pthread_attr_t, which sets some thread attributes; for our purposes we can just pass NULL and use the default attributes. The third argument is the function that will be executed after we create the thread, so we need to define the function we want the thread to run. Finally there's a void* arg argument, which stores the arguments that will be passed to that function. pthread_create also returns an error status, an integer specifying whether the thread creation was successful or not.
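For reference, the standard POSIX declarations of this call, and of pthread_join, which we'll use next, are:

```cpp
#include <pthread.h>

int pthread_create(pthread_t *thread,              // out: identifier of the new thread
                   const pthread_attr_t *attr,     // thread attributes (NULL = defaults)
                   void *(*start_routine)(void *), // function the new thread will run
                   void *arg);                     // argument handed to that function

int pthread_join(pthread_t thread,                 // thread to wait for
                 void **retval);                   // out: status/value returned by the thread
```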
Then there's another function, pthread_join. pthread_join basically says that we want to block at this point in our code until the specified thread finishes. It takes as argument a pthread_t, one of the thread identifiers created when we called pthread_create, and a second argument, status, which will store the status of the terminating thread; pthread_join also returns an error status. Essentially it says to wait until that thread finishes before we continue on in our program. Any questions so far?

OK, so here's what the implementation of Fibonacci looks like using pthreads. On the left we see the original fib function, which is just the sequential code, and then we have all this other stuff to enable it to run in parallel. First there's a struct, thread_args, which is used to store the arguments that are passed to the function the thread is going to execute. Then there's thread_func: it reads the input argument out of the thread_args struct into i, calls fib(i), stores that result into the struct's output field, and returns NULL. Over on the right-hand side we have the main function that actually calls fib. We initialize a whole bunch of variables that we need to execute these threads, and then we first check whether n is less than 30: if it is, it turns out it's not worth creating a thread to execute this program in parallel, because of the overhead of thread creation, so we just execute the program sequentially. This idea is known as coarsening; you saw a similar example a couple of lectures ago when we did coarsening for sorting, but this is in the context of parallel programming: because there are some overheads to running a function in parallel, if the input size is small enough you sometimes just want to execute it sequentially.

OK, let me walk through this code, since I have an animation. The next thing it does is marshal the input argument to the thread: it stores the input argument n-1 in the args struct. Then we call pthread_create with the thread variable, NULL for the thread attributes, the thread_func that we defined on the left, and the args structure, whose input field was set to n-1 on the previous line. pthread_create gives us a return value, and if the thread creation was successful we continue and execute fib(n-2), storing its result in our result variable. This is done at the same time that fib(n-1) is executing, because we created the thread and told it to call thread_func, so fib(n-1) and fib(n-2) are now executing in parallel. Then we have pthread_join, which says to wait until the thread we created finishes before we move on, because we need the results of both subcalls before we can finish this function. Once that's done, we check the status to see whether the join was successful, and if so we add the output field of the args struct, args.output, which stores the result of fib(n-1), to the result. So that's the pthreads code.
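Here is a self-contained sketch of that pthreads version, in the spirit of the slide; the variable names and the command-line handling are mine, and the slide may differ in details:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <pthread.h>

static int64_t fib(int64_t n) {           // the ordinary sequential fib
  if (n < 2) return n;
  return fib(n - 1) + fib(n - 2);
}

struct thread_args {
  int64_t input;                          // argument marshalled to the thread
  int64_t output;                         // result marshalled back
};

static void *thread_func(void *ptr) {     // what the created thread runs
  thread_args *args = static_cast<thread_args *>(ptr);
  args->output = fib(args->input);
  return nullptr;
}

int main(int argc, char *argv[]) {
  int64_t n = (argc > 1) ? atoll(argv[1]) : 40;
  int64_t result;
  if (n < 30) {                           // coarsening: a thread isn't worth it
    result = fib(n);
  } else {
    pthread_t thread;
    thread_args args;
    args.input = n - 1;                   // marshal the argument
    if (pthread_create(&thread, nullptr, thread_func, &args) != 0) return 1;
    result = fib(n - 2);                  // runs concurrently with the new thread
    if (pthread_join(thread, nullptr) != 0) return 1;
    result += args.output;                // combine fib(n-1) and fib(n-2)
  }
  printf("fib(%lld) = %lld\n", (long long)n, (long long)result);
  return 0;
}
```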
Any questions on how this works? Yes, why the void* cast? Because pthread_create takes a void* pointer: it's a generic function, so it doesn't know what the data type is; it has to work for all data types, and that's why we cast to void* when we pass the arguments to pthread_create, and then inside thread_func, where we do know what type of pointer it is, we cast it back.

So, does this code seem very parallel? How many parallel calls am I doing here? Right: I'm only creating one thread, so I'm only executing two things in parallel. If I ran this code on four processors, what's the maximum speedup I could get? The maximum speedup is just two, because I'm only running two things in parallel. This code doesn't recursively create threads; it only creates one thread at the top level, and if you wanted to make it recursively create threads, it would become much more complicated. That's one of the disadvantages of implementing this code in pthreads, and we'll look at other solutions that make this task much easier.

Some of the issues with pthreads are shown on this slide. First, there's a high overhead to creating a thread: creating a thread typically takes more than 10^4 cycles, and this leads to very coarse-grained concurrency, because your tasks have to do a lot of work in order to amortize the cost of creating the thread. There are things called thread pools that can help: the idea is to create a whole bunch of threads at the same time, to amortize the cost of thread creation, and then when you need a thread you just take one from the pool, which contains threads that are just waiting to do work. Second, there's a scalability issue with the code on the previous slide: the Fibonacci code gets at most about a 1.5x speedup on two cores. Why 1.5? Because the two calls we execute in parallel don't do the same amount of work: one computes fib(n-1) and one computes fib(n-2). Does anyone know the ratio between those two values? Yes, it's the golden ratio, about 1.6. So if you can get a speedup of 1.6, that's great, but there are some overheads, so this code gets about a 1.5x speedup; and if you want it to take advantage of more cores, you need to rewrite the code and it becomes more complicated. Third, there's the issue of modularity: if you look at this code, the Fibonacci logic is not nicely encapsulated within one function. We have that logic in the function on the left, but we also have some of the fib logic on the right, in the main function. That makes the code not modular, and if we want to build programs on top of it, it becomes very hard to maintain: if we want to change the Fibonacci logic a little bit, we now have to change it in multiple places instead of having everything in one place. It's not a good idea to write code that's not modular, so please don't do that in your projects. And finally, the code becomes complicated because you have to actually move these arguments around, which is known as argument marshalling, and you have to engage in error-prone protocols in order to do load balancing.
If you recall, we had to place the argument n-1 into args.input, and we had to extract the value out of args.output, and that makes the code very messy. So why do I say "shades of 1958" here? What happened in 1958? Who was around in 1958? Just Charles. There was a first something in 1958. What was it? It turns out that in 1958 we had the first compiler, and this was the Fortran compiler. Before we had the Fortran compiler, programmers wrote things in assembly, and when you write assembly you have to do argument marshalling yourself, because you have to place things into the appropriate registers before calling a function and also move things around when you return from a function. The nice thing about the first compiler was that it did all of this argument marshalling for you: you can just pass arguments to a function, and the compiler generates code that does the marshalling for us. Having to do this in pthreads is similar to having to write code in assembly, because you have to manually marshal these arguments. So hopefully there are better ways to do this, and indeed we'll look at some other solutions that make it easier on the programmer. Any questions before I continue?

All right, we looked at pthreads; next let's look at Threading Building Blocks. TBB is a library solution, developed by Intel, implemented as a C++ library that runs on top of native threads. The underlying implementation uses threads, but the programmer doesn't deal with threads; instead, the programmer specifies tasks, and these tasks are automatically load-balanced across the threads using a work-stealing algorithm inspired by research at MIT, Charles Leiserson's research. The focus of Intel TBB is on performance, and as we'll see, code written using TBB is simpler than what you would have to write using pthreads.

Let's look at how we can implement Fibonacci using TBB. In TBB we have to create tasks, so in the Fibonacci code we create a FibTask class, and inside the task we define an execute function: the function that performs the computation when we start the task, and this is where we define the Fibonacci logic. The task takes as input an argument n and a pointer sum, where n is the input and sum is the output. In TBB we can easily create a recursive program that extracts more parallelism. Here we recursively create two child tasks, a and b (that's the syntax for creating tasks), and we can just pass the two arguments to the FibTask constructors instead of marshalling the arguments ourselves. Then we have set_ref_count, which is basically the number of tasks we have to wait for, plus one for ourselves: in this case we create two child tasks plus ourselves, so that's 2 + 1. After that we start task b using the spawn(b) call, and then we do spawn_and_wait_for_all with a as the argument, which says to start task a and then wait for both a and b to finish before we proceed; this call looks at the ref count we set above and waits for that many tasks to finish before it continues. After both a and b have completed, we just sum up the results and store that in the sum variable. These tasks are created recursively, so unlike the pthreads implementation, which only created one thread at the top level, here we're actually recursively creating more and more tasks, so we can extract more parallelism from this code and scale to more processors. We also need a main function just to start up the program: we create a root task that just computes fib(n), and we call spawn_root_and_wait(a), where a is the task for the root, and it will just run the root task. So that's what Fibonacci looks like in TBB: this is much simpler than the pthreads implementation, and it also gets better performance, because we can extract more parallelism from the computation. Any questions?
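Here is a sketch in the spirit of that slide, written against TBB's classic low-level tbb::task interface, which is the interface the lecture describes (it has since been deprecated in newer TBB releases, and the slide's exact code may differ):

```cpp
#include <cstdint>
#include <tbb/task.h>   // classic low-level task API

class FibTask : public tbb::task {
public:
  const int64_t n;        // input
  int64_t* const sum;     // where the result is written
  FibTask(int64_t n_, int64_t* sum_) : n(n_), sum(sum_) {}

  tbb::task* execute() override {        // the computation performed by the task
    if (n < 2) {
      *sum = n;
    } else {
      int64_t x, y;
      FibTask& a = *new (allocate_child()) FibTask(n - 1, &x);
      FibTask& b = *new (allocate_child()) FibTask(n - 2, &y);
      set_ref_count(3);                  // two children + one for the wait
      spawn(b);                          // start b
      spawn_and_wait_for_all(a);         // start a and wait for both children
      *sum = x + y;
    }
    return nullptr;
  }
};

int64_t parallel_fib(int64_t n) {
  int64_t sum;
  FibTask& root = *new (tbb::task::allocate_root()) FibTask(n, &sum);
  tbb::task::spawn_root_and_wait(root);  // run the root task to completion
  return sum;
}
```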
TBB also has many other features in addition to tasks. It provides many C++ templates to express common patterns, and you can use these templates with different data types: parallel_for, which is used to express loop parallelism, so you can loop over a bunch of iterations in parallel; parallel_reduce for data aggregation, for example if you want to sum together a whole bunch of values you can use parallel_reduce to do that in parallel; and pipeline and filter, which are used for software pipelining. TBB provides many concurrent container classes, such as hash tables, trees, and priority queues, which allow multiple threads to safely access and update the items in the container concurrently; you can use these out of the box, and they support concurrent updates and reads. TBB also has a variety of mutual-exclusion library functions, such as locks and atomic operations. So there are a lot of features in TBB, which is why it's one of the more popular concurrency platforms: because of all of these features, you don't have to implement many of these things yourself, and you still get pretty good performance.

TBB was a library solution to the concurrency problem; now we're going to look at two linguistic solutions, OpenMP and Cilk. Let's start with OpenMP. OpenMP is a specification by an industry consortium, and there are several compilers available that support OpenMP, both open source and proprietary; nowadays GCC, ICC, and Clang all support OpenMP, as does Visual Studio. OpenMP provides linguistic extensions to C and C++, as well as Fortran, in the form of compiler pragmas: you use these pragmas in your code to specify which pieces of code should run in parallel. OpenMP also runs on top of native threads, but the programmer isn't exposed to these threads. It supports loop parallelism (parallel for loops), task parallelism, and pipeline parallelism.

Let's look at how we can implement Fibonacci in OpenMP. This is the entire code, and I want you to compare it to the pthreads implementation that we saw ten minutes ago. This code is much cleaner than the pthreads implementation, and it also performs better. Let's see how it works. We have these compiler pragmas, or compiler directives, and the directive for creating a parallel task is "omp task", so we create an OpenMP task for fib(n-1) as well as one for fib(n-2). There's also the shared clause, which specifies that the two result variables are shared across the different threads; you have to specify whether variables are private or shared. Then the "pragma omp taskwait" just says to wait for the preceding tasks to complete before we continue, so here it waits for fib(n-1) and fib(n-2) to finish before we return the result, which is what we want, and after that we just return x + y. So that's the entire code.
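A sketch of that OpenMP version follows. The fib routine is the standard task-based formulation the lecture describes; the surrounding parallel/single region is the usual way such a task program is started, and is my addition rather than something discussed on the slide:

```cpp
#include <cstdint>
#include <cstdio>

int64_t fib(int64_t n) {
  if (n < 2) return n;
  int64_t x, y;
  #pragma omp task shared(x)      // fib(n-1) may run as a separate task
  x = fib(n - 1);
  #pragma omp task shared(y)      // fib(n-2) may run as a separate task
  y = fib(n - 2);
  #pragma omp taskwait            // wait for both tasks before using x and y
  return x + y;
}

int main() {
  int64_t result;
  #pragma omp parallel            // create the team of threads...
  #pragma omp single              // ...and let one thread start the computation
  result = fib(40);
  printf("%lld\n", (long long)result);
  return 0;
}
```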
OpenMP also provides many other pragma directives in addition to tasks. You can use "parallel for" to do loop parallelism; there's reduction; and there are directives for scheduling and data sharing, so you can specify how you want a particular loop to be scheduled. OpenMP has many different scheduling policies, such as static and dynamic scheduling, and the scheduling directives also have different grain sizes; the data-sharing directives specify whether variables are private or shared. OpenMP also supplies a variety of synchronization constructs, such as barriers, atomic updates, and mutual-exclusion (mutex) locks. So OpenMP also has many features, and it's also one of the more popular solutions for writing parallel programs, and as you saw in the previous example, the code is much simpler than what you would write using pthreads, or even TBB.

Any questions? Yes: this code is actually independent of the number of processors. There's a scheduling algorithm that determines how the tasks get mapped to different processors, so if you spawn a new task it doesn't necessarily go on a different processor, and you can have more tasks than the number of processors available. The scheduling algorithm takes care of how these tasks get mapped to processors, and that's hidden from the programmer, although you can use the scheduling pragmas to give the compiler hints about how it should schedule things. Yes: underneath, this is implemented using pthreads, which has to make operating-system calls to basically talk directly to the processor cores and do the multiplexing and so forth, so the operating system is involved at a very low level.
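As a small example of the "parallel for" and reduction directives mentioned a moment ago, a loop with an explicit schedule and a sum reduction might look like this (the dot-product routine is just my illustration, not from the slides):

```cpp
// Parallel loop with a static schedule and a '+' reduction on dot.
double dot_product(const double *a, const double *b, int n) {
  double dot = 0.0;
  #pragma omp parallel for schedule(static) reduction(+ : dot)
  for (int i = 0; i < n; ++i)
    dot += a[i] * b[i];
  return dot;
}
```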
OK, so the last concurrency platform that we'll be looking at today is Cilk, and we're actually going to look at the Cilk part of Cilk Plus. Cilk Plus is a small set of linguistic extensions to C and C++ that support fork-join parallelism; for example, the Fibonacci example uses fork-join parallelism, so you can use Cilk to implement it. The "Plus" part of Cilk Plus supports vector parallelism, which you had experience working with in your homeworks. Cilk Plus was initially developed by Cilk Arts, which was an MIT spin-off, and Cilk Arts was acquired by Intel in July 2009. The Cilk Plus implementation is based on the award-winning Cilk multithreaded language developed two decades ago here at MIT by Charles Leiserson's research group, and it features a provably efficient work-stealing scheduler: you can actually prove theoretical bounds on this scheduler, and that allows you to implement theoretically efficient algorithms, which we'll talk more about in another lecture on algorithm design. Charles Leiserson has a very famous paper with a proof that this scheduler is optimal, so if you're interested in reading about it, you can talk to us offline. Cilk Plus also provides a hyperobject library for parallelizing code with global variables, and you'll have a chance to play around with hyperobjects in homework 4. The Cilk Plus ecosystem also includes useful programming tools, such as the Cilkscreen race detector, which detects determinacy races in your program to help you isolate bugs and performance bottlenecks, and a scalability analyzer called Cilkview, which analyzes the amount of work your program is doing as well as the maximum amount of parallelism your code could possibly extract from the hardware.

That's Intel Cilk Plus, but it turns out we're not actually going to use Intel Cilk Plus in this class. We're going to use a better compiler, based on Tapir/LLVM, which supports the Cilk subset of Cilk Plus. Tapir/LLVM was recently developed at MIT by T.B. Schardl, who gave a lecture last week, William Moses, a grad student working with Charles, and Charles Leiserson; so I'm talking a lot about Charles's work today. Tapir/LLVM generally produces better code, relative to its base compiler, than all other implementations of Cilk out there, so it's the best Cilk compiler available today. They wrote a very nice paper on this last year, and that paper received the best paper award at the annual Symposium on Principles and Practice of Parallel Programming, or PPoPP, so you should look at that paper as well. Right now Tapir/LLVM uses the Intel Cilk Plus runtime system, but I believe Charles's group has plans to implement a better runtime system. Tapir/LLVM also supports more general features than existing Cilk compilers: in addition to spawning functions, you can spawn code blocks that are not separate functions, which makes writing programs more flexible; you don't have to create a separate function just to execute a code block in parallel. Any questions?

So this is the Cilk code for Fibonacci. It's also pretty simple; it looks very similar to the sequential program, except we have these cilk_spawn and cilk_sync statements in the code. What do they do? cilk_spawn says that the named child function, the function right after the cilk_spawn statement, may execute in parallel with the parent caller, the function that is calling cilk_spawn. So this says that fib(n-1) may execute in parallel with the function that called it; that function then calls fib(n-2), and now fib(n-2) and fib(n-1) can be executing in parallel. cilk_sync says that control cannot pass this point until all of the spawned children have returned, so it's going to wait for fib(n-1) to return before we go to the return statement where we add up x and y. One important thing to note is that the Cilk keywords grant permission for parallel execution, but they don't actually force or command parallel execution. Even though I wrote cilk_spawn here, the runtime system doesn't necessarily have to run fib(n-1) in parallel with fib(n-2); I'm just saying that it could run these two things in parallel, and it's up to the runtime system to decide whether or not to do so, based on its scheduling policy.
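That Cilk Fibonacci is essentially the serial code with the two keywords added; a sketch:

```cpp
#include <cilk/cilk.h>
#include <cstdint>

int64_t fib(int64_t n) {
  if (n < 2) return n;
  int64_t x = cilk_spawn fib(n - 1);  // may run in parallel with the caller
  int64_t y = fib(n - 2);             // runs in the meantime
  cilk_sync;                          // wait for the spawned call before using x
  return x + y;
}
```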
Let's look at another example of Cilk: loop parallelism. Here we want to do a matrix transpose, and we want to do it in place. The idea is that we want to swap the elements below the diagonal with their mirror images above the diagonal, and here's some code to do this. We have a cilk_for, which is basically a parallel for loop, that goes from i = 1 to n - 1; the inner for loop goes from j = 0 up to i - 1; and then we just swap A[i][j] with A[j][i] using the three statements inside the body of the loop. So to execute a for loop in parallel, you just have to add "cilk_" to the "for" keyword, and it's as simple as that: this code will run in parallel and get a pretty good speedup for this particular problem. Internally, cilk_for loops are transformed into nested cilk_spawn and cilk_sync calls: the compiler gets rid of the cilk_for, recursively divides the iteration space in half, spawns off one half and executes the other half in parallel with it, and recurses until the iteration range becomes small enough that it doesn't make sense to execute it in parallel anymore, at which point that range is just executed sequentially. So that's loop parallelism in Cilk.

Any questions? Yes, what if the loop increment is something unusual, can I still do that? Yes: the compiler can figure out what the iteration space is, so you don't necessarily have to increment by one; you can do something else. You just have to guarantee that all of the iterations are independent; if you have a determinacy race across the different iterations of your cilk_for loop, the result might not be correct, so you have to make sure the iterations are indeed independent. Yes, can you nest cilk_for loops? You can. It turns out that for this example there's usually already enough parallelism in the outer loop for large enough values of n, so it doesn't make sense to put a cilk_for loop inside, because using a cilk_for loop adds some additional overhead; but you can do nested cilk_for loops, and in some cases it does make sense, especially if there isn't enough parallelism in the outermost loop. Good question. Yes, what code does this generate? It has a bunch of calls into the Cilk runtime system. I don't know all the details, because I haven't looked at this recently, but I think you can generate the assembly code using a flag to the Clang compiler, so that's a good exercise. You probably want to look at the LLVM IR rather than the assembly to begin with, to understand what's going on: it has three instructions that are not in standard LLVM, which were added to support parallelism, and when it's lowered into assembly, each of those instructions becomes a bunch of assembly-language instructions, so you don't want to mess with looking at it in the assembler until you see what it looks like in LLVM first. Any other questions about this code?
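A sketch of the in-place transpose just described; here the matrix is stored row-major in a flat array, which is my choice, whereas the slide indexes a 2-D array directly:

```cpp
#include <cilk/cilk.h>

// In-place transpose of an n x n row-major matrix: each outer iteration
// touches a disjoint set of (i, j) / (j, i) pairs below the diagonal, so the
// iterations are independent and the outer loop can safely be a cilk_for.
void transpose(double *A, int n) {
  cilk_for (int i = 1; i < n; ++i) {
    for (int j = 0; j < i; ++j) {
      double temp  = A[i * n + j];
      A[i * n + j] = A[j * n + i];
      A[j * n + i] = temp;
    }
  }
}
```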
OK, let's look at another example. Say we had this for loop where on each iteration i we just increment a variable sum by i, so this is essentially computing the summation of everything from i = 0 up to n - 1 and then printing out the result. One straightforward way to try to parallelize this code is to just change the "for" to a "cilk_for". Does this code work? Who thinks this code doesn't compute the correct result? About half of you. And who thinks it does work? A couple of people, and I guess the rest don't care. It turns out it's not necessarily going to give you the right answer, because the cilk_for loop says you can execute these iterations in parallel, but they're all updating the same shared variable sum. So you have what's called a determinacy race, where multiple processors can be writing to the same memory location. We'll talk much more about determinacy races in the next lecture, but for this example it's not necessarily going to work if you run it on more than one processor.

Cilk actually has a nice way to deal with this. In Cilk we have something known as a reducer, which is one example of a hyperobject, which I mentioned earlier. With a reducer, instead of declaring the sum variable as just an unsigned long, you use a macro called CILK_C_REDUCER_OPADD, which specifies that we want to create a reducer with the addition function, with the variable name sum, the data type unsigned long, and the initial value 0. Then we have a macro to register this reducer, CILK_C_REGISTER_REDUCER. Now inside the cilk_for loop we increment REDUCER_VIEW(sum), which is another macro, by i, and you can actually execute this in parallel and it will give you the same answer that you would get if you ran it sequentially: the reducer takes care of the determinacy race for you, and at the end, when you print out the result, you'll see that the sum is what you expect. When you're done using the reducer, you use another macro, CILK_C_UNREGISTER_REDUCER(sum), to tell the system that you're finished with it. So that's one way to deal with this problem when you want to do a reduction.

It turns out that there are many other interesting reduction operators that you might want to use, and in general you can create reducers for monoids, which are algebraic structures that have an associative binary operation as well as an identity element. The addition operator is a monoid because it's binary, it's associative, and its identity element is 0. Cilk also has several other predefined reducers, including multiplication, min, max, AND, OR, XOR, and so on; these are all also monoids; and you can also define your own reducers. In fact, in the next homework you'll have the opportunity to play around with reducers and write a reducer for lists. So that's reducers.
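Putting the racy loop and its reducer fix side by side, roughly as described above; this is a sketch using the Cilk Plus C reducer macros named in the lecture, and the exact usage details may differ slightly:

```cpp
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>
#include <cstdio>

void sum_to_n(unsigned long n) {
  // Racy version (don't do this):
  //   unsigned long sum = 0;
  //   cilk_for (unsigned long i = 0; i < n; ++i) sum += i;   // determinacy race
  //
  // Reducer version: each strand gets its own view of `sum`, and the views are
  // combined with the associative '+' operation, so the result matches the
  // serial execution.
  CILK_C_REDUCER_OPADD(sum, ulong, 0);
  CILK_C_REGISTER_REDUCER(sum);
  cilk_for (unsigned long i = 0; i < n; ++i) {
    REDUCER_VIEW(sum) += i;
  }
  printf("sum = %lu\n", REDUCER_VIEW(sum));
  CILK_C_UNREGISTER_REDUCER(sum);
}
```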
Another nice thing about Cilk is that there's always a valid serial interpretation of the program. The serial elision of a Cilk program is always a legal interpretation, and for the Cilk source code on the left, the serial elision is basically the code you get if you delete the cilk_spawn and cilk_sync statements; it looks just like the sequential code. Remember that the Cilk keywords grant permission for parallel execution, but they don't necessarily command it: if you ran this Cilk code on a single core, it wouldn't actually create these parallel tasks, and you would get the same answer as the sequential program; the serial elision is a correct interpretation. This is unlike other solutions such as TBB and pthreads, where it's actually difficult to get a program that does what the sequential program does, because they're doing a lot of additional work to set up the parallel calls, create argument structures, and other scheduling constructs. In Cilk it's very easy to get the serial elision: you just define cilk_spawn and cilk_sync to be empty and define cilk_for to be for, and that gives you a valid sequential program. So when you're debugging code, you might first want to check whether the serial elision of your Cilk program is correct, and you can easily do that using these macros; or, actually, there's a compiler flag that will do it for you and give you the equivalent C program. This is a nice way to debug, because you don't have to start with the parallel program: you can first check that the serial program is correct before you go on to debug the parallel program. Questions? Yes: by default, cilk_for groups a bunch of iterations together into a single task, because it doesn't make sense to break the work into such small chunks that the overhead outweighs the parallelism; but there's a setting you can use to change the grain size of the for loop, so you could actually make each iteration its own task, and then, as you said, the scheduler will decide how to map the different tasks onto different processors, or even whether it wants to execute any of these tasks in parallel at all. Good question.

OK, so the idea in Cilk is to allow the programmer to express the logical parallelism in an application: the programmer just has to identify which pieces of the code could be executed in parallel, but doesn't necessarily have to determine which pieces should be executed in parallel. Cilk has a runtime scheduler that automatically maps the executing program onto the available processor cores at run time, and it does this dynamically using a work-stealing scheduling algorithm; the work-stealing scheduler is used to balance the tasks evenly across the different processors, and we'll talk more about it in a future lecture. But I want to emphasize that, unlike the other concurrency platforms we looked at today, Cilk's work-stealing scheduling algorithm is theoretically efficient, whereas the OpenMP and TBB schedulers are not. This is a nice property, because it guarantees that the algorithms you write on top of Cilk will also be theoretically efficient.

OK, so here's a high-level illustration of the Cilk ecosystem. It's a very simplified view, but I did this to fit it on a single slide. What you do is take the Cilk source code and pass it to your favorite Cilk compiler, the Tapir/LLVM compiler, and this gives you a binary that you can run on multiple processors. You pass a program input to the binary, run it on however many processors you have, and this allows you to benchmark the parallel performance of your program. You can also do serial testing: to do this, you obtain the serial elision of the Cilk program and pass it to an ordinary C or C++ compiler, which generates a binary that can only run on a single processor, and you run your suite of serial regression tests on that single-threaded binary. This allows you to benchmark the performance of your serial code and also to debug any issues that might have arisen when running the program sequentially. Another way to do this is to compile the original Cilk code but run it on a single processor: there's a command-line parameter that tells the runtime system how many processors you want to use, and if you set that parameter to 1, it will only use a single processor, which allows you to benchmark the single-threaded performance of your code as well. The parallel program executing on a single core should behave exactly the same way as the execution of its serial elision, and that's one of the advantages of using Cilk.
00:21:08,659 --> 00:21:11,480 431 00:21:11,480 --> 00:21:13,520 432 00:21:13,520 --> 00:21:19,430 433 00:21:19,430 --> 00:21:21,320 434 00:21:21,320 --> 00:21:24,480 435 00:21:24,480 --> 00:21:30,180 436 00:21:30,180 --> 00:21:32,710 437 00:21:32,710 --> 00:21:40,330 438 00:21:40,330 --> 00:21:44,580 439 00:21:44,580 --> 00:21:49,570 440 00:21:49,570 --> 00:21:51,760 441 00:21:51,760 --> 00:21:53,920 442 00:21:53,920 --> 00:21:57,040 443 00:21:57,040 --> 00:21:59,710 444 00:21:59,710 --> 00:22:01,840 445 00:22:01,840 --> 00:22:04,030 446 00:22:04,030 --> 00:22:11,290 447 00:22:11,290 --> 00:22:13,630 448 00:22:13,630 --> 00:22:16,600 449 00:22:16,600 --> 00:22:19,560 450 00:22:19,560 --> 00:22:22,120 451 00:22:22,120 --> 00:22:24,580 452 00:22:24,580 --> 00:22:26,890 453 00:22:26,890 --> 00:22:29,140 454 00:22:29,140 --> 00:22:32,410 455 00:22:32,410 --> 00:22:34,000 456 00:22:34,000 --> 00:22:38,230 457 00:22:38,230 --> 00:22:40,150 458 00:22:40,150 --> 00:22:42,640 459 00:22:42,640 --> 00:22:44,590 460 00:22:44,590 --> 00:22:46,540 461 00:22:46,540 --> 00:22:49,470 462 00:22:49,470 --> 00:22:53,400 463 00:22:53,400 --> 00:22:56,190 464 00:22:56,190 --> 00:22:59,290 465 00:22:59,290 --> 00:23:01,210 466 00:23:01,210 --> 00:23:03,700 467 00:23:03,700 --> 00:23:05,440 468 00:23:05,440 --> 00:23:08,290 469 00:23:08,290 --> 00:23:14,910 470 00:23:14,910 --> 00:23:18,010 471 00:23:18,010 --> 00:23:20,950 472 00:23:20,950 --> 00:23:23,800 473 00:23:23,800 --> 00:23:27,400 474 00:23:27,400 --> 00:23:29,560 475 00:23:29,560 --> 00:23:34,870 476 00:23:34,870 --> 00:23:34,880 477 00:23:34,880 --> 00:23:36,100 478 00:23:36,100 --> 00:23:39,279 479 00:23:39,279 --> 00:23:41,919 480 00:23:41,919 --> 00:23:44,649 481 00:23:44,649 --> 00:23:46,450 482 00:23:46,450 --> 00:23:48,399 483 00:23:48,399 --> 00:23:52,539 484 00:23:52,539 --> 00:23:55,570 485 00:23:55,570 --> 00:23:57,580 486 00:23:57,580 --> 00:24:03,330 487 00:24:03,330 --> 00:24:06,820 488 00:24:06,820 --> 00:24:10,870 489 00:24:10,870 --> 00:24:12,000 490 00:24:12,000 --> 00:24:14,740 491 00:24:14,740 --> 00:24:17,159 492 00:24:17,159 --> 00:24:21,009 493 00:24:21,009 --> 00:24:23,350 494 00:24:23,350 --> 00:24:26,860 495 00:24:26,860 --> 00:24:29,110 496 00:24:29,110 --> 00:24:31,419 497 00:24:31,419 --> 00:24:33,159 498 00:24:33,159 --> 00:24:36,159 499 00:24:36,159 --> 00:24:38,740 500 00:24:38,740 --> 00:24:41,529 501 00:24:41,529 --> 00:24:48,100 502 00:24:48,100 --> 00:24:51,250 503 00:24:51,250 --> 00:24:57,159 504 00:24:57,159 --> 00:25:00,399 505 00:25:00,399 --> 00:25:03,100 506 00:25:03,100 --> 00:25:06,340 507 00:25:06,340 --> 00:25:10,149 508 00:25:10,149 --> 00:25:13,299 509 00:25:13,299 --> 00:25:15,220 510 00:25:15,220 --> 00:25:18,759 511 00:25:18,759 --> 00:25:20,500 512 00:25:20,500 --> 00:25:23,049 513 00:25:23,049 --> 00:25:24,639 514 00:25:24,639 --> 00:25:26,830 515 00:25:26,830 --> 00:25:29,230 516 00:25:29,230 --> 00:25:32,169 517 00:25:32,169 --> 00:25:35,500 518 00:25:35,500 --> 00:25:37,930 519 00:25:37,930 --> 00:25:41,019 520 00:25:41,019 --> 00:25:43,330 521 00:25:43,330 --> 00:25:45,580 522 00:25:45,580 --> 00:25:47,820 523 00:25:47,820 --> 00:25:49,830 524 00:25:49,830 --> 00:25:54,090 525 00:25:54,090 --> 00:25:57,390 526 00:25:57,390 --> 00:25:59,340 527 00:25:59,340 --> 00:26:03,060 528 00:26:03,060 --> 00:26:04,890 529 00:26:04,890 --> 00:26:06,659 530 00:26:06,659 --> 00:26:09,450 531 00:26:09,450 --> 00:26:12,000 532 00:26:12,000 --> 00:26:13,380 533 00:26:13,380 --> 00:26:16,019 534 00:26:16,019 --> 
00:26:18,870 535 00:26:18,870 --> 00:26:20,460 536 00:26:20,460 --> 00:26:22,320 537 00:26:22,320 --> 00:26:26,039 538 00:26:26,039 --> 00:26:28,680 539 00:26:28,680 --> 00:26:30,930 540 00:26:30,930 --> 00:26:34,200 541 00:26:34,200 --> 00:26:36,149 542 00:26:36,149 --> 00:26:39,440 543 00:26:39,440 --> 00:26:41,159 544 00:26:41,159 --> 00:26:42,720 545 00:26:42,720 --> 00:26:45,870 546 00:26:45,870 --> 00:26:50,970 547 00:26:50,970 --> 00:26:54,120 548 00:26:54,120 --> 00:26:57,090 549 00:26:57,090 --> 00:27:00,690 550 00:27:00,690 --> 00:27:04,560 551 00:27:04,560 --> 00:27:08,220 552 00:27:08,220 --> 00:27:10,260 553 00:27:10,260 --> 00:27:13,769 554 00:27:13,769 --> 00:27:15,289 555 00:27:15,289 --> 00:27:21,630 556 00:27:21,630 --> 00:27:24,180 557 00:27:24,180 --> 00:27:25,799 558 00:27:25,799 --> 00:27:29,760 559 00:27:29,760 --> 00:27:32,820 560 00:27:32,820 --> 00:27:35,100 561 00:27:35,100 --> 00:27:37,350 562 00:27:37,350 --> 00:27:39,180 563 00:27:39,180 --> 00:27:43,470 564 00:27:43,470 --> 00:27:46,320 565 00:27:46,320 --> 00:27:48,389 566 00:27:48,389 --> 00:27:50,070 567 00:27:50,070 --> 00:27:54,480 568 00:27:54,480 --> 00:27:56,480 569 00:27:56,480 --> 00:27:58,740 570 00:27:58,740 --> 00:28:01,169 571 00:28:01,169 --> 00:28:03,990 572 00:28:03,990 --> 00:28:07,200 573 00:28:07,200 --> 00:28:09,870 574 00:28:09,870 --> 00:28:13,320 575 00:28:13,320 --> 00:28:17,310 576 00:28:17,310 --> 00:28:19,950 577 00:28:19,950 --> 00:28:22,500 578 00:28:22,500 --> 00:28:24,270 579 00:28:24,270 --> 00:28:26,040 580 00:28:26,040 --> 00:28:30,270 581 00:28:30,270 --> 00:28:32,760 582 00:28:32,760 --> 00:28:36,360 583 00:28:36,360 --> 00:28:38,880 584 00:28:38,880 --> 00:28:40,740 585 00:28:40,740 --> 00:28:43,680 586 00:28:43,680 --> 00:28:46,530 587 00:28:46,530 --> 00:28:59,810 588 00:28:59,810 --> 00:29:03,090 589 00:29:03,090 --> 00:29:07,470 590 00:29:07,470 --> 00:29:10,020 591 00:29:10,020 --> 00:29:13,440 592 00:29:13,440 --> 00:29:17,100 593 00:29:17,100 --> 00:29:19,650 594 00:29:19,650 --> 00:29:22,950 595 00:29:22,950 --> 00:29:26,970 596 00:29:26,970 --> 00:29:29,790 597 00:29:29,790 --> 00:29:31,500 598 00:29:31,500 --> 00:29:33,090 599 00:29:33,090 --> 00:29:36,540 600 00:29:36,540 --> 00:29:42,030 601 00:29:42,030 --> 00:29:44,220 602 00:29:44,220 --> 00:29:47,580 603 00:29:47,580 --> 00:29:50,460 604 00:29:50,460 --> 00:29:51,960 605 00:29:51,960 --> 00:29:57,690 606 00:29:57,690 --> 00:30:03,120 607 00:30:03,120 --> 00:30:04,920 608 00:30:04,920 --> 00:30:07,070 609 00:30:07,070 --> 00:30:12,840 610 00:30:12,840 --> 00:30:14,850 611 00:30:14,850 --> 00:30:16,940 612 00:30:16,940 --> 00:30:20,720 613 00:30:20,720 --> 00:30:23,660 614 00:30:23,660 --> 00:30:25,250 615 00:30:25,250 --> 00:30:26,330 616 00:30:26,330 --> 00:30:28,430 617 00:30:28,430 --> 00:30:30,500 618 00:30:30,500 --> 00:30:32,690 619 00:30:32,690 --> 00:30:34,220 620 00:30:34,220 --> 00:30:38,510 621 00:30:38,510 --> 00:30:41,150 622 00:30:41,150 --> 00:30:43,130 623 00:30:43,130 --> 00:30:45,710 624 00:30:45,710 --> 00:30:48,800 625 00:30:48,800 --> 00:30:50,750 626 00:30:50,750 --> 00:30:53,900 627 00:30:53,900 --> 00:30:55,910 628 00:30:55,910 --> 00:31:00,560 629 00:31:00,560 --> 00:31:03,590 630 00:31:03,590 --> 00:31:06,050 631 00:31:06,050 --> 00:31:11,420 632 00:31:11,420 --> 00:31:13,640 633 00:31:13,640 --> 00:31:15,080 634 00:31:15,080 --> 00:31:16,610 635 00:31:16,610 --> 00:31:23,090 636 00:31:23,090 --> 00:31:26,300 637 00:31:26,300 --> 00:31:29,450 638 00:31:29,450 --> 00:31:31,700 639 
00:31:31,700 --> 00:31:33,950 640 00:31:33,950 --> 00:31:36,740 641 00:31:36,740 --> 00:31:40,430 642 00:31:40,430 --> 00:31:43,280 643 00:31:43,280 --> 00:31:44,780 644 00:31:44,780 --> 00:31:53,150 645 00:31:53,150 --> 00:31:56,570 646 00:31:56,570 --> 00:32:03,470 647 00:32:03,470 --> 00:32:06,350 648 00:32:06,350 --> 00:32:09,560 649 00:32:09,560 --> 00:32:11,750 650 00:32:11,750 --> 00:32:15,020 651 00:32:15,020 --> 00:32:16,940 652 00:32:16,940 --> 00:32:18,710 653 00:32:18,710 --> 00:32:21,620 654 00:32:21,620 --> 00:32:25,130 655 00:32:25,130 --> 00:32:27,920 656 00:32:27,920 --> 00:32:30,620 657 00:32:30,620 --> 00:32:33,140 658 00:32:33,140 --> 00:32:36,409 659 00:32:36,409 --> 00:32:39,830 660 00:32:39,830 --> 00:32:41,750 661 00:32:41,750 --> 00:32:43,730 662 00:32:43,730 --> 00:32:45,770 663 00:32:45,770 --> 00:32:48,680 664 00:32:48,680 --> 00:32:51,799 665 00:32:51,799 --> 00:32:54,470 666 00:32:54,470 --> 00:32:56,870 667 00:32:56,870 --> 00:32:59,779 668 00:32:59,779 --> 00:33:02,180 669 00:33:02,180 --> 00:33:05,029 670 00:33:05,029 --> 00:33:12,549 671 00:33:12,549 --> 00:33:23,409 672 00:33:23,409 --> 00:33:23,419 673 00:33:23,419 --> 00:33:32,519 674 00:33:32,519 --> 00:33:36,610 675 00:33:36,610 --> 00:33:40,210 676 00:33:40,210 --> 00:33:41,740 677 00:33:41,740 --> 00:33:43,330 678 00:33:43,330 --> 00:33:45,640 679 00:33:45,640 --> 00:33:47,500 680 00:33:47,500 --> 00:33:49,840 681 00:33:49,840 --> 00:33:52,330 682 00:33:52,330 --> 00:33:54,940 683 00:33:54,940 --> 00:34:03,130 684 00:34:03,130 --> 00:34:09,820 685 00:34:09,820 --> 00:34:16,200 686 00:34:16,200 --> 00:34:18,480 687 00:34:18,480 --> 00:34:21,450 688 00:34:21,450 --> 00:34:25,440 689 00:34:25,440 --> 00:34:29,180 690 00:34:29,180 --> 00:34:31,649 691 00:34:31,649 --> 00:34:33,930 692 00:34:33,930 --> 00:34:35,609 693 00:34:35,609 --> 00:34:39,030 694 00:34:39,030 --> 00:34:41,159 695 00:34:41,159 --> 00:34:43,919 696 00:34:43,919 --> 00:34:46,109 697 00:34:46,109 --> 00:34:49,349 698 00:34:49,349 --> 00:34:50,639 699 00:34:50,639 --> 00:34:53,430 700 00:34:53,430 --> 00:34:57,329 701 00:34:57,329 --> 00:34:59,220 702 00:34:59,220 --> 00:35:01,200 703 00:35:01,200 --> 00:35:06,450 704 00:35:06,450 --> 00:35:08,540 705 00:35:08,540 --> 00:35:11,550 706 00:35:11,550 --> 00:35:13,410 707 00:35:13,410 --> 00:35:19,109 708 00:35:19,109 --> 00:35:21,270 709 00:35:21,270 --> 00:35:24,089 710 00:35:24,089 --> 00:35:28,410 711 00:35:28,410 --> 00:35:30,810 712 00:35:30,810 --> 00:35:32,670 713 00:35:32,670 --> 00:35:33,990 714 00:35:33,990 --> 00:35:35,880 715 00:35:35,880 --> 00:35:38,880 716 00:35:38,880 --> 00:35:41,160 717 00:35:41,160 --> 00:35:42,660 718 00:35:42,660 --> 00:35:44,820 719 00:35:44,820 --> 00:35:49,290 720 00:35:49,290 --> 00:35:51,180 721 00:35:51,180 --> 00:35:54,390 722 00:35:54,390 --> 00:35:57,630 723 00:35:57,630 --> 00:36:00,510 724 00:36:00,510 --> 00:36:07,410 725 00:36:07,410 --> 00:36:10,620 726 00:36:10,620 --> 00:36:12,450 727 00:36:12,450 --> 00:36:14,549 728 00:36:14,549 --> 00:36:16,559 729 00:36:16,559 --> 00:36:19,109 730 00:36:19,109 --> 00:36:21,839 731 00:36:21,839 --> 00:36:28,109 732 00:36:28,109 --> 00:36:31,559 733 00:36:31,559 --> 00:36:33,299 734 00:36:33,299 --> 00:36:35,549 735 00:36:35,549 --> 00:36:39,089 736 00:36:39,089 --> 00:36:43,440 737 00:36:43,440 --> 00:36:45,390 738 00:36:45,390 --> 00:36:46,920 739 00:36:46,920 --> 00:36:50,970 740 00:36:50,970 --> 00:36:54,029 741 00:36:54,029 --> 00:36:57,720 742 00:36:57,720 --> 00:37:00,059 743 00:37:00,059 --> 
00:37:02,250 744 00:37:02,250 --> 00:37:04,490 745 00:37:04,490 --> 00:37:07,170 746 00:37:07,170 --> 00:37:09,839 747 00:37:09,839 --> 00:37:13,079 748 00:37:13,079 --> 00:37:14,819 749 00:37:14,819 --> 00:37:17,099 750 00:37:17,099 --> 00:37:18,779 751 00:37:18,779 --> 00:37:22,319 752 00:37:22,319 --> 00:37:24,690 753 00:37:24,690 --> 00:37:25,950 754 00:37:25,950 --> 00:37:28,049 755 00:37:28,049 --> 00:37:31,440 756 00:37:31,440 --> 00:37:33,870 757 00:37:33,870 --> 00:37:40,529 758 00:37:40,529 --> 00:37:43,890 759 00:37:43,890 --> 00:37:45,980 760 00:37:45,980 --> 00:37:48,359 761 00:37:48,359 --> 00:37:50,460 762 00:37:50,460 --> 00:37:52,710 763 00:37:52,710 --> 00:37:56,400 764 00:37:56,400 --> 00:37:58,200 765 00:37:58,200 --> 00:38:02,069 766 00:38:02,069 --> 00:38:03,930 767 00:38:03,930 --> 00:38:05,970 768 00:38:05,970 --> 00:38:12,120 769 00:38:12,120 --> 00:38:16,920 770 00:38:16,920 --> 00:38:19,660 771 00:38:19,660 --> 00:38:28,470 772 00:38:28,470 --> 00:38:32,730 773 00:38:32,730 --> 00:38:45,340 774 00:38:45,340 --> 00:38:47,830 775 00:38:47,830 --> 00:38:51,730 776 00:38:51,730 --> 00:38:53,320 777 00:38:53,320 --> 00:38:55,600 778 00:38:55,600 --> 00:38:57,460 779 00:38:57,460 --> 00:38:59,380 780 00:38:59,380 --> 00:39:00,700 781 00:39:00,700 --> 00:39:03,220 782 00:39:03,220 --> 00:39:04,840 783 00:39:04,840 --> 00:39:07,150 784 00:39:07,150 --> 00:39:09,580 785 00:39:09,580 --> 00:39:11,410 786 00:39:11,410 --> 00:39:13,030 787 00:39:13,030 --> 00:39:15,160 788 00:39:15,160 --> 00:39:17,530 789 00:39:17,530 --> 00:39:20,320 790 00:39:20,320 --> 00:39:24,400 791 00:39:24,400 --> 00:39:27,010 792 00:39:27,010 --> 00:39:28,690 793 00:39:28,690 --> 00:39:31,530 794 00:39:31,530 --> 00:39:33,550 795 00:39:33,550 --> 00:39:35,230 796 00:39:35,230 --> 00:39:38,050 797 00:39:38,050 --> 00:39:41,830 798 00:39:41,830 --> 00:39:41,840 799 00:39:41,840 --> 00:39:47,650 800 00:39:47,650 --> 00:39:51,260 801 00:39:51,260 --> 00:39:56,260 802 00:39:56,260 --> 00:39:58,550 803 00:39:58,550 --> 00:39:59,930 804 00:39:59,930 --> 00:40:03,710 805 00:40:03,710 --> 00:40:06,380 806 00:40:06,380 --> 00:40:09,170 807 00:40:09,170 --> 00:40:11,860 808 00:40:11,860 --> 00:40:14,840 809 00:40:14,840 --> 00:40:16,910 810 00:40:16,910 --> 00:40:20,540 811 00:40:20,540 --> 00:40:22,970 812 00:40:22,970 --> 00:40:25,700 813 00:40:25,700 --> 00:40:30,280 814 00:40:30,280 --> 00:40:33,410 815 00:40:33,410 --> 00:40:36,350 816 00:40:36,350 --> 00:40:39,260 817 00:40:39,260 --> 00:40:41,270 818 00:40:41,270 --> 00:40:43,040 819 00:40:43,040 --> 00:40:51,670 820 00:40:51,670 --> 00:40:55,550 821 00:40:55,550 --> 00:40:58,670 822 00:40:58,670 --> 00:41:04,610 823 00:41:04,610 --> 00:41:11,480 824 00:41:11,480 --> 00:41:13,520 825 00:41:13,520 --> 00:41:15,710 826 00:41:15,710 --> 00:41:18,590 827 00:41:18,590 --> 00:41:22,910 828 00:41:22,910 --> 00:41:25,580 829 00:41:25,580 --> 00:41:26,990 830 00:41:26,990 --> 00:41:34,960 831 00:41:34,960 --> 00:41:38,060 832 00:41:38,060 --> 00:41:41,930 833 00:41:41,930 --> 00:41:43,880 834 00:41:43,880 --> 00:41:47,240 835 00:41:47,240 --> 00:41:50,300 836 00:41:50,300 --> 00:41:52,430 837 00:41:52,430 --> 00:41:55,300 838 00:41:55,300 --> 00:41:59,240 839 00:41:59,240 --> 00:42:00,250 840 00:42:00,250 --> 00:42:03,130 841 00:42:03,130 --> 00:42:05,710 842 00:42:05,710 --> 00:42:08,170 843 00:42:08,170 --> 00:42:11,140 844 00:42:11,140 --> 00:42:14,560 845 00:42:14,560 --> 00:42:17,620 846 00:42:17,620 --> 00:42:22,180 847 00:42:22,180 --> 00:42:26,650 848 
00:42:26,650 --> 00:42:29,109 849 00:42:29,109 --> 00:42:31,900 850 00:42:31,900 --> 00:42:34,570 851 00:42:34,570 --> 00:42:37,750 852 00:42:37,750 --> 00:42:39,760 853 00:42:39,760 --> 00:42:42,460 854 00:42:42,460 --> 00:42:44,530 855 00:42:44,530 --> 00:42:47,680 856 00:42:47,680 --> 00:42:50,380 857 00:42:50,380 --> 00:42:54,340 858 00:42:54,340 --> 00:42:56,950 859 00:42:56,950 --> 00:42:58,900 860 00:42:58,900 --> 00:43:01,330 861 00:43:01,330 --> 00:43:03,270 862 00:43:03,270 --> 00:43:05,440 863 00:43:05,440 --> 00:43:07,660 864 00:43:07,660 --> 00:43:14,590 865 00:43:14,590 --> 00:43:16,120 866 00:43:16,120 --> 00:43:18,940 867 00:43:18,940 --> 00:43:22,660 868 00:43:22,660 --> 00:43:26,440 869 00:43:26,440 --> 00:43:29,950 870 00:43:29,950 --> 00:43:32,859 871 00:43:32,859 --> 00:43:38,590 872 00:43:38,590 --> 00:43:41,590 873 00:43:41,590 --> 00:43:43,960 874 00:43:43,960 --> 00:43:46,480 875 00:43:46,480 --> 00:43:48,040 876 00:43:48,040 --> 00:44:01,329 877 00:44:01,329 --> 00:44:05,540 878 00:44:05,540 --> 00:44:07,910 879 00:44:07,910 --> 00:44:10,970 880 00:44:10,970 --> 00:44:12,680 881 00:44:12,680 --> 00:44:15,490 882 00:44:15,490 --> 00:44:18,349 883 00:44:18,349 --> 00:44:21,140 884 00:44:21,140 --> 00:44:22,819 885 00:44:22,819 --> 00:44:24,530 886 00:44:24,530 --> 00:44:26,990 887 00:44:26,990 --> 00:44:28,670 888 00:44:28,670 --> 00:44:30,290 889 00:44:30,290 --> 00:44:32,650 890 00:44:32,650 --> 00:44:36,339 891 00:44:36,339 --> 00:44:40,609 892 00:44:40,609 --> 00:44:42,380 893 00:44:42,380 --> 00:44:45,050 894 00:44:45,050 --> 00:44:47,120 895 00:44:47,120 --> 00:44:48,589 896 00:44:48,589 --> 00:44:51,980 897 00:44:51,980 --> 00:44:53,900 898 00:44:53,900 --> 00:44:56,870 899 00:44:56,870 --> 00:44:59,240 900 00:44:59,240 --> 00:45:02,150 901 00:45:02,150 --> 00:45:02,160 902 00:45:02,160 --> 00:45:03,069 903 00:45:03,069 --> 00:45:05,839 904 00:45:05,839 --> 00:45:07,849 905 00:45:07,849 --> 00:45:11,630 906 00:45:11,630 --> 00:45:14,870 907 00:45:14,870 --> 00:45:16,940 908 00:45:16,940 --> 00:45:19,460 909 00:45:19,460 --> 00:45:21,230 910 00:45:21,230 --> 00:45:22,970 911 00:45:22,970 --> 00:45:29,359 912 00:45:29,359 --> 00:45:32,359 913 00:45:32,359 --> 00:45:34,309 914 00:45:34,309 --> 00:45:37,130 915 00:45:37,130 --> 00:45:44,870 916 00:45:44,870 --> 00:45:48,260 917 00:45:48,260 --> 00:45:51,349 918 00:45:51,349 --> 00:45:54,770 919 00:45:54,770 --> 00:45:58,130 920 00:45:58,130 --> 00:46:02,030 921 00:46:02,030 --> 00:46:05,690 922 00:46:05,690 --> 00:46:08,630 923 00:46:08,630 --> 00:46:11,150 924 00:46:11,150 --> 00:46:13,820 925 00:46:13,820 --> 00:46:16,460 926 00:46:16,460 --> 00:46:20,300 927 00:46:20,300 --> 00:46:24,710 928 00:46:24,710 --> 00:46:26,930 929 00:46:26,930 --> 00:46:30,860 930 00:46:30,860 --> 00:46:33,470 931 00:46:33,470 --> 00:46:34,970 932 00:46:34,970 --> 00:46:37,520 933 00:46:37,520 --> 00:46:42,170 934 00:46:42,170 --> 00:46:46,720 935 00:46:46,720 --> 00:46:51,950 936 00:46:51,950 --> 00:46:53,480 937 00:46:53,480 --> 00:46:55,940 938 00:46:55,940 --> 00:46:59,810 939 00:46:59,810 --> 00:47:02,210 940 00:47:02,210 --> 00:47:05,300 941 00:47:05,300 --> 00:47:11,420 942 00:47:11,420 --> 00:47:15,380 943 00:47:15,380 --> 00:47:19,280 944 00:47:19,280 --> 00:47:25,670 945 00:47:25,670 --> 00:47:28,670 946 00:47:28,670 --> 00:47:31,580 947 00:47:31,580 --> 00:47:39,530 948 00:47:39,530 --> 00:47:42,500 949 00:47:42,500 --> 00:47:44,930 950 00:47:44,930 --> 00:47:47,870 951 00:47:47,870 --> 00:47:49,700 952 00:47:49,700 --> 
00:47:56,870 953 00:47:56,870 --> 00:47:58,730 954 00:47:58,730 --> 00:48:01,580 955 00:48:01,580 --> 00:48:03,950 956 00:48:03,950 --> 00:48:05,960 957 00:48:05,960 --> 00:48:08,690 958 00:48:08,690 --> 00:48:11,180 959 00:48:11,180 --> 00:48:14,390 960 00:48:14,390 --> 00:48:22,089 961 00:48:22,089 --> 00:48:24,000 962 00:48:24,000 --> 00:48:27,390 963 00:48:27,390 --> 00:48:31,540 964 00:48:31,540 --> 00:48:31,550 965 00:48:31,550 --> 00:48:32,380 966 00:48:32,380 --> 00:48:33,880 967 00:48:33,880 --> 00:48:35,530 968 00:48:35,530 --> 00:48:38,260 969 00:48:38,260 --> 00:48:40,810 970 00:48:40,810 --> 00:48:43,089 971 00:48:43,089 --> 00:48:44,620 972 00:48:44,620 --> 00:48:47,680 973 00:48:47,680 --> 00:48:49,240 974 00:48:49,240 --> 00:48:51,880 975 00:48:51,880 --> 00:48:55,330 976 00:48:55,330 --> 00:48:59,170 977 00:48:59,170 --> 00:49:00,339 978 00:49:00,339 --> 00:49:02,050 979 00:49:02,050 --> 00:49:05,080 980 00:49:05,080 --> 00:49:08,589 981 00:49:08,589 --> 00:49:11,710 982 00:49:11,710 --> 00:49:15,190 983 00:49:15,190 --> 00:49:17,890 984 00:49:17,890 --> 00:49:20,770 985 00:49:20,770 --> 00:49:22,630 986 00:49:22,630 --> 00:49:26,230 987 00:49:26,230 --> 00:49:36,849 988 00:49:36,849 --> 00:49:36,859 989 00:49:36,859 --> 00:49:46,449 990 00:49:46,449 --> 00:49:48,940 991 00:49:48,940 --> 00:49:51,279 992 00:49:51,279 --> 00:49:53,049 993 00:49:53,049 --> 00:49:54,870 994 00:49:54,870 --> 00:49:58,690 995 00:49:58,690 --> 00:49:59,799 996 00:49:59,799 --> 00:50:01,930 997 00:50:01,930 --> 00:50:03,789 998 00:50:03,789 --> 00:50:05,469 999 00:50:05,469 --> 00:50:06,940 1000 00:50:06,940 --> 00:50:09,549 1001 00:50:09,549 --> 00:50:11,199 1002 00:50:11,199 --> 00:50:12,599 1003 00:50:12,599 --> 00:50:16,150 1004 00:50:16,150 --> 00:50:19,779 1005 00:50:19,779 --> 00:50:28,299 1006 00:50:28,299 --> 00:50:30,609 1007 00:50:30,609 --> 00:50:32,229 1008 00:50:32,229 --> 00:50:36,160 1009 00:50:36,160 --> 00:50:38,380 1010 00:50:38,380 --> 00:50:41,469 1011 00:50:41,469 --> 00:50:43,120 1012 00:50:43,120 --> 00:50:55,599 1013 00:50:55,599 --> 00:50:58,390 1014 00:50:58,390 --> 00:51:04,229 1015 00:51:04,229 --> 00:51:09,249 1016 00:51:09,249 --> 00:51:11,229 1017 00:51:11,229 --> 00:51:13,479 1018 00:51:13,479 --> 00:51:16,839 1019 00:51:16,839 --> 00:51:20,589 1020 00:51:20,589 --> 00:51:22,630 1021 00:51:22,630 --> 00:51:24,459 1022 00:51:24,459 --> 00:51:27,400 1023 00:51:27,400 --> 00:51:30,009 1024 00:51:30,009 --> 00:51:31,539 1025 00:51:31,539 --> 00:51:37,690 1026 00:51:37,690 --> 00:51:40,089 1027 00:51:40,089 --> 00:51:43,779 1028 00:51:43,779 --> 00:51:48,519 1029 00:51:48,519 --> 00:51:50,979 1030 00:51:50,979 --> 00:51:53,739 1031 00:51:53,739 --> 00:51:55,449 1032 00:51:55,449 --> 00:51:59,079 1033 00:51:59,079 --> 00:52:03,459 1034 00:52:03,459 --> 00:52:05,799 1035 00:52:05,799 --> 00:52:08,349 1036 00:52:08,349 --> 00:52:09,880 1037 00:52:09,880 --> 00:52:12,670 1038 00:52:12,670 --> 00:52:14,199 1039 00:52:14,199 --> 00:52:16,150 1040 00:52:16,150 --> 00:52:18,400 1041 00:52:18,400 --> 00:52:21,189 1042 00:52:21,189 --> 00:52:23,499 1043 00:52:23,499 --> 00:52:26,019 1044 00:52:26,019 --> 00:52:29,019 1045 00:52:29,019 --> 00:52:30,910 1046 00:52:30,910 --> 00:52:33,219 1047 00:52:33,219 --> 00:52:36,819 1048 00:52:36,819 --> 00:52:39,579 1049 00:52:39,579 --> 00:52:42,699 1050 00:52:42,699 --> 00:52:44,650 1051 00:52:44,650 --> 00:52:49,390 1052 00:52:49,390 --> 00:52:51,910 1053 00:52:51,910 --> 00:52:54,880 1054 00:52:54,880 --> 00:52:56,979 1055 00:52:56,979 
--> 00:52:59,380 1056 00:52:59,380 --> 00:53:02,199 1057 00:53:02,199 --> 00:53:05,049 1058 00:53:05,049 --> 00:53:08,930 1059 00:53:08,930 --> 00:53:12,280 1060 00:53:12,280 --> 00:53:15,829 1061 00:53:15,829 --> 00:53:17,660 1062 00:53:17,660 --> 00:53:21,010 1063 00:53:21,010 --> 00:53:26,329 1064 00:53:26,329 --> 00:53:27,710 1065 00:53:27,710 --> 00:53:29,599 1066 00:53:29,599 --> 00:53:32,359 1067 00:53:32,359 --> 00:53:36,680 1068 00:53:36,680 --> 00:53:39,819 1069 00:53:39,819 --> 00:53:44,150 1070 00:53:44,150 --> 00:53:48,559 1071 00:53:48,559 --> 00:53:50,089 1072 00:53:50,089 --> 00:53:52,579 1073 00:53:52,579 --> 00:53:54,500 1074 00:53:54,500 --> 00:53:59,930 1075 00:53:59,930 --> 00:54:04,520 1076 00:54:04,520 --> 00:54:06,440 1077 00:54:06,440 --> 00:54:09,109 1078 00:54:09,109 --> 00:54:11,059 1079 00:54:11,059 --> 00:54:12,890 1080 00:54:12,890 --> 00:54:17,329 1081 00:54:17,329 --> 00:54:19,940 1082 00:54:19,940 --> 00:54:22,010 1083 00:54:22,010 --> 00:54:23,809 1084 00:54:23,809 --> 00:54:26,450 1085 00:54:26,450 --> 00:54:28,579 1086 00:54:28,579 --> 00:54:31,400 1087 00:54:31,400 --> 00:54:34,190 1088 00:54:34,190 --> 00:54:37,339 1089 00:54:37,339 --> 00:54:39,770 1090 00:54:39,770 --> 00:54:43,819 1091 00:54:43,819 --> 00:54:47,960 1092 00:54:47,960 --> 00:54:49,579 1093 00:54:49,579 --> 00:54:53,599 1094 00:54:53,599 --> 00:54:55,940 1095 00:54:55,940 --> 00:54:57,559 1096 00:54:57,559 --> 00:55:01,700 1097 00:55:01,700 --> 00:55:03,859 1098 00:55:03,859 --> 00:55:05,270 1099 00:55:05,270 --> 00:55:08,660 1100 00:55:08,660 --> 00:55:11,349 1101 00:55:11,349 --> 00:55:20,599 1102 00:55:20,599 --> 00:55:25,339 1103 00:55:25,339 --> 00:55:29,789 1104 00:55:29,789 --> 00:55:32,279 1105 00:55:32,279 --> 00:55:35,219 1106 00:55:35,219 --> 00:55:37,469 1107 00:55:37,469 --> 00:55:41,189 1108 00:55:41,189 --> 00:55:44,430 1109 00:55:44,430 --> 00:55:46,259 1110 00:55:46,259 --> 00:55:48,989 1111 00:55:48,989 --> 00:55:50,759 1112 00:55:50,759 --> 00:55:52,890 1113 00:55:52,890 --> 00:55:55,559 1114 00:55:55,559 --> 00:55:58,289 1115 00:55:58,289 --> 00:56:01,140 1116 00:56:01,140 --> 00:56:03,719 1117 00:56:03,719 --> 00:56:06,900 1118 00:56:06,900 --> 00:56:09,029 1119 00:56:09,029 --> 00:56:13,349 1120 00:56:13,349 --> 00:56:15,829 1121 00:56:15,829 --> 00:56:19,229 1122 00:56:19,229 --> 00:56:22,079 1123 00:56:22,079 --> 00:56:25,680 1124 00:56:25,680 --> 00:56:28,620 1125 00:56:28,620 --> 00:56:36,329 1126 00:56:36,329 --> 00:56:38,069 1127 00:56:38,069 --> 00:56:40,559 1128 00:56:40,559 --> 00:56:42,749 1129 00:56:42,749 --> 00:56:45,390 1130 00:56:45,390 --> 00:56:48,900 1131 00:56:48,900 --> 00:56:51,479 1132 00:56:51,479 --> 00:56:53,640 1133 00:56:53,640 --> 00:56:56,789 1134 00:56:56,789 --> 00:56:58,799 1135 00:56:58,799 --> 00:57:02,009 1136 00:57:02,009 --> 00:57:03,660 1137 00:57:03,660 --> 00:57:09,989 1138 00:57:09,989 --> 00:57:13,589 1139 00:57:13,589 --> 00:57:16,650 1140 00:57:16,650 --> 00:57:19,410 1141 00:57:19,410 --> 00:57:22,650 1142 00:57:22,650 --> 00:57:25,049 1143 00:57:25,049 --> 00:57:28,300 1144 00:57:28,300 --> 00:57:32,230 1145 00:57:32,230 --> 00:57:36,280 1146 00:57:36,280 --> 00:57:39,520 1147 00:57:39,520 --> 00:57:42,820 1148 00:57:42,820 --> 00:57:45,849 1149 00:57:45,849 --> 00:57:48,310 1150 00:57:48,310 --> 00:57:53,290 1151 00:57:53,290 --> 00:57:56,710 1152 00:57:56,710 --> 00:57:58,599 1153 00:57:58,599 --> 00:58:03,700 1154 00:58:03,700 --> 00:58:05,620 1155 00:58:05,620 --> 00:58:10,690 1156 00:58:10,690 --> 00:58:13,290 
1157 00:58:13,290 --> 00:58:16,570 1158 00:58:16,570 --> 00:58:18,940 1159 00:58:18,940 --> 00:58:22,870 1160 00:58:22,870 --> 00:58:24,880 1161 00:58:24,880 --> 00:58:27,550 1162 00:58:27,550 --> 00:58:29,950 1163 00:58:29,950 --> 00:58:33,730 1164 00:58:33,730 --> 00:58:36,339 1165 00:58:36,339 --> 00:58:37,839 1166 00:58:37,839 --> 00:58:41,079 1167 00:58:41,079 --> 00:58:44,680 1168 00:58:44,680 --> 00:58:46,599 1169 00:58:46,599 --> 00:58:48,460 1170 00:58:48,460 --> 00:58:50,920 1171 00:58:50,920 --> 00:58:53,020 1172 00:58:53,020 --> 00:58:55,690 1173 00:58:55,690 --> 00:58:59,010 1174 00:58:59,010 --> 00:59:02,680 1175 00:59:02,680 --> 00:59:10,170 1176 00:59:10,170 --> 00:59:13,000 1177 00:59:13,000 --> 00:59:16,990 1178 00:59:16,990 --> 00:59:19,359 1179 00:59:19,359 --> 00:59:21,849 1180 00:59:21,849 --> 00:59:23,530 1181 00:59:23,530 --> 00:59:25,450 1182 00:59:25,450 --> 00:59:29,079 1183 00:59:29,079 --> 00:59:31,240 1184 00:59:31,240 --> 00:59:33,609 1185 00:59:33,609 --> 00:59:35,980 1186 00:59:35,980 --> 00:59:37,900 1187 00:59:37,900 --> 00:59:39,250 1188 00:59:39,250 --> 00:59:44,980 1189 00:59:44,980 --> 00:59:48,880 1190 00:59:48,880 --> 00:59:50,920 1191 00:59:50,920 --> 00:59:52,750 1192 00:59:52,750 --> 00:59:55,060 1193 00:59:55,060 --> 00:59:57,190 1194 00:59:57,190 --> 00:59:58,990 1195 00:59:58,990 --> 01:00:00,790 1196 01:00:00,790 --> 01:00:04,329 1197 01:00:04,329 --> 01:00:05,980 1198 01:00:05,980 --> 01:00:08,500 1199 01:00:08,500 --> 01:00:12,160 1200 01:00:12,160 --> 01:00:19,620 1201 01:00:19,620 --> 01:00:22,900 1202 01:00:22,900 --> 01:00:25,689 1203 01:00:25,689 --> 01:00:28,239 1204 01:00:28,239 --> 01:00:29,920 1205 01:00:29,920 --> 01:00:31,959 1206 01:00:31,959 --> 01:00:33,400 1207 01:00:33,400 --> 01:00:35,890 1208 01:00:35,890 --> 01:00:48,400 1209 01:00:48,400 --> 01:00:53,769 1210 01:00:53,769 --> 01:00:56,109 1211 01:00:56,109 --> 01:00:58,410 1212 01:00:58,410 --> 01:01:00,160 1213 01:01:00,160 --> 01:01:05,920 1214 01:01:05,920 --> 01:01:09,189 1215 01:01:09,189 --> 01:01:12,120 1216 01:01:12,120 --> 01:01:14,709 1217 01:01:14,709 --> 01:01:17,559 1218 01:01:17,559 --> 01:01:20,559 1219 01:01:20,559 --> 01:01:23,920 1220 01:01:23,920 --> 01:01:25,479 1221 01:01:25,479 --> 01:01:34,120 1222 01:01:34,120 --> 01:01:43,480 1223 01:01:43,480 --> 01:01:49,940 1224 01:01:49,940 --> 01:01:52,250 1225 01:01:52,250 --> 01:01:54,680 1226 01:01:54,680 --> 01:01:58,760 1227 01:01:58,760 --> 01:02:01,250 1228 01:02:01,250 --> 01:02:02,900 1229 01:02:02,900 --> 01:02:06,230 1230 01:02:06,230 --> 01:02:09,470 1231 01:02:09,470 --> 01:02:14,000 1232 01:02:14,000 --> 01:02:19,670 1233 01:02:19,670 --> 01:02:28,550 1234 01:02:28,550 --> 01:02:32,960 1235 01:02:32,960 --> 01:02:39,010 1236 01:02:39,010 --> 01:02:44,180 1237 01:02:44,180 --> 01:02:46,970 1238 01:02:46,970 --> 01:02:53,060 1239 01:02:53,060 --> 01:02:55,550 1240 01:02:55,550 --> 01:02:58,670 1241 01:02:58,670 --> 01:03:00,560 1242 01:03:00,560 --> 01:03:02,720 1243 01:03:02,720 --> 01:03:05,750 1244 01:03:05,750 --> 01:03:07,970 1245 01:03:07,970 --> 01:03:10,700 1246 01:03:10,700 --> 01:03:12,590 1247 01:03:12,590 --> 01:03:13,970 1248 01:03:13,970 --> 01:03:15,890 1249 01:03:15,890 --> 01:03:18,620 1250 01:03:18,620 --> 01:03:20,420 1251 01:03:20,420 --> 01:03:25,490 1252 01:03:25,490 --> 01:03:27,620 1253 01:03:27,620 --> 01:03:27,630 1254 01:03:27,630 --> 01:03:28,070 1255 01:03:28,070 --> 01:03:30,830 1256 01:03:30,830 --> 01:03:32,960 1257 01:03:32,960 --> 01:03:35,390 1258 01:03:35,390 
--> 01:03:38,200 1259 01:03:38,200 --> 01:03:41,720 1260 01:03:41,720 --> 01:03:44,600 1261 01:03:44,600 --> 01:03:47,360 1262 01:03:47,360 --> 01:03:50,590 1263 01:03:50,590 --> 01:03:53,300 1264 01:03:53,300 --> 01:03:55,700 1265 01:03:55,700 --> 01:03:56,640 1266 01:03:56,640 --> 01:03:58,890 1267 01:03:58,890 --> 01:04:01,680 1268 01:04:01,680 --> 01:04:04,289 1269 01:04:04,289 --> 01:04:07,500 1270 01:04:07,500 --> 01:04:10,940 1271 01:04:10,940 --> 01:04:13,529 1272 01:04:13,529 --> 01:04:16,980 1273 01:04:16,980 --> 01:04:18,809 1274 01:04:18,809 --> 01:04:21,960 1275 01:04:21,960 --> 01:04:24,089 1276 01:04:24,089 --> 01:04:25,760 1277 01:04:25,760 --> 01:04:29,490 1278 01:04:29,490 --> 01:04:31,410 1279 01:04:31,410 --> 01:04:35,010 1280 01:04:35,010 --> 01:04:36,990 1281 01:04:36,990 --> 01:04:39,630 1282 01:04:39,630 --> 01:04:41,609 1283 01:04:41,609 --> 01:04:44,640 1284 01:04:44,640 --> 01:04:48,569 1285 01:04:48,569 --> 01:04:50,940 1286 01:04:50,940 --> 01:04:52,980 1287 01:04:52,980 --> 01:04:56,190 1288 01:04:56,190 --> 01:04:57,900 1289 01:04:57,900 --> 01:05:00,000 1290 01:05:00,000 --> 01:05:02,339 1291 01:05:02,339 --> 01:05:05,400 1292 01:05:05,400 --> 01:05:07,620 1293 01:05:07,620 --> 01:05:09,539 1294 01:05:09,539 --> 01:05:12,690 1295 01:05:12,690 --> 01:05:14,849 1296 01:05:14,849 --> 01:05:19,039 1297 01:05:19,039 --> 01:05:22,019 1298 01:05:22,019 --> 01:05:24,599 1299 01:05:24,599 --> 01:05:28,799 1300 01:05:28,799 --> 01:05:30,870 1301 01:05:30,870 --> 01:05:33,510 1302 01:05:33,510 --> 01:05:34,920 1303 01:05:34,920 --> 01:05:37,049 1304 01:05:37,049 --> 01:05:45,920 1305 01:05:45,920 --> 01:05:48,809 1306 01:05:48,809 --> 01:05:50,660 1307 01:05:50,660 --> 01:05:53,760 1308 01:05:53,760 --> 01:05:56,730 1309 01:05:56,730 --> 01:06:00,029 1310 01:06:00,029 --> 01:06:02,700 1311 01:06:02,700 --> 01:06:04,769 1312 01:06:04,769 --> 01:06:07,019 1313 01:06:07,019 --> 01:06:09,680 1314 01:06:09,680 --> 01:06:16,750 1315 01:06:16,750 --> 01:06:19,520 1316 01:06:19,520 --> 01:06:21,560 1317 01:06:21,560 --> 01:06:23,600 1318 01:06:23,600 --> 01:06:26,000 1319 01:06:26,000 --> 01:06:28,940 1320 01:06:28,940 --> 01:06:31,010 1321 01:06:31,010 --> 01:06:32,270 1322 01:06:32,270 --> 01:06:36,860 1323 01:06:36,860 --> 01:06:38,780 1324 01:06:38,780 --> 01:06:41,420 1325 01:06:41,420 --> 01:06:44,960 1326 01:06:44,960 --> 01:06:47,930 1327 01:06:47,930 --> 01:06:49,700 1328 01:06:49,700 --> 01:06:52,340 1329 01:06:52,340 --> 01:06:56,060 1330 01:06:56,060 --> 01:06:58,490 1331 01:06:58,490 --> 01:07:00,500 1332 01:07:00,500 --> 01:07:02,840 1333 01:07:02,840 --> 01:07:04,610 1334 01:07:04,610 --> 01:07:07,790 1335 01:07:07,790 --> 01:07:11,600 1336 01:07:11,600 --> 01:07:13,820 1337 01:07:13,820 --> 01:07:16,520 1338 01:07:16,520 --> 01:07:20,120 1339 01:07:20,120 --> 01:07:23,180 1340 01:07:23,180 --> 01:07:25,070 1341 01:07:25,070 --> 01:07:27,020 1342 01:07:27,020 --> 01:07:29,060 1343 01:07:29,060 --> 01:07:31,700 1344 01:07:31,700 --> 01:07:34,460 1345 01:07:34,460 --> 01:07:36,440 1346 01:07:36,440 --> 01:07:38,930 1347 01:07:38,930 --> 01:07:40,520 1348 01:07:40,520 --> 01:07:42,530 1349 01:07:42,530 --> 01:07:47,290 1350 01:07:47,290 --> 01:08:08,100 1351 01:08:08,100 --> 01:08:13,120 1352 01:08:13,120 --> 01:08:15,400 1353 01:08:15,400 --> 01:08:17,590 1354 01:08:17,590 --> 01:08:19,930 1355 01:08:19,930 --> 01:08:22,150 1356 01:08:22,150 --> 01:08:24,990 1357 01:08:24,990 --> 01:08:27,400 1358 01:08:27,400 --> 01:08:29,020 1359 01:08:29,020 --> 01:08:31,170 
1360 01:08:31,170 --> 01:08:34,780 1361 01:08:34,780 --> 01:08:36,790 1362 01:08:36,790 --> 01:08:39,700 1363 01:08:39,700 --> 01:08:42,430 1364 01:08:42,430 --> 01:08:44,500 1365 01:08:44,500 --> 01:08:56,590 1366 01:08:56,590 --> 01:08:59,770 1367 01:08:59,770 --> 01:09:01,540 1368 01:09:01,540 --> 01:09:04,810 1369 01:09:04,810 --> 01:09:07,060 1370 01:09:07,060 --> 01:09:09,160 1371 01:09:09,160 --> 01:09:12,270 1372 01:09:12,270 --> 01:09:14,350 1373 01:09:14,350 --> 01:09:16,060 1374 01:09:16,060 --> 01:09:20,620 1375 01:09:20,620 --> 01:09:23,260 1376 01:09:23,260 --> 01:09:25,660 1377 01:09:25,660 --> 01:09:28,900 1378 01:09:28,900 --> 01:09:31,170 1379 01:09:31,170 --> 01:09:34,210 1380 01:09:34,210 --> 01:09:36,300 1381 01:09:36,300 --> 01:09:39,340 1382 01:09:39,340 --> 01:09:41,380 1383 01:09:41,380 --> 01:09:44,170 1384 01:09:44,170 --> 01:09:46,240 1385 01:09:46,240 --> 01:09:49,300 1386 01:09:49,300 --> 01:09:51,690 1387 01:09:51,690 --> 01:09:54,040 1388 01:09:54,040 --> 01:09:55,930 1389 01:09:55,930 --> 01:09:59,440 1390 01:09:59,440 --> 01:10:02,290 1391 01:10:02,290 --> 01:10:03,730 1392 01:10:03,730 --> 01:10:05,430 1393 01:10:05,430 --> 01:10:07,740 1394 01:10:07,740 --> 01:10:12,350 1395 01:10:12,350 --> 01:10:15,300 1396 01:10:15,300 --> 01:10:19,980 1397 01:10:19,980 --> 01:10:22,830 1398 01:10:22,830 --> 01:10:26,700 1399 01:10:26,700 --> 01:10:29,130 1400 01:10:29,130 --> 01:10:32,430 1401 01:10:32,430 --> 01:10:35,730 1402 01:10:35,730 --> 01:10:38,520 1403 01:10:38,520 --> 01:10:42,240 1404 01:10:42,240 --> 01:10:44,210 1405 01:10:44,210 --> 01:10:48,510 1406 01:10:48,510 --> 01:10:50,430 1407 01:10:50,430 --> 01:10:55,860 1408 01:10:55,860 --> 01:10:58,410 1409 01:10:58,410 --> 01:11:00,990 1410 01:11:00,990 --> 01:11:03,240 1411 01:11:03,240 --> 01:11:06,840 1412 01:11:06,840 --> 01:11:10,500 1413 01:11:10,500 --> 01:11:12,420 1414 01:11:12,420 --> 01:11:15,120 1415 01:11:15,120 --> 01:11:17,730 1416 01:11:17,730 --> 01:11:20,100 1417 01:11:20,100 --> 01:11:22,860 1418 01:11:22,860 --> 01:11:24,750 1419 01:11:24,750 --> 01:11:29,630 1420 01:11:29,630 --> 01:11:32,070 1421 01:11:32,070 --> 01:11:34,680 1422 01:11:34,680 --> 01:11:37,650 1423 01:11:37,650 --> 01:11:39,530 1424 01:11:39,530 --> 01:11:41,670 1425 01:11:41,670 --> 01:11:43,230 1426 01:11:43,230 --> 01:11:45,840 1427 01:11:45,840 --> 01:11:47,880 1428 01:11:47,880 --> 01:11:51,510 1429 01:11:51,510 --> 01:11:53,550 1430 01:11:53,550 --> 01:11:55,610 1431 01:11:55,610 --> 01:11:57,720 1432 01:11:57,720 --> 01:12:00,450 1433 01:12:00,450 --> 01:12:02,910 1434 01:12:02,910 --> 01:12:04,470 1435 01:12:04,470 --> 01:12:10,260 1436 01:12:10,260 --> 01:12:12,510 1437 01:12:12,510 --> 01:12:15,240 1438 01:12:15,240 --> 01:12:17,370 1439 01:12:17,370 --> 01:12:18,930 1440 01:12:18,930 --> 01:12:20,820 1441 01:12:20,820 --> 01:12:22,680 1442 01:12:22,680 --> 01:12:26,220 1443 01:12:26,220 --> 01:12:28,130 1444 01:12:28,130 --> 01:12:34,020 1445 01:12:34,020 --> 01:12:36,840 1446 01:12:36,840 --> 01:12:38,700 1447 01:12:38,700 --> 01:12:41,010 1448 01:12:41,010 --> 01:12:42,900 1449 01:12:42,900 --> 01:12:44,640 1450 01:12:44,640 --> 01:12:51,680 1451 01:12:51,680 --> 01:12:55,439 1452 01:12:55,439 --> 01:12:58,320 1453 01:12:58,320 --> 01:13:02,010 1454 01:13:02,010 --> 01:13:05,430 1455 01:13:05,430 --> 01:13:07,979 1456 01:13:07,979 --> 01:13:09,570 1457 01:13:09,570 --> 01:13:12,990 1458 01:13:12,990 --> 01:13:16,680 1459 01:13:16,680 --> 01:13:19,320 1460 01:13:19,320 --> 01:13:21,330 1461 01:13:21,330 
--> 01:13:23,550 1462 01:13:23,550 --> 01:13:26,310 1463 01:13:26,310 --> 01:13:28,170 1464 01:13:28,170 --> 01:13:29,729 1465 01:13:29,729 --> 01:13:31,830 1466 01:13:31,830 --> 01:13:33,840 1467 01:13:33,840 --> 01:13:38,400 1468 01:13:38,400 --> 01:13:41,840 1469 01:13:41,840 --> 01:13:46,740 1470 01:13:46,740 --> 01:13:51,500 1471 01:13:51,500 --> 01:13:53,880 1472 01:13:53,880 --> 01:13:55,920 1473 01:13:55,920 --> 01:13:58,380 1474 01:13:58,380 --> 01:14:00,930 1475 01:14:00,930 --> 01:14:02,970 1476 01:14:02,970 --> 01:14:05,310 1477 01:14:05,310 --> 01:14:07,530 1478 01:14:07,530 --> 01:14:09,600 1479 01:14:09,600 --> 01:14:12,120 1480 01:14:12,120 --> 01:14:16,620 1481 01:14:16,620 --> 01:14:18,660 1482 01:14:18,660 --> 01:14:21,870 1483 01:14:21,870 --> 01:14:23,610 1484 01:14:23,610 --> 01:14:23,620 1485 01:14:23,620 --> 01:14:26,130 1486 01:14:26,130 --> 01:14:28,410 1487 01:14:28,410 --> 01:14:30,000 1488 01:14:30,000 --> 01:14:31,620 1489 01:14:31,620 --> 01:14:32,790 1490 01:14:32,790 --> 01:14:34,740 1491 01:14:34,740 --> 01:14:38,959 1492 01:14:38,959 --> 01:14:45,080 1493 01:14:45,080 --> 01:14:47,340 1494 01:14:47,340 --> 01:14:50,040 1495 01:14:50,040 --> 01:14:56,040 1496 01:14:56,040 --> 01:14:58,440 1497 01:14:58,440 --> 01:15:04,290 1498 01:15:04,290 --> 01:15:09,540 1499 01:15:09,540 --> 01:15:13,050 1500 01:15:13,050 --> 01:15:16,950 1501 01:15:16,950 --> 01:15:19,290 1502 01:15:19,290 --> 01:15:21,290 1503 01:15:21,290 --> 01:15:29,370 1504 01:15:29,370 --> 01:15:30,950 1505 01:15:30,950 --> 01:15:41,880 1506 01:15:41,880 --> 01:15:44,310 1507 01:15:44,310 --> 01:15:46,680 1508 01:15:46,680 --> 01:15:48,690 1509 01:15:48,690 --> 01:15:51,630 1510 01:15:51,630 --> 01:15:53,190 1511 01:15:53,190 --> 01:15:54,810 1512 01:15:54,810 --> 01:15:58,080 1513 01:15:58,080 --> 01:15:59,850 1514 01:15:59,850 --> 01:16:01,770 1515 01:16:01,770 --> 01:16:03,240 1516 01:16:03,240 --> 01:16:07,290 1517 01:16:07,290 --> 01:16:10,620 1518 01:16:10,620 --> 01:16:12,209 1519 01:16:12,209 --> 01:16:14,940 1520 01:16:14,940 --> 01:16:17,760 1521 01:16:17,760 --> 01:16:21,000 1522 01:16:21,000 --> 01:16:22,770 1523 01:16:22,770 --> 01:16:24,990 1524 01:16:24,990 --> 01:16:28,160 1525 01:16:28,160 --> 01:16:38,169 1526 01:16:38,169 --> 01:16:38,179 1527 01:16:38,179 --> 01:16:40,239