The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

Today we're going to talk about multicore programming, and, as I was just informed by Charles, it's 2018; I had 2017 on the slide. OK, so first, congratulations to all of you: you turned in the first project's beta. Here's a plot showing the tiers that different groups reached for the beta, in sorted order. We set the beta cutoff to be tier 45; the final cutoff is tier 48. We did set the final cutoff a little bit aggressively, but keep in mind that you don't necessarily have to reach the final cutoff in order to get an A on this project.

OK, so we're going to talk about multicore processing today. That's going to be the topic of the next project, after you finish the first project. In a multicore processor we have a whole bunch of cores that are all placed on the same chip, and they have access to shared memory. They usually also have some sort of private cache, and then a shared last-level cache, L3 in this case. They all have access to the same memory controller, which goes out to main memory, and they also have access to I/O.

But for a very long time, chips only had a single core on them. So why do we have multicore processors nowadays? Why did semiconductor vendors start producing chips with multiple processor cores on them? The answer comes down to two things. First, there's Moore's law, which says that we get more transistors every year: the number of transistors that you can fit on a chip doubles approximately every two years. And secondly, there's the end of scaling of clock frequencies. For a very long time we could just keep increasing the frequency of the single core on the chip, but at around 2004 to 2005 that was no longer the case; we couldn't scale the clock frequency anymore.

So here's a plot showing both the number of transistors you can fit on a chip over time and the clock frequency of processors over time, and notice that the y-axis is in log scale. The blue line is basically Moore's law, which says that the number of transistors on a chip doubles approximately every two years, and that's been growing pretty steadily. This plot goes up to 2010, but in fact it's been growing even up to the present, and it will continue to grow for a couple more years before Moore's law ends. However, if you look at the clock frequency line, you see that it was growing quite steadily until about the early 2000s, and at that point it sort of flattened out. We couldn't increase the clock frequencies anymore, and the clock speed was bounded at about 4 gigahertz. Nowadays, if you go buy a processor, it's usually still bounded by around 4 gigahertz; it's usually a little less than 4 gigahertz, because it doesn't really make sense to push it all the way, but you might find some processors around 4 gigahertz.

So what happened around 2004 to 2005? Does anyone know? OK, so Moore's law basically says that we can fit more transistors on a chip because the transistors become smaller, and when the transistors become smaller you can reduce the voltage that's needed to operate them, and as a result you could increase the clock frequency while maintaining the same power density.
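To make that reasoning concrete, the standard approximation for the dynamic power dissipated by a chip (this equation isn't on the slide, but it's the relation the argument rests on) is

    P \approx C \, V^2 \, f

where C is the switched capacitance, V is the supply voltage, and f is the clock frequency. As long as shrinking transistors let vendors lower V, they could raise f without increasing the power per unit area; once V could no longer be reduced, raising f would directly raise the power density.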
And that's what manufacturers did until about 2004 to 2005: they just kept increasing the clock frequency to take advantage of Moore's law. But it turns out that once transistors become small enough, and the voltage used to operate them becomes small enough, there's something called leakage current. Current leaks, and we're unable to keep reducing the voltage while still having reliable switching. And if you can't reduce the voltage anymore, then you can't increase the clock frequency if you want to keep the same power density.

So here's a plot from Intel from back around 2004, when they first started producing multicore processors, and this is plotting power density versus time; again, the y-axis is in log scale. The green data points are actual data points, and the orange ones are projections: they projected what the power density would be if we kept increasing the clock frequency at a trend of about 25 to 30 percent per year, which is what happened up until around 2004. Because we couldn't reduce the voltage anymore, the power density would go up, and you can see that eventually it reaches the power density of a nuclear reactor, which is pretty hot; then it reaches the power density of a rocket nozzle; and eventually you get to the power density of the sun's surface. If you have a chip with a power density equal to the sun's surface, well, you don't really have a chip anymore, right? So basically, if you get into this orange region, you have a fire, and you can't really do anything interesting in terms of performance engineering at that point.

So to solve this problem, semiconductor vendors didn't increase the clock frequency anymore, but we still had Moore's law giving us more and more transistors every year. What they decided to do with these extra transistors was to put them into multiple cores, and then put multiple cores on the same chip. We can see that starting at around 2004, the number of cores per chip becomes more than one, and each generation of Moore's law will potentially double the number of cores that you can fit on a chip, because it's doubling the number of transistors. We've seen this trend up until about today, and again, it's going to continue for a couple more years before Moore's law ends. So that's why we have chips with multiple cores today.

So today we're going to look at multicore processing, and I first want to introduce the abstract multicore architecture. This is a very simplified version, but I can fit it on this slide, and it's a good example for illustration. Here we have a whole bunch of processors. They each have a cache, indicated with the dollar sign, and usually they have a private cache as well as a shared cache — a shared last-level cache, like the L3 cache. Then they're all connected to a network, and through the network they can connect to main memory; they can all access the same shared memory. Usually there's a separate network for the I/O as well, even though I've drawn them as a single network here, so they can access the I/O interface, and potentially the network will also connect to other multiprocessors on the same system. This abstract multicore architecture is known as a chip multiprocessor, or CMP, and that's the architecture we'll be looking at today.

So here's an outline of today's lecture. First I'm going to go over some hardware challenges with shared-memory multicore machines.
We're going to look at the cache coherence protocol, and then, after looking at hardware, we're going to look at some software solutions for writing parallel programs on these multicore machines to take advantage of the extra cores. We're going to look at several concurrency platforms, listed here. We're going to look at Pthreads — this is basically a low-level API for running your code in parallel, and if you program on Microsoft products, WinAPI threads is pretty similar. Then there's Intel Threading Building Blocks, which is a library solution to concurrency, and then there are two linguistic solutions that we'll be looking at, OpenMP and Cilk Plus. Cilk Plus is actually the concurrency platform that we'll be using for most of this class.

All right, so let's look at how caches work. Let's say that we have a value in memory at some location, say x equals 3. If one processor wants to load x, what happens is that the processor reads this value from main memory, brings it into its own cache, and then also loads it into one of its registers. It keeps this value in cache so that if it wants to access the value again in the near future, it doesn't have to go all the way out to main memory; it can just look at the value in its cache. Now, what happens if another processor wants to load x? It just does the same thing: it reads the value from main memory, brings it into its cache, and also loads it into one of its registers. And then the same thing happens with another processor. It turns out that you don't always have to go out to main memory to get the value: if the value resides in one of the other processors' caches, you can also get it through the other processor's cache, and sometimes that's cheaper than going all the way out to main memory.

OK, so now the second processor loads x again, and it's in cache, so it doesn't have to go to main memory or anybody else's cache. So what happens now if we want to store x — if we want to set the value of x to something else? Let's say this processor wants to set x equal to 5, so it writes x equals 5 and stores that result in its own cache. That's all well and good. Now what happens when the first processor wants to load x? Well, it sees that the value of x is in its own cache, so it's just going to read the value of x there, and it gets a value of 3. So what's the problem? Yes — right, the problem is that the value of x in the first processor's cache is stale, because another processor updated it, so this value of x in the first processor's cache is now invalid.

So that's the problem, and one of the main challenges of multicore hardware is to solve this problem of cache coherence: making sure that the values in different processors' caches are consistent across updates. One basic protocol for solving this problem is known as the MSI protocol. In this protocol each cache line is labeled with a state, and there are three possible states: M, S, and I. This is done at the granularity of cache lines, because it turns out that storing this information is relatively expensive, so you don't want to store it for every memory location; you do it on a per-cache-line basis. Does anyone know the size of a cache line on the machines that we're using? Yeah, it's 64 bytes, and that's typically what you see today on most Intel and AMD machines. There are some architectures that have different cache line sizes, such as 128 bytes, but for our class, the machines we're using have 64-byte cache lines. That's important to remember, so that when you're doing back-of-the-envelope calculations you can get accurate estimates.
So the three states in the MSI protocol are M, S, and I. M stands for modified, and when a cache block is in the modified state, that means no other cache can contain this block in the M or S states. The S state means that the block is shared, so other caches can also have this block in the shared state. And finally, I means the cache block is invalid, which is essentially the same as the cache block not being in the cache at all. To solve the problem of cache coherence, when one cache modifies a location, it has to inform all the other caches that their values are now stale, because this cache modified the value. So it's going to invalidate all of the other copies of that cache line in other caches by changing their state from S to I.

Let's see how this works. Let's say that the second processor wants to store y equals 5. Previously the value of y was 17, and the cache line containing y equals 17 was in the shared state. Now when I do y equals 5, I'm going to set that cache line in the second processor's cache to the modified state, and then I'm going to invalidate that cache line in all of the other caches that contain it. So now the first cache and the fourth cache each have a state of I for the line with y equals 17, because that value is stale. Any questions? Yes — yeah, so there are actually some protocols that do that. This is just the most basic protocol, so this protocol doesn't do it, but there are some protocols used in practice that actually do do that, so it's a good point; I just want to present the most basic protocol for now.

OK, and then when you load a value, you can first check whether your cache line is in the M or S state, and if it is, then you can just read that value directly. But if it's in the I state, or if it's not there, then you have to fetch that block from either another processor's cache or from main memory. It turns out that there are many other protocols out there — there's something known as MESI, there's also MOESI, and many other protocols, some of them proprietary — and they all do different things. It turns out that all of these protocols are quite complicated, and it's very hard to get them right; in fact, one of the earliest successes of formal verification was proving some of these cache coherence protocols correct.

Yes, question? Yeah, so if two processors try to modify the value, one of them has to happen first, and the hardware is going to take care of that. The first one that actually modifies it will invalidate all the other copies, and then the second one that modifies the value will again invalidate all the other copies. When a lot of processors try to modify the same value, you get something known as an invalidation storm: you have a bunch of invalidation messages going throughout the hardware, and that can lead to a big performance bottleneck, because each processor, when it modifies its value, has to inform all the other processors, and if all the processors are modifying the same value, you get this sort of quadratic behavior. The hardware is still going to guarantee that one of the processors ends up writing the value there, but you should be aware of this performance issue when you're writing parallel code.
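To make the protocol concrete, here's a minimal sketch in C of the per-cache-line state transitions just described. This is only an illustration of the basic MSI idea — the real controllers are implemented in hardware and handle many more details:

    // Per-cache-line states in the basic MSI protocol described above.
    typedef enum { INVALID, SHARED, MODIFIED } line_state_t;

    // When this cache's own processor reads the line:
    line_state_t on_local_read(line_state_t s) {
      // I -> S: fetch the line from another cache or from main memory.
      // S and M already hold a valid copy, so nothing changes.
      return (s == INVALID) ? SHARED : s;
    }

    // When this cache's own processor writes the line:
    line_state_t on_local_write(line_state_t s) {
      // Any write moves the line to M, after broadcasting an invalidation
      // so that every other cache holding the line drops it to I.
      (void)s;
      return MODIFIED;
    }

    // When another cache writes the line:
    line_state_t on_remote_write(line_state_t s) {
      (void)s;
      return INVALID;               // our copy is now stale
    }

    // When another cache reads the line:
    line_state_t on_remote_read(line_state_t s) {
      // If we hold the only modified copy, supply the data (writing it
      // back) and downgrade to shared; otherwise nothing changes.
      return (s == MODIFIED) ? SHARED : s;
    }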
Yes? Yes, so this is all implemented in hardware. If you take a computer architecture class, you'll learn much more about these protocols and all of their variants. For our purposes, we don't actually need to understand all the details of the hardware; we just need to understand what it's doing at a high level, so that we can understand when we have a performance bottleneck and why. That's why I'm just introducing the most basic protocol here. Any other questions?

All right, so I talked a little bit about the shared-memory hardware. Let's now look at some concurrency platforms. These are the four platforms that we'll be looking at today. So first, what is a concurrency platform? Writing parallel programs is very difficult: it's very hard to get these programs to be correct, and if you want to optimize their performance it becomes even harder, so it's very painful and error-prone. A concurrency platform abstracts the processor cores, handles synchronization and communication protocols, and also performs load balancing for you, so it makes your lives much easier. Today we're going to talk about some of these different concurrency platforms.

To illustrate these concurrency platforms, I'm going to use the Fibonacci numbers example. Does anybody not know what Fibonacci is? Good, everybody knows what Fibonacci is. So it's a sequence where each number is the sum of the previous two numbers; the recurrence is shown in this brown box here. The sequence is named after Leonardo of Pisa, who was also known as Fibonacci, which is a contraction of "filius Bonacci," son of Bonacci — that's where the name Fibonacci came from. In Fibonacci's 1202 book, Liber Abaci, he introduced the sequence to Western mathematics, although it had previously been known to Indian mathematicians for several centuries. But this is what we call the sequence nowadays: the Fibonacci numbers.

All right, so here's a Fibonacci program. Has anyone seen this algorithm before? A couple of people — probably more, but people didn't raise their hands. OK, so it's a recursive program, and it basically implements the recurrence from the previous slide. If n is less than 2, we just return n; otherwise, we compute fib of n minus 1 and store that value in x, compute fib of n minus 2 and store that value in y, and then return the sum of x and y.

I do want to make a disclaimer to the algorithms police that this is actually a very bad algorithm: it takes exponential time, and there are much better ways to compute the nth Fibonacci number. There's a linear-time algorithm, which just computes the Fibonacci numbers bottom-up. The algorithm here is redoing a lot of the work, because it's computing the same Fibonacci numbers multiple times, whereas if you do a linear scan from the smallest numbers up, you only have to compute each one once. And there's actually an even better algorithm that takes logarithmic time, based on squaring matrices. Has anyone seen that algorithm before? Yeah, a couple of people. If you're interested in learning more about it, I encourage you to look at your favorite textbook, Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein. So even though this is a pretty bad algorithm, it's still a good educational example, because I can fit it on one slide and illustrate all the concepts of parallelism that we want to cover today.
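For reference, the sequential program on the slide is essentially the following (the 64-bit integer type is a choice made here, since the Fibonacci numbers grow quickly):

    #include <stdint.h>

    int64_t fib(int64_t n) {
      if (n < 2) return n;       // base case: fib(0) = 0, fib(1) = 1
      int64_t x = fib(n - 1);    // these two recursive calls are
      int64_t y = fib(n - 2);    // completely independent of each other
      return x + y;
    }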
So here's the execution tree for fib of 4. We see that fib of 4 is going to call fib of 3 and fib of 2; fib of 3 is going to call fib of 2 and fib of 1; and so on. You can see the repeated computations here: fib of 2 is being computed twice, and so on, and if you have a much larger tree — say you ran this on fib of 40 — then you'll have many more overlapping computations. It turns out that the two recursive calls can actually be parallelized, because they're completely independent calculations. So the key idea for parallelization is to simultaneously execute the two recursive subcalls to fib, and in fact you can do this recursively: the two subcalls of fib of 3 can also be executed in parallel, and the two subcalls of fib of 2 can also be executed in parallel, and so on. So you have all of these calls that can be executed in parallel, and that's the key idea for extracting parallelism from this algorithm.

OK, so let's now look at how we can use Pthreads to implement this simple Fibonacci algorithm. Pthreads is a standard API for threading, and it's supported on all Unix-based machines; if you're programming using Microsoft products, then the equivalent is WinAPI threads. Pthreads is actually an ANSI/IEEE standard — there's a number here that specifies the standard — but nowadays we just call it Pthreads. It's basically a do-it-yourself concurrency platform; it's like the assembly language of parallel programming. It's built as a library of functions with special non-C semantics, because if you're just writing code in C, you can't really say which parts of the code should be executed in parallel, so Pthreads provides you a library of functions that allow you to specify concurrency in your program. Each thread implements an abstraction of a processor, and these threads are then multiplexed onto the actual machine resources. The number of threads that you create doesn't necessarily have to match the number of processors on your machine; if you have more threads than processors, they'll just be multiplexed. So you can actually run a Pthreads program on a single core even though you have multiple threads in the program; they would just be time-sharing. All the threads communicate through shared memory, so they all have access to the same view of the memory, and the library functions that Pthreads provides mask the protocols involved in interthread coordination, so you don't have to do it yourself — it turns out that this is quite difficult to do correctly by hand.

So now I want to look at the key Pthreads functions. The first is pthread_create, and it takes four arguments. The first argument is of type pthread_t; this is basically going to store an identifier for the new thread that pthread_create will create, so that we can use that thread in our computations. The second, of type pthread_attr_t, sets some thread attributes, and for our purposes we can just set it to NULL and use the default attributes. The third argument is the function that's going to be executed after we create the thread, so we're going to need to define this function that we want the thread to execute. And finally we have this void-star arg argument, which stores the arguments that are going to be passed to the function that we're executing. pthread_create also returns an error status: an integer specifying whether the thread creation was successful or not.
Then there's another function called pthread_join. pthread_join basically says that we want to block at this point in our code until the specified thread finishes. It takes as an argument a pthread_t — one of the thread identifiers that was filled in when we called pthread_create — and a second argument, status, which is going to store the status of the terminating thread; pthread_join also returns an error status. So essentially what this does is say: wait until this thread finishes before we continue on in our program. Any questions so far?

OK, so here's what the implementation of Fibonacci looks like using Pthreads. On the left we see the original fib function that we had — that's just the sequential code — and then we have all this other stuff to enable it to run in parallel. First we have this struct on the left, thread_args; this struct is used to store the arguments that are passed to the function that the thread is going to execute. Then we have this thread_func: what that does is read the input argument from the thread_args struct, set that to i, call fib of i, store the result into the output field of the struct, and then return NULL.

Over on the right-hand side, we have the main function that will actually call the fib function on the left. We initialize a whole bunch of variables that we need to execute these threads, and then we first check whether n is less than 30. If n is less than 30, it turns out that it's actually not worth creating threads to execute this program in parallel, because of the overhead of thread creation, so we'll just execute the program sequentially. This idea is known as coarsening; you saw a similar example a couple of lectures ago when we did coarsening for sorting, but this is in the context of parallel programming — because there are some overheads to running a function in parallel, if the input size is small enough, sometimes you want to just execute it sequentially.

OK, so let me just walk through this code, since I have an animation. The next thing it's going to do is marshal the input argument to the thread: it's going to store the input argument, n minus 1, in this args struct. Then we're going to call pthread_create with the thread variable, NULL for the thread attributes, the thread_func that we defined on the left, and the args struct — and inside this args struct the input is set to n minus 1, which we did on the previous line. pthread_create is going to give us a return value, and if the thread creation was successful, the status is going to be zero and we can continue. When we continue, we execute fib of n minus 2 and store its result in our result variable, and this is done at the same time that fib of n minus 1 is executing, because we created this Pthread and told it to call the thread_func function that we defined on the left. So fib of n minus 1 and fib of n minus 2 are executing in parallel now. Then we have this pthread_join, which says we're going to wait until the thread that we created finishes before we move on, because we need to know the result of both of the subcalls before we can finish this function. Once that's done, we first check the status to see if it was successful, and if so, we add the output field of the args struct to the result — args.output will store the output of fib of n minus 1. So that's the Pthreads code. Any questions on how this works?
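Here's a compilable sketch along the lines of the code on the slides. The struct and function names (thread_args, thread_func) follow the lecture's description, but details like the command-line handling and treating any nonzero status as failure are just illustrative choices here:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int64_t fib(int64_t n) {          // ordinary sequential Fibonacci
      if (n < 2) return n;
      return fib(n - 1) + fib(n - 2);
    }

    typedef struct {
      int64_t input;                         // argument marshaled to the thread
      int64_t output;                        // result marshaled back
    } thread_args;

    static void *thread_func(void *ptr) {    // what the created thread runs
      thread_args *args = (thread_args *)ptr;
      args->output = fib(args->input);
      return NULL;
    }

    int main(int argc, char *argv[]) {
      int64_t n = (argc > 1) ? atoll(argv[1]) : 40;
      int64_t result;
      if (n < 30) {                          // coarsening: below this size the
        result = fib(n);                     // thread-creation overhead isn't worth it
      } else {
        pthread_t thread;
        thread_args args = { n - 1, 0 };     // marshal the argument n - 1
        int status = pthread_create(&thread, NULL, thread_func, &args);
        if (status != 0) return 1;
        result = fib(n - 2);                 // runs concurrently with fib(n - 1)
        status = pthread_join(thread, NULL); // wait for the spawned thread
        if (status != 0) return 1;
        result += args.output;               // add in fib(n - 1)
      }
      printf("fib(%lld) = %lld\n", (long long)n, (long long)result);
      return 0;
    }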
Yes? Yeah, so this is because the pthread_create function takes as input a void-star pointer — it's actually a generic function, so it doesn't know what the data type is; it has to work for all data types, and that's why we need to cast our argument to void star when we pass it to pthread_create. Then, inside the thread_func, we actually do know what type of pointer it is, so we cast it back.

So does this code seem very parallel? How many parallel calls am I doing here? Yeah — so I'm only creating one thread, so I'm executing two things in parallel. If I ran this code on four processors, what's the maximum speedup I could get? The maximum speedup I can get is just two, because I'm only running two things in parallel. This code doesn't recursively create threads; it only creates one thread at the top level. And if you wanted to make it so that this code actually recursively created threads, it would become much more complicated. That's one of the disadvantages of implementing this code in Pthreads, and we'll look at other solutions that make this task much easier.

Some of the issues with Pthreads are shown on this slide. First, there's a high overhead to creating a thread: creating a thread typically takes over 10^4 cycles, and this leads to very coarse-grained concurrency, because your tasks have to do a lot of work in order to amortize the cost of creating that thread. There are things called thread pools, which can help; the idea is to create a whole bunch of threads at the same time to amortize the cost of thread creation, and then when you need a thread you just take one from the thread pool, which contains threads that are just waiting to do work. Second, there's a scalability issue with the code that I showed on the previous slide: the Fibonacci code gets at most about a 1.5x speedup on two cores. Why is it 1.5? Does anyone know? Yeah — so it turns out that the two calls I'm executing in parallel aren't doing the same amount of work: one is computing fib of n minus 1, and one is computing fib of n minus 2. Does anyone know what the ratio between these two values is? Yeah, it's the golden ratio, about 1.6. If you could get a speedup of 1.6, that would be great, but there are some overheads, so this code gets about a 1.5x speedup. And if you want to take advantage of more cores, you need to rewrite this code, and it becomes more complicated. Third, there's the issue of modularity. If you look at this code, you see that the Fibonacci logic is not nicely encapsulated within one function: we have that logic in the function on the left, but we also have some of the fib logic on the right, in our main function. This makes the code not modular, and if we want to build programs on top of this, it makes it very hard to maintain — if we want to change the logic of the Fibonacci function a little bit, we now have to change it in multiple places instead of just one. So it's not a good idea to write code that's not modular; please don't do that in your projects. And then finally, the code becomes complicated because you have to actually move these arguments around.
That's known as argument marshaling, and then you have to engage in error-prone protocols in order to do load balancing. If you recall, here we have to actually place the argument n minus 1 into args.input, and we have to extract the value out of args.output, and that makes the code very messy. So why do I say "shades of 1958" here? What happened in 1958? Who was around in 1958? Just Charles. So there was a first something in 1958 — what was it? It turns out that in 1958 we had the first compiler, the Fortran compiler. Before we had the Fortran compiler, programmers were writing things in assembly, and when you write things in assembly, you have to do argument marshaling, because you have to place things into the appropriate registers before calling a function, and also move things around when you return from a function. The nice thing about the first compiler is that it did all of this argument marshaling for you: you can just pass arguments to a function, and the compiler will generate code that does the argument marshaling for us. So having to do this in Pthreads is similar to having to write code in assembly, because you have to manually marshal these arguments. Hopefully there are better ways to do this, and indeed we'll look at some other solutions that make it easier on the programmer. Any questions before I continue?

All right, so we looked at Pthreads; next let's look at Threading Building Blocks. Threading Building Blocks, or TBB, is a library solution developed by Intel. It's implemented as a C++ library that runs on top of native threads, so the underlying implementation uses threads, but the programmer doesn't deal with threads; instead, the programmer specifies tasks, and these tasks are automatically load-balanced across the threads using a work-stealing algorithm inspired by research at MIT — Charles Leiserson's research. The focus of Intel TBB is on performance, and as we'll see, the code written using TBB is simpler than what you would have to write if you used Pthreads.

So let's look at how we can implement Fibonacci using TBB. In TBB, we have to create these tasks, so in the Fibonacci code we create this FibTask class, and inside the task we have to define this execute function. The execute function is the function that performs the computation when we start the task, and this is where we define the Fibonacci logic. The task also takes these arguments: n is the input, and sum is the output. In TBB, we can easily create a recursive program that extracts more parallelism. Here what we're doing is recursively creating two child tasks, a and b — that's the syntax for creating a task — and here we can just pass the two arguments to FibTask instead of marshaling the arguments ourselves. Then we have this set_ref_count, and this is basically the number of tasks that we have to wait for, plus one for ourselves: in this case we created two child tasks and we have ourselves, so that's two plus one. After that, we start task b using the spawn(b) call, and then we call spawn_and_wait_for_all with a as the argument, which says we're going to start task a and then wait for both a and b to finish before we proceed. This spawn_and_wait_for_all call is going to look at the ref count that we set above and wait for that many tasks to finish before it continues. And then, after both a and b have completed, we can just sum up the results and store them in the sum variable.

Here, these tasks are created recursively, so unlike the Pthreads implementation, which was only creating one thread at the top level, we're recursively creating more and more tasks, so we can actually get more parallelism from this code and scale to more processors. We also need a main function to start up the program: we create a root task, which just computes fib of n, and then we call spawn_root_and_wait(a), where a is the task for the root, and it will just run the root task. So that's what Fibonacci looks like in TBB. This is much simpler than the Pthreads implementation, and it also gets better performance, because we can extract more parallelism from the computation. Any questions?
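For concreteness, here's roughly what that task class looks like written against the classic tbb::task interface the lecture describes. This low-level interface has since been deprecated in newer TBB releases, and the exact slide code may differ in details, so treat this as a sketch of the lecture's version rather than the definitive code:

    #include <tbb/task.h>
    #include <cstdint>
    #include <cstdio>

    class FibTask : public tbb::task {
    public:
      const int64_t n;       // input
      int64_t* const sum;    // where to write the output
      FibTask(int64_t n_, int64_t* sum_) : n(n_), sum(sum_) {}

      tbb::task* execute() {               // runs when the task starts
        if (n < 2) {
          *sum = n;
        } else {
          int64_t x, y;
          // Recursively create two child tasks.
          FibTask& a = *new (allocate_child()) FibTask(n - 1, &x);
          FibTask& b = *new (allocate_child()) FibTask(n - 2, &y);
          set_ref_count(3);                // two children + one for the wait
          spawn(b);                        // start b ...
          spawn_and_wait_for_all(a);       // ... start a, then wait for both
          *sum = x + y;
        }
        return NULL;
      }
    };

    int main() {
      int64_t result;
      FibTask& root = *new (tbb::task::allocate_root()) FibTask(30, &result);
      tbb::task::spawn_root_and_wait(root);  // run the root task to completion
      std::printf("%lld\n", (long long)result);
      return 0;
    }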
OK, so TBB also has many other features in addition to tasks. TBB provides many C++ templates to express common patterns, and you can use these templates on different data types. They have a parallel_for, which is used to express loop parallelism, so you can loop over a bunch of iterations in parallel. They also have a parallel_reduce for data aggregation — for example, if you want to sum together a whole bunch of values, you can use a parallel_reduce to do that in parallel. They also have pipeline and filter, which are used for software pipelining. TBB provides many concurrent container classes, which allow multiple threads to safely access and update the items in the container concurrently. For example, they have hash tables, trees, priority queues, and so on, and you can just use these out of the box and they work in parallel — you can do concurrent updates and reads to these data structures. TBB also has a variety of mutual exclusion library functions, such as locks and atomic operations. So there are a lot of features in TBB, which is why it's one of the more popular concurrency platforms; because of all these features, you don't have to implement many of these things yourself, and you can still get pretty good performance.

So TBB was a library solution to the concurrency problem. Now we're going to look at two linguistic solutions: OpenMP and Cilk. Let's start with OpenMP. OpenMP is a specification by an industry consortium, and there are several compilers available that support OpenMP, both open source and proprietary; nowadays GCC, ICC, and Clang all support OpenMP, as well as Visual Studio. OpenMP provides linguistic extensions to C and C++, as well as Fortran, in the form of compiler pragmas: you use these compiler pragmas in your code to specify which pieces of code should run in parallel. OpenMP also runs on top of native threads, but the programmer isn't exposed to these threads. OpenMP supports loop parallelism, so you can do parallel for loops, and it has task parallelism as well as pipeline parallelism.

So let's look at how we can implement Fibonacci in OpenMP. This is the entire code — I want you to compare this to the Pthreads implementation that we saw ten minutes ago. This code is much cleaner than the Pthreads implementation, and it also performs better. So let's see how this code works. We have these compiler pragmas, or compiler directives, and the compiler pragma for creating a parallel task is omp task. We're going to create an OpenMP task for fib of n minus 1, as well as for fib of n minus 2. There's also this shared clause, which specifies that the two variables in the arguments are shared across different threads — you have to specify whether variables are private or shared. And then the pragma omp taskwait just says we're going to wait for the preceding tasks to complete before we continue, so here it's going to wait for fib of n minus 1 and fib of n minus 2 to finish before we return the result, which is what we want. After that, we just return x plus y. So that's the entire code.
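Here's a compilable sketch of that idea (built with a flag like -fopenmp). The fib function is essentially what the lecture describes; the main wrapper with the parallel and single pragmas is added here, as one common way to give the tasks a team of threads to run on, and as in the lecture, real code would also coarsen small n:

    #include <stdint.h>
    #include <stdio.h>

    int64_t fib(int64_t n) {
      if (n < 2) return n;
      int64_t x, y;
      #pragma omp task shared(x)       // child task computes fib(n - 1) into x
      x = fib(n - 1);
      #pragma omp task shared(y)       // second child task computes fib(n - 2) into y
      y = fib(n - 2);
      #pragma omp taskwait             // wait for both tasks before summing
      return x + y;
    }

    int main(void) {
      int64_t result;
      #pragma omp parallel             // create the team of worker threads ...
      #pragma omp single               // ... but let just one of them start fib
      result = fib(30);
      printf("%lld\n", (long long)result);
      return 0;
    }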
OpenMP also provides many other pragma directives in addition to task. You can use a parallel for to do loop parallelism; there's reduction; and there are also directives for scheduling and data sharing, so you can specify how you want a particular loop to be scheduled. OpenMP has many different scheduling policies — static scheduling, dynamic scheduling, and so on — and these scheduling directives also have different grain sizes. The data-sharing directives specify whether variables are private or shared. OpenMP also supplies a variety of synchronization constructs, such as barriers, atomic updates, and mutual exclusion, or mutex, locks. So OpenMP also has many features, and it's also one of the more popular solutions for writing parallel programs. As you saw in the previous example, the code is much simpler than if you were to write it using Pthreads, or even TBB — this is a much simpler solution. Any questions?

Yeah, so this code here is actually independent of the number of processors. There's a scheduling algorithm that will determine how the tasks get mapped to different processors, so if you spawn a new task it doesn't necessarily go onto a different processor, and you can have more tasks than the number of processors available. The scheduling algorithm takes care of how these tasks get mapped to different processors, and that's hidden from the programmer, although you can use these scheduling pragmas to give hints to the compiler about how it should schedule things. Yes? So, underneath, this is implemented using Pthreads, which has to make operating system calls to basically talk to the processor cores directly and do the multiplexing and so forth, so the operating system is involved at a very low level.

OK, so the last concurrency platform that we'll be looking at today is Cilk — we're actually going to look at Cilk Plus. The Cilk part of Cilk Plus is a small set of linguistic extensions to C and C++ that support fork-join parallelism; for example, the Fibonacci example uses fork-join parallelism, so you can use Cilk to implement that. The Plus part of Cilk Plus supports vector parallelism, which you had experience working with in your homeworks. Cilk Plus was initially developed by Cilk Arts, which was an MIT spin-off, and Cilk Arts was acquired by Intel in July 2009. The Cilk Plus implementation is based on the award-winning Cilk multithreaded language that was developed two decades ago here at MIT by Charles Leiserson's research group, and it features a provably efficient work-stealing scheduler. This scheduler is provably efficient — you can actually prove theoretical bounds on it — and this allows you to implement theoretically efficient algorithms, which we'll talk more about in another lecture on algorithm design. Charles Leiserson has a very famous paper that proves this scheduler is optimal, so if you're interested in reading about it, you can talk to us offline. Cilk Plus also provides a hyperobject library for parallelizing code with global variables, and you'll have a chance to play around with hyperobjects in the homework.
The Cilk Plus ecosystem also includes useful programming tools, such as the Cilkscreen race detector — this allows you to detect determinacy races in your program, to help you isolate bugs and performance bottlenecks. It also has a scalability analyzer called Cilkview, and Cilkview will basically analyze the amount of work that your program is doing, as well as the maximum amount of parallelism that your code could possibly extract from the hardware.

So that's Intel Cilk Plus, but it turns out that we're not actually going to be using Intel Cilk Plus in this class. We're going to be using a better compiler, and this compiler is based on Tapir/LLVM; it supports the Cilk subset of Cilk Plus. Tapir/LLVM was actually recently developed at MIT by T.B. Schardl, who gave a lecture last week, William Moses, who is a grad student working with Charles, and Charles Leiserson — so we're talking a lot about Charles's work today. Tapir/LLVM generally produces better code, relative to its base compiler, than all other implementations of Cilk out there, so it's the best Cilk compiler available today. They actually wrote a very nice paper on this last year — Charles Leiserson and his group — and that paper received the Best Paper Award at the annual Symposium on Principles and Practice of Parallel Programming, or PPoPP, so you should look at that paper as well. Right now Tapir/LLVM uses the Intel Cilk Plus runtime system, but I believe Charles's group has plans to implement a better runtime system. Tapir/LLVM also supports more general features than existing Cilk compilers: in addition to spawning functions, you can also spawn code blocks that are not separate functions, and this makes writing programs more flexible — you don't have to create a separate function if you want to execute a code block in parallel. Any questions?

So this is the Cilk code for Fibonacci. It's also pretty simple; it looks very similar to the sequential program, except we have these cilk_spawn and cilk_sync statements in the code. So what do these statements do? cilk_spawn says that the named child function — the function right after the cilk_spawn statement — may execute in parallel with the parent caller, where the parent caller is the function that is calling cilk_spawn. So this says that fib of n minus 1 can execute in parallel with the function that called it, and then this function goes on to call fib of n minus 2, so fib of n minus 2 and fib of n minus 1 can now be executing in parallel. Then cilk_sync says that control cannot pass this point until all of the spawned children have returned, so it's going to wait for fib of n minus 1 to return before we go to the return statement, where we add up x and y.

One important thing to note is that the Cilk keywords grant permission for parallel execution, but they don't actually force or command parallel execution. So even though I said cilk_spawn here, the runtime system doesn't necessarily have to run fib of n minus 1 in parallel with fib of n minus 2; I'm just saying that I could run these two things in parallel, and it's up to the runtime system to decide whether or not to run them in parallel, based on its scheduling policy.
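Written out, the code on the slide is essentially this (modulo the exact integer types):

    #include <cilk/cilk.h>
    #include <stdint.h>

    int64_t fib(int64_t n) {
      if (n < 2) return n;
      int64_t x = cilk_spawn fib(n - 1);  // child may run in parallel with the caller
      int64_t y = fib(n - 2);             // the parent continues with fib(n - 2)
      cilk_sync;                          // wait for the spawned child to return
      return x + y;
    }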
So let's look at another example of Cilk: loop parallelism. Here we want to do a matrix transpose, and we want to do it in place. The idea is that we want to swap each element below the diagonal with its mirror image above the diagonal, and here's some code to do this. We have a cilk_for — this is basically a parallel for loop — that goes from i equals 1 to n minus 1, and then the inner for loop goes from j equals 0 up to i minus 1, and then we just swap a[i][j] with a[j][i] using the three statements inside the body of the loop. So to execute a for loop in parallel, you just have to add cilk_ to the for keyword, and that's as simple as it gets. This code is actually going to run in parallel and get a pretty good speedup for this particular problem.

Internally, cilk_for loops are transformed into nested cilk_spawn and cilk_sync calls. The compiler is going to get rid of the cilk_for and change it into cilk_spawn and cilk_sync: it recursively divides the iteration space in half, spawns off one half, and executes the other half in parallel with that, and it recursively does that until the iteration range becomes small enough that it doesn't make sense to execute it in parallel anymore, at which point it just executes that range sequentially. So that's loop parallelism in Cilk. Any questions?

Yes — if the loop increment is something other than one, can I still do that? Yeah, so the compiler can actually figure out what the iteration space is, so you don't necessarily have to be incrementing by one; you can do something else. You just have to guarantee that all of the iterations are independent — if you have a determinacy race across the different iterations of your cilk_for loop, then your result might not necessarily be correct, so you have to make sure that the iterations are indeed independent. Yes? Yeah, so you can nest cilk_for loops. It turns out that for this example, you usually already have enough parallelism in the outer loop for large enough values of n, so it doesn't make sense to put a cilk_for loop inside, because using a cilk_for loop adds some additional overhead. But you can actually do nested cilk_for loops, and in some cases it does make sense, especially if there's not enough parallelism in the outermost for loop. It's a good question. Yes? So it has a bunch of calls to the Cilk runtime system — I don't know all the details, because I haven't looked at this recently, but I think you can actually generate the assembly code using a flag in the Clang compiler, so that's a good exercise. Yeah, you probably want to look at the LLVM IR rather than the assembly to begin with, to understand what's going on. It has three instructions that are not in standard LLVM, which were added to support parallelism. When it's lowered into assembly, each of those instructions becomes a bunch of assembly language instructions, so you don't want to mess with looking at it in the assembler until you see what it looks like in the LLVM IR first. Good question. Any other questions about this code?
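Here's a sketch of that loop; the element type (double here) and the exact matrix representation are illustrative, since the slide may declare the matrix differently:

    #include <cilk/cilk.h>
    #include <stdint.h>

    // In-place transpose of an n-by-n matrix a: swap each element below
    // the diagonal with its mirror image above the diagonal.
    void transpose(double** a, int64_t n) {
      cilk_for (int64_t i = 1; i < n; ++i) {   // outer-loop iterations may run in parallel
        for (int64_t j = 0; j < i; ++j) {      // inner loop stays serial
          double temp = a[i][j];
          a[i][j] = a[j][i];
          a[j][i] = temp;
        }
      }
    }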
OK, so let's look at another example. Let's say we had this for loop where on each iteration i we just increment a variable sum by i, so this is essentially going to compute the summation of everything from i equals 0 up to n minus 1, and then print out the result. One straightforward way to try to parallelize this code is to just change the for to a cilk_for. So does this code work? Who thinks that this code doesn't work, or doesn't compute the correct result? About half of you. And who thinks this code does work? A couple of people, and I guess the rest of the people don't care. So it turns out that it's not necessarily going to give you the right answer, because the cilk_for loop says you can execute these iterations in parallel, but they're all updating the same shared variable, sum. So you have what's called a determinacy race, where multiple processors can be writing to the same memory location. We'll talk much more about determinacy races in the next lecture, but for this example, it's not necessarily going to work if you run it on more than one processor.

Cilk actually has a nice way to deal with this. In Cilk we have something known as a reducer — this is one example of a hyperobject, which I mentioned earlier. With a reducer, instead of declaring this sum variable just as an unsigned long, you use this macro called CILK_C_REDUCER_OPADD, which specifies that we want to create a reducer with the addition function; then we have the variable name, sum, the data type, unsigned long, and the initial value, zero. Then we have a macro to register this reducer, CILK_C_REGISTER_REDUCER. Now, inside the cilk_for loop, we increment REDUCER_VIEW of sum — which is another macro — by i, and you can actually execute this in parallel and it will give you the same answer that you would get if you ran it sequentially. The reducer takes care of this determinacy race for you, and at the end, when you print out the result, you'll see that the sum is equal to the sum you expect. Then, after you finish using the reducer, you use this other macro, CILK_C_UNREGISTER_REDUCER of sum, which tells the system that you're done using this reducer. So this is one way to deal with this problem, when you want to do a reduction.

It turns out that there are many other interesting reduction operators that you might want to use, and in general you can create reducers for monoids — algebraic structures that have an associative binary operation as well as an identity element. The addition operator is a monoid because it's associative, it's binary, and the identity element is zero. Cilk also has several other predefined reducers, including multiplication, min, max, and, or, xor, and so on — these are all monoids as well — and you can also define your own reducers. In fact, in the next homework you'll have the opportunity to play around with reducers and write a reducer for lists. So that's reducers.
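Putting those macros together, the summation example looks roughly like this; the value of n and the exact printout are illustrative, not necessarily what's on the slide:

    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>
    #include <stdio.h>

    int main(void) {
      long n = 1000000;
      // An addition reducer named sum, of type unsigned long, initialized to 0.
      CILK_C_REDUCER_OPADD(sum, ulong, 0);
      CILK_C_REGISTER_REDUCER(sum);
      cilk_for (long i = 0; i < n; ++i) {
        // Each worker increments its own local view of sum, so there is
        // no determinacy race on the shared variable.
        REDUCER_VIEW(sum) += i;
      }
      printf("The sum is %lu\n", REDUCER_VIEW(sum));
      CILK_C_UNREGISTER_REDUCER(sum);
      return 0;
    }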
Another nice thing about Cilk is that there's always a valid serial interpretation of the program: the serial elision of a Cilk program is always a legal interpretation. For the Cilk source code on the left, the serial elision is basically the code you get if you get rid of the cilk_spawn and cilk_sync statements; this looks just like the sequential code. Remember that the Cilk keywords grant permission for parallel execution, but they don't necessarily command parallel execution, so if you ran this Cilk code using a single core, it wouldn't actually create these parallel tasks, and you would get the same answer as the sequential program — the serial elision is also a correct interpretation. This is unlike other solutions such as TBB and Pthreads, where it's actually difficult to get a program that does what the sequential program does, because they're doing a lot of additional work to set up these parallel calls and create these argument structures and other scheduling constructs. Whereas in Cilk it's very easy to get the serial elision: you just define cilk_spawn and cilk_sync to be null, and you define cilk_for to be for, and then this gives you a valid sequential program. So when you're debugging code, you might first want to check whether the serial elision of your Cilk program is correct, and you can easily do that by using these macros — or actually, there's a compiler flag that will do that for you and give you the equivalent C program. This is a nice way to debug, because you don't have to start with the parallel program; you can first check whether the serial program is correct before you go on to debug the parallel program. Questions?

Yes — so it turns out that by default it groups a bunch of iterations together into a single task, because it doesn't make sense to break the loop down into such small chunks, due to the overheads of parallelism. But there's actually a setting you can use to change the grain size of the cilk_for loop, so you could make it so that each iteration is its own task, and then, as you said, the scheduler will decide how to map these different tasks onto different processors, or even whether it wants to execute any of these tasks in parallel. It's a good question.

OK, so the idea in Cilk is to allow the programmer to express the logical parallelism in an application: the programmer just has to identify which pieces of the code could be executed in parallel, but doesn't necessarily have to determine which pieces of code should be executed in parallel. Cilk has a runtime scheduler that will automatically map the executing program onto the available processor cores at runtime, and it does this dynamically using a work-stealing scheduling algorithm; the work-stealing scheduler is used to balance the tasks evenly across the different processors, and we'll talk more about it in a future lecture. But I want to emphasize that, unlike the other concurrency platforms that we looked at today, Cilk's work-stealing scheduling algorithm is theoretically efficient, whereas the OpenMP and TBB schedulers are not theoretically efficient. This is a nice property, because it guarantees that the algorithms you write on top of Cilk will also be theoretically efficient.

OK, so here's a high-level illustration of the Cilk ecosystem. It's a very simplified view, but I did this to fit it on a single slide. What you do is take the Cilk source code and pass it to your favorite Cilk compiler — the Tapir/LLVM compiler — and this gives you a binary that you can run on multiple processors. Then you pass a program input to the binary, run it on however many processors you have, and this allows you to benchmark the parallel performance of your program. You can also do serial testing: to do this, you just obtain the serial elision of the Cilk program and pass it to an ordinary C or C++ compiler, which generates a binary that can only run on a single processor, and you run your suite of serial regression tests on this single-threaded binary. This allows you to benchmark the performance of your serial code and also debug any issues that might have arisen when running the program sequentially. Another way to do this is to compile the original Cilk code but run it on a single processor: there's a command-line argument that tells the runtime system how many processors you want to use, and if you set that parameter to 1, it will only use a single processor. This allows you to benchmark the single-threaded performance of your code as well, and the parallel program executing on a single core should behave exactly the same way as the execution of the serial elision.
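As a concrete note on that knob: in the Intel Cilk Plus runtime that Tapir/LLVM currently links against, the number of workers is usually controlled by the CILK_NWORKERS environment variable, or from inside the program with __cilkrts_set_param — this is an assumption about the particular runtime in use, so check its documentation for the exact interface on your setup. A minimal sketch:

    #include <cilk/cilk_api.h>

    int main(void) {
      // Equivalent to running the program with CILK_NWORKERS=1 set in the
      // environment: use one worker, i.e., effectively serial execution.
      __cilkrts_set_param("nworkers", "1");
      /* ... run the parallel computation as usual ... */
      return 0;
    }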
So that's one of the advantages of using Cilk: because you can easily do serial testing using the Cilk platform, you can separate out the serial correctness from the parallel correctness. As I said earlier, you can first debug the serial correctness, as well as any performance issues, before moving on to the parallel version. Another point I want to make is that because Cilk actually uses the serial program inside its tasks, it's good to optimize the serial program even when you're writing a parallel program — optimizing the serial program for performance will also translate to better parallel performance.

Another nice feature of Cilk is that it has this tool called Cilksan, which stands for Cilk sanitizer. Cilksan will detect any determinacy races that you have in your code, which will significantly help you in debugging the correctness, as well as the performance, of your code. If you compile the Cilk code using the Cilksan flag, it will generate an instrumented binary that, when you run it, will find and localize all of the determinacy races in your program — it will tell you where the determinacy races occur, so that you can go inspect that part of your code and fix it if necessary. So this is a very useful tool for debugging your parallel programs. Cilk also has another nice tool called Cilkscale. Cilkscale is a performance analyzer: it will analyze how much parallelism is available in your program, as well as the total amount of work that it's doing. Again, you pass a flag to the compiler that turns on Cilkscale, and it will generate an instrumented binary, and when you run this code it will give you a scalability report. You'll find these tools very useful when you're doing the next project, and we'll talk a little bit more about these two tools in the next lecture. As I said, Cilkscale will analyze how well your program will scale to larger machines — it will basically tell you the maximum number of processors that your code could possibly take advantage of. Any questions? Yes — so the Cilk runtime scheduler does the scheduling of the different tasks when you're running the program, so it's linked with the binary; it's not stored in the same place. Other questions?

All right, so let me summarize what we looked at today. First, we saw that most processors today have multiple cores, and probably all of your laptops have more than one core on them. Who has a laptop that only has one core? OK, when did you buy it? Yeah, probably a long time ago, I see. So nowadays, obtaining high performance on your machines requires you to write parallel programs, but parallel programming can be very hard, especially if you have to program directly on the processor cores and interact with the operating system yourself. Cilk is very nice because it abstracts the processor cores from the programmer, handles synchronization and communication protocols, and also performs provably good load balancing. In the next project, you'll have a chance to play around with Cilk: you'll be implementing your own parallel screensaver, so that's a very fun project to do, and possibly in one of the future lectures we'll post some of the nicest screensavers that students developed, for everyone to see. OK, so that's all.
support will help MIT OpenCourseWare continue to offer high quality educational resources for free to make a donation or to view additional materials from hundreds of MIT courses visit MIT opencourseware at ocw.mit.edu today we're gonna talk about multi-core programming and I as I was just informed by Charles it's 2018 I had 2017 on the slide okay so first congratulations to all of you you turned in the first projects beta here's a here's a plot showing the tiers that different groups reach for the beta and this is in sorted order and we set the beta caught off to be tier 45 the final cut off is tier 48 so the final cut off we did set a little bit aggressively but keep in mind that you don't necessarily have to get to the final cut off in order to get an A on this project okay so we're going to talk about multi-core processing today that's gonna be a topic of the next project after you finish the first project so so in a multi-core processor we have a whole bunch of cores that are all placed on the same chip and they have access to shared memory they usually also have some sort of private cache and then a shared last level cache so l3 in this case and then they they all have access the same memory controller which goes out to main memory and then they also have access to IO but for a very long time chips only had a single core on them so why do we have multi-core processors nowadays why did semiconductor vendors start producing chips that had multiple processor cores on them so the answer is because of two things so first there's Moore's law which says that we get more trans sisters every year so the number of transistors that you can fit on a chip doubles approximately every two years and secondly there's an end of scaling of clock frequencies so for a very long time we could just keep increasing the frequency of the single core on the chip but at around 2004 to 2005 that was no longer the case we couldn't scale the clock frequency anymore so here's a plot showing both the number of transistors you can fit on the chip over time as well as a clock frequency of the processors over time and notice that the y-axis is in log scale here and the blue line is basically Moore's law which says that the number of transistors you can fit on the chip doubles approximately every two years and that's been growing pretty steadily so this plot goes up to 2010 but in fact it's been growing even up until the president it will continue to grow for a couple more years before Moore's law ends however if you look at the clock frequency line you see that it was growing quite steadily until about the early 2000s and then at that point is sort of flattened out so at that point we couldn't increase the clock frequencies anymore and the clock speed was bounded at about 4 gigahertz so nowadays if you go buy a processor is usually still bounded by around 4 gigahertz it's usually a little bit less than 4 gig Hertz because it doesn't really make sense to push it all the way but you might find some processors that are around 4 gigahertz nowadays so so what happened at around 2004 to 2005 does anyone know okay so so Moore's law basically says that we can fit more transistors on a chip because the transistors become smaller and when the transistors become smaller you can reduce the voltage that's needed to operate the transistors and as a result you could increase the clock frequency while maintaining the same power density and that's what manufacturers did until about 2004 to 2005 they just kept increasing the clock 
So we're going to look at the cache coherence protocol, and then, after looking at the hardware, we're going to
look at some software solutions for writing parallel programs on these multi-core machines to take advantage of the extra cores. We're going to look at several concurrency platforms, listed here: pthreads, which is basically a low-level API for running your code in parallel (and if you program on Microsoft products, WinAPI threads is pretty similar); Intel Threading Building Blocks, which is a library solution to concurrency; and then two linguistic solutions, OpenMP and Cilk Plus. Cilk Plus is actually the concurrency platform that we'll be using for most of this class.

All right, so let's look at how caches work. Say we have a value at some location in memory, say x = 3. If one processor wants to load x, it reads the value from main memory, brings it into its own cache, and also loads it into one of its registers. It keeps the value in cache so that if it wants to access it again in the near future, it doesn't have to go all the way out to main memory; it can just look at the value in its cache. Now what happens if another processor wants to load x? It just does the same thing: it reads the value from main memory, brings it into its cache, and loads it into one of its registers. And the same thing for another processor. It turns out that you don't actually always have to go out to main memory to get the value: if the value resides in one of the other processors' caches, you can also get it through the other processor's cache, and sometimes that's cheaper than going all the way out to main memory. OK, so now the second processor loads x again, and it's in its cache, so it doesn't have to go to main memory or to anybody else's cache.

So what happens now if we want to store x, to set its value to something else? Say this processor wants to set x equal to 5: it writes x = 5 and stores that result in its own cache. That's all well and good. Now what happens when the first processor wants to load x? It sees that x is in its own cache, so it just reads the value there and gets 3. So what's the problem? Yes: the problem is that the value of x in the first processor's cache is stale, because another processor updated it, so that cached value of x is now invalid. That's the problem, and one of the main challenges of multi-core hardware is to solve this problem of cache coherence: making sure that the values in different processors' caches are consistent across updates.

One basic protocol for solving this problem is known as the MSI protocol. In this protocol each cache line is labeled with a state, and there are three possible states: M, S, and I. This is done at the granularity of cache lines, because it turns out that storing this information is relatively expensive, so you don't want to store it for every memory location; it's done on a per-cache-line basis. Does anyone know the size of a cache line on the machines that we're using? Yes, it's 64 bytes, and that's typically what you see today on most Intel and AMD machines. There are some architectures with different cache line sizes, like 128 bytes, but for our class the machines we're using have 64-byte cache lines. That's important to remember, so that your back-of-the-envelope calculations give accurate estimates.
So, the three states in the MSI protocol are M, S, and I. M stands for modified: when a cache block is in the modified state, no other cache can contain this block in the M or S states. The S state means the block is shared, so other caches can also have this block in the shared state. Finally, I means the cache block is invalid, which is essentially the same as the block not being in the cache at all. To solve the problem of cache coherence, when one cache modifies a location it has to inform all the other caches that their values are now stale, because this cache modified the value; so it invalidates all of the other copies of that cache line in the other caches by changing their state from S to I.

Let's see how this works. Say the second processor wants to store y = 5. Previously the value of y was 17, and the cache line containing y = 17 was in the shared state. Now, when it does y = 5, the second processor's copy of that cache line is set to the modified state, and the cache line is invalidated in all of the other caches that contain it; so the first cache and the fourth cache each now have state I for the line that still holds y = 17. Any questions? Yes: there are actually some protocols that do that. This is just the most basic protocol, so it doesn't, but there are some protocols used in practice that actually update the other copies. It's a good point, but I just want to present the most basic protocol for now.

When you load a value, you can first check whether your cache line is in the M or S state, and if it is, you can just read that value directly. But if it's in the I state, or if the line isn't there, you have to fetch the block from either another processor's cache or from main memory. It turns out there are many other protocols out there: there's MESI, there's MOESI, and many others, and some of them are proprietary. They all do different things, and all of these protocols are quite complicated and very hard to get right; in fact, one of the earliest successes of formal verification was in proving some of these cache coherence protocols correct. Yes, question? If two processors try to modify the value, one of them has to happen first; the hardware takes care of that. The first one that modifies it will invalidate all the other copies, and then the second one that modifies the value will again invalidate all the other copies. When a lot of processors try to modify the same value, you get something known as an invalidation storm: a bunch of invalidation messages going throughout the hardware, and that can become a big performance bottleneck, because each processor, when it modifies the value, has to inform all the other processors, and if all the processors are modifying the same value you get sort of quadratic behavior. The hardware is still going to guarantee that one of the processors ends up writing the value, but you should be aware of this performance issue when you're writing parallel code. Yes: this is all implemented in hardware.
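Just to make the write path concrete, here is a toy software model of the invalidation step described above. This is my own illustration, not how real hardware implements it; the real protocol lives in the cache controllers.

```cpp
#include <cstddef>

enum class State { M, S, I };   // Modified, Shared, Invalid

// Toy model: conceptually, when cache `writer` writes a line, its own copy
// goes to M and every other cache's copy of that line is invalidated.
void write_line(std::size_t writer, State line_state[], std::size_t num_caches) {
  for (std::size_t c = 0; c < num_caches; ++c) {
    if (c == writer) {
      line_state[c] = State::M;        // writer's copy: S (or I) -> M
    } else if (line_state[c] != State::I) {
      line_state[c] = State::I;        // all other valid copies: -> I
    }
  }
}
```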
If you take a computer architecture class, you'll learn much more about these protocols and all of their variants. For our purposes we don't actually need to understand all the details of the hardware; we just need to understand what it's doing at a high level, so that we can tell when we have a performance bottleneck and why. That's why I'm just introducing the most basic protocol here. Any other questions?

All right, so I talked a little bit about the shared-memory hardware. Let's now look at some concurrency platforms; these are the four platforms we'll be looking at today. First, what is a concurrency platform? Writing parallel programs is very difficult: it's very hard to get these programs correct, and if you want to optimize their performance it becomes even harder, so it's very painful and error-prone. A concurrency platform abstracts the processor cores, handles synchronization and communication protocols, and also performs load balancing for you, so it makes your life much easier.

To illustrate these concurrency platforms, I'm going to use the Fibonacci numbers as an example. Does anybody not know what Fibonacci is? Good, everybody knows. It's a sequence where each number is the sum of the previous two numbers; the recurrence is shown in the brown box here. The sequence is named after Leonardo di Pisa, who was also known as Fibonacci, which is a contraction of filius Bonacci, "son of Bonacci"; that's where the name came from. In his 1202 book Liber Abaci, Fibonacci introduced the sequence to Western mathematics, although it had previously been known to Indian mathematicians for several centuries; but nowadays we call these the Fibonacci numbers.

So here's a Fibonacci program. Has anyone seen this algorithm before? A couple of people; probably more, but people didn't raise their hands. It's a recursive program that implements the recurrence from the previous slide: if n is less than 2 we just return n; otherwise we compute fib(n-1) and store that value in x, compute fib(n-2) and store that value in y, and return the sum of x and y. I do want to make a disclaimer to the algorithms police that this is actually a very bad algorithm: it takes exponential time, and there are much better ways to compute the n-th Fibonacci number. There's a linear-time algorithm that computes the Fibonacci numbers bottom-up. The algorithm here redoes a lot of work, computing the same Fibonacci numbers multiple times, whereas if you do a linear scan from the smallest numbers up, you only have to compute each one once. There's an even better algorithm that takes logarithmic time, based on squaring matrices; has anyone seen that one? A couple of people. If you're interested in learning more about it, I encourage you to look at your favorite textbook, Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein. Even though this is a pretty bad algorithm, it's still a good educational example, because I can fit it on one slide and use it to illustrate all the concepts of parallelism that we want to cover today.
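For concreteness, the recursive program just described looks roughly like this (my transcription of the slide), together with the linear-time bottom-up version mentioned in passing:

```cpp
#include <cstdint>

// The recursive Fibonacci from the slide (roughly): exponential time,
// but a convenient running example for parallelization.
int64_t fib(int64_t n) {
  if (n < 2) return n;
  int64_t x = fib(n - 1);
  int64_t y = fib(n - 2);
  return x + y;
}

// The linear-time alternative: build the sequence bottom-up, so that
// each Fibonacci number is computed only once.
int64_t fib_linear(int64_t n) {
  int64_t a = 0, b = 1;              // fib(0), fib(1)
  for (int64_t i = 0; i < n; ++i) {
    int64_t next = a + b;
    a = b;
    b = next;
  }
  return a;                          // fib(n)
}
```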
So here's the execution tree for fib(4). fib(4) calls fib(3) and fib(2); fib(3) calls fib(2) and fib(1); and so on. You can see the repeated computations here: fib(2) is being computed twice, and so on, and if you have a much larger tree, for a bigger input, you'll have many more overlapping computations. It turns out that the two recursive calls can actually be parallelized, because they're completely independent calculations. So the key idea for parallelization is to simultaneously execute the two recursive subcalls to fib, and in fact you can do this recursively: the two subcalls of fib(3) can also be executed in parallel, the two subcalls of fib(2) can also be executed in parallel, and so on. That's the key idea for extracting parallelism from this algorithm.

So let's now look at how we can use pthreads to implement this simple Fibonacci algorithm. Pthreads is a standard API for threading, and it's supported on all Unix-based machines; if you're programming using Microsoft products, the equivalent is WinAPI threads. Pthreads is actually an ANSI/IEEE standard (there's a number that specifies the standard), but nowadays we just call it pthreads. It's basically a do-it-yourself concurrency platform, like the assembly language of parallel programming. It's built as a library of functions with special non-C semantics, because if you're just writing code in C you can't really say which parts of the code should be executed in parallel, so pthreads provides a library of functions that allows you to specify concurrency in your program. Each thread implements an abstraction of a processor, and these threads are multiplexed onto the actual machine resources, so the number of threads you create doesn't have to match the number of processors on your machine: if you have more threads than processors, they'll just be multiplexed. You can actually run a pthreads program on a single core even though the program has multiple threads; they'll just time-share. All the threads communicate through shared memory, so they all have the same view of the memory, and the library functions that pthreads provides mask the protocols involved in inter-thread coordination, so you don't have to do that yourself, because it turns out that's quite difficult to do correctly by hand.

Now let's look at the key pthread functions. The first is pthread_create, which takes four arguments. The first argument is a pthread_t: this is basically going to store an identifier for the new thread that pthread_create creates, so we can refer to that thread later. The second is a pthread_attr_t, which sets some thread attributes; for our purposes we can just pass NULL and use the default attributes. The third argument is the function that will be executed after we create the thread, so we need to define the function we want the thread to run. Finally there's a void* arg argument, which stores the arguments that will be passed to that function. pthread_create also returns an error status, an integer specifying whether the thread creation was successful or not.
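For reference, the standard POSIX declarations of this call, and of pthread_join, which we'll use next, are:

```cpp
#include <pthread.h>

int pthread_create(pthread_t *thread,              // out: identifier of the new thread
                   const pthread_attr_t *attr,     // thread attributes (NULL = defaults)
                   void *(*start_routine)(void *), // function the new thread will run
                   void *arg);                     // argument handed to that function

int pthread_join(pthread_t thread,                 // thread to wait for
                 void **retval);                   // out: status/value returned by the thread
```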
Then there's another function, pthread_join. pthread_join basically says that we want to block at this point in our code until the specified thread finishes. It takes as argument a pthread_t, one of the thread identifiers created when we called pthread_create, and a second argument, status, which will store the status of the terminating thread; pthread_join also returns an error status. Essentially it says to wait until that thread finishes before we continue on in our program. Any questions so far?

OK, so here's what the implementation of Fibonacci looks like using pthreads. On the left we see the original fib function, which is just the sequential code, and then we have all this other stuff to enable it to run in parallel. First there's a struct, thread_args, which is used to store the arguments that are passed to the function the thread is going to execute. Then there's thread_func: it reads the input argument out of the thread_args struct into i, calls fib(i), stores that result into the struct's output field, and returns NULL. Over on the right-hand side we have the main function that actually calls fib. We initialize a whole bunch of variables that we need to execute these threads, and then we first check whether n is less than 30: if it is, it turns out it's not worth creating a thread to execute this program in parallel, because of the overhead of thread creation, so we just execute the program sequentially. This idea is known as coarsening; you saw a similar example a couple of lectures ago when we did coarsening for sorting, but this is in the context of parallel programming: because there are some overheads to running a function in parallel, if the input size is small enough you sometimes just want to execute it sequentially.

OK, let me walk through this code, since I have an animation. The next thing it does is marshal the input argument to the thread: it stores the input argument n-1 in the args struct. Then we call pthread_create with the thread variable, NULL for the thread attributes, the thread_func that we defined on the left, and the args structure, whose input field was set to n-1 on the previous line. pthread_create gives us a return value, and if the thread creation was successful we continue and execute fib(n-2), storing its result in our result variable. This is done at the same time that fib(n-1) is executing, because we created the thread and told it to call thread_func, so fib(n-1) and fib(n-2) are now executing in parallel. Then we have pthread_join, which says to wait until the thread we created finishes before we move on, because we need the results of both subcalls before we can finish this function. Once that's done, we check the status to see whether the join was successful, and if so we add the output field of the args struct, args.output, which stores the result of fib(n-1), to the result. So that's the pthreads code.
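Here is a self-contained sketch of that pthreads version, in the spirit of the slide; the variable names and the command-line handling are mine, and the slide may differ in details:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <pthread.h>

static int64_t fib(int64_t n) {           // the ordinary sequential fib
  if (n < 2) return n;
  return fib(n - 1) + fib(n - 2);
}

struct thread_args {
  int64_t input;                          // argument marshalled to the thread
  int64_t output;                         // result marshalled back
};

static void *thread_func(void *ptr) {     // what the created thread runs
  thread_args *args = static_cast<thread_args *>(ptr);
  args->output = fib(args->input);
  return nullptr;
}

int main(int argc, char *argv[]) {
  int64_t n = (argc > 1) ? atoll(argv[1]) : 40;
  int64_t result;
  if (n < 30) {                           // coarsening: a thread isn't worth it
    result = fib(n);
  } else {
    pthread_t thread;
    thread_args args;
    args.input = n - 1;                   // marshal the argument
    if (pthread_create(&thread, nullptr, thread_func, &args) != 0) return 1;
    result = fib(n - 2);                  // runs concurrently with the new thread
    if (pthread_join(thread, nullptr) != 0) return 1;
    result += args.output;                // combine fib(n-1) and fib(n-2)
  }
  printf("fib(%lld) = %lld\n", (long long)n, (long long)result);
  return 0;
}
```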
Any questions on how this works? Yes, why the void* cast? Because pthread_create takes a void* pointer: it's a generic function, so it doesn't know what the data type is; it has to work for all data types, and that's why we cast to void* when we pass the arguments to pthread_create, and then inside thread_func, where we do know what type of pointer it is, we cast it back.

So, does this code seem very parallel? How many parallel calls am I doing here? Right: I'm only creating one thread, so I'm only executing two things in parallel. If I ran this code on four processors, what's the maximum speedup I could get? The maximum speedup is just two, because I'm only running two things in parallel. This code doesn't recursively create threads; it only creates one thread at the top level, and if you wanted to make it recursively create threads, it would become much more complicated. That's one of the disadvantages of implementing this code in pthreads, and we'll look at other solutions that make this task much easier.

Some of the issues with pthreads are shown on this slide. First, there's a high overhead to creating a thread: creating a thread typically takes more than 10^4 cycles, and this leads to very coarse-grained concurrency, because your tasks have to do a lot of work in order to amortize the cost of creating the thread. There are things called thread pools that can help: the idea is to create a whole bunch of threads at the same time, to amortize the cost of thread creation, and then when you need a thread you just take one from the pool, which contains threads that are just waiting to do work. Second, there's a scalability issue with the code on the previous slide: the Fibonacci code gets at most about a 1.5x speedup on two cores. Why 1.5? Because the two calls we execute in parallel don't do the same amount of work: one computes fib(n-1) and one computes fib(n-2). Does anyone know the ratio between those two values? Yes, it's the golden ratio, about 1.6. So if you can get a speedup of 1.6, that's great, but there are some overheads, so this code gets about a 1.5x speedup; and if you want it to take advantage of more cores, you need to rewrite the code and it becomes more complicated. Third, there's the issue of modularity: if you look at this code, the Fibonacci logic is not nicely encapsulated within one function. We have that logic in the function on the left, but we also have some of the fib logic on the right, in the main function. That makes the code not modular, and if we want to build programs on top of it, it becomes very hard to maintain: if we want to change the Fibonacci logic a little bit, we now have to change it in multiple places instead of having everything in one place. It's not a good idea to write code that's not modular, so please don't do that in your projects. And finally, the code becomes complicated because you have to actually move these arguments around, which is known as argument marshalling, and you have to engage in error-prone protocols in order to do load balancing.
If you recall, we had to place the argument n-1 into args.input, and we had to extract the value out of args.output, and that makes the code very messy. So why do I say "shades of 1958" here? What happened in 1958? Who was around in 1958? Just Charles. There was a first something in 1958. What was it? It turns out that in 1958 we had the first compiler, and this was the Fortran compiler. Before we had the Fortran compiler, programmers wrote things in assembly, and when you write assembly you have to do argument marshalling yourself, because you have to place things into the appropriate registers before calling a function and also move things around when you return from a function. The nice thing about the first compiler was that it did all of this argument marshalling for you: you can just pass arguments to a function, and the compiler generates code that does the marshalling for us. Having to do this in pthreads is similar to having to write code in assembly, because you have to manually marshal these arguments. So hopefully there are better ways to do this, and indeed we'll look at some other solutions that make it easier on the programmer. Any questions before I continue?

All right, we looked at pthreads; next let's look at Threading Building Blocks. TBB is a library solution, developed by Intel, implemented as a C++ library that runs on top of native threads. The underlying implementation uses threads, but the programmer doesn't deal with threads; instead, the programmer specifies tasks, and these tasks are automatically load-balanced across the threads using a work-stealing algorithm inspired by research at MIT, Charles Leiserson's research. The focus of Intel TBB is on performance, and as we'll see, code written using TBB is simpler than what you would have to write using pthreads.

Let's look at how we can implement Fibonacci using TBB. In TBB we have to create tasks, so in the Fibonacci code we create a FibTask class, and inside the task we define an execute function: the function that performs the computation when we start the task, and this is where we define the Fibonacci logic. The task takes as input an argument n and a pointer sum, where n is the input and sum is the output. In TBB we can easily create a recursive program that extracts more parallelism. Here we recursively create two child tasks, a and b (that's the syntax for creating tasks), and we can just pass the two arguments to the FibTask constructors instead of marshalling the arguments ourselves. Then we have set_ref_count, which is basically the number of tasks we have to wait for, plus one for ourselves: in this case we create two child tasks plus ourselves, so that's 2 + 1. After that we start task b using the spawn(b) call, and then we do spawn_and_wait_for_all with a as the argument, which says to start task a and then wait for both a and b to finish before we proceed; this call looks at the ref count we set above and waits for that many tasks to finish before it continues. After both a and b have completed, we just sum up the results and store that in the sum variable. These tasks are created recursively, so unlike the pthreads implementation, which only created one thread at the top level, here we're actually recursively creating more and more tasks, so we can extract more parallelism from this code and scale to more processors. We also need a main function just to start up the program: we create a root task that just computes fib(n), and we call spawn_root_and_wait(a), where a is the task for the root, and it will just run the root task. So that's what Fibonacci looks like in TBB: this is much simpler than the pthreads implementation, and it also gets better performance, because we can extract more parallelism from the computation. Any questions?
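Here is a sketch in the spirit of that slide, written against TBB's classic low-level tbb::task interface, which is the interface the lecture describes (it has since been deprecated in newer TBB releases, and the slide's exact code may differ):

```cpp
#include <cstdint>
#include <tbb/task.h>   // classic low-level task API

class FibTask : public tbb::task {
public:
  const int64_t n;        // input
  int64_t* const sum;     // where the result is written
  FibTask(int64_t n_, int64_t* sum_) : n(n_), sum(sum_) {}

  tbb::task* execute() override {        // the computation performed by the task
    if (n < 2) {
      *sum = n;
    } else {
      int64_t x, y;
      FibTask& a = *new (allocate_child()) FibTask(n - 1, &x);
      FibTask& b = *new (allocate_child()) FibTask(n - 2, &y);
      set_ref_count(3);                  // two children + one for the wait
      spawn(b);                          // start b
      spawn_and_wait_for_all(a);         // start a and wait for both children
      *sum = x + y;
    }
    return nullptr;
  }
};

int64_t parallel_fib(int64_t n) {
  int64_t sum;
  FibTask& root = *new (tbb::task::allocate_root()) FibTask(n, &sum);
  tbb::task::spawn_root_and_wait(root);  // run the root task to completion
  return sum;
}
```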
TBB also has many other features in addition to tasks. It provides many C++ templates to express common patterns, and you can use these templates with different data types: parallel_for, which is used to express loop parallelism, so you can loop over a bunch of iterations in parallel; parallel_reduce for data aggregation, for example if you want to sum together a whole bunch of values you can use parallel_reduce to do that in parallel; and pipeline and filter, which are used for software pipelining. TBB provides many concurrent container classes, such as hash tables, trees, and priority queues, which allow multiple threads to safely access and update the items in the container concurrently; you can use these out of the box, and they support concurrent updates and reads. TBB also has a variety of mutual-exclusion library functions, such as locks and atomic operations. So there are a lot of features in TBB, which is why it's one of the more popular concurrency platforms: because of all of these features, you don't have to implement many of these things yourself, and you still get pretty good performance.

TBB was a library solution to the concurrency problem; now we're going to look at two linguistic solutions, OpenMP and Cilk. Let's start with OpenMP. OpenMP is a specification by an industry consortium, and there are several compilers available that support OpenMP, both open source and proprietary; nowadays GCC, ICC, and Clang all support OpenMP, as does Visual Studio. OpenMP provides linguistic extensions to C and C++, as well as Fortran, in the form of compiler pragmas: you use these pragmas in your code to specify which pieces of code should run in parallel. OpenMP also runs on top of native threads, but the programmer isn't exposed to these threads. It supports loop parallelism (parallel for loops), task parallelism, and pipeline parallelism.

Let's look at how we can implement Fibonacci in OpenMP. This is the entire code, and I want you to compare it to the pthreads implementation that we saw ten minutes ago. This code is much cleaner than the pthreads implementation, and it also performs better. Let's see how it works. We have these compiler pragmas, or compiler directives, and the directive for creating a parallel task is "omp task", so we create an OpenMP task for fib(n-1) as well as one for fib(n-2). There's also the shared clause, which specifies that the two result variables are shared across the different threads; you have to specify whether variables are private or shared. Then the "pragma omp taskwait" just says to wait for the preceding tasks to complete before we continue, so here it waits for fib(n-1) and fib(n-2) to finish before we return the result, which is what we want, and after that we just return x + y. So that's the entire code.
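A sketch of that OpenMP version follows. The fib routine is the standard task-based formulation the lecture describes; the surrounding parallel/single region is the usual way such a task program is started, and is my addition rather than something discussed on the slide:

```cpp
#include <cstdint>
#include <cstdio>

int64_t fib(int64_t n) {
  if (n < 2) return n;
  int64_t x, y;
  #pragma omp task shared(x)      // fib(n-1) may run as a separate task
  x = fib(n - 1);
  #pragma omp task shared(y)      // fib(n-2) may run as a separate task
  y = fib(n - 2);
  #pragma omp taskwait            // wait for both tasks before using x and y
  return x + y;
}

int main() {
  int64_t result;
  #pragma omp parallel            // create the team of threads...
  #pragma omp single              // ...and let one thread start the computation
  result = fib(40);
  printf("%lld\n", (long long)result);
  return 0;
}
```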
OpenMP also provides many other pragma directives in addition to tasks. You can use "parallel for" to do loop parallelism; there's reduction; and there are directives for scheduling and data sharing, so you can specify how you want a particular loop to be scheduled. OpenMP has many different scheduling policies, such as static and dynamic scheduling, and the scheduling directives also have different grain sizes; the data-sharing directives specify whether variables are private or shared. OpenMP also supplies a variety of synchronization constructs, such as barriers, atomic updates, and mutual-exclusion (mutex) locks. So OpenMP also has many features, and it's also one of the more popular solutions for writing parallel programs, and as you saw in the previous example, the code is much simpler than what you would write using pthreads, or even TBB.

Any questions? Yes: this code is actually independent of the number of processors. There's a scheduling algorithm that determines how the tasks get mapped to different processors, so if you spawn a new task it doesn't necessarily go on a different processor, and you can have more tasks than the number of processors available. The scheduling algorithm takes care of how these tasks get mapped to processors, and that's hidden from the programmer, although you can use the scheduling pragmas to give the compiler hints about how it should schedule things. Yes: underneath, this is implemented using pthreads, which has to make operating-system calls to basically talk directly to the processor cores and do the multiplexing and so forth, so the operating system is involved at a very low level.
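As a small example of the "parallel for" and reduction directives mentioned a moment ago, a loop with an explicit schedule and a sum reduction might look like this (the dot-product routine is just my illustration, not from the slides):

```cpp
// Parallel loop with a static schedule and a '+' reduction on dot.
double dot_product(const double *a, const double *b, int n) {
  double dot = 0.0;
  #pragma omp parallel for schedule(static) reduction(+ : dot)
  for (int i = 0; i < n; ++i)
    dot += a[i] * b[i];
  return dot;
}
```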
OK, so the last concurrency platform that we'll be looking at today is Cilk, and we're actually going to look at the Cilk part of Cilk Plus. Cilk Plus is a small set of linguistic extensions to C and C++ that support fork-join parallelism; for example, the Fibonacci example uses fork-join parallelism, so you can use Cilk to implement it. The "Plus" part of Cilk Plus supports vector parallelism, which you had experience working with in your homeworks. Cilk Plus was initially developed by Cilk Arts, which was an MIT spin-off, and Cilk Arts was acquired by Intel in July 2009. The Cilk Plus implementation is based on the award-winning Cilk multithreaded language developed two decades ago here at MIT by Charles Leiserson's research group, and it features a provably efficient work-stealing scheduler: you can actually prove theoretical bounds on this scheduler, and that allows you to implement theoretically efficient algorithms, which we'll talk more about in another lecture on algorithm design. Charles Leiserson has a very famous paper with a proof that this scheduler is optimal, so if you're interested in reading about it, you can talk to us offline. Cilk Plus also provides a hyperobject library for parallelizing code with global variables, and you'll have a chance to play around with hyperobjects in homework 4. The Cilk Plus ecosystem also includes useful programming tools, such as the Cilkscreen race detector, which detects determinacy races in your program to help you isolate bugs and performance bottlenecks, and a scalability analyzer called Cilkview, which analyzes the amount of work your program is doing as well as the maximum amount of parallelism your code could possibly extract from the hardware.

That's Intel Cilk Plus, but it turns out we're not actually going to use Intel Cilk Plus in this class. We're going to use a better compiler, based on Tapir/LLVM, which supports the Cilk subset of Cilk Plus. Tapir/LLVM was recently developed at MIT by T.B. Schardl, who gave a lecture last week, William Moses, a grad student working with Charles, and Charles Leiserson; so I'm talking a lot about Charles's work today. Tapir/LLVM generally produces better code, relative to its base compiler, than all other implementations of Cilk out there, so it's the best Cilk compiler available today. They wrote a very nice paper on this last year, and that paper received the best paper award at the annual Symposium on Principles and Practice of Parallel Programming, or PPoPP, so you should look at that paper as well. Right now Tapir/LLVM uses the Intel Cilk Plus runtime system, but I believe Charles's group has plans to implement a better runtime system. Tapir/LLVM also supports more general features than existing Cilk compilers: in addition to spawning functions, you can spawn code blocks that are not separate functions, which makes writing programs more flexible; you don't have to create a separate function just to execute a code block in parallel. Any questions?

So this is the Cilk code for Fibonacci. It's also pretty simple; it looks very similar to the sequential program, except we have these cilk_spawn and cilk_sync statements in the code. What do they do? cilk_spawn says that the named child function, the function right after the cilk_spawn statement, may execute in parallel with the parent caller, the function that is calling cilk_spawn. So this says that fib(n-1) may execute in parallel with the function that called it; that function then calls fib(n-2), and now fib(n-2) and fib(n-1) can be executing in parallel. cilk_sync says that control cannot pass this point until all of the spawned children have returned, so it's going to wait for fib(n-1) to return before we go to the return statement where we add up x and y. One important thing to note is that the Cilk keywords grant permission for parallel execution, but they don't actually force or command parallel execution. Even though I wrote cilk_spawn here, the runtime system doesn't necessarily have to run fib(n-1) in parallel with fib(n-2); I'm just saying that it could run these two things in parallel, and it's up to the runtime system to decide whether or not to do so, based on its scheduling policy.
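That Cilk Fibonacci is essentially the serial code with the two keywords added; a sketch:

```cpp
#include <cilk/cilk.h>
#include <cstdint>

int64_t fib(int64_t n) {
  if (n < 2) return n;
  int64_t x = cilk_spawn fib(n - 1);  // may run in parallel with the caller
  int64_t y = fib(n - 2);             // runs in the meantime
  cilk_sync;                          // wait for the spawned call before using x
  return x + y;
}
```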
Let's look at another example of Cilk: loop parallelism. Here we want to do a matrix transpose, and we want to do it in place. The idea is that we want to swap the elements below the diagonal with their mirror images above the diagonal, and here's some code to do this. We have a cilk_for, which is basically a parallel for loop, that goes from i = 1 to n - 1; the inner for loop goes from j = 0 up to i - 1; and then we just swap A[i][j] with A[j][i] using the three statements inside the body of the loop. So to execute a for loop in parallel, you just have to add "cilk_" to the "for" keyword, and it's as simple as that: this code will run in parallel and get a pretty good speedup for this particular problem. Internally, cilk_for loops are transformed into nested cilk_spawn and cilk_sync calls: the compiler gets rid of the cilk_for, recursively divides the iteration space in half, spawns off one half and executes the other half in parallel with it, and recurses until the iteration range becomes small enough that it doesn't make sense to execute it in parallel anymore, at which point that range is just executed sequentially. So that's loop parallelism in Cilk.

Any questions? Yes, what if the loop increment is something unusual, can I still do that? Yes: the compiler can figure out what the iteration space is, so you don't necessarily have to increment by one; you can do something else. You just have to guarantee that all of the iterations are independent; if you have a determinacy race across the different iterations of your cilk_for loop, the result might not be correct, so you have to make sure the iterations are indeed independent. Yes, can you nest cilk_for loops? You can. It turns out that for this example there's usually already enough parallelism in the outer loop for large enough values of n, so it doesn't make sense to put a cilk_for loop inside, because using a cilk_for loop adds some additional overhead; but you can do nested cilk_for loops, and in some cases it does make sense, especially if there isn't enough parallelism in the outermost loop. Good question. Yes, what code does this generate? It has a bunch of calls into the Cilk runtime system. I don't know all the details, because I haven't looked at this recently, but I think you can generate the assembly code using a flag to the Clang compiler, so that's a good exercise. You probably want to look at the LLVM IR rather than the assembly to begin with, to understand what's going on: it has three instructions that are not in standard LLVM, which were added to support parallelism, and when it's lowered into assembly, each of those instructions becomes a bunch of assembly-language instructions, so you don't want to mess with looking at it in the assembler until you see what it looks like in LLVM first. Any other questions about this code?
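A sketch of the in-place transpose just described; here the matrix is stored row-major in a flat array, which is my choice, whereas the slide indexes a 2-D array directly:

```cpp
#include <cilk/cilk.h>

// In-place transpose of an n x n row-major matrix: each outer iteration
// touches a disjoint set of (i, j) / (j, i) pairs below the diagonal, so the
// iterations are independent and the outer loop can safely be a cilk_for.
void transpose(double *A, int n) {
  cilk_for (int i = 1; i < n; ++i) {
    for (int j = 0; j < i; ++j) {
      double temp  = A[i * n + j];
      A[i * n + j] = A[j * n + i];
      A[j * n + i] = temp;
    }
  }
}
```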
OK, let's look at another example. Say we had this for loop where on each iteration i we just increment a variable sum by i, so this is essentially computing the summation of everything from i = 0 up to n - 1 and then printing out the result. One straightforward way to try to parallelize this code is to just change the "for" to a "cilk_for". Does this code work? Who thinks this code doesn't compute the correct result? About half of you. And who thinks it does work? A couple of people, and I guess the rest don't care. It turns out it's not necessarily going to give you the right answer, because the cilk_for loop says you can execute these iterations in parallel, but they're all updating the same shared variable sum. So you have what's called a determinacy race, where multiple processors can be writing to the same memory location. We'll talk much more about determinacy races in the next lecture, but for this example it's not necessarily going to work if you run it on more than one processor.

Cilk actually has a nice way to deal with this. In Cilk we have something known as a reducer, which is one example of a hyperobject, which I mentioned earlier. With a reducer, instead of declaring the sum variable as just an unsigned long, you use a macro called CILK_C_REDUCER_OPADD, which specifies that we want to create a reducer with the addition function, with the variable name sum, the data type unsigned long, and the initial value 0. Then we have a macro to register this reducer, CILK_C_REGISTER_REDUCER. Now inside the cilk_for loop we increment REDUCER_VIEW(sum), which is another macro, by i, and you can actually execute this in parallel and it will give you the same answer that you would get if you ran it sequentially: the reducer takes care of the determinacy race for you, and at the end, when you print out the result, you'll see that the sum is what you expect. When you're done using the reducer, you use another macro, CILK_C_UNREGISTER_REDUCER(sum), to tell the system that you're finished with it. So that's one way to deal with this problem when you want to do a reduction.

It turns out that there are many other interesting reduction operators that you might want to use, and in general you can create reducers for monoids, which are algebraic structures that have an associative binary operation as well as an identity element. The addition operator is a monoid because it's binary, it's associative, and its identity element is 0. Cilk also has several other predefined reducers, including multiplication, min, max, AND, OR, XOR, and so on; these are all also monoids; and you can also define your own reducers. In fact, in the next homework you'll have the opportunity to play around with reducers and write a reducer for lists. So that's reducers.
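Putting the racy loop and its reducer fix side by side, roughly as described above; this is a sketch using the Cilk Plus C reducer macros named in the lecture, and the exact usage details may differ slightly:

```cpp
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>
#include <cstdio>

void sum_to_n(unsigned long n) {
  // Racy version (don't do this):
  //   unsigned long sum = 0;
  //   cilk_for (unsigned long i = 0; i < n; ++i) sum += i;   // determinacy race
  //
  // Reducer version: each strand gets its own view of `sum`, and the views are
  // combined with the associative '+' operation, so the result matches the
  // serial execution.
  CILK_C_REDUCER_OPADD(sum, ulong, 0);
  CILK_C_REGISTER_REDUCER(sum);
  cilk_for (unsigned long i = 0; i < n; ++i) {
    REDUCER_VIEW(sum) += i;
  }
  printf("sum = %lu\n", REDUCER_VIEW(sum));
  CILK_C_UNREGISTER_REDUCER(sum);
}
```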
Another nice thing about Cilk is that there's always a valid serial interpretation of the program. The serial elision of a Cilk program is always a legal interpretation, and for the Cilk source code on the left, the serial elision is basically the code you get if you delete the cilk_spawn and cilk_sync statements; it looks just like the sequential code. Remember that the Cilk keywords grant permission for parallel execution, but they don't necessarily command it: if you ran this Cilk code on a single core, it wouldn't actually create these parallel tasks, and you would get the same answer as the sequential program; the serial elision is a correct interpretation. This is unlike other solutions such as TBB and pthreads, where it's actually difficult to get a program that does what the sequential program does, because they're doing a lot of additional work to set up the parallel calls, create argument structures, and other scheduling constructs. In Cilk it's very easy to get the serial elision: you just define cilk_spawn and cilk_sync to be empty and define cilk_for to be for, and that gives you a valid sequential program. So when you're debugging code, you might first want to check whether the serial elision of your Cilk program is correct, and you can easily do that using these macros; or, actually, there's a compiler flag that will do it for you and give you the equivalent C program. This is a nice way to debug, because you don't have to start with the parallel program: you can first check that the serial program is correct before you go on to debug the parallel program. Questions? Yes: by default, cilk_for groups a bunch of iterations together into a single task, because it doesn't make sense to break the work into such small chunks that the overhead outweighs the parallelism; but there's a setting you can use to change the grain size of the for loop, so you could actually make each iteration its own task, and then, as you said, the scheduler will decide how to map the different tasks onto different processors, or even whether it wants to execute any of these tasks in parallel at all. Good question.

OK, so the idea in Cilk is to allow the programmer to express the logical parallelism in an application: the programmer just has to identify which pieces of the code could be executed in parallel, but doesn't necessarily have to determine which pieces should be executed in parallel. Cilk has a runtime scheduler that automatically maps the executing program onto the available processor cores at run time, and it does this dynamically using a work-stealing scheduling algorithm; the work-stealing scheduler is used to balance the tasks evenly across the different processors, and we'll talk more about it in a future lecture. But I want to emphasize that, unlike the other concurrency platforms we looked at today, Cilk's work-stealing scheduling algorithm is theoretically efficient, whereas the OpenMP and TBB schedulers are not. This is a nice property, because it guarantees that the algorithms you write on top of Cilk will also be theoretically efficient.

OK, so here's a high-level illustration of the Cilk ecosystem. It's a very simplified view, but I did this to fit it on a single slide. What you do is take the Cilk source code and pass it to your favorite Cilk compiler, the Tapir/LLVM compiler, and this gives you a binary that you can run on multiple processors. You pass a program input to the binary, run it on however many processors you have, and this allows you to benchmark the parallel performance of your program. You can also do serial testing: to do this, you obtain the serial elision of the Cilk program and pass it to an ordinary C or C++ compiler, which generates a binary that can only run on a single processor, and you run your suite of serial regression tests on that single-threaded binary. This allows you to benchmark the performance of your serial code and also to debug any issues that might have arisen when running the program sequentially. Another way to do this is to compile the original Cilk code but run it on a single processor: there's a command-line parameter that tells the runtime system how many processors you want to use, and if you set that parameter to 1, it will only use a single processor, which allows you to benchmark the single-threaded performance of your code as well. The parallel program executing on a single core should behave exactly the same way as the execution of its serial elision, and that's one of the advantages of using Cilk.
00:21:08,659 --> 00:21:11,480 431 00:21:11,480 --> 00:21:13,520 432 00:21:13,520 --> 00:21:19,430 433 00:21:19,430 --> 00:21:21,320 434 00:21:21,320 --> 00:21:24,480 435 00:21:24,480 --> 00:21:30,180 436 00:21:30,180 --> 00:21:32,710 437 00:21:32,710 --> 00:21:40,330 438 00:21:40,330 --> 00:21:44,580 439 00:21:44,580 --> 00:21:49,570 440 00:21:49,570 --> 00:21:51,760 441 00:21:51,760 --> 00:21:53,920 442 00:21:53,920 --> 00:21:57,040 443 00:21:57,040 --> 00:21:59,710 444 00:21:59,710 --> 00:22:01,840 445 00:22:01,840 --> 00:22:04,030 446 00:22:04,030 --> 00:22:11,290 447 00:22:11,290 --> 00:22:13,630 448 00:22:13,630 --> 00:22:16,600 449 00:22:16,600 --> 00:22:19,560 450 00:22:19,560 --> 00:22:22,120 451 00:22:22,120 --> 00:22:24,580 452 00:22:24,580 --> 00:22:26,890 453 00:22:26,890 --> 00:22:29,140 454 00:22:29,140 --> 00:22:32,410 455 00:22:32,410 --> 00:22:34,000 456 00:22:34,000 --> 00:22:38,230 457 00:22:38,230 --> 00:22:40,150 458 00:22:40,150 --> 00:22:42,640 459 00:22:42,640 --> 00:22:44,590 460 00:22:44,590 --> 00:22:46,540 461 00:22:46,540 --> 00:22:49,470 462 00:22:49,470 --> 00:22:53,400 463 00:22:53,400 --> 00:22:56,190 464 00:22:56,190 --> 00:22:59,290 465 00:22:59,290 --> 00:23:01,210 466 00:23:01,210 --> 00:23:03,700 467 00:23:03,700 --> 00:23:05,440 468 00:23:05,440 --> 00:23:08,290 469 00:23:08,290 --> 00:23:14,910 470 00:23:14,910 --> 00:23:18,010 471 00:23:18,010 --> 00:23:20,950 472 00:23:20,950 --> 00:23:23,800 473 00:23:23,800 --> 00:23:27,400 474 00:23:27,400 --> 00:23:29,560 475 00:23:29,560 --> 00:23:34,870 476 00:23:34,870 --> 00:23:34,880 477 00:23:34,880 --> 00:23:36,100 478 00:23:36,100 --> 00:23:39,279 479 00:23:39,279 --> 00:23:41,919 480 00:23:41,919 --> 00:23:44,649 481 00:23:44,649 --> 00:23:46,450 482 00:23:46,450 --> 00:23:48,399 483 00:23:48,399 --> 00:23:52,539 484 00:23:52,539 --> 00:23:55,570 485 00:23:55,570 --> 00:23:57,580 486 00:23:57,580 --> 00:24:03,330 487 00:24:03,330 --> 00:24:06,820 488 00:24:06,820 --> 00:24:10,870 489 00:24:10,870 --> 00:24:12,000 490 00:24:12,000 --> 00:24:14,740 491 00:24:14,740 --> 00:24:17,159 492 00:24:17,159 --> 00:24:21,009 493 00:24:21,009 --> 00:24:23,350 494 00:24:23,350 --> 00:24:26,860 495 00:24:26,860 --> 00:24:29,110 496 00:24:29,110 --> 00:24:31,419 497 00:24:31,419 --> 00:24:33,159 498 00:24:33,159 --> 00:24:36,159 499 00:24:36,159 --> 00:24:38,740 500 00:24:38,740 --> 00:24:41,529 501 00:24:41,529 --> 00:24:48,100 502 00:24:48,100 --> 00:24:51,250 503 00:24:51,250 --> 00:24:57,159 504 00:24:57,159 --> 00:25:00,399 505 00:25:00,399 --> 00:25:03,100 506 00:25:03,100 --> 00:25:06,340 507 00:25:06,340 --> 00:25:10,149 508 00:25:10,149 --> 00:25:13,299 509 00:25:13,299 --> 00:25:15,220 510 00:25:15,220 --> 00:25:18,759 511 00:25:18,759 --> 00:25:20,500 512 00:25:20,500 --> 00:25:23,049 513 00:25:23,049 --> 00:25:24,639 514 00:25:24,639 --> 00:25:26,830 515 00:25:26,830 --> 00:25:29,230 516 00:25:29,230 --> 00:25:32,169 517 00:25:32,169 --> 00:25:35,500 518 00:25:35,500 --> 00:25:37,930 519 00:25:37,930 --> 00:25:41,019 520 00:25:41,019 --> 00:25:43,330 521 00:25:43,330 --> 00:25:45,580 522 00:25:45,580 --> 00:25:47,820 523 00:25:47,820 --> 00:25:49,830 524 00:25:49,830 --> 00:25:54,090 525 00:25:54,090 --> 00:25:57,390 526 00:25:57,390 --> 00:25:59,340 527 00:25:59,340 --> 00:26:03,060 528 00:26:03,060 --> 00:26:04,890 529 00:26:04,890 --> 00:26:06,659 530 00:26:06,659 --> 00:26:09,450 531 00:26:09,450 --> 00:26:12,000 532 00:26:12,000 --> 00:26:13,380 533 00:26:13,380 --> 00:26:16,019 534 00:26:16,019 --> 
00:26:18,870 535 00:26:18,870 --> 00:26:20,460 536 00:26:20,460 --> 00:26:22,320 537 00:26:22,320 --> 00:26:26,039 538 00:26:26,039 --> 00:26:28,680 539 00:26:28,680 --> 00:26:30,930 540 00:26:30,930 --> 00:26:34,200 541 00:26:34,200 --> 00:26:36,149 542 00:26:36,149 --> 00:26:39,440 543 00:26:39,440 --> 00:26:41,159 544 00:26:41,159 --> 00:26:42,720 545 00:26:42,720 --> 00:26:45,870 546 00:26:45,870 --> 00:26:50,970 547 00:26:50,970 --> 00:26:54,120 548 00:26:54,120 --> 00:26:57,090 549 00:26:57,090 --> 00:27:00,690 550 00:27:00,690 --> 00:27:04,560 551 00:27:04,560 --> 00:27:08,220 552 00:27:08,220 --> 00:27:10,260 553 00:27:10,260 --> 00:27:13,769 554 00:27:13,769 --> 00:27:15,289 555 00:27:15,289 --> 00:27:21,630 556 00:27:21,630 --> 00:27:24,180 557 00:27:24,180 --> 00:27:25,799 558 00:27:25,799 --> 00:27:29,760 559 00:27:29,760 --> 00:27:32,820 560 00:27:32,820 --> 00:27:35,100 561 00:27:35,100 --> 00:27:37,350 562 00:27:37,350 --> 00:27:39,180 563 00:27:39,180 --> 00:27:43,470 564 00:27:43,470 --> 00:27:46,320 565 00:27:46,320 --> 00:27:48,389 566 00:27:48,389 --> 00:27:50,070 567 00:27:50,070 --> 00:27:54,480 568 00:27:54,480 --> 00:27:56,480 569 00:27:56,480 --> 00:27:58,740 570 00:27:58,740 --> 00:28:01,169 571 00:28:01,169 --> 00:28:03,990 572 00:28:03,990 --> 00:28:07,200 573 00:28:07,200 --> 00:28:09,870 574 00:28:09,870 --> 00:28:13,320 575 00:28:13,320 --> 00:28:17,310 576 00:28:17,310 --> 00:28:19,950 577 00:28:19,950 --> 00:28:22,500 578 00:28:22,500 --> 00:28:24,270 579 00:28:24,270 --> 00:28:26,040 580 00:28:26,040 --> 00:28:30,270 581 00:28:30,270 --> 00:28:32,760 582 00:28:32,760 --> 00:28:36,360 583 00:28:36,360 --> 00:28:38,880 584 00:28:38,880 --> 00:28:40,740 585 00:28:40,740 --> 00:28:43,680 586 00:28:43,680 --> 00:28:46,530 587 00:28:46,530 --> 00:28:59,810 588 00:28:59,810 --> 00:29:03,090 589 00:29:03,090 --> 00:29:07,470 590 00:29:07,470 --> 00:29:10,020 591 00:29:10,020 --> 00:29:13,440 592 00:29:13,440 --> 00:29:17,100 593 00:29:17,100 --> 00:29:19,650 594 00:29:19,650 --> 00:29:22,950 595 00:29:22,950 --> 00:29:26,970 596 00:29:26,970 --> 00:29:29,790 597 00:29:29,790 --> 00:29:31,500 598 00:29:31,500 --> 00:29:33,090 599 00:29:33,090 --> 00:29:36,540 600 00:29:36,540 --> 00:29:42,030 601 00:29:42,030 --> 00:29:44,220 602 00:29:44,220 --> 00:29:47,580 603 00:29:47,580 --> 00:29:50,460 604 00:29:50,460 --> 00:29:51,960 605 00:29:51,960 --> 00:29:57,690 606 00:29:57,690 --> 00:30:03,120 607 00:30:03,120 --> 00:30:04,920 608 00:30:04,920 --> 00:30:07,070 609 00:30:07,070 --> 00:30:12,840 610 00:30:12,840 --> 00:30:14,850 611 00:30:14,850 --> 00:30:16,940 612 00:30:16,940 --> 00:30:20,720 613 00:30:20,720 --> 00:30:23,660 614 00:30:23,660 --> 00:30:25,250 615 00:30:25,250 --> 00:30:26,330 616 00:30:26,330 --> 00:30:28,430 617 00:30:28,430 --> 00:30:30,500 618 00:30:30,500 --> 00:30:32,690 619 00:30:32,690 --> 00:30:34,220 620 00:30:34,220 --> 00:30:38,510 621 00:30:38,510 --> 00:30:41,150 622 00:30:41,150 --> 00:30:43,130 623 00:30:43,130 --> 00:30:45,710 624 00:30:45,710 --> 00:30:48,800 625 00:30:48,800 --> 00:30:50,750 626 00:30:50,750 --> 00:30:53,900 627 00:30:53,900 --> 00:30:55,910 628 00:30:55,910 --> 00:31:00,560 629 00:31:00,560 --> 00:31:03,590 630 00:31:03,590 --> 00:31:06,050 631 00:31:06,050 --> 00:31:11,420 632 00:31:11,420 --> 00:31:13,640 633 00:31:13,640 --> 00:31:15,080 634 00:31:15,080 --> 00:31:16,610 635 00:31:16,610 --> 00:31:23,090 636 00:31:23,090 --> 00:31:26,300 637 00:31:26,300 --> 00:31:29,450 638 00:31:29,450 --> 00:31:31,700 639 
00:31:31,700 --> 00:31:33,950 640 00:31:33,950 --> 00:31:36,740 641 00:31:36,740 --> 00:31:40,430 642 00:31:40,430 --> 00:31:43,280 643 00:31:43,280 --> 00:31:44,780 644 00:31:44,780 --> 00:31:53,150 645 00:31:53,150 --> 00:31:56,570 646 00:31:56,570 --> 00:32:03,470 647 00:32:03,470 --> 00:32:06,350 648 00:32:06,350 --> 00:32:09,560 649 00:32:09,560 --> 00:32:11,750 650 00:32:11,750 --> 00:32:15,020 651 00:32:15,020 --> 00:32:16,940 652 00:32:16,940 --> 00:32:18,710 653 00:32:18,710 --> 00:32:21,620 654 00:32:21,620 --> 00:32:25,130 655 00:32:25,130 --> 00:32:27,920 656 00:32:27,920 --> 00:32:30,620 657 00:32:30,620 --> 00:32:33,140 658 00:32:33,140 --> 00:32:36,409 659 00:32:36,409 --> 00:32:39,830 660 00:32:39,830 --> 00:32:41,750 661 00:32:41,750 --> 00:32:43,730 662 00:32:43,730 --> 00:32:45,770 663 00:32:45,770 --> 00:32:48,680 664 00:32:48,680 --> 00:32:51,799 665 00:32:51,799 --> 00:32:54,470 666 00:32:54,470 --> 00:32:56,870 667 00:32:56,870 --> 00:32:59,779 668 00:32:59,779 --> 00:33:02,180 669 00:33:02,180 --> 00:33:05,029 670 00:33:05,029 --> 00:33:12,549 671 00:33:12,549 --> 00:33:23,409 672 00:33:23,409 --> 00:33:23,419 673 00:33:23,419 --> 00:33:32,519 674 00:33:32,519 --> 00:33:36,610 675 00:33:36,610 --> 00:33:40,210 676 00:33:40,210 --> 00:33:41,740 677 00:33:41,740 --> 00:33:43,330 678 00:33:43,330 --> 00:33:45,640 679 00:33:45,640 --> 00:33:47,500 680 00:33:47,500 --> 00:33:49,840 681 00:33:49,840 --> 00:33:52,330 682 00:33:52,330 --> 00:33:54,940 683 00:33:54,940 --> 00:34:03,130 684 00:34:03,130 --> 00:34:09,820 685 00:34:09,820 --> 00:34:16,200 686 00:34:16,200 --> 00:34:18,480 687 00:34:18,480 --> 00:34:21,450 688 00:34:21,450 --> 00:34:25,440 689 00:34:25,440 --> 00:34:29,180 690 00:34:29,180 --> 00:34:31,649 691 00:34:31,649 --> 00:34:33,930 692 00:34:33,930 --> 00:34:35,609 693 00:34:35,609 --> 00:34:39,030 694 00:34:39,030 --> 00:34:41,159 695 00:34:41,159 --> 00:34:43,919 696 00:34:43,919 --> 00:34:46,109 697 00:34:46,109 --> 00:34:49,349 698 00:34:49,349 --> 00:34:50,639 699 00:34:50,639 --> 00:34:53,430 700 00:34:53,430 --> 00:34:57,329 701 00:34:57,329 --> 00:34:59,220 702 00:34:59,220 --> 00:35:01,200 703 00:35:01,200 --> 00:35:06,450 704 00:35:06,450 --> 00:35:08,540 705 00:35:08,540 --> 00:35:11,550 706 00:35:11,550 --> 00:35:13,410 707 00:35:13,410 --> 00:35:19,109 708 00:35:19,109 --> 00:35:21,270 709 00:35:21,270 --> 00:35:24,089 710 00:35:24,089 --> 00:35:28,410 711 00:35:28,410 --> 00:35:30,810 712 00:35:30,810 --> 00:35:32,670 713 00:35:32,670 --> 00:35:33,990 714 00:35:33,990 --> 00:35:35,880 715 00:35:35,880 --> 00:35:38,880 716 00:35:38,880 --> 00:35:41,160 717 00:35:41,160 --> 00:35:42,660 718 00:35:42,660 --> 00:35:44,820 719 00:35:44,820 --> 00:35:49,290 720 00:35:49,290 --> 00:35:51,180 721 00:35:51,180 --> 00:35:54,390 722 00:35:54,390 --> 00:35:57,630 723 00:35:57,630 --> 00:36:00,510 724 00:36:00,510 --> 00:36:07,410 725 00:36:07,410 --> 00:36:10,620 726 00:36:10,620 --> 00:36:12,450 727 00:36:12,450 --> 00:36:14,549 728 00:36:14,549 --> 00:36:16,559 729 00:36:16,559 --> 00:36:19,109 730 00:36:19,109 --> 00:36:21,839 731 00:36:21,839 --> 00:36:28,109 732 00:36:28,109 --> 00:36:31,559 733 00:36:31,559 --> 00:36:33,299 734 00:36:33,299 --> 00:36:35,549 735 00:36:35,549 --> 00:36:39,089 736 00:36:39,089 --> 00:36:43,440 737 00:36:43,440 --> 00:36:45,390 738 00:36:45,390 --> 00:36:46,920 739 00:36:46,920 --> 00:36:50,970 740 00:36:50,970 --> 00:36:54,029 741 00:36:54,029 --> 00:36:57,720 742 00:36:57,720 --> 00:37:00,059 743 00:37:00,059 --> 
00:37:02,250 744 00:37:02,250 --> 00:37:04,490 745 00:37:04,490 --> 00:37:07,170 746 00:37:07,170 --> 00:37:09,839 747 00:37:09,839 --> 00:37:13,079 748 00:37:13,079 --> 00:37:14,819 749 00:37:14,819 --> 00:37:17,099 750 00:37:17,099 --> 00:37:18,779 751 00:37:18,779 --> 00:37:22,319 752 00:37:22,319 --> 00:37:24,690 753 00:37:24,690 --> 00:37:25,950 754 00:37:25,950 --> 00:37:28,049 755 00:37:28,049 --> 00:37:31,440 756 00:37:31,440 --> 00:37:33,870 757 00:37:33,870 --> 00:37:40,529 758 00:37:40,529 --> 00:37:43,890 759 00:37:43,890 --> 00:37:45,980 760 00:37:45,980 --> 00:37:48,359 761 00:37:48,359 --> 00:37:50,460 762 00:37:50,460 --> 00:37:52,710 763 00:37:52,710 --> 00:37:56,400 764 00:37:56,400 --> 00:37:58,200 765 00:37:58,200 --> 00:38:02,069 766 00:38:02,069 --> 00:38:03,930 767 00:38:03,930 --> 00:38:05,970 768 00:38:05,970 --> 00:38:12,120 769 00:38:12,120 --> 00:38:16,920 770 00:38:16,920 --> 00:38:19,660 771 00:38:19,660 --> 00:38:28,470 772 00:38:28,470 --> 00:38:32,730 773 00:38:32,730 --> 00:38:45,340 774 00:38:45,340 --> 00:38:47,830 775 00:38:47,830 --> 00:38:51,730 776 00:38:51,730 --> 00:38:53,320 777 00:38:53,320 --> 00:38:55,600 778 00:38:55,600 --> 00:38:57,460 779 00:38:57,460 --> 00:38:59,380 780 00:38:59,380 --> 00:39:00,700 781 00:39:00,700 --> 00:39:03,220 782 00:39:03,220 --> 00:39:04,840 783 00:39:04,840 --> 00:39:07,150 784 00:39:07,150 --> 00:39:09,580 785 00:39:09,580 --> 00:39:11,410 786 00:39:11,410 --> 00:39:13,030 787 00:39:13,030 --> 00:39:15,160 788 00:39:15,160 --> 00:39:17,530 789 00:39:17,530 --> 00:39:20,320 790 00:39:20,320 --> 00:39:24,400 791 00:39:24,400 --> 00:39:27,010 792 00:39:27,010 --> 00:39:28,690 793 00:39:28,690 --> 00:39:31,530 794 00:39:31,530 --> 00:39:33,550 795 00:39:33,550 --> 00:39:35,230 796 00:39:35,230 --> 00:39:38,050 797 00:39:38,050 --> 00:39:41,830 798 00:39:41,830 --> 00:39:41,840 799 00:39:41,840 --> 00:39:47,650 800 00:39:47,650 --> 00:39:51,260 801 00:39:51,260 --> 00:39:56,260 802 00:39:56,260 --> 00:39:58,550 803 00:39:58,550 --> 00:39:59,930 804 00:39:59,930 --> 00:40:03,710 805 00:40:03,710 --> 00:40:06,380 806 00:40:06,380 --> 00:40:09,170 807 00:40:09,170 --> 00:40:11,860 808 00:40:11,860 --> 00:40:14,840 809 00:40:14,840 --> 00:40:16,910 810 00:40:16,910 --> 00:40:20,540 811 00:40:20,540 --> 00:40:22,970 812 00:40:22,970 --> 00:40:25,700 813 00:40:25,700 --> 00:40:30,280 814 00:40:30,280 --> 00:40:33,410 815 00:40:33,410 --> 00:40:36,350 816 00:40:36,350 --> 00:40:39,260 817 00:40:39,260 --> 00:40:41,270 818 00:40:41,270 --> 00:40:43,040 819 00:40:43,040 --> 00:40:51,670 820 00:40:51,670 --> 00:40:55,550 821 00:40:55,550 --> 00:40:58,670 822 00:40:58,670 --> 00:41:04,610 823 00:41:04,610 --> 00:41:11,480 824 00:41:11,480 --> 00:41:13,520 825 00:41:13,520 --> 00:41:15,710 826 00:41:15,710 --> 00:41:18,590 827 00:41:18,590 --> 00:41:22,910 828 00:41:22,910 --> 00:41:25,580 829 00:41:25,580 --> 00:41:26,990 830 00:41:26,990 --> 00:41:34,960 831 00:41:34,960 --> 00:41:38,060 832 00:41:38,060 --> 00:41:41,930 833 00:41:41,930 --> 00:41:43,880 834 00:41:43,880 --> 00:41:47,240 835 00:41:47,240 --> 00:41:50,300 836 00:41:50,300 --> 00:41:52,430 837 00:41:52,430 --> 00:41:55,300 838 00:41:55,300 --> 00:41:59,240 839 00:41:59,240 --> 00:42:00,250 840 00:42:00,250 --> 00:42:03,130 841 00:42:03,130 --> 00:42:05,710 842 00:42:05,710 --> 00:42:08,170 843 00:42:08,170 --> 00:42:11,140 844 00:42:11,140 --> 00:42:14,560 845 00:42:14,560 --> 00:42:17,620 846 00:42:17,620 --> 00:42:22,180 847 00:42:22,180 --> 00:42:26,650 848 
00:42:26,650 --> 00:42:29,109 849 00:42:29,109 --> 00:42:31,900 850 00:42:31,900 --> 00:42:34,570 851 00:42:34,570 --> 00:42:37,750 852 00:42:37,750 --> 00:42:39,760 853 00:42:39,760 --> 00:42:42,460 854 00:42:42,460 --> 00:42:44,530 855 00:42:44,530 --> 00:42:47,680 856 00:42:47,680 --> 00:42:50,380 857 00:42:50,380 --> 00:42:54,340 858 00:42:54,340 --> 00:42:56,950 859 00:42:56,950 --> 00:42:58,900 860 00:42:58,900 --> 00:43:01,330 861 00:43:01,330 --> 00:43:03,270 862 00:43:03,270 --> 00:43:05,440 863 00:43:05,440 --> 00:43:07,660 864 00:43:07,660 --> 00:43:14,590 865 00:43:14,590 --> 00:43:16,120 866 00:43:16,120 --> 00:43:18,940 867 00:43:18,940 --> 00:43:22,660 868 00:43:22,660 --> 00:43:26,440 869 00:43:26,440 --> 00:43:29,950 870 00:43:29,950 --> 00:43:32,859 871 00:43:32,859 --> 00:43:38,590 872 00:43:38,590 --> 00:43:41,590 873 00:43:41,590 --> 00:43:43,960 874 00:43:43,960 --> 00:43:46,480 875 00:43:46,480 --> 00:43:48,040 876 00:43:48,040 --> 00:44:01,329 877 00:44:01,329 --> 00:44:05,540 878 00:44:05,540 --> 00:44:07,910 879 00:44:07,910 --> 00:44:10,970 880 00:44:10,970 --> 00:44:12,680 881 00:44:12,680 --> 00:44:15,490 882 00:44:15,490 --> 00:44:18,349 883 00:44:18,349 --> 00:44:21,140 884 00:44:21,140 --> 00:44:22,819 885 00:44:22,819 --> 00:44:24,530 886 00:44:24,530 --> 00:44:26,990 887 00:44:26,990 --> 00:44:28,670 888 00:44:28,670 --> 00:44:30,290 889 00:44:30,290 --> 00:44:32,650 890 00:44:32,650 --> 00:44:36,339 891 00:44:36,339 --> 00:44:40,609 892 00:44:40,609 --> 00:44:42,380 893 00:44:42,380 --> 00:44:45,050 894 00:44:45,050 --> 00:44:47,120 895 00:44:47,120 --> 00:44:48,589 896 00:44:48,589 --> 00:44:51,980 897 00:44:51,980 --> 00:44:53,900 898 00:44:53,900 --> 00:44:56,870 899 00:44:56,870 --> 00:44:59,240 900 00:44:59,240 --> 00:45:02,150 901 00:45:02,150 --> 00:45:02,160 902 00:45:02,160 --> 00:45:03,069 903 00:45:03,069 --> 00:45:05,839 904 00:45:05,839 --> 00:45:07,849 905 00:45:07,849 --> 00:45:11,630 906 00:45:11,630 --> 00:45:14,870 907 00:45:14,870 --> 00:45:16,940 908 00:45:16,940 --> 00:45:19,460 909 00:45:19,460 --> 00:45:21,230 910 00:45:21,230 --> 00:45:22,970 911 00:45:22,970 --> 00:45:29,359 912 00:45:29,359 --> 00:45:32,359 913 00:45:32,359 --> 00:45:34,309 914 00:45:34,309 --> 00:45:37,130 915 00:45:37,130 --> 00:45:44,870 916 00:45:44,870 --> 00:45:48,260 917 00:45:48,260 --> 00:45:51,349 918 00:45:51,349 --> 00:45:54,770 919 00:45:54,770 --> 00:45:58,130 920 00:45:58,130 --> 00:46:02,030 921 00:46:02,030 --> 00:46:05,690 922 00:46:05,690 --> 00:46:08,630 923 00:46:08,630 --> 00:46:11,150 924 00:46:11,150 --> 00:46:13,820 925 00:46:13,820 --> 00:46:16,460 926 00:46:16,460 --> 00:46:20,300 927 00:46:20,300 --> 00:46:24,710 928 00:46:24,710 --> 00:46:26,930 929 00:46:26,930 --> 00:46:30,860 930 00:46:30,860 --> 00:46:33,470 931 00:46:33,470 --> 00:46:34,970 932 00:46:34,970 --> 00:46:37,520 933 00:46:37,520 --> 00:46:42,170 934 00:46:42,170 --> 00:46:46,720 935 00:46:46,720 --> 00:46:51,950 936 00:46:51,950 --> 00:46:53,480 937 00:46:53,480 --> 00:46:55,940 938 00:46:55,940 --> 00:46:59,810 939 00:46:59,810 --> 00:47:02,210 940 00:47:02,210 --> 00:47:05,300 941 00:47:05,300 --> 00:47:11,420 942 00:47:11,420 --> 00:47:15,380 943 00:47:15,380 --> 00:47:19,280 944 00:47:19,280 --> 00:47:25,670 945 00:47:25,670 --> 00:47:28,670 946 00:47:28,670 --> 00:47:31,580 947 00:47:31,580 --> 00:47:39,530 948 00:47:39,530 --> 00:47:42,500 949 00:47:42,500 --> 00:47:44,930 950 00:47:44,930 --> 00:47:47,870 951 00:47:47,870 --> 00:47:49,700 952 00:47:49,700 --> 
00:47:56,870 953 00:47:56,870 --> 00:47:58,730 954 00:47:58,730 --> 00:48:01,580 955 00:48:01,580 --> 00:48:03,950 956 00:48:03,950 --> 00:48:05,960 957 00:48:05,960 --> 00:48:08,690 958 00:48:08,690 --> 00:48:11,180 959 00:48:11,180 --> 00:48:14,390 960 00:48:14,390 --> 00:48:22,089 961 00:48:22,089 --> 00:48:24,000 962 00:48:24,000 --> 00:48:27,390 963 00:48:27,390 --> 00:48:31,540 964 00:48:31,540 --> 00:48:31,550 965 00:48:31,550 --> 00:48:32,380 966 00:48:32,380 --> 00:48:33,880 967 00:48:33,880 --> 00:48:35,530 968 00:48:35,530 --> 00:48:38,260 969 00:48:38,260 --> 00:48:40,810 970 00:48:40,810 --> 00:48:43,089 971 00:48:43,089 --> 00:48:44,620 972 00:48:44,620 --> 00:48:47,680 973 00:48:47,680 --> 00:48:49,240 974 00:48:49,240 --> 00:48:51,880 975 00:48:51,880 --> 00:48:55,330 976 00:48:55,330 --> 00:48:59,170 977 00:48:59,170 --> 00:49:00,339 978 00:49:00,339 --> 00:49:02,050 979 00:49:02,050 --> 00:49:05,080 980 00:49:05,080 --> 00:49:08,589 981 00:49:08,589 --> 00:49:11,710 982 00:49:11,710 --> 00:49:15,190 983 00:49:15,190 --> 00:49:17,890 984 00:49:17,890 --> 00:49:20,770 985 00:49:20,770 --> 00:49:22,630 986 00:49:22,630 --> 00:49:26,230 987 00:49:26,230 --> 00:49:36,849 988 00:49:36,849 --> 00:49:36,859 989 00:49:36,859 --> 00:49:46,449 990 00:49:46,449 --> 00:49:48,940 991 00:49:48,940 --> 00:49:51,279 992 00:49:51,279 --> 00:49:53,049 993 00:49:53,049 --> 00:49:54,870 994 00:49:54,870 --> 00:49:58,690 995 00:49:58,690 --> 00:49:59,799 996 00:49:59,799 --> 00:50:01,930 997 00:50:01,930 --> 00:50:03,789 998 00:50:03,789 --> 00:50:05,469 999 00:50:05,469 --> 00:50:06,940 1000 00:50:06,940 --> 00:50:09,549 1001 00:50:09,549 --> 00:50:11,199 1002 00:50:11,199 --> 00:50:12,599 1003 00:50:12,599 --> 00:50:16,150 1004 00:50:16,150 --> 00:50:19,779 1005 00:50:19,779 --> 00:50:28,299 1006 00:50:28,299 --> 00:50:30,609 1007 00:50:30,609 --> 00:50:32,229 1008 00:50:32,229 --> 00:50:36,160 1009 00:50:36,160 --> 00:50:38,380 1010 00:50:38,380 --> 00:50:41,469 1011 00:50:41,469 --> 00:50:43,120 1012 00:50:43,120 --> 00:50:55,599 1013 00:50:55,599 --> 00:50:58,390 1014 00:50:58,390 --> 00:51:04,229 1015 00:51:04,229 --> 00:51:09,249 1016 00:51:09,249 --> 00:51:11,229 1017 00:51:11,229 --> 00:51:13,479 1018 00:51:13,479 --> 00:51:16,839 1019 00:51:16,839 --> 00:51:20,589 1020 00:51:20,589 --> 00:51:22,630 1021 00:51:22,630 --> 00:51:24,459 1022 00:51:24,459 --> 00:51:27,400 1023 00:51:27,400 --> 00:51:30,009 1024 00:51:30,009 --> 00:51:31,539 1025 00:51:31,539 --> 00:51:37,690 1026 00:51:37,690 --> 00:51:40,089 1027 00:51:40,089 --> 00:51:43,779 1028 00:51:43,779 --> 00:51:48,519 1029 00:51:48,519 --> 00:51:50,979 1030 00:51:50,979 --> 00:51:53,739 1031 00:51:53,739 --> 00:51:55,449 1032 00:51:55,449 --> 00:51:59,079 1033 00:51:59,079 --> 00:52:03,459 1034 00:52:03,459 --> 00:52:05,799 1035 00:52:05,799 --> 00:52:08,349 1036 00:52:08,349 --> 00:52:09,880 1037 00:52:09,880 --> 00:52:12,670 1038 00:52:12,670 --> 00:52:14,199 1039 00:52:14,199 --> 00:52:16,150 1040 00:52:16,150 --> 00:52:18,400 1041 00:52:18,400 --> 00:52:21,189 1042 00:52:21,189 --> 00:52:23,499 1043 00:52:23,499 --> 00:52:26,019 1044 00:52:26,019 --> 00:52:29,019 1045 00:52:29,019 --> 00:52:30,910 1046 00:52:30,910 --> 00:52:33,219 1047 00:52:33,219 --> 00:52:36,819 1048 00:52:36,819 --> 00:52:39,579 1049 00:52:39,579 --> 00:52:42,699 1050 00:52:42,699 --> 00:52:44,650 1051 00:52:44,650 --> 00:52:49,390 1052 00:52:49,390 --> 00:52:51,910 1053 00:52:51,910 --> 00:52:54,880 1054 00:52:54,880 --> 00:52:56,979 1055 00:52:56,979 
--> 00:52:59,380 1056 00:52:59,380 --> 00:53:02,199 1057 00:53:02,199 --> 00:53:05,049 1058 00:53:05,049 --> 00:53:08,930 1059 00:53:08,930 --> 00:53:12,280 1060 00:53:12,280 --> 00:53:15,829 1061 00:53:15,829 --> 00:53:17,660 1062 00:53:17,660 --> 00:53:21,010 1063 00:53:21,010 --> 00:53:26,329 1064 00:53:26,329 --> 00:53:27,710 1065 00:53:27,710 --> 00:53:29,599 1066 00:53:29,599 --> 00:53:32,359 1067 00:53:32,359 --> 00:53:36,680 1068 00:53:36,680 --> 00:53:39,819 1069 00:53:39,819 --> 00:53:44,150 1070 00:53:44,150 --> 00:53:48,559 1071 00:53:48,559 --> 00:53:50,089 1072 00:53:50,089 --> 00:53:52,579 1073 00:53:52,579 --> 00:53:54,500 1074 00:53:54,500 --> 00:53:59,930 1075 00:53:59,930 --> 00:54:04,520 1076 00:54:04,520 --> 00:54:06,440 1077 00:54:06,440 --> 00:54:09,109 1078 00:54:09,109 --> 00:54:11,059 1079 00:54:11,059 --> 00:54:12,890 1080 00:54:12,890 --> 00:54:17,329 1081 00:54:17,329 --> 00:54:19,940 1082 00:54:19,940 --> 00:54:22,010 1083 00:54:22,010 --> 00:54:23,809 1084 00:54:23,809 --> 00:54:26,450 1085 00:54:26,450 --> 00:54:28,579 1086 00:54:28,579 --> 00:54:31,400 1087 00:54:31,400 --> 00:54:34,190 1088 00:54:34,190 --> 00:54:37,339 1089 00:54:37,339 --> 00:54:39,770 1090 00:54:39,770 --> 00:54:43,819 1091 00:54:43,819 --> 00:54:47,960 1092 00:54:47,960 --> 00:54:49,579 1093 00:54:49,579 --> 00:54:53,599 1094 00:54:53,599 --> 00:54:55,940 1095 00:54:55,940 --> 00:54:57,559 1096 00:54:57,559 --> 00:55:01,700 1097 00:55:01,700 --> 00:55:03,859 1098 00:55:03,859 --> 00:55:05,270 1099 00:55:05,270 --> 00:55:08,660 1100 00:55:08,660 --> 00:55:11,349 1101 00:55:11,349 --> 00:55:20,599 1102 00:55:20,599 --> 00:55:25,339 1103 00:55:25,339 --> 00:55:29,789 1104 00:55:29,789 --> 00:55:32,279 1105 00:55:32,279 --> 00:55:35,219 1106 00:55:35,219 --> 00:55:37,469 1107 00:55:37,469 --> 00:55:41,189 1108 00:55:41,189 --> 00:55:44,430 1109 00:55:44,430 --> 00:55:46,259 1110 00:55:46,259 --> 00:55:48,989 1111 00:55:48,989 --> 00:55:50,759 1112 00:55:50,759 --> 00:55:52,890 1113 00:55:52,890 --> 00:55:55,559 1114 00:55:55,559 --> 00:55:58,289 1115 00:55:58,289 --> 00:56:01,140 1116 00:56:01,140 --> 00:56:03,719 1117 00:56:03,719 --> 00:56:06,900 1118 00:56:06,900 --> 00:56:09,029 1119 00:56:09,029 --> 00:56:13,349 1120 00:56:13,349 --> 00:56:15,829 1121 00:56:15,829 --> 00:56:19,229 1122 00:56:19,229 --> 00:56:22,079 1123 00:56:22,079 --> 00:56:25,680 1124 00:56:25,680 --> 00:56:28,620 1125 00:56:28,620 --> 00:56:36,329 1126 00:56:36,329 --> 00:56:38,069 1127 00:56:38,069 --> 00:56:40,559 1128 00:56:40,559 --> 00:56:42,749 1129 00:56:42,749 --> 00:56:45,390 1130 00:56:45,390 --> 00:56:48,900 1131 00:56:48,900 --> 00:56:51,479 1132 00:56:51,479 --> 00:56:53,640 1133 00:56:53,640 --> 00:56:56,789 1134 00:56:56,789 --> 00:56:58,799 1135 00:56:58,799 --> 00:57:02,009 1136 00:57:02,009 --> 00:57:03,660 1137 00:57:03,660 --> 00:57:09,989 1138 00:57:09,989 --> 00:57:13,589 1139 00:57:13,589 --> 00:57:16,650 1140 00:57:16,650 --> 00:57:19,410 1141 00:57:19,410 --> 00:57:22,650 1142 00:57:22,650 --> 00:57:25,049 1143 00:57:25,049 --> 00:57:28,300 1144 00:57:28,300 --> 00:57:32,230 1145 00:57:32,230 --> 00:57:36,280 1146 00:57:36,280 --> 00:57:39,520 1147 00:57:39,520 --> 00:57:42,820 1148 00:57:42,820 --> 00:57:45,849 1149 00:57:45,849 --> 00:57:48,310 1150 00:57:48,310 --> 00:57:53,290 1151 00:57:53,290 --> 00:57:56,710 1152 00:57:56,710 --> 00:57:58,599 1153 00:57:58,599 --> 00:58:03,700 1154 00:58:03,700 --> 00:58:05,620 1155 00:58:05,620 --> 00:58:10,690 1156 00:58:10,690 --> 00:58:13,290 
1157 00:58:13,290 --> 00:58:16,570 1158 00:58:16,570 --> 00:58:18,940 1159 00:58:18,940 --> 00:58:22,870 1160 00:58:22,870 --> 00:58:24,880 1161 00:58:24,880 --> 00:58:27,550 1162 00:58:27,550 --> 00:58:29,950 1163 00:58:29,950 --> 00:58:33,730 1164 00:58:33,730 --> 00:58:36,339 1165 00:58:36,339 --> 00:58:37,839 1166 00:58:37,839 --> 00:58:41,079 1167 00:58:41,079 --> 00:58:44,680 1168 00:58:44,680 --> 00:58:46,599 1169 00:58:46,599 --> 00:58:48,460 1170 00:58:48,460 --> 00:58:50,920 1171 00:58:50,920 --> 00:58:53,020 1172 00:58:53,020 --> 00:58:55,690 1173 00:58:55,690 --> 00:58:59,010 1174 00:58:59,010 --> 00:59:02,680 1175 00:59:02,680 --> 00:59:10,170 1176 00:59:10,170 --> 00:59:13,000 1177 00:59:13,000 --> 00:59:16,990 1178 00:59:16,990 --> 00:59:19,359 1179 00:59:19,359 --> 00:59:21,849 1180 00:59:21,849 --> 00:59:23,530 1181 00:59:23,530 --> 00:59:25,450 1182 00:59:25,450 --> 00:59:29,079 1183 00:59:29,079 --> 00:59:31,240 1184 00:59:31,240 --> 00:59:33,609 1185 00:59:33,609 --> 00:59:35,980 1186 00:59:35,980 --> 00:59:37,900 1187 00:59:37,900 --> 00:59:39,250 1188 00:59:39,250 --> 00:59:44,980 1189 00:59:44,980 --> 00:59:48,880 1190 00:59:48,880 --> 00:59:50,920 1191 00:59:50,920 --> 00:59:52,750 1192 00:59:52,750 --> 00:59:55,060 1193 00:59:55,060 --> 00:59:57,190 1194 00:59:57,190 --> 00:59:58,990 1195 00:59:58,990 --> 01:00:00,790 1196 01:00:00,790 --> 01:00:04,329 1197 01:00:04,329 --> 01:00:05,980 1198 01:00:05,980 --> 01:00:08,500 1199 01:00:08,500 --> 01:00:12,160 1200 01:00:12,160 --> 01:00:19,620 1201 01:00:19,620 --> 01:00:22,900 1202 01:00:22,900 --> 01:00:25,689 1203 01:00:25,689 --> 01:00:28,239 1204 01:00:28,239 --> 01:00:29,920 1205 01:00:29,920 --> 01:00:31,959 1206 01:00:31,959 --> 01:00:33,400 1207 01:00:33,400 --> 01:00:35,890 1208 01:00:35,890 --> 01:00:48,400 1209 01:00:48,400 --> 01:00:53,769 1210 01:00:53,769 --> 01:00:56,109 1211 01:00:56,109 --> 01:00:58,410 1212 01:00:58,410 --> 01:01:00,160 1213 01:01:00,160 --> 01:01:05,920 1214 01:01:05,920 --> 01:01:09,189 1215 01:01:09,189 --> 01:01:12,120 1216 01:01:12,120 --> 01:01:14,709 1217 01:01:14,709 --> 01:01:17,559 1218 01:01:17,559 --> 01:01:20,559 1219 01:01:20,559 --> 01:01:23,920 1220 01:01:23,920 --> 01:01:25,479 1221 01:01:25,479 --> 01:01:34,120 1222 01:01:34,120 --> 01:01:43,480 1223 01:01:43,480 --> 01:01:49,940 1224 01:01:49,940 --> 01:01:52,250 1225 01:01:52,250 --> 01:01:54,680 1226 01:01:54,680 --> 01:01:58,760 1227 01:01:58,760 --> 01:02:01,250 1228 01:02:01,250 --> 01:02:02,900 1229 01:02:02,900 --> 01:02:06,230 1230 01:02:06,230 --> 01:02:09,470 1231 01:02:09,470 --> 01:02:14,000 1232 01:02:14,000 --> 01:02:19,670 1233 01:02:19,670 --> 01:02:28,550 1234 01:02:28,550 --> 01:02:32,960 1235 01:02:32,960 --> 01:02:39,010 1236 01:02:39,010 --> 01:02:44,180 1237 01:02:44,180 --> 01:02:46,970 1238 01:02:46,970 --> 01:02:53,060 1239 01:02:53,060 --> 01:02:55,550 1240 01:02:55,550 --> 01:02:58,670 1241 01:02:58,670 --> 01:03:00,560 1242 01:03:00,560 --> 01:03:02,720 1243 01:03:02,720 --> 01:03:05,750 1244 01:03:05,750 --> 01:03:07,970 1245 01:03:07,970 --> 01:03:10,700 1246 01:03:10,700 --> 01:03:12,590 1247 01:03:12,590 --> 01:03:13,970 1248 01:03:13,970 --> 01:03:15,890 1249 01:03:15,890 --> 01:03:18,620 1250 01:03:18,620 --> 01:03:20,420 1251 01:03:20,420 --> 01:03:25,490 1252 01:03:25,490 --> 01:03:27,620 1253 01:03:27,620 --> 01:03:27,630 1254 01:03:27,630 --> 01:03:28,070 1255 01:03:28,070 --> 01:03:30,830 1256 01:03:30,830 --> 01:03:32,960 1257 01:03:32,960 --> 01:03:35,390 1258 01:03:35,390 
--> 01:03:38,200 1259 01:03:38,200 --> 01:03:41,720 1260 01:03:41,720 --> 01:03:44,600 1261 01:03:44,600 --> 01:03:47,360 1262 01:03:47,360 --> 01:03:50,590 1263 01:03:50,590 --> 01:03:53,300 1264 01:03:53,300 --> 01:03:55,700 1265 01:03:55,700 --> 01:03:56,640 1266 01:03:56,640 --> 01:03:58,890 1267 01:03:58,890 --> 01:04:01,680 1268 01:04:01,680 --> 01:04:04,289 1269 01:04:04,289 --> 01:04:07,500 1270 01:04:07,500 --> 01:04:10,940 1271 01:04:10,940 --> 01:04:13,529 1272 01:04:13,529 --> 01:04:16,980 1273 01:04:16,980 --> 01:04:18,809 1274 01:04:18,809 --> 01:04:21,960 1275 01:04:21,960 --> 01:04:24,089 1276 01:04:24,089 --> 01:04:25,760 1277 01:04:25,760 --> 01:04:29,490 1278 01:04:29,490 --> 01:04:31,410 1279 01:04:31,410 --> 01:04:35,010 1280 01:04:35,010 --> 01:04:36,990 1281 01:04:36,990 --> 01:04:39,630 1282 01:04:39,630 --> 01:04:41,609 1283 01:04:41,609 --> 01:04:44,640 1284 01:04:44,640 --> 01:04:48,569 1285 01:04:48,569 --> 01:04:50,940 1286 01:04:50,940 --> 01:04:52,980 1287 01:04:52,980 --> 01:04:56,190 1288 01:04:56,190 --> 01:04:57,900 1289 01:04:57,900 --> 01:05:00,000 1290 01:05:00,000 --> 01:05:02,339 1291 01:05:02,339 --> 01:05:05,400 1292 01:05:05,400 --> 01:05:07,620 1293 01:05:07,620 --> 01:05:09,539 1294 01:05:09,539 --> 01:05:12,690 1295 01:05:12,690 --> 01:05:14,849 1296 01:05:14,849 --> 01:05:19,039 1297 01:05:19,039 --> 01:05:22,019 1298 01:05:22,019 --> 01:05:24,599 1299 01:05:24,599 --> 01:05:28,799 1300 01:05:28,799 --> 01:05:30,870 1301 01:05:30,870 --> 01:05:33,510 1302 01:05:33,510 --> 01:05:34,920 1303 01:05:34,920 --> 01:05:37,049 1304 01:05:37,049 --> 01:05:45,920 1305 01:05:45,920 --> 01:05:48,809 1306 01:05:48,809 --> 01:05:50,660 1307 01:05:50,660 --> 01:05:53,760 1308 01:05:53,760 --> 01:05:56,730 1309 01:05:56,730 --> 01:06:00,029 1310 01:06:00,029 --> 01:06:02,700 1311 01:06:02,700 --> 01:06:04,769 1312 01:06:04,769 --> 01:06:07,019 1313 01:06:07,019 --> 01:06:09,680 1314 01:06:09,680 --> 01:06:16,750 1315 01:06:16,750 --> 01:06:19,520 1316 01:06:19,520 --> 01:06:21,560 1317 01:06:21,560 --> 01:06:23,600 1318 01:06:23,600 --> 01:06:26,000 1319 01:06:26,000 --> 01:06:28,940 1320 01:06:28,940 --> 01:06:31,010 1321 01:06:31,010 --> 01:06:32,270 1322 01:06:32,270 --> 01:06:36,860 1323 01:06:36,860 --> 01:06:38,780 1324 01:06:38,780 --> 01:06:41,420 1325 01:06:41,420 --> 01:06:44,960 1326 01:06:44,960 --> 01:06:47,930 1327 01:06:47,930 --> 01:06:49,700 1328 01:06:49,700 --> 01:06:52,340 1329 01:06:52,340 --> 01:06:56,060 1330 01:06:56,060 --> 01:06:58,490 1331 01:06:58,490 --> 01:07:00,500 1332 01:07:00,500 --> 01:07:02,840 1333 01:07:02,840 --> 01:07:04,610 1334 01:07:04,610 --> 01:07:07,790 1335 01:07:07,790 --> 01:07:11,600 1336 01:07:11,600 --> 01:07:13,820 1337 01:07:13,820 --> 01:07:16,520 1338 01:07:16,520 --> 01:07:20,120 1339 01:07:20,120 --> 01:07:23,180 1340 01:07:23,180 --> 01:07:25,070 1341 01:07:25,070 --> 01:07:27,020 1342 01:07:27,020 --> 01:07:29,060 1343 01:07:29,060 --> 01:07:31,700 1344 01:07:31,700 --> 01:07:34,460 1345 01:07:34,460 --> 01:07:36,440 1346 01:07:36,440 --> 01:07:38,930 1347 01:07:38,930 --> 01:07:40,520 1348 01:07:40,520 --> 01:07:42,530 1349 01:07:42,530 --> 01:07:47,290 1350 01:07:47,290 --> 01:08:08,100 1351 01:08:08,100 --> 01:08:13,120 1352 01:08:13,120 --> 01:08:15,400 1353 01:08:15,400 --> 01:08:17,590 1354 01:08:17,590 --> 01:08:19,930 1355 01:08:19,930 --> 01:08:22,150 1356 01:08:22,150 --> 01:08:24,990 1357 01:08:24,990 --> 01:08:27,400 1358 01:08:27,400 --> 01:08:29,020 1359 01:08:29,020 --> 01:08:31,170 
1360 01:08:31,170 --> 01:08:34,780 1361 01:08:34,780 --> 01:08:36,790 1362 01:08:36,790 --> 01:08:39,700 1363 01:08:39,700 --> 01:08:42,430 1364 01:08:42,430 --> 01:08:44,500 1365 01:08:44,500 --> 01:08:56,590 1366 01:08:56,590 --> 01:08:59,770 1367 01:08:59,770 --> 01:09:01,540 1368 01:09:01,540 --> 01:09:04,810 1369 01:09:04,810 --> 01:09:07,060 1370 01:09:07,060 --> 01:09:09,160 1371 01:09:09,160 --> 01:09:12,270 1372 01:09:12,270 --> 01:09:14,350 1373 01:09:14,350 --> 01:09:16,060 1374 01:09:16,060 --> 01:09:20,620 1375 01:09:20,620 --> 01:09:23,260 1376 01:09:23,260 --> 01:09:25,660 1377 01:09:25,660 --> 01:09:28,900 1378 01:09:28,900 --> 01:09:31,170 1379 01:09:31,170 --> 01:09:34,210 1380 01:09:34,210 --> 01:09:36,300 1381 01:09:36,300 --> 01:09:39,340 1382 01:09:39,340 --> 01:09:41,380 1383 01:09:41,380 --> 01:09:44,170 1384 01:09:44,170 --> 01:09:46,240 1385 01:09:46,240 --> 01:09:49,300 1386 01:09:49,300 --> 01:09:51,690 1387 01:09:51,690 --> 01:09:54,040 1388 01:09:54,040 --> 01:09:55,930 1389 01:09:55,930 --> 01:09:59,440 1390 01:09:59,440 --> 01:10:02,290 1391 01:10:02,290 --> 01:10:03,730 1392 01:10:03,730 --> 01:10:05,430 1393 01:10:05,430 --> 01:10:07,740 1394 01:10:07,740 --> 01:10:12,350 1395 01:10:12,350 --> 01:10:15,300 1396 01:10:15,300 --> 01:10:19,980 1397 01:10:19,980 --> 01:10:22,830 1398 01:10:22,830 --> 01:10:26,700 1399 01:10:26,700 --> 01:10:29,130 1400 01:10:29,130 --> 01:10:32,430 1401 01:10:32,430 --> 01:10:35,730 1402 01:10:35,730 --> 01:10:38,520 1403 01:10:38,520 --> 01:10:42,240 1404 01:10:42,240 --> 01:10:44,210 1405 01:10:44,210 --> 01:10:48,510 1406 01:10:48,510 --> 01:10:50,430 1407 01:10:50,430 --> 01:10:55,860 1408 01:10:55,860 --> 01:10:58,410 1409 01:10:58,410 --> 01:11:00,990 1410 01:11:00,990 --> 01:11:03,240 1411 01:11:03,240 --> 01:11:06,840 1412 01:11:06,840 --> 01:11:10,500 1413 01:11:10,500 --> 01:11:12,420 1414 01:11:12,420 --> 01:11:15,120 1415 01:11:15,120 --> 01:11:17,730 1416 01:11:17,730 --> 01:11:20,100 1417 01:11:20,100 --> 01:11:22,860 1418 01:11:22,860 --> 01:11:24,750 1419 01:11:24,750 --> 01:11:29,630 1420 01:11:29,630 --> 01:11:32,070 1421 01:11:32,070 --> 01:11:34,680 1422 01:11:34,680 --> 01:11:37,650 1423 01:11:37,650 --> 01:11:39,530 1424 01:11:39,530 --> 01:11:41,670 1425 01:11:41,670 --> 01:11:43,230 1426 01:11:43,230 --> 01:11:45,840 1427 01:11:45,840 --> 01:11:47,880 1428 01:11:47,880 --> 01:11:51,510 1429 01:11:51,510 --> 01:11:53,550 1430 01:11:53,550 --> 01:11:55,610 1431 01:11:55,610 --> 01:11:57,720 1432 01:11:57,720 --> 01:12:00,450 1433 01:12:00,450 --> 01:12:02,910 1434 01:12:02,910 --> 01:12:04,470 1435 01:12:04,470 --> 01:12:10,260 1436 01:12:10,260 --> 01:12:12,510 1437 01:12:12,510 --> 01:12:15,240 1438 01:12:15,240 --> 01:12:17,370 1439 01:12:17,370 --> 01:12:18,930 1440 01:12:18,930 --> 01:12:20,820 1441 01:12:20,820 --> 01:12:22,680 1442 01:12:22,680 --> 01:12:26,220 1443 01:12:26,220 --> 01:12:28,130 1444 01:12:28,130 --> 01:12:34,020 1445 01:12:34,020 --> 01:12:36,840 1446 01:12:36,840 --> 01:12:38,700 1447 01:12:38,700 --> 01:12:41,010 1448 01:12:41,010 --> 01:12:42,900 1449 01:12:42,900 --> 01:12:44,640 1450 01:12:44,640 --> 01:12:51,680 1451 01:12:51,680 --> 01:12:55,439 1452 01:12:55,439 --> 01:12:58,320 1453 01:12:58,320 --> 01:13:02,010 1454 01:13:02,010 --> 01:13:05,430 1455 01:13:05,430 --> 01:13:07,979 1456 01:13:07,979 --> 01:13:09,570 1457 01:13:09,570 --> 01:13:12,990 1458 01:13:12,990 --> 01:13:16,680 1459 01:13:16,680 --> 01:13:19,320 1460 01:13:19,320 --> 01:13:21,330 1461 01:13:21,330 
--> 01:13:23,550 1462 01:13:23,550 --> 01:13:26,310 1463 01:13:26,310 --> 01:13:28,170 1464 01:13:28,170 --> 01:13:29,729 1465 01:13:29,729 --> 01:13:31,830 1466 01:13:31,830 --> 01:13:33,840 1467 01:13:33,840 --> 01:13:38,400 1468 01:13:38,400 --> 01:13:41,840 1469 01:13:41,840 --> 01:13:46,740 1470 01:13:46,740 --> 01:13:51,500 1471 01:13:51,500 --> 01:13:53,880 1472 01:13:53,880 --> 01:13:55,920 1473 01:13:55,920 --> 01:13:58,380 1474 01:13:58,380 --> 01:14:00,930 1475 01:14:00,930 --> 01:14:02,970 1476 01:14:02,970 --> 01:14:05,310 1477 01:14:05,310 --> 01:14:07,530 1478 01:14:07,530 --> 01:14:09,600 1479 01:14:09,600 --> 01:14:12,120 1480 01:14:12,120 --> 01:14:16,620 1481 01:14:16,620 --> 01:14:18,660 1482 01:14:18,660 --> 01:14:21,870 1483 01:14:21,870 --> 01:14:23,610 1484 01:14:23,610 --> 01:14:23,620 1485 01:14:23,620 --> 01:14:26,130 1486 01:14:26,130 --> 01:14:28,410 1487 01:14:28,410 --> 01:14:30,000 1488 01:14:30,000 --> 01:14:31,620 1489 01:14:31,620 --> 01:14:32,790 1490 01:14:32,790 --> 01:14:34,740 1491 01:14:34,740 --> 01:14:38,959 1492 01:14:38,959 --> 01:14:45,080 1493 01:14:45,080 --> 01:14:47,340 1494 01:14:47,340 --> 01:14:50,040 1495 01:14:50,040 --> 01:14:56,040 1496 01:14:56,040 --> 01:14:58,440 1497 01:14:58,440 --> 01:15:04,290 1498 01:15:04,290 --> 01:15:09,540 1499 01:15:09,540 --> 01:15:13,050 1500 01:15:13,050 --> 01:15:16,950 1501 01:15:16,950 --> 01:15:19,290 1502 01:15:19,290 --> 01:15:21,290 1503 01:15:21,290 --> 01:15:29,370 1504 01:15:29,370 --> 01:15:30,950 1505 01:15:30,950 --> 01:15:41,880 1506 01:15:41,880 --> 01:15:44,310 1507 01:15:44,310 --> 01:15:46,680 1508 01:15:46,680 --> 01:15:48,690 1509 01:15:48,690 --> 01:15:51,630 1510 01:15:51,630 --> 01:15:53,190 1511 01:15:53,190 --> 01:15:54,810 1512 01:15:54,810 --> 01:15:58,080 1513 01:15:58,080 --> 01:15:59,850 1514 01:15:59,850 --> 01:16:01,770 1515 01:16:01,770 --> 01:16:03,240 1516 01:16:03,240 --> 01:16:07,290 1517 01:16:07,290 --> 01:16:10,620 1518 01:16:10,620 --> 01:16:12,209 1519 01:16:12,209 --> 01:16:14,940 1520 01:16:14,940 --> 01:16:17,760 1521 01:16:17,760 --> 01:16:21,000 1522 01:16:21,000 --> 01:16:22,770 1523 01:16:22,770 --> 01:16:24,990 1524 01:16:24,990 --> 01:16:28,160 1525 01:16:28,160 --> 01:16:38,169 1526 01:16:38,169 --> 01:16:38,179 1527 01:16:38,179 --> 01:16:40,239