Transcript 00:22 okay any questions before we start today 00:31 well before starting let me just point 00:33 out that one of you pointed out last 00:36 time that you can actually initialize 00:39 these correspondent structures in linear 00:41 time and basically what you do is you 00:43 take a look at the N elements and look 00:48 at them in pairs the smaller of each 00:50 pair will go into the min structure and 00:52 the larger of the pair will go into the 00:53 max and that'll it ensure total 00:55 correspondence okay so you don't need to 00:58 sort the elements to figure out who goes 01:00 in which structure just look at n over 2 01:02 pairs put the smaller in one and the 01:04 bigger and the other 01:05 okay all right so let's take a look at 01:09 one of the data structures that was kind 01:13 of specifically designed for 01:14 double-ended project queues and this is 01:18 the interval heap as I mentioned last 01:21 time there are a large number of 01:22 structures designed for double-ended 01:24 prior to queues but of all of the ones 01:27 out there the symmetric min max heap is 01:30 probably the simplest the interval heap 01:32 is also very simple and both give very 01:35 good performance as we'll see the 01:38 interval heap up will allow us to solve 01:40 a problem called the complimentary range 01:43 search problem in asymptotically optimal 01:47 time okay and so this will serve both 01:51 purposes double-ended priority queue as 01:54 well as the complimentary range search 01:55 problem as far as an interval heap is 02:01 concerned it's a complete binary tree 02:03 just like a min heap or a max heap and 02:09 then unlike a min of Maxie for each node 02:13 contains only one element here each node 02:16 will have two elements in it an 02:18 exception being the last node in the 02:21 heap so in case you have an odd number 02:22 of elements then the last node will have 02:24 one otherwise everybody has to the 02:33 elements in a node will be ordered so 02:36 that a we call one of them the a element 02:43 the other one the B a is 02:45 smaller of the two okay so basically we 02:50 will have the notion of a node 02:52 representing an interval on the straight 02:54 line where an interval begins at the 02:57 smaller element and goes up to the 02:58 larger element okay so each node really 03:02 represents an interval beginning at the 03:04 smaller element in the node going up to 03:06 the larger element in the node and 03:08 that's where the name of the structure 03:10 comes from interval heap if the last 03:16 node has only one element in at a then 03:19 it still represents an interval where 03:21 the start and finish are the same we say 03:28 we have the notion of containment for 03:30 intervals one interval is contained in 03:32 the other if this relationship holds 03:34 less the normal intuitive notion of 03:37 containment the requirement here is that 03:45 if you look at nodes then the interval 03:50 of a node is contained in that of its 03:52 parent and that holds for every node 03:55 except the root which doesn't have a 03:56 parent okay so a requirement here is 04:00 that every nodes interval must be 04:03 contained in that of its parent 04:04 exception being the root I also look at 04:09 an example alright so here I've got an 04:15 odd number of elements by last node I 04:18 mean and since this is a complete binary 04:20 tree it's going to get mapped into an 04:22 array so the root is in position 1 of 04:24 the array position to position 3 4 and 04:27 so you have a notion of left to right 04:29 with respect to the array so last means 04:32 the last occupied position in the array 04:34 which is going to be this guy okay so 04:36 that's the last node in the complete 04:38 binary tree that one can have one or two 04:41 elements since we have an odd number of 04:44 elements all told in this example my 04:46 last node has one element in it every 04:50 other nodes about two elements the 04:53 elements are ordered the one on the left 04:56 is smaller than the one on the right 04:58 you can certainly have ties they can be 04:59 equal that's all right so the route 05:07 defines an interval that begins at 10 05:09 and goes to 90 05:10 this fellow defines an interval that 05:12 begins at 15 and goes to 80 and this 05:15 interval is contained inside of that 05:17 interval this node here defines an 05:20 interval that begins in 15 and goes to 05:22 20 that interval is contained inside of 05:24 that interval okay so if you look at any 05:28 node there the interval defined by that 05:30 node is contained in that of its parent 05:32 this node defines an interval that 05:35 begins at 35 goes to 35 and it's 05:37 contained inside this interval 05:45 all right so an interval heap is a 05:47 complete binary tree all nodes other 05:50 than the last have two elements the last 05:53 may have one or two the elements are 05:56 ordered the left is smaller than the 05:58 right and then each node defines an 06:02 interval interval in a node is contained 06:05 in that to its parent or any questions 06:12 about the definition 06:22 how many numbers each node can contain 06:25 up to two elements is that variable a 06:31 fixed it's fixed okay by definitional 06:34 interval heap each node is supposed to 06:37 define an intervals an interval has a 06:38 left endpoint and a right endpoint and 06:40 so you limit it to two elements 06:51 all right now from the definition okay 06:56 it follows that all the left endpoints 06:58 have to define a min heap because this 07:01 guy's interval must be contained in that 07:03 guy's interval the left value here okay 07:07 must be greater than equal to the left 07:08 value that if this left was smaller than 07:10 that you can't have containment okay so 07:13 the left endpoint here has to be 07:15 groaning going to the left endpoint in 07:16 the parent okay and that's true for 07:18 every node and so the left endpoints 07:21 have to define a min heap similarly from 07:25 the interval containment property okay 07:28 the right endpoint here has to be less 07:31 than equal to the right endpoint there 07:33 if this was bigger if this was a hundred 07:35 then this interval cannot possibly 07:37 contained in that okay so for every node 07:40 its right endpoint is less than equal to 07:43 the right endpoint of its parent and 07:45 that's the definition of a max heap okay 07:47 so the left endpoints define a min heap 07:50 the right endpoints define a max heap 07:57 the other thing you can note here is 08:04 with the exception of that fellow there 08:07 okay you really have total 08:10 correspondence okay if you think of the 08:13 read fellows and blue fellows is two 08:17 different structures a min heap and a 08:19 max heap then for every element in the 08:21 main heap you have a greater element in 08:24 the max heap 08:24 so you really have total correspondence 08:26 with the exception of this fellow which 08:28 is the buffer okay using a proof similar 08:38 to that used for total correspondence we 08:43 can see that the smallest element in the 08:45 collection is going to be this guy and 08:47 the biggest element in the collection is 08:49 going to be that kind so the root of the 08:51 min trees the smallest all told 08:54 including the buffer because now the 08:56 buffer has to satisfy the same 08:58 requirement that this fellow has to lie 09:00 in the interval of the root and all of 09:03 its 09:04 ancestors okay in these stuff we were 09:08 looking at last time the buffer didn't 09:09 satisfy any relationship with the 09:11 elements in the min and Max he Friday 09:14 cues but here it does okay so because of 09:17 that even if you have an odd number of 09:20 elements the root of the men tree of the 09:24 men heap will have the smallest element 09:26 and the root of the max heap will have 09:29 the biggest element okay all right so 09:34 you can easily locate the smallest and 09:36 largest elements 09:49 all right yeah it's a complete binary 09:52 tree as stored as an array also since 09:55 it's a complete binary tree its height 09:57 is logarithmic in the total number of 09:59 elements okay so it's got all the nice 10:03 properties you don't need to store any 10:06 pointers just wrap it all into an array 10:09 there's no wastage of space let's take a 10:18 look at how we would perform the 10:20 standard double ended prior to queue 10:23 operations insert delete max delete min 10:28 all right so here's my current double 10:34 ended product queue stored as an 10:38 interval heap and let's suppose in to 10:41 this I want to insert the element whose 10:43 keys 27 so if the last node is not full 10:50 then you can put it in here in this case 10:52 the last node is not full you can put 10:54 one more element there if the last node 10:56 had been full we'd had to create a new 10:57 node down here okay but in this case 11:00 this space in the last node so the 27 11:04 could go in the last node as far as 11:08 capacity is concerned okay so you 11:11 compare the 27 with the 35 because you 11:13 got to have the ordering and 27 is less 11:17 than 35 so 27 is going to be put in as a 11:20 left endpoint okay and then you have to 11:26 make sure that it satisfies the left 11:28 endpoint property which is you got to 11:30 have a min heap with respect to the left 11:32 endpoint okay 11:38 right so it's going to go in this node 11:40 it's going to go in as a left endpoint 11:43 and then we have you can put it in here 11:47 only if the red fellows will form a 11:51 min-heap when you put the 27 n as a left 11:55 endpoint in this case that is true 11:58 27:35 will be contained in that interval 12:01 so you'd be okay so in this case you 12:07 just put it in there and you're done 12:13 okay let's look at a different example 12:19 let's suppose this time we want to put 12:21 in an 18 but again it's going to go into 12:25 that node since 18 is more than 35 it's 12:30 going to go in as a left endpoint and 12:35 then if it's legal to put it in here 12:39 you should still be left with a min heap 12:41 with respect to the right endpoints okay 12:43 in this case if you put the 18 here 12:45 you'd violate the main heap property 12:47 because 18 is smaller than 25 so we're 12:50 going to use the standard bubble up 12:52 algorithm that's used for insertion into 12:54 a min heap okay so I'm going to compare 13:02 the 18 with the 25 since 25 is bigger it 13:06 would be illegal to put the 18 here so 13:09 I'm going to move the 25 out here and 13:10 I'm going to try and put the 18 here 13:12 basically this is going to expand the 13:15 interval represented by this node and 13:17 when you expand an interval you can't 13:20 affect the properties for any of the 13:22 fellows down here if these were 13:23 contained before expanding the interval 13:25 that still be contained so I don't have 13:27 to worry about problems down below I 13:29 only have to worry about problems 13:30 further up the tree okay so the strategy 13:34 now is you move the 25 down okay and I'm 13:39 going to try and put the 18 out here 13:42 okay 13:44 so you put the 18 here compared with the 13:46 20 okay again 18 is smaller so I'm going 13:50 to move the 20 down okay when I move the 13:53 20 down 13:54 I don't affect any of the interval 13:55 containment properties because all of 13:57 those were already contained as far as 13:58 this left endpoint was concerned so I'm 14:01 not constricting the left endpoint any 14:03 more than what was already satisfied 14:04 okay 14:05 so the 20 moves down and now I'm going 14:08 to try and put the 18 here again I'm 14:11 expanding this interval from what it was 14:13 before so it can't affect anybody on the 14:15 other side or all the containment 14:17 properties are still satisfied okay so I 14:20 can play the 18 with the 15 the 18 is 14:24 bigger so I can put the 18 out here and 14:27 satisfy the min heap property alright so 14:34 I basically just run the min heap 14:36 insertion algorithm here with respect to 14:40 the left endpoints 14:52 let's try putting in an 82 so again I'm 14:57 going to go in this node you compared 15:00 the 82 with the 35 the 82 is bigger and 15:04 since it's bigger we'll put it in as a 15:07 right endpoint okay when you put 15:10 somebody in as a right endpoint the only 15:14 property you risk violating if you put 15:16 it here is with respect to the right 15:18 endpoints with respect to the max-heap 15:22 okay so at this time I will do a 15:26 max-heap insert beginning at that yellow 15:28 node so you compare the 82 with the 60 15:38 the 82 is bigger so it can't stay here 15:40 you violate the Maxie property so I'm 15:42 going to move the 60 down and then if 15:45 you try to put the 82 here we'll be 15:47 expanding this interval like we were 15:49 before 15:49 and whenever we expand an interval you 15:52 cannot violate properties with respect 15:54 to descendents okay so I don't have to 15:56 worry about descendents okay right so 15:59 the sorry 16:06 okay all right so so the 60 moves down 16:12 I'm going to try and put the 82 here 16:14 compare it with the 70 80 2 is bigger so 16:19 I'll move the 70 down here again I don't 16:21 have to worry about any of the 16:23 descendants the 82 is bigger than the 80 16:26 okay 16:28 so I'll move the 80 down and then I'll 16:31 try and put the 82 here it's smaller 16:33 than the 90 so I can put the 82 there 16:35 safely okay there's a standard max-heap 16:40 insert and so after you've done those 16:48 relocations your structure looks like 16:50 this 17:12 what's supposed to be inserting into a 17:16 structure that has an even number of 17:18 elements okay so at this time we're 17:21 going to create a new node independent 17:23 of what you're inserting there'll be new 17:25 node created down here and that node 17:31 will end up having just one element in 17:33 it all right so let's suppose we're 17:48 inserting an eight 17:52 so I'm sitting an eight I have to decide 17:55 whether I'm going to put it in as part 17:57 of the min heap or part of the Maxie 18:00 because eight really means it's both the 18:03 left and right endpoint so you could go 18:05 either way now if the element you're 18:09 inserting lies between 25 and 70s here 18:13 they already have interval containment 18:17 so the new element lies between was 18:20 contained in the interval of the parent 18:22 node you're done you just leave it here 18:23 the other possibilities are the new 18:25 element lies to the left of this 18:27 interval or lies to the right of that 18:29 interval if it lies to the left you're 18:34 going to have a min heap violation if 18:36 you put it here okay if it lies to the 18:39 right you're going to have a max 18:40 evaluation if you put it here okay and 18:42 that determines whether to put it into 18:44 the men of the max heap in this case 18:47 instance eight is to the left of the 18:49 interval inserting it here would give 18:51 you a min heap violation so I do a min 18:54 heap insert beginning from here going up 19:01 so in this case I will do a min heap 19:04 insert and the reason for that is 8 is 19:06 to the left of the interval in the 19:08 parent so 8 is smaller than 25 so you 19:16 can't put it here you got to move it 19:17 down okay again I don't have the 19:21 animation so let me 19:24 just do it here okay so the eight is 19:28 smaller than the 25 we moved the 25 down 19:30 just like before when you put the eight 19:32 here we're expanding the interval I 19:34 don't have a problem with this node okay 19:36 the 8 is smaller than the 20 you move 19:39 the 20 down the 8 is smaller than the 15 19:42 you move the 15 down the 8 is smaller 19:45 than the 10 you move the 10 down and 19:46 then the 8 goes into the root right so 19:54 that's the net result 20:07 now if instead of eight we'd been 20:09 inserting something that was to the 20:11 right of this interval we do the same 20:13 thing but with respect to the max with 20:16 respect to the right ten points all 20:26 right so as far as insert goes we can 20:32 insert in logarithmic time yeah yeah 20:41 yeah this one here right right so you 20:55 know as you grow you'll eventually end 20:56 up trying to put a node out here okay 21:00 well the adjustment will be automatic 21:03 because when you get a point here okay 21:06 our test is that that's that point like 21:08 to the left of the 17 to the right of 21:09 the 17 or in between okay so if the 21:12 point you're putting in is 17 you can 21:14 just put it here if you're putting in a 21:16 15 you'd insert into the mint stuff and 21:19 then things will start moving down and 21:20 this will become a bigger interval if 21:23 you're trying to put in a 20 you put it 21:25 into the blue part okay yeah yeah so it 21:33 happens by itself yeah 21:39 a question is as you insert we're losing 21:51 balance because you're getting inserted 21:52 as the left branch that's not true 21:54 because if I start now and I insert 21:59 somebody it'll fill up this node okay 22:02 the next one you insert will give you a 22:04 left child here and then the one after 22:07 that will fill up the left child after 22:09 that you'll get a right child here okay 22:11 the next two will fill up that then 22:13 you'll get somebody here two fellows 22:15 then out here so it works just like in 22:18 in an ordinary heap you are filling this 22:21 stuff up by levels you're really not 22:26 consistently adding left child's no 22:36 that's not true again because it's like 22:38 I said it's going to be done by levels 22:40 okay once you fill this node you'll add 22:42 a left child here and then a left child 22:43 there then a left child here a left 22:45 child here a left child a left child the 22:47 left till a right or left or right or 22:49 left or right or left or right because 22:53 you have to maintain a complete binary 22:54 tree okay so as you grow it you cannot 22:59 get any node down here until you have 23:02 filled this level for what binary tree 23:15 for a full binary tree so for a moment 23:18 suppose these two nodes aren't here okay 23:19 so we've got the full tree with four 23:21 levels okay and then if you want to add 23:24 an item you still have to maintain the 23:26 complete binary tree properly okay so 23:29 you have to go to the next numbered node 23:31 which would be this one okay again 23:35 remember in a complete binary tree the 23:39 definition is you start with a full 23:41 binary tree number the nodes top to 23:43 bottom left to right beginning with the 23:46 number one complete binary tree with n 23:48 nodes has the first end numbered nodes 23:52 sure yeah right so uh that's full you'd 23:56 do this one then this one it's left 23:58 childless right child its left child 24:00 it's right child and so on can you 24:13 search no you can't you cannot search 24:16 efficiently in a in standard party queue 24:20 structures they designed to let you 24:23 efficiently find say the smallest 24:25 element or the largest or in the case of 24:28 a double-ended one both the smallest and 24:29 the largest but not arbitrary elements 24:33 for that you have to go to a search 24:35 structure all right so we can insert in 24:41 love with make time because the height 24:43 is logarithmic the algorithms are simple 24:45 if you already have the code for men 24:48 heaps and Max Eve's is the same thing 24:49 nothing new let's take a look at 24:55 removing an element we'll look at only 24:57 removing the men removing the max is 24:59 pretty much the same case is our one 25:04 your structure is empty okay n is the 25:06 number of elements you don't have any so 25:08 the removal fail and it's one you have 25:13 only one element that's got to be both 25:14 the men and the max okay so you just 25:16 take that element out your structure 25:19 becomes empty and it's two then you have 25:24 only one node the root okay so the left 25:29 endpoint is the minimum you take it out 25:32 you're done so let's assume you have 25:36 more than two so that means you've got 25:37 more than the root you've got children 25:45 all right so I want to take out the 25:47 minimum element the minimum element is 25:50 the left endpoint of the root that's the 25:52 10 my strategy is going to be similar to 25:59 that used for removal from a min heap or 26:02 a max heap okay you take out somebody 26:04 from the root you try and refill that 26:07 empty slot by taking somebody out from 26:09 the last node in the structure ok the 26:12 last note in the structures 35 is this 26:14 one here I'm going to take one of these 26:15 fellows out and try and put it back in 26:19 all right so remove the left endpoint 26:21 that node has a vacancy not allowed to 26:25 have the vacancy because it's not the 26:27 last node we want to take an element out 26:31 from here okay 26:32 it doesn't really matter which one you 26:34 do it'll work with both I'm going to 26:36 just assume we take out the left 26:37 endpoint okay that works even if you 26:42 have only one point in here then you can 26:44 just take it out from the left slot yeah 26:48 yeah that's right yeah yeah so if there 26:58 are only two elements in the structure 27:00 you take out the left endpoint then 27:01 there if you're looking at writing code 27:03 you got two slots in there one called l1 27:06 called R and the point is physically 27:08 only going to be in one of them so you 27:09 move it from the our slot to the else 27:11 what they're conceptually you can think 27:17 of it as being both the left and the 27:18 right endpoint all right so here I've 27:23 got two I can take one of the two out it 27:25 doesn't really matter which I'm going to 27:28 take the left one out the 35 okay so 27:36 this fellow now has one that's okay it's 27:39 the last node it's okay for it to have 27:40 one the 35 has to be reinserted so I'm 27:45 going to start at the top okay to 27:49 reinsert the 35 now this stuff I say 27:53 there's in some cases this would have 27:55 had only one fellow you took the fellow 27:56 out it's become empty it's no longer 27:58 part of the structure 28:03 okay so I have to reinsert the 35 and 28:09 I'm going to start at the root as I 28:11 would for a min or max heap reinsertion 28:13 and I'm going to use the trickle-down 28:16 strategy that's used out there there's 28:20 going to be a minor twist to it though 28:21 okay and when you're trying to put this 28:25 fellow in I know that the 35 has to be 28:32 less than the 90 because it came from 28:35 down here and this interval is contained 28:36 in that okay so there's no problem 28:39 trying to put it in as the left endpoint 28:41 over there okay to see if it's valid to 28:47 put the 35 there's a left endpoint I 28:49 need to make sure that these two 28:51 intervals okay must be contained in the 28:54 new interval there if these two are 28:55 everybody else will because they're all 28:57 contained in those fellows okay and the 29:01 only problem I can have with interval 29:02 containment is with the left points the 29:04 right points are already contained okay 29:07 so look at the 15 and the 30 okay and I 29:10 find the smaller of the two which is 15 29:15 okay since this point is to the left of 29:18 35 I know I have an interval violation 29:21 containment valuation okay so I'm going 29:25 to take the 15 and move it up and then 29:26 I'll try and put the 35 down here 29:29 exactly as you would for reinsertion 29:32 into min-heap okay so the 15 moves up 29:35 and I'm going to try and put the 35 here 29:38 okay now at this point I'm not 29:41 guaranteed the 35 is smaller than 82 the 29:44 first time I was because of where it 29:45 came from but now I'm not so I got a 29:47 check and make sure that this is still 29:50 smaller than that if not I'll swap the 29:52 roles of the two and continue to insert 29:54 in the min-heap okay if I swap the roles 29:58 I'm making this interval bigger on the 29:59 right so all the containment properties 30:01 with the right endpoint is still 30:02 satisfied it's only the left containment 30:04 that may be a problem okay all right so 30:07 here I don't have to do a swap to see if 30:10 there's a valid place to put 35 at the 30:12 left 30:13 I find the smaller of these two which is 30:14 15 since 15 is to the left of that it's 30:18 smaller you can't put the 35 there so 30:21 instead this one bubbles up so the 15 30:23 goes up and now I'm going to try and put 30:28 the 35 here okay and here's where we see 30:31 that this fellow is no longer smaller 30:33 than that guy okay 30:35 so I have to interchange the roles okay 30:38 so the 35 will park itself here 30:40 expanding this interval to the right 30:43 doesn't disturb any of these guys okay 30:46 with respect your right endpoints okay 30:48 so then I'll take the 20 and say is 20 a 30:51 valid left endpoint for this place so 30:54 find the smaller of those two which is 30:56 16 is to the left of the 20 so you can't 30:59 put the 20 here instead I move the 16 up 31:07 then I compare the 20 and the 19 because 31:10 I got to get the left right ordering 31:12 first 20 is bigger so that becomes the 31:16 right endpoint 19 is the new target left 31:18 endpoint and I'll find the smaller of 31:20 the two left endpoints and the children 31:22 if they had been children here there 31:24 aren't so you can put that 19 back in 31:28 there and have a 19 20 31:37 okay alright so the way you remove the 31:39 main element is very similar to how you 31:42 would from a min-heap you take it out 31:45 from here you find a replacement from 31:47 the last node you throw away the last 31:49 node if it becomes empty you do a 31:53 trickle down the important difference is 31:56 when you're trickling down in the main 31:57 heap you have to first compare with the 32:00 right end point and you may have to swap 32:14 okay alright so remove men is 32:19 essentially the same as that for a man 32:22 heap with the exception of that swap 32:23 operation max remove max is very similar 32:26 to that and they both run in logarithmic 32:30 time you initialize the structure you 32:41 just throw the elements into the array 32:43 any way you like and then you start from 32:49 the last node okay and make sure that 32:54 the subtree here is an interval heap 32:56 then you move to this node 32:58 make sure that's an interval heap then 33:00 you move to that node okay so you kind 33:02 of do it right to left in the array make 33:05 sure that each position in the array is 33:07 the root of an interval heap like you 33:10 would initialize a min heap or a max 33:12 heap so you examine the nodes bottom to 33:18 top and actually right to left that's 33:20 the easier way to code it you just 33:21 started the right end of the array 33:22 moving to the left end of the array when 33:28 you look at a node you make sure you 33:31 satisfy the left right property okay so 33:34 you swap here the 14 and the 35 then you 33:40 take the left endpoint the 14 and 33:42 reinsert it into the interval heap 33:45 rooted here you take the 35 and reinsert 33:49 it into the interval he proved it here 33:50 using that trickle-down process we just 33:53 talked about okay then you come to this 33:56 node and then to that node and so on 33:58 okay eventually when you come here and 34:04 you're working on the root you've made 34:06 sure that this whole thing is an 34:08 interval heap you made sure that this 34:11 whole thing is an interval heap would 34:13 swap these two fellows insert the 39 34:16 into the bin structure insert the 70 34:19 into the max structure that would then 34:22 make sure that this whole thing is an 34:24 interval 34:24 okay so it's it's very much like 34:28 initializing a min heap or a max heap 34:30 and it runs in the same amount of time 34:33 as that one does which is order in 34:47 all right so if you feel comfortable 34:49 with min and Max heaps then you should 34:52 feel quite comfortable with the interval 34:53 heap the operations work very much the 34:56 same alright next let's take a look at 35:01 cache optimization both for heaps as 35:07 well as for interval heaps alright 35:15 suppose you have uniformly distributed 35:17 keys and you're putting them into heap 35:19 then people know that on average okay 35:24 the insertion starts at the bottom and 35:26 moves an item up on average you go up 35:28 less than two levels see why they end up 35:30 end up putting it in the last node of 35:32 one level up and the other hand when you 35:37 do a remove min order to remove max you 35:39 take something out of the root you take 35:41 something from way down the bottom the 35:43 last node and reinsert it 35:44 well that fellow tends to go almost all 35:46 the way back down which is intuitively 35:49 what you'd expect okay so the remove 35:53 operation ends up looking at almost all 35:56 the levels in the tree the insert 35:57 operation ends up looking at about two 35:59 levels in the tree the expensive 36:01 operation is the remove and so will 36:04 optimize our structure for the remove 36:09 operation all right so let's suppose 36:15 we're dealing with a computer where you 36:18 have a single cache it's called an l1 36:21 cache in each line is 32 bytes long and 36:23 you got about 16 kilobytes of cache okay 36:26 so it's not a very big cache and we got 36:29 a huge collection of elements so they're 36:32 not going to all fit in cache and 36:33 chances are if you access one element by 36:36 the time you come back to reuse it a few 36:39 seconds later it's been kicked out of 36:40 cache by something else that came in so 36:45 if your heap is containing elements that 36:51 are eight bytes long 36:56 then you can get four elements into a 36:59 cache line okay all right so let's say 37:05 here's my array into which I've mapped 37:08 my main heap of max-heap and when I 37:13 created the array I asked for a cached 37:15 align array which means that this begins 37:17 at a cache line boundary okay that would 37:20 mean that position 0 1 2 3 fit into the 37:24 same cache line if you access any one of 37:26 these three elements this whole block 37:28 would be brought into the cache 37:31 similarly if you access 4 5 6 or 7 any 37:34 one of those four fellows all of these 37:36 four will be brought in to a cache line 37:39 with with the same cache miss ok all 37:43 right so so the programming languages 37:47 have ways to ask for cache line memory 37:49 so you ask for a cache line memory it 37:51 begins at a cache line boundary then I 37:54 know where all the remaining boundaries 37:56 are alright so the yellow stuff is one 38:01 cache line the red stuff forms another 38:05 cache line and that forms another cache 38:07 line and so on you access anybody in the 38:13 same cache block the whole block goes in 38:15 to a cache and that means that with one 38:21 cache miss you can actually use all four 38:23 of these fellows with one cache miss you 38:26 can use all four of these fellows and so 38:28 on all right so if you do a remove men 38:35 using a standard heap okay then we start 38:43 at position 1 then I'll need to look at 38:46 its two children to find out which one 38:48 is smaller so there's two and three okay 38:50 so all that comes with one cache miss 38:52 and if you move to position two then 38:54 I'll need to look at his two children 38:55 that's four and five so that's another 38:57 cache miss now if you move to four 39:00 you'll need to look at two children 8 9 39:02 so that'll be in this cache block you 39:04 need to look at his two children 16 17 39:07 that'll be in some cache block further 39:08 down 39:09 so siblings are always in the same cache 39:13 block if your tree has a height H you go 39:16 down all age levels or H minus one in 39:19 the worst case on average you'll have 39:21 about H cache misses to silver say 39:25 remove min operation so you have about 39:33 log event-based to cache misses to do a 39:36 remove min operation or remove max using 39:39 the standard heap but you'll notice that 39:46 except for this block only half of each 39:49 cache block is used here we get 4 and 5 39:52 but 6 & 7 are not used when you go to 39:54 get 8 & 9 you won't use 10 and 11 okay 39:59 so there's room for doing better than 40:02 this all right suppose you went to a 40:07 higher degree heap that means that still 40:11 each node has only one element in it but 40:13 a node has many children so the standard 40:16 heap is a two heap each node has two 40:18 children if you went to a D heap said D 40:20 is for each node would have four 40:21 children but it'll still be a same entry 40:24 if you wanted a min heap you number the 40:30 nodes in the same way top to bottom left 40:32 to right and you end up with similar 40:34 formulas to access the children of a 40:37 node or the parent of a node okay you 40:39 can do arithmetic so again you don't 40:41 need to keep pointers okay you can go 40:43 home and verify these formulas the 40:48 height now becomes log of n base D 40:50 because the degree of the tree is D 40:52 instead of 2 so if you try D of 4 then 40:58 your height is half that of the binary 41:01 heap 41:05 okay if you look at inserts in the worst 41:11 case and insert may go all the way up 41:13 the tree but the number of levels is 41:15 half of what it was in a binary tree so 41:17 your worst case number of accesses 41:20 becomes half okay but the average 41:24 doesn't really change it's still about 41:25 one point six if you look at remove main 41:30 operations then as you are going 41:33 downwards trickling down you now have to 41:37 find the min of four children rather 41:38 than two okay so instead of comparing 41:43 two items to find the many and I have to 41:45 compare four to find the men but the 41:47 only half as many levels you do it half 41:48 as many times the number of levels you 41:53 go down is half okay so the number of 41:56 levels goes down by a factor of two but 41:58 the number of comparisons is actually 42:01 going up to find the smaller of two you 42:03 make one compared to find the smaller of 42:05 four you make three compares well number 42:12 of levels is half the number of 42:15 iterations of the loop goes down by half 42:25 the real performance will be will be 42:27 determined here by the number of cache 42:29 misses if you if you can reduce the 42:31 cache misses by a factor of two you 42:34 could probably see something close to a 42:35 factor of two improvement in the cost 42:39 even though the number of comparisons in 42:41 each level went from 1 to 3 and the 42:44 number of levels only went to half so it 42:45 looks like you got a 1.5 increase in the 42:47 number of compares okay but if the cache 42:50 misses go down by a factor of 2 42:52 the cache misses cost a lot more than 42:54 any of those compares to maybe by a 42:56 factor of 100 all right if you use a 43:00 standard mapping okay 43:01 you start with a cache aligned array so 43:04 this is the start of a cache boundary 43:05 okay you start at one you want as 43:08 children that's 2 3 4 5 so they're in 43:10 two different cache blocks then you want 43:12 the children to for they'll also be in 43:14 two different cache blocks okay so at 43:18 each level you have to access to cache 43:20 blocks whereas with a binary heap I had 43:23 to access only one cache block okay so 43:28 the number of cache misses remains the 43:31 same number of levels went to half with 43:34 the cache misses per level doubled so 43:36 the total number of cache misses remains 43:38 the same you're doing more compares 43:40 you're doing more work so your cost has 43:41 actually gone up but if you shift the 43:48 array by two slots okay so I start with 43:54 a cache line here cache boundary and I 43:58 shift the array by two slots as a change 44:00 in indexing by two so the route is here 44:03 then it's four children are in this 44:06 block and all of two is children are in 44:10 one block all of threes children are in 44:11 one block and so on so now you can get 44:14 all four children with one cache miss 44:17 the siblings and the same cache line the 44:20 number of cache misses becomes equal to 44:24 the number of levels which is half of 44:26 what it was in the binary case and so 44:28 now you're doing half as many cache 44:30 misses okay so you can expect to see 44:35 some improvement in performance 44:37 and in fact people have played around 44:39 with heapsort using this for a heap 44:44 shifted by two slots and you can 44:48 outperform standard heap sort by a 44:50 factor of like 1.5 to 1.8 if you got a 44:53 large number of elements large enough 44:56 that you keep getting cache misses 45:00 all right so catch you lying for heaps 45:04 generally work better than binary heaps 45:06 similarly you can do that for interval 45:09 heaps go to higher degrees if your cache 45:13 lines are wide enough alright finally I 45:18 want to talk about the complimentary 45:19 range search problem here you're given a 45:22 bunch of points okay you want to be able 45:25 to insert a point you can do that in 45:27 logarithmic time using interval heaps 45:29 using the standard insert operation you 45:32 want to be able to remove a point you 45:34 can do that in logarithmic time using 45:35 the standard remove operation provided 45:39 you given its location so somehow you 45:41 got to know where the point is okay you 45:44 are doing a search for it external to 45:46 this structure you've kept track of the 45:48 point somehow but the important 45:54 operation is you want to report all 45:55 points that are not in a given range so 46:00 that's the complimentary range search 46:01 problem you can do this in time linear 46:04 and the number of points that you're 46:05 going to report let's take a quick look 46:10 at how this works all right so here's my 46:15 interval heap of points current status 46:18 of the points I get a complimentary 46:20 range search query which says report all 46:23 points that are outside of the interval 46:25 5 100 so I started the route and I asked 46:30 is the interval here contained in the 46:32 query interval since 10 20 is contained 46:36 in 5 100 I know that every point is 46:39 contained in that interval ok so I don't 46:42 have to go any further let's try a 46:47 different one 46:48 265 46:50 okay I asked the same question here is 46:53 this interval contained in the query 46:55 interval it's not and if it's not that 46:58 tells me two things one is that at least 47:01 one of these two points is part of the 47:03 answer okay 47:05 maybe both but at least one is part of 47:07 the answer plus they may be other points 47:09 down here and other points down here 47:11 that are part of the answer okay so 47:14 first I report the one or two points in 47:16 the root that are part of the answer in 47:18 this case 90 so I report the 90 and then 47:22 I recursively search the left sub-tree 47:24 and the right subtree okay 47:26 ask the same question here okay it's 47:29 1580 contained in the interval it's not 47:32 so at least one of these two points is 47:34 part of the answer in this case 80 I 47:36 recursively do this side and that side 47:38 is 2070 contained in that interval it's 47:42 not so at least one of these two points 47:44 is part of the answer the 70 years 47:46 recursively come here 25 60 is in that 47:50 interval so at no point looking anyway 47:53 down there 30 50 isn't that interval so 47:57 you don't look at it subtrees you when 48:00 the recursion comes here 15 20 is 48:03 contained in the interval so nobody is 48:05 examined down here when you come out 48:08 here 30 60 is contained in the interval 48:11 so you don't have to look at that 48:12 subtree okay so it's a recursive process 48:17 see if the root interval is contained if 48:20 so you're done 48:21 if it's not report the one or two points 48:23 in the root that are part of the answer 48:25 recursively search the left and right 48:27 subtrees now the claim is this runs in 48:33 linear time with the number of points 48:36 okay so to see that will use a costing 48:42 scheme which says if a point reports if 48:47 a node reports one or two points give it 48:49 a constant of one okay if a node like 48:54 this one here reports no points then add 48:57 one to the cost of its parent okay 49:02 similarly this 49:03 we'll add one to the cost of its parent 49:04 this node here we'll add one to the cost 49:07 of its parent and this node here would 49:09 add one to the cost of the parent okay 49:13 alright so let me back up first okay 49:17 every node that is reached okay give it 49:20 a constant of one if you add up all the 49:23 ones that's the cost of the algorithm 49:25 okay now take the nodes that do not 49:28 report a point take their one unit of 49:31 cost and shift it to the parent okay 49:34 assuming it has a parent if it's the 49:36 root then you have no place to shift it 49:38 okay right so if a node doesn't report a 49:43 point shift its cost to the parent okay 49:46 now a node that does not report a point 49:48 doesn't have any children of its own 49:50 okay so notes that don't report points 49:53 don't get any cost shifted to them their 49:56 cost got shifted to the parent no node 50:00 can have more than two units have cost 50:02 shifted to them one from each other's 50:04 children so no node has a cost of more 50:07 than three okay so each node that has a 50:11 cost left has a cost of at most three 50:14 each such node reported at least one 50:16 point with the exception of the root so 50:19 if you add up all the costs it's three 50:21 times the number of points reported plus 50:23 one to account for the root which may 50:24 not have reported a point so your total 50:27 cost is three k plus one order of K but 50:35 what well this is a cost of the 50:37 algorithm what's the complexity okay so 50:39 the complexity is linear in the number 50:41 of points that are part of the answer 50:49 all right any questions 50:54 okay so we've stuck