Transcript 00:11 okay right before we start to look at a 00:17 new data structure today you have any 00:19 questions you want to ask all right 00:28 today we're going to look at another 00:29 data structure that's useful in 00:31 computational geometry this one is 00:34 called an interval tree and actually 00:36 we'll be taking a look at two versions 00:38 of an interval tree so the first one 00:42 will simply call an interval free and 00:44 the second one I'll call that a variant 00:46 of an interval tree okay so the two 00:48 versions allow you to perform different 00:50 types of operations efficiently the 00:54 first version that we're going to look 00:55 at is going to store intervals okay and 01:01 actually even the second one both will 01:03 store intervals and as we saw last time 01:05 intervals have a left point associated 01:08 with them on a right point associated 01:09 with them the difference between the 01:13 intervals we're talking about today and 01:14 those we looked at last time is that the 01:17 intervals don't need to have a left and 01:20 right point that are different so it's 01:22 possible for the left and right endpoint 01:23 to be the same in which case our 01:26 interval represents a point okay so we 01:30 can have points as well as intervals of 01:34 length one and more the operations that 01:41 can be supported by both variants would 01:44 be insert and delete so you can add and 01:46 remove intervals from your collection 01:48 the search operation is the one that's 01:52 different for the two the first version 01:55 we'll look at will allow us to answer 01:57 queries of the form from your collection 02:00 of intervals report all intervals that 02:03 overlap a given query interval from L to 02:07 R so here overlap is not full overlap so 02:12 long as you touch anywhere 02:13 between L and R will say there's an 02:15 overlap all right the second version 02:23 will only give us one interval where 02:26 there's an overlap okay so this will 02:28 give us all intervals where there's an 02:30 overlap all right so let me first define 02:34 an interval tree it's going to be a 02:38 binary tree okay each node in the tree 02:43 is going to have a point stored in it 02:45 okay so be some point associated with an 02:48 interval with the node we've called that 02:51 point V dot PT for the node V okay 02:56 also in that node will store two lists a 02:59 left list and a right list okay so each 03:03 node in this binary tree will have three 03:06 entities a point PT a left list and a 03:10 right list I will see what the left and 03:12 right lists looked like in a moment okay 03:15 now as far as this entity is concerned 03:21 PT the binary tree will be a binary 03:24 search tree on that entity okay so the 03:29 points in the left subtree of V will be 03:31 smaller than the point in the node V 03:34 okay the points in the right subtree of 03:37 V will be bigger than the point in V 03:40 okay so each node has three entities a 03:46 point and then left and right if you 03:50 look only at point okay so every node 03:52 has a point if you look only at the 03:54 points in the nodes you will have a 03:56 binary search tree on the points okay 04:01 all right let's take a look at the left 04:04 and right entities these are actually 04:06 going to be lists of intervals the 04:10 intervals stored in left will be exactly 04:13 the same as those stored and right 04:15 except that the ordering is different 04:21 all right so if you look at a node okay 04:27 oh if you look at a subtree okay so 04:29 let's suppose the V is the root of that 04:31 subtree okay then the intervals 04:34 contained in the subtree have the 04:36 property that if the right endpoint of 04:39 the interval is to the left of the point 04:42 in V so if the right endpoints to the 04:44 left and the whole interval is to the 04:45 left of that point so that interval is 04:47 going to be stored in the left subtree 04:49 of V on the other hand if the left 04:55 endpoint is bigger than the point then 04:57 the whole intervals to the right of the 04:59 point so the whole interval is stored in 05:01 the right subtree okay so each subtree 05:04 stores a collection of intervals if you 05:07 look at an interval if the interval is 05:09 to the left of the point in the root 05:12 that goes to the left subtree if the 05:14 interval is to the right of the point in 05:15 the root it goes in the right subtree 05:17 but if the interval is neither to the 05:19 left nor to the right then it overlaps 05:22 the point okay right so this is the 05:25 third case the interval in question is 05:27 neither to the left nor the right so it 05:29 has to overlap the point which means the 05:32 intervals left endpoint is less than 05:36 equal to the point which is less than 05:37 equal to the right endpoint is okay so 05:40 in this case the interval is stored in 05:41 the node V and the interval is stored at 05:47 two places in the node 1 in this entity 05:50 left and also in right left and right 05:54 are organized so that then left you can 05:57 easily access the intervals in ascending 06:00 order of the left endpoints and right so 06:04 that you can easily access the intervals 06:05 in Si ascending order of the right 06:08 endpoints okay so a possibility is that 06:13 these two lists are sorted arrays this 06:15 is sorted by left endpoints that's 06:17 sorted by right endpoints another 06:19 possibility is that this itself is a 06:21 binary search tree on the left endpoints 06:23 and that's a binary search tree on the 06:25 right end points so depending upon the 06:29 application you may choose the data 06:31 structure for these lists differently 06:36 okay now let's take a look at an example 06:42 first I've got a collection here of six 06:45 intervals that I want to store the 06:50 interval a has this left endpoint one 06:52 and it's right endpoint three 06:56 okay let's suppose that the root of my 07:01 interval tree in which these are being 07:02 stored has a point value of four you 07:07 will see later how you select a point 07:09 value so let's suppose that the point 07:11 value is four okay so if I look at my 07:17 collection of intervals the interval II 07:20 is to the left of four okay so you 07:23 mentally draw a line out here okay so E 07:26 is to the left and a is to the left okay 07:30 there are no other intervals that are 07:32 fully to the left of 4 since the left 07:34 sub-tree will contain a and E to the 07:40 right of 4 we've got this interval here 07:44 that's these to the right of 4 fully to 07:47 the right of 4 and I guess that's the 07:50 only one that's fully to the right of 4 07:52 is the right subtree will contain the 07:54 interval D the remaining intervals 07:58 that's FC and B they overlap the point 4 08:03 so these three intervals will be stored 08:06 in the node fee and they'll be stored 08:10 twice each one in the list l1 in the 08:13 list are the lists L is organized for 08:19 easy access by sending order left 08:21 endpoints this one has the smallest left 08:23 endpoint that's 2 then F and then V so V 08:29 dot left if you're thinking of these as 08:32 in a sorted array it would look like CF 08:36 and be V dot right will have the same 08:41 intervals CF and B but in a different 08:46 order 08:47 either explicitly or implicitly okay so 08:50 if you look in the right ten points 08:51 that's six that's four in that six okay 08:58 so it's going to be at this had the 08:59 smallest right endpoint then you got a 09:01 tie out here so any order for those two 09:16 or any questions about this version of 09:21 an interval tree before we look at some 09:23 properties and operations perhaps every 09:30 node in interval tree has a point 09:33 associated with it and it has two lists 09:37 of intervals both lists contain exactly 09:39 the same set of intervals okay those 09:41 lists are left and right left has both 09:45 have intervals that overlap the point 09:48 left has them in ascending order of left 09:51 endpoint right in ascending order of 09:52 right endpoint either explicitly or 09:54 implicitly intervals that lied to the 10:00 fully to the left of this pointer and 10:02 this subtree those that lie fully to the 10:04 right of this point are in that subtree 10:10 all right so some properties right so 10:15 each interval is stored in exactly one 10:17 node okay and like segment trees when 10:20 interval could be stored in logarithmic 10:21 number of nodes the number of intervals 10:27 in a node could vary between 1 and n but 10:33 there are no nodes that are empty 10:39 so the total number of nodes will also 10:44 vary between 1 and then ok it's possible 10:48 that all the intervals are packed in one 10:49 node so you only have a root or if you 10:52 have only one node in each sorry one 10:54 interval in each node you'll end up with 10:55 exactly n nodes okay so the total number 10:58 of nodes lies between 1 and n assuming n 11:01 is at least 1 if you look at the sizes 11:16 of the left and right lists across all 11:18 of the nodes well if you look only at 11:23 the sum of the size of the left lists 11:27 that's going to be equal to the total 11:29 number of intervals and similarly the 11:32 sum of the right lists was the same as 11:34 the sum of the left lists so the total 11:38 amount of space needed is order of n 11:44 again that's different from segment 11:46 trees where you need about n log n space 11:49 because each interval could be stored in 11:51 log n nodes or 2 log n nodes all right 11:57 the tree height depends on how you 12:00 choose the points and node and one way 12:06 to select the points would be to select 12:13 it as the median of the end points of 12:15 the intervals being stored in that 12:17 subtree okay so for example if you're 12:20 storing all of these intervals then the 12:24 end points are 1 through actually I'm 12:30 missing an end point 1 it should be 1 12:31 through 7 ok it's a end point missing 12:35 out there so the end points are really 1 12:37 through 7 then you picked the middle 12:39 value and if you just stay with the 12:44 example as it is here 1 through 6 then 12:47 you write the pick 3 or 4 to be the 12:49 point 12:53 so one possibility is take a look at all 12:58 your end points and find the median of 13:00 the end points and use that as the point 13:05 so when you do this then the number of 13:08 end points in the left subtree would be 13:10 half the total number of end points you 13:12 started with the number the right 13:13 subtree would be half the total that you 13:15 started with so as you go down each 13:17 level the number of end points in each 13:20 subtree becomes 1/2 so you can have only 13:23 log n levels so that's one possibility 13:30 so if you select n points in that 13:33 fashion you're guaranteed that the 13:34 height is log n unfortunately you can do 13:40 that only if you know the intervals 13:41 ahead of time so if you want to support 13:49 insert and delete without putting prior 13:51 limits on boundaries or bounds for 13:54 endpoints then you can't use this 13:56 strategy to select the point all right 14:04 so instead we say just select the end 14:07 point are we totally so select a point 14:13 for the node anywhere you like again you 14:19 need to satisfy the binary search tree 14:21 requirement on the points and to ensure 14:29 logarithmic height we will maintain our 14:32 tree a binary search tree of points as a 14:36 red-black tree so again we guaranteed 14:40 height that's order of log N 14:47 how are we going to make a variation 14:49 from the previous requirement which say 14:53 that each node is glad to have at least 14:55 one interval we're going to change that 14:58 to allow some empty nodes and this will 15:00 make it possible to perform deletions 15:02 quickly so in the interest of being able 15:09 to perform deletions efficiently we will 15:12 permit some nodes in the tree to be 15:14 empty well we place a limit on the total 15:20 number of nodes the total number of 15:23 nodes in the tree will not be allowed to 15:25 go beyond two times the total number of 15:28 intervals at that point in time okay so 15:30 end kind of changes in time so if 15:33 currently your n is 50 you'll be allowed 15:35 to have up to 100 nodes which means you 15:40 could have probably hundred 99 nodes 15:44 that are empty so the limit is not on 15:49 the number of empty nodes but the limit 15:53 is on the total number of nodes in the 15:55 tree 16:01 all right so you can have it most n 16:06 empty nodes because the non empty nodes 16:10 will then have at least one interval in 16:14 them each 16:22 okay alright so every non-empty interval 16:24 has at least one node or one interval 16:27 and so this thing here implies this 16:33 coupled with the fact that non-empty 16:36 nodes have at least one interval would 16:37 imply this condition here or the height 16:44 is log n is log n because the total 16:46 number of nodes is 2n and it's a 16:48 red-black tree i'll see why you need to 16:57 have empty nodes okay to support 17:01 deletion alright suppose you're deleting 17:06 an interval and the interval you're 17:11 deleting happens to be sitting in a node 17:12 as degrees two if that node becomes or 17:20 still has some intervals left you're 17:21 okay you don't have to change anything 17:23 so if the node is deleting from 17:25 previously had four intervals you take 17:27 one out it's now got three you don't 17:29 have to make any changes to the 17:30 structure because you don't you haven't 17:32 created an empty node but if that node 17:35 had only one interval you deleted it 17:37 becomes empty and if you say I'm not 17:39 going to allow any empty nodes now I've 17:41 got to react to that condition okay all 17:45 right so for example if this is what 17:47 your tree looks like and suppose you 17:50 delete a node an interval from this root 17:53 and if the root becomes empty then my 17:58 strategy to handle deletion from a 18:02 degree two node in a binary tree is to 18:06 replace what was here with somebody from 18:11 the left subtree that has highest point 18:13 values so from here okay or somebody 18:17 from the right sub tree that has 18:18 smallest point value from that okay so 18:21 this has become empty and we'll change 18:24 the point value here it's no longer 18:27 going to be 20 and would it make it 18 18:30 okay so when I make the point value in 18:33 the root 18 18:34 then people stored in this list left and 18:38 right they have to move over here okay 18:40 because they overlap with 18 but in 18:43 addition they may be people here that 18:45 overlap with 18 okay the intervals that 18:48 are stored here are only guaranteed to 18:50 be the left of 20 so the to the left of 18:53 20 and the overlap with 10 but you could 18:57 also overlap with 18 okay the intervals 19:00 here I guaranteed to be to the right of 19:03 10 and the left of 20 and the overlap 19:06 with 15 but they could still overlap 19:08 with 18 okay so the intervals here and 19:11 here I mean some of them may need to be 19:14 moved over there in particular if you 19:19 look at intervals along the path from 19:22 this node all the way down here all of 19:25 those nodes could contain intervals that 19:28 need to be migrated into this node here 19:30 okay and the cost of that so you have to 19:36 pay the cost of that migration but in 19:38 addition when you migrate from here to 19:40 here this node may become empty okay 19:44 similarly that node may become empty so 19:47 one empty node may create several other 19:51 empty nodes and when you try to remedy 19:53 that they may create additional empty 19:56 nodes okay so you could get a ballooning 19:58 effect of having to rectify many empty 20:01 node problems and that makes the 20:04 complexity of deletion if you don't 20:06 allow empty nodes to be somewhat high by 20:12 having a provision for empty nodes we 20:17 can simply avoid doing anything if this 20:20 node becomes empty I say well no problem 20:22 just leave it like that leave it empty 20:23 then I don't have to relocate anybody 20:26 but that in itself may result in a large 20:30 number of nodes over time so maybe you 20:32 got only three intervals and 10,000 20:34 empty nodes okay so that's why you got 20:37 this other bound which says you can have 20:40 at most a total of two n nodes 20:45 all right so so empty notes are 20:48 necessary without an empty node deletion 20:52 from a degree two node could be very 20:54 expensive deletion from nodes of degree 20:59 one and zero is not a problem okay so if 21:02 you're deleting from here and this 21:04 becomes empty 21:04 well you just throw it away if you're 21:08 deleting from here and this becomes 21:10 empty our strategy is to change this 21:13 pointer from there to there okay so 21:15 again there are no intervals that 21:18 migrate because there may be a rotation 21:20 but prior to moving to the rotation 21:23 stage there is no need to migrate 21:24 intervals okay so it's only when a 21:29 degree two node becomes empty that we 21:31 have this migration problem which 21:33 triggers other nodes to become empty all 21:40 right so if we place a limit of two n 21:42 nodes you have to then establish the two 21:47 end nodes is sufficient for you to be 21:52 able to resolve deletion so what you can 21:59 prove and I'm not going to prove it here 22:01 in class but you can check the readings 22:03 there's a short proof maybe about five 22:06 lines on which shows that when the 22:10 number of nodes becomes more than two n 22:12 if you take a look at all of the empty 22:15 nodes that you currently have then at 22:18 least one of those empty nodes will be a 22:21 degree one or a degree zero empty node 22:25 and removing a degree one or degree zero 22:29 empty node as easy as we just saw so you 22:31 can bring the number back down to below 22:33 to less than equal to 2n so if you're 22:38 deleting from a degree two node and you 22:41 suddenly find that leaving it there 22:42 would leave you with 2n plus 1 nodes 22:44 well then your collection of empty nodes 22:46 has got to have at least one empty node 22:49 that is of degree zero or degree one and 22:52 we know how to get rid of nodes of that 22:54 type okay so you can bring it back down 22:57 to two it 22:58 right so that proof shows that it's 23:01 enough to work with two n total notes 23:14 okay alright so uh in the new version of 23:22 the interval tree we are going to allow 23:24 empty nodes however the total number of 23:29 nodes in the tree will be limited to two 23:31 n all right so when you do insertions 23:39 and deletions you will at times have to 23:42 perform rotations to maintain your tree 23:46 as a red-black tree so we need to see 23:49 what kind of work is needed to support 23:51 these rotations we take a look only at 23:54 the ll rotation yeah 24:10 yep 24:23 [Music] 24:26 question is I I said that if the number 24:30 of empty notes becomes 2 n plus 1 then 24:32 we can bring it back to 2n by throwing 24:34 away a node of degree 0 or 1 and the 24:37 question is what happens if the number 24:39 of empty nodes is at the total number of 24:41 nodes is 2n plus 2 or 2n plus 8 24:43 well when you do a delete okay only one 24:47 node can become empty the node from 24:50 which you take the interval out so prior 24:52 to the delete if you had less than 2n 24:56 empty nodes now you will have at most to 24:58 an empty nodes I'm sorry if you had if 25:01 prior to the delete you had at most you 25:04 had less than 2 n nodes now you'll have 25:06 at most 2 n nodes and if you okay let me 25:18 say it differently okay suppose there 25:21 are 20 suppose you have 2n plus 20 nodes 25:27 okay so anytime you have more than 2 n 25:32 nodes 25:32 you must have sufficient number of nodes 25:35 of degree 0 or 1 so if you've got 2 n 25:39 plus 10 nodes then they're going to be 25:41 at least 10 nodes of degree 0 and 1 you 25:43 can throw them all out okay so basically 25:47 once you have a proof that says if 25:49 you've got more than two end nodes and 25:52 that proof really relies only on 2n plus 25:55 1 ok the same proof tells you that if 25:58 you've got 2n plus 5 or 2n plus 8 then 26:00 you've got 5 extra nodes or 8 extra 26:02 nodes that can be thrown away 26:04 or if you look at it from the other side 26:07 when you start doing deletes the number 26:09 of empty nodes only increases by 1 each 26:11 time 26:15 okay all right any other questions all 26:25 right so let's look at the rotation okay 26:28 so when you do inserts or deletes you 26:35 may have to perform rotations to get 26:37 yourself back into balance okay now the 26:40 nice thing about red-black trees is you 26:41 only have one rotation at most one 26:43 rotation to perform either an insert or 26:46 a delete everything else is a color 26:47 change all right so if you're performing 26:52 a rotation then we saw there are four 26:54 types of rotations basically to play 26:57 around with and they're all really 26:59 similar because the RR is symmetric to 27:03 ll and if you look at LR LR is really an 27:06 RR followed by an L oh so it's enough to 27:09 just look at one case and you can figure 27:11 the rest out okay all right so if you 27:13 have an ll rotation well then this is 27:16 what it looks like after you've done an 27:18 insert say so the insert in this case 27:20 took place in this subtree and then you 27:23 perform a rotation and you create that 27:25 tree so when you create this tree here 27:33 okay 27:34 well then this node here has a point in 27:39 it and that node has a point in it as 27:42 well so do all the others okay as far as 27:45 these ones concerned down here the 27:48 intervals they store are correct 27:50 it's the intervals over here and here 27:53 that might have gone gone bad as a 27:56 result of the rotation so the intervals 28:02 that are stored in a and B may need to 28:06 be changed and in particular the change 28:15 you need to make is there are some 28:17 intervals that are stored in a which now 28:21 should be promoted up to be 28:25 so if you look at this side okay I've 28:30 got some I've got intervals here all of 28:32 the intervals stored here overlap with 28:34 the point in a okay but some of those 28:37 intervals also overlap with the point in 28:40 B so the intervals 28:44 stored in a that overlap with the point 28:45 and B in the after picture have to be 28:48 taken out of a and moved into B the 29:00 intervals previously stored and B don't 29:03 get moved anywhere the intervals 29:05 previously stored here overlap at the 29:06 point and be the still overlap with a 29:08 point in be on the right side intervals 29:12 get trapped at the highest level where 29:13 they overlap so so you're not going to 29:15 be shifting them anywhere from where 29:17 they are it's only the intervals some of 29:19 the intervals currently stored in a have 29:21 to be taken out of the structure for a 29:23 from the left and right listen a and 29:25 moved into the left and right lists for 29:28 B all right so so so this could take 29:42 some amount of time okay depending upon 29:45 the data structure that you use if you 29:50 have stored them left and right 29:53 themselves as red-black trees okay then 29:57 by doing a split on a you can extract 30:01 the ones that have to be moved into B 30:02 and then you do a join into B or if you 30:09 have them as a raise you need to kind of 30:10 search the arrays the sorted so you can 30:13 just search them from one side picking 30:16 up the ones that overlap and move them 30:18 out so depending upon the structure you 30:20 have the time to actually do this would 30:24 vary but in any case this is not an over 30:27 one operation okay and so it's good that 30:32 only one rotation is performed because 30:35 this migration of the intervals is 30:37 expensive 30:45 all right so so all of the rotations 30:53 that are needed to restore balance 30:55 require you to migrate intervals from 30:58 some constant number of notes if it's an 31:02 LL or an hour or it's from one node if 31:05 you're doing an L R then we said that 31:08 could be looked upon as an RR followed 31:12 by an LLC you've got two nodes in which 31:13 to do the migration from all right so 31:23 the complexity of the insert or delete 31:24 okay becomes something like this where F 31:35 of n is the time needed to relocate 31:39 items okay all right so as far as insert 31:49 and delete go the total number of nodes 31:51 you're going to reach is like 31:53 logarithmic but then because of the 31:56 migration you have to spend extra time 31:58 so that the time complexity doesn't 32:01 really become order of log n it depends 32:03 also on the time to migrate the 32:05 intervals let's look at the search 32:11 problem so you're given a query interval 32:16 so I say the query interval is from L to 32:20 R and you want to report all intervals 32:23 in your collection that have an overlap 32:26 with this query interval so there are 32:35 three cases to consider first one is 32:39 that the point stored in that node is 32:41 contained in the interval that's the 32:43 case shown in this diagram here the 32:45 point four is contained in the interval 32:48 or the interval overlaps that point okay 32:51 so in this case everyone who is stored 32:55 in this node it's all of the intervals 32:59 stored here overlap with this fellow 33:03 okay because all of them overlap with 33:05 four and so they go to overlap with that 33:07 kind so all intervals stole in the node 33:12 overlap so I go either into the left 33:14 list of the right list and extract all 33:16 the intervals and report them in 33:20 addition they may be intervals in the 33:22 left subtree and intervals in the right 33:24 subtree so I have to recursively search 33:26 the left and right subtrees for 33:28 additional intervals 33:36 right if the Korean tour lives totally 33:38 to the right even now they may be people 33:45 here that overlap with that okay because 33:48 you got it this only says the intervals 33:50 overlap with four but some of them could 33:52 extend out this way and overlap here so 33:59 to find these guys we will search this 34:01 list in order of their right endpoints 34:04 so once the right endpoint becomes 34:07 smaller than L nobody else can overlap 34:10 all right so all right so the intervals 34:20 that are stored in V okay with the right 34:24 endpoints big enough overlap so to get 34:27 this I go through the right list and if 34:30 the right list is an array then the time 34:33 needed to search this list is that the 34:34 same order as the number of intervals 34:37 here that are part of the answer nobody 34:43 in the left subtree overlaps I don't 34:45 have to search the left subtree and then 34:47 I have to search the right subtree for 34:49 additional intervals right the third 34:55 case is symmetric your Curie interval 34:58 the other one is to the left of the 34:59 point so there are some people in the 35:02 node that overlap but to find the people 35:05 here if you think of it as an array and 35:09 we look at it in order of their left 35:10 points okay so starting with the 35:13 smallest left point going up once the 35:15 left point becomes bigger than R I don't 35:17 need to look at any more so the time to 35:20 search that structure is linear in the 35:22 number of intervals that are part of the 35:23 answer I don't have to search the right 35:27 subtree but I have to search the left 35:29 subtree as far as the complexity goes 35:38 using the same reasoning as we used for 35:41 segment trees the total number of nodes 35:43 that will get visited is logarithmic 35:45 again you look at the 35:48 and closing envelope of the search okay 35:53 so you look at all the nodes and the 35:54 enclosing envelope plus the other child 35:56 of each of these nodes you end up with 35:59 like log n nodes are encountered either 36:05 all the intervals in an OD encounter 36:08 overlap you can pick those out in linear 36:10 time or only those intervals with the 36:14 right endpoint sufficiently large you 36:16 can pick those in linear time they're 36:18 going down the right list if it's an 36:19 array or those were they left interval 36:22 sufficiently small you can pick these up 36:24 by going down the left list in linear 36:26 time for soon array all right so the 36:32 total time for a search becomes log n a 36:36 total number of nodes in counter plus 36:41 the number of items that are part of the 36:44 answer 36:53 okay all right so search is good inserts 36:57 and deletes are not logarithmic because 37:00 you've got this migration of intervals 37:03 even though you encounter only log n 37:05 nodes you do have to migrate intervals 37:06 so it depends on how much time it takes 37:08 to migrate those intervals all right any 37:14 questions about this version before we 37:16 go to the next version the question is 37:39 well if it if a single node contains all 37:41 n intervals how does the search become 37:46 log n ok well if this node here 37:50 contained all n intervals ok these 37:53 intervals are stored in the left and 37:55 right lists in ascending order of left 37:58 endpoints and right endpoints ok so then 38:01 depending upon which of the three cases 38:03 your rent if you're in the first case 38:06 everybody in the left list gets reported 38:08 so that takes your order of n time which 38:11 is captured here size of the output is n 38:16 if you're in one of the other two cases 38:19 you either search the left list picking 38:22 up things that are part of the answer 38:24 and you examine only one person that's 38:26 not part of the answer ok so it's still 38:30 captured out here ok same thing if you 38:32 in the third case 38:39 okay 38:43 I didn t variant each node is going to 38:48 store only one interval so I'm not gonna 38:52 have a list of intervals and we have 38:53 only one interval and my tree is going 38:58 to be a binary search tree based upon 39:01 the left end points of the intervals 39:03 stored in the nodes 39:05 okay so if you look only at the left end 39:07 points you'll have a binary search tree 39:10 in addition each node is going to store 39:13 another value which is the max of the 39:17 right end points in that subtree okay so 39:20 each node will have two values in it one 39:22 is an interval the left end points of 39:25 the intervals will define a binary 39:26 search tree and the other is going to be 39:29 a max value of the right end points in 39:31 the search in that sub tree now in order 39:37 to define our binary search tree on left 39:44 endpoints I'm going to define a 39:46 relationship between intervals which 39:48 says one interval is smaller than the 39:51 other okay so that's used for our search 39:53 tree okay so an interval is smaller than 39:55 the other 39:55 if either it the left endpoint of one is 40:00 to the left of the other but in case of 40:03 a tie I'm going to have a somewhat 40:04 unusual rule and I'm going to have the 40:07 unusual rule because this rule is 40:09 necessary for my search algorithm to 40:11 work and the unusual rule is that if you 40:14 have a tie in the left endpoints then is 40:18 smaller if it's right endpoint is bigger 40:20 and then the right endpoint of the other 40:23 guy okay so look at some examples okay 40:27 all right so the yellow interval here is 40:30 smaller than the red interval I is less 40:34 than J so in terms of our binary search 40:37 tree if this is sitting in a node this 40:41 will be in the left subtree of that node 40:43 say or if this is higher up then this 40:47 would be in the right subtree good this 40:49 is bigger than that 40:52 all right same thing out here the left 40:54 endpoint is smaller than the right 40:56 endpoint so this is less than that as an 40:58 interval here I've got a tie okay so in 41:05 case of a tie I'll say this fellow is 41:07 smaller than that even though it's right 41:10 hey this is smaller than that because 41:11 it's right endpoint is bigger than the 41:13 right right endpoint so I have a binary 41:18 search tree on intervals and I have an 41:22 ordering on intervals which is defined 41:23 by this rule it is basically defined on 41:26 the Left endpoints but in case of a tie 41:28 I've got this criteria here which says 41:32 the fellow with the bigger right 41:33 endpoint is smaller okay so that's the 41:36 unusual part you might normally think 41:38 that the fellow with the smaller right 41:40 endpoint is smaller as I said when you 41:44 look at the search algorithm we need 41:45 that for that algorithm to do its job 41:47 correctly all right so it's going to be 41:56 a red-black tree like we had before each 42:00 node has exactly one interval each node 42:03 also has a value which is the largest 42:07 right endpoint in that subtree let's 42:14 look at example here all right so I've 42:18 got a binary search tree set up over 42:21 here each node has an interval so the 42:25 interval f e a and so on so the number 42:28 of nodes is exactly equal to n the 42:31 number of intervals 42:31 no empty nodes each node also has a 42:36 value which is the max right endpoint in 42:40 that subtree okay so the max is 7 the 7 42:45 is the max overall okay if you look at 42:47 this node here I've got three intervals 42:50 a C and E so a C and E and the max right 42:55 point is 4 43:18 all right suppose you want to search 43:23 you're given a Korean - ho okay Q and 43:27 you want to search the search that's 43:29 supported here is finally one interval 43:31 that overlaps with the Korean tool okay 43:35 so I'm not trying to find all of them 43:36 just one all right so the Korea interval 43:42 is given and if so you start at the root 43:50 if F and that overlap then you found a 43:53 one interval you report that as the 43:56 answer okay if F does not overlap with 44:00 L&R; with the curry interval we got to 44:04 decide whether you're going to the left 44:05 subtree or the right subtree to search 44:06 further and the way you make that 44:09 decision is you check the max value here 44:13 so if the max value here is at least L 44:17 okay then there's a possibility of 44:20 finding something out here okay then you 44:24 search this subtree and under that 44:27 condition you can show that they cannot 44:29 be a match on the right side okay and 44:31 that's because of the tiebreaker that we 44:34 used okay so you're going to go home and 44:37 verify that okay if this condition is 44:40 not satisfied then you search the right 44:43 subtree 44:50 so since the height of the tree is 44:52 logarithmic the search works in 44:54 logarithmic time because we don't really 44:57 search both left and right subtrees but 45:08 then you have to look at inserts and 45:09 deletes now here if you are doing a 45:15 deletes from a degree 2 node you can see 45:18 that we don't really have the migration 45:20 problem we had before so for example if 45:26 I delete the F okay 45:29 then I can take this fellow here and 45:31 move it up there and all I'll need to do 45:38 is I'll need to update the max value 45:40 okay so the max value may no longer be 45:43 for so so long as I have a rule to 45:47 update the max value there I should be 45:49 okay as you will focus only on the 45:57 rotations so if you're doing a rotation 46:01 it's an ll rotation so when you do the 46:05 rotation the max values in these sub 46:10 trees don't change the max values that 46:16 can't change you can change the max and 46:17 be in the maximum a because right now 46:20 the max in B is defined over all the 46:23 intervals in B prime BR and itself and 46:28 here the Max and B is defined over B 46:30 prime B are AR and in this node so AR 46:36 and a are the extra people that have 46:38 shown up so I could change the max the 46:42 max in a was previously defined over a 46:45 bigger set of intervals now it's defined 46:47 over a smaller set of intervals so we 46:51 need to update the max in a and B and 46:54 you can come up with these equations 46:59 fairly easily 47:01 the max in a is the max value here the 47:07 max value there and the right endpoint 47:09 of the interval stored here so the 47:11 larger of those three numbers it's only 47:15 the max out here 47:16 once you've computed the new max for a 47:18 is the larger of the max endpoint here 47:22 the max endpoint there and the interval 47:23 stored in there all right so you can get 47:28 the maxes fairly easily after the 47:32 rotation 47:43 okay 47:50 all right we do only one rotation per 47:53 insert or delete so you got to recompute 47:56 maxes for some constant number of nodes 47:58 he's taking order one time so an insert 48:03 or a delete runs in logarithmic time all 48:12 right so here the search as well as the 48:16 insert and delete run in logarithmic 48:17 time in this variant in the previous 48:20 variant the inserts and deletes don't 48:22 run in logarithmic time the search is 48:24 logarithm in the number of intervals 48:26 plus the size of the output the 48:29 different there's a difference between 48:30 the two structures in their capability 48:32 the second one which is better from the 48:35 complexity point of view only lets you 48:37 report one interval that overlaps and 48:39 the first one lets you report all of the 48:41 intervals that overlap all right any 48:48 questions 48:55 okay so we'll stop here