Transcript 00:00 well this is our last lecture for the 00:03 semester we have a final exam that's 00:06 scheduled for two Fridays from now in 00:09 December 16th and it's pretty early in 00:13 the morning it's like 7:30 or something 00:16 that there is say well guess there are a 00:18 few sample tests from the past they're 00:20 on the website the syllabus is posted on 00:22 the website again as you've been doing 00:25 for the other two tests do all the 00:30 preparation you think you need and then 00:33 try out the sample tests they'll be a 00:34 good indicator of whether you're ready 00:37 to take the last exam or not again as 00:41 was the case with the other tests there 00:42 are no surprises if you if you study and 00:45 if you can do these sample tests there's 00:47 no reason why you shouldn't do the third 00:50 test completely alright you have any 00:54 questions before I start talking about 00:56 our last data structure the artery okay 01:03 all right so today we're going to look 01:06 at our trees AR trees R and C are you 01:12 able to read the stuff we need more 01:14 contrast yeah it's okay alright alright 01:20 so so the artery data structure is 01:24 really an extension of a structure we 01:26 saw perhaps toward the middle of this 01:29 course there was the B plus tree if you 01:34 recall in a B plus tree which itself is 01:38 a variant of a B tree we basically had 01:41 two kinds of nodes we had data nodes and 01:44 then we had index nodes so data nodes 01:46 formed the leaves of the B plus tree 01:48 index nodes were used to navigate from 01:51 the root to the proper leaf where you 01:53 would find the data you're looking for 01:56 all the leaf nodes are the same level 01:58 and then there were constraints on the 02:01 minimum and maximum degree for an index 02:03 node all right 02:05 so an artery is really an extension of 02:09 the ideas used in the B plus tree 02:13 it's used to store multi-dimensional 02:18 objects okay so you might in two 02:21 dimensions you might be storing a 02:23 collection of say rectangles where you 02:25 interpret rectangle in its most general 02:28 term terms so a straight line is a 02:31 rectangle with very small width are very 02:36 small height and a point is also a 02:38 rectangle so in two dimensions for 02:44 example we may want to be storing this 02:47 collection of objects okay so the greens 02:50 are their art angles and I got some blue 02:52 points I don't have any lines in this 02:55 example and then you want to be able to 03:01 insert rectangles and time you want to 03:03 be able to delete rectangles in time you 03:06 also want to be able to perform curries 03:09 typically the kind of curries you 03:11 perform here intersection pipe curries 03:13 so you may be given a career rectangle 03:16 and find all the objects that intersect 03:19 that curry rectangle now you can use it 03:28 to represent non rectangles objects that 03:32 are not really rectangular also by 03:35 enclosing these objects in minimum 03:38 bounding rectangles okay so if I wanted 03:42 to represent these four objects in my 03:45 say two-dimensional art tree what I 03:49 would do is I would bound this with a 03:52 we'll call an M be our minimum bounding 03:54 rectangle and I'll use this as kind of a 03:58 surrogate for that okay and eventually 04:02 when you do your search and you want to 04:05 find an intersection your search will 04:08 get you here thinking you have a 04:11 rectangle and then to verify that you 04:13 actually have an intersect you have to 04:15 go back and do an intersect with the 04:16 actual object okay so similarly that 04:20 cross would be represented as a 04:23 rectangle 04:25 the u-turn is a rectangle in that odd 04:29 shape is a rectangle alright so as I 04:39 said it's an extension of a B plus tree 04:41 so you have two types of nodes you have 04:45 data nodes these are the leaves of the 04:48 our tree and data nodes will contain the 04:52 objects that you're storing just like in 04:54 a B+ tree that they are nodes stored the 04:56 objects in your dictionary here our 05:00 objects happen to be rectangles then you 05:06 have index nodes these are the interior 05:08 nodes these are used to navigate 05:10 yourself to a proper leaf so if you 05:20 think of the structures if you're 05:22 sitting at a node in the structure then 05:26 that node is going to come if it's a 05:28 leaf it's going to comprise some number 05:31 of rectangles your raw data objects if 05:34 it's an index node is going to contain 05:37 the minimum bounding rectangles for the 05:40 objects stored in the sub trees okay so 05:44 for node has four sub trees it'll 05:47 contain four minimum bounding rectangles 05:49 one for each of its four sub trees we'll 05:53 see an example would you make that a 05:54 little bit clearer okay all right so 06:02 first the definition before we look at 06:04 an example so you have an order M 06:08 associated with in our tree so the 06:16 requirement is that every node other 06:19 than the root okay 06:21 must have a certain number of rectangles 06:26 stored in it again I'm using rectangles 06:29 in a general term a rectangle could be a 06:33 data object like in a leaf or it could 06:36 be a minimum bounding rectangle as 06:38 stored in an index node okay so the 06:41 number of rectangles in a node other 06:43 than the root has to lie between some 06:46 minimum number and some maximum number 06:48 okay 06:48 the maximum is M that's the capacity of 06:51 the node then there's a lower bound 06:53 requirement which if this was a B+ tree 06:56 it would be something like this okay but 06:59 you got a little bit flexible and you in 07:02 fact this permit people to use lower 07:04 bounds that may be smaller than M over 07:06 two and in our discussions we'll just 07:11 assume that the bound is ceiling of M 07:13 over two okay so each non root has to 07:17 contain at least ceiling of M over two 07:20 rectangles in it and at most M 07:23 rectangles okay and again in fact as 07:28 typically people use that but not always 07:31 they do sometimes go below that you have 07:41 a relaxed constraint for the root the 07:46 root can contain anywhere from one to 07:50 its capacity M rectangles the number of 08:00 rectangles in an index node equals the 08:02 number of children it has this is an 08:06 another place where this definition 08:08 differs from that of a B+ tree okay two 08:11 places where we differ one is here in a 08:13 B+ tree you have a definite lower bound 08:15 of M over two where as here at least in 08:18 practice people tend to use things that 08:20 may be smaller than that the other one 08:22 is out here if you think back to a B+ 08:26 tree the number of data sitting in an 08:29 index node is one less than the number 08:32 of children okay so if you've got four 08:37 nodes and then you may store the largest 08:42 value saying this or the smallest value 08:45 in the second subtree smallest in the 08:46 third smallest in the fourth okay so 08:50 a B+ tree the number of data in an index 08:54 node is 1 less than the number of 08:55 children but here the number of 08:57 rectangles in an index node exactly 09:00 equals the number of children that the 09:01 index node has and then the standard B 09:07 tree type requirement all the data nodes 09:10 that all the leaves are the same level 09:15 all right so the definition is very 09:17 similar to that for a B+ tree there are 09:20 some minor changes or differences let's 09:27 look at an example to try and 09:28 crystallize these ideas or concepts 09:33 we'll construct an artery of order 4 for 09:38 a collection of rectangles in 09:39 two-dimensional space so in such an R 09:44 tree okay the capacity of a node is 4 so 09:49 a data node can have up to 4 rectangles 09:52 in it an index node can have up to 4 09:56 minimum bounding rectangles in it if you 10:00 are the root you can have as few as 1 10:03 minimum bounding rectangle if you're not 10:06 a root you could have at least 2 ok and 10:10 we will so half of your order alright so 10:16 here's my collection of two-dimensional 10:19 rectangles the first question that may 10:26 arise is well how do you decide which of 10:28 these rectangles gets grouped into a 10:32 leaf there's really no no ordering 10:36 relationship on a bunch of rectangles 10:41 you could for example take these four 10:46 rectangles and put them all into one 10:48 group and put them into a leaf okay or 10:51 you could take this rectangle that 10:54 rectangle at this point and that 10:56 rectangle and put them all into a leaf 10:58 okay so the structure really doesn't 11:01 place any constraint on how you select 11:03 the 11:04 tangles that go into a data node is 11:07 legitimate to take any subset of e 11:12 rectangles so long as you don't take 11:14 more than M and of course you got to 11:20 have at least M over 2 unless you happen 11:23 to be the root all right so let's look 11:30 at a possible grouping of our rectangles 11:34 into leaves and I'm going to group them 11:37 into a total of 12 leaves because this 11:40 is possible only if you have at most 48 11:42 rectangles all right so my first 11:48 grouping will contain these four 11:50 rectangles and would have tossed them 11:52 all into one leaf node or one data node 11:56 and in red I've got its minimum bounding 11:59 rectangle okay so the parent of this 12:03 data node will contain this red 12:05 rectangle unit even though this 12:11 rectangle here overlaps with this 12:13 minimum bounding rectangle this 12:15 rectangle or a part of it is not stored 12:19 in this node okay so in an artery 12:24 you don't take a data object and chop it 12:26 up into pieces like we've been doing 12:28 with some of our other structures the 12:31 entire object is stored in a single 12:33 place all right so here's another 12:38 grouping that I'm going to have and 12:41 again in red I've got its minimum 12:43 bounding rectangle and this leaf or data 12:45 node will have these four rectangles in 12:47 it there's another one for my third leaf 12:51 is one for a fourth leaf okay 12:54 so I'll have these four because it's not 12:57 necessary to have for anything between 13:01 two and four we'll do and as you can see 13:05 here whenever bounding rectangles may 13:07 overlap okay so you got another one 13:13 three rectangles and a point and again 13:15 we have an overlap between minimum 13:18 bounding rectangles okay right here's my 13:26 7/2 rectangle another one all right so 13:40 I've taken my collection of 13:43 two-dimensional rectangles and 13:45 petitioned them out into 12 groups no 13:49 group has more than four rectangles in 13:52 it some of them don't have four for 13:55 example this one here only has three 13:57 okay but the only requirement is that 13:59 you got to have at least two all right 14:09 so this is a possible grouping of my 14:13 data objects into data nodes 14:19 all right the possible our tree okay 14:23 just look at the structure so I'm going 14:29 to have 12 leaf nodes the leaf nodes of 14:32 the data nodes okay and these are going 14:38 to all have to be at the same level okay 14:43 so the data nodes have parents which are 14:45 index nodes the number of rectangles and 14:50 each of these will be able on bounding 14:52 rectangle number of rectangles stored in 14:54 an index node exactly equals the number 14:56 of children that the node has so this 14:58 node is that four children there's a 15:01 leaves you have four bounding rectangles 15:03 this one here is grad only two children 15:06 so it's got two rectangles in it 15:08 this guy's got three children it's got 15:10 three rectangles this guy's got three 15:13 children's are three rectangles okay all 15:17 right so these things here label 15:18 rectangles which will identify in a 15:21 moment 15:21 but this a is the minimum bounding 15:24 rectangle 15:25 for the four rectangles stored in this 15:27 node the B here is the minimum bounding 15:30 rectangle for whatever data is stored 15:33 here C is the minimum bounding rectangle 15:36 for whatever stood here okay all right 15:39 so these index nodes will also have a 15:42 parent okay 15:44 and the number of rectangles stored in 15:50 this node will be exactly equal to the 15:52 number of children that it has which 15:54 happens to be for the M out here is the 15:58 minimum bounding rectangle for ABC and D 16:05 is you have four rectangles here so they 16:07 have an MBR which includes all of these 16:10 four rectangles that's going to be M the 16:13 N out here is the minimum bounding 16:14 rectangle for E and F o is the minimum 16:17 bounding rectangle for G H and I P is 16:19 the minoan bonding rectangle for JK and 16:21 L all right so let's see see that at 16:36 work with our example let's see what 16:42 these bounding rectangles here give us 16:45 the groupings we had decided on a little 16:48 while ago so let's suppose that okay so 16:58 this leaf out here represents this 17:02 rectangle okay so it's about these four 17:05 green rectangles in it the next leaf B 17:09 represents this grouping so it's got 17:13 these three rectangles in that point and 17:16 then it's sibling represents this 17:19 rectangle and it's sibling represents 17:22 this one okay again this at this point 17:26 is just a matter of choice I've decided 17:28 to take these four groups and make them 17:32 all siblings out here 17:36 there really aren't any rules at this 17:37 point to say how you select which groups 17:40 become siblings you can do it any way 17:41 you like and you still be able to build 17:43 a tree all you will affect is the search 17:49 complexity the number of nodes are 17:52 getting examined when you do a search 17:53 will differ depending upon how you group 17:57 the original data and then how you group 18:01 minimum bounding rectangles at higher 18:03 levels all right so let's suppose that 18:08 I've chosen to make a b c and d at these 18:11 four groups siblings then this a 18:16 represents that bounding rectangle and 18:20 then the be out there is this bounding 18:22 rectangle and so on now the m out there 18:27 okay right let's this fellow up here 18:31 this M is going to be the bounding 18:33 rectangle for a B C and D so it's going 18:37 to be this big red rectangle that I just 18:40 gave you okay so you take the minimum 18:46 bounding rectangle for ABC and D that 18:49 becomes M all right suppose that this is 18:57 our E and that's our F again you could 19:00 just as well have said that's the E and 19:02 that's your F okay but let's suppose 19:04 that's how a and F then here these four 19:09 rectangles are stored here and these 19:11 four are stored over there and then 19:15 irreps this bounding rectangle and that 19:17 F is this bounding rectangle the parent 19:20 n out there is going to be the bounding 19:22 rectangle for E and F okay so the parent 19:27 will be that rectangle 19:34 all right if I look at the sibling of 19:36 this notes that's got a G H and I any of 19:41 the remaining subgroups are candidates 19:44 for G H and I and I'm you assume we 19:49 chose it this way okay then in the root 19:55 we have a bounding rectangle o which is 19:58 the bounding rectangle for G H and I so 20:01 it's going to be that fellow okay well 20:05 then for the last guy you don't have any 20:07 choice so we've got JK l would be these 20:13 three fellows and then the bounding 20:14 rectangle for JK l is P which is sitting 20:17 in the root all right so an r plus tree 20:27 is basically a B plus tree which stores 20:35 multi-dimensional objects we've seen 20:38 other structures that could be used for 20:40 multi dimensional objects but those are 20:42 mostly binary tree type structures here 20:45 we have a high degree tree which makes 20:47 it suitable for storage on a disk so you 20:50 can reduce the number of disk accesses 20:52 made so if you really have a very large 20:55 collection and you want to store them on 20:57 a disk you wouldn't want to use a binary 20:59 tree type structure all right any 21:06 questions about what an R plus tree is 21:08 before we look at how you perform 21:11 queries inserts and deletes yeah 21:21 is there any ordering of MN o P up there 21:24 well the only ordering that comes is 21:27 once you have decided this is the 21:30 leftmost child then M has to occupy the 21:33 first position but this could easily 21:35 have been here and that could easily 21:37 have been there in which case this would 21:39 look like n m o P okay so there really 21:44 no constraints on how you order the 21:46 siblings or anything o at any other 21:54 questions alright so the kind of curries 22:03 that your support our intersection 22:06 queries so somebody gives you a query 22:08 rectangle and you need to report all of 22:12 the objects in your collection that have 22:14 some overlap with this object over this 22:17 query rectangle alright so in blue I've 22:27 got a query rectangle I want to find 22:31 everything in my collection that 22:32 overlaps with this so I want to report 22:35 this object that object that object that 22:39 object in that one all right the way you 22:49 do it is you start at the root of your R 22:53 plus tree and you look at the MBRs that 22:56 are stored in the R plus tree and see 22:58 which ones have an overlap with a query 23:00 rectangle those are the sub trees you 23:04 have to search and you recursively apply 23:07 this rule at all levels till you get 23:09 down to a data node and there you do a 23:13 real intersection with the data that's 23:15 sitting there 23:21 alright so the root of our r+ tree has 4 23:26 MB ours and m n o in P so that's what 23:30 you see at the root 23:32 that's your curry so you see if the 23:37 query intersects M it does it intersects 23:40 oh oh and it intersects and but not P so 23:43 there are three sub trees that you have 23:45 to search 23:55 I'll search those three sub trees 23:58 recursively to report the follows that 24:03 intersect okay so when you're searching 24:06 em we go down to that leftmost child of 24:12 the root and over there we find these 24:17 four bounding rectangles so you run your 24:23 intersection test against these four 24:25 bounding rectangles and you find you 24:29 have an overlap with that rectangle you 24:31 have an overlap with this one not with 24:34 that and not with that one okay so now 24:38 you find your way down to two data nodes 24:40 in that subtree and then each of those 24:45 has four rectangles or at most four 24:49 rectangles so now when you get here you 24:50 got to see if any of those four 24:53 intersects with this and then when you 24:55 get in here see if any of those four 24:57 intersects with that if so you report 24:59 them all right so that's how a search 25:02 works 25:11 if you want to perform an insert again 25:20 at some level the ideas are pretty much 25:22 the same as those employed in a B+ tree 25:27 but since there is really no rule to 25:30 tell you how you group rectangles 25:34 together the new object you're inserting 25:39 could at least in principle go into any 25:42 one of your better notes okay there's 25:46 nothing to stop you from putting it into 25:47 any one of the data nodes in there you 25:50 have total freedom to do that and if 25:54 they don't they don't know you're trying 25:55 to put it into exceeds its capacity 25:57 you'd have to split it okay so you can 26:01 put it any way you want 26:02 and if the capacity is exceeded you need 26:04 to split and so we have some flexibility 26:08 here in trying to build an our tree 26:12 which can be searched efficiently okay 26:19 few thing back at the search algorithm 26:21 we looked at its complexity in terms of 26:26 number of nodes you actually visit 26:28 depends a lot on how you did the 26:30 grouping okay so you'd like to build it 26:35 so that the probability of searching too 26:38 many nodes is reduced okay and this is 26:44 where you have leeway so you need some 26:50 rules to tell you which leaf should you 26:52 insert the new data object into and then 26:57 when a split takes place you need some 27:00 rules to tell you how do you do the 27:01 split now there really aren't any rules 27:09 that around that you can say well 27:11 theoretically this is the best way to do 27:13 it so you end up using heuristics which 27:18 have good intuitive appeal 27:22 so first let's take a look at leaf 27:24 selection okay to determine the leaf you 27:29 really have to start at the top okay I 27:31 mean you can't examine all of your data 27:33 at this time to decide which leaf to go 27:35 into you really have to start at the top 27:37 and at each index node you have to make 27:40 a heuristic choice which sub tree to 27:43 move into and finally whichever leaf you 27:45 reach that's where you'll make the 27:47 insert okay so you're going to follow a 27:51 path and to guide you in the path to 27:56 follow it's a heuristic really you take 27:59 a look at your choices okay and say well 28:03 if you moved into one of these sub trees 28:06 and you made the insert there well that 28:09 would change the MBR for the collection 28:11 of data stored there and by how much 28:14 would you change the MBR the minimum 28:17 bounding rectangle okay so the heuristic 28:25 here says of your choices move into the 28:29 sub tree whose minimum bounding 28:31 rectangle area would increase the least 28:35 the rationale for this is when you do a 28:38 query ok the probability of moving into 28:42 a sub tree is related to the area of 28:44 that MBRs if you got a large area of a 28:47 higher probability a query would cause 28:49 you to move into it if you have a small 28:51 area you have a smaller probability of 28:53 moving into it okay so based on that 28:56 simple intuition you end up with this 28:58 heuristic which says trying to keep the 29:01 MBRs of your rectangles of your covering 29:06 rectangle try to keep the area of the 29:08 MBRs as small as possible all right so 29:14 what I'm trying to insert a rectangle 29:16 into my example our tree and this is 29:19 what I see at the root okay and here is 29:23 the rectangle I'm trying to insert well 29:28 if I insert it into the leftmost subtree 29:30 okay then the MBR is going to change 29:33 from m 29:34 to this one okay so the area of my MBR 29:38 goes up by this amount okay if I put it 29:44 into the next subtree the MBR of that is 29:48 n and if this rectangle goes there the 29:51 MBR is going to look like this when 29:54 you're done and so the area changes by 29:57 the area of this big rectangle minus the 29:59 area of the original one so that's the 30:02 increase in area if you put it into oh 30:07 okay your new MBR would look like that 30:11 so it's this much area by which you go 30:15 up and if you put it into P okay you go 30:20 up by that okay so you do this 30:23 computation on all of your choices and 30:25 figure out which choice seals the 30:27 smallest area increase and that's where 30:30 you do the insert all right so that 30:37 tells you how to go down to do an insert 30:40 eventually you reach a leaf node if the 30:43 leaf isn't already full you can place a 30:46 new object in there if the leaf is full 30:48 you have to split it so now you have to 30:51 decide how do you split you have total 30:53 flexibility the only requirement is that 30:56 the two new leaves must each contain 31:01 some minimum number of rectangles that's 31:03 little m in our example two and no one 31:06 should contain more than four in our 31:08 example of Big M in general other than 31:11 that you can split anywhere you like 31:13 okay right so that M plus one rectangles 31:17 we need to split them into two sets a 31:19 and B each of them has to have the 31:23 minimum number and of course no one can 31:27 have more than M okay and the heuristic 31:32 that's used to split here is you want to 31:37 do it so that the sum of the areas of 31:39 the rectangles a and B that you create 31:42 is minimum so the subsets a and B have 31:46 MBRs you measure their areas 31:48 you want to think again the rationale is 31:51 the same if you keep the area small then 31:55 the probability of having to search them 31:57 become small all right so let's say I 32:05 reach a leaf and this is an example with 32:09 Big M equals 8 and little M equals 4 so 32:12 reach a leaf 32:13 it already has 8 rectangles in it I'm 32:18 going to insert this red one it now has 32:20 9 I need to split it okay so one 32:26 possibility is to have a which has these 32:28 four in it okay and then the other five 32:33 go into B so the you compute the area of 32:37 this blue rectangle in the area of that 32:39 yellow one another partitioning is like 32:44 this so the blue one has these four 32:46 sorry those five and then the yellow one 32:49 gets the four down there okay so 32:52 depending upon how you partition you get 32:54 different areas all right so the 33:06 question is well how do you pick simpler 33:10 strategies to just try out all possible 33:12 choices for a and B and for each choice 33:16 you just compute the area of the MBR of 33:21 a and the area of the MBR of B add them 33:23 up find them in across all possible 33:24 choices once you pick a B is fixed okay 33:34 so if you're looking only at a s that 33:38 have exactly M the minimum number in 33:40 them okay then you better wrestle with 33:46 that many choices the number of ways you 33:48 can choose m items out of n plus 1 now 33:54 if you're dealing with small decrease 33:56 that's not too bad but 33:59 if you're dealing with large degrees 34:03 let's say expensive computation so 34:10 people then really resort to automate 34:12 call heuristics clustering type 34:14 strategies so use a clustering strategy 34:19 and if you're familiar with these 34:21 typically the way you build clusters is 34:23 you seed them so you start with a 34:26 candidate object in a and a candidate 34:28 object in B and then you grow these 34:31 clusters say one object at a time okay 34:35 so you grow a and B and the growth in 34:41 this case continues until you have 34:46 assigned all of your rectangles alright 34:52 so you need a seeding strategy then you 34:56 need a growth strategy two popular 35:06 strategies to cluster for our trees one 35:11 is called the quadratic method the other 35:13 one we're going to look at is the linear 35:14 method okay 35:16 the quadratic method as the name 35:19 suggests going to run in square of M 35:21 time all right so I've got em plus one 35:29 rectangles that need to be partitioned 35:31 into two groups or two clusters so first 35:36 I'm going to find a seed okay so this is 35:39 the seed for cluster a that's the seed 35:42 for cluster B and the way I select the 35:46 seed is by maximizing this function okay 35:52 what does this function give you right 35:57 so here I've got the minimum bounding 35:59 rectangle of the two seeds okay now look 36:02 at the area of that and from that I 36:04 subtract the area of each of the seeds 36:07 okay so in some sense this measures the 36:10 penalty associated with 36:12 and be into the same cluster this is the 36:16 amount by which the area of this cluster 36:20 grows individually they only had this 36:23 much area but together you end up with 36:25 this much so this gives you the amount 36:27 of unused space in that MBR so it's a 36:30 penalty function okay so you find two 36:34 rectangles for which this penalty 36:36 function is maximum and then you start 36:38 by putting them into different clusters 36:44 all right so let's say this is my 36:45 collection of rectangles I want to find 36:47 two seeds if these are by a and B okay 36:54 then the MBR is this so the penalty 36:58 associated with putting these two 37:00 fellows in the same cluster at this time 37:02 is the white space in that rectangle 37:04 okay so the white space is MBR of the 37:08 area of the MBR minus the area of the 37:10 two blue boxes okay that's what that 37:12 penalty function has if you put those 37:19 two fellows in the same cluster then the 37:24 MBR is given by the red rectangle and 37:26 the penalty is the white space so that's 37:29 the area of the red rectangle minus the 37:31 area of the two blue ones okay so you 37:35 have more to lose by putting these two 37:37 fellows in the in the same cluster than 37:40 we had by putting the previous two okay 37:42 so we find two rectangles that have the 37:46 highest penalty if they were put into 37:48 the same cluster and we use those as 37:51 seeds for a and B all right so for this 37:55 you value try out all M square 37:57 possibilities and pick the one with the 38:00 maximum penalty right so once you've got 38:04 the seed then you look at your 38:09 rectangles and grow these clusters these 38:16 clusters have one rectangle each now 38:18 we're going to now grow them one at a 38:20 time and to grow 38:24 you select a rectangle that's not yet 38:26 assigned to a cluster it so examine all 38:30 the rectangles not yet assigned to a 38:32 cluster and I compute this function okay 38:36 this says if you take a rectangle C and 38:39 you put it into cluster a okay then the 38:43 penalty but how much does the area of 38:45 cluster a go up this is the new area of 38:47 cluster array that's the old area of 38:49 cluster Rea okay so if I put C into 38:52 cluster riots area would go up by this 38:54 much but if I put C into cluster B its 38:57 area would go up by that much okay so 39:00 this difference tells me how much of an 39:03 affinity the C have for a over B if this 39:07 difference is zero as far as C is 39:09 concerned doesn't matter where you put 39:10 it if this difference is huge either 39:13 positive or negative then C has a big 39:16 affinity either to go here or to go 39:18 there okay versus the other cluster okay 39:22 so we're going to pick a rectangle that 39:25 has a relative high affinity for another 39:27 clusters and then put it into that 39:30 cluster okay all right so let's say I've 39:35 seeded this is my cluster a that's my 39:38 cluster B the yellow one okay I look at 39:40 this rectangle if I put it into cluster 39:44 a the penalty is the white space 39:47 okay look at this bounding rectangle 39:49 area - this blue array of - that okay 39:54 sorry I'm looking at okay this is my new 39:59 MBR okay and then I look at the old NBR 40:02 is this one okay so subtract the area of 40:05 the new MBR and the old one so the 40:08 penalty is I guess really this whole 40:10 thing here what's that 40:13 okay so that's the cost of growing this 40:16 cluster by adding this rectangle okay if 40:20 you took this rectangle and put it in 40:22 there then your cost is the area of this 40:24 big rectangle - the yellow one okay so 40:27 you look at the difference between these 40:29 two costs and you see that this one has 40:32 a preference for being here versus being 40:35 out there and the difference in the cost 40:37 you how much of a preferences that have 40:39 of being here was is there okay and then 40:42 you try it out with another rectangle 40:44 okay in this guy's cost is this big area 40:49 - this blue one if this guy went over 40:52 there it would be this huge rectangle - 40:55 the yellow one okay so this guy also has 40:57 a preference for being here over there 41:00 but the relative preference is much more 41:02 for this guy than it was for that guy 41:05 okay so we would prefer at this round to 41:08 put this guy in here then to put that 41:10 guy in here at this time so you try this 41:14 out or all of them and figure out which 41:17 one has the highest relative preference 41:19 to be in one of the clusters and you put 41:21 it in okay all right so once you've 41:25 found the see that maximize their 41:26 objective function you put it into the 41:29 appropriate place and then you continue 41:33 this in each round one rectangle gets 41:36 assigned okay that's how you do this 41:38 point all right 41:46 there's a linear method over here you 41:53 look at normalized separation so if you 41:57 take two rectangles you measure the 42:00 separation in the x-direction which is 42:04 defined as the you take the largest left 42:09 edge and the smallest right edge of the 42:15 two rectangles and subtract the distance 42:18 okay so if you're looking at the 42:22 normalized separation this blue in that 42:24 yellow you look at the largest left edge 42:27 that's this one and the smaller right 42:29 edge and subtract the distance okay so 42:33 you find the rectangles which have the 42:35 maximum x separation and you divide that 42:38 by the spread in X values to normalize 42:42 you do the same thing in the Y direction 42:45 you look at the 42:50 at the largest bottom and subtract from 42:52 that the smaller top largest part of 42:56 mine is smaller top in this case you get 42:57 a negative number okay and here again 43:02 you would take the larger bottom minus 43:04 the smaller top so you get a positive 43:06 separation okay 43:09 and then you normalize by the y spread 43:12 the pair of rectangles that have the 43:15 maximum normalized separation are used 43:19 as the seeds okay all right so once 43:23 you've got the seeds then you take the 43:26 other rectangles in some random order 43:28 and you assign them to whichever 43:31 rectangle they have a preference 43:33 whichever partition okay and so that's 43:37 just done by looking at how much the MBR 43:40 goes up if you put it into a versus it 43:42 to be and so this process runs in linear 43:47 time okay so those are the two more 43:51 popular ways of doing the split right 43:57 finally we have the delete operation so 44:01 to do a delete you have to find the 44:05 rectangle so we use the query algorithm 44:08 to find the rectangle then you reach a 44:11 leaf you take the rectangle out if the 44:15 capacity hasn't gone below the minimum 44:18 you're okay for it's gone below the 44:19 minimum you need to do what you would do 44:25 when you're dealing with a B plus tree 44:27 you try to borrow from a neighbor and 44:32 when you borrow you have to readjust the 44:34 npr's and if you can't borrow you do a 44:40 combine and then you have to redo the 44:45 MBRs 44:50 so as far as delete goes there aren't 44:54 any new ideas really here though it's 44:57 probably worth mentioning that some 44:58 people are instead of just doing the B+ 45:02 tree type of delete they use the delete 45:04 as an opportunity to reorganize the 45:06 whole structure or at least do a more 45:09 more than just this but try and do some 45:12 optimization on the petition inks but I 45:16 could get pretty expensive alright so 45:21 basically that's what an our tree is 45:24 it's an extension of a B+ tree used to 45:27 store objects and multi-dimensional 45:31 space you can perform queries searches 45:33 and inserts performance depends a lot on 45:37 how well these heuristics function in 45:40 terms of getting you partitioning x' 45:43 which don't require you to search too 45:45 many nodes the structure is used quite a 45:50 bit in robotic type applications you've 45:52 got all kinds of objects sitting inside 45:54 3-dimensional space and people are 45:56 moving around you need to be able to 45:57 tell who would you hit if you moved in a 46:00 certain direction say there are a number 46:04 of variants of this structure of time 46:07 maybe only just to mention the name so 46:09 I'm going to kind of skip the little 46:11 descriptions that might come with them 46:12 so it leaves familiar with the names 46:14 something called an R plus tree very 46:17 similar to and sorry in our start tree 46:20 there's a Hilbert artery they're quite 46:26 similar to the our tree that we've 46:27 looked at there's an hour-plus tree what 46:34 this one does is it doesn't allow 46:40 overlapping MBRs so if you don't allow 46:44 overlapping MBRs you have to accept 46:46 cutting objects into pieces so some 46:49 object may fall into two or more NVRs so 46:56 you cut objects into pieces and they 47:00 have to relax the upper and lower bounds 47:02 on 47:03 data stored in a node in order to make 47:05 the inserts and deletes somewhat 47:06 manageable so this structure can end up 47:10 in a degenerate case where you only have 47:14 a long chain of nodes because there's 47:16 really no enforcement of lower bound of 47:18 a node but again and practice you you 47:22 tend to get good balance even though 47:26 you're not enforcing the lower bound all 47:30 right finally there's something called a 47:31 Cell tree which combines ideas from the 47:34 binary space partitioning tree we looked 47:36 at last time and the art plus tree that 47:40 I very briefly mentioned a second ago 47:42 instead of using MBRs you use convex 47:46 polyhedra to divide the space out and 47:50 that reduces the amount of white space 47:52 in a region white space being the space 47:55 that's not occupied by objects in your 47:57 data set and so hopefully that reduces 48:00 the number of nodes you have to visit 48:01 all right again 48:09 as far as Leafs go they don't enforce 48:11 any lower upper bounds when you're not 48:16 going to allow overlap of MPR's 48:19 you really can't have an upper bound on 48:21 the capacity of a leaf node because you 48:24 may have a large number of rectangles 48:25 that overlap and cover the same point 48:31 but they do enforce a lower bound on the 48:35 degree of an index node and so you can't 48:37 get a degenerate version here all right 48:44 so that's about all I had to say on our 48:46 trees you have any questions No 48:50 okay well hopefully you're gonna have a 48:53 good final exam normally I'll call it a 48:55 final a good third exam okay all right 49:01 you