Transcript


00:00
well this is our last lecture for the
00:03
semester we have a final exam that's
00:06
scheduled for two Fridays from now in
00:09
December 16th and it's pretty early in
00:13
the morning it's like 7:30 or something
00:16
that there is say well guess there are a
00:18
few sample tests from the past they're
00:20
on the website the syllabus is posted on
00:22
the website again as you've been doing
00:25
for the other two tests do all the
00:30
preparation you think you need and then
00:33
try out the sample tests they'll be a
00:34
good indicator of whether you're ready
00:37
to take the last exam or not again as
00:41
was the case with the other tests there
00:42
are no surprises if you if you study and
00:45
if you can do these sample tests there's
00:47
no reason why you shouldn't do the third
00:50
test completely alright you have any
00:54
questions before I start talking about
00:56
our last data structure the artery okay
01:03
all right so today we're going to look
01:06
at our trees AR trees R and C are you
01:12
able to read the stuff we need more
01:14
contrast yeah it's okay alright alright
01:20
so so the artery data structure is
01:24
really an extension of a structure we
01:26
saw perhaps toward the middle of this
01:29
course there was the B plus tree if you
01:34
recall in a B plus tree which itself is
01:38
a variant of a B tree we basically had
01:41
two kinds of nodes we had data nodes and
01:44
then we had index nodes so data nodes
01:46
formed the leaves of the B plus tree
01:48
index nodes were used to navigate from
01:51
the root to the proper leaf where you
01:53
would find the data you're looking for
01:56
all the leaf nodes are the same level
01:58
and then there were constraints on the
02:01
minimum and maximum degree for an index
02:03
node all right
02:05
so an artery is really an extension of
02:09
the ideas used in the B plus tree
02:13
it's used to store multi-dimensional
02:18
objects okay so you might in two
02:21
dimensions you might be storing a
02:23
collection of say rectangles where you
02:25
interpret rectangle in its most general
02:28
term terms so a straight line is a
02:31
rectangle with very small width are very
02:36
small height and a point is also a
02:38
rectangle so in two dimensions for
02:44
example we may want to be storing this
02:47
collection of objects okay so the greens
02:50
are their art angles and I got some blue
02:52
points I don't have any lines in this
02:55
example and then you want to be able to
03:01
insert rectangles and time you want to
03:03
be able to delete rectangles in time you
03:06
also want to be able to perform curries
03:09
typically the kind of curries you
03:11
perform here intersection pipe curries
03:13
so you may be given a career rectangle
03:16
and find all the objects that intersect
03:19
that curry rectangle now you can use it
03:28
to represent non rectangles objects that
03:32
are not really rectangular also by
03:35
enclosing these objects in minimum
03:38
bounding rectangles okay so if I wanted
03:42
to represent these four objects in my
03:45
say two-dimensional art tree what I
03:49
would do is I would bound this with a
03:52
we'll call an M be our minimum bounding
03:54
rectangle and I'll use this as kind of a
03:58
surrogate for that okay and eventually
04:02
when you do your search and you want to
04:05
find an intersection your search will
04:08
get you here thinking you have a
04:11
rectangle and then to verify that you
04:13
actually have an intersect you have to
04:15
go back and do an intersect with the
04:16
actual object okay so similarly that
04:20
cross would be represented as a
04:23
rectangle
04:25
the u-turn is a rectangle in that odd
04:29
shape is a rectangle alright so as I
04:39
said it's an extension of a B plus tree
04:41
so you have two types of nodes you have
04:45
data nodes these are the leaves of the
04:48
our tree and data nodes will contain the
04:52
objects that you're storing just like in
04:54
a B+ tree that they are nodes stored the
04:56
objects in your dictionary here our
05:00
objects happen to be rectangles then you
05:06
have index nodes these are the interior
05:08
nodes these are used to navigate
05:10
yourself to a proper leaf so if you
05:20
think of the structures if you're
05:22
sitting at a node in the structure then
05:26
that node is going to come if it's a
05:28
leaf it's going to comprise some number
05:31
of rectangles your raw data objects if
05:34
it's an index node is going to contain
05:37
the minimum bounding rectangles for the
05:40
objects stored in the sub trees okay so
05:44
for node has four sub trees it'll
05:47
contain four minimum bounding rectangles
05:49
one for each of its four sub trees we'll
05:53
see an example would you make that a
05:54
little bit clearer okay all right so
06:02
first the definition before we look at
06:04
an example so you have an order M
06:08
associated with in our tree so the
06:16
requirement is that every node other
06:19
than the root okay
06:21
must have a certain number of rectangles
06:26
stored in it again I'm using rectangles
06:29
in a general term a rectangle could be a
06:33
data object like in a leaf or it could
06:36
be a minimum bounding rectangle as
06:38
stored in an index node okay so the
06:41
number of rectangles in a node other
06:43
than the root has to lie between some
06:46
minimum number and some maximum number
06:48
okay
06:48
the maximum is M that's the capacity of
06:51
the node then there's a lower bound
06:53
requirement which if this was a B+ tree
06:56
it would be something like this okay but
06:59
you got a little bit flexible and you in
07:02
fact this permit people to use lower
07:04
bounds that may be smaller than M over
07:06
two and in our discussions we'll just
07:11
assume that the bound is ceiling of M
07:13
over two okay so each non root has to
07:17
contain at least ceiling of M over two
07:20
rectangles in it and at most M
07:23
rectangles okay and again in fact as
07:28
typically people use that but not always
07:31
they do sometimes go below that you have
07:41
a relaxed constraint for the root the
07:46
root can contain anywhere from one to
07:50
its capacity M rectangles the number of
08:00
rectangles in an index node equals the
08:02
number of children it has this is an
08:06
another place where this definition
08:08
differs from that of a B+ tree okay two
08:11
places where we differ one is here in a
08:13
B+ tree you have a definite lower bound
08:15
of M over two where as here at least in
08:18
practice people tend to use things that
08:20
may be smaller than that the other one
08:22
is out here if you think back to a B+
08:26
tree the number of data sitting in an
08:29
index node is one less than the number
08:32
of children okay so if you've got four
08:37
nodes and then you may store the largest
08:42
value saying this or the smallest value
08:45
in the second subtree smallest in the
08:46
third smallest in the fourth okay so
08:50
a B+ tree the number of data in an index
08:54
node is 1 less than the number of
08:55
children but here the number of
08:57
rectangles in an index node exactly
09:00
equals the number of children that the
09:01
index node has and then the standard B
09:07
tree type requirement all the data nodes
09:10
that all the leaves are the same level
09:15
all right so the definition is very
09:17
similar to that for a B+ tree there are
09:20
some minor changes or differences let's
09:27
look at an example to try and
09:28
crystallize these ideas or concepts
09:33
we'll construct an artery of order 4 for
09:38
a collection of rectangles in
09:39
two-dimensional space so in such an R
09:44
tree okay the capacity of a node is 4 so
09:49
a data node can have up to 4 rectangles
09:52
in it an index node can have up to 4
09:56
minimum bounding rectangles in it if you
10:00
are the root you can have as few as 1
10:03
minimum bounding rectangle if you're not
10:06
a root you could have at least 2 ok and
10:10
we will so half of your order alright so
10:16
here's my collection of two-dimensional
10:19
rectangles the first question that may
10:26
arise is well how do you decide which of
10:28
these rectangles gets grouped into a
10:32
leaf there's really no no ordering
10:36
relationship on a bunch of rectangles
10:41
you could for example take these four
10:46
rectangles and put them all into one
10:48
group and put them into a leaf okay or
10:51
you could take this rectangle that
10:54
rectangle at this point and that
10:56
rectangle and put them all into a leaf
10:58
okay so the structure really doesn't
11:01
place any constraint on how you select
11:03
the
11:04
tangles that go into a data node is
11:07
legitimate to take any subset of e
11:12
rectangles so long as you don't take
11:14
more than M and of course you got to
11:20
have at least M over 2 unless you happen
11:23
to be the root all right so let's look
11:30
at a possible grouping of our rectangles
11:34
into leaves and I'm going to group them
11:37
into a total of 12 leaves because this
11:40
is possible only if you have at most 48
11:42
rectangles all right so my first
11:48
grouping will contain these four
11:50
rectangles and would have tossed them
11:52
all into one leaf node or one data node
11:56
and in red I've got its minimum bounding
11:59
rectangle okay so the parent of this
12:03
data node will contain this red
12:05
rectangle unit even though this
12:11
rectangle here overlaps with this
12:13
minimum bounding rectangle this
12:15
rectangle or a part of it is not stored
12:19
in this node okay so in an artery
12:24
you don't take a data object and chop it
12:26
up into pieces like we've been doing
12:28
with some of our other structures the
12:31
entire object is stored in a single
12:33
place all right so here's another
12:38
grouping that I'm going to have and
12:41
again in red I've got its minimum
12:43
bounding rectangle and this leaf or data
12:45
node will have these four rectangles in
12:47
it there's another one for my third leaf
12:51
is one for a fourth leaf okay
12:54
so I'll have these four because it's not
12:57
necessary to have for anything between
13:01
two and four we'll do and as you can see
13:05
here whenever bounding rectangles may
13:07
overlap okay so you got another one
13:13
three rectangles and a point and again
13:15
we have an overlap between minimum
13:18
bounding rectangles okay right here's my
13:26
7/2 rectangle another one all right so
13:40
I've taken my collection of
13:43
two-dimensional rectangles and
13:45
petitioned them out into 12 groups no
13:49
group has more than four rectangles in
13:52
it some of them don't have four for
13:55
example this one here only has three
13:57
okay but the only requirement is that
13:59
you got to have at least two all right
14:09
so this is a possible grouping of my
14:13
data objects into data nodes
14:19
all right the possible our tree okay
14:23
just look at the structure so I'm going
14:29
to have 12 leaf nodes the leaf nodes of
14:32
the data nodes okay and these are going
14:38
to all have to be at the same level okay
14:43
so the data nodes have parents which are
14:45
index nodes the number of rectangles and
14:50
each of these will be able on bounding
14:52
rectangle number of rectangles stored in
14:54
an index node exactly equals the number
14:56
of children that the node has so this
14:58
node is that four children there's a
15:01
leaves you have four bounding rectangles
15:03
this one here is grad only two children
15:06
so it's got two rectangles in it
15:08
this guy's got three children it's got
15:10
three rectangles this guy's got three
15:13
children's are three rectangles okay all
15:17
right so these things here label
15:18
rectangles which will identify in a
15:21
moment
15:21
but this a is the minimum bounding
15:24
rectangle
15:25
for the four rectangles stored in this
15:27
node the B here is the minimum bounding
15:30
rectangle for whatever data is stored
15:33
here C is the minimum bounding rectangle
15:36
for whatever stood here okay all right
15:39
so these index nodes will also have a
15:42
parent okay
15:44
and the number of rectangles stored in
15:50
this node will be exactly equal to the
15:52
number of children that it has which
15:54
happens to be for the M out here is the
15:58
minimum bounding rectangle for ABC and D
16:05
is you have four rectangles here so they
16:07
have an MBR which includes all of these
16:10
four rectangles that's going to be M the
16:13
N out here is the minimum bounding
16:14
rectangle for E and F o is the minimum
16:17
bounding rectangle for G H and I P is
16:19
the minoan bonding rectangle for JK and
16:21
L all right so let's see see that at
16:36
work with our example let's see what
16:42
these bounding rectangles here give us
16:45
the groupings we had decided on a little
16:48
while ago so let's suppose that okay so
16:58
this leaf out here represents this
17:02
rectangle okay so it's about these four
17:05
green rectangles in it the next leaf B
17:09
represents this grouping so it's got
17:13
these three rectangles in that point and
17:16
then it's sibling represents this
17:19
rectangle and it's sibling represents
17:22
this one okay again this at this point
17:26
is just a matter of choice I've decided
17:28
to take these four groups and make them
17:32
all siblings out here
17:36
there really aren't any rules at this
17:37
point to say how you select which groups
17:40
become siblings you can do it any way
17:41
you like and you still be able to build
17:43
a tree all you will affect is the search
17:49
complexity the number of nodes are
17:52
getting examined when you do a search
17:53
will differ depending upon how you group
17:57
the original data and then how you group
18:01
minimum bounding rectangles at higher
18:03
levels all right so let's suppose that
18:08
I've chosen to make a b c and d at these
18:11
four groups siblings then this a
18:16
represents that bounding rectangle and
18:20
then the be out there is this bounding
18:22
rectangle and so on now the m out there
18:27
okay right let's this fellow up here
18:31
this M is going to be the bounding
18:33
rectangle for a B C and D so it's going
18:37
to be this big red rectangle that I just
18:40
gave you okay so you take the minimum
18:46
bounding rectangle for ABC and D that
18:49
becomes M all right suppose that this is
18:57
our E and that's our F again you could
19:00
just as well have said that's the E and
19:02
that's your F okay but let's suppose
19:04
that's how a and F then here these four
19:09
rectangles are stored here and these
19:11
four are stored over there and then
19:15
irreps this bounding rectangle and that
19:17
F is this bounding rectangle the parent
19:20
n out there is going to be the bounding
19:22
rectangle for E and F okay so the parent
19:27
will be that rectangle
19:34
all right if I look at the sibling of
19:36
this notes that's got a G H and I any of
19:41
the remaining subgroups are candidates
19:44
for G H and I and I'm you assume we
19:49
chose it this way okay then in the root
19:55
we have a bounding rectangle o which is
19:58
the bounding rectangle for G H and I so
20:01
it's going to be that fellow okay well
20:05
then for the last guy you don't have any
20:07
choice so we've got JK l would be these
20:13
three fellows and then the bounding
20:14
rectangle for JK l is P which is sitting
20:17
in the root all right so an r plus tree
20:27
is basically a B plus tree which stores
20:35
multi-dimensional objects we've seen
20:38
other structures that could be used for
20:40
multi dimensional objects but those are
20:42
mostly binary tree type structures here
20:45
we have a high degree tree which makes
20:47
it suitable for storage on a disk so you
20:50
can reduce the number of disk accesses
20:52
made so if you really have a very large
20:55
collection and you want to store them on
20:57
a disk you wouldn't want to use a binary
20:59
tree type structure all right any
21:06
questions about what an R plus tree is
21:08
before we look at how you perform
21:11
queries inserts and deletes yeah
21:21
is there any ordering of MN o P up there
21:24
well the only ordering that comes is
21:27
once you have decided this is the
21:30
leftmost child then M has to occupy the
21:33
first position but this could easily
21:35
have been here and that could easily
21:37
have been there in which case this would
21:39
look like n m o P okay so there really
21:44
no constraints on how you order the
21:46
siblings or anything o at any other
21:54
questions alright so the kind of curries
22:03
that your support our intersection
22:06
queries so somebody gives you a query
22:08
rectangle and you need to report all of
22:12
the objects in your collection that have
22:14
some overlap with this object over this
22:17
query rectangle alright so in blue I've
22:27
got a query rectangle I want to find
22:31
everything in my collection that
22:32
overlaps with this so I want to report
22:35
this object that object that object that
22:39
object in that one all right the way you
22:49
do it is you start at the root of your R
22:53
plus tree and you look at the MBRs that
22:56
are stored in the R plus tree and see
22:58
which ones have an overlap with a query
23:00
rectangle those are the sub trees you
23:04
have to search and you recursively apply
23:07
this rule at all levels till you get
23:09
down to a data node and there you do a
23:13
real intersection with the data that's
23:15
sitting there
23:21
alright so the root of our r+ tree has 4
23:26
MB ours and m n o in P so that's what
23:30
you see at the root
23:32
that's your curry so you see if the
23:37
query intersects M it does it intersects
23:40
oh oh and it intersects and but not P so
23:43
there are three sub trees that you have
23:45
to search
23:55
I'll search those three sub trees
23:58
recursively to report the follows that
24:03
intersect okay so when you're searching
24:06
em we go down to that leftmost child of
24:12
the root and over there we find these
24:17
four bounding rectangles so you run your
24:23
intersection test against these four
24:25
bounding rectangles and you find you
24:29
have an overlap with that rectangle you
24:31
have an overlap with this one not with
24:34
that and not with that one okay so now
24:38
you find your way down to two data nodes
24:40
in that subtree and then each of those
24:45
has four rectangles or at most four
24:49
rectangles so now when you get here you
24:50
got to see if any of those four
24:53
intersects with this and then when you
24:55
get in here see if any of those four
24:57
intersects with that if so you report
24:59
them all right so that's how a search
25:02
works
25:11
if you want to perform an insert again
25:20
at some level the ideas are pretty much
25:22
the same as those employed in a B+ tree
25:27
but since there is really no rule to
25:30
tell you how you group rectangles
25:34
together the new object you're inserting
25:39
could at least in principle go into any
25:42
one of your better notes okay there's
25:46
nothing to stop you from putting it into
25:47
any one of the data nodes in there you
25:50
have total freedom to do that and if
25:54
they don't they don't know you're trying
25:55
to put it into exceeds its capacity
25:57
you'd have to split it okay so you can
26:01
put it any way you want
26:02
and if the capacity is exceeded you need
26:04
to split and so we have some flexibility
26:08
here in trying to build an our tree
26:12
which can be searched efficiently okay
26:19
few thing back at the search algorithm
26:21
we looked at its complexity in terms of
26:26
number of nodes you actually visit
26:28
depends a lot on how you did the
26:30
grouping okay so you'd like to build it
26:35
so that the probability of searching too
26:38
many nodes is reduced okay and this is
26:44
where you have leeway so you need some
26:50
rules to tell you which leaf should you
26:52
insert the new data object into and then
26:57
when a split takes place you need some
27:00
rules to tell you how do you do the
27:01
split now there really aren't any rules
27:09
that around that you can say well
27:11
theoretically this is the best way to do
27:13
it so you end up using heuristics which
27:18
have good intuitive appeal
27:22
so first let's take a look at leaf
27:24
selection okay to determine the leaf you
27:29
really have to start at the top okay I
27:31
mean you can't examine all of your data
27:33
at this time to decide which leaf to go
27:35
into you really have to start at the top
27:37
and at each index node you have to make
27:40
a heuristic choice which sub tree to
27:43
move into and finally whichever leaf you
27:45
reach that's where you'll make the
27:47
insert okay so you're going to follow a
27:51
path and to guide you in the path to
27:56
follow it's a heuristic really you take
27:59
a look at your choices okay and say well
28:03
if you moved into one of these sub trees
28:06
and you made the insert there well that
28:09
would change the MBR for the collection
28:11
of data stored there and by how much
28:14
would you change the MBR the minimum
28:17
bounding rectangle okay so the heuristic
28:25
here says of your choices move into the
28:29
sub tree whose minimum bounding
28:31
rectangle area would increase the least
28:35
the rationale for this is when you do a
28:38
query ok the probability of moving into
28:42
a sub tree is related to the area of
28:44
that MBRs if you got a large area of a
28:47
higher probability a query would cause
28:49
you to move into it if you have a small
28:51
area you have a smaller probability of
28:53
moving into it okay so based on that
28:56
simple intuition you end up with this
28:58
heuristic which says trying to keep the
29:01
MBRs of your rectangles of your covering
29:06
rectangle try to keep the area of the
29:08
MBRs as small as possible all right so
29:14
what I'm trying to insert a rectangle
29:16
into my example our tree and this is
29:19
what I see at the root okay and here is
29:23
the rectangle I'm trying to insert well
29:28
if I insert it into the leftmost subtree
29:30
okay then the MBR is going to change
29:33
from m
29:34
to this one okay so the area of my MBR
29:38
goes up by this amount okay if I put it
29:44
into the next subtree the MBR of that is
29:48
n and if this rectangle goes there the
29:51
MBR is going to look like this when
29:54
you're done and so the area changes by
29:57
the area of this big rectangle minus the
29:59
area of the original one so that's the
30:02
increase in area if you put it into oh
30:07
okay your new MBR would look like that
30:11
so it's this much area by which you go
30:15
up and if you put it into P okay you go
30:20
up by that okay so you do this
30:23
computation on all of your choices and
30:25
figure out which choice seals the
30:27
smallest area increase and that's where
30:30
you do the insert all right so that
30:37
tells you how to go down to do an insert
30:40
eventually you reach a leaf node if the
30:43
leaf isn't already full you can place a
30:46
new object in there if the leaf is full
30:48
you have to split it so now you have to
30:51
decide how do you split you have total
30:53
flexibility the only requirement is that
30:56
the two new leaves must each contain
31:01
some minimum number of rectangles that's
31:03
little m in our example two and no one
31:06
should contain more than four in our
31:08
example of Big M in general other than
31:11
that you can split anywhere you like
31:13
okay right so that M plus one rectangles
31:17
we need to split them into two sets a
31:19
and B each of them has to have the
31:23
minimum number and of course no one can
31:27
have more than M okay and the heuristic
31:32
that's used to split here is you want to
31:37
do it so that the sum of the areas of
31:39
the rectangles a and B that you create
31:42
is minimum so the subsets a and B have
31:46
MBRs you measure their areas
31:48
you want to think again the rationale is
31:51
the same if you keep the area small then
31:55
the probability of having to search them
31:57
become small all right so let's say I
32:05
reach a leaf and this is an example with
32:09
Big M equals 8 and little M equals 4 so
32:12
reach a leaf
32:13
it already has 8 rectangles in it I'm
32:18
going to insert this red one it now has
32:20
9 I need to split it okay so one
32:26
possibility is to have a which has these
32:28
four in it okay and then the other five
32:33
go into B so the you compute the area of
32:37
this blue rectangle in the area of that
32:39
yellow one another partitioning is like
32:44
this so the blue one has these four
32:46
sorry those five and then the yellow one
32:49
gets the four down there okay so
32:52
depending upon how you partition you get
32:54
different areas all right so the
33:06
question is well how do you pick simpler
33:10
strategies to just try out all possible
33:12
choices for a and B and for each choice
33:16
you just compute the area of the MBR of
33:21
a and the area of the MBR of B add them
33:23
up find them in across all possible
33:24
choices once you pick a B is fixed okay
33:34
so if you're looking only at a s that
33:38
have exactly M the minimum number in
33:40
them okay then you better wrestle with
33:46
that many choices the number of ways you
33:48
can choose m items out of n plus 1 now
33:54
if you're dealing with small decrease
33:56
that's not too bad but
33:59
if you're dealing with large degrees
34:03
let's say expensive computation so
34:10
people then really resort to automate
34:12
call heuristics clustering type
34:14
strategies so use a clustering strategy
34:19
and if you're familiar with these
34:21
typically the way you build clusters is
34:23
you seed them so you start with a
34:26
candidate object in a and a candidate
34:28
object in B and then you grow these
34:31
clusters say one object at a time okay
34:35
so you grow a and B and the growth in
34:41
this case continues until you have
34:46
assigned all of your rectangles alright
34:52
so you need a seeding strategy then you
34:56
need a growth strategy two popular
35:06
strategies to cluster for our trees one
35:11
is called the quadratic method the other
35:13
one we're going to look at is the linear
35:14
method okay
35:16
the quadratic method as the name
35:19
suggests going to run in square of M
35:21
time all right so I've got em plus one
35:29
rectangles that need to be partitioned
35:31
into two groups or two clusters so first
35:36
I'm going to find a seed okay so this is
35:39
the seed for cluster a that's the seed
35:42
for cluster B and the way I select the
35:46
seed is by maximizing this function okay
35:52
what does this function give you right
35:57
so here I've got the minimum bounding
35:59
rectangle of the two seeds okay now look
36:02
at the area of that and from that I
36:04
subtract the area of each of the seeds
36:07
okay so in some sense this measures the
36:10
penalty associated with
36:12
and be into the same cluster this is the
36:16
amount by which the area of this cluster
36:20
grows individually they only had this
36:23
much area but together you end up with
36:25
this much so this gives you the amount
36:27
of unused space in that MBR so it's a
36:30
penalty function okay so you find two
36:34
rectangles for which this penalty
36:36
function is maximum and then you start
36:38
by putting them into different clusters
36:44
all right so let's say this is my
36:45
collection of rectangles I want to find
36:47
two seeds if these are by a and B okay
36:54
then the MBR is this so the penalty
36:58
associated with putting these two
37:00
fellows in the same cluster at this time
37:02
is the white space in that rectangle
37:04
okay so the white space is MBR of the
37:08
area of the MBR minus the area of the
37:10
two blue boxes okay that's what that
37:12
penalty function has if you put those
37:19
two fellows in the same cluster then the
37:24
MBR is given by the red rectangle and
37:26
the penalty is the white space so that's
37:29
the area of the red rectangle minus the
37:31
area of the two blue ones okay so you
37:35
have more to lose by putting these two
37:37
fellows in the in the same cluster than
37:40
we had by putting the previous two okay
37:42
so we find two rectangles that have the
37:46
highest penalty if they were put into
37:48
the same cluster and we use those as
37:51
seeds for a and B all right so for this
37:55
you value try out all M square
37:57
possibilities and pick the one with the
38:00
maximum penalty right so once you've got
38:04
the seed then you look at your
38:09
rectangles and grow these clusters these
38:16
clusters have one rectangle each now
38:18
we're going to now grow them one at a
38:20
time and to grow
38:24
you select a rectangle that's not yet
38:26
assigned to a cluster it so examine all
38:30
the rectangles not yet assigned to a
38:32
cluster and I compute this function okay
38:36
this says if you take a rectangle C and
38:39
you put it into cluster a okay then the
38:43
penalty but how much does the area of
38:45
cluster a go up this is the new area of
38:47
cluster array that's the old area of
38:49
cluster Rea okay so if I put C into
38:52
cluster riots area would go up by this
38:54
much but if I put C into cluster B its
38:57
area would go up by that much okay so
39:00
this difference tells me how much of an
39:03
affinity the C have for a over B if this
39:07
difference is zero as far as C is
39:09
concerned doesn't matter where you put
39:10
it if this difference is huge either
39:13
positive or negative then C has a big
39:16
affinity either to go here or to go
39:18
there okay versus the other cluster okay
39:22
so we're going to pick a rectangle that
39:25
has a relative high affinity for another
39:27
clusters and then put it into that
39:30
cluster okay all right so let's say I've
39:35
seeded this is my cluster a that's my
39:38
cluster B the yellow one okay I look at
39:40
this rectangle if I put it into cluster
39:44
a the penalty is the white space
39:47
okay look at this bounding rectangle
39:49
area - this blue array of - that okay
39:54
sorry I'm looking at okay this is my new
39:59
MBR okay and then I look at the old NBR
40:02
is this one okay so subtract the area of
40:05
the new MBR and the old one so the
40:08
penalty is I guess really this whole
40:10
thing here what's that
40:13
okay so that's the cost of growing this
40:16
cluster by adding this rectangle okay if
40:20
you took this rectangle and put it in
40:22
there then your cost is the area of this
40:24
big rectangle - the yellow one okay so
40:27
you look at the difference between these
40:29
two costs and you see that this one has
40:32
a preference for being here versus being
40:35
out there and the difference in the cost
40:37
you how much of a preferences that have
40:39
of being here was is there okay and then
40:42
you try it out with another rectangle
40:44
okay in this guy's cost is this big area
40:49
- this blue one if this guy went over
40:52
there it would be this huge rectangle -
40:55
the yellow one okay so this guy also has
40:57
a preference for being here over there
41:00
but the relative preference is much more
41:02
for this guy than it was for that guy
41:05
okay so we would prefer at this round to
41:08
put this guy in here then to put that
41:10
guy in here at this time so you try this
41:14
out or all of them and figure out which
41:17
one has the highest relative preference
41:19
to be in one of the clusters and you put
41:21
it in okay all right so once you've
41:25
found the see that maximize their
41:26
objective function you put it into the
41:29
appropriate place and then you continue
41:33
this in each round one rectangle gets
41:36
assigned okay that's how you do this
41:38
point all right
41:46
there's a linear method over here you
41:53
look at normalized separation so if you
41:57
take two rectangles you measure the
42:00
separation in the x-direction which is
42:04
defined as the you take the largest left
42:09
edge and the smallest right edge of the
42:15
two rectangles and subtract the distance
42:18
okay so if you're looking at the
42:22
normalized separation this blue in that
42:24
yellow you look at the largest left edge
42:27
that's this one and the smaller right
42:29
edge and subtract the distance okay so
42:33
you find the rectangles which have the
42:35
maximum x separation and you divide that
42:38
by the spread in X values to normalize
42:42
you do the same thing in the Y direction
42:45
you look at the
42:50
at the largest bottom and subtract from
42:52
that the smaller top largest part of
42:56
mine is smaller top in this case you get
42:57
a negative number okay and here again
43:02
you would take the larger bottom minus
43:04
the smaller top so you get a positive
43:06
separation okay
43:09
and then you normalize by the y spread
43:12
the pair of rectangles that have the
43:15
maximum normalized separation are used
43:19
as the seeds okay all right so once
43:23
you've got the seeds then you take the
43:26
other rectangles in some random order
43:28
and you assign them to whichever
43:31
rectangle they have a preference
43:33
whichever partition okay and so that's
43:37
just done by looking at how much the MBR
43:40
goes up if you put it into a versus it
43:42
to be and so this process runs in linear
43:47
time okay so those are the two more
43:51
popular ways of doing the split right
43:57
finally we have the delete operation so
44:01
to do a delete you have to find the
44:05
rectangle so we use the query algorithm
44:08
to find the rectangle then you reach a
44:11
leaf you take the rectangle out if the
44:15
capacity hasn't gone below the minimum
44:18
you're okay for it's gone below the
44:19
minimum you need to do what you would do
44:25
when you're dealing with a B plus tree
44:27
you try to borrow from a neighbor and
44:32
when you borrow you have to readjust the
44:34
npr's and if you can't borrow you do a
44:40
combine and then you have to redo the
44:45
MBRs
44:50
so as far as delete goes there aren't
44:54
any new ideas really here though it's
44:57
probably worth mentioning that some
44:58
people are instead of just doing the B+
45:02
tree type of delete they use the delete
45:04
as an opportunity to reorganize the
45:06
whole structure or at least do a more
45:09
more than just this but try and do some
45:12
optimization on the petition inks but I
45:16
could get pretty expensive alright so
45:21
basically that's what an our tree is
45:24
it's an extension of a B+ tree used to
45:27
store objects and multi-dimensional
45:31
space you can perform queries searches
45:33
and inserts performance depends a lot on
45:37
how well these heuristics function in
45:40
terms of getting you partitioning x'
45:43
which don't require you to search too
45:45
many nodes the structure is used quite a
45:50
bit in robotic type applications you've
45:52
got all kinds of objects sitting inside
45:54
3-dimensional space and people are
45:56
moving around you need to be able to
45:57
tell who would you hit if you moved in a
46:00
certain direction say there are a number
46:04
of variants of this structure of time
46:07
maybe only just to mention the name so
46:09
I'm going to kind of skip the little
46:11
descriptions that might come with them
46:12
so it leaves familiar with the names
46:14
something called an R plus tree very
46:17
similar to and sorry in our start tree
46:20
there's a Hilbert artery they're quite
46:26
similar to the our tree that we've
46:27
looked at there's an hour-plus tree what
46:34
this one does is it doesn't allow
46:40
overlapping MBRs so if you don't allow
46:44
overlapping MBRs you have to accept
46:46
cutting objects into pieces so some
46:49
object may fall into two or more NVRs so
46:56
you cut objects into pieces and they
47:00
have to relax the upper and lower bounds
47:02
on
47:03
data stored in a node in order to make
47:05
the inserts and deletes somewhat
47:06
manageable so this structure can end up
47:10
in a degenerate case where you only have
47:14
a long chain of nodes because there's
47:16
really no enforcement of lower bound of
47:18
a node but again and practice you you
47:22
tend to get good balance even though
47:26
you're not enforcing the lower bound all
47:30
right finally there's something called a
47:31
Cell tree which combines ideas from the
47:34
binary space partitioning tree we looked
47:36
at last time and the art plus tree that
47:40
I very briefly mentioned a second ago
47:42
instead of using MBRs you use convex
47:46
polyhedra to divide the space out and
47:50
that reduces the amount of white space
47:52
in a region white space being the space
47:55
that's not occupied by objects in your
47:57
data set and so hopefully that reduces
48:00
the number of nodes you have to visit
48:01
all right again
48:09
as far as Leafs go they don't enforce
48:11
any lower upper bounds when you're not
48:16
going to allow overlap of MPR's
48:19
you really can't have an upper bound on
48:21
the capacity of a leaf node because you
48:24
may have a large number of rectangles
48:25
that overlap and cover the same point
48:31
but they do enforce a lower bound on the
48:35
degree of an index node and so you can't
48:37
get a degenerate version here all right
48:44
so that's about all I had to say on our
48:46
trees you have any questions No
48:50
okay well hopefully you're gonna have a
48:53
good final exam normally I'll call it a
48:55
final a good third exam okay all right
49:01
you