Transcript

All right, any questions you want to ask before we start talking about dictionaries today?

In the last lecture we looked at a couple of applications of dictionaries, and we saw that one way to represent a dictionary was to make use of a binary search tree, or a search tree in general. When you go this route you can perform operations in addition to the basic dictionary operations of search, insert, and delete, and you can perform these fairly efficiently, more so than if you use a hash table. So we're going to be spending the next several lectures looking at different tree organizations for dictionaries, and today we're going to take a look at a special kind of dictionary, one we call a static dictionary. This has the property that once it's created, you don't really make any changes to it.

So a static dictionary is a collection of items. Each item is a pair; pairs have two components, a key and an element, and again we'll assume that all our pairs have different keys. The operations supported by a static dictionary: you need to create it, so you're given a potentially large number of pairs and you need to organize them into a structure, and once it's been organized you just perform search operations on it; you don't make any changes. So, for example, you might take a physical dictionary of words, organize it into some data structure, store it on a CD, and then sell it. The user can't make any changes to your collection of words and meanings.

So we're going to take a look at a static dictionary: no changes are made after it's created. And we will make the assumption that we have knowledge about the frequency with which the various pairs are accessed by the user. In a static environment such as this, our objective really is to come up with a data structure that optimizes the search operation. We aren't terribly concerned about how much time it takes to initialize; that's done only once, so assuming it takes a reasonable amount of time, where reasonable could be hours, could be days, but not running into years, there's a whole lot of payoff to making the cost of a search a whole lot less. You construct the dictionary once, then potentially thousands of users use it millions of times, and you recover whatever cost you had in constructing it from the multiple users. We will look only at a binary search tree organization for our static dictionary.

All right, let's take a look at an example. Suppose I have three pairs to store, the keys are a, b, and c, and I know that a is going to be accessed with probability 0.8 and b and c are each going to be accessed with probability 0.1. Then one possible search tree organization is the binary search tree with b at the root and a and c as its children. It's nice from the standpoint that it's the smallest-height binary search tree you can create with three elements. Now look at the cost of searching in this tree.
If you look for b, you have to make one comparison; we'll use the number of comparisons as the cost of a search. When you search for a, you first compare with b, and since a is less than b you go to the left, then you compare with a and get an exact match; that's two compares. When you search for c it's also two compares: you first compare against b, what you're looking for is bigger than that, you go to the right, and then there's a match with c. So if you use the number of compares as a rough estimate of the cost of a search, then searching for b costs you one, searching for a costs you two, and searching for c costs you two. You search for a with probability 0.8, so we multiply the probability with the cost: you get 0.8 times 2 for a, 0.1 times 1 for b, and 0.1 times 2 for c, and you add that up to get the expected cost of a search. So the expected cost of a search using this tree is 1.9.

Now if you look at the tree on the right, this is a right-skewed tree with a at the root, and its height is one more than the height of the first one. In that tree the cost of searching for a is one, searching for b has a cost of two, and searching for c has a cost of three. The expected cost of a search in that tree (take the cost of searching for each of the items, multiply it by the probability of searching for that item, and add up) is 1.3. So for these probabilities of search, the tree on the right is quite a bit better than the tree on the left. So our objective is: given a collection of pairs for which you will be searching, we want to find the best possible tree in which to house these pairs.

Now if you look at search in the general situation, you can perform two types of search. The first one we'll call a successful search, and that's when you're looking for one of the keys that's in your dictionary. So in my three-pair dictionary, a successful search takes place when you're looking for a, b, or c. A successful search terminates at an internal node; those are the yellow nodes in that tree. You can also have unsuccessful searches: for example, you may be looking for something other than a, b, or c. If you're looking for something other than a, b, or c, your search will terminate at one of the external nodes down at the bottom. For example, if you're looking for something smaller than a, you compare with b, it's less than b, it's less than a, and you fall off the tree there. So unsuccessful searches terminate at an external node, and an external node may also be referred to as a failure node: you have a failure in the search. So there are two types of searches, successful and unsuccessful, and our model for building a best search tree will account for both kinds of searches.
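(As a quick check of the arithmetic, here is a small Python sketch, my own and not part of the lecture, that recomputes the two expected costs above; the level dictionaries simply transcribe the two example trees.)

```python
# Expected search cost = sum over keys of probability * number of comparisons,
# where the number of comparisons for a key equals its level in the tree
# (root = level 1).  The two trees are the ones from the example.
prob = {"a": 0.8, "b": 0.1, "c": 0.1}

balanced_levels = {"b": 1, "a": 2, "c": 2}   # b at the root, a and c as children
skewed_levels   = {"a": 1, "b": 2, "c": 3}   # right-skewed: a, then b, then c

def expected_cost(levels):
    return sum(prob[k] * levels[k] for k in prob)

print(round(expected_cost(balanced_levels), 3))  # 1.9
print(round(expected_cost(skewed_levels), 3))    # 1.3
```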
We know from our earlier discussion that if a binary tree has n internal nodes, the number of external nodes will be n + 1; the number of external nodes is one more than the number of internal nodes. If you look at the internal nodes in inorder, in this case a, b, c, then the keys are in ascending order; that's a property of a binary search tree: if you look at the nodes in inorder, the contents are in ascending order of key. We also define a couple of boundary keys, a minus infinity and a plus infinity, at the two ends. If you look at the external nodes in the binary search tree, also in inorder, in this case they will be f_0 through f_3 left to right, then what you'll see is that the failure node numbered i is reached during a search exactly when you're looking for something that lies between the key in the i-th internal node and the key in the (i+1)-st internal node. So, for example, you reach f_0 when you're looking for something between minus infinity and the first key, a; you reach f_1 when you're looking for something between the first key and the second key, so between a and b; you reach f_2 when you're looking for something between b and c; and you reach f_3 when you're looking for something between c and plus infinity. So there's a nice mapping between the external node you reach during an unsuccessful search and the range of keys you must be searching for at that time, and that's given by this relationship. Any questions about that?

Okay, so suppose we're given the probability p_i with which you search for each of the keys that's in your collection, and suppose you're also given the probability q_i with which you search for something that lies between two consecutive keys; in other words, q_i is the probability of reaching the i-th failure node. So we're going to assume you're given these two quantities: the probability of reaching a particular internal node, that is, of searching for a particular key that's in the collection, and also the probability of reaching a particular failure node, in other words the probability that you're going to be searching for something that lies between two consecutive keys in the collection. So assume you're given these, and of course the sum of the p's and the q's has to be one, because no other outcomes are possible: if you're given a key, you either have a successful search or an unsuccessful search. The p's account for all the successful searches, and the q's account for all the unsuccessful searches.
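(An aside, not from the lecture: the failure-node index for an unsuccessful search is just the number of keys smaller than the thing you're looking for, so for a sorted key list it can be read off with a binary search. A minimal sketch using the three-key example:)

```python
import bisect

keys = ["a", "b", "c"]            # the keys a_1 .. a_n in ascending order

def failure_node(x):
    # An unsuccessful search for x with a_i < x < a_(i+1) reaches failure
    # node f_i, and i is simply the number of keys smaller than x.
    return bisect.bisect_left(keys, x)

print(failure_node("0"))   # 0: smaller than every key (between -infinity and a)
print(failure_node("ab"))  # 1: between a and b
print(failure_node("z"))   # 3: bigger than every key (between c and +infinity)
```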
All right, so now the cost of a tree. If you look at the cost formula, the first summation adds up, over the failure nodes, the cost of the unsuccessful searches; these are the terms with the q's in them. For example, if you come out at a failure node at level three (the root is level one, then level two, then level three), the number of comparisons you make to get to level three is only two. These external nodes are really not stored in the structure; you just fall off the tree, so there's no comparison done at them. So if you know the level at which your external node lies, you subtract one from that, and that tells you how many compares you made getting to it. So the cost of getting to failure node i is its level number minus 1, you multiply that by the probability of getting to that failure node, which is q_i, and you sum that up across all of the failure nodes; that gives you the expected cost of all of the unsuccessful searches. To that we're going to add in the cost of the internal nodes. For an internal node, if you're at level 2 it costs you two compares to get there, and if you're at level 6 it costs you six compares, so here we don't subtract one from the level number. So take the probability of reaching the i-th internal node, multiply it by its level, and add up across all the internal nodes. So given a binary search tree, and given the p's and the q's, the expected cost of performing a search in that tree is

cost = sum over i = 0..n of q_i * (level(f_i) - 1)  +  sum over i = 1..n of p_i * level(a_i).

Does everybody agree with that formula? Question: why are the ranges for i different? Okay, the number of external nodes is one more than the number of internal nodes, and so the range for the external nodes is one more than it is for the internal nodes. Any other questions?

All right, this sum is called the weighted path length of the binary tree. Several lectures ago we looked at the weighted external path length when we were looking at Huffman trees. There we were not looking at the cost of internal nodes, so it's like having all of the p_i equal to 0 and costs only associated with the external nodes. So Huffman trees optimize the weighted external path length, and those can be found fairly easily by running a greedy algorithm. Here we're interested in optimizing, or minimizing, the weighted path length; this is the external and internal path lengths added together, and this is a much harder problem. A greedy algorithm here doesn't work; you can try to formulate one and you'll see it doesn't give you the best possible search tree. You could try a brute-force scheme where you generate all possible binary trees that have n internal nodes, compute their costs, and pick the best one, but that runs in something like 4 to the power n amount of time, n being the number of internal nodes, and so that's pretty impractical unless your n is very, very small.
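(Here is a minimal Python sketch of the weighted-path-length formula above; this is my own illustration, not the lecture's code. A tree is written as nested (key, left, right) tuples, with None standing in for the external, or failure, nodes.)

```python
def weighted_path_length(tree, p, q):
    """tree: nested (key, left, right) tuples; None marks an external (failure) node.
    p[key]: probability of a successful search for key.
    q[i]:   probability of reaching failure node f_i, where the failure nodes
            are numbered 0..n in inorder (left to right)."""
    fail_index = 0
    cost = 0.0

    def walk(node, level):
        nonlocal fail_index, cost
        if node is None:                            # failure node at this level
            cost += q[fail_index] * (level - 1)     # level - 1 comparisons to fall off
            fail_index += 1
            return
        key, left, right = node
        walk(left, level + 1)
        cost += p[key] * level                      # successful search: 'level' comparisons
        walk(right, level + 1)

    walk(tree, 1)
    return cost

# The two example trees from earlier, with all the q's zero:
p = {"a": 0.8, "b": 0.1, "c": 0.1}
q = [0.0, 0.0, 0.0, 0.0]
balanced = ("b", ("a", None, None), ("c", None, None))
skewed   = ("a", None, ("b", None, ("c", None, None)))
print(round(weighted_path_length(balanced, p, q), 3))   # 1.9
print(round(weighted_path_length(skewed, p, q), 3))     # 1.3
```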
Instead, we're going to arrive at an algorithm to find the best tree employing a technique called dynamic programming. It doesn't really matter whether you know what dynamic programming is or not; we aren't really going to rely on any deep theory from dynamic programming. So we have a collection of n keys in ascending order, a_1 through a_n, and we'll use some notation: T(i,j) represents the least-cost tree for a contiguous subset of these keys, the subset beginning at a_(i+1) and going through a_j. So T(i,j) is the best tree for the contiguous subset of keys a_(i+1) through a_j, and with that definition the tree we're trying to build is T(0,n), the least-cost tree for all of the n keys. That's what we want, and T(i,j) is kind of a general statement: an optimal tree for a contiguous subset.

So, for example, a binary search tree could be an optimal search tree for the keys a_3, a_4, a_5, a_6, and a_7. When I say it's an optimal tree for a_3 through a_7, in terms of this notation I'm talking about T(2,7); T(2,7) holds a_3 through a_7. What hasn't been stated explicitly is that a tree that has the elements a_3 through a_7, five internal nodes, will also have six external nodes, and the external nodes are going to be f_2, f_3, f_4, f_5, f_6, and f_7. The external node f_2, in terms of the whole picture, is the node that you get to when you're looking for something between a_2 and a_3, but in this smaller context you get to it if you're looking for anything smaller than a_3. Similarly, f_7 in this context is reached if you're looking for anything bigger than a_7, whereas in the whole picture you get to f_7 when you're looking for something between a_7 and a_8. The other failure nodes have the same meaning here as they do in the overall tree.

Now, by definition, when you talk about T(2,7), this tree is going to have internal nodes for a_3 through a_7, and its external nodes will be labeled f_2 through f_7. The probabilities of the internal nodes are the same p's as for the corresponding keys, so p_3, p_4, p_5, p_6, and p_7, and the q's for the failure nodes are the same q's as before. So the leftmost failure node carries q_2, even though its meaning is a little bit different (you get there whenever you're looking for something less than a_3), the next one carries q_3, and f_7 carries q_7. So, explicitly, T(i,j) is the least-cost tree for the case where the success nodes are a_(i+1) through a_j and the failure nodes are f_i through f_j. Any questions about that? In terms of p's and q's, T(i,j) has exactly the p's p_(i+1) through p_j and the q's q_i through q_j in it; again, one more q than p.

All right, more terminology. c(i,j) is defined to be the cost of T(i,j). If I go back to the previous picture, c(2,7) is the cost of that tree, assuming it's the best tree for those p's and q's, and the way you'd compute c(2,7) is to use the level numbers in that tree together with the p's and the q's, multiply, and add. So c(i,j) goes through the q's from q_i through q_j and the p's from p_(i+1) through p_j, and when you're looking at a level, the level is with respect to that tree viewed in isolation; if you're looking at c(2,7), we look at that tree in isolation, and the same thing for the p terms. r(i,j) is the root of that tree, that is, which key is sitting at the root, and w(i,j) is the weight of that tree. So T(i,j) is the best tree for the stated set of p's and q's, c is the cost of the tree, r is the root of the tree, and w is the weight of the tree.
The weight of the tree is the sum of the p's and the q's on the tree.

Okay, now the strategy to build T(0,n) is really going to be to start with optimal trees for a small number of keys, maybe start with optimal trees for zero keys, and then build from that optimal trees with one key, optimal trees with two keys, three keys, four keys, and finally an optimal tree with all n keys. So we're going to solve this larger problem of n pairs in a bunch of steps, each step getting us a collection of optimal trees for one more pair than we had in the previous step. For the time being, suppose that all we want to do is compute the r's, the w's, and the c's; later we'll see how we can actually build the tree once we know the r's. So for the time being I just want to compute the r's, the c's, and the w's, starting with the small trees and iteratively building up to the large n-pair tree. I'm going to start first with the case i equals j, that's T(i,i).

Question: can we use a max-heap structure to build an optimal cost tree? Well, before we can think about what data structure we are going to use, we need to have an algorithm. Unless we have an algorithm, we have no idea as to what kind of structure is going to be useful, and at this point we don't have an algorithm to build such a tree. Again, like I said, first you've got to have an algorithm. Maybe what you're thinking of in your mind right now is some kind of a greedy algorithm, which says you're going to look at the probabilities, find the high probabilities, and try to move those up to the top of the tree. Then you need to sit down and formulate it into an algorithm and prove that it actually works, and once you have done that, then it makes sense to look at what kind of data structures you would use to implement that algorithm. So whenever you're trying to solve a problem, long before you start working on data structures, you work on the algorithm: what is the strategy that's going to work, establish that it actually works, and then go to the next step, which is to decide on the data structures. And where we are with this problem right now is the first step, coming up with an algorithm that will get us the best tree.

All right, so the strategy that I'm looking at (and I haven't proved that I can make it work; I would have to prove it) is to build my best tree as the result of stages, where in each stage I build a tree that is best for one more item than I had at the previous stage. The starting point is going to be trees with zero elements in them, the best trees with no element, and that's the case i equals j in terms of our T(i,j) notation, because by definition T(i,j) includes the p's p_(i+1) through p_j and the q's q_i through q_j. When i equals j, the first index of the p range is bigger than the last index, which means there are no p's, but for the q range the first index equals the last, so there is one q. So T(i,i) represents a tree with no internal node but one external node.
So I'm going to start with these optimal trees with no internal node but one external node, and then from there go to optimal trees with one internal node and two external nodes, two internal nodes and three external nodes, and eventually n internal nodes and n + 1 external nodes. T(i,i) includes only q_i, and it looks like just a single external node; there's an external node in it, and that's it. The cost of such a tree is zero: no comparisons are made when you search an empty tree, so its cost is zero. It has no root (the external node is kind of a fictitious thing), but it does have a weight, because the weight is the sum of the p's and the q's, and in this case there's only a q, so its weight is q_i. So as far as optimal trees with no pairs are concerned, we can compute the c's, the r's, and the w's quite easily: c(i,i) = 0, r(i,i) is undefined since there is no root, and w(i,i) = q_i. Any questions about these numbers for the c's, the r's, and the w's?

All right, so next let's look at the case where you have at least one internal node. So I've got at least one p in there, and if you have at least one p, we know that there is going to be a root, and at the root you'll have one of the keys. The keys in the tree are numbered from i+1 through j, so one of the fellows a_(i+1) through a_j is going to be sitting at the root. I don't really know which one; for the time being let me just say it's a_k. So a_k is sitting at the root; I don't know the value of k, I just know that it lies in this range. Later I'll worry about how to figure out what the value of k is, but for the time being let's just suppose that what's at the root is a_k.

So your optimal tree T(i,j) looks like this: a_k is at the root, it's got a left subtree L, and it's got a right subtree R. Now since this is a binary search tree, I know that the left subtree is going to have all of the fellows smaller than a_k and the right subtree is going to have everybody bigger than a_k. So in terms of the p's: the whole tree together has all of these p's, and p_k is out at the root, so p_(i+1) through p_(k-1) have to be in the left subtree, and along with that you get the corresponding q's, the q's from q_i up to the q that corresponds to values smaller than a_k, which is q_(k-1). So the left subtree contains the p's p_(i+1) through p_(k-1) and the q's q_i through q_(k-1), and the right subtree will have the rest of them except for p_k. So the right subtree will have p_(k+1) up through p_j, and then it's going to have the corresponding q's; the first failure node in there is for keys that are bigger than a_k and less than a_(k+1), so that's f_k, and you've got q_k through q_j. All right, so once you know what's at the root (and of course we don't, we've only postulated that it's a_k, and I don't know what k is), I can draw these two inferences: L contains this collection of p's and q's, and R contains that collection of p's and q's. Any questions about this conclusion?

Next I'm going to define something called cost(L), the cost of the left subtree. This is the weighted path length of the left subtree viewed in isolation: I pull it out, forget about the rest of the tree, its parent and everything else, and I compute my formula for cost where the root of L is at level one, working down from there.
All right, so let's look at the contribution of L to the cost of this whole tree T(i,j), which is c(i,j). When you view L in isolation, its root is at level one, but when you put it back in under a_k, its root becomes level two with respect to the whole tree, so the level numbers of all of the nodes in L are one bigger there than they are in isolation. cost(L) is defined with respect to L in isolation, so if you know what cost(L) is and you want to figure out the contribution of L to c(i,j), there's a little mismatch, because cost(L) uses level numbers that are one less than what you need to use out there. But you can remedy that: if all the level numbers go up by one, then the sum goes up by the sum of the p's and the q's in L, and that's the weight of L. So the contribution of L, when it's viewed as a subtree hanging off a_k, is cost(L) when viewed in isolation plus the weight of L, and the weight of L, the sum of the p's and q's in L, is w(i,k-1). So the contribution of L to c(i,j) is cost(L) + w(i,k-1), and similarly the contribution of R to c(i,j) is cost(R), viewed in isolation, plus the weight of R, which is w(k,j).

Now the cost of T(i,j), which is c(i,j), is the contribution of L plus the contribution of R plus the contribution of the root, and the contribution of the root is just p_k, since the root is at level one. If you look at the three terms, weight of L plus weight of R plus p_k, that's really the weight of the whole tree, the sum of all the p's and q's in the whole tree. So I can take those three terms out and replace them with w(i,j), and I get

c(i,j) = cost(L) + cost(R) + w(i,j).

Now I don't really know what cost(L) is, because I don't know the shape of L; even if I knew k, I wouldn't know the shape of L. What I'm going to assert is that cost(L) is in fact the cost of the best tree T(i,k-1), the best tree for the p's and q's in L. Suppose this L is not the best tree for the p's and q's in it. Then you can take this L, throw it out, and bring in a better tree, somebody with smaller cost. If you bring in somebody with smaller cost, this term goes down and the other terms don't change, so I've got a better tree T(i,j). But T(i,j) is supposed to be optimal, so there should be no way to reduce its cost, and therefore L has to be optimal for its p's and q's. If it's optimal for its p's and q's, then cost(L) must be c(i,k-1), since by definition T(i,k-1) is the best tree for those p's and q's. The same reasoning allows us to conclude that cost(R) has to be c(k,j), because R has to be the best tree for that set of p's and q's; otherwise T(i,j) cannot be the best for the entire set of p's and q's. So both L and R have to be best for the p's and q's they have, and I can replace cost(L) and cost(R) with c(i,k-1) and c(k,j).
So I've gotten rid of the quantities I didn't know how to handle, the cost(L) and cost(R) terms, and I've got things back in terms of things I was planning to compute, the c's and the w's (the r's don't show up here yet). But I can't use this equation in its present form, because I don't know what k is. What I do know is that k lies in a certain range, and that range is known to me: k has to be between i+1 and j. Since T(i,j) is supposed to be the best tree for its given set of p's and q's, what you can conclude is that if you tried out all possible choices of k and you computed the right side for all of them, then when you have the correct k, that right side should be smallest. So since there are many possibilities for k, we just try out all of them, and from amongst those possibilities pick the one that gives the smallest value of the right side; that has to define the left side. And that's exactly what we're going to do. The earlier equation was valid under the assumption that you knew what k was, but since I don't know what k is, I say: try out all of them. The left side has to be equal to the right side for some valid value of k, and it's got to be the one for which the right side becomes minimum, so we get the equation with the min over k. This version differs a little bit from the version on the previous slide: I've taken the w(i,j) and moved it outside the min, because w(i,j) doesn't change as you change the value of k, so there's no point keeping it inside the min. So you go through all choices for k, compute the right side, and find the one that gives you the min.

Now if you look at this equation, on the left side I'm looking at the cost of a tree that has some number of pairs, and that's defined in terms of the costs of the best trees for some other numbers of pairs on the right side. In order for us to be able to use this equation, the number of pairs involved in the trees on the right should be smaller than what's involved on the left; then I can start with trees that have zero pairs and go to trees with one, two, three pairs, and so on. We can verify that's the case: c(i,k-1) represents the best tree for L, and the tree for L cannot include p_k, so at worst that tree has one less p than the whole tree, because a_k is out of the picture. It may have far fewer p's than one less than the whole thing (you could have four keys in L, eight in R, and one at the root, or one in L and twelve in R and nothing left over), but you're guaranteed that a_k is out, and so the left subtree has at least one less p than the whole tree. The same is true on the right side. So I've got a well-defined recurrence: the trees on the left side involve more elements than the trees on the right side, and so if you compute by increasing size of tree, in terms of the number of elements or pairs, you will always have computed the right-side terms by the time you're trying to compute a left-side term.
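(Written out, the recurrence being described, with the base case from before and the weight computed incrementally from its definition as the sum of the p's and q's in T(i,j), is:)

$$c(i,i) = 0, \qquad w(i,i) = q_i, \qquad w(i,j) = w(i,j-1) + p_j + q_j,$$

$$c(i,j) = w(i,j) + \min_{i < k \le j}\bigl\{\, c(i,k-1) + c(k,j) \,\bigr\} \quad \text{for } j > i.$$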
Now, we'll also keep track of an r. From the recurrence we don't know the root; the recurrence tries out all possible roots, and the fellow that minimizes the right side is the root. So r(i,j) is the k that minimizes the stuff on the right side, and that tells me what the root is; that's how I get the roots.

What I really want to compute is c(0,n) and r(0,n): c(0,n) is the cost of the best tree for all of the n pairs, and r(0,n) is the root of that tree. The strategy is to start with the trees that have zero pairs in them; then I'll use the equation that I just developed to compute the c(i,i+1)'s, the trees that have one pair in them, and the corresponding roots, so all trees comprised of one pair. Then I'll get the trees comprised of two pairs, the c(i,i+2)'s and their roots, then we'll go to trees with three pairs in them, the c(i,i+3)'s and their roots, and you keep doing this until you get c(0,n). So we start with i equals j, then j is one more than i, then j is two more than i, then three more, then four more, and in the last cycle j is n more than i.

We'll make use of an array, and actually only the upper triangle of the array will get used, because j is always at least i. I'll start by setting the diagonal terms, the c(i,i)'s and the r(i,i)'s; that's the initial configuration. Then we'll go and get the (i, i+1) terms, those are on the next diagonal, then the (i, i+2) terms on the diagonal after that, then the (i, i+3) terms, and then the (i, i+4) terms; that would be for a case of four keys. So that's the sequence in which we'll compute the c's and the r's.

As for the total time: to compute one c you have to take the min over a bunch of candidates, and assuming the right side is already known, there are at most n candidates involved, so it can take you at most n units of time to compute one c. The number of c's to compute is about n squared, really n squared over 2, since you only need the upper triangle, so it takes you n cubed time to compute all of the c's and the r's. You can cut the time down to n squared by restricting the search for k to a smaller range, r(i, j-1) <= k <= r(i+1, j). The proof that this range is adequate is a bit long and we're not going to go through it; you're not expected to be able to prove it, but you should be able to see that if you use this range the complexity will drop to n squared: if you just add up the terms, you'll find that all the r's essentially cancel out.
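(A minimal Python sketch of this table-filling procedure; it is my own, with a hypothetical name optimal_bst, not the lecture's code. It fills the upper triangle diagonal by diagonal, exactly in the order just described, and records the minimizing k as the root.)

```python
def optimal_bst(p, q):
    """p[1..n]: success probabilities (p[0] unused); q[0..n]: failure probabilities.
    Returns the tables c, w, r; c[0][n] is the cost of the best tree for all
    n pairs and r[0][n] is its root."""
    n = len(p) - 1
    c = [[0.0] * (n + 1) for _ in range(n + 1)]
    w = [[0.0] * (n + 1) for _ in range(n + 1)]
    r = [[0]   * (n + 1) for _ in range(n + 1)]

    for i in range(n + 1):                   # trees with zero pairs: T(i,i)
        w[i][i] = q[i]                       # weight is just q_i
        c[i][i] = 0.0                        # searching an empty tree is free
        r[i][i] = 0                          # no root (0 used as a sentinel)

    for d in range(1, n + 1):                # d = j - i: trees with d pairs
        for i in range(n - d + 1):
            j = i + d
            w[i][j] = w[i][j - 1] + p[j] + q[j]   # sum of the p's and q's in T(i,j)
            # Try every candidate root k in i+1..j and keep the best one.
            # (The O(n^2) refinement restricts k to r[i][j-1] .. r[i+1][j].)
            best_k = min(range(i + 1, j + 1),
                         key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = w[i][j] + c[i][best_k - 1] + c[best_k][j]
            r[i][j] = best_k
    return c, w, r

# The three-key example from the start of the lecture (all q's zero):
c, w, r = optimal_bst([0.0, 0.8, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0])
print(round(c[0][3], 3), r[0][3])   # 1.3 1, i.e. cost 1.3 with the key a at the root
```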
Once you have computed the r's, we can build the tree, and at this point we don't even need to know the c's to build the tree. We know that r(0,n) is the root of the overall tree. Suppose r(0,n) is 10; then we know the tree looks like this: a_10 is at the root, the left subtree is T(0,9), and the right subtree is T(10,n). Well, the root of the left subtree is going to be r(0,9), the root of the right subtree is going to be r(10,n), and then you know what their left and right subtrees and their roots are, and so on. This process is best described recursively: a_(r(0,n)) is the root, and once you know the root, you recursively build T(0,9) and then recursively build T(10,n). It takes one unit of time to identify each root, and there is a total of n roots, so in order n time you can build the tree once you have computed the r's.

All right, so you can build the best search tree to use for a static dictionary in n cubed time (the development we went through only shows n cubed), but you can do it in n squared time if you use the reduced range that I put up. Any questions? Okay, we'll stop here.
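(A short follow-up sketch of the recursive construction just described, building on the hypothetical optimal_bst sketch above; again my own illustration, not the lecture's code.)

```python
def build_tree(r, i, j):
    """Lay out T(i,j) from the root table r: r[i][j] = k means a_k is the root,
    with T(i, k-1) as its left subtree and T(k, j) as its right subtree.
    Returns nested (key index, left, right) tuples; None is a failure node."""
    if i == j:                    # zero pairs: just a failure node
        return None
    k = r[i][j]
    return (k, build_tree(r, i, k - 1), build_tree(r, k, j))

# Continuing the example above: the optimal tree for the keys a, b, c
# comes out right-skewed with a at the root, exactly as in the example.
print(build_tree(r, 0, 3))        # (1, None, (2, None, (3, None, None)))
```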