Transcript

All right, any questions you want to ask before we start talking about dictionaries today?

In the last lecture we looked at a couple of applications of dictionaries, and we saw that one way to represent a dictionary was to make use of a binary search tree, or a search tree in general. When you go this route you can perform operations in addition to the basic dictionary operations of search, insert, and delete, and you can perform these fairly efficiently, more so than if you use a hash table. So we're going to be spending the next several lectures looking at different tree organizations for dictionaries, and today we're going to take a look at a special kind of dictionary, one we call a static dictionary. This has the property that once it's created, you don't really make any changes to it.

So a static dictionary is a collection of items. Each item is a pair; pairs have two components, a key and an element, and again we'll assume that all our pairs have different keys. The operations supported by a static dictionary: you need to create it, so you're given a potentially large number of pairs and you need to organize them into a structure, and once it's been organized you just perform search operations on it; you don't make any changes. So, for example, you might take a physical dictionary of words, organize it into some data structure, store it on a CD, and then sell it. The user can't make any changes to your collection of words and meanings.

So we're going to take a look at a static dictionary: no changes are made after it's created. And we will make the assumption that we have knowledge about the frequency with which the various pairs are accessed by the user. In a static environment such as this, our objective really is to come up with a data structure that optimizes the search operation. We aren't terribly concerned about how much time it takes to initialize; that's done only once, so assuming it takes a reasonable amount of time, where reasonable could be hours, could be days, but not running into years, there's a whole lot of payoff to making the cost of a search a whole lot less. You construct the dictionary once, then potentially thousands of users use it millions of times, and you recover whatever cost you had in constructing it from the multiple users. We will look only at a binary search tree organization for our static dictionary.

All right, let's take a look at an example. Suppose I have three pairs to store, the keys are a, b, and c, and I know that a is going to be accessed with probability 0.8 and b and c are each going to be accessed with probability 0.1. Then one possible search tree organization is the binary search tree with b at the root and a and c as its children. It's nice from the standpoint that it's the smallest-height binary search tree you can create with three elements. Now look at the cost of searching in this tree.
If you look for b, you have to make one comparison; we'll use the number of comparisons as the cost of a search. When you search for a, you first compare with b, and since a is less than b you go to the left, then you compare with a and get an exact match; that's two compares. When you search for c it's also two compares: you first compare against b, what you're looking for is bigger than that, you go to the right, and then there's a match with c. So if you use the number of compares as a rough estimate of the cost of a search, then searching for b costs you one, searching for a costs you two, and searching for c costs you two. You search for a with probability 0.8, so we multiply the probability with the cost: you get 0.8 times 2 for a, 0.1 times 1 for b, and 0.1 times 2 for c, and you add that up to get the expected cost of a search. So the expected cost of a search using this tree is 1.9.

Now if you look at the tree on the right, this is a right-skewed tree with a at the root, and its height is one more than the height of the first one. In that tree the cost of searching for a is one, searching for b has a cost of two, and searching for c has a cost of three. The expected cost of a search in that tree (take the cost of searching for each of the items, multiply it by the probability of searching for that item, and add up) is 1.3. So for these probabilities of search, the tree on the right is quite a bit better than the tree on the left. So our objective is: given a collection of pairs for which you will be searching, we want to find the best possible tree in which to house these pairs.

Now if you look at search in the general situation, you can perform two types of search. The first one we'll call a successful search, and that's when you're looking for one of the keys that's in your dictionary. So in my three-pair dictionary, a successful search takes place when you're looking for a, b, or c. A successful search terminates at an internal node; those are the yellow nodes in that tree. You can also have unsuccessful searches: for example, you may be looking for something other than a, b, or c. If you're looking for something other than a, b, or c, your search will terminate at one of the external nodes down at the bottom. For example, if you're looking for something smaller than a, you compare with b, it's less than b, it's less than a, and you fall off the tree there. So unsuccessful searches terminate at an external node, and an external node may also be referred to as a failure node: you have a failure in the search. So there are two types of searches, successful and unsuccessful, and our model for building a best search tree will account for both kinds of searches.
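(As a quick check of the arithmetic, here is a small Python sketch, my own and not part of the lecture, that recomputes the two expected costs above; the level dictionaries simply transcribe the two example trees.)

```python
# Expected search cost = sum over keys of probability * number of comparisons,
# where the number of comparisons for a key equals its level in the tree
# (root = level 1).  The two trees are the ones from the example.
prob = {"a": 0.8, "b": 0.1, "c": 0.1}

balanced_levels = {"b": 1, "a": 2, "c": 2}   # b at the root, a and c as children
skewed_levels   = {"a": 1, "b": 2, "c": 3}   # right-skewed: a, then b, then c

def expected_cost(levels):
    return sum(prob[k] * levels[k] for k in prob)

print(round(expected_cost(balanced_levels), 3))  # 1.9
print(round(expected_cost(skewed_levels), 3))    # 1.3
```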
We know from our earlier discussion that if a binary tree has n internal nodes, the number of external nodes will be n + 1; the number of external nodes is one more than the number of internal nodes. If you look at the internal nodes in inorder, in this case a, b, c, then the keys are in ascending order; that's a property of a binary search tree: if you look at the nodes in inorder, the contents are in ascending order of key. We also define a couple of boundary keys, a minus infinity and a plus infinity, at the two ends. If you look at the external nodes in the binary search tree, also in inorder, in this case they will be f_0 through f_3 left to right, then what you'll see is that the failure node numbered i is reached during a search exactly when you're looking for something that lies between the key in the i-th internal node and the key in the (i+1)-st internal node. So, for example, you reach f_0 when you're looking for something between minus infinity and the first key, a; you reach f_1 when you're looking for something between the first key and the second key, so between a and b; you reach f_2 when you're looking for something between b and c; and you reach f_3 when you're looking for something between c and plus infinity. So there's a nice mapping between the external node you reach during an unsuccessful search and the range of keys you must be searching for at that time, and that's given by this relationship. Any questions about that?

Okay, so suppose we're given the probability p_i with which you search for each of the keys that's in your collection, and suppose you're also given the probability q_i with which you search for something that lies between two consecutive keys; in other words, q_i is the probability of reaching the i-th failure node. So we're going to assume you're given these two quantities: the probability of reaching a particular internal node, that is, of searching for a particular key that's in the collection, and also the probability of reaching a particular failure node, in other words the probability that you're going to be searching for something that lies between two consecutive keys in the collection. So assume you're given these, and of course the sum of the p's and the q's has to be one, because no other outcomes are possible: if you're given a key, you either have a successful search or an unsuccessful search. The p's account for all the successful searches, and the q's account for all the unsuccessful searches.
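(An aside, not from the lecture: the failure-node index for an unsuccessful search is just the number of keys smaller than the thing you're looking for, so for a sorted key list it can be read off with a binary search. A minimal sketch using the three-key example:)

```python
import bisect

keys = ["a", "b", "c"]            # the keys a_1 .. a_n in ascending order

def failure_node(x):
    # An unsuccessful search for x with a_i < x < a_(i+1) reaches failure
    # node f_i, and i is simply the number of keys smaller than x.
    return bisect.bisect_left(keys, x)

print(failure_node("0"))   # 0: smaller than every key (between -infinity and a)
print(failure_node("ab"))  # 1: between a and b
print(failure_node("z"))   # 3: bigger than every key (between c and +infinity)
```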
All right, so now the cost of a tree. If you look at the cost formula, the first summation adds up, over the failure nodes, the cost of the unsuccessful searches; these are the terms with the q's in them. For example, if you come out at a failure node at level three (the root is level one, then level two, then level three), the number of comparisons you make to get to level three is only two. These external nodes are really not stored in the structure; you just fall off the tree, so there's no comparison done at them. So if you know the level at which your external node lies, you subtract one from that, and that tells you how many compares you made getting to it. So the cost of getting to failure node i is its level number minus 1, you multiply that by the probability of getting to that failure node, which is q_i, and you sum that up across all of the failure nodes; that gives you the expected cost of all of the unsuccessful searches. To that we're going to add in the cost of the internal nodes. For an internal node, if you're at level 2 it costs you two compares to get there, and if you're at level 6 it costs you six compares, so here we don't subtract one from the level number. So take the probability of reaching the i-th internal node, multiply it by its level, and add up across all the internal nodes. So given a binary search tree, and given the p's and the q's, the expected cost of performing a search in that tree is

cost = sum over i = 0..n of q_i * (level(f_i) - 1)  +  sum over i = 1..n of p_i * level(a_i).

Does everybody agree with that formula? Question: why are the ranges for i different? Okay, the number of external nodes is one more than the number of internal nodes, and so the range for the external nodes is one more than it is for the internal nodes. Any other questions?

All right, this sum is called the weighted path length of the binary tree. Several lectures ago we looked at the weighted external path length when we were looking at Huffman trees. There we were not looking at the cost of internal nodes, so it's like having all of the p_i equal to 0 and costs only associated with the external nodes. So Huffman trees optimize the weighted external path length, and those can be found fairly easily by running a greedy algorithm. Here we're interested in optimizing, or minimizing, the weighted path length; this is the external and internal path lengths added together, and this is a much harder problem. A greedy algorithm here doesn't work; you can try to formulate one and you'll see it doesn't give you the best possible search tree. You could try a brute-force scheme where you generate all possible binary trees that have n internal nodes, compute their costs, and pick the best one, but that runs in something like 4 to the power n amount of time, n being the number of internal nodes, and so that's pretty impractical unless your n is very, very small.
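(Here is a minimal Python sketch of the weighted-path-length formula above; this is my own illustration, not the lecture's code. A tree is written as nested (key, left, right) tuples, with None standing in for the external, or failure, nodes.)

```python
def weighted_path_length(tree, p, q):
    """tree: nested (key, left, right) tuples; None marks an external (failure) node.
    p[key]: probability of a successful search for key.
    q[i]:   probability of reaching failure node f_i, where the failure nodes
            are numbered 0..n in inorder (left to right)."""
    fail_index = 0
    cost = 0.0

    def walk(node, level):
        nonlocal fail_index, cost
        if node is None:                            # failure node at this level
            cost += q[fail_index] * (level - 1)     # level - 1 comparisons to fall off
            fail_index += 1
            return
        key, left, right = node
        walk(left, level + 1)
        cost += p[key] * level                      # successful search: 'level' comparisons
        walk(right, level + 1)

    walk(tree, 1)
    return cost

# The two example trees from earlier, with all the q's zero:
p = {"a": 0.8, "b": 0.1, "c": 0.1}
q = [0.0, 0.0, 0.0, 0.0]
balanced = ("b", ("a", None, None), ("c", None, None))
skewed   = ("a", None, ("b", None, ("c", None, None)))
print(round(weighted_path_length(balanced, p, q), 3))   # 1.9
print(round(weighted_path_length(skewed, p, q), 3))     # 1.3
```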
Instead, we're going to arrive at an algorithm to find the best tree employing a technique called dynamic programming. It doesn't really matter whether you know what dynamic programming is or not; we aren't really going to rely on any deep theory from dynamic programming. So we have a collection of n keys in ascending order, a_1 through a_n, and we'll use some notation: T(i,j) represents the least-cost tree for a contiguous subset of these keys, the subset beginning at a_(i+1) and going through a_j. So T(i,j) is the best tree for the contiguous subset of keys a_(i+1) through a_j, and with that definition the tree we're trying to build is T(0,n), the least-cost tree for all of the n keys. That's what we want, and T(i,j) is kind of a general statement: an optimal tree for a contiguous subset.

So, for example, a binary search tree could be an optimal search tree for the keys a_3, a_4, a_5, a_6, and a_7. When I say it's an optimal tree for a_3 through a_7, in terms of this notation I'm talking about T(2,7); T(2,7) holds a_3 through a_7. What hasn't been stated explicitly is that a tree that has the elements a_3 through a_7, five internal nodes, will also have six external nodes, and the external nodes are going to be f_2, f_3, f_4, f_5, f_6, and f_7. The external node f_2, in terms of the whole picture, is the node that you get to when you're looking for something between a_2 and a_3, but in this smaller context you get to it if you're looking for anything smaller than a_3. Similarly, f_7 in this context is reached if you're looking for anything bigger than a_7, whereas in the whole picture you get to f_7 when you're looking for something between a_7 and a_8. The other failure nodes have the same meaning here as they do in the overall tree.

Now, by definition, when you talk about T(2,7), this tree is going to have internal nodes for a_3 through a_7, and its external nodes will be labeled f_2 through f_7. The probabilities of the internal nodes are the same p's as for the corresponding keys, so p_3, p_4, p_5, p_6, and p_7, and the q's for the failure nodes are the same q's as before. So the leftmost failure node carries q_2, even though its meaning is a little bit different (you get there whenever you're looking for something less than a_3), the next one carries q_3, and f_7 carries q_7. So, explicitly, T(i,j) is the least-cost tree for the case where the success nodes are a_(i+1) through a_j and the failure nodes are f_i through f_j. Any questions about that? In terms of p's and q's, T(i,j) has exactly the p's p_(i+1) through p_j and the q's q_i through q_j in it; again, one more q than p.

All right, more terminology. c(i,j) is defined to be the cost of T(i,j). If I go back to the previous picture, c(2,7) is the cost of that tree, assuming it's the best tree for those p's and q's, and the way you'd compute c(2,7) is to use the level numbers in that tree together with the p's and the q's, multiply, and add. So c(i,j) goes through the q's from q_i through q_j and the p's from p_(i+1) through p_j, and when you're looking at a level, the level is with respect to that tree viewed in isolation; if you're looking at c(2,7), we look at that tree in isolation, and the same thing for the p terms. r(i,j) is the root of that tree, that is, which key is sitting at the root, and w(i,j) is the weight of that tree. So T(i,j) is the best tree for the stated set of p's and q's, c is the cost of the tree, r is the root of the tree, and w is the weight of the tree.
The weight of the tree is the sum of the p's and the q's on the tree.

Okay, now the strategy to build T(0,n) is really going to be to start with optimal trees for a small number of keys, maybe start with optimal trees for zero keys, and then build from that optimal trees with one key, optimal trees with two keys, three keys, four keys, and finally an optimal tree with all n keys. So we're going to solve this larger problem of n pairs in a bunch of steps, each step getting us a collection of optimal trees for one more pair than we had in the previous step. For the time being, suppose that all we want to do is compute the r's, the w's, and the c's; later we'll see how we can actually build the tree once we know the r's. So for the time being I just want to compute the r's, the c's, and the w's, starting with the small trees and iteratively building up to the large n-pair tree. I'm going to start first with the case i equals j, that's T(i,i).

Question: can we use a max-heap structure to build an optimal cost tree? Well, before we can think about what data structure we are going to use, we need to have an algorithm. Unless we have an algorithm, we have no idea as to what kind of structure is going to be useful, and at this point we don't have an algorithm to build such a tree. Again, like I said, first you've got to have an algorithm. Maybe what you're thinking of in your mind right now is some kind of a greedy algorithm, which says you're going to look at the probabilities, find the high probabilities, and try to move those up to the top of the tree. Then you need to sit down and formulate it into an algorithm and prove that it actually works, and once you have done that, then it makes sense to look at what kind of data structures you would use to implement that algorithm. So whenever you're trying to solve a problem, long before you start working on data structures, you work on the algorithm: what is the strategy that's going to work, establish that it actually works, and then go to the next step, which is to decide on the data structures. And where we are with this problem right now is the first step, coming up with an algorithm that will get us the best tree.

All right, so the strategy that I'm looking at (and I haven't proved that I can make it work; I would have to prove it) is to build my best tree as the result of stages, where in each stage I build a tree that is best for one more item than I had at the previous stage. The starting point is going to be trees with zero elements in them, the best trees with no element, and that's the case i equals j in terms of our T(i,j) notation, because by definition T(i,j) includes the p's p_(i+1) through p_j and the q's q_i through q_j. When i equals j, the first index of the p range is bigger than the last index, which means there are no p's, but for the q range the first index equals the last, so there is one q. So T(i,i) represents a tree with no internal node but one external node.
So I'm going to start with these optimal trees with no internal node but one external node, and then from there go to optimal trees with one internal node and two external nodes, two internal nodes and three external nodes, and eventually n internal nodes and n + 1 external nodes. T(i,i) includes only q_i, and it looks like just a single external node; there's an external node in it, and that's it. The cost of such a tree is zero: no comparisons are made when you search an empty tree, so its cost is zero. It has no root (the external node is kind of a fictitious thing), but it does have a weight, because the weight is the sum of the p's and the q's, and in this case there's only a q, so its weight is q_i. So as far as optimal trees with no pairs are concerned, we can compute the c's, the r's, and the w's quite easily: c(i,i) = 0, r(i,i) is undefined since there is no root, and w(i,i) = q_i. Any questions about these numbers for the c's, the r's, and the w's?

All right, so next let's look at the case where you have at least one internal node. So I've got at least one p in there, and if you have at least one p, we know that there is going to be a root, and at the root you'll have one of the keys. The keys in the tree are numbered from i+1 through j, so one of the fellows a_(i+1) through a_j is going to be sitting at the root. I don't really know which one; for the time being let me just say it's a_k. So a_k is sitting at the root; I don't know the value of k, I just know that it lies in this range. Later I'll worry about how to figure out what the value of k is, but for the time being let's just suppose that what's at the root is a_k.

So your optimal tree T(i,j) looks like this: a_k is at the root, it's got a left subtree L, and it's got a right subtree R. Now since this is a binary search tree, I know that the left subtree is going to have all of the fellows smaller than a_k and the right subtree is going to have everybody bigger than a_k. So in terms of the p's: the whole tree together has all of these p's, and p_k is out at the root, so p_(i+1) through p_(k-1) have to be in the left subtree, and along with that you get the corresponding q's, the q's from q_i up to the q that corresponds to values smaller than a_k, which is q_(k-1). So the left subtree contains the p's p_(i+1) through p_(k-1) and the q's q_i through q_(k-1), and the right subtree will have the rest of them except for p_k. So the right subtree will have p_(k+1) up through p_j, and then it's going to have the corresponding q's; the first failure node in there is for keys that are bigger than a_k and less than a_(k+1), so that's f_k, and you've got q_k through q_j. All right, so once you know what's at the root (and of course we don't, we've only postulated that it's a_k, and I don't know what k is), I can draw these two inferences: L contains this collection of p's and q's, and R contains that collection of p's and q's. Any questions about this conclusion?

Next I'm going to define something called cost(L), the cost of the left subtree. This is the weighted path length of the left subtree viewed in isolation: I pull it out, forget about the rest of the tree, its parent and everything else, and I compute my formula for cost where the root of L is at level one, working down from there.
All right, so let's look at the contribution of L to the cost of this whole tree T(i,j), which is c(i,j). When you view L in isolation, its root is at level one, but when you put it back in under a_k, its root becomes level two with respect to the whole tree, so the level numbers of all of the nodes in L are one bigger there than they are in isolation. cost(L) is defined with respect to L in isolation, so if you know what cost(L) is and you want to figure out the contribution of L to c(i,j), there's a little mismatch, because cost(L) uses level numbers that are one less than what you need to use out there. But you can remedy that: if all the level numbers go up by one, then the sum goes up by the sum of the p's and the q's in L, and that's the weight of L. So the contribution of L, when it's viewed as a subtree hanging off a_k, is cost(L) when viewed in isolation plus the weight of L, and the weight of L, the sum of the p's and q's in L, is w(i,k-1). So the contribution of L to c(i,j) is cost(L) + w(i,k-1), and similarly the contribution of R to c(i,j) is cost(R), viewed in isolation, plus the weight of R, which is w(k,j).

Now the cost of T(i,j), which is c(i,j), is the contribution of L plus the contribution of R plus the contribution of the root, and the contribution of the root is just p_k, since the root is at level one. If you look at the three terms, weight of L plus weight of R plus p_k, that's really the weight of the whole tree, the sum of all the p's and q's in the whole tree. So I can take those three terms out and replace them with w(i,j), and I get

c(i,j) = cost(L) + cost(R) + w(i,j).

Now I don't really know what cost(L) is, because I don't know the shape of L; even if I knew k, I wouldn't know the shape of L. What I'm going to assert is that cost(L) is in fact the cost of the best tree T(i,k-1), the best tree for the p's and q's in L. Suppose this L is not the best tree for the p's and q's in it. Then you can take this L, throw it out, and bring in a better tree, somebody with smaller cost. If you bring in somebody with smaller cost, this term goes down and the other terms don't change, so I've got a better tree T(i,j). But T(i,j) is supposed to be optimal, so there should be no way to reduce its cost, and therefore L has to be optimal for its p's and q's. If it's optimal for its p's and q's, then cost(L) must be c(i,k-1), since by definition T(i,k-1) is the best tree for those p's and q's. The same reasoning allows us to conclude that cost(R) has to be c(k,j), because R has to be the best tree for that set of p's and q's; otherwise T(i,j) cannot be the best for the entire set of p's and q's. So both L and R have to be best for the p's and q's they have, and I can replace cost(L) and cost(R) with c(i,k-1) and c(k,j).
So I've gotten rid of the quantities I didn't know how to handle, the cost(L) and cost(R) terms, and I've got things back in terms of things I was planning to compute, the c's and the w's (the r's don't show up here yet). But I can't use this equation in its present form, because I don't know what k is. What I do know is that k lies in a certain range, and that range is known to me: k has to be between i+1 and j. Since T(i,j) is supposed to be the best tree for its given set of p's and q's, what you can conclude is that if you tried out all possible choices of k and you computed the right side for all of them, then when you have the correct k, that right side should be smallest. So since there are many possibilities for k, we just try out all of them, and from amongst those possibilities pick the one that gives the smallest value of the right side; that has to define the left side. And that's exactly what we're going to do. The earlier equation was valid under the assumption that you knew what k was, but since I don't know what k is, I say: try out all of them. The left side has to be equal to the right side for some valid value of k, and it's got to be the one for which the right side becomes minimum, so we get the equation with the min over k. This version differs a little bit from the version on the previous slide: I've taken the w(i,j) and moved it outside the min, because w(i,j) doesn't change as you change the value of k, so there's no point keeping it inside the min. So you go through all choices for k, compute the right side, and find the one that gives you the min.

Now if you look at this equation, on the left side I'm looking at the cost of a tree that has some number of pairs, and that's defined in terms of the costs of the best trees for some other numbers of pairs on the right side. In order for us to be able to use this equation, the number of pairs involved in the trees on the right should be smaller than what's involved on the left; then I can start with trees that have zero pairs and go to trees with one, two, three pairs, and so on. We can verify that's the case: c(i,k-1) represents the best tree for L, and the tree for L cannot include p_k, so at worst that tree has one less p than the whole tree, because a_k is out of the picture. It may have far fewer p's than one less than the whole thing (you could have four keys in L, eight in R, and one at the root, or one in L and twelve in R and nothing left over), but you're guaranteed that a_k is out, and so the left subtree has at least one less p than the whole tree. The same is true on the right side. So I've got a well-defined recurrence: the trees on the left side involve more elements than the trees on the right side, and so if you compute by increasing size of tree, in terms of the number of elements or pairs, you will always have computed the right-side terms by the time you're trying to compute a left-side term.
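(Written out, the recurrence being described, with the base case from before and the weight computed incrementally from its definition as the sum of the p's and q's in T(i,j), is:)

$$c(i,i) = 0, \qquad w(i,i) = q_i, \qquad w(i,j) = w(i,j-1) + p_j + q_j,$$

$$c(i,j) = w(i,j) + \min_{i < k \le j}\bigl\{\, c(i,k-1) + c(k,j) \,\bigr\} \quad \text{for } j > i.$$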
Now, we'll also keep track of an r. From the recurrence we don't know the root; the recurrence tries out all possible roots, and the fellow that minimizes the right side is the root. So r(i,j) is the k that minimizes the stuff on the right side, and that tells me what the root is; that's how I get the roots.

What I really want to compute is c(0,n) and r(0,n): c(0,n) is the cost of the best tree for all of the n pairs, and r(0,n) is the root of that tree. The strategy is to start with the trees that have zero pairs in them; then I'll use the equation that I just developed to compute the c(i,i+1)'s, the trees that have one pair in them, and the corresponding roots, so all trees comprised of one pair. Then I'll get the trees comprised of two pairs, the c(i,i+2)'s and their roots, then we'll go to trees with three pairs in them, the c(i,i+3)'s and their roots, and you keep doing this until you get c(0,n). So we start with i equals j, then j is one more than i, then j is two more than i, then three more, then four more, and in the last cycle j is n more than i.

We'll make use of an array, and actually only the upper triangle of the array will get used, because j is always at least i. I'll start by setting the diagonal terms, the c(i,i)'s and the r(i,i)'s; that's the initial configuration. Then we'll go and get the (i, i+1) terms, those are on the next diagonal, then the (i, i+2) terms on the diagonal after that, then the (i, i+3) terms, and then the (i, i+4) terms; that would be for a case of four keys. So that's the sequence in which we'll compute the c's and the r's.

As for the total time: to compute one c you have to take the min over a bunch of candidates, and assuming the right side is already known, there are at most n candidates involved, so it can take you at most n units of time to compute one c. The number of c's to compute is about n squared, really n squared over 2, since you only need the upper triangle, so it takes you n cubed time to compute all of the c's and the r's. You can cut the time down to n squared by restricting the search for k to a smaller range, r(i, j-1) <= k <= r(i+1, j). The proof that this range is adequate is a bit long and we're not going to go through it; you're not expected to be able to prove it, but you should be able to see that if you use this range the complexity will drop to n squared: if you just add up the terms, you'll find that all the r's essentially cancel out.
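(A minimal Python sketch of this table-filling procedure; it is my own, with a hypothetical name optimal_bst, not the lecture's code. It fills the upper triangle diagonal by diagonal, exactly in the order just described, and records the minimizing k as the root.)

```python
def optimal_bst(p, q):
    """p[1..n]: success probabilities (p[0] unused); q[0..n]: failure probabilities.
    Returns the tables c, w, r; c[0][n] is the cost of the best tree for all
    n pairs and r[0][n] is its root."""
    n = len(p) - 1
    c = [[0.0] * (n + 1) for _ in range(n + 1)]
    w = [[0.0] * (n + 1) for _ in range(n + 1)]
    r = [[0]   * (n + 1) for _ in range(n + 1)]

    for i in range(n + 1):                   # trees with zero pairs: T(i,i)
        w[i][i] = q[i]                       # weight is just q_i
        c[i][i] = 0.0                        # searching an empty tree is free
        r[i][i] = 0                          # no root (0 used as a sentinel)

    for d in range(1, n + 1):                # d = j - i: trees with d pairs
        for i in range(n - d + 1):
            j = i + d
            w[i][j] = w[i][j - 1] + p[j] + q[j]   # sum of the p's and q's in T(i,j)
            # Try every candidate root k in i+1..j and keep the best one.
            # (The O(n^2) refinement restricts k to r[i][j-1] .. r[i+1][j].)
            best_k = min(range(i + 1, j + 1),
                         key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = w[i][j] + c[i][best_k - 1] + c[best_k][j]
            r[i][j] = best_k
    return c, w, r

# The three-key example from the start of the lecture (all q's zero):
c, w, r = optimal_bst([0.0, 0.8, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0])
print(round(c[0][3], 3), r[0][3])   # 1.3 1, i.e. cost 1.3 with the key a at the root
```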
Once you have computed the r's, we can build the tree, and at this point we don't even need to know the c's to build the tree. We know that r(0,n) is the root of the overall tree. Suppose r(0,n) is 10; then we know the tree looks like this: a_10 is at the root, the left subtree is T(0,9), and the right subtree is T(10,n). Well, the root of the left subtree is going to be r(0,9), the root of the right subtree is going to be r(10,n), and then you know what their left and right subtrees and their roots are, and so on. This process is best described recursively: a_(r(0,n)) is the root, and once you know the root, you recursively build T(0,9) and then recursively build T(10,n). It takes one unit of time to identify each root, and there is a total of n roots, so in order n time you can build the tree once you have computed the r's.

All right, so you can build the best search tree to use for a static dictionary in n cubed time (the development we went through only shows n cubed), but you can do it in n squared time if you use the reduced range that I put up. Any questions? Okay, we'll stop here.
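(A short follow-up sketch of the recursive construction just described, building on the hypothetical optimal_bst sketch above; again my own illustration, not the lecture's code.)

```python
def build_tree(r, i, j):
    """Lay out T(i,j) from the root table r: r[i][j] = k means a_k is the root,
    with T(i, k-1) as its left subtree and T(k, j) as its right subtree.
    Returns nested (key index, left, right) tuples; None is a failure node."""
    if i == j:                    # zero pairs: just a failure node
        return None
    k = r[i][j]
    return (k, build_tree(r, i, k - 1), build_tree(r, k, j))

# Continuing the example above: the optimal tree for the keys a, b, c
# comes out right-skewed with a at the root, exactly as in the example.
print(build_tree(r, 0, 3))        # (1, None, (2, None, (3, None, None)))
```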