Transcript 00:09 [Music] 00:12 okay you have any questions on past 00:15 material 00:20 okay right today we're going to take a 00:24 look at higher-order tries and this will 00:27 be our last lecture on the Tri data 00:30 structure so far we have looked only at 00:35 binary tries and as you might expect the 00:38 height of binary tries could be large 00:41 because you only doing 2-way branching 00:43 and when the height is large that in 00:47 turn means that when you do a search you 00:49 have the potential for a large number of 00:51 cache misses similarly when you do an 00:53 insert or delete by going to a higher 00:56 order try perhaps by performing let's 00:59 say 10 Way branching or 100 way 01:01 branching or thousand way branching we 01:03 can reduce the height and thereby also 01:06 reduce the number of cache misses that 01:09 take place and so improve performance 01:14 all right so let's take a look at an 01:19 example let's suppose we're building a 01:22 dictionary structure for an environment 01:27 where the keys are Social Security 01:28 numbers okay so we have a large number 01:30 of customers and they're being keyed in 01:32 by their social security number the 01:34 social security number we know is a nine 01:38 digit number so for example this could 01:42 be a potential social security number 01:44 for someone nine decimal digits now if 01:49 we did ten way branching which means 01:52 each branch node has a degree that is 01:54 ten then the branch nodes of our try 01:59 would look something like this if the 02:02 digit you looking at is a zero you go 02:04 into that subtree if it's a three you're 02:06 going to this subtype it's a mine you go 02:08 into that subtree okay so in ten way 02:11 branching we would examine the digits in 02:13 the key left-to-right using a base ten 02:16 decomposition of the key we could 02:21 certainly set up a binary try with the 02:22 same key then we'd do a two-way 02:25 branching using a base two decomposition 02:28 of the key so using a base ten 02:31 decomposition 02:32 at the root level weed branch on the 02:34 first decimal digit in this case of four 02:37 then at the next level we look at the 02:39 next decimal digit and so on okay so our 02:42 branch nodes would have ten child fields 02:46 in them so depending upon whether the 02:48 digit you're looking at is a zero or or 02:50 mine any one of these numbers you would 02:52 go into the appropriate subtree when you 02:58 reach a situation where a subtree has 03:02 only one key in it then you would place 03:05 an element node over there okay so we 03:10 could do ten way branching that would 03:12 give us an order ten try in this case 03:16 the height of our try could be as much 03:21 as 10 so we could be doing let's see so 03:29 if we have a 10 way try the height could 03:31 be as much as 10 which means you can 03:34 have nine levels of branch nodes for 03:38 each of the nine digits in our social 03:41 security key and then you have one level 03:44 of element nodes so if you're performing 03:47 a search we may end up looking at all 03:50 nine of the decimal digits before we end 03:53 up at an element node and so you could 03:57 move down ten levels of nodes which in 03:59 terms which in turn would mean that you 04:03 would have at most ten cache misses 04:05 during a search right so with 10 Way 04:10 branching we could have up to ten cache 04:13 misses you know to get a feel for why 04:21 does 10 and not Big Oak is this going to 04:24 be important when you look at higher 04:25 order trice let's go back at this 04:27 picture here okay all right so if you 04:30 look at this picture here if you think 04:33 about the node getting very large then 04:35 certainly you can't read the whole node 04:37 in with one cache miss it's not going to 04:39 fit into a cache line okay so if you 04:41 have to read the whole node in that 04:44 potentially could 04:46 involve many cache misses okay now the 04:49 nice thing about tries is that you don't 04:52 need the whole node to be read in okay 04:56 so for example if I know that my digit 04:59 is 4 then all I need is this pointer 05:03 okay and if this thing is set up right 05:07 for example now let's think of this as 05:08 an array okay then I don't need to get 05:13 position 0 through 9 of this array I 05:15 only need to get position 4 of the array 05:18 and I can get position 4 of the array 05:20 with a single cache mess regardless of 05:23 how large this array is okay so if the 05:28 note is set up as an array it doesn't 05:31 really matter how big the node is at 05:33 every level we only need a small part of 05:36 the node and we know which part and we 05:38 can get that part but having a single 05:41 cache miss okay 05:49 you know if we went to a hundredweight 05:51 Rai show the nodes are going to get 05:53 bigger each node will branch node will 05:55 have 100 pointers in it okay but with 05:59 the same mind digit P okay at the root 06:02 level we would branch using the first 06:05 two digits okay at the next level we 06:09 look at the next two digits then at 06:11 level three the next two at level four 06:13 those two and at level five we would 06:16 have only one digit to look at okay so 06:21 in the case of a hundredweight rai our 06:25 height would be at most six five levels 06:27 of branch nodes the first four would be 06:30 hundred way and the last one could only 06:32 be a ten way and then you would have an 06:36 element node also in the picture so you 06:40 could have up to five levels of branch 06:42 nodes on any path and then one element 06:45 node so you would do five up to five 06:51 branches on your way down followed by a 06:54 single compare at the end where you 06:56 would compare this mind digit key with 06:58 whatever sitting in the element node you 07:00 reach if you compare this with using say 07:08 some of the balanced structures we've 07:10 seen if you look at a red-black tree 07:14 well the height of a red-black tree if 07:17 you take this thing to an extreme when 07:19 you assume you've got all potential 07:23 billion social security numbers in use 07:29 okay so you've got ten to the nine 07:30 different numbers okay so if you assume 07:32 that all 10 to the nine are in use then 07:34 your height could be as much as two 07:36 times log of n which means you could 07:42 have 60 levels of nodes and what that 07:49 means is that to find what you're 07:51 looking for you could compare you could 07:53 perform up to 60 comparisons of nine 07:56 digit numbers using a try there is only 08:00 one comparison so it doesn't really 08:02 matter how big 08:03 the key is because you're only going to 08:04 do that compare once you do a lot of 08:06 digital but you can extract digits in 08:12 constant time doesn't really matter how 08:14 big the thing is that you're extracting 08:15 your digits from especially if you 08:18 choose your radix is right okay and so 08:24 here we're looking at doing maybe up to 08:27 60 compares of large numbers whereas if 08:32 you had a hundredweight try we would do 08:35 at most 5 branches followed by 1 compare 08:37 ok so far fewer cache misses and far 08:42 fewer compares if you used an AVL tree 08:48 then the height could be as much as 40 08:51 and that means up to 40 compares but it 08:59 also means up to 40 cache misses if you 09:05 use the best binary tree so that would 09:08 be a full binary tree or a complete 09:11 binary tree you're at about 30 levels 09:16 now certainly you can bring this number 09:18 down you can go to red black trees and 09:20 things but when you're talking about I'm 09:23 certain that red black but B trees okay 09:24 but when you're talking about B trees as 09:27 you get to larger nodes the number of 09:31 cache misses per node goes up because 09:33 you do have to read that whole node in 09:35 to figure out which sub tree to go into 09:37 okay unlike a try where you only need to 09:41 read in the particular pointer that you 09:43 need to move down in in the case of B 09:49 trees because you have to perform a 09:51 binary search inside the node to 09:53 determine which sub tree to move you 09:56 potentially need that whole node yeah 10:01 all right so so high degree tries have a 10:07 lot of potential for dictionary 10:11 applications it's a they represent a 10:15 very simple structure 10:16 there's really no active balancing 10:20 technique that's employed there and yet 10:23 they provide very good performance in 10:28 the right kind of application now in the 10:31 same social security application if we 10:33 went through and said well I don't 10:35 really have close to that billion but I 10:37 have only five hundred keys well even 10:41 with five hundred keys our try could 10:43 have the maximum possible height whereas 10:49 in the case of an AVL we're looking at 10:51 log of 500 base two so those structures 10:57 if you get small enough they will do a 10:59 lot better than I try would do you got 11:04 to have reasonable population of the 11:06 space now let's take a look at a 11:10 compressed social security try the note 11:17 structure that will employ assuming that 11:20 you have ten way branching a branch node 11:23 will have ten child pointers one for 11:26 each of the possible values for the 11:28 digit you're branching on you'll have a 11:31 character number that tells you which 11:34 character or digit of the key you are 11:36 branching on okay so this is equivalent 11:40 to the bit number field that was being 11:41 used in a compressed binary tree then we 11:47 have this field here that we didn't have 11:49 before which tells us the number of non 11:56 null pointers in the node okay this is 12:00 going to be useful when we perform a 12:02 deletion because a property of a 12:04 compressed try with its binary or 12:07 otherwise is that each branch node must 12:11 have at least two non null pointers in 12:13 it okay so using this field here we can 12:18 keep track of how many non null pointers 12:21 there are and every time you remove one 12:22 we can decrease this count by one every 12:25 time you add in one you can increase 12:26 this count by one and when the count 12:28 drops below two 12:29 it's time to get rid of the branch node 12:33 okay so the structure is somewhat 12:37 similar to what's used in compressed 12:38 binary tries except that here we keep in 12:41 track of the number of pointers to avoid 12:43 having to count how many non null 12:45 pointers there are every time you do a 12:47 delete now let's take a look at the 12:52 insert and delete operations as we'll 12:56 see these are just a natural extension 12:58 of the ones discussed earlier for 13:00 compressed binary tries I suppose we 13:06 start with this particular key and we're 13:11 going to build a 10-way compressed try 13:17 so we start with an empty try we're 13:20 going to insert this key in it we're 13:22 going to end up with one element node 13:25 again we have the same relationship here 13:30 with respect to well I guess that's 13:34 really not the same but if you have know 13:40 if you have only one element then there 13:43 will be no branch node the requirement 13:46 is that every branch node must be the 13:49 root of a try that has at least two 13:52 elements in it of course a branch node 13:56 could be the root of a try that has many 13:57 elements in it okay so for example if 14:01 you had ten elements it's possible that 14:05 you have only one branch node because 14:08 the branch node has ten pointers and 14:10 each of these ten pointers could go to 14:11 an element node all right so here we're 14:16 starting with an empty try you get one 14:18 element and we just create a try that 14:23 has a single element node in it which 14:24 contains the key and of course 14:27 associated with the key they will 14:29 generally be other values too but here 14:32 we're going to show only the key all 14:36 right now let's suppose we want to 14:38 insert this well then you're first going 14:40 to search there try to see if this is 14:42 already there 14:43 so when you perform the search in this 14:45 case we end up at this element node when 14:50 you end up at an element node you 14:51 compare the two to make sure they're 14:53 different since they're different you 14:55 sample these two from left to right 14:57 finding the first digit where they 14:59 differ so in this case the first digit 15:01 is number three that's where this one's 15:04 got a two that one's got a five so I'm 15:06 going to introduce a branch node where 15:09 the branching is done on character or 15:12 digit three if it's a two you go down 15:15 here if it's a five you go down there 15:17 all right the picture doesn't show the 15:22 null pointers we actually have eight 15:24 null pointers in this node right next 15:32 let's suppose we're going to insert this 15:34 particular key okay again you perform a 15:38 search you need to first look at digit 15:40 three so you extract digit three which 15:42 is a five so you branch on that you end 15:45 up at an element node you compare the 15:47 two they're different you find the first 15:50 digit where they differ so you scan left 15:53 to right so one one one five zero one 15:57 five two three agreed that this were out 15:59 here 16:00 this is digit number six okay all right 16:08 so having found out where they differ 16:09 you then need to know where to place the 16:11 appropriate branch node as was the case 16:13 for binary tries the digit numbers has 16:17 to increase on the way down okay so 16:19 you're going to place the new branch 16:20 node on the search path in such a manner 16:24 that these digit numbers are increasing 16:27 so you're going to have to place it over 16:28 here okay so I break that link put in a 16:33 branch node on digit 6 and if digit 6 is 16:37 a 1 you end up here if it's a 4 you end 16:39 up down there 16:45 all right the next insert it's going to 16:47 be for a fellow with this key okay you 16:50 perform a search you first look at digit 16:52 3 which is a 9 ok so you're going to 16:59 branch on 9 9 is a null ok so you find 17:03 nothing when you end up at a null 17:09 position from wherever you fall off you 17:14 back up to the appropriate branch nerve 17:16 which is the branch node I fell off off 17:18 and you follow any path downwards trying 17:21 to find an element ok it doesn't matter 17:23 which element you find ok because all of 17:26 those elements agree on the first so 17:28 many bits ok so from here we find a path 17:33 going downwards and let's suppose you 17:40 end up over here okay well then you 17:44 sample these two ok it doesn't matter 17:46 which one you get to we know that when 17:49 you sample the first place they differ 17:51 is going to be smaller than the digit 17:56 number where you fell off because you 17:58 had to know out there ok all right so we 18:01 sample and in this case you find that 18:03 this differs from that at position 2 ok 18:07 this is a 7 that's a 1 ok and that a 18:10 little bit same would be true if you'd 18:11 come here or that ok so you're going to 18:14 have to branch on character or digit 2 18:16 and that's going to happen before you 18:19 reach this node here so you have to put 18:20 up a node over there ok so you do a 18:24 digit 2 branching 18:37 alright next let's suppose we get this 18:39 fellow okay so you perform a search you 18:42 first look at digit 2 which is a 1 so 18:44 you end up here then you look at digit 3 18:47 which is a 2 so you end up here okay you 18:50 compare the two they're different find 18:52 the first place where they differ so 18:54 they agree on 0 1 2 3 4 5 6 and they 18:57 differ out here and that's number 8 ok 19:03 digit 8 so we're going to put a digit 8 19:07 branch on this path and it has to be 19:10 done so that they're in ascending order 19:12 so you'll need to put it here by 19:16 breaking this link so we get a digit 8 19:19 branch here well we do yet another one 19:25 so we insert this fellow do a search you 19:29 first look at digit 2 which is a 1 so 19:31 you come here you look at digit 3 which 19:33 is a 1 so you fall off the one pointers 19:40 are null so at this point you find 19:45 anybody in this search tree here ok so 19:51 all of them agree on the first two 19:54 digits which is 0 and 1 so let's say you 19:57 find somebody in this subtree maybe you 20:00 find this guy here okay you compare and 20:04 the place where you differ is digit 3 so 20:09 you got a 2 there and here you got a 1 20:14 okay so I'm going to be putting in a 1 20:17 branch over there 20:26 whenever you fall off at a null we know 20:29 that you differ at least on the digit 20:33 represented here okay and possibly on 20:37 something earlier 20:38 so you so the first place where you 20:39 differ cannot be something bigger than 20:41 where you fell off it either has to be 20:43 the digit number where you fell off or 20:46 something 20:46 Olivia okay okay so the insert is it's 20:56 pretty much the way you would do it for 20:58 a compressed binary try of course when 21:04 you do the insert you have one added 21:07 operation you have to keep track of the 21:09 number of pointers so whenever you 21:11 change a null pointer to not know you 21:13 need to bump up the number of pointer 21:15 counter also get it delete okay again 21:22 delete is similar to how it would work 21:25 for a compressed binary try you find 21:28 what you're looking for in an element 21:30 node you throw away the element node so 21:34 for example if you want to delete this 21:36 fellow you perform a search and you find 21:39 it here okay so we're going to throw 21:42 this node away which means we're going 21:44 to lose this pointer okay and the case 21:48 of a compressed binary try when you lost 21:50 a pointer you know that the parent 21:52 became deficient it has only one non 21:55 null pointer okay that's not necessarily 21:57 true here in fact here we still have two 22:00 non null pointers okay so to distinguish 22:04 we have that pointer field which others 22:07 how many non null pointers there are 22:08 previously that field had the number 22:10 three in it now we drop it down to two 22:13 which means this node survives okay so 22:16 in this case we just throw this node 22:19 away and change that pointer to no and 22:22 decrement the count 22:28 all right if you're deleting this fellow 22:30 you look for it and you find it here 22:34 okay so we get rid of that element node 22:37 a non null pointer changes to now the 22:42 count for non null pointers drops from 22:45 two to one which tells us that this node 22:50 should also be dispensed with 22:52 okay so we're going to get rid of this 22:54 node and just change the pointer from 22:56 the parent to come here you don't have 23:01 to go any further than the grandparent 23:04 the grandparents count doesn't change 23:16 right if you're removing this fellow we 23:18 perform a search and you find it here to 23:25 get rid of that fellow you look at the 23:28 count and apparent the count goes down 23:31 by one if it becomes less than two then 23:35 the parent has to disappear okay but the 23:40 grandparent count isn't going to change 23:43 it's going to be whatever taste we had 23:44 to change the pointer in the grandparent 23:46 to come down here and for that you 23:49 actually have to if you're thinking of 23:51 this as an array you're going to have to 23:53 search through the pointers to find out 23:56 where that non null pointer is in the 23:59 node 24:08 all right you have any questions about 24:11 adapting the compressed binary insert 24:14 and delete through the case of 24:16 compressed higher-order trice okay let's 24:25 take a look at what we need to do if 24:27 you're going to be working with variable 24:29 length keys the discussion so far has 24:31 focused on fixed length keys okay so 24:36 suppose you had this particular try ten 24:40 way try and you're asked to insert 0 1 2 24:45 3 ok so we would start here we'd look at 24:50 digit 3 which is a 2 so you'll end up 24:53 over here we'll compare the two keys 24:56 we'll find that they're different and 24:58 then my rule says finally leftmost digit 25:01 where these two fellows differ well 25:03 there is no leftmost place where they 25:05 differ you run out they agree 0 1 2 3 25:09 agrees with that and then you got 25:11 nothing else to look at okay so you end 25:16 up with a problem because the key that 25:20 you're trying to insert here is a proper 25:22 prefix of a key that exists okay so 25:26 whenever you have a key that's a proper 25:28 prefix of another then there is no 25:31 leftmost place where the two differ and 25:35 to get around this problem what you do 25:38 is you append to each key a special 25:42 digit one that cannot normally arise 25:47 inside of a key so for example here we 25:50 say append this special character pound 25:53 sign 25:53 Amaka to the end of every key okay so 26:00 you can think of this as 0 1 2 3 marker 26:03 there's a marker or a pound sign sitting 26:06 here and one over here and one over that 26:08 okay once you do that no key can be a 26:12 proper prefix of another because a pound 26:15 sign cannot occur in the middle of a key 26:17 ok so that takes away the difficulty you 26:20 have when you a deal 26:22 with variable-length keys okay so with 26:28 that in mind then we applied the same 26:33 rule that we had earlier which is find 26:38 the first place where these two differ 26:39 that's now going to be in digit 1 2 3 4 26:43 5 because this one's has a pound sign 26:46 and this one has a 4 at that position so 26:51 I'm going to do a branching here on 26:53 digit 5 okay so do a branching on 26:58 digital 4 you get the old fellow if it's 27:01 a pound you get the new guys all right 27:05 so with this slight change to the keys 27:09 you can get the scheme to work for 27:11 variable length keys the scheme of 27:17 course comes at a cost each node has one 27:19 extra pointer field 27:32 right that several variations on the 27:36 basic try structure will run through 27:39 some of them in class and I'll let you 27:42 look at the readings on the website for 27:44 the remaining variations in one 27:47 variation we augment our branch nodes so 27:53 that each node has yet another field 27:54 this one we might call an element field 27:57 it points to one of the element nodes in 28:00 the subtree a' which it is root what 28:04 this enables us to do is it enables us 28:07 on the way down to find out what 28:09 characters or digits are being skipped 28:11 over which means that on the way down 28:18 you could check for equality with these 28:19 characters during a search and if you 28:23 find a mismatch you don't have to go any 28:25 further down the try alright so we add a 28:34 new field which points to any one of the 28:36 elements in that sub tree we use this 28:44 information on the way down during a 28:45 search to compare digits that you're 28:48 skipping over all right so for example 28:55 if this is our ten way try then I'm 28:59 going to augment each of my branch nodes 29:01 I haven't done that yet each branch node 29:03 is going to be augmented I'm going to 29:04 give it another fields called an element 29:07 fields and this can point to any one of 29:11 the element nodes in the subtree which 29:15 it is root okay so for example this 29:17 fellow could point here but it could 29:19 also have pointed there or here or here 29:23 this pointer would be used for example 29:27 to extract digits one and two okay so on 29:31 my way down I come here this says well 29:33 branch on three well before branching on 29:35 digit three I say well let me see if 29:37 I've got a match on digits one and two 29:38 okay 29:39 using this pointer I can access this 29:41 fellow and compare digits one and two if 29:44 they don't agree there's no 29:46 going further down I'm not going to 29:48 agree with anybody down here okay so you 29:51 can declare the search unsuccessful 29:52 right here at this level in this node I 29:59 could point to this fellow or to that 30:02 fellow okay again the idea is when you 30:06 come here before you branch on digit 30:08 five okay I can use this pointer to 30:11 check out digit 4 so here I've already 30:16 checked out digit digits 1 & 2 30:19 I've branch on digit 3 so I know there's 30:21 a match on digit 3 I'm sorry here 30:24 there's a match on digit 3 it's 30:26 everybody's got a 2 then before 30:28 branching on digit 5 I use this blue 30:33 pointer here to check out digit 4 make 30:35 sure there's a match okay and so it 30:38 doesn't matter where you're pointing 30:38 here or there everybody has the same 30:40 digit 4 okay so then once you verify 30:44 that's true then you actually do the 30:46 branching on digit 5 alright similarly 30:51 here we have a pointer to one of the two 30:53 yellow nodes in its subtract all right 30:58 so that's a variation that people employ 31:00 to terminate unsuccessful searches early 31:11 or other useful pieces of information if 31:15 you're dealing with random Keys the 31:17 expected height or when my try is log of 31:20 n base M so without any effort at 31:23 bouncing you expect to see a height 31:25 that's about logarithmic in practice 31:31 people might use one of two heuristics 31:34 to control the height one is to say that 31:38 branching will be done only H levels 31:41 deep after that all of the unresolved 31:45 keys are placed into a bucket and the 31:50 bucket may be searched thoroughly so 31:58 again the expectation here is that after 32:00 you've done in this case there's six 32:01 levels of branching the sets of 32:04 unresolved keys are small maybe you've 32:06 got only five to ten keys in each group 32:08 put them all into separate buckets and 32:10 then search those buckets eerily along 32:15 those lines you may implement your 32:20 heuristic slightly differently which 32:22 says you're going to keep branching 32:24 until the number of elements in a 32:26 subtree becomes no more than some 32:29 threshold s let's say ten so if a 32:32 subtree has only ten or fewer elements 32:34 I'm going to start branching I'm going 32:36 to stick a big element node there a 32:38 bucket which contains all of the 32:40 unresolved keys and will to search those 32:42 serially that's similar in spirit to the 32:46 previous heuristic so with this 32:52 heuristic in place the expected number 32:58 of branch nodes total number of branch 33:01 nodes is expected to be something like n 33:03 divided by s natural log of yeah 33:11 some other things people do is instead 33:14 of sampling the keys left to right 33:16 you may sample right to left or you may 33:18 sample in some random fashion using some 33:21 pseudo-random number generator 33:23 so for example if you have a dictionary 33:25 of people's last names and suppose that 33:35 you have a large number of last names 33:38 like oh it's not just last if they did 33:41 got people's name so you got the first 33:42 name in a last name with your king and 33:44 first on the last name and then on the 33:46 first name and you've concatenated them 33:48 okay well then you may have a large 33:50 number of people whose last name is 33:52 Johnson and so you may have to go 33:56 through as many characters as correspond 33:58 to Johnson before you you have any hope 34:00 of distinguishing so to avoid situations 34:03 like that where you expect long common 34:06 prefixes in your keys you may instead do 34:09 it from right to left or if you expect a 34:13 lot of commonality both in the front and 34:16 the back 34:16 you might try some middle to out scheme 34:19 or generally you might just try some 34:21 random scheme to sample again the idea 34:25 is to reduce the number of levels of 34:27 branching a variation on high order 34:36 tries in a high order try as we have 34:39 discussed it you determine the order say 34:41 ten or a hundred and then every branch 34:43 node has that order a variation to that 34:49 has been proposed for applications like 34:52 IP routers packet forwarding tables and 34:56 this variation is referred to as a 34:59 multi-bit try so in a multi-bit try the 35:03 degree of a branch node can change from 35:06 one node to the next so you assigned to 35:12 each branch node astride so each branch 35:15 node gets assigned something we call a 35:17 stride and that determines the number of 35:20 bits in the destination address of the 35:22 packet that's used that 35:24 for branching purposes it also 35:26 determines the number of pointers in the 35:28 node so if your stride is why not only 35:31 one bit is used you can have two 35:32 children if your stride is two two bits 35:35 from the destination address are used 35:37 you can have four children stride is 35:39 three use up three bits the next three 35:41 bits and you have eight children in the 35:43 node okay alright says proposed for 35:49 internet router applications this is an 35:52 inherently variable length key situation 35:55 the keys are the rules of filters in the 35:58 table okay and we're performing longest 36:02 prefix match so yeah when we first 36:11 looked at router tables we talked about 36:14 using binary tries and so there you 36:17 could have height going up to 32 with 36:22 destination addresses being up to 32 36:23 bits long so if you had a binary try and 36:29 you are performing searches you could 36:33 have up to I guess if your height was 32 36:35 you could have 232 cache misses then if 36:38 you go through the math you'd look at 32 36:40 cache misses and the time required to 36:42 service a cache miss there's really no 36:45 way you can classify a few hundred 36:48 million packets a second so to get 36:52 around that problem you have to go to 36:54 higher order tries so that you cut the 36:57 height down now when you go to higher 37:00 order tries and let's say you say well 37:03 instead of one bit I'm going to maybe 37:05 branch on 16 bits well then your node 37:10 size becomes 2 to the 16 because you've 37:12 got to do the 16 pointers in there and 37:13 the total amount of space you need for 37:16 the try becomes large so you have a 37:20 trade-off between the number of levels 37:23 and the total amount of space that your 37:25 try needs and to balance that trade-off 37:28 more effectively and one allows to have 37:32 different degree nodes instead of 37:35 requiring fixed degree nodes at every 37:37 level 37:38 so typically in a router table 37:41 application you might start with the 37:43 specification which says I can tolerate 37:46 up to let's say four cache misses per 37:49 packet that means your height is now 37:52 constrained it can only be four so 37:54 starting with that constraint you then 37:56 want to design a try which uses the 37:58 smallest amount of memory for the router 38:00 table that you have okay so you know all 38:04 of the keys that are going to be in 38:05 there 38:05 you know the target height you want to 38:07 minimize memory what you don't know is 38:09 what are the degrees of the different 38:11 nodes and the try okay so that's an 38:15 interesting optimization problem which 38:18 people have studied and there are nice 38:20 algorithms around to build optimal tries 38:24 under various constraints for allowable 38:27 degrees but subject to height limitation 38:32 okay so once the height limitation is 38:37 given you're kind of free to choose the 38:38 strides of the nodes if you're if you 38:43 choose the route stride to be 32 and 38:45 you're looking at IP 4 well then you 38:47 need only the route okay 38:52 they're only 32 bits and the address so 38:55 your height becomes 1 if you chose 39:01 strides of 16 at the root level 8 and 8 39:04 for the next two levels you're still 39:06 covered all 32 bits you could get away 39:08 with a three level try also an example 39:14 here at the root level I've got a stride 39:19 of one so it's got two children then at 39:22 the next level this node has a stride of 39:25 two so it's got four children and this 39:28 one has a stride of one so it's got two 39:31 children okay in this particular 39:34 organization I'm allowing you to keep 39:36 the router table filters inside the 39:40 branch nodes okay so a branch node not 39:43 only has a point pointer space it also 39:46 has data space at this node here I'm 39:50 going to keep 39:51 filters there are one bit long okay so 39:54 if you had a zero-star I would place it 39:56 here if you had a 1-star I would place 39:58 it here the null here says I don't have 40:01 a one star in my root table in my router 40:03 table okay here I'm branching using the 40:09 next two bits so that would be a bit too 40:10 and bit three of the destination address 40:13 in this node in addition to having these 40:16 four pointers I also have space for four 40:19 filters okay and these filters are 40:22 necessarily of length three 40:25 they all have to begin with zero because 40:27 I came from there and then the other two 40:30 bits are kind of free so you got four 40:31 possibilities for the other two bits 40:36 here at the stride is once you got two 40:39 pointers and you got two fields four 40:41 filters here you branch on bit one and 40:45 then here you're going to use the next 40:46 bit after that so that becomes bit two 40:49 in this node you're going to keep 40:51 filters whose length is two bits okay so 40:55 one bit filters here here you have two 40:57 bit filters but they have to begin with 40:59 a one here you have three bit filters 41:02 but they must begin with a zero you know 41:08 you'll notice that there's no place to 41:11 put a two bit filter that begins with a 41:13 zero okay so if you had a two bit filter 41:20 that begins with a zero we would expand 41:22 it to two three bit filters so for 41:25 example if you had a zero one it could 41:27 be represented as a zero 1 1 and 0 1 0 41:32 okay so you do something called filter 41:35 expansion because when you go into multi 41:38 bit tries you don't have nodes which can 41:41 accommodate every possible length that 41:43 might arise okay so you're going to 41:46 filter expansion as I said hey if you 41:49 had 0 1 we would create a 0 1 0 filter 41:51 which would occupy this part a 0 1 1 41:54 that we're trying to occupy this one but 41:57 if you already have 0 1 1 this fellow 41:59 will win over the expansion of 0 1 42:02 because we're doing longest prefix 42:04 matching the 42:05 longer prefix takes precedence over the 42:07 shorter one okay right in this 42:11 particular try 42:15 we have nodes of different strides 42:19 stride one a stride to node and another 42:22 stride one 42:23 of course this could have been a stride 42:24 three node okay so that leads to 42:27 something called a variable stride 42:29 multi-bit try as I said before a 42:33 challenging optimization problem is to 42:36 select the strides of the tries of the 42:38 try nodes so as to minimize the total 42:41 memory needed by the try the 42:45 optimization problem is somewhat simpler 42:48 if you make a restriction on these on 42:52 the allowable tries which says that 42:54 within a level all of the nodes must 42:57 have the same stride okay which is not 43:00 true for this example because level two 43:02 this fellas got a stride of two that 43:04 fellas got a stride to one but if you 43:07 make that restriction you end up with a 43:09 restricted category of multi but tries 43:11 which are referred to as fixed stride 43:14 tries so the stride is fixed but only on 43:17 a per level basis okay so that's a 43:21 slightly easier optimization problem but 43:24 people know how to find the best tries 43:26 that fit these requirements 43:34 all right so you need to choose the 43:37 stride we've gone through this 43:38 if the stride is as you have to to these 43:41 children you also have space for two to 43:43 the s prefixes there's a rule that tells 43:47 you what length prefixes get stored in a 43:49 node short prefixes they don't fit into 43:54 any of these allowable lengths get 43:55 expanded to longer length prefixes so 44:00 for example if the root stride is if the 44:02 stride of your root is 3 and you have 44:04 prefixes whose length is 3 you need to 44:06 expand them to length 3 this length to 44:12 prefix would get expanded to 0 0 0 and 0 44:15 0 1 if you already had 0 0 0 44:22 well then Q will get precedence over the 44:25 expansion of P because Q is a longer 44:27 length prefix than P all right so using 44:33 this these rules once you choose the 44:37 Strides doesn't matter what your 44:39 prefixes are you can always fit them 44:41 into your try 44:48 all right there are plenty of other 44:50 applications for tries and plenty of 44:52 other variations and tries and you 44:55 really need to take a look at the 44:56 readings that are there on the website 44:59 tries are used for searching for a 45:02 prefix okay so if you have a bunch of 45:05 keys and you're given a prefix you want 45:08 to find out what are the keys that match 45:10 a useful application of prefix search 45:17 would be automatic command complete 45:18 completion so if you're kind of sitting 45:21 in UNIX and you type the first three 45:22 letters of a command you press the tab 45:24 key it brings everything else in 45:25 provided there is no ambiguity it's 45:30 easier to see how you can do that with 45:31 tries you kind of follow down the try 45:32 once you reach an element node it 45:35 doesn't matter what the rest is that the 45:37 user types that's the only possibility 45:44 tries get used to implement lzw 45:47 compression a very popular compression 45:49 technique used in many commercial 45:53 software for example if you every time 45:57 you open up Adobe Acrobat Reader you'll 46:00 see a little message down there which 46:01 says implements lzw compression a fast 46:05 way to implement lzw compression is to 46:07 employ tries and another fast way to do 46:12 it of course is to use hash tables so 46:14 depending upon whose implementation 46:15 you're looking at you might see a try 46:18 implementation or a hash table 46:19 implementation of lzw compression right 46:26 there are plenty of alternative node 46:28 structures the pictures we've been 46:31 drawing and even some of the explicit 46:34 statements I've made all kind of point 46:36 toward representing nodes as arrays 46:42 especially if you want to have only one 46:46 cache miss per level but certainly there 46:49 are other implementations people employ 46:52 you may have a chain so you keep only 46:54 the non null pointers chain them 46:56 together 46:59 very useful if you're dealing with very 47:01 high degree tries instead of wasting a 47:04 lot of space with nulls you just keep 47:07 the non no information you may even have 47:13 a binary search tree of non now pointers 47:15 you may have a hash table of non null 47:21 pointers all right so there's a lot more 47:32 to try than we've covered here and the 47:34 reading on the web gives you a good idea 47:36 of some of the other things that people 47:38 are doing with tries some of the other 47:41 implementations as well okay all right 47:45 any questions okay we'll stop 47:57 you