Transcript


00:09
[Music]
00:12
okay you have any questions on past
00:15
material
00:20
okay right today we're going to take a
00:24
look at higher-order tries and this will
00:27
be our last lecture on the Tri data
00:30
structure so far we have looked only at
00:35
binary tries and as you might expect the
00:38
height of binary tries could be large
00:41
because you only doing 2-way branching
00:43
and when the height is large that in
00:47
turn means that when you do a search you
00:49
have the potential for a large number of
00:51
cache misses similarly when you do an
00:53
insert or delete by going to a higher
00:56
order try perhaps by performing let's
00:59
say 10 Way branching or 100 way
01:01
branching or thousand way branching we
01:03
can reduce the height and thereby also
01:06
reduce the number of cache misses that
01:09
take place and so improve performance
01:14
all right so let's take a look at an
01:19
example let's suppose we're building a
01:22
dictionary structure for an environment
01:27
where the keys are Social Security
01:28
numbers okay so we have a large number
01:30
of customers and they're being keyed in
01:32
by their social security number the
01:34
social security number we know is a nine
01:38
digit number so for example this could
01:42
be a potential social security number
01:44
for someone nine decimal digits now if
01:49
we did ten way branching which means
01:52
each branch node has a degree that is
01:54
ten then the branch nodes of our try
01:59
would look something like this if the
02:02
digit you looking at is a zero you go
02:04
into that subtree if it's a three you're
02:06
going to this subtype it's a mine you go
02:08
into that subtree okay so in ten way
02:11
branching we would examine the digits in
02:13
the key left-to-right using a base ten
02:16
decomposition of the key we could
02:21
certainly set up a binary try with the
02:22
same key then we'd do a two-way
02:25
branching using a base two decomposition
02:28
of the key so using a base ten
02:31
decomposition
02:32
at the root level weed branch on the
02:34
first decimal digit in this case of four
02:37
then at the next level we look at the
02:39
next decimal digit and so on okay so our
02:42
branch nodes would have ten child fields
02:46
in them so depending upon whether the
02:48
digit you're looking at is a zero or or
02:50
mine any one of these numbers you would
02:52
go into the appropriate subtree when you
02:58
reach a situation where a subtree has
03:02
only one key in it then you would place
03:05
an element node over there okay so we
03:10
could do ten way branching that would
03:12
give us an order ten try in this case
03:16
the height of our try could be as much
03:21
as 10 so we could be doing let's see so
03:29
if we have a 10 way try the height could
03:31
be as much as 10 which means you can
03:34
have nine levels of branch nodes for
03:38
each of the nine digits in our social
03:41
security key and then you have one level
03:44
of element nodes so if you're performing
03:47
a search we may end up looking at all
03:50
nine of the decimal digits before we end
03:53
up at an element node and so you could
03:57
move down ten levels of nodes which in
03:59
terms which in turn would mean that you
04:03
would have at most ten cache misses
04:05
during a search right so with 10 Way
04:10
branching we could have up to ten cache
04:13
misses you know to get a feel for why
04:21
does 10 and not Big Oak is this going to
04:24
be important when you look at higher
04:25
order trice let's go back at this
04:27
picture here okay all right so if you
04:30
look at this picture here if you think
04:33
about the node getting very large then
04:35
certainly you can't read the whole node
04:37
in with one cache miss it's not going to
04:39
fit into a cache line okay so if you
04:41
have to read the whole node in that
04:44
potentially could
04:46
involve many cache misses okay now the
04:49
nice thing about tries is that you don't
04:52
need the whole node to be read in okay
04:56
so for example if I know that my digit
04:59
is 4 then all I need is this pointer
05:03
okay and if this thing is set up right
05:07
for example now let's think of this as
05:08
an array okay then I don't need to get
05:13
position 0 through 9 of this array I
05:15
only need to get position 4 of the array
05:18
and I can get position 4 of the array
05:20
with a single cache mess regardless of
05:23
how large this array is okay so if the
05:28
note is set up as an array it doesn't
05:31
really matter how big the node is at
05:33
every level we only need a small part of
05:36
the node and we know which part and we
05:38
can get that part but having a single
05:41
cache miss okay
05:49
you know if we went to a hundredweight
05:51
Rai show the nodes are going to get
05:53
bigger each node will branch node will
05:55
have 100 pointers in it okay but with
05:59
the same mind digit P okay at the root
06:02
level we would branch using the first
06:05
two digits okay at the next level we
06:09
look at the next two digits then at
06:11
level three the next two at level four
06:13
those two and at level five we would
06:16
have only one digit to look at okay so
06:21
in the case of a hundredweight rai our
06:25
height would be at most six five levels
06:27
of branch nodes the first four would be
06:30
hundred way and the last one could only
06:32
be a ten way and then you would have an
06:36
element node also in the picture so you
06:40
could have up to five levels of branch
06:42
nodes on any path and then one element
06:45
node so you would do five up to five
06:51
branches on your way down followed by a
06:54
single compare at the end where you
06:56
would compare this mind digit key with
06:58
whatever sitting in the element node you
07:00
reach if you compare this with using say
07:08
some of the balanced structures we've
07:10
seen if you look at a red-black tree
07:14
well the height of a red-black tree if
07:17
you take this thing to an extreme when
07:19
you assume you've got all potential
07:23
billion social security numbers in use
07:29
okay so you've got ten to the nine
07:30
different numbers okay so if you assume
07:32
that all 10 to the nine are in use then
07:34
your height could be as much as two
07:36
times log of n which means you could
07:42
have 60 levels of nodes and what that
07:49
means is that to find what you're
07:51
looking for you could compare you could
07:53
perform up to 60 comparisons of nine
07:56
digit numbers using a try there is only
08:00
one comparison so it doesn't really
08:02
matter how big
08:03
the key is because you're only going to
08:04
do that compare once you do a lot of
08:06
digital but you can extract digits in
08:12
constant time doesn't really matter how
08:14
big the thing is that you're extracting
08:15
your digits from especially if you
08:18
choose your radix is right okay and so
08:24
here we're looking at doing maybe up to
08:27
60 compares of large numbers whereas if
08:32
you had a hundredweight try we would do
08:35
at most 5 branches followed by 1 compare
08:37
ok so far fewer cache misses and far
08:42
fewer compares if you used an AVL tree
08:48
then the height could be as much as 40
08:51
and that means up to 40 compares but it
08:59
also means up to 40 cache misses if you
09:05
use the best binary tree so that would
09:08
be a full binary tree or a complete
09:11
binary tree you're at about 30 levels
09:16
now certainly you can bring this number
09:18
down you can go to red black trees and
09:20
things but when you're talking about I'm
09:23
certain that red black but B trees okay
09:24
but when you're talking about B trees as
09:27
you get to larger nodes the number of
09:31
cache misses per node goes up because
09:33
you do have to read that whole node in
09:35
to figure out which sub tree to go into
09:37
okay unlike a try where you only need to
09:41
read in the particular pointer that you
09:43
need to move down in in the case of B
09:49
trees because you have to perform a
09:51
binary search inside the node to
09:53
determine which sub tree to move you
09:56
potentially need that whole node yeah
10:01
all right so so high degree tries have a
10:07
lot of potential for dictionary
10:11
applications it's a they represent a
10:15
very simple structure
10:16
there's really no active balancing
10:20
technique that's employed there and yet
10:23
they provide very good performance in
10:28
the right kind of application now in the
10:31
same social security application if we
10:33
went through and said well I don't
10:35
really have close to that billion but I
10:37
have only five hundred keys well even
10:41
with five hundred keys our try could
10:43
have the maximum possible height whereas
10:49
in the case of an AVL we're looking at
10:51
log of 500 base two so those structures
10:57
if you get small enough they will do a
10:59
lot better than I try would do you got
11:04
to have reasonable population of the
11:06
space now let's take a look at a
11:10
compressed social security try the note
11:17
structure that will employ assuming that
11:20
you have ten way branching a branch node
11:23
will have ten child pointers one for
11:26
each of the possible values for the
11:28
digit you're branching on you'll have a
11:31
character number that tells you which
11:34
character or digit of the key you are
11:36
branching on okay so this is equivalent
11:40
to the bit number field that was being
11:41
used in a compressed binary tree then we
11:47
have this field here that we didn't have
11:49
before which tells us the number of non
11:56
null pointers in the node okay this is
12:00
going to be useful when we perform a
12:02
deletion because a property of a
12:04
compressed try with its binary or
12:07
otherwise is that each branch node must
12:11
have at least two non null pointers in
12:13
it okay so using this field here we can
12:18
keep track of how many non null pointers
12:21
there are and every time you remove one
12:22
we can decrease this count by one every
12:25
time you add in one you can increase
12:26
this count by one and when the count
12:28
drops below two
12:29
it's time to get rid of the branch node
12:33
okay so the structure is somewhat
12:37
similar to what's used in compressed
12:38
binary tries except that here we keep in
12:41
track of the number of pointers to avoid
12:43
having to count how many non null
12:45
pointers there are every time you do a
12:47
delete now let's take a look at the
12:52
insert and delete operations as we'll
12:56
see these are just a natural extension
12:58
of the ones discussed earlier for
13:00
compressed binary tries I suppose we
13:06
start with this particular key and we're
13:11
going to build a 10-way compressed try
13:17
so we start with an empty try we're
13:20
going to insert this key in it we're
13:22
going to end up with one element node
13:25
again we have the same relationship here
13:30
with respect to well I guess that's
13:34
really not the same but if you have know
13:40
if you have only one element then there
13:43
will be no branch node the requirement
13:46
is that every branch node must be the
13:49
root of a try that has at least two
13:52
elements in it of course a branch node
13:56
could be the root of a try that has many
13:57
elements in it okay so for example if
14:01
you had ten elements it's possible that
14:05
you have only one branch node because
14:08
the branch node has ten pointers and
14:10
each of these ten pointers could go to
14:11
an element node all right so here we're
14:16
starting with an empty try you get one
14:18
element and we just create a try that
14:23
has a single element node in it which
14:24
contains the key and of course
14:27
associated with the key they will
14:29
generally be other values too but here
14:32
we're going to show only the key all
14:36
right now let's suppose we want to
14:38
insert this well then you're first going
14:40
to search there try to see if this is
14:42
already there
14:43
so when you perform the search in this
14:45
case we end up at this element node when
14:50
you end up at an element node you
14:51
compare the two to make sure they're
14:53
different since they're different you
14:55
sample these two from left to right
14:57
finding the first digit where they
14:59
differ so in this case the first digit
15:01
is number three that's where this one's
15:04
got a two that one's got a five so I'm
15:06
going to introduce a branch node where
15:09
the branching is done on character or
15:12
digit three if it's a two you go down
15:15
here if it's a five you go down there
15:17
all right the picture doesn't show the
15:22
null pointers we actually have eight
15:24
null pointers in this node right next
15:32
let's suppose we're going to insert this
15:34
particular key okay again you perform a
15:38
search you need to first look at digit
15:40
three so you extract digit three which
15:42
is a five so you branch on that you end
15:45
up at an element node you compare the
15:47
two they're different you find the first
15:50
digit where they differ so you scan left
15:53
to right so one one one five zero one
15:57
five two three agreed that this were out
15:59
here
16:00
this is digit number six okay all right
16:08
so having found out where they differ
16:09
you then need to know where to place the
16:11
appropriate branch node as was the case
16:13
for binary tries the digit numbers has
16:17
to increase on the way down okay so
16:19
you're going to place the new branch
16:20
node on the search path in such a manner
16:24
that these digit numbers are increasing
16:27
so you're going to have to place it over
16:28
here okay so I break that link put in a
16:33
branch node on digit 6 and if digit 6 is
16:37
a 1 you end up here if it's a 4 you end
16:39
up down there
16:45
all right the next insert it's going to
16:47
be for a fellow with this key okay you
16:50
perform a search you first look at digit
16:52
3 which is a 9 ok so you're going to
16:59
branch on 9 9 is a null ok so you find
17:03
nothing when you end up at a null
17:09
position from wherever you fall off you
17:14
back up to the appropriate branch nerve
17:16
which is the branch node I fell off off
17:18
and you follow any path downwards trying
17:21
to find an element ok it doesn't matter
17:23
which element you find ok because all of
17:26
those elements agree on the first so
17:28
many bits ok so from here we find a path
17:33
going downwards and let's suppose you
17:40
end up over here okay well then you
17:44
sample these two ok it doesn't matter
17:46
which one you get to we know that when
17:49
you sample the first place they differ
17:51
is going to be smaller than the digit
17:56
number where you fell off because you
17:58
had to know out there ok all right so we
18:01
sample and in this case you find that
18:03
this differs from that at position 2 ok
18:07
this is a 7 that's a 1 ok and that a
18:10
little bit same would be true if you'd
18:11
come here or that ok so you're going to
18:14
have to branch on character or digit 2
18:16
and that's going to happen before you
18:19
reach this node here so you have to put
18:20
up a node over there ok so you do a
18:24
digit 2 branching
18:37
alright next let's suppose we get this
18:39
fellow okay so you perform a search you
18:42
first look at digit 2 which is a 1 so
18:44
you end up here then you look at digit 3
18:47
which is a 2 so you end up here okay you
18:50
compare the two they're different find
18:52
the first place where they differ so
18:54
they agree on 0 1 2 3 4 5 6 and they
18:57
differ out here and that's number 8 ok
19:03
digit 8 so we're going to put a digit 8
19:07
branch on this path and it has to be
19:10
done so that they're in ascending order
19:12
so you'll need to put it here by
19:16
breaking this link so we get a digit 8
19:19
branch here well we do yet another one
19:25
so we insert this fellow do a search you
19:29
first look at digit 2 which is a 1 so
19:31
you come here you look at digit 3 which
19:33
is a 1 so you fall off the one pointers
19:40
are null so at this point you find
19:45
anybody in this search tree here ok so
19:51
all of them agree on the first two
19:54
digits which is 0 and 1 so let's say you
19:57
find somebody in this subtree maybe you
20:00
find this guy here okay you compare and
20:04
the place where you differ is digit 3 so
20:09
you got a 2 there and here you got a 1
20:14
okay so I'm going to be putting in a 1
20:17
branch over there
20:26
whenever you fall off at a null we know
20:29
that you differ at least on the digit
20:33
represented here okay and possibly on
20:37
something earlier
20:38
so you so the first place where you
20:39
differ cannot be something bigger than
20:41
where you fell off it either has to be
20:43
the digit number where you fell off or
20:46
something
20:46
Olivia okay okay so the insert is it's
20:56
pretty much the way you would do it for
20:58
a compressed binary try of course when
21:04
you do the insert you have one added
21:07
operation you have to keep track of the
21:09
number of pointers so whenever you
21:11
change a null pointer to not know you
21:13
need to bump up the number of pointer
21:15
counter also get it delete okay again
21:22
delete is similar to how it would work
21:25
for a compressed binary try you find
21:28
what you're looking for in an element
21:30
node you throw away the element node so
21:34
for example if you want to delete this
21:36
fellow you perform a search and you find
21:39
it here okay so we're going to throw
21:42
this node away which means we're going
21:44
to lose this pointer okay and the case
21:48
of a compressed binary try when you lost
21:50
a pointer you know that the parent
21:52
became deficient it has only one non
21:55
null pointer okay that's not necessarily
21:57
true here in fact here we still have two
22:00
non null pointers okay so to distinguish
22:04
we have that pointer field which others
22:07
how many non null pointers there are
22:08
previously that field had the number
22:10
three in it now we drop it down to two
22:13
which means this node survives okay so
22:16
in this case we just throw this node
22:19
away and change that pointer to no and
22:22
decrement the count
22:28
all right if you're deleting this fellow
22:30
you look for it and you find it here
22:34
okay so we get rid of that element node
22:37
a non null pointer changes to now the
22:42
count for non null pointers drops from
22:45
two to one which tells us that this node
22:50
should also be dispensed with
22:52
okay so we're going to get rid of this
22:54
node and just change the pointer from
22:56
the parent to come here you don't have
23:01
to go any further than the grandparent
23:04
the grandparents count doesn't change
23:16
right if you're removing this fellow we
23:18
perform a search and you find it here to
23:25
get rid of that fellow you look at the
23:28
count and apparent the count goes down
23:31
by one if it becomes less than two then
23:35
the parent has to disappear okay but the
23:40
grandparent count isn't going to change
23:43
it's going to be whatever taste we had
23:44
to change the pointer in the grandparent
23:46
to come down here and for that you
23:49
actually have to if you're thinking of
23:51
this as an array you're going to have to
23:53
search through the pointers to find out
23:56
where that non null pointer is in the
23:59
node
24:08
all right you have any questions about
24:11
adapting the compressed binary insert
24:14
and delete through the case of
24:16
compressed higher-order trice okay let's
24:25
take a look at what we need to do if
24:27
you're going to be working with variable
24:29
length keys the discussion so far has
24:31
focused on fixed length keys okay so
24:36
suppose you had this particular try ten
24:40
way try and you're asked to insert 0 1 2
24:45
3 ok so we would start here we'd look at
24:50
digit 3 which is a 2 so you'll end up
24:53
over here we'll compare the two keys
24:56
we'll find that they're different and
24:58
then my rule says finally leftmost digit
25:01
where these two fellows differ well
25:03
there is no leftmost place where they
25:05
differ you run out they agree 0 1 2 3
25:09
agrees with that and then you got
25:11
nothing else to look at okay so you end
25:16
up with a problem because the key that
25:20
you're trying to insert here is a proper
25:22
prefix of a key that exists okay so
25:26
whenever you have a key that's a proper
25:28
prefix of another then there is no
25:31
leftmost place where the two differ and
25:35
to get around this problem what you do
25:38
is you append to each key a special
25:42
digit one that cannot normally arise
25:47
inside of a key so for example here we
25:50
say append this special character pound
25:53
sign
25:53
Amaka to the end of every key okay so
26:00
you can think of this as 0 1 2 3 marker
26:03
there's a marker or a pound sign sitting
26:06
here and one over here and one over that
26:08
okay once you do that no key can be a
26:12
proper prefix of another because a pound
26:15
sign cannot occur in the middle of a key
26:17
ok so that takes away the difficulty you
26:20
have when you a deal
26:22
with variable-length keys okay so with
26:28
that in mind then we applied the same
26:33
rule that we had earlier which is find
26:38
the first place where these two differ
26:39
that's now going to be in digit 1 2 3 4
26:43
5 because this one's has a pound sign
26:46
and this one has a 4 at that position so
26:51
I'm going to do a branching here on
26:53
digit 5 okay so do a branching on
26:58
digital 4 you get the old fellow if it's
27:01
a pound you get the new guys all right
27:05
so with this slight change to the keys
27:09
you can get the scheme to work for
27:11
variable length keys the scheme of
27:17
course comes at a cost each node has one
27:19
extra pointer field
27:32
right that several variations on the
27:36
basic try structure will run through
27:39
some of them in class and I'll let you
27:42
look at the readings on the website for
27:44
the remaining variations in one
27:47
variation we augment our branch nodes so
27:53
that each node has yet another field
27:54
this one we might call an element field
27:57
it points to one of the element nodes in
28:00
the subtree a' which it is root what
28:04
this enables us to do is it enables us
28:07
on the way down to find out what
28:09
characters or digits are being skipped
28:11
over which means that on the way down
28:18
you could check for equality with these
28:19
characters during a search and if you
28:23
find a mismatch you don't have to go any
28:25
further down the try alright so we add a
28:34
new field which points to any one of the
28:36
elements in that sub tree we use this
28:44
information on the way down during a
28:45
search to compare digits that you're
28:48
skipping over all right so for example
28:55
if this is our ten way try then I'm
28:59
going to augment each of my branch nodes
29:01
I haven't done that yet each branch node
29:03
is going to be augmented I'm going to
29:04
give it another fields called an element
29:07
fields and this can point to any one of
29:11
the element nodes in the subtree which
29:15
it is root okay so for example this
29:17
fellow could point here but it could
29:19
also have pointed there or here or here
29:23
this pointer would be used for example
29:27
to extract digits one and two okay so on
29:31
my way down I come here this says well
29:33
branch on three well before branching on
29:35
digit three I say well let me see if
29:37
I've got a match on digits one and two
29:38
okay
29:39
using this pointer I can access this
29:41
fellow and compare digits one and two if
29:44
they don't agree there's no
29:46
going further down I'm not going to
29:48
agree with anybody down here okay so you
29:51
can declare the search unsuccessful
29:52
right here at this level in this node I
29:59
could point to this fellow or to that
30:02
fellow okay again the idea is when you
30:06
come here before you branch on digit
30:08
five okay I can use this pointer to
30:11
check out digit 4 so here I've already
30:16
checked out digit digits 1 & 2
30:19
I've branch on digit 3 so I know there's
30:21
a match on digit 3 I'm sorry here
30:24
there's a match on digit 3 it's
30:26
everybody's got a 2 then before
30:28
branching on digit 5 I use this blue
30:33
pointer here to check out digit 4 make
30:35
sure there's a match okay and so it
30:38
doesn't matter where you're pointing
30:38
here or there everybody has the same
30:40
digit 4 okay so then once you verify
30:44
that's true then you actually do the
30:46
branching on digit 5 alright similarly
30:51
here we have a pointer to one of the two
30:53
yellow nodes in its subtract all right
30:58
so that's a variation that people employ
31:00
to terminate unsuccessful searches early
31:11
or other useful pieces of information if
31:15
you're dealing with random Keys the
31:17
expected height or when my try is log of
31:20
n base M so without any effort at
31:23
bouncing you expect to see a height
31:25
that's about logarithmic in practice
31:31
people might use one of two heuristics
31:34
to control the height one is to say that
31:38
branching will be done only H levels
31:41
deep after that all of the unresolved
31:45
keys are placed into a bucket and the
31:50
bucket may be searched thoroughly so
31:58
again the expectation here is that after
32:00
you've done in this case there's six
32:01
levels of branching the sets of
32:04
unresolved keys are small maybe you've
32:06
got only five to ten keys in each group
32:08
put them all into separate buckets and
32:10
then search those buckets eerily along
32:15
those lines you may implement your
32:20
heuristic slightly differently which
32:22
says you're going to keep branching
32:24
until the number of elements in a
32:26
subtree becomes no more than some
32:29
threshold s let's say ten so if a
32:32
subtree has only ten or fewer elements
32:34
I'm going to start branching I'm going
32:36
to stick a big element node there a
32:38
bucket which contains all of the
32:40
unresolved keys and will to search those
32:42
serially that's similar in spirit to the
32:46
previous heuristic so with this
32:52
heuristic in place the expected number
32:58
of branch nodes total number of branch
33:01
nodes is expected to be something like n
33:03
divided by s natural log of yeah
33:11
some other things people do is instead
33:14
of sampling the keys left to right
33:16
you may sample right to left or you may
33:18
sample in some random fashion using some
33:21
pseudo-random number generator
33:23
so for example if you have a dictionary
33:25
of people's last names and suppose that
33:35
you have a large number of last names
33:38
like oh it's not just last if they did
33:41
got people's name so you got the first
33:42
name in a last name with your king and
33:44
first on the last name and then on the
33:46
first name and you've concatenated them
33:48
okay well then you may have a large
33:50
number of people whose last name is
33:52
Johnson and so you may have to go
33:56
through as many characters as correspond
33:58
to Johnson before you you have any hope
34:00
of distinguishing so to avoid situations
34:03
like that where you expect long common
34:06
prefixes in your keys you may instead do
34:09
it from right to left or if you expect a
34:13
lot of commonality both in the front and
34:16
the back
34:16
you might try some middle to out scheme
34:19
or generally you might just try some
34:21
random scheme to sample again the idea
34:25
is to reduce the number of levels of
34:27
branching a variation on high order
34:36
tries in a high order try as we have
34:39
discussed it you determine the order say
34:41
ten or a hundred and then every branch
34:43
node has that order a variation to that
34:49
has been proposed for applications like
34:52
IP routers packet forwarding tables and
34:56
this variation is referred to as a
34:59
multi-bit try so in a multi-bit try the
35:03
degree of a branch node can change from
35:06
one node to the next so you assigned to
35:12
each branch node astride so each branch
35:15
node gets assigned something we call a
35:17
stride and that determines the number of
35:20
bits in the destination address of the
35:22
packet that's used that
35:24
for branching purposes it also
35:26
determines the number of pointers in the
35:28
node so if your stride is why not only
35:31
one bit is used you can have two
35:32
children if your stride is two two bits
35:35
from the destination address are used
35:37
you can have four children stride is
35:39
three use up three bits the next three
35:41
bits and you have eight children in the
35:43
node okay alright says proposed for
35:49
internet router applications this is an
35:52
inherently variable length key situation
35:55
the keys are the rules of filters in the
35:58
table okay and we're performing longest
36:02
prefix match so yeah when we first
36:11
looked at router tables we talked about
36:14
using binary tries and so there you
36:17
could have height going up to 32 with
36:22
destination addresses being up to 32
36:23
bits long so if you had a binary try and
36:29
you are performing searches you could
36:33
have up to I guess if your height was 32
36:35
you could have 232 cache misses then if
36:38
you go through the math you'd look at 32
36:40
cache misses and the time required to
36:42
service a cache miss there's really no
36:45
way you can classify a few hundred
36:48
million packets a second so to get
36:52
around that problem you have to go to
36:54
higher order tries so that you cut the
36:57
height down now when you go to higher
37:00
order tries and let's say you say well
37:03
instead of one bit I'm going to maybe
37:05
branch on 16 bits well then your node
37:10
size becomes 2 to the 16 because you've
37:12
got to do the 16 pointers in there and
37:13
the total amount of space you need for
37:16
the try becomes large so you have a
37:20
trade-off between the number of levels
37:23
and the total amount of space that your
37:25
try needs and to balance that trade-off
37:28
more effectively and one allows to have
37:32
different degree nodes instead of
37:35
requiring fixed degree nodes at every
37:37
level
37:38
so typically in a router table
37:41
application you might start with the
37:43
specification which says I can tolerate
37:46
up to let's say four cache misses per
37:49
packet that means your height is now
37:52
constrained it can only be four so
37:54
starting with that constraint you then
37:56
want to design a try which uses the
37:58
smallest amount of memory for the router
38:00
table that you have okay so you know all
38:04
of the keys that are going to be in
38:05
there
38:05
you know the target height you want to
38:07
minimize memory what you don't know is
38:09
what are the degrees of the different
38:11
nodes and the try okay so that's an
38:15
interesting optimization problem which
38:18
people have studied and there are nice
38:20
algorithms around to build optimal tries
38:24
under various constraints for allowable
38:27
degrees but subject to height limitation
38:32
okay so once the height limitation is
38:37
given you're kind of free to choose the
38:38
strides of the nodes if you're if you
38:43
choose the route stride to be 32 and
38:45
you're looking at IP 4 well then you
38:47
need only the route okay
38:52
they're only 32 bits and the address so
38:55
your height becomes 1 if you chose
39:01
strides of 16 at the root level 8 and 8
39:04
for the next two levels you're still
39:06
covered all 32 bits you could get away
39:08
with a three level try also an example
39:14
here at the root level I've got a stride
39:19
of one so it's got two children then at
39:22
the next level this node has a stride of
39:25
two so it's got four children and this
39:28
one has a stride of one so it's got two
39:31
children okay in this particular
39:34
organization I'm allowing you to keep
39:36
the router table filters inside the
39:40
branch nodes okay so a branch node not
39:43
only has a point pointer space it also
39:46
has data space at this node here I'm
39:50
going to keep
39:51
filters there are one bit long okay so
39:54
if you had a zero-star I would place it
39:56
here if you had a 1-star I would place
39:58
it here the null here says I don't have
40:01
a one star in my root table in my router
40:03
table okay here I'm branching using the
40:09
next two bits so that would be a bit too
40:10
and bit three of the destination address
40:13
in this node in addition to having these
40:16
four pointers I also have space for four
40:19
filters okay and these filters are
40:22
necessarily of length three
40:25
they all have to begin with zero because
40:27
I came from there and then the other two
40:30
bits are kind of free so you got four
40:31
possibilities for the other two bits
40:36
here at the stride is once you got two
40:39
pointers and you got two fields four
40:41
filters here you branch on bit one and
40:45
then here you're going to use the next
40:46
bit after that so that becomes bit two
40:49
in this node you're going to keep
40:51
filters whose length is two bits okay so
40:55
one bit filters here here you have two
40:57
bit filters but they have to begin with
40:59
a one here you have three bit filters
41:02
but they must begin with a zero you know
41:08
you'll notice that there's no place to
41:11
put a two bit filter that begins with a
41:13
zero okay so if you had a two bit filter
41:20
that begins with a zero we would expand
41:22
it to two three bit filters so for
41:25
example if you had a zero one it could
41:27
be represented as a zero 1 1 and 0 1 0
41:32
okay so you do something called filter
41:35
expansion because when you go into multi
41:38
bit tries you don't have nodes which can
41:41
accommodate every possible length that
41:43
might arise okay so you're going to
41:46
filter expansion as I said hey if you
41:49
had 0 1 we would create a 0 1 0 filter
41:51
which would occupy this part a 0 1 1
41:54
that we're trying to occupy this one but
41:57
if you already have 0 1 1 this fellow
41:59
will win over the expansion of 0 1
42:02
because we're doing longest prefix
42:04
matching the
42:05
longer prefix takes precedence over the
42:07
shorter one okay right in this
42:11
particular try
42:15
we have nodes of different strides
42:19
stride one a stride to node and another
42:22
stride one
42:23
of course this could have been a stride
42:24
three node okay so that leads to
42:27
something called a variable stride
42:29
multi-bit try as I said before a
42:33
challenging optimization problem is to
42:36
select the strides of the tries of the
42:38
try nodes so as to minimize the total
42:41
memory needed by the try the
42:45
optimization problem is somewhat simpler
42:48
if you make a restriction on these on
42:52
the allowable tries which says that
42:54
within a level all of the nodes must
42:57
have the same stride okay which is not
43:00
true for this example because level two
43:02
this fellas got a stride of two that
43:04
fellas got a stride to one but if you
43:07
make that restriction you end up with a
43:09
restricted category of multi but tries
43:11
which are referred to as fixed stride
43:14
tries so the stride is fixed but only on
43:17
a per level basis okay so that's a
43:21
slightly easier optimization problem but
43:24
people know how to find the best tries
43:26
that fit these requirements
43:34
all right so you need to choose the
43:37
stride we've gone through this
43:38
if the stride is as you have to to these
43:41
children you also have space for two to
43:43
the s prefixes there's a rule that tells
43:47
you what length prefixes get stored in a
43:49
node short prefixes they don't fit into
43:54
any of these allowable lengths get
43:55
expanded to longer length prefixes so
44:00
for example if the root stride is if the
44:02
stride of your root is 3 and you have
44:04
prefixes whose length is 3 you need to
44:06
expand them to length 3 this length to
44:12
prefix would get expanded to 0 0 0 and 0
44:15
0 1 if you already had 0 0 0
44:22
well then Q will get precedence over the
44:25
expansion of P because Q is a longer
44:27
length prefix than P all right so using
44:33
this these rules once you choose the
44:37
Strides doesn't matter what your
44:39
prefixes are you can always fit them
44:41
into your try
44:48
all right there are plenty of other
44:50
applications for tries and plenty of
44:52
other variations and tries and you
44:55
really need to take a look at the
44:56
readings that are there on the website
44:59
tries are used for searching for a
45:02
prefix okay so if you have a bunch of
45:05
keys and you're given a prefix you want
45:08
to find out what are the keys that match
45:10
a useful application of prefix search
45:17
would be automatic command complete
45:18
completion so if you're kind of sitting
45:21
in UNIX and you type the first three
45:22
letters of a command you press the tab
45:24
key it brings everything else in
45:25
provided there is no ambiguity it's
45:30
easier to see how you can do that with
45:31
tries you kind of follow down the try
45:32
once you reach an element node it
45:35
doesn't matter what the rest is that the
45:37
user types that's the only possibility
45:44
tries get used to implement lzw
45:47
compression a very popular compression
45:49
technique used in many commercial
45:53
software for example if you every time
45:57
you open up Adobe Acrobat Reader you'll
46:00
see a little message down there which
46:01
says implements lzw compression a fast
46:05
way to implement lzw compression is to
46:07
employ tries and another fast way to do
46:12
it of course is to use hash tables so
46:14
depending upon whose implementation
46:15
you're looking at you might see a try
46:18
implementation or a hash table
46:19
implementation of lzw compression right
46:26
there are plenty of alternative node
46:28
structures the pictures we've been
46:31
drawing and even some of the explicit
46:34
statements I've made all kind of point
46:36
toward representing nodes as arrays
46:42
especially if you want to have only one
46:46
cache miss per level but certainly there
46:49
are other implementations people employ
46:52
you may have a chain so you keep only
46:54
the non null pointers chain them
46:56
together
46:59
very useful if you're dealing with very
47:01
high degree tries instead of wasting a
47:04
lot of space with nulls you just keep
47:07
the non no information you may even have
47:13
a binary search tree of non now pointers
47:15
you may have a hash table of non null
47:21
pointers all right so there's a lot more
47:32
to try than we've covered here and the
47:34
reading on the web gives you a good idea
47:36
of some of the other things that people
47:38
are doing with tries some of the other
47:41
implementations as well okay all right
47:45
any questions okay we'll stop
47:57
you