Transcript

Okay, all ready to start. Today we're going to start looking at data structures for dictionaries. In the last several lectures, if you recall, we were looking at data structures for priority queues, different types of priority queues; we now turn our attention to dictionaries.

Right, so a dictionary is basically a collection of items where each item is a pair: you have a key associated with the item and you have an element associated with the item. It's a lot like the collections you were keeping in priority queues, where you had keys and you had elements. The difference comes about in terms of the kinds of things you can do with this collection. We make the assumption that your collection of pairs is such that no two pairs have the same key. In the case of a priority queue you allow many pairs to have the same key; here we don't, we require all pairs to have different keys.

So, for example, you may use a dictionary to represent a collection of student records in a class, for example this particular class. In that case, each item that makes up the dictionary would have a key, which would be the name of the student, and the element associated with that key would be a list of scores on the assignments and exams in the course.

With our definition of a dictionary, for this scheme to work we would require that no two students have the same name. In case it turns out you've got two students with the same name, we'd have to modify the names: if you have two John Smiths, you could call one John Smith 1 and the other one John Smith 2, and now you've got two different names. All right, so we require that all keys are distinct.

Now, in real applications it's not always possible to live
with distinct keys, and so we also need to have ways to represent dictionaries where you have duplicates. By duplicates I mean you have many pairs that have the same key. For example, if you look at a real dictionary of words, you would find many words that have a large number of meanings. Each pair in a real word dictionary would consist of a key, being the word, and associated with the word would be a meaning of the word. So, for example, if you were to look up a dictionary using the key "bolt", you would find many entries under "bolt". So you would need to extend the definition of a dictionary as we have it to allow for multiple pairs with the same key.

Okay, now the structures we're going to look at will specifically focus on the pure definition of a dictionary, which is the first one, where you don't have duplicates. But you should be able to see that these structures are easily extended to handle the case where you have duplicates. So we won't explicitly look at duplicates; simply assume that you can extend what we're doing to handle them.

All right, the kinds of things that people do with dictionaries depend on whether you're dealing with what we call a static dictionary. That's one in which the collection of pairs doesn't change over time: at the outset you're given a whole bunch of pairs, you set up the initial structure, and then you do not change the collection. So with a static dictionary you need to create the data structure that represents the collection of pairs, and then the only operation you perform on this collection is that of searching: you're given a key and you want to get the element associated with that key.
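As a quick sketch of the dictionary idea so far, including the extension to duplicate keys, a plain Python dict can stand in for the structures we'll build later (the word and its meanings below are made up for illustration):

```python
# A pure dictionary stores one (key, element) pair per key.  To allow
# duplicates, as in a real dictionary of words, each key can instead
# map to a list of elements.
words = {}

def insert(key, element):
    # setdefault creates an empty list the first time a key appears
    words.setdefault(key, []).append(element)

def search(key):
    # the dictionary's search operation: all elements under this key
    return words.get(key, [])

insert("bolt", "a metal pin or rod used as a fastener")
insert("bolt", "to run away suddenly")
insert("bolt", "a flash of lightning")
```

With distinct keys each list has exactly one entry, which is the pure dictionary; with duplicates the lists simply grow.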
That, for example, will be the case if you're in the business of selling computer-based dictionaries: you can go and buy a CD with the dictionary on it, and once you buy the dictionary you can't go and change the contents of that CD; all you can do is search the dictionary. Similarly, you could go and buy a geographic database, and you can query the database, but you can't make changes to it.

Now, there are other applications which aren't necessarily static but which behave like static dictionaries. For example, if you look at IP router tables, you have a collection of rules that determine where the next packet coming in gets forwarded. These rules actually do change over time, but in practice the way people use this collection of rules is more in a static mode: you come up with a data structure which is optimized for search, you perform searches for a while, and then with some periodicity you update the search structure. You don't update it in real time; even though somebody might want to make a change to the collection, you'd batch the change and at some point later rebuild the structure. Therefore, for most practical purposes, you're operating like a static dictionary.

All right, so if you have a static dictionary, the important standard operations are: initialize the dictionary with a given collection of pairs, and then perform search operations. In most applications of static dictionaries the important operation then becomes the search; you want to optimize the structure for search, you want to be able to do it very fast. It doesn't really matter how much time you take to perform
the initialize, so long as it's a reasonable amount of time. Another important consideration here might be the amount of space your representation takes, though at least in these two examples the space is important only to the extent that you want the whole collection of elements to fit, say, onto one CD; there isn't any great advantage to using only 25% of the capacity of the CD.

All right, the other version of a dictionary that's important to us is the dynamic version. Here also the search operation is important: given a key, retrieve the associated element. But in addition to the search operation we want to be able to perform inserts in real time, adding a new pair to the collection, and we want to be able to remove an existing pair, or item, in real time. So we're going to take a look at structures for both static and dynamic dictionaries.

Now, if all you want to do is the standard operations that I've listed here, then perhaps the best way to go is to just use a hash table. Hopefully you've all seen hash tables before in an undergraduate class, and I really don't intend to go into any aspects of hash table design here. But you probably recall from what you've studied that hash tables give you order-one expected performance for the search, insert and remove operations. So get, put and remove take order-one expected time, and certainly in practice this is what you see: the performance really doesn't depend upon how many pairs you have in your dictionary, it's a function more of something called the density, what fraction of the table is actually occupied, and the performance is exceptionally good. You actually see order-one performance, and that's kind of hard to
beat. So as far as expected performance goes, this would be the best way to go. Now, the worst-case performance of course is another matter. You could have a pathological collection of items for which the searches, inserts and removes actually run in order-n time, where n is the number of items in your collection. For most situations you're going to see order-one behavior, but there would be some extreme pathological cases where you may get that kind of worst-case performance, and if you really have to guarantee that you're running in some amount of time independent of the collection, then you would have a problem providing that guarantee with hash tables.

Now, if you're trying to do operations other than the standard operations that I put up, then hash tables perform miserably. For example, suppose we change the search from an exact-match search, where you're given the key and you want to find an item with exactly that key, to something which says: I'm really not looking for an exact match, I'm looking for what you might call a nearest match. For example, if you have numeric keys, we might be interested in finding the element whose key is greater than or equal to some given key, and is the smallest such key satisfying this property. In a moment we'll see an application where we do want to answer this type of query. Trying to answer this type of query using a hash table is going to take you order-n time; there's really no way to take the given key, hash it, and expect to land anywhere near the answer you're looking for in the table.

Another application where hash tables would do miserably is if you also want to support indexed
operations. For example, I might say: give me the element with the third smallest key, or the tenth smallest key, or remove the item with the fifth smallest key. Operations of this type cannot be done efficiently using hash tables.

All right, so let's take a look at a few applications. The first one is going to be one which requires you to find the element with the smallest key greater than or equal to a specified value, and one place where we use this kind of query is in implementing a bin packing algorithm, specifically the best-fit algorithm. If you go back many lectures, we talked about different bin packing algorithms, and basically we said that in the bin packing problem you're given n items, each item has a size associated with it, you have a whole bunch of bins, and each bin has a capacity associated with it. The objective was to pack these n items into the smallest number of bins. We saw that that was an NP-hard problem, and therefore one typically uses heuristics. There were basically two families of heuristics we talked about: one was the first-fit heuristic and the other was the best-fit heuristic. We saw that the first-fit heuristic could be implemented efficiently using a tournament tree, and at that time I simply said we can do best fit efficiently too, in n log n time, and we'd look at that later. This is the "later" where we're going to see how to do best fit efficiently.

So in the best-fit heuristic, if you recall, the strategy is: you look at the items that have to be packed one at a time in some order, and for the item you're currently looking at you find the best bin into which it fits. What that means is, from the bins that are
currently in use, you find the candidate bins that have enough space to accommodate this item. If there are no candidate bins, then you just start a brand-new bin. If there are candidate bins, then from the candidate bins you pick the one that has the smallest available capacity.

Here's an example: four items to pack, requiring respectively four, seven, three and six units of space, and you have bins of capacity ten. We first take the first item, which needs four units of space. At present there's no active bin, so there's no candidate bin; you start a new bin and place the item into it, so now this bin has six units of space left. You look at the next item; it needs seven units of space. There's no candidate bin, so you start a new bin and place the item in there. Then you look at the next item, which needs three units of space. Both of these bins are candidate bins, they both have adequate space available, so from the candidate bins you pick the one with the smallest available space, the bin with three units left, and you pack the item into that bin. Then you go to the next item; it needs six units of space. Of your active bins, only one candidate bin has enough space, and so you put it into that candidate bin. In this particular case you end up with an optimal packing.

So the query you're asking each time you look at an item to pack is: from the active bins, find me a bin whose available space is the smallest amount greater than or equal to the amount required. That's the nearest-match query that I put up a little while ago. To implement this we'll make use of a dictionary structure called a binary search tree, again something I'm assuming you're familiar with from an undergraduate class.
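The best-fit walkthrough above can be sketched in a few lines. Here a sorted Python list plus binary search stands in for the balanced binary search tree we are about to build, so each update is O(n) rather than O(log n), but the nearest-match query is exactly the same:

```python
import bisect

def best_fit(items, capacity):
    """Pack items with the best-fit heuristic; return the remaining
    capacities of the bins used, in sorted order."""
    remaining = []  # sorted remaining capacities of the active bins
    for s in items:
        # nearest-match query: smallest remaining capacity >= s
        i = bisect.bisect_left(remaining, s)
        if i == len(remaining):           # no candidate bin: open a new one
            bisect.insort(remaining, capacity - s)
        else:                             # remove, reduce the key, reinsert
            r = remaining.pop(i) - s
            bisect.insort(remaining, r)
    return remaining
```

On the lecture's example, `best_fit([4, 7, 3, 6], 10)` opens two bins and fills both completely, matching the optimal packing found above.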
Okay, so we're going to use a binary search tree, and in the binary search tree each node is going to hold one of the items in the dictionary; each item represents one of the active bins. The key is the available capacity of the bin, and the bin index is going to be the element associated with the key. Now, in general we don't have a pure dictionary here, because it's possible for two bins to have the same available capacity, so the keys may not be distinct. So really we're going to have to extend our notion of a binary search tree to allow for duplicates.

All right, so to pack an item whose space requirement is s, we'll be looking for the bin with the smallest available capacity greater than or equal to s; that's the nearest-match query I mentioned earlier. Once you've found that bin, if there is such a bin, then we'll reduce the available capacity of that bin by the amount you utilize, and the way you can do that is: you first do a remove, removing the old bin from your collection, reduce its key, and then insert it back in. So you have a get operation, then a remove and a put. If there's no candidate bin, then you just start a brand-new bin, reduce its capacity by s, and put it into the collection.

All right, so the idea is to use a binary search tree. In a binary search tree, if you recall, if you look at any node, then the keys in the left subtree are less than the key in that node, and those in the right subtree are greater than the key in that node. Now, in our case we want to allow for duplicates, and to handle duplicates we could go with one of two different strategies. One is to relax the requirement which says
the left subtree has smaller keys and the right subtree has bigger keys, to: the left subtree has keys that are less than or equal to, and the right subtree has keys that are greater than or equal to. So if you have duplicates there's still a place for them; they could be on the left or the right side. An alternative strategy would be to say that if you have many bins whose key is 20, so you've got many bins that have 20 units of space available, then at that node I maintain a chain of those bins; as far as the top-level tree structure is concerned, everybody still has different values. Depending upon your application you may go with different strategies to generalize from the distinct-key requirement to allowing duplicates; in this particular case either of these two schemes would work.

All right, so here I've got an example. Let's assume that each node represents only one bin. I've got a total of about twelve bins represented here, and the amount of space available in each bin is shown; the bin index is not shown. Now suppose you are packing the next item, and the next item, let's say, requires 23 units of space, so you'd like to find the best-fit bin. Visually we can see that the best-fit bin for 23 units of space would be 25: it's got enough space, and of the bins that have enough space (25, 35, 30 and 40) it's the one with the smallest available space, so that's where you'd like to put the item.

To find this bin efficiently in this tree, you start at the root. The root represents a bin that has 20 units of available space; that's not enough. From that information you also know that none of the bins in the left subtree are of any use to you either:
they all have less space than 20, so we can ignore all of those; we only need to move into the right subtree. The root of the right subtree has 40 units of space; that's adequate, so now we have a candidate bin. There's no point looking into its right subtree, not because it's empty (in this case it is), but because if there were anybody in the right subtree, the amount of space they'd have would be more than 40, so they cannot be better than this bin. So we move into the left subtree. The root of the left subtree, 30, has adequate space, so it has to be better than the best you've found so far, and it becomes your candidate. There's no point going into its right subtree, everybody there is worse than the 30, so you move into the left subtree. The amount of space this node, 25, has is adequate, so it's better than the best you've found so far; you update your best candidate. There's no point going into its right subtree; you only go into its left.

Similarly, suppose you were looking for, let's say, an item that needed 17 units of space. You start at the root; 20 is adequate, so that gives you a candidate, and there's no point going into its right subtree, the candidates there are worse. So for 17 we start from here having found a possible bin; you move into the left subtree. That's got 10; that's not adequate, and there's no point going into its left subtree, so you move into the right subtree. This has got 15 units of space; that's not adequate, no point looking in its left subtree, move into the right. This one, 18, has enough space; it's got to be better than the best candidate found so far, so you update your best candidate to this one. No point going into its right subtree; you'd only go into its left. So, by walking down the tree in time proportional to the height of the structure, we can find the best bin.
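The walk just described is easy to code. This is a minimal sketch on a plain (unbalanced) node structure; the tree used in the example below is shaped like the lecture's, with only the visited nodes included:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def best_fit_bin(root, s):
    """Return the smallest key >= s, or None if no bin is big enough.
    A candidate's right subtree holds only larger keys, so we never
    descend right of a candidate; a too-small node's left subtree
    holds only smaller keys, so we never descend to its left."""
    best = None
    node = root
    while node is not None:
        if node.key >= s:
            best = node.key   # better than any candidate found so far
            node = node.left
        else:
            node = node.right
    return best
```

For example, on a tree rooted at 20 with 40, 30, 25 down the right side and 10, 15, 18 down the left, `best_fit_bin(root, 23)` returns 25 and `best_fit_bin(root, 17)` returns 18, matching the walkthrough.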
If you find no bin, then you just create a new node and insert it into the structure. If you do find a bin, for example you find the 25 here, then you can remove it from the tree; we were packing 23 at the time, so now you've got two units left, and you reinsert a node with capacity 2. Right, so that'd be one place where we could use this nearest-match query.

Yeah? Your question is: can't we do a decrease-key? We can certainly define a decrease-key operation for dictionaries. But if you were to decrease the key here, you would probably have no alternative, once you find that you have violated the binary search tree property, but to remove the node and reinsert it. If the reduction was such that you did not violate the binary search tree property, which in this case would mean that after the reduction this node was still bigger than 20, then you could leave it where it is; but in other cases you would have to remove it and put it back in.

All right, so consider the complexity of this scheme, assuming you employ a balanced binary search tree. The balanced binary search trees (we're going to see a couple of these later) have a height guarantee that is logarithmic in the number of nodes. The number of nodes will never exceed the number of items you're trying to pack, because at worst you'll put one item per bin; so if you're packing n items, the number of nodes is order of n. For each item to pack you would do a get operation to find the bin that best fits, and then you might do a remove of that bin and put it back in, or if there's no candidate you might just do a put. So order of n is a bound on the
number of get, put and remove operations that you might do. Each of these operations takes log n time with a balanced binary search tree, and so you would have an n log n implementation of the best-fit heuristic.

Okay, so even though hash tables are great for dictionaries provided you stay with the standard operations that were listed, there are a whole bunch of applications, notably scientific computing applications, where you want to do other types of things with your collection of pairs, and for those, tree-based structures work out much better. One example is best fit. Any questions about using binary search trees to implement best fit? Yeah?

Okay, the question is: suppose the next item you're packing required five units of space. Then, as before, we would search for the best bin, and this bin with six units of space would turn out to be the best bin. Having discovered that, I would perform a remove operation, which says: remove this node from the structure. The remove operation has to be able to remove from anywhere in the structure, so the fact that this node has left or right children is really not important as far as the remove is concerned; it knows how to take something out, and we'll take a look at remove operations later. So somehow you'll get the six out, leaving behind a proper binary search tree with eleven items; you'd reduce the six to one and then put it back in.

All right, other questions? Let's take a look at the other item that we put up when we were talking about why we may be interested in structures other than hash tables, and that had to do with performing operations by index. So there
may be times when you want to retrieve elements given the key, but there may also be times you want to retrieve elements given an index. While the former can be done efficiently using hash tables, the latter cannot.

So to do indexed operations we'll employ something called an indexed binary search tree. Fundamentally it's a binary search tree which is augmented with one field per node, and this field keeps track of the number of nodes in the left subtree. So, for example, if we had this binary search tree, then with each node, in addition to everything else you'd keep for a normal binary search tree, we keep a value which says how many nodes are in the left subtree of that node. For example, this node here has nothing in its left subtree, so its left-size value is zero; the left-size value there is zero; this node has got one node in its left subtree; that one's got one; this node has got four; that one's got zero; that's got zero; up here we've got seven in the left subtree; this node's got zero; that node's got none; this one here's got one; and this one here has got three in its left subtree. So each node keeps its left-size value, the number of nodes in its left subtree, and that gives you something called an indexed binary search tree.

Now, there's a correspondence between those left-size values and the rank of each element in the collection, where we define rank as an element's index when you look at the items in ascending order. So, for example, if these were your keys, then the rank of the smallest item is zero; the ranks are assigned left to right beginning at zero: 0, 1, 2, 3, 4, 5, 6, 7. So rank
just gives you your position in ascending order. Something you can prove is that the left-size value of a node is the rank of the item stored in that node relative to just that subtree. Let's see what that means. The left size at the root here is 7, and that tells us that if you look at this whole collection of elements, the item stored at the root has rank 7. The left size out here is 4, and that says if you just look at this subtree, this is the item with rank 4: within the subtree the items have ranks 0, 1, 2, 3, and then this one is 4. The 0 here tells us that if you look at just this subtree, this item has rank 0, because that subtree has got 15 and 18. Similarly, this 3 here tells us that if you look only at this subtree, this item has rank 3: this is rank 0, rank 1, rank 2, rank 3. So if you look only at a subtree, the left-size value stored in its root gives you the rank of that element within the subtree. That's something you should be able to convince yourself of back home, and it's important in performing operations by index.

Suppose you want to get the item whose rank is, let's say, 8. We start at the root, and the left size tells us that the root's rank is 7. So we know that the item whose rank is 8 is not the item in the root, and also not anybody in the left subtree, because the ranks there are smaller than the rank of the root relative to the whole collection. So the item whose rank is 8 has to be in the right subtree. Now the question is: what is its rank within the right subtree? When you move into the right subtree, you're going to be bypassing 8 elements of rank smaller than the one you're looking for, because you've got the 7 items in the left subtree plus the root itself. So we're bypassing eight
items of smaller rank, and so the rank of the item you're looking for, within the right subtree, has to be 8 less than the rank you were looking for before. So I decrease my target rank by 8, which is one plus the left-size value stored at the root. Now I'm looking for the item whose rank is 0 in this subtree. This node's rank within the subtree is 3, so the item I'm looking for is not this one, and it's not anybody in its right subtree, because those have higher rank; it's got to be somebody in the left subtree. When I move into the left subtree, I again ask: what is the rank of the item I'm looking for within the left subtree? When I move into a right subtree I bypass a whole bunch of items with smaller rank, and that causes me to adjust the target rank; but when I move into a left subtree I don't bypass anybody with smaller rank, and so I do not adjust the target rank. If you were looking for rank zero up here, the rank is still zero with respect to the elements down here. The rank posted here is one, so this is not the item I'm looking for; I'm looking for a smaller rank, so I move into the left subtree, and the target rank is still zero. Now I get a match in the ranks: this is the item I'm looking for.

Let's try another one. Suppose you're looking for, say, rank zero. Okay, that's kind of easy: you start at the root looking for rank zero; it's smaller than seven, so you move into the left subtree, and when you move into the left subtree you don't adjust the target rank. You're still looking for rank zero; it's smaller than four, so you move into the left subtree, and again you don't adjust the
target rank. It's smaller than one, so you again move into the left subtree without adjusting the target rank, and you get a match at this leftmost node. Now suppose you're trying rank 3. You're sitting at the root; 3 is smaller than 7, so you move into the left subtree and don't adjust the target rank. 3 is smaller than 4, so you move into the left subtree and don't adjust the target rank. 3 is bigger than 1, so when you move into the right subtree you've got to adjust the target rank, and the adjustment is to subtract off the left size plus one; so we subtract two, and now we're looking for the item whose rank is one. You get a match: this node holds the item whose rank is 3.

Okay, so to find the item with a given index, you simply move down the tree. Whenever you move to the right, you adjust the target rank by subtracting left size plus one; when you move into the left subtree, you don't do any adjustment. When you find a match between the target rank and the rank posted in the node, you've found what you're looking for. If the height of the structure is logarithmic, it takes you log n time to find the element given its rank. Once you've found the element you can do a remove, and the removal algorithms, as we'll see later, also work in logarithmic time if you're dealing with balanced trees.

If you want to insert an item, the insertion is done by key, because you have to maintain the binary search tree property. So once you're given a pair to insert, you look at its key and based on that you decide where to put it. For example, if you wanted to insert somebody with value 43, it's bigger than 20 and it's bigger than 40, so you're going to put it down here. In this case, since 43 is bigger than 20, I know I'm going to do the insertion on
the right side, so this node's left-size value is not going to be affected. Again, here I'm moving to the right side, so the 3 is going to remain 3. On the other hand, if you were, let's say, inserting a 32, you'd come to the root; its 7 is going to stay 7 because you went to the right. Now you're going to go this way, which means this value goes up by 1, so this 3 will become a 4; you're going to the right, so that 1 stays at 1; you're going down here, so this value goes up by 1. So the only nodes whose left-size value changes are those on your insert path, and that's really what enables you to do the insert fast, in logarithmic time. If you actually stored physical index values, or ranks, on these nodes rather than left-size values (if I said this node's rank is zero, that one's is one, this one's is 2, and that's 3, that's 4, and 5 and 6 and so on), then if you insert somebody here, everybody's rank changes, and there's no way to do the insert in less than order-n time. So the reason you keep left-size values rather than ranks on these nodes is that only a logarithmic number of left-size values change when you do an insert or a delete, whereas everybody's rank can change when you do an insert or a delete.

All right, so this structure is good for get by index, and remove by index can also be done in logarithmic time. Yeah? Actually, you can; I guess this is just a restatement of what I said before: when you go into the left subtree the target index doesn't change, and when you go into the right subtree you decrease the target index by the left-size value plus an additional one.

You can use indexed binary trees (not binary search trees, but indexed binary trees) to represent linear lists.
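A minimal sketch of the indexed binary search tree ties the pieces together; it is unbalanced, so the logarithmic height guarantee of the AVL trees mentioned above is omitted here:

```python
class INode:
    """BST node augmented with left_size, the number of nodes in its
    left subtree.  We store left_size rather than the node's rank: a
    single insert can change every rank, but it changes left_size only
    along the insert path."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.left_size = 0

def insert(root, key):
    # Insertion is by key, preserving the binary search tree property;
    # left_size is bumped on exactly the path nodes whose left subtree
    # receives the new node.
    if root is None:
        return INode(key)
    if key < root.key:
        root.left_size += 1
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def get_by_rank(node, r):
    # Move left: target rank unchanged.  Move right: we bypass the
    # left subtree and the node itself, so subtract left_size + 1.
    while node is not None:
        if r == node.left_size:
            return node.key
        if r < node.left_size:
            node = node.left
        else:
            r -= node.left_size + 1
            node = node.right
    return None
```

Inserting a set of keys and then reading ranks 0, 1, 2, ... back with `get_by_rank` returns the keys in ascending order, which is exactly the rank correspondence argued above.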
42:07 lists in a linear list items don't have 42:13 a key associated with them they just 42:14 have an index and the index is the same 42:17 as the rank okay and so the operations 42:20 that you perform there are like you get 42:24 by index you insert by index and you 42:27 remove by index 42:32 so the strategy is the same the items 42:35 are placed into the tree such that if 42:38 you do an inorder traversal you would 42:40 hit the items in list order and then the 42:46 same definition of left size as we had 42:48 before just there are no keys alright so 42:59 in a linear list probably one of the 43:01 first data structures anybody studies 43:03 the operations that are done are really 43:06 get by index put by index and remove by 43:09 index okay and the typical ways to 43:13 represent a linear list one is just map 43:15 them into an array in the obvious 43:16 fashion a linear list looks like an 43:18 array of elements just put it in the way 43:20 it looks then you can do a get in order 43:25 one time if you want to get the i-th element 43:27 you access the i-th position in the 43:29 array but a put and a remove take order 43:34 n time because you've got to move people 43:36 around to make space for the new item 43:38 you're putting in if you use a chain 43:42 then everything takes order n time if 43:49 you use an indexed balanced search tree 43:52 this particular one is called an AVL 43:54 tree we'll see these later okay then all 43:57 of your operations will run in 43:59 logarithmic time okay so you're inferior 44:04 to arrays as far as get is concerned 44:07 because you don't have the random access 44:08 capability but you're doing a lot better 44:11 on the puts and the removes instead of 44:14 order n you're taking logarithmic time 44:20 to get a feel for what that means 44:23 suppose you started off with an empty 44:25 linear list and you performed 40,000 44:30 inserts and then 44:35 you do let's say 40,000 gets and then 44:40 you do 40,000 removes and bring it
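The array representation described here, O(1) get but O(n) put and remove because elements must be shifted, can be sketched as follows. This is a minimal illustrative sketch; the class name `ArrayLinearList` and its fields are assumptions, not something from the lecture.

```java
import java.util.Arrays;

// Sketch of the obvious array mapping of a linear list: get is a direct
// array access, but put and remove must shift elements to open or close a gap.
public class ArrayLinearList {
    int[] a = new int[16];
    int size;

    int get(int i) { return a[i]; }          // O(1): random access

    void put(int i, int x) {                 // insert x at index i: O(n)
        if (size == a.length) a = Arrays.copyOf(a, 2 * size);
        System.arraycopy(a, i, a, i + 1, size - i);  // shift a[i..] right by one
        a[i] = x;
        size++;
    }

    int remove(int i) {                      // remove and return element i: O(n)
        int x = a[i];
        System.arraycopy(a, i + 1, a, i, size - i - 1);  // shift left to close gap
        size--;
        return x;
    }
}
```

The `arraycopy` calls are exactly the "move people around" cost the lecture refers to: on average about half the list is shifted per put or remove, which is where the order-n behavior comes from.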
back 44:41 down to empty and you kind of clock to 44:44 see how much time did the inserts take 44:45 how much did the gets take and how much 44:47 did the removes take okay all right so 44:52 some results obviously done a few years 44:55 ago using a somewhat slow machine and 44:58 using Java all right so for the 40,000 gets 45:06 if you have random access that's order one 45:08 that's like 5.6 milliseconds in a chain 45:14 okay now we're accessing each element 45:16 once so each of the 40,000 elements is 45:18 going to get accessed once and it's taking around 57 45:22 seconds 45:24 if you had an indexed AVL tree you're 45:26 taking 63 milliseconds so that was log n 45:29 we certainly didn't expect it to 45:32 outperform an order-one thing so you're 45:35 taking about ten times a little more 45:36 than ten times if you look at the 45:39 average case here for puts so the average 45:42 for puts here is five point eight seconds 45:44 versus 115 seconds versus 392 milliseconds the average 45:49 case for a remove here is five point 45:51 eight seconds 149 seconds and one point 45:54 five seconds if you look at the cost of the 45:57 whole sequence 40,000 inserts 40,000 45:59 gets 40,000 removes so here you're 46:03 looking at five point six milliseconds 46:04 plus five point eight seconds plus five point 46:06 eight seconds maybe a little bit below twelve 46:08 seconds over here you've got sixty three 46:12 milliseconds 392 milliseconds and 1.5 seconds so 46:16 maybe you've got about one point nine to 46:18 two seconds okay for the mix so about 46:21 two seconds for the mix here versus 46:23 about 12 seconds for the mix out here of 46:28 course if you go to larger lists 46:31 100,000 200,000 the difference is going 46:34 to be a lot more 46:39 okay alright so you can use tree 46:44 structures for dictionaries to perform a 46:48 variety of operations far more 46:51 efficiently 46:52 than you can perform these using hash 46:55 tables or for that matter using say 46:59 chains or arrays again it's important to 47:04 keep in mind that if your objective
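The experiment described, clocking n inserts, then n gets, then n removes back down to empty, can be sketched roughly as below. This is an assumed reconstruction of the methodology, not the lecturer's actual benchmark code, and on a modern machine the absolute numbers will differ from the quoted results.

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch of the timing experiment: build a list up to n elements,
// read every element once, then remove everything, timing each phase.
public class ListBenchmark {
    // Returns {insertNanos, getNanos, removeNanos} for the given list.
    public static long[] time(List<Integer> list, int n) {
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) list.add(0, i);        // n inserts at the front
        long t1 = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) sum += list.get(i);    // n gets, one per element
        long t2 = System.nanoTime();
        for (int i = n - 1; i >= 0; i--) list.remove(i);   // n removes, back to empty
        long t3 = System.nanoTime();
        return new long[] { t1 - t0, t2 - t1, t3 - t2 };
    }

    public static void main(String[] args) {
        long[] ns = time(new ArrayList<>(), 40_000);
        System.out.printf("inserts %d ms, gets %d ms, removes %d ms%n",
                ns[0] / 1_000_000, ns[1] / 1_000_000, ns[2] / 1_000_000);
    }
}
```

Swapping in a linked-list or indexed-tree implementation of `List` in place of `ArrayList` reproduces the kind of array-versus-chain-versus-tree comparison the lecture's table shows.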
47:08 is to represent the pure dictionary you 47:10 just want to do exact match gets puts 47:13 and removes you're going to have a very 47:17 very hard time outperforming a hash 47:19 table and so the tree structures that 47:24 we're going to study would be 47:25 recommended for those applications only 47:27 when you have to absolutely guarantee 47:29 some worst-case performance when you 47:32 can't take the risk that you may not 47:35 actually see the expected performance 47:37 okay but for the other applications we brought 47:40 up where a hash table is not an option you 47:44 really have to go with search tree type 47:47 structures and our focus here and for 47:51 the next several lectures really is 47:53 going to be looking at tree structures 47:55 for dictionaries and again even though 47:59 we spend a lot of time on that I do want 48:01 you to keep in mind that hash tables are 48:03 there they're very very valuable and 48:05 there's a place for them all right any 48:09 questions okay thanks 48:18 you