Transcript

Okay, all ready to start. Today we're going to start looking at data structures for dictionaries. In the last several lectures, if you recall, we were looking at data structures for priority queues, different types of priority queues; we now turn our attention to dictionaries.

Right, so a dictionary is basically a collection of items where each item is a pair: you have a key associated with the item and you have an element associated with the item. It's a lot like the collections you were keeping in priority queues, where you had keys and you had elements. The difference comes about in terms of the kinds of things you can do with this collection. We make the assumption that your collection of pairs is such that no two pairs have the same key. In the case of a priority queue you allow many pairs to have the same key; here we don't, we require all pairs to have different keys.

So, for example, you may use a dictionary to represent a collection of student records in a class, for example this particular class. In that case, each item that makes up the dictionary would have a key, which would be the name of the student, and the element associated with that key would be a list of scores on the assignments and exams in the course.

With our definition of a dictionary, for this scheme to work we would require that no two students have the same name. In case it turns out you've got two students with the same name, we'd have to modify the names: if you have two John Smiths, you could call one John Smith 1 and the other one John Smith 2, and now you've got two different names. All right, so we require that all keys are distinct.

Now, in real applications it's not always possible to live
with distinct keys, and so we also need to have ways to represent dictionaries where you have duplicates. By duplicates I mean you have many pairs that have the same key. For example, if you look at a real dictionary of words, you would find many words that have a large number of meanings. Each pair in a real word dictionary would consist of a key, being the word, and associated with the word would be a meaning of the word. So, for example, if you were to look up a dictionary using the key "bolt", you would find many entries under "bolt". So you would need to extend the definition of a dictionary as we have it to allow for multiple pairs with the same key.

Okay, now the structures we're going to look at will specifically focus on the pure definition of a dictionary, which is the first one, where you don't have duplicates. But you should be able to see that these structures are easily extended to handle the case where you have duplicates. So we won't explicitly look at duplicates; simply assume that you can extend what we're doing to handle them.

All right, the kinds of things that people do with dictionaries depend on whether you're dealing with what we call a static dictionary. That's one in which the collection of pairs doesn't change over time: at the outset you're given a whole bunch of pairs, you set up the initial structure, and then you do not change the collection. So with a static dictionary you need to create the data structure that represents the collection of pairs, and then the only operation you perform on this collection is that of searching: you're given a key and you want to get the element associated with that key.
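As a quick sketch of the dictionary idea so far, including the extension to duplicate keys, a plain Python dict can stand in for the structures we'll build later (the word and its meanings below are made up for illustration):

```python
# A pure dictionary stores one (key, element) pair per key.  To allow
# duplicates, as in a real dictionary of words, each key can instead
# map to a list of elements.
words = {}

def insert(key, element):
    # setdefault creates an empty list the first time a key appears
    words.setdefault(key, []).append(element)

def search(key):
    # the dictionary's search operation: all elements under this key
    return words.get(key, [])

insert("bolt", "a metal pin or rod used as a fastener")
insert("bolt", "to run away suddenly")
insert("bolt", "a flash of lightning")
```

With distinct keys each list has exactly one entry, which is the pure dictionary; with duplicates the lists simply grow.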
That, for example, will be the case if you're in the business of selling computer-based dictionaries: you can go and buy a CD with the dictionary on it, and once you buy the dictionary you can't go and change the contents of that CD; all you can do is search the dictionary. Similarly, you could go and buy a geographic database, and you can query the database, but you can't make changes to it.

Now, there are other applications which aren't necessarily static but which behave like static dictionaries. For example, if you look at IP router tables, you have a collection of rules that determine where the next packet coming in gets forwarded. These rules actually do change over time, but in practice the way people use this collection of rules is more in a static mode: you come up with a data structure which is optimized for search, you perform searches for a while, and then with some periodicity you update the search structure. You don't update it in real time; even though somebody might want to make a change to the collection, you'd batch the change and at some point later rebuild the structure. Therefore, for most practical purposes, you're operating like a static dictionary.

All right, so if you have a static dictionary, the important standard operations are: initialize the dictionary with a given collection of pairs, and then perform search operations. In most applications of static dictionaries the important operation then becomes the search; you want to optimize the structure for search, you want to be able to do it very fast. It doesn't really matter how much time you take to perform
the initialize, so long as it's a reasonable amount of time. Another important consideration here might be the amount of space your representation takes, though at least in these two examples the space is important only to the extent that you want the whole collection of elements to fit, say, onto one CD; there isn't any great advantage to using only 25% of the capacity of the CD.

All right, the other version of a dictionary that's important to us is the dynamic version. Here also the search operation is important: given a key, retrieve the associated element. But in addition to the search operation we want to be able to perform inserts in real time, adding a new pair to the collection, and we want to be able to remove an existing pair, or item, in real time. So we're going to take a look at structures for both static and dynamic dictionaries.

Now, if all you want to do is the standard operations that I've listed here, then perhaps the best way to go is to just use a hash table. Hopefully you've all seen hash tables before in an undergraduate class, and I really don't intend to go into any aspects of hash table design here. But you probably recall from what you've studied that hash tables give you order-one expected performance for the search, insert and remove operations. So get, put and remove take order-one expected time, and certainly in practice this is what you see: the performance really doesn't depend upon how many pairs you have in your dictionary, it's a function more of something called the density, what fraction of the table is actually occupied, and the performance is exceptionally good. You actually see order-one performance, and that's kind of hard to
beat. So as far as expected performance goes, this would be the best way to go. Now, the worst-case performance of course is another matter. You could have a pathological collection of items for which the searches, inserts and removes actually run in order-n time, where n is the number of items in your collection. For most situations you're going to see order-one behavior, but there would be some extreme pathological cases where you may get that kind of worst-case performance, and if you really have to guarantee that you're running in some amount of time independent of the collection, then you would have a problem providing that guarantee with hash tables.

Now, if you're trying to do operations other than the standard operations that I put up, then hash tables perform miserably. For example, suppose we change the search from an exact-match search, where you're given the key and you want to find an item with exactly that key, to something which says: I'm really not looking for an exact match, I'm looking for what you might call a nearest match. For example, if you have numeric keys, we might be interested in finding the element whose key is greater than or equal to some given key, and is the smallest such key satisfying this property. In a moment we'll see an application where we do want to answer this type of query. Trying to answer this type of query using a hash table is going to take you order-n time; there's really no way to take the given key, hash it, and expect to land anywhere near the answer you're looking for in the table.

Another application where hash tables would do miserably is if you also want to support indexed
operations. For example, I might say: give me the element with the third smallest key, or the tenth smallest key, or remove the item with the fifth smallest key. Operations of this type cannot be done efficiently using hash tables.

All right, so let's take a look at a few applications. The first one is going to be one which requires you to find the element with the smallest key greater than or equal to a specified value, and one place where we use this kind of query is in implementing a bin packing algorithm, specifically the best-fit algorithm. If you go back many lectures, we talked about different bin packing algorithms, and basically we said that in the bin packing problem you're given n items, each item has a size associated with it, you have a whole bunch of bins, and each bin has a capacity associated with it. The objective was to pack these n items into the smallest number of bins. We saw that that was an NP-hard problem, and therefore one typically uses heuristics. There were basically two families of heuristics we talked about: one was the first-fit heuristic and the other was the best-fit heuristic. We saw that the first-fit heuristic could be implemented efficiently using a tournament tree, and at that time I simply said we can do best fit efficiently too, in n log n time, and we'd look at that later. This is the "later" where we're going to see how to do best fit efficiently.

So in the best-fit heuristic, if you recall, the strategy is: you look at the items that have to be packed one at a time in some order, and for the item you're currently looking at you find the best bin into which it fits. What that means is, from the bins that are
currently in use, you find the candidate bins that have enough space to accommodate this item. If there are no candidate bins, then you just start a brand-new bin. If there are candidate bins, then from the candidate bins you pick the one that has the smallest available capacity.

Here's an example: four items to pack, requiring respectively four, seven, three and six units of space, and you have bins of capacity ten. We first take the first item, which needs four units of space. At present there's no active bin, so there's no candidate bin; you start a new bin and place the item into it, so now this bin has six units of space left. You look at the next item; it needs seven units of space. There's no candidate bin, so you start a new bin and place the item in there. Then you look at the next item, which needs three units of space. Both of these bins are candidate bins, they both have adequate space available, so from the candidate bins you pick the one with the smallest available space, the bin with three units left, and you pack the item into that bin. Then you go to the next item; it needs six units of space. Of your active bins, only one candidate bin has enough space, and so you put it into that candidate bin. In this particular case you end up with an optimal packing.

So the query you're asking each time you look at an item to pack is: from the active bins, find me a bin whose available space is the smallest amount greater than or equal to the amount required. That's the nearest-match query that I put up a little while ago. To implement this we'll make use of a dictionary structure called a binary search tree, again something I'm assuming you're familiar with from an undergraduate class.
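The best-fit walkthrough above can be sketched in a few lines. Here a sorted Python list plus binary search stands in for the balanced binary search tree we are about to build, so each update is O(n) rather than O(log n), but the nearest-match query is exactly the same:

```python
import bisect

def best_fit(items, capacity):
    """Pack items with the best-fit heuristic; return the remaining
    capacities of the bins used, in sorted order."""
    remaining = []  # sorted remaining capacities of the active bins
    for s in items:
        # nearest-match query: smallest remaining capacity >= s
        i = bisect.bisect_left(remaining, s)
        if i == len(remaining):           # no candidate bin: open a new one
            bisect.insort(remaining, capacity - s)
        else:                             # remove, reduce the key, reinsert
            r = remaining.pop(i) - s
            bisect.insort(remaining, r)
    return remaining
```

On the lecture's example, `best_fit([4, 7, 3, 6], 10)` opens two bins and fills both completely, matching the optimal packing found above.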
Okay, so we're going to use a binary search tree, and in the binary search tree each node is going to hold one of the items in the dictionary; each item represents one of the active bins. The key is the available capacity of the bin, and the bin index is going to be the element associated with the key. Now, in general we don't have a pure dictionary here, because it's possible for two bins to have the same available capacity, so the keys may not be distinct. So really we're going to have to extend our notion of a binary search tree to allow for duplicates.

All right, so to pack an item whose space requirement is s, we'll be looking for the bin with the smallest available capacity greater than or equal to s; that's the nearest-match query I mentioned earlier. Once you've found that bin, if there is such a bin, then we'll reduce the available capacity of that bin by the amount you utilize, and the way you can do that is: you first do a remove, removing the old bin from your collection, reduce its key, and then insert it back in. So you have a get operation, then a remove and a put. If there's no candidate bin, then you just start a brand-new bin, reduce its capacity by s, and put it into the collection.

All right, so the idea is to use a binary search tree. In a binary search tree, if you recall, if you look at any node, then the keys in the left subtree are less than the key in that node, and those in the right subtree are greater than the key in that node. Now, in our case we want to allow for duplicates, and to handle duplicates we could go with one of two different strategies. One is to relax the requirement which says
the left subtree has smaller keys and the right subtree has bigger keys, to: the left subtree has keys that are less than or equal to, and the right subtree has keys that are greater than or equal to. So if you have duplicates there's still a place for them; they could be on the left or the right side. An alternative strategy would be to say that if you have many bins whose key is 20, so you've got many bins that have 20 units of space available, then at that node I maintain a chain of those bins; as far as the top-level tree structure is concerned, everybody still has different values. Depending upon your application you may go with different strategies to generalize from the distinct-key requirement to allowing duplicates; in this particular case either of these two schemes would work.

All right, so here I've got an example. Let's assume that each node represents only one bin. I've got a total of about twelve bins represented here, and the amount of space available in each bin is shown; the bin index is not shown. Now suppose you are packing the next item, and the next item, let's say, requires 23 units of space, so you'd like to find the best-fit bin. Visually we can see that the best-fit bin for 23 units of space would be 25: it's got enough space, and of the bins that have enough space (25, 35, 30 and 40) it's the one with the smallest available space, so that's where you'd like to put the item.

To find this bin efficiently in this tree, you start at the root. The root represents a bin that has 20 units of available space; that's not enough. From that information you also know that none of the bins in the left subtree are of any use to you either:
they all have less space than 20, so we can ignore all of those; we only need to move into the right subtree. The root of the right subtree has 40 units of space; that's adequate, so now we have a candidate bin. There's no point looking into its right subtree, not because it's empty (in this case it is), but because if there were anybody in the right subtree, the amount of space they'd have would be more than 40, so they cannot be better than this bin. So we move into the left subtree. The root of the left subtree, 30, has adequate space, so it has to be better than the best you've found so far, and it becomes your candidate. There's no point going into its right subtree, everybody there is worse than the 30, so you move into the left subtree. The amount of space this node, 25, has is adequate, so it's better than the best you've found so far; you update your best candidate. There's no point going into its right subtree; you only go into its left.

Similarly, suppose you were looking for, let's say, an item that needed 17 units of space. You start at the root; 20 is adequate, so that gives you a candidate, and there's no point going into its right subtree, the candidates there are worse. So for 17 we start from here having found a possible bin; you move into the left subtree. That's got 10; that's not adequate, and there's no point going into its left subtree, so you move into the right subtree. This has got 15 units of space; that's not adequate, no point looking in its left subtree, move into the right. This one, 18, has enough space; it's got to be better than the best candidate found so far, so you update your best candidate to this one. No point going into its right subtree; you'd only go into its left. So, by walking down the tree in time proportional to the height of the structure, we can find the best bin.
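The walk just described is easy to code. This is a minimal sketch on a plain (unbalanced) node structure; the tree used in the example below is shaped like the lecture's, with only the visited nodes included:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def best_fit_bin(root, s):
    """Return the smallest key >= s, or None if no bin is big enough.
    A candidate's right subtree holds only larger keys, so we never
    descend right of a candidate; a too-small node's left subtree
    holds only smaller keys, so we never descend to its left."""
    best = None
    node = root
    while node is not None:
        if node.key >= s:
            best = node.key   # better than any candidate found so far
            node = node.left
        else:
            node = node.right
    return best
```

For example, on a tree rooted at 20 with 40, 30, 25 down the right side and 10, 15, 18 down the left, `best_fit_bin(root, 23)` returns 25 and `best_fit_bin(root, 17)` returns 18, matching the walkthrough.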
If you find no bin, then you just create a new node and insert it into the structure. If you do find a bin, for example you find the 25 here, then you can remove it from the tree; we were packing 23 at the time, so now you've got two units left, and you reinsert a node with capacity 2. Right, so that'd be one place where we could use this nearest-match query.

Yeah? Your question is: can't we do a decrease-key? We can certainly define a decrease-key operation for dictionaries. But if you were to decrease the key here, you would probably have no alternative, once you find that you have violated the binary search tree property, but to remove the node and reinsert it. If the reduction was such that you did not violate the binary search tree property, which in this case would mean that after the reduction this node was still bigger than 20, then you could leave it where it is; but in other cases you would have to remove it and put it back in.

All right, so consider the complexity of this scheme, assuming you employ a balanced binary search tree. The balanced binary search trees (we're going to see a couple of these later) have a height guarantee that is logarithmic in the number of nodes. The number of nodes will never exceed the number of items you're trying to pack, because at worst you'll put one item per bin; so if you're packing n items, the number of nodes is order of n. For each item to pack you would do a get operation to find the bin that best fits, and then you might do a remove of that bin and put it back in, or if there's no candidate you might just do a put. So order of n is a bound on the
number of get, put and remove operations that you might do. Each of these operations takes log n time with a balanced binary search tree, and so you would have an n log n implementation of the best-fit heuristic.

Okay, so even though hash tables are great for dictionaries provided you stay with the standard operations that were listed, there are a whole bunch of applications, notably scientific computing applications, where you want to do other types of things with your collection of pairs, and for those, tree-based structures work out much better. One example is best fit. Any questions about using binary search trees to implement best fit? Yeah?

Okay, the question is: suppose the next item you're packing required five units of space. Then, as before, we would search for the best bin, and this bin with six units of space would turn out to be the best bin. Having discovered that, I would perform a remove operation, which says: remove this node from the structure. The remove operation has to be able to remove from anywhere in the structure, so the fact that this node has left or right children is really not important as far as the remove is concerned; it knows how to take something out, and we'll take a look at remove operations later. So somehow you'll get the six out, leaving behind a proper binary search tree with eleven items; you'd reduce the six to one and then put it back in.

All right, other questions? Let's take a look at the other item that we put up when we were talking about why we may be interested in structures other than hash tables, and that had to do with performing operations by index. So there
may be times when you want to retrieve elements given the key, but there may also be times you want to retrieve elements given an index. While the former can be done efficiently using hash tables, the latter cannot.

So to do indexed operations we'll employ something called an indexed binary search tree. Fundamentally it's a binary search tree which is augmented with one field per node, and this field keeps track of the number of nodes in the left subtree. So, for example, if we had this binary search tree, then with each node, in addition to everything else you'd keep for a normal binary search tree, we keep a value which says how many nodes are in the left subtree of that node. For example, this node here has nothing in its left subtree, so its left-size value is zero; the left-size value there is zero; this node has got one node in its left subtree; that one's got one; this node has got four; that one's got zero; that's got zero; up here we've got seven in the left subtree; this node's got zero; that node's got none; this one here's got one; and this one here has got three in its left subtree. So each node keeps its left-size value, the number of nodes in its left subtree, and that gives you something called an indexed binary search tree.

Now, there's a correspondence between those left-size values and the rank of each element in the collection, where we define rank as an element's index when you look at the items in ascending order. So, for example, if these were your keys, then the rank of the smallest item is zero; the ranks are assigned left to right beginning at zero: 0, 1, 2, 3, 4, 5, 6, 7. So rank
just gives you your position in ascending order. Something you can prove is that the left-size value of a node is the rank of the item stored in that node relative to just that subtree. Let's see what that means. The left size at the root here is 7, and that tells us that if you look at this whole collection of elements, the item stored at the root has rank 7. The left size out here is 4, and that says if you just look at this subtree, this is the item with rank 4: within the subtree the items have ranks 0, 1, 2, 3, and then this one is 4. The 0 here tells us that if you look at just this subtree, this item has rank 0, because that subtree has got 15 and 18. Similarly, this 3 here tells us that if you look only at this subtree, this item has rank 3: this is rank 0, rank 1, rank 2, rank 3. So if you look only at a subtree, the left-size value stored in its root gives you the rank of that element within the subtree. That's something you should be able to convince yourself of back home, and it's important in performing operations by index.

Suppose you want to get the item whose rank is, let's say, 8. We start at the root, and the left size tells us that the root's rank is 7. So we know that the item whose rank is 8 is not the item in the root, and also not anybody in the left subtree, because the ranks there are smaller than the rank of the root relative to the whole collection. So the item whose rank is 8 has to be in the right subtree. Now the question is: what is its rank within the right subtree? When you move into the right subtree, you're going to be bypassing 8 elements of rank smaller than the one you're looking for, because you've got the 7 items in the left subtree plus the root itself. So we're bypassing eight
items of smaller rank, and so the rank of the item you're looking for, within the right subtree, has to be 8 less than the rank you were looking for before. So I decrease my target rank by 8, which is one plus the left-size value stored at the root. Now I'm looking for the item whose rank is 0 in this subtree. This node's rank within the subtree is 3, so the item I'm looking for is not this one, and it's not anybody in its right subtree, because those have higher rank; it's got to be somebody in the left subtree. When I move into the left subtree, I again ask: what is the rank of the item I'm looking for within the left subtree? When I move into a right subtree I bypass a whole bunch of items with smaller rank, and that causes me to adjust the target rank; but when I move into a left subtree I don't bypass anybody with smaller rank, and so I do not adjust the target rank. If you were looking for rank zero up here, the rank is still zero with respect to the elements down here. The rank posted here is one, so this is not the item I'm looking for; I'm looking for a smaller rank, so I move into the left subtree, and the target rank is still zero. Now I get a match in the ranks: this is the item I'm looking for.

Let's try another one. Suppose you're looking for, say, rank zero. Okay, that's kind of easy: you start at the root looking for rank zero; it's smaller than seven, so you move into the left subtree, and when you move into the left subtree you don't adjust the target rank. You're still looking for rank zero; it's smaller than four, so you move into the left subtree, and again you don't adjust the
target rank. It's smaller than one, so you again move into the left subtree without adjusting the target rank, and you get a match at this leftmost node. Now suppose you're trying rank 3. You're sitting at the root; 3 is smaller than 7, so you move into the left subtree and don't adjust the target rank. 3 is smaller than 4, so you move into the left subtree and don't adjust the target rank. 3 is bigger than 1, so when you move into the right subtree you've got to adjust the target rank, and the adjustment is to subtract off the left size plus one; so we subtract two, and now we're looking for the item whose rank is one. You get a match: this node holds the item whose rank is 3.

Okay, so to find the item with a given index, you simply move down the tree. Whenever you move to the right, you adjust the target rank by subtracting left size plus one; when you move into the left subtree, you don't do any adjustment. When you find a match between the target rank and the rank posted in the node, you've found what you're looking for. If the height of the structure is logarithmic, it takes you log n time to find the element given its rank. Once you've found the element you can do a remove, and the removal algorithms, as we'll see later, also work in logarithmic time if you're dealing with balanced trees.

If you want to insert an item, the insertion is done by key, because you have to maintain the binary search tree property. So once you're given a pair to insert, you look at its key and based on that you decide where to put it. For example, if you wanted to insert somebody with value 43, it's bigger than 20 and it's bigger than 40, so you're going to put it down here. In this case, since 43 is bigger than 20, I know I'm going to do the insertion on
the right side, so this node's left-size value is not going to be affected. Again, here I'm moving to the right side, so the 3 is going to remain 3. On the other hand, if you were, let's say, inserting a 32, you'd come to the root; its 7 is going to stay 7 because you went to the right. Now you're going to go this way, which means this value goes up by 1, so this 3 will become a 4; you're going to the right, so that 1 stays at 1; you're going down here, so this value goes up by 1. So the only nodes whose left-size value changes are those on your insert path, and that's really what enables you to do the insert fast, in logarithmic time. If you actually stored physical index values, or ranks, on these nodes rather than left-size values (if I said this node's rank is zero, that one's is one, this one's is 2, and that's 3, that's 4, and 5 and 6 and so on), then if you insert somebody here, everybody's rank changes, and there's no way to do the insert in less than order-n time. So the reason you keep left-size values rather than ranks on these nodes is that only a logarithmic number of left-size values change when you do an insert or a delete, whereas everybody's rank can change when you do an insert or a delete.

All right, so this structure is good for get by index, and remove by index can also be done in logarithmic time. Yeah? Actually, you can; I guess this is just a restatement of what I said before: when you go into the left subtree the target index doesn't change, and when you go into the right subtree you decrease the target index by the left-size value plus an additional one.

You can use indexed binary trees (not binary search trees, but indexed binary trees) to represent linear lists.
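A minimal sketch of the indexed binary search tree ties the pieces together; it is unbalanced, so the logarithmic height guarantee of the AVL trees mentioned above is omitted here:

```python
class INode:
    """BST node augmented with left_size, the number of nodes in its
    left subtree.  We store left_size rather than the node's rank: a
    single insert can change every rank, but it changes left_size only
    along the insert path."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.left_size = 0

def insert(root, key):
    # Insertion is by key, preserving the binary search tree property;
    # left_size is bumped on exactly the path nodes whose left subtree
    # receives the new node.
    if root is None:
        return INode(key)
    if key < root.key:
        root.left_size += 1
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def get_by_rank(node, r):
    # Move left: target rank unchanged.  Move right: we bypass the
    # left subtree and the node itself, so subtract left_size + 1.
    while node is not None:
        if r == node.left_size:
            return node.key
        if r < node.left_size:
            node = node.left
        else:
            r -= node.left_size + 1
            node = node.right
    return None
```

Inserting a set of keys and then reading ranks 0, 1, 2, ... back with `get_by_rank` returns the keys in ascending order, which is exactly the rank correspondence argued above.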
42:07 lists in a linear list items don't have 42:13 a key associated with them they just 42:14 have an index and the index is the same 42:17 as the rank okay and so the operations 42:20 that you perform there are like you get 42:24 by index you insert by index and you 42:27 remove by index 42:32 so the strategy is the same the items 42:35 are placed into the tree such that if 42:38 you do an inorder traversal you would 42:40 hit the items in list order and then the 42:46 same definition of left size as we had 42:48 before just there are no keys alright so 42:59 in a linear list probably one of the 43:01 first data structures anybody studies 43:03 the operations that are done are really 43:06 get by index put by index and remove by 43:09 index okay and the typical ways to 43:13 represent a linear list one is just map 43:15 them into an array in the obvious 43:16 fashion a linear list looks like an 43:18 array of elements just put it in the way 43:20 it looks then you can do a get in order 43:25 one time if you want to get the i-th element 43:27 you access the i-th position in the 43:29 array but a put and a remove take order 43:34 n time because you've got to move people 43:36 around to make space for the new item 43:38 you're putting in if you use a chain 43:42 then everything takes order n time if 43:49 you use an indexed balanced search tree 43:52 this particular one is called an AVL 43:54 tree we'll see these later okay then all 43:57 of your operations will run in 43:59 logarithmic time okay so you're inferior 44:04 to arrays as far as get is concerned 44:07 because you don't have the random access 44:08 capability but you're doing a lot better 44:11 on the puts and the removes instead of 44:14 order n you're taking logarithmic time 44:20 to get a feel for what that means 44:23 suppose you started off with an empty 44:25 linear list and you performed 40,000 44:30 inserts and then 44:35 you do let's say 40,000 gets and then 44:40 you do 40,000 removes and bring it
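The array representation described here, O(1) get but O(n) put and remove because elements must be shifted, can be sketched as follows. This is a minimal illustrative sketch; the class name `ArrayLinearList` and its fields are assumptions, not something from the lecture.

```java
import java.util.Arrays;

// Sketch of the obvious array mapping of a linear list: get is a direct
// array access, but put and remove must shift elements to open or close a gap.
public class ArrayLinearList {
    int[] a = new int[16];
    int size;

    int get(int i) { return a[i]; }          // O(1): random access

    void put(int i, int x) {                 // insert x at index i: O(n)
        if (size == a.length) a = Arrays.copyOf(a, 2 * size);
        System.arraycopy(a, i, a, i + 1, size - i);  // shift a[i..] right by one
        a[i] = x;
        size++;
    }

    int remove(int i) {                      // remove and return element i: O(n)
        int x = a[i];
        System.arraycopy(a, i + 1, a, i, size - i - 1);  // shift left to close gap
        size--;
        return x;
    }
}
```

The `arraycopy` calls are exactly the "move people around" cost the lecture refers to: on average about half the list is shifted per put or remove, which is where the order-n behavior comes from.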
back 44:41 down to empty and you kind of clock to 44:44 see how much time did the inserts take 44:45 how much did the gets take and how much 44:47 did the removes take okay all right so 44:52 some results obviously done a few years 44:55 ago using a somewhat slow machine and 44:58 using Java all right so for the 40,000 gets 45:06 if you have random access that's order one 45:08 that's like 5.6 milliseconds in a chain 45:14 okay now we're accessing each element 45:16 once so each of the 40,000 elements is 45:18 going to get accessed once and it's taking around 57 45:22 seconds 45:24 if you had an indexed AVL tree you're 45:26 taking 63 milliseconds so that was log n 45:29 we certainly didn't expect it to 45:32 outperform an order-one thing so you're 45:35 taking about ten times a little more 45:36 than ten times if you look at the 45:39 average case here for puts so the average 45:42 for puts here is five point eight seconds 45:44 versus 115 seconds versus 392 milliseconds the average 45:49 case for a remove here is five point 45:51 eight seconds 149 seconds and one point 45:54 five seconds if you look at the cost of the 45:57 whole sequence 40,000 inserts 40,000 45:59 gets 40,000 removes so here you're 46:03 looking at five point six milliseconds 46:04 plus five point eight seconds plus five point 46:06 eight seconds maybe a little bit below twelve 46:08 seconds over here you've got sixty three 46:12 milliseconds 392 milliseconds and 1.5 seconds so 46:16 maybe you've got about one point nine to 46:18 two seconds okay for the mix so about 46:21 two seconds for the mix here versus 46:23 about 12 seconds for the mix out here of 46:28 course if you go to larger lists 46:31 100,000 200,000 the difference is going 46:34 to be a lot more 46:39 okay alright so you can use tree 46:44 structures for dictionaries to perform a 46:48 variety of operations far more 46:51 efficiently 46:52 than you can perform these using hash 46:55 tables or for that matter using say 46:59 chains or arrays again it's important to 47:04 keep in mind that if your objective
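The experiment described, clocking n inserts, then n gets, then n removes back down to empty, can be sketched roughly as below. This is an assumed reconstruction of the methodology, not the lecturer's actual benchmark code, and on a modern machine the absolute numbers will differ from the quoted results.

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch of the timing experiment: build a list up to n elements,
// read every element once, then remove everything, timing each phase.
public class ListBenchmark {
    // Returns {insertNanos, getNanos, removeNanos} for the given list.
    public static long[] time(List<Integer> list, int n) {
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) list.add(0, i);        // n inserts at the front
        long t1 = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) sum += list.get(i);    // n gets, one per element
        long t2 = System.nanoTime();
        for (int i = n - 1; i >= 0; i--) list.remove(i);   // n removes, back to empty
        long t3 = System.nanoTime();
        return new long[] { t1 - t0, t2 - t1, t3 - t2 };
    }

    public static void main(String[] args) {
        long[] ns = time(new ArrayList<>(), 40_000);
        System.out.printf("inserts %d ms, gets %d ms, removes %d ms%n",
                ns[0] / 1_000_000, ns[1] / 1_000_000, ns[2] / 1_000_000);
    }
}
```

Swapping in a linked-list or indexed-tree implementation of `List` in place of `ArrayList` reproduces the kind of array-versus-chain-versus-tree comparison the lecture's table shows.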
47:08 is to represent the pure dictionary you 47:10 just want to do exact match gets puts 47:13 and removes you're going to have a very 47:17 very hard time outperforming a hash 47:19 table and so the tree structures that 47:24 we're going to study would be 47:25 recommended for those applications only 47:27 when you have to absolutely guarantee 47:29 some worst-case performance when you 47:32 can't take the risk that you may not 47:35 actually see the expected performance 47:37 okay but for the other applications we brought 47:40 up where a hash table is not an option you 47:44 really have to go with search tree type 47:47 structures and our focus here and for 47:51 the next several lectures really is 47:53 going to be looking at tree structures 47:55 for dictionaries and again even though 47:59 we spend a lot of time on that I do want 48:01 you to keep in mind that hash tables are 48:03 there they're very very valuable and 48:05 there's a place for them all right any 48:09 questions okay thanks 48:18 you