Transcript

All right, let's resume our discussion of B-trees. In the last lecture we looked at the definition of B-trees, and we also saw how to insert an item into a B-tree. Today what we plan to do is an analysis of the number of disk accesses made during an insert, then take a look at how you do a delete from a B-tree, and then take a look at a suitable data structure to represent a node in a B-tree. Currently we are kind of working with the implicit assumption that B-tree nodes are going to be represented as arrays, but that may not be the best way to represent the node of a B-tree. Do you have any questions on things we've done so far before I do the balance of the B-tree material?

All right, so the things we're going to do today: worst- and average-case accesses during an insert, a look at the delete operation, the analysis of the delete, and a structure for B-tree nodes.

All right, so let's take a quick look at the insert operation. If you're trying to insert, say, a key 14, and if you assume that the whole B-tree is sitting on a disk, then we need to do one access to get the root, one access to get this node, and then an access to get that node. So three read accesses; we'll modify the node, and we'll need to write it back, so a total of four. Three reads and one write: four accesses to insert the 14. If you insert a 2, it's going to come in over here, so again three read accesses on the way down. Then, when you put it in here, we're going to split this node, so we'll write out the new version of this node, we'll write out the new split node, and we have to write out this one because that gets changed. So when a node gets split, you have to write out the old node
and the new version; and then we're going to kind of back up here and do something, and in this case we just have to write this one out. If you insert an 18, that's going to come into this node here, so three read accesses on the way down. This node gets split, so we'll write out two nodes at this level; then this one will get split, so we'll write out two nodes at that level; then this one will get split, so you write out two nodes at that level; and then you create a new root, so you write out one. So whenever there's a split, at the level at which there's a split you do two writes, and then after the last split you do a write at its parent level.

All right, so assume that you have enough memory to hold H nodes, where H is the height of the B-tree. On the way down you read in H nodes, and then on the way back up, when you do the splits, you don't have to read these nodes back in; they're sitting in memory. So: H read accesses on the way down. If the number of splits is s (and you can't have more than H splits, so s is at most H), there are two writes at each level at which there's a split, there are s such levels, and then one write of the parent of the last node that got split. So the total number of accesses is H reads and 2s + 1 write accesses; s cannot be any more than H, so it works out to 3H + 1.

Now, if you're dealing with very large dictionaries and you're working with an M of the order of, say, 200, then your H might be as big as 4, 5, or 6. If you're looking at an H of 5, this works out to 16 disk accesses, which is still a large number of disk accesses to be making. Now, the nice thing is that even though in the worst case you might be making, say, 16 disk accesses when H is 5, on average you don't make a large number of disk accesses.
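The worst-case count above can be checked with a quick calculation. This is a sketch in Python; the function name and the way the number of splits is passed in are mine, not the lecture's:

```python
def insert_disk_accesses(h, s):
    """Disk accesses for an insert into a B-tree of height h when s
    nodes split on the way back up (0 <= s <= h): h reads going down,
    two writes per split level, and one write at the parent of the
    last split (or of the leaf itself when nothing splits)."""
    assert 0 <= s <= h
    return h + (2 * s + 1)

print(insert_disk_accesses(5, 0))  # no splits: h + 1 = 6 accesses
print(insert_disk_accesses(5, 5))  # every level splits: 3h + 1 = 16
```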
Typically, if all you're doing is you start with a dictionary and you keep adding items one at a time, then the average number of disk accesses tends to be rather small. So let's take a look at an analysis under those conditions.

Suppose you start with an empty B-tree and you just do a large number of inserts. Say we insert n items, and when you're done you end up with a B-tree that has P nodes in it. Then the total number of splits that took place across these n inserts is at most P minus 2. If the number of nodes is more than 2, then we know that the root has split at least once, and when the root splits you create 2 nodes; when every other node splits you create only one new node. So if you have P nodes when you're done, the root has split at least once, and that created two new nodes; each other split might have created only one new node, and so the total number of splits is at most P minus 2. That's the total number of splits that took place across all inserts.

Next, the number of pairs that you must have inserted if you have P nodes: the root has got to have at least one item in it, and the remaining nodes must have at least M over two minus one items each, and there are P minus one of those nodes. So you must have at least this many items in your tree, which means you must have done at least that many inserts. So the average number of splits is the total number of splits (we put an upper bound on that: P minus two) divided by the total number of inserts, which must be at least this many. This is really an upper bound on the number of splits per insert. If you look at this, the P minus two over P minus one is always going to be smaller
than one, so I can throw that out and get an even weaker, but simpler, upper bound. If you try this out for the M value we've been playing around with, M of 200, the average number of splits per insert works out to at most one over 99. And what that means is that the average number of disk accesses per insert, under this model of just a long string of inserts, works out to be (let's see, we had H plus 2s plus 1) H plus 2 over 99 plus 1, which is just slightly more than H plus 1.

H plus 1 is the bare minimum you could expect, because every insert requires you to go all the way down to the bottom of the B-tree to make sure you don't already have an item with that key, and then you would do one write after you modify the leaf into which you put it. So you can't do an insert with fewer than H plus 1 disk accesses; H plus 1 is the minimum for the average number of disk accesses for an insert, and the algorithm that we have comes very close to that. So at least from the average standpoint you're doing pretty well. Some inserts may take a long time, they may make many disk accesses, but the average is very small under the model that you just do a long string of inserts.

Let's turn our attention to delete. Yes, a question: why do we need a lower bound on the number of pairs? Well, if I just take n and stick it in over here, P minus 2 over n, it doesn't really tell me anything. By putting in the lower bound I was able to eliminate n, which is really not known to us. I don't know P, I don't know n; all I really know is the order of my B-tree. So I've been able to get rid of all of the unknowns.
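The derivation just given can be checked numerically. This is a sketch (function name and parameters are my own), using the usual minimum of the ceiling of M/2, minus 1, keys per non-root node:

```python
from math import ceil

def avg_insert_accesses_bound(m, h, p):
    """Upper bound on average disk accesses per insert when inserts
    into an initially empty B-tree of order m leave p > 2 nodes:
    at most p - 2 splits ever happen, and p nodes force at least
    1 + (p - 1) * (ceil(m/2) - 1) inserts."""
    t = ceil(m / 2) - 1              # minimum keys in a non-root node
    n_min = 1 + (p - 1) * t          # fewest inserts that yield p nodes
    avg_splits = (p - 2) / n_min     # upper bound on splits per insert
    return h + 2 * avg_splits + 1    # h reads + (2s + 1) writes

# For m = 200 the splits term is below 1/99, so the bound is just
# barely above the unavoidable h + 1 accesses per insert.
print(avg_insert_accesses_bound(200, 5, 1000))
```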
Still, I come up with something that says it's about H plus 1 disk accesses on average.

All right, turning our attention to the delete operation: I'll illustrate how delete works by going through an example using a B-tree of order 3, namely a 2-3 tree, and then from that we can generalize how it works on all B-trees. So let's look at that example, a 2-3 tree. Suppose you want to delete the 8. Now, when you delete from a B-tree, you always transform a deletion from an interior node into a deletion from a leaf. We're really not going to take the 8 out and then try to manipulate the tree to make it look like a B-tree from that state; instead, what I'll do is find a suitable replacement for the 8 from the current B-tree. A suitable replacement for the 8 is either the largest item in the subtree to its left (the 6: if I put the 6 there, the search tree properties are still satisfied) or the smallest item in the subtree to its right (in this case the 9: I could move the 9 up there). That's similar to how you handle a delete from a binary search tree when you're deleting from a degree-2 node. So I could move the 6 up there, or I could move the 9 up there. Suppose we decide to move the 6 up there. Once you move the 6, you're left with a node whose degree is two, with the two external nodes hanging off of it here; you don't have to do anything else. So the first thing is to transform it into a delete from a leaf, and then you only need to consider deletion from a leaf.

Our strategy will be to replace the item with the largest item in the subtree to its left. So, for example, if you are deleting the four, then we'll replace the four with the three, the largest to its left. If you were
deleting the two, we take the largest in the subtree to its left, the 1, and then you're left with the problem of removing from a leaf, because the largest in the subtree to the left will always be in a leaf node. So we really need to consider only deletion from a leaf.

Suppose you're removing the 16. The simplest case is that the leaf currently has more items in it than it needs, and this node here has one more item than it needs, so removing either the 16 or the 17 is the simplest case of delete. In this case all you do is take the item out of this node and write the new version of the node back. So that's the simplest case: the leaf you're taking it out of has more items than it needs.

Next, suppose you're removing from a leaf that doesn't have more than it needs. For example, here we're removing from this 2-node. Once you take an item out of a node that doesn't have more than it needs, the node becomes deficient: it has one less than what it needs. In the context of a 2-3 tree, one less than what it needs means it becomes empty, but in a B-tree of higher order it may not become empty; it just now has M over two minus two instead of M over two minus one items.

So when we take the 17 out of here, the strategy is going to be to try to borrow an item from a nearest sibling. This node has two nearest siblings, one on the right and one on the left. No node can have more than two nearest siblings; you can have one on each side, only one on the left, or only one on the right. This node has only one nearest sibling, on the right; this one has only one nearest sibling, on
the left. We try to borrow from a nearest sibling, and to minimize the worst-case disk accesses, if a node has two nearest siblings you only look at one of the two to see if it has more than what it needs. So we have a strategy which says that a node like this one always looks only at the nearest sibling to the right. All right, so this node looks at the nearest sibling on the right and discovers that that node has more than it needs; if it has more, then you can borrow. To borrow, you can't just take this item, the 30, and put it here, because then you upset the search tree property. Borrowing is always done through the parent: the 30 moves up and the 20 moves down. And that means you've got three nodes that you have to write back; you've got to read the sibling and then write back this node, this node, and that node. So we do a borrow, because we're deleting from a node that has the bare minimum right now; you take the item out, it becomes deficient, and we fix that deficiency by borrowing from a nearest sibling, in this case the nearest right sibling. When you do the borrowing you end up with that configuration: this node had become empty, the 30 moved up, and the in-between key, the 20, moved down.

All right, now suppose you're deleting from a node that has the minimum required. Once you do the delete it becomes deficient; you check its nearest sibling (whichever one you're supposed to check, in this case the one on the right), and that nearest sibling doesn't have anything extra, it's also got the minimum. If you do a borrow it doesn't solve your problem: certainly you can do the borrow, the 40 moves up, the 30 moves down, but
now that node becomes deficient. So when the nearest sibling that you check doesn't have anything extra, instead of borrowing you do a combine. In a combine, the deficient node, the in-between key, and this minimally occupied node all combine into a single node; it's the reverse of a split. So this node, that key, and that node will combine, and you'll get a node down here that has a 30 and a 40 in it. When you do that, the degree of the parent decreases by 1, so from a 3-node it becomes a 2-node. So: we're deleting from a 2-node, you check its nearest sibling, it's not a 3-node, it doesn't have anything extra, and in that case you do a combine. When you do a combine, the deficient node combines with a node which has the absolute minimum number of items, together with the in-between pair from the parent, and you end up with that situation. When you do a combine at this level, at that level only one node has to be written out, because the other node disappeared. So this node gets written out, the new version; it went from a 2-node to a 3-node. Then you check the parent: the parent's degree goes down by 1, so it could have become deficient. In this case it doesn't become deficient; you just write out the non-deficient parent.

Let's try another example: we're going to delete the 30. It's in a node that has extra items, and whenever you're deleting from a leaf that has extra items, you just take the item out and write that node back. So here that 3-node becomes a 2-node, and things look like that.

All right, suppose you want to take out the three. It's in a 2-node, bare minimum occupancy, so once you take out the three it becomes deficient. When a node becomes
deficient, you check a nearest sibling to see if it has excess, and this one's got excess. You take out, in this case, the smallest item, move it up to the parent, and move the in-between key down; that's a borrow. So this node will end up as four, five, six. So: deleting from a 2-node, in this case you check a sibling, it's a 3-node, it's got more than it needs, you do a borrow through the parent, and you end up in that configuration.

All right, suppose you want to remove the six. Once you take it out, the node becomes deficient, and you want to fix the deficiency, so you check a nearest sibling. It only has a nearest sibling on the left, so you don't have a choice; you go check this one, and this node doesn't have anything to give you, it's at minimum occupancy. So in this case you do a combine: the nearest sibling, which is at minimum occupancy, the node that's deficient, and the in-between key all combine to form a single node at this level. So I get a node which contains four and five, and the degree of the parent will drop by one.

All right, now suppose you want to remove the 40. Again, you come out here; it's in a node with the bare minimum, so once you take out the 40 it becomes deficient. You check its nearest sibling, and its nearest sibling is the nine, which has nothing extra, so it's a combine: this node, the in-between key, and the deficient node all combine to form a new node out here, and the degree of the parent decreases by one.

So when you do a combine, the degree of the parent decreases by one, and if the parent becomes deficient as a result, you need to fix that deficiency. In this case the parent has become deficient, and to fix the deficiency I apply the same process I applied at the lower
level: check a nearest sibling, and if the nearest sibling has more than it needs, then you borrow. So, for example, if this had been a 3-node, maybe in here there is a seven; if there's a seven, there's also a subtree hanging off of it. Then I'd move the seven up here, I'd move the eight down here, and this subtree would be transferred to this side. So when you do a borrow at an interior level, it's not just that the key from here moves up; there's a subtree hanging off (which was null in the previous, leaf-level example) and that subtree also moves to this side. In that case you'd have an eight, this would be its right subtree, and the subtree that came from the other side would be its left subtree. So again you try to borrow; in this particular case you can't borrow, and when you can't borrow, you do a combine. So I need to do a combine: the node which is just minimally occupied, the in-between key, and the deficient node combine into a single node, and that decreases the degree of the parent.

So the deficiency has propagated up another level. Then you go up to that level and try to do a borrow, so you check its nearest sibling; in this case there isn't one. If you can do a borrow, you're fine: a borrow terminates the backward process. If you have to do a combine, then the deficiency propagates one more level up. In the worst case, like here, this deficiency propagates all the way up to the root. You want to check the root's sibling to do a borrow, but it doesn't have a sibling; you know you've hit the root, and at this point you just take that root and throw it away, because that root can be deficient only if it has zero keys in it, whereas the requirement for a root is a degree of at least two.
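The bottom-up pass just described (try a borrow; otherwise combine and move up; at the very top, discard an empty root) can be sketched as a small decision routine. Everything here, including the function name and the idea of passing in each level's sibling occupancy, is my own illustration, not code from the lecture:

```python
from math import ceil

def fix_deficiency_plan(sibling_keys, m):
    """Given the key count of the nearest sibling checked at each
    level (starting at the deficient leaf and moving up), return the
    actions of the bottom-up pass in a B-tree of order m. Assumes the
    deficiency would reach the root if every listed sibling is
    minimal; illustrative sketch only."""
    minimum = ceil(m / 2) - 1        # fewest keys a non-root node may hold
    actions = []
    for keys in sibling_keys:
        if keys > minimum:           # sibling has a spare key
            actions.append("borrow") # rotate it through the parent; done
            return actions
        actions.append("combine")    # merge with sibling + in-between key
        # the parent just lost a key; loop again to fix the parent
    actions.append("drop empty root")  # height shrinks by one
    return actions

print(fix_deficiency_plan([2], 3))     # sibling is a 3-node: one borrow
print(fix_deficiency_plan([1, 1], 3))  # everything minimal: root dies
```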
A deficient root, in other words, is an empty root, and that node can be thrown away irrespective of the order of the B-tree you're dealing with. So you throw away the root, and you end up with a B-tree whose height is one less than what you started with. Any questions about how you do a delete? Right: you have the borrow operation, and then you have the combine; the combine is really the inverse of the split that was done during an insert.

All right, so in the general situation, the steps are: if you're deleting from the interior, you transform it into a deletion from a leaf; if the leaf becomes deficient, that triggers a bottom-up borrowing and combining pass. When you do a combine, you will be combining a node that has minimal occupancy, so that's M over 2 minus 1 pairs: this is the node with minimal occupancy, this is the node that's become deficient (it has one less than minimal occupancy), and then you'll have an in-between pair from the parent. For the new node you create, you need to show that it isn't overfull. If you work out the sum of these three things for the cases when M is odd and when M is even, you'll see that in one case it works out to exactly M minus 1 and in the other case it works out to M minus 2, so the combining is feasible.

All right, so the strategy that we went through is guaranteed to work. Are there any questions about that strategy for general B-trees? Now let's look at the number of disk accesses that are needed. The minimum number of disk accesses needed to do a delete: you have to go down level by level to find the leaf from which you're doing the delete, so in this case three read accesses, and then, if you're lucky, you end up at a node that will not become deficient.
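The feasibility claim above (the merged node holds exactly M minus 1 keys when M is odd and M minus 2 when M is even, so it never overflows) is easy to verify across a range of orders. A quick check, assuming the usual minimum of the ceiling of M/2, minus 1, keys per non-root node:

```python
from math import ceil

# Keys in a combined node: deficient node + minimal sibling + the
# in-between key from the parent. The sum must not exceed m - 1.
for m in range(3, 401):
    deficient = ceil(m / 2) - 2       # one below the minimum
    sibling = ceil(m / 2) - 1         # exactly the minimum
    merged = deficient + sibling + 1  # plus the in-between key
    assert merged == (m - 1 if m % 2 == 1 else m - 2)
    assert merged <= m - 1            # the combine always fits
```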
Following the delete, you have to write that one node out, so in this case four is the minimum with which you can do the delete. If you are deleting from the interior, then the minimum would be five, because you'd have to write out this node as well as that one; but if you're deleting from a leaf, the minimum would be four. So the minimum is either H plus one or H plus two, depending upon whether you're deleting from a leaf or from the interior.

Now, if the delete requires you to do a borrow (for example, you delete the three), then in addition to the accesses you made on the way down, you would need to make one more access to read the sibling, and then you'd have to write this node out, write that node out, and that node out. So a borrow requires you to make one read and three write accesses. If you do a combine (say you take the eight out), then you still have to read in the sibling, so one read; you're going to write something out; and then you bump up to the next level. So really the expensive one is the borrow: the borrow is one extra read and three writes, while a combine, if it were to terminate here (though it wouldn't in this case), would be one read and two writes. The borrow is the expensive one, but a borrow can only be done once in a delete; the combine can be done many times.

So if we're looking for the worst case, then, making the same assumptions as before: on the way down you've got to read one node at each level; then, on the way back up, for the worst case we're going to assume that you end up doing a combine at every level, except that when you get to level two you do a borrow, because that borrow will cost you three writes. So on the way up we assume it propagates all the way up.
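Under those worst-case assumptions, the access count adds up as follows. This is a sketch of the tally (the function name is my own):

```python
def worst_delete_accesses(h):
    """Worst-case disk accesses for a delete from a B-tree of height
    h >= 2: a combine at every level on the way back up except a
    borrow at level two."""
    down_reads = h             # one node read per level going down
    sibling_reads = h - 1      # one sibling read per level being fixed
    combine_writes = h - 2     # each combine writes one surviving node
    borrow_writes = 3          # node, sibling, and parent at level two
    return down_reads + sibling_reads + combine_writes + borrow_writes

print(worst_delete_accesses(5))  # 15, i.e. 3h, versus 3h + 1 for insert
```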
So we'll be doing H minus one sibling reads; then there are H minus two writes of combined nodes, because we're doing combines all the way up except at level two, so you've got H minus two of those; and then at level two we'll do the borrow, and there will be three writes at that level. If you add up everything, it's 3H. So it's about the same as an insert: insert was 3H plus 1, and here it's 3H. Any questions about the analysis?

Let's do an average analysis, again making assumptions similar to the last one: you start with a B-tree that has some number of items in it, and then we do a long string of deletes. So it's got n items in it and there are P nodes, and we're going to delete the items one by one until it becomes empty. Our previous analysis said that the number of items has got to be at least this many if you've got P nodes, and from that we can figure out that P is at most this; we just rearrange the terms here and you get this. Then you figure out how many disk accesses you might have made to start from these P nodes and end up with zero. We will assume that every delete you do performs a borrow in addition. As far as borrows go, borrows don't change the number of nodes; combines, though, decrease the number of nodes, and every time we do a combine we decrease the number of nodes by at least one. Now, when you do a combine that goes all the way up to the top, you actually decrease the number of nodes by more than one, because even the old root disappears. But combines decrease the number of nodes by at least one.
So across all of your deletes you can't have more than P minus one combines, because combines decrease the number of nodes, and in the end you do a combine at the top level which throws away the root; the number of combines can be at most P minus one across all n deletes. But the number of borrows could be as many as the number of deletes.

All right, so the total number of disk accesses across all of the deletes: the H plus four here says there are H read accesses on the way down for every delete (even if you're deleting from the interior, there are H reads going down), and then every delete is doing a borrow, and a borrow means you read a sibling and then do three writes. So that takes care of this term. The combines are being done on top of the borrow that we're already charging to each delete, so when you're doing the combines we don't have to count the read of the sibling, because we already charged that to the borrow; the combines are going to be doing two writes each, and there's a total of P minus one combines, so you get two times that. Now, if you want to account for deletes always taking place from the interior, then you probably want to add one more for writing an interior node back, so to be safe you'd make that H plus five; but that's not going to change the average we're going to come up with. So that's the total number of accesses; depending on whether the deletes are all from leaves or all from interiors, make it H plus 4 or H plus 5. The average number of accesses is at most the total number of accesses (that's the worst you can have) divided by the number of deletes that you're performing, which is n.
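Dividing that total through by the number of deletes gives the average. A sketch of the arithmetic (the helper name and the use of the minimum possible n are my own):

```python
from math import ceil

def avg_delete_accesses_bound(m, h, p):
    """Upper bound on average disk accesses per delete when a B-tree
    of order m with p nodes is emptied one delete at a time: every
    delete is charged h reads, a sibling read, and three borrow
    writes (h + 4), plus two writes for each of at most p - 1
    combines, spread over at least n_min deletes."""
    t = ceil(m / 2) - 1
    n_min = 1 + (p - 1) * t                 # fewest items p nodes hold
    total = n_min * (h + 4) + 2 * (p - 1)   # all accesses, worst case
    return total / n_min                    # h + 4 plus a tiny term

# For m = 200 the combine term is under 2/99, so the average is just
# above h + 4 (h + 5 if every delete also rewrites an interior node).
print(avg_delete_accesses_bound(200, 5, 1000))
```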
If you divide this out: the number of nodes, if you assume M is of the order of 200, is small compared to the number of items, so we can forget about that term, and it's just about H plus 4, or H plus 5 if you account for deletes from the interior. So it's not as bad as the 3H would suggest: assuming you're doing a long string of deletes, the per-delete access count is still pretty small. The minimum you could do it in would be H plus 1 if you're deleting from a leaf and H plus 2 if you're deleting from an interior, at least using our strategy to delete from an interior.

Of course, this analysis assumes you get long strings of inserts or deletes, but that may not happen. If you get alternating inserts and deletes, then it's possible that an insert comes and splits a whole bunch of nodes, so you make 3H plus 1 disk accesses, and right after that you get a request to do a delete which recombines all of those nodes, making 3H disk accesses; then you get another insert that splits everybody, and then you get a delete that combines everybody. So the worst that can happen to you is that you get this bad sequence of alternating inserts and deletes that makes you do the maximum work possible. It all depends on what kind of sequences of inserts and deletes you're working with. Any questions about delete?

Right. As I've mentioned earlier, even though much of the discussion we've had has been with respect to disk accesses, B-trees can be used with advantage for internal-memory applications over something like a red-black tree, because the height would be smaller than that of a red-black tree. So if you were using, say, a B-tree of order 8 or 16,
its worst-case height would be less than that of a red-black tree, and if a node of order 8 or 16 fits into one cache line, the number of cache misses would be reduced compared to searching a red-black tree. So you could use low-degree B-trees in internal memory to improve dictionary performance, though most of the time when people talk about B-trees they are talking about external-memory applications.

Let's turn our attention to the structure of a node. Implicit in all of our discussion has been that you're going to take the up to M minus 1 pairs that a node stores and store them sequentially, possibly using an array-type structure, in ascending order of the keys of these M minus 1 pairs. But that may not be the best way to go. To figure out what structure you should use, you need to take a look at the operations you want to perform on a node; once you know the operations, you can decide the structure to use.

All right, so this is the format of a node: a count that tells you how many pairs are being stored in the node, where each P is a pair and each A is a pointer to a subtree. If the node is sitting on a disk, then each pointer is really a disk address (these A's are all disk addresses), and the P's are the dictionary pairs. When you do a search, you're given a key, and you want to search these pairs to find whether there's a match, or to decide whether you should go into the left subtree or the right subtree of a pair. So you want to search for a given key, and if there isn't a match you want to find what you might call the nearest match, and then you can decide whether to go into the left or right subtree of that nearest match. Then, for doing an insert,
you need to take a new pair and insert it into the node so that everything stays in, say, ascending order, or at least searchable order; so you need to be able to insert an item into a sorted collection. Then you need to be able to split. When you do a split, the split is done around the middle key: you find the middle key, and then everything that is smaller than the middle key shows up here and everything bigger than the middle key shows up in the other node. So it's like a three-way split around the split key, which is the middle key. So you need to be able to find the middle pair: given an index (if M is 100, say, the index of the middle pair), you need to find the pair with that index, and once you've found it, you do a three-way split around that pair.

If you have a delete, first, in the simplest case (and in fact in all of them), you need to be able to delete an item with a given key from a collection. Then you may need to do a borrow. To do a borrow, you need to take out either the least item or the greatest item from a collection, depending upon which side you are doing the borrowing from; that item goes into the parent node and replaces something there, so you need to do a replace; and then the replaced item goes into the deficient node and gets inserted there as its largest or smallest item. So we need to be able to do a delete, a replace, and an insert. A combine, finally, is really a three-way join: you have a deficient node, you have the in-between pair, and you have a barely legal node (it's got just the bare minimum), and they have the relationship where one node has all items smaller than the in-between pair and the other node has
All right, so let's see 44:14 what kind of node structure we should 44:15 use. Now, remember these nodes 44:21 are actually going to be sitting out on 44:22 a disk, so if you're thinking about using 44:25 some kind of a pointer structure you 44:27 need to be able to write the pointers out 44:28 and read them back in so that they still make 44:30 sense. So what we'll do is we'll use an 44:35 array, but instead of using language- 44:37 provided pointers we will simulate 44:39 pointers with integers; so the pointers 44:43 really give an index into the array. 44:44 Rather than an absolute memory address or a 44:47 relative memory address, these are 44:49 indexes into an array. Okay, so we simulate 44:53 pointers using integers, and we will have 44:56 an indexed red-black tree built into 44:59 this array. 45:01 Okay, so each node will still have two 45:04 pointers, but these will be numbers 45:05 instead of, say, C++ pointers or Java 45:09 references, and the indexed part just means 45:14 each node keeps a left-size value; if you 45:17 recall, many lectures ago we talked about 45:18 indexed search trees. Okay, so the 45:21 indexing allows you to find the middle 45:23 pair easily; the use of integers as 45:30 pointers allows us to write this whole 45:32 array out as one single disk write and 45:36 then read it back in and still have our 45:38 structure correct in memory. All right, 45:46 so we use simulated pointers. So look at 45:48 the operations: if you want to search a 45:52 node, it's a red-black tree, okay, you can 45:56 search it in logarithmic time; you want 46:00 to find the middle pair, because it's an 46:01 indexed red-black tree you can find that 46:03 in logarithmic time; you want to insert a 46:07 pair, you can do that in logarithmic time; 46:10 you want to delete a pair, you can do 46:12 that in logarithmic time; you want to 46:15 split a B-tree node into two nodes, 46:18 well, the split operation works in 46:20 logarithmic time.
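The simulated-pointer idea can be sketched as below. This is a minimal illustration under stated assumptions: the tree lives in flat parallel arrays, the left/right "pointers" are integer indices, NIL is -1, and each node keeps a left-size count so the k-th pair (e.g. the middle pair) is found by rank. For brevity this is a plain indexed BST; the lecture's structure would add red-black balancing on top. All class and method names here are hypothetical.

```python
# "Simulated pointers": tree nodes live in flat arrays and left/right are
# integer indices into those arrays, so the whole node can be written to
# disk and read back without fixing up any memory addresses.
NIL = -1  # stand-in for a null pointer

class IndexedBST:
    def __init__(self):
        self.key, self.left, self.right, self.left_size = [], [], [], []
        self.root = NIL

    def _new(self, k):
        self.key.append(k); self.left.append(NIL)
        self.right.append(NIL); self.left_size.append(0)
        return len(self.key) - 1   # the "pointer" is just an index

    def insert(self, k):
        if self.root == NIL:
            self.root = self._new(k); return
        i = self.root
        while True:
            if k < self.key[i]:
                self.left_size[i] += 1          # new node goes to the left
                if self.left[i] == NIL:
                    self.left[i] = self._new(k); return
                i = self.left[i]
            else:
                if self.right[i] == NIL:
                    self.right[i] = self._new(k); return
                i = self.right[i]

    def kth(self, k):
        """Return the k-th smallest key (0-based) using left_size."""
        i = self.root
        while i != NIL:
            if k == self.left_size[i]:
                return self.key[i]
            if k < self.left_size[i]:
                i = self.left[i]
            else:
                k -= self.left_size[i] + 1
                i = self.right[i]
        raise IndexError(k)

t = IndexedBST()
for x in [50, 20, 80, 10, 30, 70, 90]:
    t.insert(x)
# the middle of these 7 keys is the one with index 3
assert t.kth(3) == 50
```

Because every field is a plain array of numbers, the whole structure can be serialized with one write and deserialized with one read, which is exactly why language pointers are avoided here.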
And once you do the 46:25 split, when you write the node out, 46:29 the first time you write it 46:30 out you want to be sure that the active 46:34 part of the node represents one of the 46:36 split's subtrees; the second time you're 46:38 going to write the same array out, but 46:40 now the active part of the node is the 46:42 second subtree. Okay, so both are in 46:45 the same physical array. The one slow operation is joining 46:50 two B-tree nodes: even though you can join two red- 46:52 black trees in logarithmic time, in this 46:55 application your join is going to 46:56 actually take order-of-m time, and the 47:01 reason it's going to take order-of-m 47:02 time is because these two B-tree 47:05 nodes are sitting in two different 47:06 arrays in your memory, and you have to 47:09 physically copy the tree from one array 47:11 into the other array, and there's no way 47:14 to do that 47:14 in less than linear time. Okay, so the 47:20 join in this application would run in 47:21 order m because of the copying. Alright, 47:34 so if we make use of indexed red-black 47:36 trees we can get most of the things to 47:37 work very fast, a lot faster than if you 47:40 used arrays: if you use an array, the search 47:42 would work in logarithmic time, finding the middle pair 47:45 would work in order-one time, but the insert 47:48 would take order of m, the delete would take 47:50 order of m, the split would take order of m, 47:53 and the join would take order of m. So you 47:58 can get things to work faster by using 48:00 indexed red-black trees for most of the 48:03 things; this one, the join, runs slower, but most of 48:06 the others run faster. Now, whether you 48:11 really want to go to the trouble of 48:13 using red-black trees or not in 48:15 this application will depend upon how 48:19 much time you expect to save: if it turns 48:23 out that almost all of your time is 48:25 going into reading and writing and the 48:27 internal time is insignificant, well, 48:30 going from insignificant time to 20% of 48:33 insignificant time
doesn't do you a lot 48:35 of good. So even though you get an 48:40 asymptotic improvement, one would need to 48:41 assess whether the saving in time 48:44 is worth the effort of programming 48:46 red-black trees in this application. 48:55 Alright, any questions? Okay, so we stop 49:03 here.
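The "write the same array twice" idea from the node-structure discussion can be sketched as below. This is a hypothetical illustration: it assumes each disk block stores the shared simulated-pointer arrays plus a header recording which root index is the active part, so after a split the one physical array is written out twice, once per subtree. The names `serialize_node` and the JSON format are illustrative choices, not the lecture's.

```python
# After a split, both subtrees still live in one physical array of
# simulated-pointer nodes; the node is written out twice, each time with a
# header saying which root index is the active part of the block.
import json

def serialize_node(arrays, root):
    """One disk block: the shared arrays plus the index of the active root."""
    return json.dumps({"root": root, "arrays": arrays})

# toy node: keys 10, 20, 30 stored with integer left/right pointers
arrays = {"key": [20, 10, 30], "left": [1, -1, -1], "right": [2, -1, -1]}

# after a split around key 20, the subtree rooted at index 1 is one B-tree
# node and the subtree rooted at index 2 is the other -- same array, two writes
block_a = serialize_node(arrays, 1)
block_b = serialize_node(arrays, 2)
assert json.loads(block_a)["root"] == 1
assert json.loads(block_b)["root"] == 2
```

Reading a block back just parses the arrays and follows the stored root index, which is what makes integer pointers survive the round trip to disk.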