Transcript

00:11 Okay, all right. Today we're going to move from the balanced binary trees that we looked at for dictionary structures and take a look at higher-degree trees. So really what we're going to look at are structures called B-trees, and then later we'll take a look at close cousins of B-trees, the B+ and B* trees.

00:46 So, as I said, we're extending our discussion of dictionary structures from binary structures to higher-degree structures. Higher-degree structures play a role in the representation of dictionaries that are going to be stored on disk; for example, if you have millions of dictionary pairs, you may use a high-degree B-tree to store that dictionary on disk. But even if you're dealing with smaller-sized dictionaries, B-trees of smaller degree will give performance advantages over things like red-black trees, mainly because, as we'll see, by going to higher-degree trees we reduce the height of the tree, and that in turn reduces the number of cache misses that take place on the way down when you do a search. Provided each node of the tree fits in a cache line, the total number of cache misses goes down, and so you can get better performance.

01:58 Suppose we had a very large dictionary and we used an AVL tree to represent it. If you had about a billion records, that's about two to the thirty entries in the dictionary, then the height of the AVL tree is somewhere between log n and 1.44 log n, so somewhere between thirty and forty-three. If you stored this dictionary on a disk, you'd end up in the worst case making forty-three accesses to your disk during a search: you start at the root and go down, retrieving one node at a time, and you'd make forty-three disk accesses to complete the search if you had to go all the way down to a leaf. If you're looking at about one tenth of a second for a disk access, including the worst-case seek time and worst-case latency, you're looking at about four seconds to do the search, and that's obviously not very good. The same structure stored in memory would mean about forty-three cache misses.

03:18 So if we could come up with an alternative structure where the worst-case height was a lot less than forty-three for a billion-entry dictionary, we could reduce the number of disk accesses if it's stored on disk, or reduce the number of cache misses if it's in memory, and since most of the time is spent servicing the cache misses, we would reduce the actual run time.

03:43 If we used a red-black tree, the worst case is actually inferior to AVL trees, because in a red-black tree the height can be up to two times log n. So now your height could be up to sixty, and that means you could do sixty disk accesses, costing you about six seconds; or, in the case of an internal-memory dictionary, you'd have about sixty cache misses in the worst case to perform a search.

04:11 So while the structures we've looked at give us log n performance, the actual time they take, especially if you look at the disk environment, is too large, and really what you're looking for is some kind of constant-factor improvement.
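As a quick check on those numbers, using the round figure of one tenth of a second per disk access quoted above:

\[
n = 2^{30}: \qquad \log_2 n = 30, \qquad h_{\text{AVL}} \le 1.44\log_2 n \approx 43, \qquad h_{\text{red-black}} \le 2\log_2 n = 60
\]
\[
43 \times 0.1\,\text{s} \approx 4\,\text{s per search (AVL)}, \qquad 60 \times 0.1\,\text{s} = 6\,\text{s per search (red-black)}.
\]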
04:36 If we instead look at higher-degree search trees: an m-way search tree has up to m - 1 entries in a node, which means each node can have up to m children. If you take m = 2 you have a binary search tree: each node has one entry and two children. The number of children is always one more than the number of entries.

05:06 If you look at a four-way search tree, each node can have up to three entries. A maximally populated node for a four-way search tree has three entries, arranged in ascending order of their keys, and four subtrees hanging off of it. Of the four children here, the first holds all entries whose keys are smaller than the first key, so keys less than ten; in the next, all keys are between 10 and 30; the next subtree has keys between 30 and 35; and the last one has keys bigger than 35.

05:49 You search such a structure in a manner similar to how you search a binary search tree. Start at the root: if this were the root of the tree and you're searching for, say, 20, you compare with the keys out here; it's bigger than 10, so it can't be in the first subtree; it's between 10 and 30, so if it's present it has to be here, and you move down into this subtree. So by searching through the keys in a node you decide which subtree to move into; if you get a match, you stop, because you've found what you're looking for. If the degree is large, you could do the search within a node using binary search; for example, if you had a hundred entries in a node, you could find which subtree to move into in log 100 time by doing a binary search across the keys. If the degree is small, a serial search would be better.

06:45 So you search an m-way search tree pretty much the same way you search a binary search tree. The degree of the nodes in an m-way search tree can vary from m downwards: not all nodes have to be of degree 4 in a four-way search tree; some could be of degree three, some of degree two. When we measure degrees in this context I'm assuming the extended-tree concept: every internal node has external nodes attached to it wherever it would otherwise have had a null pointer. So, for example, if this subtree here were empty, I would add an external node here, and the degree of this node would still be four. With that understanding, the degree of every internal node is always one more than the number of keys, or entries, stored in that node.
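Here is a minimal sketch of that search procedure, just to make the within-node step concrete. The class and function names are illustrative, not from the lecture, and an external (empty) subtree is represented simply by an empty child list.

```python
from bisect import bisect_left

class MWayNode:
    """One node of an m-way search tree: keys in ascending order, len(children) == len(keys) + 1."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list: all of this node's subtrees are external

def search(node, key):
    """Return True if key occurs in the subtree rooted at node."""
    while node is not None:
        # binary search inside the node; a serial scan is just as good when m is small
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True                  # match inside this node
        # otherwise the key can only be in the i-th subtree
        node = node.children[i] if node.children else None
    return False

# For the four-way node with keys 10, 30, 35: a search for 20 would descend into the
# subtree holding the keys between 10 and 30.
root = MWayNode([10, 30, 35])
print(search(root, 30), search(root, 20))   # True False (no subtrees attached in this tiny example)
```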
07:56 Now, if you do some analysis on an m-way search tree to find out how many entries such a tree can hold for a particular height: the maximum happens when every node in the tree has degree m, the highest allowable degree, and in that case every node holds m - 1 entries. So the maximum occurs when all internal nodes are m-nodes (m-node meaning the degree is m), and if you have a restriction on the height, say h, the maximum happens when it's a full tree of that height. Counting nodes level by level: there is one node at level one, the root; it has m children, so there are m nodes at level two; each of those has m children, so there are m^2 at level three; and so on, down to m^(h-1) at the last level. Adding that all up, the total number of nodes in a maximally populated m-way search tree of height h is (m^h - 1)/(m - 1). Each of these nodes holds m - 1 entries, so the number of entries, or dictionary pairs, is m^h - 1. That agrees with what we know for binary trees: put m = 2 and you get 2^h - 1, the number of internal nodes in a full binary tree of height h.

09:28 If you set up a table: with a binary tree, when the height is 3 you can have at most seven entries; when the height is 7, at most 127. But if you use a tree of degree 200 the capacity becomes much, much larger. So at least here you have the potential that, with a tree of degree 200, you could have dictionaries of size 10^16 and still have a height of only 7. If you were storing this dictionary on a disk, it may be possible to keep the root in memory all the time, so the number of disk accesses would be only six, and yet you could have as many as 10^16 entries in the dictionary.

10:28 So the potential is great, but the worst case is still bad, because an m-way search tree doesn't say that every node must have degree m; it only says that's the maximum degree. The smallest degree of a node can get way down. So we really need to see how we can achieve the potential of high-degree trees while avoiding the worst case, where every node could have just one non-empty child, one non-empty subtree.
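A tiny sketch that reproduces the capacity figures just quoted (the helper function is mine, not the lecture's):

```python
def max_entries(m, h):
    """Maximum number of entries in an m-way search tree of height h: m**h - 1."""
    return m ** h - 1

for m, h in [(2, 3), (2, 7), (200, 7)]:
    print(f"m={m}, h={h}: at most {max_entries(m, h):,} entries")
# m=2, h=3: at most 7 entries
# m=2, h=7: at most 127 entries
# m=200, h=7: at most 12,799,999,999,999,999 entries   (about 10^16)
```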
11:18 For that we define the B-tree, which constrains the smallest number of children a node can have. The definition assumes that we're counting external nodes, so we're looking at an extended m-way search tree, which just means that wherever you had an empty subtree, you replace it with an external node.

11:50 A B-tree of order m is, first of all, an m-way search tree: no node has degree more than m, inside a node the keys are in ascending order, and the subtrees satisfy the requirement I described on the example with the four-way node. If the tree is not empty, then it at least has a root, and the requirement on the root is that it must have at least two children; both of them may be external nodes. The remaining nodes, that is, the remaining internal nodes, must have at least ⌈m/2⌉ children. So if m is 200, the remaining nodes must have at least 100 children each, and if you have a hundred children you must have at least 99 entries. Another way to look at this is that when m is 200, the root must have at least one entry and the remaining internal nodes must have at least 99 entries each. Finally, we require that all of the external nodes be at the same level.

13:21 Now let's take a look at what happens when you put in some values of m. That's the definition of a B-tree of order m; if I try m = 3, nodes can have degree up to three. If the tree is not empty, the root must have at least two children, and the other internal nodes must have at least ⌈3/2⌉, which is also two. So all internal nodes have at least two children and at most three: the degree of every internal node is either two or three, and this version has the name 2-3 tree. (The degree of each internal node, after you plug in the external nodes, is either two or three.)

14:25 If you try an m of four: the root must have degree at least two, the other nodes must have degree at least ⌈4/2⌉ = 2, and the maximum degree allowed is four. So the permissible degrees for internal nodes are two, three, or four, and you get something called a 2-3-4 tree. So 2-3 trees and 2-3-4 trees are special cases of B-trees.

15:07 If you try five: the root must have at least two children, and the other nodes must have at least ⌈5/2⌉, which is three. So other than the root, the degrees are three, four, and five; it's really a 3-4-5 tree, with the exception of the root, which can have a degree of two. As you go to higher m's, the lower bound on the degree starts to go up.

15:42 If you try an m of two, a B-tree of order two: the root must have two children, and the remaining nodes must have at least ⌈2/2⌉ = 1. But look at an internal node: an internal node has at least one key in it, and if it has one key it has two children, even if both subtrees are empty; you'd have two external nodes hanging off of it. So even though the definition says the remaining internal nodes must have degree at least one, a degree of one is really impossible for an internal node once you have thrown in the external nodes; the smallest degree an internal node can have is two. This becomes a non-constraint: all internal nodes must have degree two, and there's no way around that. If you couple the observation that all internal nodes are of degree two with the requirement that all external nodes are on the same level, you get a full binary tree. So a B-tree of order two is a full binary tree.

17:12 Now, since it's a full binary tree, B-trees of order 2 only exist for some values of n, n being the number of internal nodes: n must be 2^h - 1. So B-trees of order 2 are not very useful for dynamic dictionaries: if you've got 15 elements, you've got a B-tree of order 2, but when I do an insert I need a B-tree with 16 elements, and that doesn't exist; after 15, the next size that exists is 31.
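The degree constraints for the orders just discussed can be summarized with a small sketch (the function is illustrative, not part of the lecture):

```python
from math import ceil

def allowed_children(m):
    """Permissible child counts for the internal nodes of a B-tree of order m."""
    return {"root": (2, m), "other internal": (ceil(m / 2), m)}

print(allowed_children(3))    # {'root': (2, 3), 'other internal': (2, 3)}       -> the 2-3 tree
print(allowed_children(4))    # {'root': (2, 4), 'other internal': (2, 4)}       -> the 2-3-4 tree
print(allowed_children(5))    # {'root': (2, 5), 'other internal': (3, 5)}       -> 3-4-5, root may be 2
print(allowed_children(200))  # {'root': (2, 200), 'other internal': (100, 200)}
```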
17:42 So B-trees of order 2, since they're full binary trees, are really not of any use for dictionaries; you can only represent certain sizes of dictionaries with them, so they're of no use for dynamic dictionaries. The smallest order of B-tree that's useful is order 3, the 2-3 tree. If you go with order 3 and above, we'll see there are insert and delete algorithms that essentially prove, by construction, that these B-trees exist for every value of n; that's not true when m is 2. So nobody talks about B-trees of order 2; you only talk about them for order 3 and up. Order 3 is a 2-3 tree, order 4 is a 2-3-4 tree, and everything else is just known as a B-tree of order m. Are there any questions about what B-trees are?

18:45 All right, so let's look at the smallest number of pairs, or entries, you might find in a B-tree of order m. Suppose that n is the number of pairs; then the number of external nodes is always one more than the number of entries, n + 1. If the height of your B-tree is h, which doesn't count the external-node level because that's kind of a fictitious level, then all the external nodes will be at level h + 1. So what I'm going to do is find a lower bound on the number of external nodes and then use the equation (number of external nodes) = n + 1 to get a lower bound on n.

19:33 Let's go down level by level and see the minimum number of nodes you must have at each level. At level one you have the root, assuming the level exists, so exactly one node. At level two, if the level exists: the level-one node, the root, must have degree at least two, so the number of nodes at level two is at least two. At level three, each node at level two has degree at least ⌈m/2⌉, so you must have at least 2⌈m/2⌉ nodes. At level four it's at least 2⌈m/2⌉^2, and if you work your way down to level h + 1, multiplying each time by ⌈m/2⌉, you end up with at least 2⌈m/2⌉^(h-1) nodes. So the number of external nodes has to be at least that much; the number of external nodes is exactly n + 1, so n + 1 must be at least 2⌈m/2⌉^(h-1), and the number of entries is at least 2⌈m/2⌉^(h-1) - 1. So if your degree is 100, then instead of the capacity going up at the rate of 100 raised to a power, in the worst case it goes up at the rate of 50 raised to a power.
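Rearranging that bound gives the height guarantee that makes B-trees attractive (simple algebra on the inequality just derived):

\[
n + 1 \;\ge\; 2\left\lceil \tfrac{m}{2} \right\rceil^{\,h-1}
\qquad\Longrightarrow\qquad
h \;\le\; 1 + \log_{\lceil m/2 \rceil}\!\left(\frac{n+1}{2}\right).
\]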
21:43 Suppose you have a B-tree of order 200. If the height is 2, put h = 2, then you get 2 times 100, minus 1, so you must have at least 199 entries in your dictionary. If the height of the B-tree is 3, you must have almost 20,000; you may have many more than that, but you must have at least that number. If the height is 4, you must have almost 2 million, and if the height is 5, almost 200 million.

22:37 So if you're able to work with a B-tree of order 200, you can hold almost 200 million entries before the height becomes 5. The height cannot become 5 unless you have 200 million minus 1 entries; so if you have 200 million minus 2, your height can be only 4, and if you're keeping the root in memory, it's only the level 2, 3, and 4 nodes that you might have to access from disk when you do a search. So with 3 disk accesses you can search dictionaries with up to 200 million minus 2 entries.

23:27 So B-trees have very significant potential to give you good performance when you store them on disk, and also to reduce cache misses in internal-memory applications. There we will not be talking about order 200: you try to match the size of a node to the size of a cache line, so you may be talking about B-trees of order 4, 8, or 16; you'll be at the smaller end. For disk dictionaries you'd be at the higher end.

24:02 All right, so let's look at choosing the order you should be using. You could work in two different environments. In one environment, your choice of m may be dictated by the operating system or by the hardware. As I said, if you're dealing with internal memory, you can't change the size of a cache line, so you would match the size of a node to the size of a cache line; each time you go down one level, the node is brought into cache and you search it with one cache miss. You may not have any control over the size of a disk block either: when you go to disk, one access brings in a block, in which case you choose the size of a node to match the size of a disk block. But if you are able to control the quantity of data that can be accessed at a time, particularly in a disk environment where you may get to set your block size, then this kind of analysis may be useful, and you may want to optimize your choice of m for, say, the search operation.

25:24 To do a search, you have to fetch a node from your disk and then search it internally, and you have to do that height times, or height minus one times if the root is stored in memory all the time. If you want to look at worst-case time, you'd want to minimize a quantity that boils down to an expression with three parts: a term that doesn't depend on the size of the block being read, namely the worst-case seek time and worst-case latency; a term for the time to transfer a node of size m from disk into memory; and a term for the time to do a binary search of that node in memory. You multiply by h, once for each level. For h you plug in the worst-case height formula from the previous slide. If you do that and plug in typical disk parameters, you get a nice flat-bottomed curve, which really says that if you're operating somewhere in that range, it doesn't really matter what your order is.
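Written out, the quantity being minimized is plausibly of the form below; the symbols a, b, and c stand for the seek-plus-latency time, the per-entry transfer time, and the per-comparison search cost, and are not taken verbatim from the slide:

\[
t_{\text{search}}(m) \;\approx\; \bigl(a + b\,m + c\log_2 m\bigr)\cdot h,
\qquad
h \;\approx\; 1 + \log_{\lceil m/2 \rceil}\!\left(\frac{n+1}{2}\right).
\]

The a term is independent of m, the b·m term grows linearly, the c·log₂ m term grows slowly, and h shrinks as m grows, which is why the curve has a wide, flat bottom.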
27:08 Typically, that also means that if you can't control the size of a disk block, you're not losing a whole lot, because somewhere in that range you would probably find a match with the size of a disk block. So as far as worst-case search time is concerned, performance is pretty insensitive to the order m. You may have reason to choose bigger orders if you want to optimize, say, insert and delete time; that wouldn't really change your worst-case search time a whole lot. So as far as choosing the order of the B-tree is concerned, in most practical situations you're limited by factors beyond your control, like the size of a cache line or the size of a disk block, and you just go with that. But if you do have the ability to change the size of the disk block, you could choose some number in that flat range, maybe based on further optimization of insert and delete.

28:27 All right, let's take a look at insert; tomorrow or Friday we'll take a look at the delete operation.

28:40 The example I've got here is a 2-3 tree. If you look at the internal nodes of a 2-3 tree, they have either one entry or two entries, which is the same as saying that the degrees are either 2 or 3. The greenish nodes here are degree-2 nodes, the yellowish nodes are degree-3 nodes, and, as said before, the external nodes are not shown.

29:12 Now, if you're doing an insert, you pretty much follow the same strategy as when you insert into a binary search tree: you start at the root and find your way down, verifying that the entry you're going to insert doesn't already have somebody with the same key, and that typically means you're going to fall off at a leaf. Unlike a binary search tree, where when you fall off a leaf you add a new node as a child of the leaf you fell off of, here we're going to try to insert the entry back into the node that you fell off of. For example, if you're going to insert something that's going to fall off of this node, say a 10 or a 12 or a 13, you'd follow this path: bigger than 8, less than 15, bigger than 9, and you fall off. So then I put it into this node; this node has the capacity to take one more entry, it's only got one entry in it, so it will now become a 3-node. In that case, all we do is make a change in this node and write it back out to disk.

30:27 If you look at the process, assuming the whole tree is sitting on disk: you access the root, so one read access; that tells you to come here, so you do another read access; then you come out here and do a third read access; you make a change to this node and write it out, so one write access. Four disk accesses are enough to make an insert in this case.

30:50 The more interesting case is when you fall off of one of these yellow nodes, which are already at their capacity. At that point you can't insert the new entry into the node you fell off of; there's no place there. So if you're trying to put it into a node that's already full, that triggers a bottom-to-top pass in which we're going to be splitting nodes.
31:16 So let's take a look at how the splitting works. What I do is, when I fall off the tree, for example if inserting a 2 you fall off here, then at least symbolically I insert it anyway. Physically you can't do it, but symbolically you can do anything. So I'll get a 1 2 3; at that time I have an overfull node, and that overfull node will get split.

31:53 So I have an overfull node: it's got one more entry in it than it can possibly hold. Here's my overfull node: it's got m dictionary pairs in it, while the capacity of a node is m - 1, so you've exceeded it by one. The pairs are in ascending order of key, and if you've got m pairs, you'll have m + 1 child pointers. In the typical node format, the a's are the child pointers, the p's are the pairs, and the m field tells you how many pairs you've got in there.

32:39 We're going to split this node down the middle. You find the middle pair, the one in position ⌈m/2⌉. Everything to its left remains in the original node: the node that became overfull retains the first half of the pairs, along with the associated child pointers. Then we create another node, which will have the pairs beginning at position ⌈m/2⌉ + 1. The middle pair isn't in either node; it's taken out. So other than the middle pair, all of the pairs are accounted for, either here or there, and all of the child pointers that were in the overfull node are accounted for in one of the two nodes. You split it down the middle, pulling out the middle pair; everybody else shows up either in the yellow node or in the new node (the new one was supposed to be drawn in blue). The middle pair that was extracted, together with a pointer to the new node, gets inserted into the parent of the original node. So that's going to be the strategy: splitting an overfull node takes one node and creates two; it also throws out the middle pair, and that middle pair gets inserted into the parent node.

34:25 Let's see the strategy at work. I'm going to insert, let's say, a 2 in here. When you put it in, this node becomes 1 2 3, and it's overfull. Identify the middle pair, the one in position ⌈m/2⌉: m is 3, so in this case it's the pair in position two, that guy. Everybody to its left remains in this node, everybody to its right forms a new node. So I split around the middle key: everybody to the left stays in the old node, everybody to the right forms a new node, and the middle pair, together with a pointer to the new node, gets inserted one level up. So this node gets split, with half of it staying here; then the middle pair and the new-node pointer get inserted into the parent. When you insert it over there, the parent becomes a 2 4, and since that was a 2-node you don't exceed its capacity and you're done.
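In symbols, the split of an overfull node holding pairs p_1, ..., p_m looks like this (a compact restatement of the description above, not notation copied from the slide):

\[
\underbrace{p_1,\dots,p_{\lceil m/2\rceil - 1}}_{\text{stay in the old node}}
\qquad
\underbrace{p_{\lceil m/2\rceil}}_{\text{moves up to the parent}}
\qquad
\underbrace{p_{\lceil m/2\rceil + 1},\dots,p_m}_{\text{go into the new node}}
\]

The child pointers to the left of the middle pair stay with the old node, those to its right go with the new node, and the parent additionally receives a pointer to the new node.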
36:09 All right, so in this particular case, on the way down I accessed the root, I accessed this node, and I accessed this one: three read accesses. Then I split this node, so I have to write this node back plus the new one; each time you do a split you have to write two nodes, the original node that got split (it has changed) plus the new node. Then we pop up one level, and since I made an insert into this node, I have to write this one out too. Assuming I've got enough memory to hold three nodes, I don't have to read a node back in when I have to make a change. So I do three read accesses on the way down, then two writes out here, plus one final write up there.

36:58 Now let's try another insert: 18. You access this node and search it, and you end up here; you access this node and search it, and you end up here; you read this node and search it, and you fall off. So, three read accesses. You go to insert into the node you fell off of, it becomes overfull, so we're going to split. We have an overfull node and we split it down the middle: the 16 stays, that's the original node; everybody to the right forms a new node; and the middle entry, with a pointer to the new node, will get inserted into the parent.

37:48 So that node gets split; the old node stays there, and the middle entry together with a pointer to the new node gets inserted into the parent. When you put it in there, the parent becomes 15 17 20, with 17's right pointer going down to the 18; it's as if you've got a 17 here and a pointer going down to the 18. That's overfull, so you're going to have to split the 15 17 20.

38:38 To see that again: symbolically, we stuff the 17 in here, so you're going to have 15 17 20, and the subtrees will be the 9, the 16, the 18, and the 30 40. Split it down the middle: one of the halves will be the 15 with the subtrees 9 and 16; the other will be the 20 with the subtrees 18 and 30 40. The new node, that's the 20 with the 18 and the 30 40, and a pointer to it, together with the middle key, get inserted into the parent. So this was the middle key, and this was the new node that was created, a 2-node with those subtrees; that gets inserted into the parent, and the parent is a 2-node, so the insertion is doable. You get a 3-node out there; it's going to look like that.

39:47 This time we did three read accesses on the way down. I did a split at this level, so I had to write out two nodes, the original and the new; I did a split at this level, so I had to write out the original and the new; and here I didn't do a split, I just made a change, so I have to write that node out. So, three reads, and then two and two, four writes there, and one write there: five writes. Every time you do a split you get two writes, and then at the end you have to do one non-split write.
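One way to summarize the counting in these examples (this formula is a paraphrase of the counts above, not something stated on the slides): if the height is h and the insert causes s splits, then

\[
\text{disk accesses} \;=\; \underbrace{h}_{\text{reads on the way down}} \;+\; \underbrace{2s}_{\text{two writes per split}} \;+\; \underbrace{1}_{\text{final non-split write, or the new root}}.
\]

For the insert of 18, h = 3 and s = 2, giving 3 + 4 + 1 = 8 accesses; the worst case s = h gives the 3h + 1 figure derived at the end of the lecture.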
40:21 Again, you can see that when you insert in this way, you preserve the properties of a B-tree. The splits are designed to preserve the degree properties: when you take an overfull node and split it in half, you guarantee that each of the halves has degree at least ⌈m/2⌉. You also make sure that all of the external nodes stay at the same level, because any time you create a new node, it's on a level that existed before.

41:03 Let's try one more insert; this time I'm going to put in a 7. This insert is going to change the height of the tree; the other two didn't change the height. The way the height of a B-tree changes is by a split of the root, and it's important that only a split of the root creates a new level; that is what preserves all external nodes at the same level.

41:31 So I'm going to put in a 7. I read that yellow node, read this yellow node, read that one, and I fall off here. We put it into this node and it becomes overfull: you get a 5 6 7 in there. So I take the 5 6 7 and split it: the 5 stays, that's the original node; the new node that's created is the other half, the 7; and the middle key, 6, with a pointer to the new node, gets inserted up here. That gives you a 2 4 6, so we'll split that: the 4 gets pulled out, and the new node will have the 6 with pointers to the 5 and the 7. So the 4, with a pointer to the new node (the 6, which has pointers to its two subtrees), is going to get inserted here into this node, and this will become 4 8 17. That's going to get split too: the 8 gets pulled out; the 4 stays in the old node, and its subtrees will be the 2 and the 6; the 17 goes into a new node, and its subtrees will be the 15 and the 20; and then the middle key, that's the 8, together with a pointer to the new node, has to get inserted into the parent.

43:12 This process continues until the root splits, which is what took place right now. When the root splits, there's no parent in which to insert the 8; at that time we just create a new root, and I take the old root and its split-off part and make these the subtrees, the children, of the new root.

43:45 This works fine because the root is only required to have two children. If we didn't have that flexibility for the root, the insert would never work, because when you split the root there are only two parts of that split node left. When you split anybody else you're okay, because everybody has the right degree; they all have at least ⌈m/2⌉ children. But when you come and split the root, there's no way to make the new root have ⌈m/2⌉ children.

44:29 So the height of a B-tree increases whenever the root splits, and since that split creates a new level up at the top, rather than a new level down at the bottom, we're assured that all of the external nodes remain at the same level.
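Putting the whole procedure together, here is a minimal in-memory sketch of B-tree insertion with upward splitting. It is illustrative rather than the lecture's code: it stores keys only (no associated elements), it does no disk I/O, and external nodes are not represented; an empty child list marks the lowest internal level, just as the external nodes are omitted from the figures.

```python
from bisect import bisect_left, insort
from math import ceil

class Node:
    """A B-tree node: keys in ascending order; children[i] holds the keys between keys[i-1] and keys[i]."""
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list: all subtrees of this node are external

    def is_leaf(self):
        return not self.children

class BTree:
    def __init__(self, m):
        self.m = m                       # order: at most m children and m - 1 keys per node
        self.root = Node()

    def insert(self, key):
        # Top-down pass: search for the node we "fall off of", remembering the path.
        path, node = [], self.root
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return                   # key already present in the dictionary
            if node.is_leaf():
                break
            path.append((node, i))
            node = node.children[i]
        insort(node.keys, key)           # put the new key in; the node may now be overfull

        # Bottom-up pass: split overfull nodes, pushing the middle key into the parent.
        while len(node.keys) > self.m - 1:
            mid = ceil(self.m / 2) - 1               # index of the middle pair
            middle = node.keys[mid]
            right = Node(node.keys[mid + 1:], node.children[mid + 1:])
            node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
            if path:
                parent, j = path.pop()
                parent.keys.insert(j, middle)        # middle key goes up to the parent
                parent.children.insert(j + 1, right) # along with a pointer to the new node
                node = parent                        # the parent may now be overfull in turn
            else:                                    # the root itself split:
                self.root = Node([middle], [node, right])   # grow the tree by one level
                return
```

With m = 3 this behaves like the 2-3 tree in the example: the first time a node collects three keys it splits, the middle key moves up, and a split of the root adds a new level at the top, exactly as in the insert of 7.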
44:48 Counting the disk accesses: on the way down we did three reads. Then you did a split at this level, a split at this level, and a split at that level; each level did a split, and that's two write accesses for each split, so that's six write accesses. Then you did one more write to create the new root.

45:19 So the worst that can happen to you is really what was shown by this example: if your height is h before the insert, on the way down you can do h read accesses; on the way back up you can split up to h times, once at each level, so that's 2h writes; and then you can do one more write for the new root. That's 3h + 1 disk accesses. So inserts don't work nearly as well as searches, but the alternative, using an AVL or a red-black tree, is a lot worse.

45:59 All right, any questions about B-trees?

46:09 All right, we'll look at the delete operation in the next lecture. Thanks.