Transcript 00:14 okay let's get started uh you have any 00:18 questions you want to ask first 00:21 assignment is going alright because you 00:24 have one more assignment coming up yeah 00:27 okay all right so yes today we're going 00:32 to talk about a new data structure 00:34 called a segment tree and actually most 00:37 of the structures were going to be 00:38 seeing in the remaining of remainder of 00:41 this semester together with today's 00:45 structures that find a lot of use in 00:47 computational geometry and the segment 00:55 tree is one of the more elementary data 00:57 structures used in computational 00:59 geometry so if before looking at the 01:02 segment tree itself let me spend a 01:05 little bit of time just talking about 01:06 some of the problems that arise in 01:08 computational geometry all right so 01:11 basically the area is concerned with 01:14 performing computation on geometric 01:17 objects or on collections of geometric 01:19 objects and the simplest geometric 01:24 object you can think of is really a 01:25 point so you may look at collections of 01:28 points where the points may be points in 01:31 one dimension so the kind of points laid 01:33 out on a straight line or you can have 01:38 let's say points in two dimensions so 01:43 for example here you may be given let's 01:46 say a piece of sheet metal and you've 01:51 got places where you intend to drill 01:52 holes okay 01:54 so you have points in two dimensions you 01:56 have an XY coordinate associated with 01:57 each point and you may want to then 02:00 perform some kind of operations on this 02:03 collection of points so for example if 02:06 you have sheet metal and I tell you here 02:08 is where all I intend to drill holes I 02:10 might want to verify that no two points 02:13 are closer than say half an inch maybe 02:16 if they close at half an inch the metal 02:17 will buckle okay so given a collection 02:20 of 02:21 instead you want to perform some 02:22 operation on them at three dimensions 02:25 you could be given a set of locations 02:29 for aircraft flying over the airspace in 02:31 Atlanta given us three dimensions XYZ 02:35 coordinates say and now you want to know 02:37 if you might want to know which two 02:40 planes are the closest to each other to 02:44 find the closest pair of points and 02:46 three dimensions or you could have a 02:49 more general problem D dimensional space 02:51 of four five six dimensions so now of 02:53 course you're not talking about physical 02:55 coordinate dimensions but you could have 02:57 points described by that's the XYZ 03:00 coordinates plus time okay and you could 03:03 query on this collection of points so 03:06 the simplest kinds of problems people 03:08 look at in computational geometry have 03:10 to deal with collections of points where 03:12 points could come from one dimension to 03:15 three or more and some of the commonly 03:19 solve problems like closest pair of 03:20 points so which two aircraft are closest 03:23 or in the case of the sheet metal 03:26 example if you know which two points are 03:28 closest then you just have to know if 03:30 that if the distance between these two 03:31 points is smaller than say half an inch 03:34 okay so you may solve that by first 03:36 finding the closest pair of points 03:39 nearest neighbor problems arise saying a 03:46 mapping system so you're driving down 03:48 somewhere you have your coordinates and 03:50 you want to know where is the nearest 03:52 gas station okay so you get a nearest 03:55 neighbor type of problem then you have 04:02 problems that deal with lines and again 04:04 lines could be in in one dimension so in 04:08 which case they're just segments all 04:10 laid out next to each other or they 04:13 could be in two dimensions or three or 04:15 more so it's an example of say lines in 04:20 one dimension so you may just call them 04:21 intervals so you may be running a 04:23 machine shop you got a bunch of machines 04:25 that perform different tasks maybe some 04:28 machines can be used to cut metal some 04:31 could be used to drill holes some may be 04:33 paint and 04:34 and then you have jobs scheduled in your 04:37 machine shop so if your drilling machine 04:42 is busy from two to four you can 04:45 describe that as a line a line in one 04:47 dimension that begins at two and ends at 04:51 four okay so the busy intervals for 04:55 machines can be described as collections 04:57 of lines in one dimension and then you 05:00 may want to ask questions like which 05:02 machines are busy from 1:00 p.m. to 2:00 05:04 p.m. or you could take a problem we 05:10 looked at a few lectures ago IP routing 05:14 so if you look at packet forwarding so 05:16 then in packet forwarding your filter is 05:19 either given as a prefix like this or it 05:22 could be given as a range even if it's 05:24 given as a prefix you could translate it 05:26 into a range so this range for example 05:30 which matches destination addresses from 05:32 20 to 60 is really a line segment which 05:37 begins at 20 and ends at 60 so your 05:41 router table could be interpreted as a 05:43 collection of line segments in one 05:45 dimension and then when you get a packet 05:48 you know the destination address so 05:49 that's a point okay so now you want to 05:52 find the shortest line in your 05:55 collection which contains that point 05:58 there will be a longest prefix match or 06:00 most specific filter match shortest line 06:04 that map that contains a given point so 06:07 you could model IP routing as a problem 06:11 in computational geometry though it may 06:13 not look like a computational geometry 06:15 at first glance right you'd have 06:19 problems in two dimensions so now you're 06:22 dealing with rectangles or more 06:24 generally you're dealing with polygons 06:28 these show up say in VLSI CAD so you 06:33 have a collection of masks that are used 06:36 to fabricate chips and now you would 06:38 like to ask questions like well each 06:40 mask is described by a collection of 06:43 polygons so a polygon might describe for 06:45 example a wire or it might describe an 06:47 active component on the chip so you may 06:51 want to verify for example that no two 06:56 polygons that represent active elements 06:58 are closer than some threshold if 07:00 they're too close you could get a short 07:02 or you may want to verify that polygons 07:05 that represent wires wherever they make 07:08 contact with an active polygon the area 07:10 of contact is at least some amount so 07:13 you get a good contact okay so these are 07:15 really all computational geometry type 07:17 problems because you're dealing with 07:19 geometric objects Century location so 07:26 you want to provide security to a 07:28 building so you can describe the 07:30 building by its footprint so some kind 07:33 of a polygon and now you want to find 07:36 the perhaps the smallest number of 07:38 locations where you must post guards so 07:41 that the entire interior of the building 07:43 is visible by one or more guards or 07:46 maybe you going to post the guards 07:47 outside so the entire exterior is 07:49 visible by one or more guards to get two 07:51 different types of sentry location 07:52 problems or you can go further to sensor 07:56 location so you have different types of 07:58 sensors that are going to be used to 08:00 sense things in this building and 08:01 different sensors have different costs 08:04 in different capability and now you want 08:06 to find an optimal selection of sensors 08:09 and locations to sense everything in the 08:12 building so these all become 08:14 computational geometry problems because 08:17 you're really dealing with geometric 08:18 objects the geometric object here being 08:21 the building which is described by a 08:24 polygon which represents the footprint 08:26 of the building another application 08:30 could be to two-dimensional filters get 08:33 something we talked about a few lectures 08:35 ago when we looked at packet 08:36 classification so 08:40 if you have a two-dimensional filter 08:42 typically that's going to be a source 08:43 address and destination address is 08:45 what's provided in the filter and so for 08:48 example you may have a photo which says 08:51 I match all packets coming from an 08:54 address that begins with 1 0 and going 08:57 to an address that begins with 0 1 1 08:58 okay so that's the source filter and the 09:02 destination filter so we can translate 09:05 that into a range so if you assume that 09:09 the addresses are 4 bits long and again 09:11 that's very Lea small it'll be at least 09:13 32 128 now this kind of filter would 09:16 match source addresses between 8 and 11 09:18 and destination addresses between 6 and 09:22 7 ok so in this case if the 4 bits then 09:26 you could either get a 0 here you get a 09:27 1 so will be a 6 or a 7 09:29 so so this is your filter you can 09:33 describe this as a rectangle okay so you 09:38 have a rectangle where this edge goes 09:42 from 8 to 11 the width and the height is 09:45 from 6 to 7 okay so all packets whose 09:49 source distant if you look at a packet 09:51 its source and destination address is a 09:53 point in two-dimensional space where the 09:56 source address is the x-coordinate of 09:58 the destination is the Y so if that 10:00 point lies inside the rectangle then 10:03 this filter matches it if the point is 10:05 outside the rectangle the filter does 10:07 not match so now my route my 10:10 classification table is really a 10:12 collection of rectangles in 10:13 two-dimensional space and given a point 10:16 in two-dimensional space I want to find 10:19 the rectangle that contains this point 10:22 and which has the say highest priority 10:25 so find the highest priority rectangle 10:28 that contains a point so that's really a 10:30 computational geometry problem okay so 10:35 computational geometry though the first 10:38 time you hear it you kind of get the 10:39 feeling you're dealing with some 10:40 esoteric mathematical type problem it's 10:44 very useful in modeling a wide variety 10:47 of problems that we encounter everyday 10:50 and it's a huge branch of study 10:53 we're a fairly large collection of 10:57 sophisticated data structures have been 10:58 developed so we're only going to touch 11:02 on some of the more popular structures 11:04 used in this field okay all right so the 11:09 simplest structure most fundamental 11:11 structure used here is the segment tree 11:15 let's take a look at what this is when 11:19 the segment tree is used to store 11:20 intervals okay so these are really lines 11:23 in one dimension so each interval has a 11:26 starting point I and a finish point J 11:29 the assumption is that intervals have at 11:32 least unit length so I is less than J 11:35 and we assume that the start and end 11:37 points are integer the form of the 11:41 segment tree we're going to define is 11:43 really suited for static applications 11:46 static in the sense that the range of 11:50 start and end points is known ahead of 11:52 time but the collection of intervals may 11:55 change so I know that all my endpoints 11:59 are confined to be say between 0 and 99 12:02 that's known and then the intervals 12:05 themselves can be inserted or deleted in 12:07 time right so the range of points is 12:10 known ahead of time okay we looked at an 12:14 example of intervals representing 12:16 machines so if a machine is busy say 12:20 from 2 to 4 then I will describe it by 12:22 the interval to 4 the data structure is 12:27 good if you want to answer questions 12:29 like find all the intervals in your 12:32 collection that contain a unit interval 12:35 a particular unit interval so a unit 12:37 interval has a start point then its end 12:40 point is 1 more than its start point so 12:45 for example if I have intervals that 12:48 give me the busy periods for all of my 12:50 machines and I want to know all of the 12:52 machines that are busy from 2 to 3 if 12:56 you want to know all the machines busy 12:57 from 2 to 4 well then you'd first find 12:59 everything busy from 2 to 3 and then 3 13:02 to 4 and then you'd either take their 13:05 union or intersection depending 13:07 whether you wanted at some point between 13:10 two and four or all the time from two to 13:13 four okay we will see other structures 13:16 down the road which are more suited when 13:18 the query interval is bigger than one 13:22 alright so this is good for unit 13:24 interval overlap right the segment tree 13:31 is a binary tree each node of the tree 13:37 represents an interval an interval has a 13:41 start point an interval has an end point 13:45 and every node represents an interval 13:48 whose size is at least one so the start 13:50 point has to be less than the end point 13:55 now there's a variation of a segment 13:59 tree where instead of closed intervals 14:01 it's open on the left and closed on the 14:03 right and that's very similar to the 14:08 version we're looking at that one is 14:09 good if you want to do point containment 14:11 find all the intervals that contain a 14:13 point this version is good for find all 14:17 the intervals that contain a given unit 14:19 interval but conceptually they're very 14:21 very similar okay 14:25 so the start and end points are integers 14:28 all right so the root okay 14:31 I suppose that the root represents a 14:35 range or interval that begins at 1 and 14:38 goes up to n the assumption then is that 14:44 all of the intervals or segments you're 14:46 going to be storing in the tree have 14:48 endpoints which are one of the numbers 1 14:51 2 3 up through n so the root represents 14:56 the entire range of possible endpoints 15:00 even though we're assuming that the 15:03 endpoints are are integer you can 15:09 certainly do transformations from non 15:10 integral endpoints and map non integers 15:13 to integers and work in that fashion so 15:17 if you have endpoints that all known 15:18 ahead of time one point five three point 15:20 five six point two 15:21 you can map them into corresponding 15:24 integers okay if you represent only a 15:34 unit interval so the endpoint is one 15:36 more than the start then you have to be 15:38 a leaf of the tree if you're not a leaf 15:42 so the endpoint differs from the start 15:44 point by more than one then you're going 15:46 to have children and the children will 15:48 have ranges associated with them so the 15:52 range of the children is obtained by 15:54 finding the midpoint of the range of the 15:56 parent so you take the midpoint between 15:59 s and E so that's s plus E or to the 16:04 left child has the same left point start 16:06 point as the parent but goes up to that 16:08 midpoint and the right child begins from 16:11 that midpoint and goes up to the end all 16:18 right so that's the definition of a 16:19 segment tree let's take a look at an 16:21 example and that should clarify the 16:24 definition 16:26 all right so suppose that the root range 16:29 is 113 that means that the only 16:31 permissible end points are 1 2 3 4 up 16:34 through 13 so the root of the tree is 16:38 going to represent the entire set of 16:40 permissible end points search range is 16:42 113 that's s of the root and that's a of 16:45 the root the midpoint or since the size 16:50 of this interval is more than 1 this 16:52 node is going to have children the 16:55 midpoint of this range the 1 plus 13 14 16:58 over 2 7 so left is going to be from 1 17:00 through 7 and the right is going to be 17:02 from 7 through 13 so we do the same 17:09 decomposition here because the size of 17:11 this interval is more than 1 so it'll be 17:13 from 1 through 4 and 4 to 7 on that side 17:19 midpoint of 7 to 13 is 10 so 7 to 10 and 17:23 10 to 13 but this one is not a unit 17:26 interval so it will be decomposed from 1 17:29 to 2 and 2 to 5 sorry 1 to 4 so 1 to 2 17:34 and 2 to 4 17:35 okay so the midpoint is two over here 17:38 the midpoint is five this is from four 17:42 to five and five to seven okay over 17:51 there seven ten we get the midpoint of 17:54 that is eight so you'll have seven eight 17:57 and eight ten and then for this node 18:01 here the midpoint is eleven so you have 18:06 ten eleven and eleven fifteen but the 18:10 unit intervals have been kind of boxed 18:13 off and these are leaf nodes so they 18:17 won't have any children but this one is 18:19 more than unit interval it'll have 18:21 children the midpoint of that interval 18:23 is three so you get two to three and 18:26 three four these are both units 18:28 intervals so neither of them has a child 18:30 this one will have two children 18:32 five six and six seven both are leaves 18:35 that will have two children eight nine 18:38 and nine ten and then 1112 and 1213 18:45 right so the segment tree is defined 18:49 based on the range of permissible 18:52 endpoints the root represents the entire 18:54 range and then at each level you divide 18:57 the range into half until you end up 19:00 with unit intervals all right so this 19:05 would be the full segment tree for the 19:07 range one thirteen as we'll see in 19:12 practice you don't really store the 19:13 whole tree it depends on what intervals 19:17 you actually have to store it any 19:19 particular time all right so let's 19:24 suppose one of the intervals you want to 19:27 store is 311 so we have a machine that's 19:29 busy from 3:00 p.m. to 11:00 p.m. and I 19:32 want to record that in my structure to 19:36 find out where all I'm going to store it 19:39 what I'll first do is I'll highlight all 19:42 of the unit intervals that are contained 19:44 in 311 okay so that's like three four 4 19:47 five 5 six 6 seven 19:49 so on okay all right so these are all of 19:52 the unit intervals that together 19:55 represent the interval 311 all right so 20:06 I've got all of the unit intervals at 20:08 311 then to find out where I'm going to 20:11 actually store 311 okay I'm going to 20:15 store it in all those nodes here which 20:18 have the property that all of the leaves 20:21 and that subtree have been highlighted 20:23 but this property doesn't hold for the 20:26 parent of that node okay so for example 20:30 here okay if you look at this the 20:35 subtree rooted here then all leaves in 20:37 this subtree are highlighted if you look 20:40 at its parent it's got at least one leaf 20:42 that is not highlighted okay so this 20:45 node satisfies this property that all 20:48 leaves in its subtree are highlighted 20:50 but that's not true for its parent so 20:52 you're going to store it in this node 20:56 okay so it's going to be stored here 21:03 it's not going to be stored here even 21:06 though all leaves in this node a 21:09 highlight it because if you go to the 21:10 parent five seven all of its leaves are 21:12 also highlighted it's not even going to 21:16 be stored here because if you look at 21:18 its parent all of the leaves in its 21:20 subtree are highlighted if you look at 21:23 its parent this has got a couple of 21:25 leaves that are not highlighted okay so 21:27 it's not going to get stored there 21:29 what's going to get stored here all 21:35 right coming to this side it's not going 21:37 to be stored here here or here not even 21:39 there okay it's going to get stored here 21:42 all leaves in the subtree rooted here 21:44 are highlighted but if you look at the 21:46 parent it's got some leaves that are not 21:48 highlighted okay it wasn't going to get 21:50 stored over there and then on this side 21:52 is going to get stored in this node all 21:57 right so the interval 311 is going to be 22:00 stored in exactly four nodes one at this 22:03 level of one here and two over there are 22:11 any questions about which nodes store a 22:14 given interval let's take a look at 22:21 another example 22:22 all right for ten so we again highlight 22:25 all of the leaves it's going to get 22:29 stored here because all leaves in it 22:31 subtree are highlighted and that 22:34 property is not true for its parent okay 22:36 go over there 22:38 similarly it's only gets stored out here 22:45 now an interval is going to get stored 22:47 in the root only if it covers all of 113 22:51 okay so all of the leaves in the tree 22:53 are highlighted 22:58 okay alright so in a real application 23:07 you're going to have a large number of 23:10 intervals not just one and those 23:13 intervals would get placed in nodes of 23:14 your segment tree and some nodes will 23:18 end up with no interval some will end up 23:20 with a large number of intervals so 23:27 suppose we have a very simple case like 23:31 here where there's only one interval so 23:34 that's being stored here here here and 23:36 here well then I don't really have any 23:41 use for many of the nodes that are shown 23:44 in this tree okay all I need to really 23:47 store are those nodes that contain data 23:52 intervals together with their ancestors 23:55 so I have paths from the root to get 23:57 there so you will actually store only 24:03 nodes that contain an interval so the 24:05 those label yellow together with all of 24:08 their ancestors so the minimal tree 24:11 structure that includes the nodes that 24:13 contain intervals all right so these are 24:20 the only nodes that I need to keep the 24:23 ones enclosed by this red curve and then 24:31 as you insert items you may end up 24:33 growing the tree as you delete items you 24:38 may contract the tree 24:45 yeah 24:58 okay the purpose of this tree is to 25:03 store all of the line segments that you 25:06 have in your collection of line segments 25:07 okay so we're going to be trying to 25:11 answer questions on collection of line 25:12 segments each line segment has a start 25:14 point and an end point so for example 25:18 here I'm looking at a segment that 25:21 begins at 3 and ends at 11 so that could 25:23 represent a machine that's busy from 3 25:25 p.m. to 11 p.m. so you represent that 25:28 fact I will store that segment I let's 25:30 suppose this machine is called a machine 25:32 a ok so then in this node I'll store the 25:36 value a in that node I'll store the 25:38 value a here I'll store a and Heroes 25:40 store a so they'll tell me that machine 25:42 a is busy from 3 to 11 25:44 okay so depending upon the busy 25:48 intervals that you have for your 25:50 particular machine shop you will have 25:52 some number of intervals that you go to 25:53 store so if you've got 10 machines and 25:57 each machine has 4 different intervals 25:59 when they're busy you may have 40 26:00 intervals to store so inside the nodes 26:04 with storing intervals and technically 26:08 we're not really storing the interval 26:10 itself in the sense of 3 to 11 but we're 26:12 storing a name for the interval so name 26:15 for the machine that's busy ok so each 26:19 of these fellows like I said here if I'm 26:21 talking about the drilling machine and 26:23 in your shop you may have some 26:26 identifier associated with the drilling 26:27 machine let's call that an fie a then 26:29 I'll store the label a a a a you may 26:33 have some other machine a pink machine 26:34 you call that B and maybe the pink 26:36 machine is busy only from 3 to 4 so be 26:38 stored here but not anywhere else you 26:42 may have a sewing machine called that 26:44 machine C and that machine is busy from 26:47 5 to 7 it will store that here ok so 26:52 each of these nodes contains lists of 26:55 intervals or identifiers for the objects 26:59 associated with these intervals 27:02 and as a result of that some of these 27:04 nodes may be empty because you don't 27:07 have any intervals in the collection 27:09 that fall over this one and others would 27:13 be non-empty okay 27:14 so the tree structure that you'll 27:15 actually have and this is really a 27:17 dynamic tree structure the thing that is 27:19 static about it is the range of end 27:20 points the structure itself is dynamic 27:24 as you insert new intervals the tree 27:27 could grow because the tree must always 27:30 contain those nodes that have intervals 27:34 stored in them together with all of 27:35 their ancestors as you delete items so 27:39 for example if this node became empty 27:41 okay then this node would go that node 27:44 would go that node would go okay so this 27:48 the tree itself changes in time okay 27:53 other questions about the tree I will 28:02 look at those now basically it's you can 28:04 insert an interval you can remove an 28:06 interval and then you can do a curry for 28:08 a unit interval okay all right let's 28:13 take a look at some properties okay 28:16 the height of the tree if you're 28:20 representing an interval from 1 through 28:21 n then at each level we cut the size of 28:25 interval into half so here the interval 28:27 size is n then it becomes n over 2 and 28:29 over 4 and over 8 so the height of the 28:31 tree will be about your logarithm log of 28:33 n base two or more formally it's log n 28:38 minus 1 base 2 minus 1 so the height of 28:42 the tree is logarithmic again depending 28:49 upon the intervals you have you may not 28:50 have all the levels being stored so 28:53 that's the worst that can happen to you 28:54 you could have all log n levels 29:02 if an interval okay it's stored in a 29:07 node fee then it's not stored in any 29:10 ancestor of the node so if it's stored 29:15 here then it can't also be stored in the 29:17 parent that follows from the definition 29:19 okay to be stored in a node all of your 29:22 leaves have to be marked but that 29:24 probably should not hold at the parent 29:25 okay so if you're stored here you can't 29:28 be stored there as well or over there or 29:31 somewhere else the next property is if 29:38 you stored in an old you can't also be 29:39 stored in the sibling so for example if 29:43 you're stored here you can't also be 29:45 stored here because if you stored in 29:46 both places well then you shouldn't be 29:47 in either you should be in the parent or 29:50 some other ancestor given any level you 29:58 can be stored it in at most two nodes at 30:01 that level okay so for example here okay 30:07 I'm in two nodes there can't be a third 30:11 note that you're stored in so for 30:12 example if you stored out here well you 30:14 shouldn't have been in either of these 30:15 two you should have been in the parent 30:17 but if you look at this fellow here 30:18 let's suppose you stored at 10:11 and 30:20 you also stored in for five okay and 30:25 maybe also stored in seven eight okay 30:28 that's not possible that you're in for 30:29 five you and seven eight and ten eleven 30:31 because our segments are continuous so 30:34 if you're stored in for five and you 30:36 store in ten eleven then all the leaves 30:38 are five seven seven eight and eight ten 30:40 must also be highlighted in which case 30:44 four five and five seven he should have 30:46 been stored up there 30:47 okay so you can reason that you cannot 30:50 be stored in more than two nodes at a 30:53 level okay that follows from the fact 30:59 that segments are continuous all right 31:03 and this property then says that the 31:06 total number of nodes in which you can 31:08 be stored is log event a really two 31:11 times log event 31:12 your height is log in two times at each 31:16 level so the growth in the size of the 31:20 data if you've got 20 intervals and your 31:24 ranges and you may need 20 log in space 31:28 or times some constant all right let's 31:36 take a look at the operations okay 31:38 suppose you want to do an insert okay 31:44 all right so we start at the root of the 31:47 tree and again keep in mind that now if 31:51 you have no intervals previously stored 31:53 you don't have a structure okay it's 31:56 empty so as we move through we may be 31:59 growing the tree all right so we start 32:03 at the root okay which of course may not 32:05 be there in which case you have to 32:06 create it and we're going to see if we 32:11 fully overlap the range of the root okay 32:14 so if you fully overlap this then you're 32:16 going to get stored here if you don't 32:19 fully overlap then you have to see do 32:21 you have an overlap on this side in 32:23 which case you'll be stored somewhere 32:24 here and some number of nodes here do 32:28 you have an overlap with this range if 32:30 so you'll be stored in some nodes out 32:32 here as well okay all right so you look 32:36 at 311 so 311 doesn't fully cover this 32:40 it's not going to be stored here okay if 32:43 it was going to be stored here I don't 32:45 have to go further down because you 32:47 can't be stored in a node and an 32:49 ancestor which is the same as saying you 32:50 can't be stored in a node and mr. 32:52 descendant okay so if you get stored in 32:54 a node you don't have to look at 32:55 descendents okay all right you're not 32:57 going to get stored here so I'm gonna do 32:59 there's a possibility here to be stored 33:00 in a descendant so I'm gonna check out 33:02 here there is an overlap says gonna get 33:04 stored it at least one point in the 33:06 subtree okay so now I see is 1/7 33:11 contained in 311 it's not so there's at 33:14 least one leaf in this subtree that's 33:16 not going to get highlighted okay so I 33:18 check with the left side there is an 33:21 overlap but this is not fully contained 33:23 in that so some leaves here will not get 33:25 highlighted 33:26 so come down here to four well that's 33:29 not well look here one two there's no 33:31 overlap so you're not going to be in 33:32 this subtree I don't need to go further 33:34 down here to four there is an overlap 33:38 but it's not fully contained so I check 33:42 with two three two three there is no 33:45 overlap so you won't be stored there 33:47 three four 33:49 there's no will happen three four is 33:50 contained so it's going to get stored in 33:52 this node all right so it's going to get 33:57 stored in that node if it had 33:58 descendants I wouldn't go any further 34:00 down so now you kind of back up you're 34:03 backing up from right child's you come 34:05 out here you see if there's an overlap 34:07 with four seven there's an overlap with 34:10 four seven and four seven is contained 34:12 so it's going to get stored out here we 34:15 don't check the subtrees all right then 34:18 I move over to this side there's an 34:21 overlap but seven thirteen is not 34:22 contained okay so I check here seven ten 34:27 is contained okay so it's going to get 34:30 stored here you come to this side there 34:34 is an overlap so at least in one node 34:36 here it's going to get stored then 13 is 34:40 not contains it's not going to be stored 34:42 here so you check the left 10 11 is 34:45 contains is going to get stored here 34:48 there's an overlap so it's going to get 34:50 stored somewhere down here but it's not 34:52 contained so it won't be in this node 34:53 you check here there's an overlap and 34:56 it's contained so we'll be stored there 34:58 oops sorry let me go back there see what 35:02 happened 35:11 okay alright so ten eleven okay then I 35:17 come here and check if there's an 35:20 overlap okay so here there is no overlap 35:23 if there's an endpoint correspondence 35:26 but to have an overlap you could have at 35:28 least a one unit intersection okay so 35:31 there's no overlap here so I don't check 35:33 the subtree down there okay so you can 35:38 start at the top of the tree and kind of 35:42 move your way down growing the tree and 35:45 inserting the interval any questions 35:53 about an insert now to insert the 35:58 interval you're going to need a data 36:00 structure in each of these nodes because 36:02 each node could contain several 36:05 intervals but depending upon the 36:09 application the data structure used here 36:12 might just be a chain so when you want 36:13 to insert you add it into the front of 36:15 the chain it may be a doubly linked list 36:19 which might support deletion from 36:20 arbitrary points if you keep track of 36:22 where you inserted things it could be a 36:25 max-heap 36:26 if eventually you want to extract items 36:29 by priority associated with it intervals 36:31 okay so the data structure in each node 36:34 varies depending upon the application 36:41 right you can state the insertion 36:44 algorithm recursively so we're inserting 36:48 an interval that begins at s goes to e 36:52 into the node V of my segment tree so if 36:58 the interval that you're inserting s E 37:01 is contained in the range of the node so 37:05 the other way around if the range of the 37:07 node is contained inside the interval 37:09 that you're inserting then you just put 37:12 it into that node so if the range of the 37:17 node is contained inside the interval 37:19 inserting you put it in that node and 37:20 you're done if it's not then you check 37:25 to see if you have an overlap with the 37:27 range of the left child 37:29 if so you recursively insert into the 37:32 left subtree you check to see if you 37:34 have an overlap with the range of the 37:36 right child if you do then you 37:37 recursively insert into the right 37:39 subtree 37:49 all right so so that's the algorithm the 37:54 question we'd like to answer of course 37:57 is how many nodes of the segment tree 38:00 does this fellow visit to complete its 38:02 job so we already know the complexity of 38:09 the insert there are two components to 38:11 the complexity of the insert one has to 38:13 deal with how many nodes do you visit in 38:15 the process and then the other component 38:17 has to do with how much time does it 38:18 take to actually insert this interval 38:20 into the data structures in each node 38:22 now since we don't know what data 38:24 structure is being used in each node we 38:26 can't really answer that question we've 38:28 been really answer the question about 38:29 how many nodes do you visit all right so 38:35 let's try to answer that question so if 38:38 you're inserting say 311 let's say these 38:42 are the nodes that get highlighted okay 38:47 of the nodes that are highlighted okay 38:49 again the insertion algorithm doesn't 38:52 really visit all these notes but let me 38:53 just define some terms of respect to 38:55 this okay let Ellen R be the leftmost 39:00 and rightmost leaves that got 39:01 highlighted so this is the leaf that 39:03 represents the leftmost unit interval 39:06 okay you were inserting three levels so 39:09 3/4 is the leftmost and 1011 is the 39:11 rightmost all right so the way the 39:17 algorithm works is exactly going to 39:20 start at the root and it's going to find 39:21 its way to the leftmost and it's also 39:24 going to find its way to the rightmost 39:26 okay so it's going to visit this node 39:29 okay and then it's also going to visit 39:33 that node that's how you get there and 39:36 so on okay all right so we're going to 39:39 visit this node okay we're going to 39:42 visit that node then we're going to 39:45 visit the ancestors of these fellows so 39:47 we was at that fellow we're gonna do is 39:49 that guy this guy the root and then the 39:54 ancestors of ten eleven on that side 39:57 okay all right so 40:05 all right so we're going to visit the 40:07 two nodes that represent the it's a unit 40:11 and towards the leftmost and rightmost 40:12 okay 40:13 possibly you may not always get there 40:16 because if this guy had also been 40:18 highlighted we were stopped out here 40:21 okay so if this was the leftmost we 40:24 would not visit this we would only come 40:26 this far okay but in the worst case you 40:29 will visit the leftmost and you'd visit 40:32 the rightmost unit interval plus you 40:34 would visit the ancestors of these 40:37 fellows and then in addition to all of 40:40 these guys for each node that you can 40:43 come to here you can visit its other 40:45 child now you visit a node only if the 40:49 recursion actually moves to that node 40:51 and to move to a node you have to do the 40:54 overlap check so when I visit this node 40:59 I do an overlap check the overlap check 41:01 fails here so don't come here so I don't 41:03 really visit this node I do an overlap 41:06 check here the overlap check succeeds 41:08 and I move to this node okay so by 41:10 visiting a node my recursive algorithm 41:12 actually moves to that node all right so 41:15 we're going to move to all of the nodes 41:17 that are checked marked okay and then 41:19 when you come here you're going to move 41:21 to this fellow okay so the other child 41:23 of this guy okay but we don't move to 41:26 the other child of this guy or the other 41:28 child of that guy okay similarly on this 41:31 side okay we'll move to the other child 41:33 of 7:13 the other means this is the one 41:36 already marked the other one is this guy 41:37 okay so we will be out there where we 41:40 want to mark it all right so the nodes 41:46 that are visited and by visit I mean my 41:49 recursion actually moves there you move 41:52 there only if the overlap check succeeds 41:56 so the worst that can happen to you is 42:00 the leftmost and rightmost unit 42:02 intervals get visited and as we've seen 42:05 they may not get visited because you may 42:07 stop at some ancestor of them but in the 42:10 worst case you get there plus all of the 42:13 answers 42:14 of these fellows could get visited plus 42:17 the other child okay the one that hasn't 42:20 been marked as visited could get visited 42:22 again that's only a worst case scenario 42:23 because here in this case for example 42:26 the other child is not visited similarly 42:28 for this node the other child is not 42:30 visited okay okay so if you look at all 42:35 of those nodes together the worst that 42:41 can happen to you is this unit interval 42:43 is at the lowest level so you login okay 42:46 so this path has got log n nodes on it 42:50 this could also be in the worst case all 42:52 the way down here and that path could 42:54 have log n nodes on it so that's 2 log n 42:57 nodes and then each of these nodes could 42:59 have an other child that got visited so 43:02 that's 4 log n nodes 43:03 okay so 4 log in is kind of a crude 43:06 upper bound so the maximum number of 43:10 nodes I can visit get visited is 43:12 actually less than 4 log n and so your 43:15 complexity as measured by number of 43:17 nodes visited is order of log n goes to 43:20 that you got to add in the time to 43:21 actually insert into a node alright so 43:28 this kind of analysis is useful it's 43:31 going to be used in some of the other 43:33 structures we're going to look at as 43:34 well to show that the complexity of 43:36 those algorithms logarithmic we will 43:39 define a bounding envelope which is 43:41 given by these two leftmost nodes and 43:45 all of their ancestors and those 43:47 bounding envelopes would be of size at 43:48 most two log N and then you may look at 43:51 kind of the other child of nodes on this 43:54 bounding envelope or any questions about 43:58 this complexity analysis all right a 44:06 delete works in pretty much the same way 44:08 if you're given the interval that has to 44:10 be deleted okay well then the algorithm 44:14 is really the same as that for the 44:15 insert except that when you get to a 44:17 node where you are supposed to have 44:19 inserted it you now need to remove it 44:20 from that node 44:27 both for this algorithm and the one way 44:29 you start the entry assumption is that 44:32 there is an overlap with the range of 44:35 the node V all right so the complexity 44:40 of this fellow is also log n same number 44:44 of notes get visited during a delete as 44:46 they do during an insert at least from 44:51 the worst case point of view alright the 44:55 other operations supported by this data 44:57 structure is to find all of the 45:00 intervals in your collection that 45:03 overlap a given unit interval okay all 45:09 right 45:09 and so to do this we're going to follow 45:11 the path from the root okay down to a 45:15 leaf that contains this unique this unit 45:18 interval and all of the segments stored 45:23 in those nodes will be the answer to our 45:25 query we will not be reporting any 45:33 segments twice because no segment or 45:36 interval can be contained in both a node 45:37 and it's ancestor okay so you get 45:41 distinct responses so let's take a look 45:44 here okay so I want to find all of the 45:47 intervals or segments of my collection 45:49 that contain five six well everybody's 45:52 stored here contains five six because 45:54 everybody stored here spans all of 115 45:56 okay nobody on this side can contain 46:00 well it's not that nobody on this I can 46:02 contain five six but people who don't 46:05 cover this whole range and contain five 46:07 six are going to be represented here 46:10 because they may also be represented 46:12 there but we don't need to check those 46:13 it's enough to come here okay so 46:16 everyone stored here spans five six so 46:19 everybody here has to be reported nobody 46:22 stored here is also stored there so 46:24 there are no duplicates okay then four 46:26 five six would come down here okay okay 46:30 so all the ancestors right so come there 46:33 I come over here 46:35 everybody stole here contains five six 46:37 and they're all different from what's 46:38 stored here and what stored that same is 46:42 true in this node okay and then 46:46 everybody over there so if I travel down 46:50 this path and report all of the elements 46:52 stored in all of those nodes I will 46:55 report every single interval a segment 46:57 in my collection that overlaps five six 46:59 and there'll be no duplicates or the 47:08 time required for all of this the length 47:11 of this path is log n so the number of 47:13 nodes visited is log N and then 47:15 depending upon the size of these 47:18 structures you would need additional 47:21 time and typically a data structures 47:24 allow you to traverse them and climb 47:26 proportional to their size so whether 47:28 you have a chain or a doubly linked list 47:29 or a heap you can collect all the 47:32 elements they're in order of number of 47:34 elements stored in that node so the time 47:38 required then becomes log n because 47:40 they're log n nodes plus S which is the 47:42 total number of intervals in the answer 47:56 okay all right you have any questions 48:00 about the segment tree so basically it's 48:08 a data structure to store a collection 48:10 of segments the version we looked at you 48:13 need to know the end points ahead of 48:15 time you need to know the range of 48:16 endpoints we assume the integer you can 48:20 perform three operations insert delete 48:22 and report all segments in your 48:25 collection that overlap a given unit 48:27 interval it takes the number of nodes 48:32 visited by all three operations as log n 48:34 the actual time for insert and delete 48:36 depend upon the data structure used in 48:39 each node and the query time is log n 48:43 plus total number of segments in the 48:46 answer yep yep right here we have to 48:59 know the interval ahead of time okay 49:02 because you know if I don't know the 49:04 interval ahead of time I don't know the 49:05 range the route and if I don't know that 49:07 range the route I can't figure out the 49:09 midpoints okay so you've got to know 49:11 that this tree has endpoints only from 1 49:14 through 13 if it was going to later 49:16 change 1 through 16 it'll be a totally 49:18 different tree ok so you do have to know 49:21 that ahead of time and that's typically 49:25 not a problem in many geometric 49:28 applications because the objects are 49:30 placed in a finite space to start with 49:33 ok so you don't change the size of the 49:36 space in which you're placing the 49:38 objects ok so we'll stop here 49:45 you