Transcript

00:00 The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

00:25 So we're going to talk about routing protocols, and today we're going to talk about how routing protocols handle failures. I want to bring everybody back onto the same page with respect to where we are in the story. In terms of routing protocols, imagine a network topology like this, and let's say that links have costs associated with them. I'm just going to make up some numbers here.

01:10 We studied two classes of routing protocols, and we looked at these in the absence of any failures. The first is the distance vector protocol, or more generally vector protocols, where in an advertisement sent from one node to another, each node sends a subset of its routing table. In other words, it sends two columns from its routing table. Recall that the routing table at every node contains a destination, a route, which is the name of a link, and a cost. In a distance vector protocol you send these two columns, you send these tuples, for all of the destinations that you know, and that's what's spread out to all of the other nodes. So this is distance vector.

01:59 In contrast, in a link state protocol, what you're sending to all of the other nodes is information about all of the links that you have. In particular, you send information about the neighbor in your topology, together with the link cost to that neighbor, whereas here it's the cost of the route to the destination that
you send. In the distance vector protocol, the computation of the routes is distributed. Every node computes and updates its routing table using the Bellman-Ford update step, which essentially updates the route to a destination if you find a better route to the destination, or if you find that the route you currently have has a changed cost, in which case you update your routing table. In a link state protocol, the computation is not distributed. What's distributed is the process of flooding this information about the local links; that's why it's called a link state protocol. The computation of the routes themselves is centralized: each node just runs a shortest path computation, for example Dijkstra's shortest path computation.

03:16 In a link state protocol, as long as all of the nodes run exactly the same computation, in other words they all try to minimize or optimize the same metric, then you're guaranteed that all of the nodes end up computing the correct routes, and that when you send a packet from one node it will reach the destination, assuming there's a path in the network. Maybe not in the problem set, but certainly at the back of the chapter, there are problems on what happens if some nodes run one protocol and another node runs another protocol. For example, you might imagine a routing protocol where one of the nodes is doing minimum cost routes and another node is doing shortest paths in terms of the number of hops, not costs. If you have that, you might end up in a situation where in fact the routing doesn't work correctly. But assuming that all of the nodes run the same optimization, in a link state protocol they'll all get
the correct answer. And similarly, in a distance-vector protocol, the update rule can actually do anything: as long as the nodes believe that there's a route, and they perform some sort of update to their routing table entry consistent with the information that they hear, then the routing will work.

04:25 Now, what happens when there are failures, and how does it work? Let's say that you run this protocol on this topology and you give it enough time. Once you hear all of the advertisements and compute the routes at the different nodes, if you look at this node over here, C, for destination D it would have a route. Let's call this link L0. It would have a route L0, and the cost of that route would be 3. Similarly at B, once it converges, you would have for destination D, let's imagine this link is called L0 as well, at node B you might have L0 and the cost would be 4, and similarly at the other nodes.

05:11 Now let's say that some sort of a failure occurs. A variety of failures could occur that cause the routing to get screwed up. The easiest form of failure is that a packet could be lost; in particular, an advertisement could be lost. The second thing that could happen is that a link could fail. If it's a wire, maybe a backhoe runs over it, which is actually more common than you think. In fact, you could have undersea cables where sharks bite them and destroy them. Lots of things could happen, so links could fail. The third thing that could happen, which is more common than you might think, is that entire switches could fail, or nodes could fail. Since we actually don't know how to write bug
free software, software bugs may cause things to fail. They may cause things to crash. In fact, they may cause things to fail in mysterious ways, where it looks like it's not failed but in fact there's a fault in the software causing you to send bad advertisements. For this course we won't worry about this. We'll actually worry about only a class of failures that you can think of as fail-stop. What I mean by that is, if it fails it just stops, as opposed to it fails and then gives you wrong information, which is actually a lot harder to deal with. It's much better for a node to fail and just stop rather than to fail, pretend that it's correct, and send you bad information. We'll worry about that later, not in 6.02.

06:57 Anyway, these things could happen. So let's concretely assume that you have this topology here and what happens is this link fails. Now, if you did nothing and you had that link fail, what would happen? Let's look at it node by node. What would happen here is that C would end up with no route to the destination, assuming that C has some way of discovering that this fault has happened. If it doesn't discover that the fault has happened, it still has this route to the destination, but if it sends packets on that link they wouldn't reach the other side; packets would be lost. But assuming that it has a way of discovering that this link has failed, it has no route to the destination.

07:46 Now B has a route to the destination; it has this link. In fact, the next advertisement from C, if C were to make another advertisement, would tell it
that it had no route to the destination. But if you did nothing in the protocol, B thinks it has a route to the destination, but in fact it reaches C and it's actually a dead end, because it reaches C and C doesn't know what to do with it; it just drops the packet. The technical term for this is a dead end: you send packets, you think you're reaching the destination, but they're actually getting dropped.

08:21 What about S? Does S have a route to the destination? Well, S doesn't have a route to the destination either, because the right route to the destination would probably have been this link over here, because the cost was two plus one plus three. It would have a route, but that would lead to a dead end at C. In fact, in this particular example no node would have a route to the destination. In terms of the routing table itself, no node would have a route that actually worked, in that the route wouldn't correspond to an actual working path.

08:55 But if you look at the picture, clearly there are other ways to get to the destination. We designed this topology presumably because we wanted some sort of redundancy. So if this failed, what you would like is for C to use one of the other paths. C might use this path or C might use this path; those are the two possible paths it could use. And similarly S should use that path, and so forth. So what you want is a routing protocol that converges to the new correct answer, assuming there's a new correct answer, and if there's no new correct answer, it converges to whatever the best possible answer is. So to some destinations you could have a route, and to some destinations you
don't have a route. So what you want is, as long as there is a connected path between a source and a destination, you would like that source to end up with a route that corresponds to some good working path, and in particular one that converges to the new minimum cost working path between the source and the destination. So that's the statement of the problem.

09:59 We're going to solve that problem today for both distance vector and link state. Interestingly, the idea that we're going to use is the same idea in both cases. Just like we built a redundant topology by having alternate paths between places, we're just going to repeat advertisements, and we're going to repeat the process of processing these advertisements. It's a very simple idea.

10:25 There are three steps to the plan. The first step in the plan is that every node is responsible for periodically checking the health of its neighbors. That's the step which we're going to call neighbor liveness, and the protocol we're going to use for that is called the hello protocol. It's a very, very simple protocol; I'll describe it in a moment. We're going to use this idea that every node is responsible for checking whether each of its neighbors is alive, and if it determines that a neighbor is not alive, it assumes the neighbor is dead and removes it from various tables and data structures and so on. In fact, this fail-stop assumption is pretty crucial for us, because the assumption is that when a failure of a node occurs, the node doesn't respond. If a failure of a link occurs, the assumption here as well is that the link stops responding;
you know, you don't get to send packets or receive packets over that link. For now we're going to assume that every link is bidirectional, so you send packets in both directions. In reality there are unidirectional network links and you have to deal with the problem differently; we're not going to worry about that. So there's a protocol called the hello protocol that runs to detect whether your neighbor is alive or not.

11:53 The second step in our answer is to make the advertisements periodic. The third step is what you do when you receive an advertisement. You collect a bunch of these advertisements that you received from various neighbors, in the link state protocol it's these link-state advertisements and in the distance-vector protocol it's these distance-vector tuple advertisements, and then you run a periodic integration process.

12:35 So if you look at it on a timeline, every node acts asynchronously, in other words independent of the other nodes. You don't have to synchronize the clocks; every node has its own clock, and every node does these two steps periodically. From time to time it sends an advertisement: in distance vector it just sends these two columns to its neighbors; in link state it sends out its link state information and the flooding process works. And then from time to time there's this integration of these advertisements that happens, et cetera.

13:30 Now, the beautiful part of these protocols is this. I've shown this picture here with these integrations happening interspersed with the advertisements, but that doesn't actually have to be the case. You could do them pretty much arbitrarily as long as you do them
periodically. The beautiful part of these protocols is that every node is asynchronously running these advertisement steps and these integration steps, and as long as they do this periodically, in the end what you get is a property called eventual convergence.

13:59 What that means is: assume you have all sorts of failures, any pattern of packet losses, link failures, and switch failures, and then you freeze the system and assume that no more failures happen. Then what eventual convergence means is that in some finite time, all of the nodes in the network will converge to the correct routing state. That is, in these routing protocols all of the nodes will end up with an answer that's consistent with what you were trying to optimize, for example minimum cost paths to all the destinations.

14:49 Now, proving that under an arbitrary model, when these advertisement and integration steps are all asynchronous and being done at random times, is a little involved, and we're not going to attempt that in this course. The notes talk a little bit about how you get eventual convergence when you assume that all the nodes are running periodic advertisement steps interspersed with integration steps. The proof's really not that important. What's more important is for you to understand the intuition behind why it works, so I'll do that with some examples here.

15:18 I also have to tell you what the hello protocol is. I'll get to that, but for now just assume it's a module that tells you whether the neighbor is alive or not. So is this plan clear to everybody? It's just the same idea, except every node is doing this periodically. In practice you might do this every 30 seconds or every 3 minutes or something like that. Of course, the
longer the time between advertisements, the longer it's going to take for the protocol to converge after a failure or set of failures. The shorter the time, the quicker you converge, but you end up doing a lot more work. Moreover, in practice many failures are transient: a link may fail for a few seconds and then come back up. So in practice it's not that useful to react very, very quickly. It's important to converge quickly once you start the convergence process, but detecting that a neighbor is alive or dead on the time scale of a few packets is sometimes too fast, because sometimes failures last for very little time and then they go away. In the meantime you've done all this work to converge to a new routing state, and then when the link comes back up you're going to do more work to come back to the old answer. You may as well have just been a little lazy. So deciding these times is tricky, and there's no real systematic way of doing it in practice, but the trade-off is usually between how quickly you wish to converge and how much work you're willing to expend in making that convergence happen. So is the plan clear? It's just the same protocol, except we're going to do this periodically.

16:56 So the first step is this neighbor liveness, or the hello protocol. That protocol is actually very easy. You have a set of links coming out of the node and the neighbors at the other end of them. The problem is that the node needs to decide which of these links is working or not working, and which of the neighbors are still there versus not there. The way the protocol works is that every node in the
system, on each of its links, and let's call these nodes A, B, C, periodically sends out, only to its neighbors, a packet called a hello packet. The hello packet usually has an incrementing sequence number on it. The idea is very, very simple. This may be sent periodically, say every 10 seconds. If N finds that it hasn't heard a certain number of hello packets from one of its neighbors in some time, that is, perhaps three or four hello packets are missing in a row, it just decides that that neighbor is dead. It's a very simple idea: you send out hello packets periodically, and k successive missing hello packets implies that the neighbor from which those packets are missing is dead.

18:34 Now, in response, what happens is that all of the routes that this node had that went via that neighbor, via that link, are eliminated from the routing table. You could do that either by simply removing the entry from the routing table, or by keeping the entry in the routing table for those destinations and replacing the cost, from whatever the value was, with infinity. It's probably a better idea to replace it with infinity. I'm not exactly sure why; that's kind of what most people do. I think the reason is that you'd like to know that that destination exists in the network, and then when a route arrives later you can fix it. But you could just remove it as well.

19:17 The other thing that happens is, if you were in a link state protocol, then on the next advertisement of the link state you would simply eliminate that link and that neighbor altogether. So you would not advertise this link and this neighbor as
existing anymore. And when that link-state advertisement floods through the network, all of the other nodes, through the flooding process, would determine that that node has gone away and that link has gone away. Then when they run Dijkstra's algorithm again and recompute the routes, they will no longer assume that that link exists, and they may find new routes to the destination. So that's what the hello protocol does.

19:58 As for how you pick k, again it's the same trade-off. It depends on how quickly you want to converge after a real failure, and picking this is difficult. For example, if you were on a wireless network where the normal packet loss probability might be 10%, something high, then waiting for a larger number of successive missed packets is a good idea, because just because a packet's lost or two packets are lost doesn't mean the link has failed. On the other hand, if you were running on a highly reliable link in terms of packet loss, like some dedicated optical link where the packet loss rate is one part in a million, then a single packet missing or two packets missing would be a good indication that that link has actually failed or that the node at the other end of the link has failed, and therefore k could be small. So again, it totally depends on the actual system context and the normal packet loss rates, because what you're trying to do is to make sure you react to real failures and not simply to packet loss.

21:00 There's really no way to tell the difference between a link that really has failed and a link with a high packet loss rate. It's a heuristic. In fact, there's really no way to tell between a
node that has actually failed and gone away and a node that's just heavily overloaded and extremely slow in responding. There's no way to tell. So these are all heuristics that you have to work with to try to solve the problem. Sometimes you may get it wrong; you may declare a link to have failed when in fact it's still fine. But that's life, and you just have to deal with it.

21:38 So is the story clear so far as to how we deal with routing under failure? We're going to apply that to this picture, and you'll find that the answer will work. Yes?

21:52 Right. Let me repeat this. What it does is, first of all, it treats both the node and the link as failed, but it doesn't really know. What it can definitively assume is that the link has failed. The node may still be alive, because it may well be that there's a path like that, and N wants to find a route via A to that destination. So what it does is really two things. The first thing is that it may have routes in its routing table going through that link. This link is now considered dead, and therefore it should remove those routes, and then in subsequent advertisements it should make sure that the cost to that destination is infinity. That's why you would replace the cost with infinity: so that you tell the other nodes, previously I told you I could get to B with a cost of five, but now it's infinity. The second thing, which is done in the link state protocol, is that when you advertise, you no longer advertise that link. So really the answer to your question is that it assumes that the link has failed. It makes no determination about the node.
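The hello protocol described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the lab's implementation: the class name `HelloMonitor`, the function names, and the default `k=3` are all made up for illustration. It counts hello intervals that pass without a hello on each link, declares the link dead after k misses in a row, and marks routes through that link with a cost of infinity. The helper `false_alarm_probability` additionally assumes that packet losses are independent, in which case a healthy link loses k hellos in a row with probability p^k.

```python
import math

INFINITY = math.inf


class HelloMonitor:
    """Tracks liveness per link from hello packets (hypothetical sketch)."""

    def __init__(self, links, k=3):
        self.k = k                               # misses in a row before "dead"
        self.missed = {link: 0 for link in links}
        self.alive = {link: True for link in links}

    def hello_received(self, link):
        # A single hello is enough to consider the link alive again.
        self.missed[link] = 0
        self.alive[link] = True

    def interval_elapsed(self, link):
        # Call once for each hello interval in which no hello arrived.
        self.missed[link] += 1
        if self.missed[link] >= self.k:
            self.alive[link] = False
        return self.alive[link]


def invalidate_routes(routing_table, dead_link):
    """routing_table maps destination -> (link, cost).

    Keep the destination in the table but set its cost to infinity,
    which is the option the lecture says most people choose.
    """
    for dest, (link, _cost) in list(routing_table.items()):
        if link == dead_link:
            routing_table[dest] = (None, INFINITY)


def false_alarm_probability(loss_rate, k):
    # Chance that k hellos in a row are lost on a link that is fine,
    # assuming independent losses (an assumption, not from the lecture).
    return loss_rate ** k
```

Under the independence assumption, a wireless link with 10% loss and k = 3 is wrongly declared dead with probability about 0.001 per window, while an optical link with a loss rate of one in a million already gets there with k = 1, which matches the intuition that k can be small on reliable links.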
Yes?

23:06 Yes. A good protocol, and this will be tested in your lab 8, in pset 8, is one where if a link fails and eventually comes back up, you actually find that answer. This is an important requirement, and that's why all the stuff is done in the background, periodically. So if the link comes back up, you want to find the correct answer.

23:27 Any other questions? OK, so let me apply this idea to this picture here. What happened here was this link failed. Let's assume we're doing distance vector. C is going to assume that this link has failed, and therefore in its next advertisement it tells these other nodes that it no longer has a route to destination D. Did I tell everybody that you send out the destination and the route, the link, in the advertisement? I may have done that. I meant the destination and the cost, so maybe I should call this the cost. Let me change that: this is the stuff that's sent in the advertisements, and these two columns are in the routing table.

24:26 Anyway, what would have been advertised here was D at a cost of three. We replace that now with D at a cost of infinity in our advertisements to our neighbors. So that's what C would advertise. And B, when it receives that, would now find that the route that it gets along here previously had a cost of four, but now the advertisement says it's replaced with a cost of infinity. So this routing table entry would go away, and it would replace it with no route and a cost of infinity, and
that's what would propagate.

25:01 Now, these advertisements are done periodically, so what D is doing, of course, is sending out two advertisements, one this way and one that way. Now this one is not going to get through, because this link is no longer alive, but this advertisement works. So when A receives the next distance-vector advertisement from D, it now knows that link is actually alive and it has a route going there. This particular example is a little tricky, because what's happening here is that A previously had two ways of getting to D: four plus three this way, or seven that way. If it were previously using the first, then A would also have no route to the destination, and it would have to wait for D to send that route. When D sent that route, A would now have a valid route to the destination, and in its next advertisement it would send that route over to these two nodes. So it would send out an advertisement saying that D is at cost seven here, and it would do the same thing there, saying that D is at cost seven, to C.

26:11 Now C, when it receives the next advertisement that D is at cost seven, compares that route against its current route, which is now infinity, and replaces in its routing table entry the infinity with this link here and a cost of 4 plus 7, which is 11. So it would replace it with, let me call this L1, L1 and a cost of 11. Then on its next advertisement, C would send that out to B according to that advertisement schedule. Similarly S, when it receives this from A, would integrate the route to destination D, which would have a cost of 1 + 7, or 8, and on its next advertisement it would send out an advertisement this way, which would have a cost of 8.
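The integration step that B, C, and S perform in this walkthrough can be sketched as below. This is a minimal sketch, not the course's reference code: the function name `integrate`, the table layout, and the link name `L_SA` are assumptions for illustration, and the link costs (1 for B to C, 4 for C to A, 1 for S to A) are read off the numbers used in the walkthrough. Each node adds the cost of the link an advertisement arrived on to the advertised cost, and adopts the result if it is cheaper than its current route, or if the advertisement came over the link its current route already uses (which is how a cost of infinity gets accepted).

```python
import math

INFINITY = math.inf


def integrate(table, link, link_cost, advertisement):
    """Bellman-Ford update for one received advertisement.

    table:         destination -> (link, cost), this node's routing table
    link:          name of the link the advertisement arrived on
    link_cost:     cost of that link
    advertisement: destination -> cost, as advertised by the neighbor
    """
    for dest, adv_cost in advertisement.items():
        new_cost = link_cost + adv_cost
        cur_link, cur_cost = table.get(dest, (None, INFINITY))
        # Accept a cheaper route, or any change on the link already in use.
        if new_cost < cur_cost or cur_link == link:
            if new_cost == INFINITY:
                table[dest] = (None, INFINITY)
            else:
                table[dest] = (link, new_cost)
    return table


# B used link L0 (cost 1) via C; C now advertises D at infinity.
b_table = {"D": ("L0", 4)}
integrate(b_table, "L0", 1, {"D": INFINITY})
print(b_table)  # {'D': (None, inf)}

# C has no route; A advertises D at cost 7 over link L1 (cost 4).
c_table = {"D": (None, INFINITY)}
integrate(c_table, "L1", 4, {"D": 7})
print(c_table)  # {'D': ('L1', 11)}

# S, assumed here to hold no route, hears the same advertisement from A.
s_table = {"D": (None, INFINITY)}
integrate(s_table, "L_SA", 1, {"D": 7})
print(s_table)  # {'D': ('L_SA', 8)}
```

Because the same function handles both the withdrawal (infinity) and the later repair (a finite cost), running it on every periodic advertisement is enough to move the tables toward the new routes without any special recovery logic.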
And B, when it receives both of these, would compare a cost of 8 on this link against a cost of 1 + 4 + 7, which is 12. It would find that 8 is smaller than 12, and therefore B would use this way of getting to the destination. Does that make sense? Yes?

27:23 Yep, yep. It keeps receiving hello packets from all its neighbors. If a link is alive and a hello shows up, it processes it; otherwise, the moment the first hello shows up, it declares the link to be alive again. Finding that someone is alive is a lot easier than finding that they're dead, at least in networks. It's probably true in life too. It's certainly true of networks, because, assuming there are no malicious nodes, detecting that a node is alive takes one packet. Detecting that a node is dead, you're not sure: maybe the link was down, maybe it was just a transient failure, maybe a packet was lost. So it's a lot harder to find that something has crashed than it is to find that something's working. But yeah, you keep listening for hello packets.

28:13 So this is how it converges. Eventually, of course, because there's some correct working path, it'll all converge to the correct answer. And if later at some point in time this link comes back up, the same thing occurs, because all of the stuff is being done periodically. Periodically these advertisements are going to be sent. C is going to find that there's a better route to go to D via this link L0, it advertises D now at a cost of four, and eventually all of the nodes figure it out and they converge back to the right answer. And you can see
that with link state protocols the convergence is actually a little bit easier, because again the nodes are periodically advertising these links. What's going to happen in a link state protocol is, if you take the same picture, previously the nodes all had routes and many of those routes went through that link. After the hello protocol runs and C discovers that this link has failed, you have to wait for the next link state advertisement, by which all of the nodes, through the flooding process, discover that this link has failed. They all run Dijkstra's algorithm again, and they will find the correct new answer, which will take them through paths that bypass this failed link.

29:28 Now, the same logic applies in both protocols when a node fails. If this node were to fail, you can think of the node failing as actually equivalent to all of the links coming out of the node failing. It's a somewhat harder problem in terms of making sure that you are able to find the routes correctly, but this node failing is really the same as all of the links attached to that node failing. In a link state protocol you'll eventually discover that, and all of the nodes will compute routes this way, and similarly in a distance vector protocol that's what happens.

30:02 Now, so far in this picture I've assumed that once you have these failures and then you pause, nothing else happens: there are no more failures, there are no packets that are lost, and so on. But life's actually not so kind. Before I get to why this stuff is a lot more complicated in practice, does
everyone understand how these things work and how they converge correctly to the right answer after failure and after recovery from failure? Any questions?

30:34 OK, so now let me tell you all the ways in which the story goes wrong. The first way the story goes wrong, let me do it in the context of a link state protocol with a very, very simple picture. I think I have a slide. All right, let's say you have the picture that I've shown up there. It's a very simple picture: there's A, B, and D. D is the destination, and this is some path. Let's say that normally, when there are no failures, the way to go from B to D is via A, so B, A, D, and A goes to D directly.

31:14 Now let's assume that this link fails. If that link fails and things work great, what's going to happen is that in the next link state advertisement, A tells B that AD no longer exists. A knows the correct link state from B, and so it computes its route via B. Similarly, B realizes that AD doesn't exist anymore and it computes an alternate route that way. But let's say what happens is that AD fails, and then the next link state advertisement that A sends out is lost. Let's say that A's link-state advertisement to B is just lost. Packets could be lost.

31:59 Now we have a problem, because A knows that this link has failed, and therefore when it runs Dijkstra's algorithm or its shortest path algorithm, it knows that what it wants is a route going like that. But B, on the other hand, doesn't know that link AD has failed, because it didn't see that link-state advertisement, which was lost. So what B does is compute its routing table entry, which is the same as it was before, going through that
link over here 32:29 now you have a problem because when A 32:32 gets a packet a data packet 32:34 that it wants to send to destination D 32:36 previously it sent it this way but now 32:38 it knows that link has failed so it 32:39 sends the packet to B because its route 32:41 for D is via B well B gets that packet 32:45 and looks it up in its routing table and 32:46 B believes that the way to get to D is 32:49 via A so it sends it back to A well A 32:52 gets that back and says oh that's great 32:54 this is a packet for destination D I 32:56 look it up in my routing table that goes 32:58 via B and this ping pongs for you know 33:01 pretty much as long as you want this 33:03 thing here is the simplest example of a 33:06 general phenomenon called a routing loop 33:08 so the first thing is that 33:10 during the process of route convergence 33:13 various kinds of pathologies and 33:15 problematic conditions could happen and 33:17 one of them is a routing loop 33:21 the second thing that could happen and I 33:23 showed you that here where during the 33:27 process of convergence C does not have 33:29 a route to the destination 33:31 D but B thought it had a route going via 33:33 C but in fact C just dropped that 33:36 packet that's the second condition that 33:37 happens it's a dead end so both of these 33:42 things can happen during the process of 33:43 convergence 33:44 now these routing loops are particularly 33:45 problematic because when you have a 33:47 routing loop this is an example of a two 33:49 hop routing loop right A goes to B B 33:52 goes to A A goes to B so it bounces between the two 33:54 but you can have more complicated 33:55 routing loops you could have a routing 33:57 loop with four hops that looks like or 34:00 four nodes involved as opposed to 34:03 two nodes where this is destination D 34:06 and A thinks that you have to go 34:09 that way B thinks that you have to go 34:10 this way C thinks you have to go
34:12 this way and let's call this guy E he 34:14 thinks you have to go that way and this 34:15 could happen so you end up with packets 34:17 cycling around now these packets cycling 34:20 around you know there's really no way to 34:23 once you have routing table entries that 34:26 have somehow converged until it gets 34:29 fixed somehow if they've converged to 34:31 routes where for B you have to use 34:34 this link and C has to use this link and 34:36 so forth and you get a cycle what 34:39 ends up happening is these packets cycle 34:41 forever there's really no way to avoid 34:44 the packets cycling forever now this is 34:49 of course eventually this will be fixed 34:51 if the routing protocol eventually 34:52 converges it'll eventually discover that 34:54 this is wrong and find the correct 34:55 answer but during the process of 34:57 convergence bad things could happen like 34:59 this and that's why we have on packets 35:02 in packet switched networks a field 35:04 called the hop limit field and that's on 35:07 a data packet so you have the source of 35:09 the packet set a hop limit let's say 32 35:12 it just says that I need to get to the 35:14 destination I know it shouldn't take 35:16 more than 32 hops no matter what 35:18 happens 35:18 and then every node every switch 35:20 that gets this packet reduces the hop 35:22 limit by one and eventually when the hop 35:26 limit gets to zero the packet's discarded 35:29 so this is a way to flush packets out of 35:30 the network and usually you use this 35:32 mechanism to handle the case when you 35:35 get stuck in a routing loop you don't 35:38 want these packets to cycle around 35:39 forever and ever and ever because these 35:41 packets move around the network in you 35:43 know milliseconds and routing protocols 35:45 take minutes to converge or many many 35:47 seconds to converge so that's many many 35:48 milliseconds these packets could remain 35:50 in the network forever and ever 35:51 using up
bandwidth and no one's getting 35:53 any use out of it so you have this hop 35:55 limit field to flush packets out of the 35:57 system but of course what we'd like to 36:00 do is to design protocols that guarantee 36:02 no routing loops at all unfortunately 36:05 it's impossible to do that but what we 36:07 can try to do is to reduce and mitigate 36:09 the effects of routing loops now I want 36:14 to go through a few more examples of 36:15 routing loops and this isn't the link 36:17 state protocol I want to actually now 36:19 talk about what happens with a 36:20 distance vector protocol and show you 36:23 why this basic simple distance vector 36:25 protocol which is the first routing 36:27 protocol that was invented has some 36:29 problems and how we go about fixing it 36:31 and eventually I'll talk about how this 36:32 is all used on the Internet today 36:35 so here's how a 36:38 distance vector protocol might get stuck 36:41 in a weird kind of routing loop so 36:45 let's take this example here where you 36:48 have five nodes and we're all interested 36:50 in finding routes to destination E and 36:52 the general lesson I want to get at here 36:54 is that a distance vector protocol is 36:56 extremely simple but it only works on 36:59 small networks and for bigger networks 37:02 we want something better so that's where 37:04 the story is going so let me refresh 37:07 where we are in all the discussion I had so 37:09 far so let's assume that 37:11 link AC fails in this picture so what 37:14 you would like to have I'm assuming 37:16 all the link costs are one so we 37:17 don't worry about costs at this point 37:19 what you want to have happen is for A to 37:22 discover that this has 37:24 failed and when the routing converges 37:26 you would like A to use this link as its 37:29 route to destination E and the cost 37:31 would be 1 plus 1 plus 1 which would be 3 all 37:35 right so when A discovers the failure it 37:36 sends the cost of E is
infinity to its 37:39 neighbor in particular to B and then B 37:43 of course you know has a route to 37:45 destination E at cost 2 which B advertises 37:49 back to A and then A says now I have a 37:51 route to destination E and this 37:53 is an example of a good converging 37:55 routing protocol everything is good now 37:59 let's assume I complicate the picture 38:00 let's assume that link BD also fails so 38:04 now what's the correct answer well these 38:07 two guys have a route to E but the 38:11 network has become disconnected so the 38:13 correct answer the correct converged 38:14 answer here is that A and B both 38:16 discover and instantiate in their routing 38:19 tables entries that say that E is at a 38:22 cost of infinity right because there's 38:24 no path which means that it's an 38:25 infinite cost so when a packet arrives 38:27 at B for destination E you just drop 38:31 the packet and this could happen so 38:34 here's an example of how that happens so 38:37 let's say that this link fails B 38:40 discovers that through the hello 38:41 protocol and at this point B changes 38:44 its routing table entry so that E is at 38:47 infinity B had previously sent 38:50 information to A saying that E 38:53 was at distance 2 or cost 2 and now it 38:56 says well I told you that E 38:59 was at cost 2 before but I'm changing my 39:01 mind it's at cost infinity and A says ok 39:03 my entry for E now has cost infinity and 39:07 both of them have converged correctly to 39:09 the right answer 39:11 now unfortunately that's not the only 39:13 thing that could happen this was the 39:16 lucky situation where B discovered it 39:18 had failed and immediately sent out its 39:20 cost to A but what could happen is a 39:24 little different what could happen is 39:26 that B could discover that BD has failed 39:29 and change its routing table entry to 39:33 infinity but before it gets a chance to 39:36 send out its advertisement to A or 39:38 perhaps it
sends out its advertisement 39:41 with the cost of infinity to A but it 39:42 got lost in either of those cases what 39:45 could happen is A could send out its 39:47 routing table cost to B for destination 39:52 E because that's what's happening 39:54 periodically right every node is 39:55 periodically running this and the times 39:57 are asynchronous every node has 39:58 its own notion of you know when it 40:00 should send out its advertisement so 40:03 what happens now is A sends out an 40:05 advertisement to B saying it has a 40:08 route to destination E at a cost of 40:11 three which is very valid right after 40:14 all it does have a route in its routing 40:17 table to E whose cost is three it so 40:20 happens it goes through B but it 40:22 doesn't yet know that link BD has 40:24 failed now we're a little bit in trouble 40:29 because B now believes that its routing 40:31 table entry for E has a cost of 40:33 infinity because BD has failed that link 40:35 and now it sees an advertisement from A 40:39 with a better cost B says wow this is 40:43 cool A has a path of cost three 40:47 to E which is better than my cost of 40:51 infinity and so I'm going to assume that 40:53 I have an entry to E at a cost of four 40:57 so now you know you can see that 40:59 this is actually not a valid route at 41:01 all now B actually on the face of it 41:05 has no way of knowing if A is telling it a 41:08 different route so it could conceivably 41:09 be the case that A has a different 41:11 route going that way whose cost is three 41:15 and it can legitimately have a cost of 41:18 three to E that it could be telling B 41:21 about but in this protocol there's no 41:24 way for B to distinguish that case from 41:26 the case where A is just repeating to 41:29 B a route that it received via B 41:31 it's just telling it that it has a route 41:33 via B now B therefore says that it has a 41:37 cost of 4 to E and it sends that to A and 41:40 A says
whoa previously the cost was 2 41:43 from B and now B is telling me that the 41:46 cost is 4 which means I need to make my 41:48 cost equal to 4 plus 1 which is 5 and 41:51 I'm going to send that back and B 41:53 says all right I got a cost of 5 41:55 previously that same thing had a cost of 41:57 3 so now I'm gonna make my cost 6 and 42:00 this goes on forever now in the meantime 42:03 if there are packets showing up at 42:04 either A or B for destination E they're 42:07 just going to go bouncing between these 42:08 two guys this is a routing loop 42:12 now when does this stop 42:14 you know when do these guys stop 42:15 sending these incrementing costs sorry 42:23 yeah you need a value of infinity you 42:26 need to say that at some point they're 42:27 going to reach infinity and we're going 42:29 to stop so in other words for this 42:31 protocol as it's presented to converge in 42:33 a reasonable amount of 42:35 time your value of infinity should be 42:37 small so this thing has a colorful name 42:41 it's called counting to infinity now in 42:43 reality in any network 42:45 the cost of infinity cannot be smaller 42:47 than you know 42:51 the maximum minimum cost path right if 42:53 you have a minimum cost path that has a 42:55 cost of 75 for whatever reason infinity 42:58 had better be bigger 42:59 than 75 right so what it means is that 43:03 you have a problem with this protocol 43:06 it works great on small networks 43:07 but it only works on small networks and 43:09 the reason for that is that it needs a 43:11 value of infinity that's not very big so 43:13 this is why distance vector protocols 43:16 are only used for really simple small 43:18 networks and the moment the network 43:20 becomes a certain size or when you want 43:22 costs that are you know large values you 43:25 really can't use this protocol so how do 43:28 you fix this problem 43:32
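The counting to infinity exchange between A and B described above can be sketched in a few lines of Python. This is only a minimal sketch under assumptions that are not in the lecture: the names `count_to_infinity`, `INFINITY` and `LINK_COST` are hypothetical, the value 16 for infinity is picked just for illustration, and A and B are assumed to take strict turns advertising, with each one always adopting the other's advertised cost plus the link cost because its current route to E goes through the other node.

```python
INFINITY = 16   # hypothetical small value standing in for "infinity"
LINK_COST = 1   # all link costs are one, as assumed in the lecture example


def count_to_infinity(infinity=INFINITY):
    """Replay the A/B exchange after link BD fails.

    Returns the list of (A's cost to E, B's cost to E) snapshots,
    one per advertisement, until both nodes reach infinity.
    """
    # B has already set its cost to infinity; A still holds its stale
    # cost-3 route that (unknown to B) goes through B itself.
    cost = {"A": 3, "B": infinity}
    history = [dict(cost)]
    sender, receiver = "A", "B"
    while cost["A"] < infinity or cost["B"] < infinity:
        # The receiver's route to E is via the sender, so it adopts the
        # advertised cost plus the link cost, capped at infinity.
        cost[receiver] = min(infinity, cost[sender] + LINK_COST)
        history.append(dict(cost))
        sender, receiver = receiver, sender
    return history


if __name__ == "__main__":
    for step, snapshot in enumerate(count_to_infinity()):
        print(step, snapshot)
```

The first few snapshots follow the 4, 5, 6 sequence from the lecture, and the number of rounds grows with the chosen value of infinity, which is why the plain protocol only suits small networks.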
any ideas on how to solve this problem 43:34 and clearly you know 43:36 the internet is pretty big we're not you 43:39 know counting to infinity throughout the 43:41 whole internet at least we don't 43:42 think we are so how would you fix this 43:44 problem any ideas 43:51 yes so we do have a hop limit on 43:55 packets so all these packets might have 43:57 a hop limit so the packets don't remain 43:59 in the network for a long time but that 44:01 doesn't solve the problem of the time the routing 44:02 protocol takes to converge 44:05 you know which is this counting to 44:10 infinity problem so you want a better 44:12 solution in some way yes 44:24 okay so that's one good idea 44:28 in fact that's how they started trying to 44:29 solve this problem which is if you have 44:32 a route to a destination coming from a 44:36 neighbor 44:36 don't send back the same route to them 44:39 in other words in this case A's route to E 44:43 was via B so it should not advertise a 44:47 route to E a route for destination E back 44:50 to B if you do that there's a name for 44:53 it it's called split horizon the notes 44:54 describe how this protocol works or you 44:57 could do even better you could actually 44:59 A could advertise to B that its route 45:02 to destination E has a cost of infinity 45:05 forcing B to definitely not use that 45:08 route no matter what happens because A 45:10 received that route via B so A should 45:12 tell B that the cost of that route is 45:14 infinity because under no circumstances 45:16 does A want B to use the same route that 45:18 it received right so you could do that 45:20 the trouble with that is 45:22 it solves these two hop loop problems but 45:24 it doesn't solve four hop loop problems 45:26 so you could have a situation where like 45:27 this link fails and C discovers that but 45:32 before C sends out its update B sends 45:34 out its route to C and so C thinks 45:37 that it can use B in the
meantime B thinks 45:40 that you know it has a route via A so 45:42 you might end up with packets cycling 45:44 around in longer hop loops so that idea 45:46 that you had doesn't actually solve the 45:48 more general problem so any other 45:50 modification or idea that can solve the 45:52 problem 46:00 so one thing you could do is something 46:03 called path vector which is what you 46:05 could do is every node rather than just 46:07 sending the cost it could send the 46:09 entire route that is sorry it could send 46:12 the entire path it could send the list 46:15 of nodes that corresponds to that 46:17 particular route in its routing table 46:19 advertisement so I'll show that with a 46:20 picture here so E could send out not 46:24 just its destination and a cost which 46:26 previously it would say that to come to 46:28 E E would say the cost is zero 46:30 but it could now say the cost is zero 46:32 and the path is E and then each of these 46:36 other guys C and D could send out 46:38 their own advertisements saying the 46:43 cost is 2 or whatever the cost is 1 but 46:46 they could also say that the path is D E 46:48 so D says that my path to get to E is D E 46:52 in its advertisement and B here when it 46:54 receives that could send out its own 46:57 path vector which is the list of nodes 47:03 or the list of switches that corresponds 47:06 to an actual path that's advertised and 47:08 now the rule for how you integrate a 47:11 route into your routing table entries is 47:13 very simple if you see an advertisement 47:16 with your own identity in that 47:18 advertisement then you know that you 47:21 know that's just a rumor that you 47:22 started or you were involved with so you 47:23 shouldn't integrate it so in particular 47:25 in this example here if B for example 47:28 were to see an advertisement from A with 47:31 a path that was A B D E then B wouldn't 47:35 integrate that so what would happen in 47:37 the picture I showed you before is if these 47:39
two links were to fail what would happen 47:41 is that you know B would have if that 47:43 link failed B would have sent that B D E 47:46 over here and when A advertises that 47:49 back to B it would have A B D E show up 47:52 and B now sees A B D E and B finds that 47:56 its own name is in 47:57 that vector or in that advertisement and 48:00 says I should pay no attention to that 48:02 and as long as you find your own name 48:05 somewhere in the vector in that list of 48:07 nodes that that routing advertisement 48:09 went through you know that you shouldn't 48:11 pay any attention to it because you 48:13 know you were involved in creating that 48:15 advertisement and so you shouldn't pay 48:16 attention to it this protocol is called 48:19 path vector it's used on the Internet in 48:21 something called the Border Gateway 48:22 Protocol which runs between autonomous 48:24 systems and that's actually what 48:28 makes the internet essentially converge 48:30 and not have these routing loops that go 48:32 between different Internet service 48:34 providers any questions comments so far 48:40 about any of this stuff 48:45 so let me summarize everything about 48:47 routing protocols and we'll pick this up in 48:49 recitation with some problems tomorrow 48:50 so in the last two lectures and recitation 48:53 we've spoken about the network layer and 48:55 the main problem that's solved by the 48:57 network layer is how to get packet 48:59 routing to work how do you find good 49:01 paths between different nodes in the 49:04 network between different switches in 49:06 the network now we separated out the 49:10 tasks of routing from forwarding 49:12 forwarding is what happens when a packet 49:14 arrives at a switch there's a lookup 49:16 that happens in a routing table you take 49:18 the destination you look it up in the 49:19 table find the link in the routing table 49:22 and ship the packet so that's done 49:24 usually you want it to be done very very 49:26 fast 49:26 routing is
the process by which the 49:28 nodes create routing table entries and 49:30 that's a very distributed process it 49:32 runs amongst all of the 49:35 switches in the network we looked at two 49:37 routing protocols link state and 49:39 distance vector in distance 49:41 vector the computation is distributed 49:43 with these Bellman-Ford update steps and 49:46 the distance vector protocol is very 49:48 beautiful in that it's very very simple 49:50 works for small networks but to make the 49:52 idea work for bigger networks you have 49:54 to enhance the distance with the actual 49:56 path and if you enhance it with the path 49:58 you actually avoid a lot of these 50:00 routing loops that show up you can't 50:01 eliminate them but you can mitigate the 50:03 effect of them in the link state protocol 50:07 there's actually more work that's done 50:09 there's a lot more information that's 50:10 flooded between nodes but the protocol 50:13 converges quicker than these distance 50:15 vector and path vector protocols in a 50:17 link state protocol you flood this 50:19 neighbor information you consume more 50:20 bandwidth there's a lot more bandwidth 50:22 that's used in the network flooding it 50:24 and the computation is centralized you 50:25 run Dijkstra's shortest paths so what the 50:28 internet does in general I'll pick up on 50:29 this two or three lectures from now 50:31 when I talk about how the internet 50:32 really works and applies the concepts 50:34 we've studied what you'll find is that 50:36 networks like MIT's network will run 50:38 a link state 50:41 like protocol to achieve connectivity 50:43 between nodes inside MIT and then 50:45 routers at the edge of MIT connecting to 50:47 other Internet service providers run 50:49 a path vector protocol 50:50 like BGP and all of these things work 50:53 together and they work because 50:54 ultimately all of the switches create 50:56 these routing table entries that have a
50:57 mapping between destinations and routes 51:00 or links that have to be used so that's 51:03 the routing story we will pick it up in 51:04 recitation tomorrow and see you back on 51:07 Wednesday
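As a recap of the path vector rule from the lecture (ignore any advertisement whose path already contains your own name), here is a minimal Python sketch. The function names `should_integrate` and `extend_path` are hypothetical and chosen only for illustration, paths are modeled as plain lists of node names, and costs are left out. It is not meant to reflect how BGP itself is implemented.

```python
def should_integrate(my_name, advertised_path):
    """Return True if the advertisement is safe to consider.

    If our own name already appears in the advertised path, the route
    is one we helped create, so using it would form a loop.
    """
    return my_name not in advertised_path


def extend_path(my_name, advertised_path):
    """Return the path this node would advertise onward, or None if the
    advertisement must be ignored under the path vector rule."""
    if not should_integrate(my_name, advertised_path):
        return None
    # Prepend our own name so downstream nodes can run the same check.
    return [my_name] + list(advertised_path)


if __name__ == "__main__":
    # A hears B's path to E and may use it.
    print(extend_path("A", ["B", "D", "E"]))
    # B hears A repeat that path back and must ignore it.
    print(extend_path("B", ["A", "B", "D", "E"]))
```

In the two-node example from the lecture, B rejecting the path A B D E is what keeps the stale route from being counted upward round after round.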