Transcript


00:00
the following content is provided under
00:01
a Creative Commons license your support
00:04
will help MIT OpenCourseWare continue to
00:06
offer high quality educational resources
00:08
for free
00:09
to make a donation or view additional
00:12
materials from hundreds of MIT courses
00:14
visit MIT OpenCourseWare at ocw.mit.edu
00:25
so we're going to talk about routing
00:27
protocols and today we're going to talk
00:29
about how routing protocols handle
00:32
failures so I want to bring
00:34
everybody back onto the same page with
00:36
respect to where we are in the
00:39
story so in terms of routing protocols
00:44
if you imagine network topology like
00:48
this
00:57
and let's say that links have costs
00:59
associated with them so I'm just going
01:01
to make up some numbers here
01:10
we studied two classes of routing
01:12
protocols and we looked at these in the
01:14
absence of any failures the first is the
01:18
distance vector protocol or more
01:19
generally vector protocols where in an
01:23
advertisement sent from one node to
01:25
another each node sends a subset of its
01:28
routing table in other words it sends
01:29
two columns from its routing table
01:32
recall that the routing table at every
01:33
node contains a destination a route
01:39
which is a link or the name of the link
01:42
and a cost and in a distance vector
01:46
protocol you send these two columns you
01:51
send these tuples for all of the
01:53
destinations that you know and that's
01:54
what's spread out to all of the other
01:56
nodes in contrast in so this is distance
01:59
vector in contrast in a link state
02:02
protocol what you're sending to all of
02:06
the other nodes is information about all
02:08
of the links that you have and in
02:10
particular you send information about
02:13
the neighbor
02:18
in your topology together with the link
02:22
cost to that neighbor whereas here it's
02:26
the cost of the route to the destination
02:30
that you sent in the distance vector
02:34
protocol the computation of the routes
02:36
is distributed so every node kind of
02:39
computes and updates its routing table
02:41
using the Bellman-Ford update step which
02:44
essentially updates the route to a
02:46
destination if you find a better route
02:48
to the destination or if you find that
02:50
the route to the destination that you
02:52
currently have has a
02:53
changed cost in which case you update
02:55
your routing table in a link state
02:57
protocol the computation is not
02:59
distributed what's distributed is the
03:01
process of flooding this information of
03:04
the local links that's why it's called a
03:07
link state protocol and the computation
03:10
of the routes themselves is centralized
03:11
each node just runs a shortest path
03:14
computation for example Dijkstra's
03:16
shortest path computation and in a link
03:19
state protocol as long as all of the
03:20
nodes run exactly the same computation
03:22
in other words they all try to minimize
03:24
or optimize the same metric then you're
03:27
guaranteed that all of the nodes end up
03:29
computing the correct routes and that
03:31
when you send a packet from one node it
03:33
will reach the destination
03:34
assuming there's a path in the network
03:36
and in the problem set maybe not in the
03:40
problem set but certainly at the back of the
03:41
chapter there are problems on what
03:44
happens if some nodes run one protocol
03:46
and another node runs another
03:48
protocol for example you might
03:49
imagine a routing protocol where one of
03:52
the nodes is doing minimum cost routes
03:53
and another node is doing shortest paths
03:55
in terms of the number of hops not costs
03:57
and if you have that you might end up in
03:59
a situation where in fact the routing
04:01
doesn't work correctly
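As a minimal sketch of the centralized computation each node runs in a link-state protocol, assuming every node has already learned the same link map through flooding: the function below is a textbook Dijkstra computation, not code from the course. The link costs are an assumption, reconstructed from numbers quoted elsewhere in the lecture (the drawing itself is not in the transcript), and the names `LINKS`, `build_graph` and `dijkstra` are hypothetical.

```python
import heapq

# Assumed topology: bidirectional links with costs reconstructed from
# numbers mentioned in the lecture. The layout is a guess, not the slide.
LINKS = {
    ("S", "A"): 1, ("S", "B"): 2, ("B", "C"): 1,
    ("A", "C"): 4, ("A", "D"): 7, ("C", "D"): 3,
}


def build_graph(links):
    """Turn a bidirectional link map into an adjacency dict."""
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph


def dijkstra(graph, source):
    """Return {destination: (cost, first_hop)} for minimum-cost routes."""
    routes = {source: (0, None)}
    heap = [(0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry, a cheaper one was already processed
        done.add(node)
        first_hop = routes[node][1]
        for nbr, link_cost in graph[node].items():
            new_cost = cost + link_cost
            # Leaving the source, the first hop is the neighbor itself.
            hop = nbr if node == source else first_hop
            if nbr not in routes or new_cost < routes[nbr][0]:
                routes[nbr] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, nbr))
    return routes


graph = build_graph(LINKS)
for node in ("S", "B", "C"):
    print(node, "-> D:", dijkstra(graph, node)["D"])
```

Because every node minimizes the same metric over the same map, the first hops chosen at S, B and C line up into one consistent path to D. If one node minimized hop count instead, that consistency would no longer be guaranteed.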
04:03
but assuming that all of the nodes are
04:05
running the same optimization in a link state
04:08
protocol they'll all get the correct
04:09
answer and similarly in a
04:11
distance-vector protocol actually you
04:12
know the update rule can do anything as
04:14
long as the nodes believe that there's a
04:15
route and they perform some sort of an
04:18
update to their routing table entry
04:19
consistent with the information that
04:21
they hear then the routing will work now
04:25
what happens when there are failures and
04:26
how does it work so let's say that you
04:28
know you run this protocol on this
04:30
topology and you give it enough time
04:32
once you hear all of the advertisements
04:34
and compute the routes of the different
04:37
nodes if you look at this node over here
04:41
C for destination D would have a route
04:45
let's call this link l0 it would have a
04:49
route l0 and the cost of that route
04:51
would be 3 and similarly at B here once
04:55
it converges you would have for
04:57
destination D let's imagine this link is
05:00
called l0 as well at node B you might
05:03
have l0 and the cost would be 4 and
05:07
similarly at the other nodes now let's
05:11
say what happens is that some sort of a
05:14
failure occurs and failures could be one
05:16
of you know a variety of failures could
05:18
occur that cause the routing to get
05:22
screwed up so one of the failures the
05:24
easiest form of failure is that a packet
05:26
could be lost in particular an
05:27
advertisement could be lost
05:40
the second thing that could happen is
05:42
that a link could fail I mean if it's a
05:47
wire maybe a backhoe runs over it which
05:49
is actually more common than you think
05:51
in fact you know you could have undersea
05:54
cables where sharks bite them and
05:56
they can destroy it I mean lots of
05:58
things could happen so links could fail
06:01
the third thing that could
06:02
happen which is more common than you
06:04
might think
06:04
is that entire switches could fail or
06:06
nodes could fail since we actually don't
06:14
know how to write bug free software
06:16
software bugs may cause things to fail
06:18
they may cause things to crash in fact
06:20
they may cause things to fail in
06:21
mysterious ways where you know it looks
06:24
like it's not failed but in fact there's
06:26
a fault in the software causing you to
06:28
send bad advertisements and for this
06:31
course we won't worry about this we'll
06:32
actually worry about only a class of
06:34
failures that you can think of as fail-stop
06:39
what I mean by that is if it fails it
06:41
just stops as opposed to it fails and
06:44
then it gives you wrong information
06:45
which is actually a lot harder to deal
06:46
with because it's much better for you to
06:49
fail and just stop rather than for the
06:51
node to fail and pretend that it's
06:52
correct and send you bad information
06:53
that's a lot harder to deal with we'll
06:56
worry about that later
06:57
not in 6.02 anyway these things
07:00
could happen so let's concretely assume
07:02
that you have this topology here and
07:04
what happens is this link fails
07:10
now if you did nothing and you had that
07:14
link fail what would happen now
07:17
let's look at it node by node right what
07:20
would happen here is that C would end
07:24
up with no route to the destination
07:26
because what would happen is assuming
07:29
that C has some way of discovering that
07:30
this fault has happened it would have no
07:33
route to the destination but if it
07:34
doesn't discover that the fault has
07:35
happened it has this route to the
07:37
destination but if it sends packets on
07:39
that link it wouldn't reach the other
07:40
side packets would be lost but assuming
07:43
that it has a way of
07:44
discovering that this link has failed it
07:46
has no route to the destination now B
07:49
has a route to the destination it has
07:51
this link but in fact you know what the
07:56
next advertisement from C if C were to
07:58
make another advertisement would tell it
08:00
that it had no route to the destination
08:01
but if it did nothing if you did nothing
08:03
in the protocol B thinks it has a route
08:05
to the destination but in fact it
08:06
reaches C and it's actually a dead-end
08:08
because it reaches C and C doesn't know
08:10
what to do with it it just drops the
08:12
packet okay the technical term for this
08:14
is a dead end you know you send packets
08:17
you think you're reaching the
08:18
destination but it's actually getting
08:19
dropped
08:21
what about s does s have a route to the
08:23
destination well s doesn't have a route
08:26
to the destination either because the
08:28
right route to the destination would
08:29
probably have been this link over
08:35
here because the cost was two plus one
08:37
plus three and that wouldn't have a
08:38
route to the destination it would have a
08:40
route but that would lead to a dead
08:41
end at C and you know in fact in this
08:43
particular example no node would have a
08:45
route to the destination in terms of the
08:48
routing table itself no node would have
08:51
a route that actually worked in that the
08:53
route wouldn't correspond to an actual
08:55
working path but if you look at the
08:57
picture clearly there are other ways to
08:58
get to the destination we designed this
09:00
topology presumably because we had some
09:03
sort of redundancy so if this failed
09:05
what you would like to have is for C to
09:08
use one of the other paths C might use
09:10
this path or C might use this path or C
09:13
might yeah those are the two possible
09:15
paths it could use and similarly s might
09:17
use should use that path and so forth so
09:21
what you want is a routing protocol that
09:23
converges to the new correct answer
09:26
assuming there's a new correct answer
09:28
and if there's no new correct answer
09:30
it converges to whatever the best
09:32
possible answer is so to some
09:34
destinations you could have a route and to
09:36
some destinations you don't have a route
09:37
so what you want is as long as there is
09:40
a connected path there is a path between
09:43
a source and a destination you would
09:45
like that source to end up with a route
09:47
that corresponds to some good working
09:50
path and in particular converges to the new
09:53
minimum cost working path between the
09:56
source and the destination so that's the
09:57
statement of the problem
09:59
and we're going to solve that problem
10:01
today for both distance-vector and link
10:05
state and interestingly the idea that
10:07
we're going to use is the same idea in
10:09
both cases and the idea is just like we
10:12
built a redundant topology by having you
10:15
know alternate paths between places
10:17
we're just going to repeat
10:18
advertisements and we're going to repeat
10:21
the process of processing these
10:23
advertisements it's a very simple idea
10:25
and the general plan there are three
10:28
steps to the plan the first step in the
10:30
plan is to periodically check every
10:33
neighbor is responsible for checking the
10:35
health of every node is responsible for
10:38
checking the health of its neighbors
10:39
okay so that's the step which we're
10:43
going to call neighbor liveness and the
10:51
protocol we're going to use for that is
10:53
called the hello protocol it's a very
10:56
very simple protocol I'll describe it in
10:58
a moment and we're going to use this
11:01
idea that every node is responsible for
11:03
checking whether each of its neighbors
11:05
is alive and if it determines that a
11:08
neighbor is not alive
11:09
it assumes the neighbor is dead and
11:12
removes it from various tables and data
11:15
structures and so on and in fact this
11:17
fail stop assumption is pretty crucial
11:18
for us because the assumption is that
11:20
when a failure occurs of a node that in
11:23
fact the node doesn't respond if a
11:25
failure occurs of a link the assumption
11:26
here as well is that the link stops
11:28
responding you know you don't get to
11:30
send packets or receive packets over
11:31
that link
11:32
and for now we're going to assume that
11:34
every link is bi-directional so you know
11:36
you send packets in both directions in
11:38
reality there are unidirectional network
11:40
links and you have to deal with the
11:42
problem differently not going to worry
11:44
about that so there's a protocol called
11:47
the hello protocol that runs to detect
11:49
if your neighbor is alive or not the
11:53
second step in our answer is to make the
11:56
advertisements periodic and the third
12:05
step is what you do when you receive an
12:08
advertisement when you receive an
12:10
advertisement you collect a bunch of
12:11
these advertisements that you received
12:13
from various neighbors in the link state
12:15
protocol it's these link-state
12:17
advertisements and the distance-vector
12:18
protocol it's these distance-vector
12:20
tuple advertisements and then you run a
12:24
periodic integration process
12:35
so if you look at it with a time line
12:42
every node asynchronously in other
12:45
words independent of the other nodes you
12:48
don't have to synchronize the clocks
12:49
every node has its own you know clock
12:51
and every node does these two
12:54
steps periodically so from time to time
13:00
it sends an advertisement it just says
13:08
you know in distance vector it just
13:09
sends these two columns to its neighbors
13:12
in the link-state protocol it sends
13:14
out its link state information and the
13:16
flooding process works and then from
13:19
time to time there's this integration of
13:21
these advertisements that happen
13:30
etc now the beautiful part of these
13:34
protocols is that you know I've shown this
13:36
picture here with these integrations
13:37
happening interspersed with the
13:38
advertisements but that doesn't actually
13:40
have to be the case you could do them
13:41
pretty much arbitrarily as long as you
13:43
do them periodically the beautiful part
13:46
of these protocols is that if every
13:49
node is asynchronously running these
13:50
advertisement steps and these
13:52
integration steps as long as they do
13:53
this periodically in the end what you
13:57
get is a property called eventual
13:59
convergence what that means is assuming
14:16
you have all sorts of failures and any
14:18
pattern of you know packet losses link
14:19
failures and switch failures and then you
14:21
freeze the system and assume that no
14:26
more failures happen then what eventual
14:29
convergence means is that in some finite
14:32
time all of the nodes in the network
14:36
will converge to the correct routing state
14:37
that is in these routing protocols all
14:40
of the nodes will end up with an answer
14:42
that's consistent with what you were
14:43
trying to optimize for example minimum
14:46
cost paths to all the destinations now
14:49
proving that under an arbitrary model of
14:52
you know when these advertisements and
14:54
integration steps are all asynchronous
14:56
and being done at random times is a
14:58
little involved and we're not going to
15:00
attempt that in this course the notes
15:02
talk a little bit about how you get
15:03
eventual convergence when you assume
15:05
that all the nodes are running very
15:06
periodic advertisement steps
15:09
interspersed with integration steps the
15:10
proof's really not that important what's
15:12
more important is for you to understand
15:14
the intuition behind why it works so
15:16
I'll do that by some examples here
15:18
I have to also tell you what the hello
15:22
protocol is I'll get to that but for now
15:24
just assume it's a module that tells you
15:26
if the neighbor is alive or not
15:28
so is this plan clear to everybody it's
15:31
just the same idea except every node is
15:33
doing this periodically so in practice
15:35
you might do this every 30 seconds or
15:36
every 3 minutes or something like that
15:38
of course the longer the time between
15:40
advertisements the longer it's going to
15:42
take for the protocol to converge after
15:45
a failure or set of failures and the
15:47
shorter the time the less
15:50
time it takes to converge but you end
15:52
up doing a lot more work and moreover in
15:56
practice many failures are transient so
15:58
you know a link may fail for a few
16:00
seconds and then come back up and so
16:02
it's in practice not that useful to
16:04
converge very very quickly or react very
16:07
very quickly it's important to converge
16:09
quickly once you start the convergence
16:10
process but you know detecting
16:13
that a neighbor is alive or dead on the
16:15
time scale of a few packets is sometimes
16:19
you know too fast because sometimes
16:21
failures last for very little time and
16:22
then they go away
16:23
and in the mean time you've done all
16:25
this work to converge to a new routing
16:26
state and then when the link comes back
16:28
up you're going to do more work to come
16:29
back to the old answer you may as well
16:31
have just been a little lazy and so
16:34
deciding these times is tricky and
16:36
there's no real systematic way of doing
16:38
it in practice but the trade-off is
16:41
usually between how quickly you wish to
16:44
converge and how much work you're
16:45
willing to expend in making that
16:47
convergence happen so is the plan clear
16:50
it's just the same protocol except we're
16:52
going to do this periodically
16:56
so the first step is this neighbor
16:58
liveness or the hello protocol so that
17:01
protocol is actually very easy every
17:03
node you know you have a set of links
17:07
coming out of the node and the neighbors
17:09
at the other end of it so the problem is
17:11
that the node needs to decide which of
17:13
these links is working or not working
17:16
and which of the neighbors are still
17:18
there versus not there and the way the
17:21
protocol works is that every node in the
17:23
system on each of its links let's call
17:27
these nodes ABC on each of these links
17:33
sends out periodically sends out only to
17:37
its neighbors a packet called a hello
17:39
packet and the hello packet usually has
17:43
a sequence number on it an incrementing
17:45
sequence number the idea is very
17:48
very simple this may be sent you know
17:51
periodically say every 10 seconds if n
17:55
finds that a certain number of hello
17:57
packets it hasn't heard from its
17:59
neighbor one of its neighbors in some
18:02
time that is perhaps three hello packets
18:04
are missing or four hello packets are
18:05
missing in a row it just decides that
18:08
that neighbor is dead it's a very simple
18:11
idea so you send out hello packets
18:14
periodically and k successive
18:19
missing hello packets implies the
18:26
neighbor from which those packets are
18:28
missing is dead
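The rule just stated can be sketched as follows. This is only an illustration under stated assumptions: the class name `NeighborLiveness`, its methods, and the choice to count missed hellos by elapsed time (rather than by gaps in the sequence numbers) are all hypothetical, and the 10 second interval and k = 3 are just the example values mentioned above.

```python
class NeighborLiveness:
    """Tracks one neighbor and declares it dead after k silent hello intervals."""

    def __init__(self, hello_interval=10.0, k=3, now=0.0):
        self.hello_interval = hello_interval  # seconds between expected hellos
        self.k = k                            # successive misses tolerated
        self.last_heard = now                 # time the last hello arrived
        self.last_seq = None                  # sequence number of that hello
        self.alive = True

    def on_hello(self, seq, now):
        """Record a hello. Assumption: any hello marks the neighbor alive."""
        self.last_seq = seq
        self.last_heard = now
        self.alive = True

    def check(self, now):
        """Called periodically; returns True while the neighbor counts as alive."""
        # Approximate the number of missed hellos by the silent time elapsed.
        missed = int((now - self.last_heard) // self.hello_interval)
        if missed >= self.k:
            self.alive = False
        return self.alive


nbr = NeighborLiveness(hello_interval=10.0, k=3)
nbr.on_hello(seq=1, now=10.0)
print(nbr.check(now=25.0))  # one interval of silence: still considered alive
print(nbr.check(now=40.0))  # three intervals of silence: declared dead
```

A node would keep one such record per link and run `check` from the same timer that drives its own hello transmissions.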
18:34
now in response what happens is that all
18:39
of the routes that this node had that
18:42
went via that neighbor via that
18:44
link are eliminated from the routing
18:47
table and you could do that by either
18:49
simply removing the entry from the
18:50
routing table or by keeping the entry in
18:53
the routing table but for those
18:55
destinations and replacing the cost from
18:58
whatever the value was to infinity okay
19:01
and it's probably a better idea to
19:04
replace it with infinity I'm not exactly
19:07
sure why that's kind of what most people
19:10
do I think the reason is that you'd like
19:12
to know that that destination exists in
19:13
the network and then when later a route
19:15
arrives you can fix it but you could
19:17
just remove it as well the other thing
19:20
that happens is if you were in a link
19:22
state protocol what you would then do is
19:24
on the next advertisement of the link
19:27
state you would simply eliminate that
19:29
link and that neighbor altogether so you
19:31
would not advertise this link and this
19:34
neighbor as existing anymore
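A minimal sketch of these two reactions to a dead neighbor, assuming a routing table stored as a dict from destination to a (link, cost) pair. The function names, the advertisement layout, and the link names other than l0 are hypothetical, chosen only for illustration.

```python
INFINITY = float("inf")


def invalidate_routes(routing_table, dead_link):
    """Keep each affected destination in the table but set its cost to infinity."""
    for dest, (link, _cost) in list(routing_table.items()):
        if link == dead_link:
            routing_table[dest] = (None, INFINITY)
    return routing_table


def link_state_advertisement(node, neighbor_costs, alive):
    """Build a link-state advertisement that omits neighbors considered dead."""
    live_links = {n: c for n, c in neighbor_costs.items() if alive[n]}
    return {"origin": node, "links": live_links}


# Example at node C after the hello protocol gives up on the link to D.
# Table format: destination -> (outgoing link, cost).
table = {"D": ("l0", 3), "A": ("l1", 4), "B": ("l2", 1)}
invalidate_routes(table, dead_link="l0")
print(table["D"])  # (None, inf)

lsa = link_state_advertisement(
    "C",
    neighbor_costs={"D": 3, "A": 4, "B": 1},
    alive={"D": False, "A": True, "B": True},
)
print(lsa)  # the link to D is no longer advertised
```

Keeping the destination with an infinite cost, rather than deleting it, means the entry is still there to be overwritten when a usable route is heard later.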
19:36
and when that link-state advertisement
19:38
floods through the network all of the
19:40
other nodes through the flooding
19:42
process would determine that
19:44
that node has gone away and that link
19:46
has gone away and then when they run
19:48
Dijkstra's algorithm again and recompute
19:49
the routes they will no longer assume
19:52
that that link exists and they may find
19:53
new routes to the destination okay so
19:57
that's that's what the hello protocol
19:58
does and so how you pick K again it's
20:01
the same trade off it depends on how
20:03
quickly you want to converge to a real
20:06
failure and picking this is difficult
20:09
for example if you were on a wireless
20:10
network where the normal packet loss
20:13
probability might be 10% something high
20:16
then waiting for a larger number of
20:20
successive failed packets is a good idea
20:22
because
20:23
just because a packet's lost or two
20:25
packets are lost doesn't mean the link
20:26
has failed on the other hand if you were
20:28
running on a highly highly reliable link
20:30
in terms of packet loss like you were
20:32
running on some dedicated optical link
20:34
where the packet loss rate is you know
20:36
one part in a million then you know a
20:39
single packet missing or two packets
20:41
missing would be a good indication that
20:43
that link has actually failed or that
20:45
node at the other end of the link has
20:46
failed and therefore K could be small so
20:50
again it totally depends on the actual
20:52
system context and the normal packet
20:54
loss rates so because what you're trying
20:56
to do is to make sure you react to real
20:58
failure not to simply packet loss
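To put rough numbers on this trade-off, here is a small sketch under one strong assumption that the lecture does not make: that hello losses on a working link are independent, so k hellos in a row are lost with probability p to the power k. The function names and the target of two in a million are hypothetical.

```python
def false_alarm_probability(loss_rate, k):
    """Chance that k successive hellos are lost on a healthy link.

    Assumes independent losses, which real links (bursty wireless
    links in particular) often violate.
    """
    return loss_rate ** k


def smallest_k(loss_rate, target):
    """Smallest k whose false alarm probability is at most target."""
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    if target <= 0:
        raise ValueError("target must be positive")
    k = 1
    while false_alarm_probability(loss_rate, k) > target:
        k += 1
    return k


# Lossy wireless link (10% loss) versus a clean optical link (one in a million).
print(smallest_k(0.1, target=2e-6))
print(smallest_k(1e-6, target=2e-6))
```

Under that assumption the lossy link needs several missed hellos before a failure is declared, while on the clean link a single missing hello is already strong evidence, which matches the argument for a large K in one case and a small K in the other.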
21:00
there's really no way to tell the
21:02
difference there's no way to tell the
21:03
difference between a link that really
21:06
has failed versus a link with a high
21:07
packet loss rate it's a heuristic and in
21:11
fact there's really no way to tell
21:12
between a node that has actually failed
21:14
and gone away or a node that's just you
21:15
know heavily overloaded and is extremely
21:18
slow in responding there's no way to
21:20
tell so these are all heuristics that
21:22
you have to you know work with and try
21:28
to solve the problem so sometimes you
21:30
may get it wrong sometimes you may find
21:32
that you may declare a link to
21:34
have failed when in fact it's still fine
21:35
but that's life and and you just have to
21:38
deal with it so is the story clear so
21:41
far as to how we deal with routing under
21:44
failure so we're going to apply that to
21:45
this picture and you'll find that the
21:47
answer will work yes
21:52
right like I said let me repeat
21:56
this what it does is first of all it
22:01
really now assumes that both the node
22:03
and the link have failed it doesn't
22:04
really know now it can definitively
22:07
assume that the link has failed the node
22:10
may still be alive because it may well
22:12
be that there's a path like that and n
22:16
wants to find that you know a route via
22:18
A to that destination so what it
22:21
does is really two things the first
22:22
thing it does is it may have routes in
22:25
its routing table going through that
22:27
link this link is now considered dead
22:29
and therefore it should remove those
22:30
routes and then in subsequent
22:32
advertisements it should make sure that
22:34
the cost to that destination is infinity
22:36
which is why you know you would remove
22:39
it and replace the cost to be infinity
22:41
so that you tell the other guys that
22:43
previously I told you I could get to B
22:44
with the cost of five but really now
22:47
it's infinity the second thing that's
22:49
done in the link state protocol is when
22:50
you advertise you no longer advertise
22:52
that link so really the answer to your
22:54
question is it assumes that the link has
22:56
failed
22:57
it makes no determination about the node
22:59
yes
23:06
yes a good protocol and this will be
23:08
tested in your lab 8 in pset 8 is if a
23:11
link fails and eventually comes back up
23:13
we would like for you to actually find
23:15
that answer and this is this is an
23:18
important requirement so if the
23:19
broadcast can't that's why all the stuff
23:20
is done in the background it's done
23:21
periodically so if the link comes back
23:23
up you want to find the correct answer
23:27
any other questions okay so let me apply
23:32
this idea to this this picture here so
23:35
let's so what happened here was this
23:36
thing failed so C is at this point in
23:39
time let's assume we're doing distance
23:41
vector this protocol here so C is going
23:42
to assume that this link has failed and
23:46
therefore it tells all of the other guys
23:48
in its next advertisement it tells these
23:51
other guys that it no longer has a route
23:55
to destination D and it does that by
24:00
sending in its next distance-vector
24:02
advertisement it would have sent you
24:06
know what did I tell everybody that you
24:09
send out the destination and the route
24:10
and the links in the advertisement I may
24:12
have done that I meant the destination and
24:14
the cost so maybe I should call this the
24:17
cost
24:19
change that so this is the stuff that's
24:21
sent in the advertisements and these
24:23
two columns are in the routing table but
24:26
anyway right now here what
24:29
would be advertised was D at a cost of
24:31
three we replace that now with D
24:35
with the cost of infinity in our
24:37
advertisements to our neighbors so
24:38
that's what C would advertise and B when
24:43
it receives that would now find that the
24:46
route that it gets along here had
24:48
previously a cost of four but now it
24:50
says that it's replaced with the cost of
24:53
infinity so it would replace this
24:55
routing table entry would go away and it
24:57
would replace it with no route and a
24:59
cost of infinity and that's what would
25:01
propagate now these advertisements are
25:05
done periodically so what D is doing of
25:07
course is to send out two advertisements
25:09
one this way and one that way now this
25:11
thing is not going to reach because this
25:13
link no longer is alive but this
25:16
advertisement works so when A receives
25:19
the next distance-vector advertisement
25:22
from D it now knows that it has a cost
25:28
you know that link is actually alive and
25:29
it has a route going there now this is
25:31
this particular example is a little
25:33
tricky because what's happening here of
25:35
course is that A previously had you
25:37
know two ways of getting to D four plus
25:40
three this way or seven that way if it
25:43
were previously using that then A would
25:46
also have no route to the destination
25:48
and then it would have to wait for this
25:50
guy to send that route so when it sends
25:53
that route A would now have a valid
25:55
route to the destination and in its next
25:58
advertisement it would send that route
26:00
over to these two guys so it would send
26:02
out an advertisement saying that D is at cost seven
26:06
and it would do the same thing here that
26:08
D is at cost seven to C
26:11
now C when it receives the next
26:13
advertisement that D is at cost seven
26:15
compares that route against its current
26:17
route which is now infinity and replaces
26:20
in its routing table entry the infinity
26:23
with D this link here and a cost of 4
26:26
plus 7 which is 11 so it would replace
26:29
it with D let me call this l1 l1 and
26:32
a cost of 11 and then on its next
26:35
advertisement C would send that out to
26:37
B according to that advertisement
26:40
schedule and similarly here s when it
26:44
receives this from A would on its next
26:47
advertisement after integrating the
26:48
route to destination D which would have
26:51
a cost of 1 + 7 which is 8 it would send out an
26:54
advertisement this way which would have
26:57
a cost of 8 and B when it receives both
27:02
of these things would compare a cost
27:04
of 8 on this link against a cost of 1 +
27:09
4 + 7 which is 12 and it would find that
27:13
8 is smaller than 12 and therefore B
27:15
would use this way of getting to the
27:17
destination does that make sense yes
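The end state of this walkthrough can be checked with a short sketch. It is not the lab code: it assumes the link costs implied by the numbers quoted above (S-A 1, S-B 2, B-C 1, A-C 4, A-D 7, C-D 3), it runs synchronized rounds rather than independent timers, and it starts from empty tables, so it shows only what the protocol converges to and not the transient right after the failure. The names `converge` and `neighbor_map` are hypothetical.

```python
INFINITY = float("inf")

# Assumed link costs, reconstructed from the numbers in the walkthrough.
LINKS = {
    ("S", "A"): 1, ("S", "B"): 2, ("B", "C"): 1,
    ("A", "C"): 4, ("A", "D"): 7, ("C", "D"): 3,
}


def neighbor_map(links):
    """Adjacency dict for bidirectional links."""
    nbrs = {}
    for (u, v), cost in links.items():
        nbrs.setdefault(u, {})[v] = cost
        nbrs.setdefault(v, {})[u] = cost
    return nbrs


def converge(links):
    """Repeat advertise-then-integrate rounds until no table changes.

    Returns {node: {destination: (next_hop, cost)}}.
    """
    nbrs = neighbor_map(links)
    tables = {n: {n: (None, 0)} for n in nbrs}
    changed = True
    while changed:
        changed = False
        # Advertisement: every node sends (destination, cost) pairs.
        adverts = {n: {d: c for d, (_, c) in t.items()} for n, t in tables.items()}
        # Integration: the Bellman-Ford update at every node.
        for node in nbrs:
            for nbr, link_cost in nbrs[node].items():
                for dest, adv_cost in adverts[nbr].items():
                    new_cost = link_cost + adv_cost
                    current = tables[node].get(dest, (None, INFINITY))[1]
                    if new_cost < current:
                        tables[node][dest] = (nbr, new_cost)
                        changed = True
    return tables


before = converge(LINKS)
after = converge({l: c for l, c in LINKS.items() if l != ("C", "D")})
for node in ("S", "A", "B", "C"):
    print(node, "before:", before[node]["D"], "after:", after[node]["D"])
```

With these assumed costs, A ends at cost 7, S at 8 via A, and C at 11. At B the update adds the S-B link cost of 2 to the 8 that S advertises, giving 10 via S, which still beats 12 via C, so B picks the same way out as in the walkthrough.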
27:23
yep yep
27:28
with receiving hello packets from all
27:30
its neighbors and you know it's just you
27:33
know if a link is alive and a hello shows
27:34
up it processes it otherwise the
27:36
moment the first hello shows up it
27:37
declares the link to be alive again you
27:39
know finding that someone is alive is a
27:41
lot easier than finding that they're
27:42
dead at least in networks
27:44
it's probably true in life too it's
27:47
certainly true of networks because all
27:48
you have to know I mean assuming there's
27:50
no malicious nodes detecting that a node
27:53
is alive takes one packet detecting
27:55
that a node is dead you know you're not
27:57
sure maybe it was the link was down
27:58
maybe maybe it was just a transient
28:00
failure maybe a packet was lost so it's
28:02
a lot harder to find that something has
28:04
crashed than it is to find
28:06
that something's working
28:07
but yeah so you keep listening for hello
28:11
packets okay
28:13
so this is how it converges now you know
28:16
eventually of course because there's
28:19
some correct working path eventually
28:20
it'll all converge to the correct answer
28:22
and if later at some
28:24
point in time this link comes back up
28:26
the same thing occurs because all of the
28:29
stuff is being done periodically and so
28:30
periodically these advertisements are
28:32
going to be sent C is going to find that
28:34
there's a better route to go to D via
28:36
this link l0 it advertises D now at a
28:39
cost of three and eventually you know all
28:41
of the nodes figure it out and they
28:42
converge back to the right answer okay
28:45
so and you can see that in the link state
28:47
protocols the convergence is actually a
28:49
little bit easier because again there
28:51
the nodes are periodically
28:53
advertising these links so what's going
28:55
to happen in a link state protocol is if
28:57
you take the same picture and previously
28:59
the nodes all had routes and many of
29:01
those routes went through that link you
29:03
have to wait for the next link state
29:05
advertisement which would tell you after
29:08
the hello protocol discovers that C
29:10
discovers that this link has failed it
29:12
takes that next link state
29:14
advertisement by
29:16
which all of the nodes through the
29:18
flooding process discover that this
29:20
link has failed and they all run
29:21
Dijkstra's algorithm again and they will
29:24
find the correct new answer which will
29:26
take them through paths that
29:28
bypass this failed link now the same logic
29:32
applies in both protocols to when a node
29:34
fails if this node were to fail you can
29:36
sort of think through the node failing
29:38
is actually equivalent to all of the
29:40
links coming out of the node failing so
29:42
it's just a somewhat harder problem in
29:43
terms of just making sure that you are
29:46
able to find the routes correctly but
29:48
this node failing is really the same as
29:50
all of the links attached to that node
29:52
failing and in a link state protocol
29:54
it'll eventually you'll discover that
29:55
and all of the nodes will compute routes
29:57
this way and similarly in a distance
29:59
vector protocol that's what happens now
30:02
so far in this picture I've assumed that
30:04
once you have these failures and then
30:07
you pause nothing else happens there's
30:10
no more failures there's no packets that
30:11
are lost and so on but life's actually
30:13
not so kind what will happen in practice
30:16
is that first of all before I get to why
30:20
this stuff is a lot more complicated
30:21
does everyone understand how these
30:23
things work and how they converge
30:25
correctly to the right answer after
30:27
failure and after recovery from failure
30:30
any questions
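The link state recovery just described can be sketched in a few lines of Python. This is only an illustration under assumptions: the five-node topology with unit costs, the link-state map keyed by frozenset, and the function name dijkstra_next_hops are all made up here, not taken from the lecture slides. Each node is assumed to hold the full flooded link-state map and to rerun Dijkstra's algorithm once the failed link disappears from it.

```python
import heapq

def dijkstra_next_hops(links, source):
    """Return {destination: (cost, first_hop)} computed from a link-state map.

    links maps frozenset({u, v}) -> cost for each bidirectional link
    (a hypothetical representation chosen for this sketch).
    """
    adj = {}
    for link, cost in links.items():
        u, v = tuple(link)
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    best = {}
    heap = [(0, source, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in best:
            continue  # already settled with a cheaper path
        best[node] = (cost, first_hop)
        for nbr, c in adj.get(node, []):
            if nbr not in best:
                hop = nbr if node == source else first_hop
                heapq.heappush(heap, (cost + c, nbr, hop))
    best.pop(source)
    return best

# Assumed topology: every link has cost 1.
links = {frozenset(p): 1 for p in [("A", "B"), ("A", "C"), ("B", "D"),
                                   ("C", "E"), ("D", "E")]}
before = dijkstra_next_hops(links, "A")

# Link A-C fails; after the next flooded advertisement it is gone from the map.
after_failure = {k: v for k, v in links.items() if k != frozenset(("A", "C"))}
after = dijkstra_next_hops(after_failure, "A")
```

With this assumed topology, A's route to E moves from the path through C to the longer path through B once the failed link is removed from the map.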
30:34
okay so now let me tell you all the ways
30:36
in which the story goes wrong the first
30:43
way the story goes wrong is let me do it
30:46
in the context of a link state protocol
30:47
with a very very simple picture let's
30:50
say you have I think I have a slide all
30:55
right
30:56
let's say you have the picture that I've
30:57
shown up there so very very simple
31:01
picture there's A B and D D is the
31:04
destination and this is some path so
31:06
let's say that what happens is that
31:08
normally when there's no failures the
31:11
way to go from B to D is via A so B A D
31:14
and A goes to D directly now let's
31:17
assume that this link fails if that link
31:21
fails and things work great what's going
31:23
to happen is that in the next link state
31:25
advertisement A tells B that A D no
31:28
longer exists A knows the correct link
31:31
state from B and so it computes its path
31:34
via B its route via B and similarly
31:37
B realizes that A D doesn't exist
31:40
anymore and it computes an alternate
31:42
route that way but let's say what
31:45
happens is that A D fails and then in the
31:48
next link state advertisement that A
31:51
sends out that packet's lost let's
31:54
say that A's link state advertisement to
31:55
B is just lost right packets could be
31:58
lost
31:59
now we have a problem because
32:02
A knows that this has failed and
32:04
therefore when it runs
32:06
Dijkstra's algorithm or shortest path
32:08
algorithm it knows that what it wants is
32:10
a route going like that but B on the
32:13
other hand doesn't know that link A D has
32:16
failed because it didn't see that
32:17
link state advertisement which was lost
32:19
so what B does is compute its routing
32:22
table entry which is the same as it was
32:23
before going through that link over here
32:29
now you have a problem because when A
32:32
gets a packet a data packet
32:34
that it wants to send to destination D
32:36
previously it sent it this way but now
32:38
it knows that link has failed so it
32:39
sends the packet to B because its route
32:41
for D is via B well B gets that packet
32:45
and looks it up in its routing table and
32:46
B believes that the way to get to D is
32:49
via A so it sends it back to A well A
32:52
gets that back and says oh that's great
32:54
this is a packet for destination D I
32:56
look it up in my routing table that goes
32:58
via B and this ping-pongs for you know
33:01
pretty much as long as you want this
33:03
thing here is the simplest example of a
33:06
general phenomenon called a routing loop
33:08
so the first thing that can happen is that
33:10
during the process of route convergence
33:13
various kinds of pathologies and
33:15
problematic conditions could happen and
33:17
one of them is a routing loop
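A small Python sketch may help make the lost-advertisement loop concrete. It assumes a three-node picture like the one described: A and B are neighbors at cost 1, A reaches D directly at cost 1, and B has some other path to D that is modeled here as a single link of cost 5 (that cost, the dictionary layout, and the function name next_hop are assumptions for illustration). A's view of the topology no longer contains link A D, while B's view is stale because the advertisement was lost.

```python
import heapq

def next_hop(view, source, dest):
    """First hop on the shortest path from source to dest, using one node's view.

    view maps (u, v) tuples to costs; links are treated as bidirectional.
    """
    adj = {}
    for (u, v), cost in view.items():
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    done = set()
    heap = [(0, source, "")]
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dest:
            return first
        for nbr, c in adj.get(node, []):
            if nbr not in done:
                # "first" is empty only while we are still at the source
                heapq.heappush(heap, (cost + c, nbr, first or nbr))
    return None  # destination unreachable in this view

views = {
    "A": {("A", "B"): 1, ("B", "D"): 5},                 # A knows A D failed
    "B": {("A", "B"): 1, ("A", "D"): 1, ("B", "D"): 5},  # B's view is stale
}

# Follow a data packet for D that starts at A, for a handful of hops.
node, trace = "A", ["A"]
for _ in range(6):
    node = next_hop(views[node], node, "D")
    trace.append(node)
    if node == "D":
        break
```

In this sketch the trace alternates between A and B and never reaches D, which is the two-node routing loop: each node's table is consistent with its own view, but the views disagree.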
33:21
the second thing that could happen and I
33:23
showed you that here where during the
33:27
process of convergence C does not have
33:29
a route to the destination
33:31
D but B thought it had a route going via
33:33
C but in fact C just dropped that
33:36
packet that's the second condition that
33:37
happens it's a dead end so both of these
33:42
things can happen during the process of
33:43
convergence
33:44
now these routing loops are particularly
33:45
problematic because when you have a
33:47
routing loop this is an example of a two
33:49
hop routing loop right A goes to B B
33:52
goes to A goes to B so it bounces twice
33:54
but you can have more complicated
33:55
routing loops you could have a routing
33:57
loop with four hops that looks like or
34:00
four nodes involved as opposed to
34:03
two nodes where this is destination D
34:06
and A thinks that you have to go
34:09
that way B thinks that you have to go
34:10
this way C thinks you have to go
34:12
this way and let's call this guy E he
34:14
thinks you have to go that way and this
34:15
could happen so you end up with packets
34:17
cycling around now these packets cycling
34:20
around you know there's really no way to
34:23
once you have routing table entries that
34:26
have somehow converged until it gets
34:29
fixed somehow if they've converged to
34:31
routes where B has to use
34:34
this link and C has to use this link and
34:36
so forth and you get a cycle what
34:39
ends up happening is these packets cycle
34:41
forever there's really no way to avoid
34:44
the packet cycling forever now this is
34:49
of course eventually this will be fixed
34:51
if the routing protocol eventually
34:52
converges it'll eventually discover that
34:54
this is wrong and find the correct
34:55
answer but during the process of
34:57
convergence bad things could happen like
34:59
this and that's why we have on packets
35:02
in in packet switched networks a field
35:04
called the hop limit field and that's on
35:07
a data packet so you have the source of
35:09
the packet set a hop limit let's say 32
35:12
it just says that I need to get to the
35:14
destination I know it shouldn't take
35:16
more than 32 hops no matter what
35:18
happens
35:18
and then every node that every switch
35:20
that gets this packet reduces the hop
35:22
limit by one and eventually when the hop
35:26
limit gets to zero the packet's discarded
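As a rough illustration of the hop limit mechanism, the Python sketch below forwards a packet through per-node routing tables and discards it once the limit runs out. The table layout, the function name forward, and the two example tables are assumptions made for this sketch; 32 is the example value used in the lecture.

```python
def forward(tables, start, dest, hop_limit=32):
    """Forward a packet hop by hop; each forwarding step uses up one unit
    of the hop limit, and the packet is discarded when the limit is zero.

    tables maps node -> {destination: next_hop} (a hypothetical layout).
    Returns ("delivered" or "dropped", number of hops taken).
    """
    node, hops = start, 0
    while node != dest:
        if hop_limit == 0:
            return ("dropped", hops)   # flushed out of the network
        node = tables[node][dest]      # routing table lookup
        hop_limit -= 1
        hops += 1
    return ("delivered", hops)

# Tables stuck in the two-node loop: A forwards to B and B forwards back to A.
looping = {"A": {"D": "B"}, "B": {"D": "A"}}
# Tables with a working direct route from A to D.
healthy = {"A": {"D": "D"}}

loop_result = forward(looping, "A", "D")
ok_result = forward(healthy, "A", "D")
```

In the looping case the packet bounces 32 times and is then dropped instead of circulating indefinitely; in the healthy case it is delivered after one hop and the limit never matters.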
35:29
so this is a way to flush packets out of
35:30
the network and usually you use this
35:32
mechanism to handle the case when you
35:35
get stuck in a routing loop you don't
35:38
want these packets to cycle around
35:39
forever and ever and ever because these
35:41
packets move around the network in you
35:43
know milliseconds and routing protocols
35:45
take minutes to converge or many many
35:47
seconds to converge so that's many many
35:48
milliseconds these packets could remain
35:50
in the network forever and ever
35:51
using up bandwidth and no one's getting
35:53
any use out of it so you have this hop
35:55
limit field to flush packets out of the
35:57
system but of course what we'd like to
36:00
do is to design protocols that guarantee
36:02
no routing loops at all unfortunately
36:05
it's impossible to do that but what we
36:07
can try to do is to reduce and mitigate
36:09
the effects of routing loops now I want
36:14
to go through a few more examples of
36:15
routing loops and this isn't the link
36:17
state protocol I want to actually now
36:19
talk about what happens with a
36:20
distance-vector protocol and show you
36:23
why this basic simple distance-vector
36:25
protocol which is the first routing
36:27
protocol that was invented has some
36:29
problems and how we go about fixing it
36:31
and eventually I'll talk about how this
36:32
is all used on the Internet today
36:35
so here's how a distance protocol
36:38
distance vector protocol might get stuck
36:41
in a weird kind of routing loop so
36:45
let's take this example here where you
36:48
have five nodes and we're all interested
36:50
in finding routes to destination E and
36:52
the general lesson I want to get at here
36:54
is that a distance vector protocol is
36:56
extremely simple but it only works on
36:59
small networks and for bigger networks
37:02
we want something better so that's where
37:04
the story is going so let me refresh
37:07
where we are all the discussion I had so
37:09
far so let's assume that
37:11
link A C fails in this picture so what
37:14
you would like to have I'm assuming
37:16
all link costs are one so we
37:17
don't worry about costs at this point
37:19
what you want to have happen is for A to
37:22
discover that this A discovers that's
37:24
failed and when the routing converges
37:26
you would like A to use this link as its
37:29
route to destination E and the cost
37:31
would be 1 plus 1 plus 1 which would be 3 all
37:35
right so when A discovers the failure it
37:36
advertises that the cost to E is infinity to its
37:39
neighbor in particular to B and then B
37:43
of course you know has a route to
37:45
destination E at cost 2 which B advertises
37:49
back to A and then A says now I have a
37:51
route to destination E and that's this
37:53
is an example of a good converging
37:55
routing protocol everything is good now
37:59
let's assume I complicate the picture
38:00
let's assume that link B D also fails so
38:04
now what's the correct answer well these
38:07
two guys have a route to e but the
38:11
network has become disconnected so the
38:13
correct answer the correct converged
38:14
answer here is that A and B both
38:16
discover and instantiate in their routing
38:19
tables entries that say that E is at a
38:22
cost of infinity right because there's
38:24
no path which means that it's an
38:25
infinite cost so when a packet arrives
38:27
at B for destination E you just drop
38:31
the packet but this could happen so
38:34
here's an example of how that happens so
38:37
let's say that this link fails B
38:40
discovers that through the hello
38:41
protocol and at this point B changes
38:44
its routing table entry so that E is at
38:47
infinity B had previously sent
38:50
information to A saying that E
38:53
was at distance 2 or cost 2 and now it
38:56
says well I told you that E
38:59
was at cost 2 before but I'm changing my
39:01
mind it's at cost infinity and A says ok
39:03
my entry for E now has cost infinity and
39:07
both of them have converged correctly to
39:09
the right answer
39:11
now unfortunately that's not the only
39:13
thing that could happen this was in the
39:16
lucky situation when B discovered it
39:18
had failed and immediately sent out its
39:20
cost to A but what could happen is a
39:24
little different what could happen is
39:26
that B could discover that B D has failed
39:29
and change its routing table entry to
39:33
infinity but before it gets a chance to
39:36
send out its advertisement to A or
39:38
perhaps it sends out its advertisement
39:41
with the cost of infinity to A but it
39:42
got lost in either of those cases what
39:45
could happen is A could send out its
39:47
routing table cost to B for destination
39:52
E because that's what's happening
39:54
periodically right every node is
39:55
periodically running this and the timers
39:57
aren't synchronous every node has
39:58
its own notion of what you know when it
40:00
should send out its advertisement so
40:03
what happens now is A sends out an
40:05
advertisement to B saying it has a
40:08
route to destination E at a cost of
40:11
three which is very valid right after
40:14
all it does have a route in its routing
40:17
table to E whose cost is three it so
40:20
happens it goes through B but it
40:22
doesn't yet know that link B D has
40:24
failed now we're a little bit in trouble
40:29
because B now believes that its routing
40:31
table entry for E has a cost of
40:33
infinity because B D has failed that link
40:35
and now it sees an advertisement from A
40:39
with a better cost and says wow this is
40:43
cool I now have a path of cost three via
40:47
A to E which is better than my cost of
40:51
infinity and so I'm going to assume that
40:53
I have an entry to E at a cost of four
40:57
so now it's you know you can see that
40:59
this is actually not a valid route at
41:01
all now B actually on the face of it
41:05
has no way of knowing if A is telling it a
41:08
different route so it could conceivably
41:09
be the case that A has a different
41:11
route going that way whose cost is three
41:15
and A can legitimately have a cost of
41:18
three to E that it could be telling B
41:21
about but in this protocol there's no
41:24
way for B to distinguish that case from
41:26
the case where A is just repeating to
41:29
B a route that it received via B but
41:31
it's just telling it that it has a route
41:33
via B now B therefore says that it has a
41:37
cost of 4 to E and it sends that to A and
41:40
A says whoa previously the cost was 2
41:43
from B and now B is telling me that the
41:46
cost is 4 which means I need to make my
41:48
cost equal to 4 plus 1 which is 5 and
41:51
I'm going to send that back and B
41:53
says all right I got a cost of 5
41:55
previously that same thing had a cost of
41:57
3 so now I'm gonna make my cost 6 and
42:00
this goes on forever now in the meantime
42:03
if there are packets showing up at
42:04
either A or B for destination E they're
42:07
just going to go bouncing between these
42:08
two guys this is a routing loop
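The escalating costs above can be replayed with a short Python sketch. It assumes the scenario from the example: A holds a route to E at cost 3 via B, B has just set its own cost to infinity, the A B link costs 1, and the two nodes strictly alternate advertisements. The value 16 for infinity and the function name count_to_infinity are assumptions for illustration (RIP happens to use 16, but nothing in the lecture fixes a value).

```python
INFINITY = 16  # assumed small "infinity" for this sketch

def count_to_infinity(a_cost=3, link_cost=1, infinity=INFINITY):
    """Alternate A->B and B->A advertisements after B loses its real route to E.

    Returns the list of (node, new_cost) updates in the order they happen.
    """
    history = []
    b_cost = infinity  # B has just marked E unreachable
    while a_cost < infinity or b_cost < infinity:
        # A advertises its cost to B; B's only candidate route is via A.
        b_cost = min(a_cost + link_cost, infinity)
        history.append(("B", b_cost))
        # B advertises back; A's route goes via B, so A must adopt the new cost.
        a_cost = min(b_cost + link_cost, infinity)
        history.append(("A", a_cost))
    return history

updates = count_to_infinity()
```

The costs climb 4, 5, 6 and so on, one advertisement at a time, and only stop once both nodes reach the chosen infinity. Rerunning with a larger infinity shows why convergence time grows with that value.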
42:12
now when does this stop when do
42:14
you know when do these guys stop
42:15
sending these incrementing costs sorry
42:23
yeah you need a value of infinity you
42:26
need to say that at some point they're
42:27
going to reach infinity and we're going
42:29
to stop so in other words for this
42:31
protocol as it's presented to converge in
42:33
a legitimate amount reasonable amount of
42:35
time your value of infinity should be
42:37
small so this thing has a colorful name
42:41
it's called counting to infinity now in
42:43
reality in any network
42:45
the value of infinity cannot be smaller
42:47
than you know
42:51
the largest minimum path cost right if
42:53
you have a minimum cost path that has a
42:55
cost of 75 for whatever reason infinity
42:58
had better be bigger
42:59
than 75 right so what it means is that
43:03
you have a problem with this protocol it
43:06
only it works great on small networks
43:07
but it only works on small networks and
43:09
the reason for that is that it needs a
43:11
value of infinity that's not very big so
43:13
this is why distance vector protocols
43:16
are only used for really simple small
43:18
networks and the moment the network
43:20
becomes a certain size or when you want
43:22
costs that are you know large values you
43:25
really can't use this protocol so how do
43:28
you fix this problem
43:32
any ideas on how to solve this problem
43:34
and clearly you know the
43:36
Internet is pretty big we're not you
43:39
know counting to infinity throughout the
43:41
whole Internet so at least we don't
43:42
think we are so how would you fix this
43:44
problem any ideas
43:51
yes so we do have a hop limit on
43:55
packets so all these packets might have
43:57
a hop limit so the packets don't remain
43:59
in the network for a long time but that
44:01
doesn't solve the problem of the routing
44:02
protocol taking a long time to converge
44:05
you know which is the counting to
44:10
infinity problem so you want a better
44:12
solution in some way yes
44:24
okay so that's one good idea which is in
44:28
fact that's how they started trying to
44:29
solve this problem which is if you have
44:32
a route to a destination coming from a
44:36
neighbor
44:36
don't send back the same route to them
44:39
in other words in this case A's route to E
44:43
was via B so A should not advertise a
44:47
route to E or route for destination E back
44:50
to B if you do that there's a name for
44:53
it it's called split horizon the notes
44:54
describe how this protocol works or you
44:57
could do even better you could actually
44:59
A could advertise to B that its route
45:02
to destination E has a cost of infinity
45:05
forcing B to definitely not use that
45:08
route no matter what happens because A
45:10
received that route via B so A should
45:12
tell B that the cost of that route is
45:14
infinity because under no circumstances
45:16
does A want B to use the same route that
45:18
it received right so you could do that
45:20
the trouble with that is it doesn't it
45:22
solves these two hop loop problems but
45:24
it doesn't solve four hop loop problems
45:26
so you could have a situation where like
45:27
this link fails and C discovers that but
45:32
before C sends out its update B sends
45:34
out its route to C and so C thinks
45:37
that it can use B in the meantime B thinks
45:40
that you know it has a route via A so
45:42
you might end up with packets cycling
45:44
around in longer hop loops so that idea
45:46
that you had doesn't actually solve the
45:48
more general problem so any other
45:50
modification or idea that can solve the
45:52
problem
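The two advertisement rules discussed above, split horizon and the stronger variant that advertises infinity back (commonly called poison reverse), can be sketched as a filter applied when a node builds the advertisement for one neighbor. The table layout, the mode names, the function name build_advertisement, and the neighbor X are assumptions made for this Python sketch.

```python
INFINITY = float("inf")

def build_advertisement(table, neighbor, mode="plain"):
    """Build the {destination: cost} advertisement sent to one neighbor.

    table maps destination -> (cost, next_hop), a hypothetical layout.
    mode is "plain", "split_horizon", or "poison_reverse".
    """
    advert = {}
    for dest, (cost, next_hop) in table.items():
        learned_from_neighbor = (next_hop == neighbor)
        if learned_from_neighbor and mode == "split_horizon":
            continue                     # say nothing about this route
        if learned_from_neighbor and mode == "poison_reverse":
            advert[dest] = INFINITY      # explicitly forbid using us
        else:
            advert[dest] = cost
    return advert

# A's table in the example: its route to E goes via B at cost 3.
a_table = {"B": (1, "B"), "E": (3, "B")}

plain = build_advertisement(a_table, "B", "plain")
split = build_advertisement(a_table, "B", "split_horizon")
poison = build_advertisement(a_table, "B", "poison_reverse")
to_other = build_advertisement(a_table, "X", "poison_reverse")  # X: hypothetical neighbor
```

Either rule keeps B from hearing its own route echoed back by A, which is what removes the two-node loop. Neither rule says anything about routes that travel around a longer cycle, which is the limitation raised above.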
46:00
so one thing you could do is something
46:03
called path vector which is what you
46:05
could do is every node rather than just
46:07
sending the cost it could send the
46:09
entire route that is sorry it could send
46:12
the entire path it could send the list
46:15
of nodes that corresponds to that
46:17
particular route in its routing table
46:19
advertisement so I'll show that with a
46:20
picture here so E could send out not
46:24
just its destination and a cost which
46:26
previously it would say that to come to
46:28
E E would say the cost is zero
46:30
but it could now say the cost is zero
46:32
and the path is E and then each of these
46:36
other guys C and D could send out
46:38
their own advertisements saying the
46:43
cost is 2 or whatever the cost is 1 but
46:46
they could also say that the path is D E
46:48
so D says that my path to get to E is D E
46:52
in its advertisement and B here when it
46:54
receives that could send out its own
46:57
path vector which is the list of nodes
47:03
or the list of switches that corresponds
47:06
to an actual path that's advertised and
47:08
now the rule for how you integrate a
47:11
route into your routing table entries is
47:13
very simple if you see an advertisement
47:16
with your own identity in that
47:18
advertisement then you know that you
47:21
know that's just a rumor that you
47:22
started or you were involved with so you
47:23
shouldn't integrate it so in particular
47:25
in this example here if B for example
47:28
were to see an advertisement from A with
47:31
a path that was A B D E then B wouldn't
47:35
integrate that so what would happen in
47:37
the picture I showed you before if these
47:39
two links were to fail what would happen
47:41
is that you know B would have if that
47:43
link failed B would have sent that B D E
47:46
over here and when A advertises that
47:49
back to B it would have A B D E show up
47:52
and B now sees A B D E and B finds that
47:56
its own name is in
47:57
that vector or in that advertisement and
48:00
says I should pay no attention to that
48:02
and as long as you find your own name
48:05
somewhere in the vector in that list of
48:07
nodes that that routing advertisement
48:09
went through you know that you shouldn't
48:11
pay any attention to it because you
48:13
know you were involved in creating that
48:15
advertisement and so you shouldn't pay
48:16
attention to it this protocol is called
48:19
path vector it's used on the Internet in
48:21
something called the Border Gateway
48:22
Protocol which runs between autonomous
48:24
systems and that's actually what what
48:28
makes the internet essentially converge
48:30
and not have these routing loops that go
48:32
between different Internet service
48:34
providers any questions comments so far
48:40
about any of this stuff
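The path vector integration rule can be sketched in Python as follows. The sketch assumes unit link costs, so a shorter path list means a cheaper route, and it stores each route as the full list of node names; the table layout and the function name integrate are assumptions for illustration and are not how BGP itself is specified.

```python
def integrate(my_id, table, advertised_path):
    """Apply the path vector rule to one received advertisement.

    table maps destination -> path (list of node names starting at my_id).
    advertised_path is the neighbor's path, neighbor first, destination last.
    Returns True if the routing table changed.
    """
    if my_id in advertised_path:
        return False  # our own name is in the path: a rumor we helped start
    dest = advertised_path[-1]
    new_path = [my_id] + advertised_path
    current = table.get(dest)
    # Unit link costs assumed, so fewer nodes on the path means lower cost.
    if current is None or len(new_path) < len(current):
        table[dest] = new_path
        return True
    return False

# B has lost its route to E after link B D failed, so its table is empty.
b_table = {}
echoed = integrate("B", b_table, ["A", "B", "D", "E"])  # A repeating B's old route
table_after_echo = dict(b_table)
fresh = integrate("B", b_table, ["D", "E"])             # a path that does not include B
```

B ignores the echoed path from A because its own name appears in it, so the count to infinity exchange never starts; a path that does not contain B is integrated normally.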
48:45
so let me summarize everything about
48:47
routing protocols and we'll pick this up in
48:49
recitation with some problems tomorrow
48:50
so in the last two lectures and recitation
48:53
we've spoken about the network layer and
48:55
the main problem that's solved by the
48:57
network layer is how to get packet
48:59
routing to work how do you find good
49:01
paths between different nodes in the
49:04
network between different switches in
49:06
the network now we separated out the
49:10
tasks of routing from forwarding
49:12
forwarding is what happens when a packet
49:14
arrives at a switch there's a lookup
49:16
that happens in a routing table you take
49:18
the destination you look it up in the
49:19
table find the link in the routing table
49:22
and ship the packet so that's done
49:24
usually you want it to be done very very
49:26
fast
49:26
the routing is the process by which the
49:28
nodes create routing table entries and
49:30
that's a very distributed process it
49:32
runs amongst all of the other all of the
49:35
switches in the network we looked at two
49:37
routing protocols link state and
49:39
distance vector in distance
49:41
vector the computation is distributed
49:43
with these Bellman-Ford update steps and
49:46
the distance vector protocol is very
49:48
beautiful in that it's very very simple
49:50
works for small networks but to make the
49:52
idea work for bigger networks you have
49:54
to enhance the distance with the actual
49:56
path and if you enhance it with the path
49:58
you actually avoid a lot of these
50:00
routing loops that show up you can't
50:01
eliminate them but you can mitigate the
50:03
effect of them in the link state protocol
50:07
there's actually more work that's done
50:09
there's a lot more information that's
50:10
flooded between nodes but the protocol
50:13
converges quicker than these distance
50:15
vector and path vector protocols usually
50:17
in a link state protocol you flood this
50:19
neighbor information you consume more
50:20
bandwidth there's a lot more bandwidth
50:22
that's used in the network flooding it
50:24
and the computation is centralized you
50:25
run Dijkstra's shortest paths so what the
50:28
internet does in general I'll pick up on
50:29
this two or three lectures from now
50:31
when I talk about how the internet
50:32
really works and applies the concepts
50:34
we've studied what you'll find is that
50:36
networks like MIT's network will run
50:38
a protocol like a link state
50:41
protocol to achieve connectivity
50:43
between nodes inside MIT and then
50:45
routers at the edge of MIT connecting to
50:47
other Internet service providers run
50:49
a path vector protocol
50:50
like BGP and all of these things work
50:53
together and they work because
50:54
ultimately all of the switches create
50:56
these routing table entries that have a
50:57
mapping between destinations and routes
51:00
or links that have to be used so that's
51:03
the routing story we will pick it up in
51:04
recitation tomorrow and see you back on
51:07
Wednesday