1
00:00:22,500 --> 00:00:25,949
all right so uh we need to talk more
2
00:00:25,949 --> 00:00:28,680
about replication and consistency of
3
00:00:28,680 --> 00:00:32,879
course it's in the picture well let me
4
00:00:32,879 --> 00:00:35,100
just point something out it's kind of
5
00:00:35,100 --> 00:00:37,050
interesting if we do no replication we
6
00:00:37,050 --> 00:00:39,570
essentially have no consistency issues
7
00:00:39,570 --> 00:00:42,180
well if you ignore the consistency
8
00:00:42,180 --> 00:00:44,129
issues within the same machine
9
00:00:44,129 --> 00:00:45,329
but nevertheless life would be much
10
00:00:45,329 --> 00:00:48,059
easier if you wouldn't replicate so this
11
00:00:48,059 --> 00:00:49,440
in fact begs the question why bother
12
00:00:49,440 --> 00:00:51,659
replicating and fault tolerance is the
13
00:00:51,659 --> 00:00:54,600
the big big big issue together with
14
00:00:54,600 --> 00:00:56,640
performance enhancements but you have to
15
00:00:56,640 --> 00:00:57,930
be careful especially when you're
16
00:00:57,930 --> 00:00:59,699
selling replication as a performance
17
00:00:59,699 --> 00:01:02,220
enhancing technique you have to make
18
00:01:02,220 --> 00:01:03,629
sure that you're in fact achieving
19
00:01:03,629 --> 00:01:05,220
higher performance which is not
20
00:01:05,220 --> 00:01:07,110
necessarily the case right so that's the
21
00:01:07,110 --> 00:01:09,000
first question to ask when you need to
22
00:01:09,000 --> 00:01:11,040
pick solutions for replication all
23
00:01:11,040 --> 00:01:13,380
right am I really getting better
24
00:01:13,380 --> 00:01:14,970
performance I mean there are other ways
25
00:01:14,970 --> 00:01:18,090
to in fact get some sort of protection
26
00:01:18,090 --> 00:01:20,130
against faults availability is
27
00:01:20,130 --> 00:01:21,420
one of the issues but maybe it's not
28
00:01:21,420 --> 00:01:24,150
even your biggest problem you can do
29
00:01:24,150 --> 00:01:25,590
backups and other things that can help
30
00:01:25,590 --> 00:01:28,620
with things besides availability which
31
00:01:28,620 --> 00:01:29,880
you should be doing in the first place
32
00:01:29,880 --> 00:01:32,610
right it turns out that in general it's
33
00:01:32,610 --> 00:01:35,130
hard to achieve performance improved
34
00:01:35,130 --> 00:01:37,950
performance right we've seen this for
35
00:01:37,950 --> 00:01:39,900
example when you have to use some sort
36
00:01:39,900 --> 00:01:42,180
of total ordering of operations right
37
00:01:42,180 --> 00:01:44,190
it's so tedious in terms of how many
38
00:01:44,190 --> 00:01:45,840
messages need to be exchanged and so on
39
00:01:45,840 --> 00:01:48,210
that you can easily see that the
40
00:01:48,210 --> 00:01:51,380
performance could in fact dramatically
41
00:01:51,380 --> 00:01:54,299
spiral down right if that happens of
42
00:01:54,299 --> 00:01:56,970
course the benefits of doing replication
43
00:01:56,970 --> 00:01:58,760
disappear very fast so the fact that
44
00:01:58,760 --> 00:02:00,780
you're replicating doesn't mean
45
00:02:00,780 --> 00:02:02,610
necessarily you have a good solution all
46
00:02:02,610 --> 00:02:03,960
right so what I want to do in this class
47
00:02:03,960 --> 00:02:05,610
is start discussing some other
48
00:02:05,610 --> 00:02:07,860
issues related to replication and
49
00:02:07,860 --> 00:02:09,959
discuss some solutions for all those
50
00:02:09,959 --> 00:02:11,849
consistency concerns we had before right
51
00:02:11,849 --> 00:02:12,959
so we only talked about consistency
52
00:02:12,959 --> 00:02:15,390
models but we didn't have any solution
53
00:02:15,390 --> 00:02:16,830
so we only said you must take care of
54
00:02:16,830 --> 00:02:18,420
this without any indication of how you
55
00:02:18,420 --> 00:02:20,819
might do that okay so let's do some of
56
00:02:20,819 --> 00:02:23,750
these things so one of the issues is a
57
00:02:23,750 --> 00:02:26,610
replica server placement and you can
58
00:02:26,610 --> 00:02:28,739
spend as little or as much time on this
59
00:02:28,739 --> 00:02:30,420
issue I mean that's true with almost any
60
00:02:30,420 --> 00:02:32,790
issue in this area because there are
61
00:02:32,790 --> 00:02:35,720
no fixed determined
62
00:02:35,720 --> 00:02:38,200
provably good solutions in here right so
63
00:02:38,200 --> 00:02:40,910
the kind of situation you can have is
64
00:02:40,910 --> 00:02:42,110
the following right you have a
65
00:02:42,110 --> 00:02:44,330
non-uniform distribution of let's say
66
00:02:44,330 --> 00:02:45,410
clients I mean this could be
67
00:02:45,410 --> 00:02:47,920
geographical distribution or
68
00:02:47,920 --> 00:02:50,270
connectivity distribution or things of a
69
00:02:50,270 --> 00:02:54,620
sort and the question is how should you
70
00:02:54,620 --> 00:02:56,930
in fact allocate clients if you want to
71
00:02:56,930 --> 00:03:02,030
servers at least for the
72
00:03:02,030 --> 00:03:03,980
normal execution the normal functioning
73
00:03:03,980 --> 00:03:05,930
of the system right so most of the time
74
00:03:05,930 --> 00:03:07,280
you would like the client to go to some
75
00:03:07,280 --> 00:03:09,950
sort of maybe closeby server and only if
76
00:03:09,950 --> 00:03:12,800
that server fails to use some sort of
77
00:03:12,800 --> 00:03:14,210
fault tolerance to move to another
78
00:03:14,210 --> 00:03:16,910
server so on and so forth now I want to
79
00:03:16,910 --> 00:03:19,640
point out that these problems in various
80
00:03:19,640 --> 00:03:21,650
forms have been encountered
81
00:03:21,650 --> 00:03:25,430
in other areas for example cell
82
00:03:25,430 --> 00:03:27,920
phone technology right this is a really
83
00:03:27,920 --> 00:03:30,110
big problem for cell phone technology we
84
00:03:30,110 --> 00:03:31,970
have to decide where you're gonna put
85
00:03:31,970 --> 00:03:36,950
those big antennas and each of the
86
00:03:36,950 --> 00:03:40,340
cell phones in fact through complex
87
00:03:40,340 --> 00:03:42,140
protocols has to determine which is the
88
00:03:42,140 --> 00:03:44,959
tower it actually talks to and if you
89
00:03:44,959 --> 00:03:46,730
don't have enough towers this is
90
00:03:46,730 --> 00:03:48,800
when you actually get congestion right
91
00:03:48,800 --> 00:03:50,450
and of course congestion in the cell
92
00:03:50,450 --> 00:03:51,590
phone network makes everybody very
93
00:03:51,590 --> 00:03:53,360
unhappy because calls get dropped and
94
00:03:53,360 --> 00:03:54,980
whatnot okay depending on how the
95
00:03:54,980 --> 00:03:56,450
technology is put together I mean you
96
00:03:56,450 --> 00:03:59,000
might get a crappy phone call or you might
97
00:03:59,000 --> 00:04:01,090
simply not get connectivity at all right
98
00:04:01,090 --> 00:04:03,530
but the same kind of issue can be put in
99
00:04:03,530 --> 00:04:04,720
different contexts
100
00:04:04,720 --> 00:04:07,970
I keep on mentioning Akamai right Akamai
101
00:04:07,970 --> 00:04:11,120
is placing servers closer to the users
102
00:04:11,120 --> 00:04:12,380
but for Akamai it's very important to
103
00:04:12,380 --> 00:04:13,880
determine where to place servers because
104
00:04:13,880 --> 00:04:17,209
none of this is free or particularly
105
00:04:17,209 --> 00:04:19,370
cheap right for Akamai to place a
106
00:04:19,370 --> 00:04:22,640
particular server in a particular if you
107
00:04:22,640 --> 00:04:24,050
want corner of the Internet I mean they
108
00:04:24,050 --> 00:04:25,480
have to spend money and rent space
109
00:04:25,480 --> 00:04:27,740
probably at the ISP which is gonna
110
00:04:27,740 --> 00:04:29,450
charge quite a lot because it's kind of
111
00:04:29,450 --> 00:04:33,290
prime real estate right so this
112
00:04:33,290 --> 00:04:36,050
replica server placement is very much
113
00:04:36,050 --> 00:04:39,080
tied with money with commercial
114
00:04:39,080 --> 00:04:40,970
interests right and this is one of the
115
00:04:40,970 --> 00:04:43,729
reasons why it's somewhat studied in the
116
00:04:43,729 --> 00:04:45,500
research literature but extremely well
117
00:04:45,500 --> 00:04:47,000
studied behind the scenes and the
118
00:04:47,000 --> 00:04:48,840
solutions kept secret by the various
119
00:04:48,840 --> 00:04:50,040
companies that need to do this kind of
120
00:04:50,040 --> 00:04:52,230
things right so Verizon and AT&T are
121
00:04:52,230 --> 00:04:53,850
very concerned about for example cell
122
00:04:53,850 --> 00:04:56,160
tower placement but now also about
123
00:04:56,160 --> 00:04:57,990
server placement Akamai is selling the
124
00:04:57,990 --> 00:04:59,340
service in which they determine where to
125
00:04:59,340 --> 00:05:03,780
place the servers each particular
126
00:05:03,780 --> 00:05:07,830
company can in fact do
127
00:05:07,830 --> 00:05:09,690
I mean pose the same kind of a problem
128
00:05:09,690 --> 00:05:12,270
and in fact there is a problem that has
129
00:05:12,270 --> 00:05:14,370
been studied for close to a hundred years
130
00:05:14,370 --> 00:05:17,340
which has nothing to do with I mean not
131
00:05:17,340 --> 00:05:20,190
in a direct way with necessarily
132
00:05:20,190 --> 00:05:23,910
replica placement but has to do with for
133
00:05:23,910 --> 00:05:26,130
example distribution center placement
134
00:05:26,130 --> 00:05:33,500
right so think for example about a large
135
00:05:33,500 --> 00:05:37,680
grocery store chain right Publix ok so
136
00:05:37,680 --> 00:05:39,750
Publix actually has to take when they
137
00:05:39,750 --> 00:05:41,430
expand and when they do various things they
138
00:05:41,430 --> 00:05:43,500
have to take some hard decisions with
139
00:05:43,500 --> 00:05:46,080
respect to where to place those big
140
00:05:46,080 --> 00:05:48,090
warehouses right so the trucks come to
141
00:05:48,090 --> 00:05:50,639
the warehouse and then smaller trucks or
142
00:05:50,639 --> 00:05:53,220
different trucks go and serve each of
143
00:05:53,220 --> 00:05:55,889
the stores and to a large extent it's the
144
00:05:55,889 --> 00:05:57,300
same kind of a problem so if you have a
145
00:05:57,300 --> 00:05:59,729
densely populated area right which has
146
00:05:59,729 --> 00:06:01,560
many many many stores then you better
147
00:06:01,560 --> 00:06:03,810
place more of these warehouses and
148
00:06:03,810 --> 00:06:05,490
potentially of different sizes you can
149
00:06:05,490 --> 00:06:06,630
add size of the server as well
150
00:06:06,630 --> 00:06:08,610
capabilities of the server as well right
151
00:06:08,610 --> 00:06:10,110
because otherwise you're not gonna keep
152
00:06:10,110 --> 00:06:12,690
all those stores filled but if you have
153
00:06:12,690 --> 00:06:15,930
a sparsely populated area right you
154
00:06:15,930 --> 00:06:17,550
might actually place far fewer such
155
00:06:17,550 --> 00:06:20,280
warehouses and one of those warehouses
156
00:06:20,280 --> 00:06:22,650
is going to cost you I mean probably in
157
00:06:22,650 --> 00:06:24,419
the hundred million dollar range right
158
00:06:24,419 --> 00:06:27,000
so you don't really want to just
159
00:06:27,000 --> 00:06:28,410
willy-nilly place them in there now the
160
00:06:28,410 --> 00:06:30,720
problem is not so bad for cell phone
161
00:06:30,720 --> 00:06:32,460
towers and it's probably much cheaper
162
00:06:32,460 --> 00:06:34,200
when it comes to placement of the
163
00:06:34,200 --> 00:06:36,539
servers and to some extent you have a
164
00:06:36,539 --> 00:06:38,550
lot more flexibility in creating the new
165
00:06:38,550 --> 00:06:40,830
one right so the cost of creating such a
166
00:06:40,830 --> 00:06:44,910
resource has to be figured into
167
00:06:44,910 --> 00:06:46,380
the entire story which suggests the
168
00:06:46,380 --> 00:06:49,410
following kind of approach right these
169
00:06:49,410 --> 00:06:51,210
problems tend to be hard I mean it's
170
00:06:51,210 --> 00:06:53,910
known that optimal placement of such
171
00:06:53,910 --> 00:06:56,700
resources is almost always NP-hard
172
00:06:56,700 --> 00:06:59,310
you're all taking the algorithms class
173
00:06:59,310 --> 00:07:01,289
right I think it's on the list of the
174
00:07:01,289 --> 00:07:02,050
core courses
175
00:07:02,050 --> 00:07:04,509
right so trying to find the best
176
00:07:04,509 --> 00:07:06,759
solution is a little bit foolish in this
177
00:07:06,759 --> 00:07:08,620
kind of circumstances a lot of them
178
00:07:08,620 --> 00:07:09,960
don't even have a good approximation
179
00:07:09,960 --> 00:07:13,240
algorithm right so a good heuristic is
180
00:07:13,240 --> 00:07:14,770
kind of the best you can hope for that
181
00:07:14,770 --> 00:07:16,419
works in real life and by the way this is
182
00:07:16,419 --> 00:07:18,789
what Akamai is famous for so Akamai was
183
00:07:18,789 --> 00:07:21,840
founded by theoreticians working on
184
00:07:21,840 --> 00:07:24,280
placement problems and actually
185
00:07:24,280 --> 00:07:25,900
randomized algorithms for placement
186
00:07:25,900 --> 00:07:27,819
which were essentially
187
00:07:27,819 --> 00:07:29,020
offering some kind of randomized
188
00:07:29,020 --> 00:07:31,479
guarantees the algorithms they proved
189
00:07:31,479 --> 00:07:32,590
nice things theoretically but they
190
00:07:32,590 --> 00:07:34,419
realized they're actually very good in
191
00:07:34,419 --> 00:07:36,310
practice as well and this is when they
192
00:07:36,310 --> 00:07:38,020
started transitioning towards hey maybe
193
00:07:38,020 --> 00:07:40,000
we can actually figure out how to play
194
00:07:40,000 --> 00:07:42,130
the game right now for a company like
195
00:07:42,130 --> 00:07:44,110
Akamai and you might be close to the
196
00:07:44,110 --> 00:07:45,520
situation potentially at least some of
197
00:07:45,520 --> 00:07:48,310
you for a company like Akamai what's
198
00:07:48,310 --> 00:07:50,280
important is to be as lean as possible
199
00:07:50,280 --> 00:07:53,050
right other companies that do this kind
200
00:07:53,050 --> 00:07:54,909
of server placement can
201
00:07:54,909 --> 00:07:57,580
actually pop up and what matters is who
202
00:07:57,580 --> 00:07:59,259
can save more money when it comes to
203
00:07:59,259 --> 00:08:00,550
this because you can offer them the
204
00:08:00,550 --> 00:08:04,719
service at a better cost so being wasteful
205
00:08:04,719 --> 00:08:06,909
in placement of the servers and so on
206
00:08:06,909 --> 00:08:09,250
right becomes very problematic I mean it
207
00:08:09,250 --> 00:08:11,199
literally can lead to bankruptcy very
208
00:08:11,199 --> 00:08:14,169
fast for a company like Akamai ok now
209
00:08:14,169 --> 00:08:16,419
similar problems are faced for example
210
00:08:16,419 --> 00:08:20,919
by the large if you want cloud companies
211
00:08:20,919 --> 00:08:23,440
which is about everybody now right for
212
00:08:23,440 --> 00:08:26,110
example when Google decides where to put
213
00:08:26,110 --> 00:08:28,150
another big data center right they have
214
00:08:28,150 --> 00:08:30,159
to think about where do they need more
215
00:08:30,159 --> 00:08:32,589
capacity if you want so these
216
00:08:32,589 --> 00:08:34,179
problems are called to some extent
217
00:08:34,179 --> 00:08:39,219
capacity problems right so one approach
218
00:08:39,219 --> 00:08:42,760
for example is find congestion in the
219
00:08:42,760 --> 00:08:44,440
system and place more capacity right
220
00:08:44,440 --> 00:08:45,060
there
221
00:08:45,060 --> 00:08:47,770
this is known not to lead to optimal
222
00:08:47,770 --> 00:08:49,870
solutions but might be reasonably good
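A minimal sketch of the "find the congestion and place more capacity right there" heuristic, written in Python. The region names, the demand and capacity numbers, and the function name `place_capacity` are all hypothetical, and congestion is assumed here to mean demand divided by capacity; the lecture does not fix a particular measure. As stated above, this greedy approach is not guaranteed to be optimal.

```python
def place_capacity(demand, capacity, new_servers, server_capacity):
    """Greedy heuristic: add each new server to the currently most congested region.

    demand and capacity map a region name to a positive number.
    Congestion is assumed to be demand / capacity (an illustrative choice).
    """
    capacity = dict(capacity)  # work on a copy, leave the caller's dict alone
    placements = []
    for _ in range(new_servers):
        # Pick the region with the highest demand per unit of capacity.
        worst = max(demand, key=lambda region: demand[region] / capacity[region])
        capacity[worst] += server_capacity
        placements.append(worst)
    return placements, capacity


# Hypothetical numbers, for illustration only.
demand = {"north": 900, "south": 300, "east": 500}
capacity = {"north": 100, "south": 100, "east": 100}
placements, final_capacity = place_capacity(demand, capacity, 3, 100)
print(placements)
print(final_capacity)
```

Each round re-measures congestion after the previous placement, so a region that was just relieved can lose its place at the top of the list.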
223
00:08:49,870 --> 00:08:55,779
right so it's rumored for example
224
00:08:55,779 --> 00:08:57,670
now that Google is building some sort of
225
00:08:57,670 --> 00:09:01,029
a floating data center in the San Francisco
226
00:09:01,029 --> 00:09:04,300
area right people have seen some barges
227
00:09:04,300 --> 00:09:07,000
that look suspicious they ran out of
228
00:09:07,000 --> 00:09:08,589
land I suspect and there is I mean
229
00:09:08,589 --> 00:09:10,360
there's too much stuff going on in that
230
00:09:10,360 --> 00:09:10,840
area
231
00:09:10,840 --> 00:09:14,470
a lot of calm activity right but to a
232
00:09:14,470 --> 00:09:16,660
large extent it's the same
233
00:09:16,660 --> 00:09:19,510
kind of problem so I don't want to talk
234
00:09:19,510 --> 00:09:21,570
too much about this because again even
235
00:09:21,570 --> 00:09:24,280
theoretically all these problems are
236
00:09:24,280 --> 00:09:25,810
actually very hard and you can't really
237
00:09:25,810 --> 00:09:30,120
prove too much about them a reasonable
238
00:09:30,120 --> 00:09:32,350
heuristic solution might actually go a
239
00:09:32,350 --> 00:09:33,880
long way I mean one of the problems is
240
00:09:33,880 --> 00:09:35,050
the following again I want to point out
241
00:09:35,050 --> 00:09:38,830
the dangers of trying too hard on these
242
00:09:38,830 --> 00:09:43,390
kinds of problems so when you model this
243
00:09:43,390 --> 00:09:45,360
problem you can model it theoretically as
244
00:09:45,360 --> 00:09:47,710
an optimization problem this is the
245
00:09:47,710 --> 00:09:50,680
standard approach right an optimization
246
00:09:50,680 --> 00:09:52,390
problem requires some sort of a goodness
247
00:09:52,390 --> 00:09:54,490
measure or badness measure and then what
248
00:09:54,490 --> 00:09:58,030
you're trying to do is say where should
249
00:09:58,030 --> 00:10:01,570
I place the servers or whatnot in order
250
00:10:01,570 --> 00:10:03,790
to let you improve so you define a
251
00:10:03,790 --> 00:10:05,500
mathematical efficiency and say where
252
00:10:05,500 --> 00:10:07,000
should I place the servers to improve
253
00:10:07,000 --> 00:10:08,560
the efficiency okay
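One common way to write down such a badness measure, given here only as an illustrative assumption since the lecture does not commit to a specific one, is a k-median style objective. The symbols are hypothetical: U is the set of clients, C the set of candidate sites, w_i the request volume of client i, d(i, j) the latency or distance from client i to site j, and k the number of servers you can afford.

```latex
\min_{S \subseteq C,\; |S| = k} \;\; \sum_{i \in U} w_i \cdot \min_{j \in S} d(i, j)
```

Every client is charged the cost of reaching its closest chosen server, weighted by how much traffic it generates. The objective is only as meaningful as the estimates of w_i and d(i, j) that go into it.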
254
00:10:08,560 --> 00:10:10,120
now that sounds very nice and produces
255
00:10:10,120 --> 00:10:12,220
very complicated papers and whatnot but
256
00:10:12,220 --> 00:10:14,380
there is a really big problem with any
257
00:10:14,380 --> 00:10:17,980
such approach and the problem is are
258
00:10:17,980 --> 00:10:21,220
you measuring the right thing so what I
259
00:10:21,220 --> 00:10:23,950
mean by that is okay fine so let's think
260
00:10:23,950 --> 00:10:26,590
about how do we define a cost of let's
261
00:10:26,590 --> 00:10:30,820
say or a benefit measure so
262
00:10:30,820 --> 00:10:32,680
let's say I want to put a server and I
263
00:10:32,680 --> 00:10:34,780
need to determine how much do I gain if
264
00:10:34,780 --> 00:10:36,340
I place a server there I need some sort of
265
00:10:36,340 --> 00:10:37,570
a measure like this if I want to solve
266
00:10:37,570 --> 00:10:39,970
an optimization problem okay so how am I
267
00:10:39,970 --> 00:10:42,490
gonna actually measure how much better
268
00:10:42,490 --> 00:10:43,720
things are going to be if I place a
269
00:10:43,720 --> 00:10:45,610
server in a particular place this turns
270
00:10:45,610 --> 00:10:48,490
out to be very problematic right because
271
00:10:48,490 --> 00:10:50,740
it's not something that can be done with
272
00:10:50,740 --> 00:10:52,600
very simple calculations I mean you could
273
00:10:52,600 --> 00:10:54,760
try to approximate it and usually these
274
00:10:54,760 --> 00:10:57,670
approximations are quite rough right to
275
00:10:57,670 --> 00:10:59,710
a large extent until you actually place
276
00:10:59,710 --> 00:11:02,860
the server there and you try to do
277
00:11:02,860 --> 00:11:04,450
whatever it is that you're doing and you
278
00:11:04,450 --> 00:11:05,710
measure how much money you spend you
279
00:11:05,710 --> 00:11:06,910
can't really tell how much money you
280
00:11:06,910 --> 00:11:08,920
spend or let's say you want to improve
281
00:11:08,920 --> 00:11:10,180
things like latency or other things
282
00:11:10,180 --> 00:11:12,190
until you really place it there are so
283
00:11:12,190 --> 00:11:13,780
many factors it's so complicated that
284
00:11:13,780 --> 00:11:16,450
the real setup is so complicated you
285
00:11:16,450 --> 00:11:18,250
can't know for sure what's going on in
286
00:11:18,250 --> 00:11:19,060
there okay
287
00:11:19,060 --> 00:11:20,740
by the way there is a counterpart
288
00:11:20,740 --> 00:11:22,690
problem in databases where you want to
289
00:11:22,690 --> 00:11:25,030
pick the best plan to execute a query
290
00:11:25,030 --> 00:11:27,700
of course that assumes that you can in
291
00:11:27,700 --> 00:11:29,950
fact tell how good the plan is if I just
292
00:11:29,950 --> 00:11:31,300
give you the plan which turns out to be
293
00:11:31,300 --> 00:11:34,690
absolutely not true okay so this points
294
00:11:34,690 --> 00:11:36,760
out something that's usually not
295
00:11:36,760 --> 00:11:38,680
mentioned especially in the more
296
00:11:38,680 --> 00:11:41,740
theoretical literature which is trying
297
00:11:41,740 --> 00:11:43,960
too hard is not worth it even if you can
298
00:11:43,960 --> 00:11:46,840
actually solve perfectly the
299
00:11:46,840 --> 00:11:48,460
optimization problem you might still
300
00:11:48,460 --> 00:11:49,810
solve the wrong problem because you
301
00:11:49,810 --> 00:11:51,940
can't measure you can't tell for sure
302
00:11:51,940 --> 00:11:53,770
how costly it's gonna be to do a certain
303
00:11:53,770 --> 00:11:55,810
activity in terms of anything you want
304
00:11:55,810 --> 00:11:59,320
right you can define measures for which
305
00:11:59,320 --> 00:12:01,990
you can solve some of these problems but
306
00:12:01,990 --> 00:12:03,520
they might not correspond to anything
307
00:12:03,520 --> 00:12:06,010
that's in real life right it's really
308
00:12:06,010 --> 00:12:07,270
one of the big problems plaguing
309
00:12:07,270 --> 00:12:09,130
databases and I think any complex
310
00:12:09,130 --> 00:12:11,860
system right so even ask yourself hey if
311
00:12:11,860 --> 00:12:13,360
I place another server let's say in the
312
00:12:13,360 --> 00:12:14,800
department I mean could I tell for sure
313
00:12:14,800 --> 00:12:16,360
that some latency is reduced or some
314
00:12:16,360 --> 00:12:17,560
other things well not really because
315
00:12:17,560 --> 00:12:19,030
it's so complicated things go
316
00:12:19,030 --> 00:12:20,710
through so many wires I mean so much
317
00:12:20,710 --> 00:12:22,000
stuff you can get some rough
318
00:12:22,000 --> 00:12:24,610
approximation right so that also
319
00:12:24,610 --> 00:12:26,680
suggests that if you're approximating and
320
00:12:26,680 --> 00:12:28,030
by the way approximations within a
321
00:12:28,030 --> 00:12:30,190
constant factor are hard right so for
322
00:12:30,190 --> 00:12:32,170
example in databases nobody can
323
00:12:32,170 --> 00:12:33,610
guarantee that they can approximate the
324
00:12:33,610 --> 00:12:35,530
cost of a query within any constant
325
00:12:35,530 --> 00:12:38,320
factor right including let's say ten you
326
00:12:38,320 --> 00:12:40,300
might not even be able to tell roughly
327
00:12:40,300 --> 00:12:42,820
the order of magnitude for the time
328
00:12:42,820 --> 00:12:45,370
running a query it's that bad so that's
329
00:12:45,370 --> 00:12:47,670
the case insisting on an actual
330
00:12:47,670 --> 00:12:49,930
optimal solution whatever that means for
331
00:12:49,930 --> 00:12:51,850
such problems it's foolish in my opinion
332
00:12:51,850 --> 00:12:55,060
okay so basically all this discussion
333
00:12:55,060 --> 00:12:58,570
justifies in a heavy way heuristics and
334
00:12:58,570 --> 00:13:00,100
this is in the end what this boils down
335
00:13:00,100 --> 00:13:01,720
to is a lot of common sense and some
336
00:13:01,720 --> 00:13:03,160
reasonable heuristics where you do it
337
00:13:03,160 --> 00:13:04,900
unless you're Akamai and they have some
338
00:13:04,900 --> 00:13:06,670
secret sauce in which they do some at
339
00:13:06,670 --> 00:13:08,380
least they look cool whatever is it that
340
00:13:08,380 --> 00:13:11,980
they are doing right well it turns out
341
00:13:11,980 --> 00:13:13,510
that their algorithms are in fact
342
00:13:13,510 --> 00:13:15,520
approximation algorithms that are kind
343
00:13:15,520 --> 00:13:18,370
of self-adaptive that tend to kind of
344
00:13:18,370 --> 00:13:21,820
sense how things happen measure as you
345
00:13:21,820 --> 00:13:25,270
go and change and whatever other stuff
346
00:13:25,270 --> 00:13:26,440
they are doing I'm not gonna go there I
347
00:13:26,440 --> 00:13:28,420
mean it's done by extremely
348
00:13:28,420 --> 00:13:30,760
sophisticated people from MIT and a lot
349
00:13:30,760 --> 00:13:32,050
of it is secret sauce they don't even
350
00:13:32,050 --> 00:13:34,660
bother to patent anything and so on all
351
00:13:34,660 --> 00:13:35,980
right so this is the story with a
352
00:13:35,980 --> 00:13:37,930
replica server placement but it's a real
353
00:13:37,930 --> 00:13:38,920
problem
354
00:13:38,920 --> 00:13:43,000
right now there is a bigger problem that
355
00:13:43,000 --> 00:13:45,610
in general has to be solved which is how
356
00:13:45,610 --> 00:13:48,010
do you grow capacity depending on things
357
00:13:48,010 --> 00:13:51,310
like the load right so this is not even
358
00:13:51,310 --> 00:13:53,050
so much the static scenario you have
359
00:13:53,050 --> 00:13:54,310
here in which you already know where
360
00:13:54,310 --> 00:13:55,779
people are and how they are and you
361
00:13:55,779 --> 00:13:57,940
decide to place another server you
362
00:13:57,940 --> 00:14:00,160
might need to add capacity just because
363
00:14:00,160 --> 00:14:01,540
there are spikes in the usage and this
364
00:14:01,540 --> 00:14:02,829
happens all the time I mean this is one
365
00:14:02,829 --> 00:14:06,310
of the big things that goes in favor of
366
00:14:06,310 --> 00:14:08,589
any cloud-based service right they say
367
00:14:08,589 --> 00:14:10,750
hey no problem you can grow capacity by
368
00:14:10,750 --> 00:14:12,040
essentially just clicking a couple of
369
00:14:12,040 --> 00:14:13,449
buttons and then you fire up more
370
00:14:13,449 --> 00:14:17,800
websites basically on more
371
00:14:17,800 --> 00:14:19,510
servers and this is kind of the promise
372
00:14:19,510 --> 00:14:21,670
that Amazon EC2 is making and it's the
373
00:14:21,670 --> 00:14:24,310
promise that Microsoft Azure makes and
374
00:14:24,310 --> 00:14:26,740
so on right they say hey you can
375
00:14:26,740 --> 00:14:29,019
spawn off as many servers
376
00:14:29,019 --> 00:14:31,360
as you need maybe for a short period of
377
00:14:31,360 --> 00:14:34,060
time right so I mean that's one way to
378
00:14:34,060 --> 00:14:37,899
go about it but again all of these
379
00:14:37,899 --> 00:14:42,100
solutions have to be treated carefully a
380
00:14:42,100 --> 00:14:43,449
big problem of course with cloud-based
381
00:14:43,449 --> 00:14:45,370
services capacity growing you're
382
00:14:45,370 --> 00:14:48,220
essentially renting capacity is if you
383
00:14:48,220 --> 00:14:50,140
have an outage at the entire Amazon
384
00:14:50,140 --> 00:14:52,930
level which happened I think about a
385
00:14:52,930 --> 00:14:55,959
month ago then I mean a quarter of
386
00:14:55,959 --> 00:14:57,130
the internet goes down because they're
387
00:14:57,130 --> 00:15:00,519
all using EC2 right I mean when it's
388
00:15:00,519 --> 00:15:02,079
bad it's really bad I mean this is kind of
389
00:15:02,079 --> 00:15:05,680
the situation because an outage now in a
390
00:15:05,680 --> 00:15:07,660
big data center doesn't only affect that
391
00:15:07,660 --> 00:15:10,390
company it affects everybody who is using
392
00:15:10,390 --> 00:15:16,740
their cloud service all right now
393
00:15:16,740 --> 00:15:19,480
replication especially for performance
394
00:15:19,480 --> 00:15:21,730
reasons is one of the main ways to
395
00:15:21,730 --> 00:15:23,440
improve performance so it's extremely
396
00:15:23,440 --> 00:15:25,120
important to understand refinements in
397
00:15:25,120 --> 00:15:26,709
what can be done with replication and
398
00:15:26,709 --> 00:15:28,300
this replication it's everywhere this is
399
00:15:28,300 --> 00:15:29,890
what I've been arguing for all the class
400
00:15:29,890 --> 00:15:33,940
and it's mostly everywhere in the form of
401
00:15:33,940 --> 00:15:38,019
some sort of caching okay so it's worth
402
00:15:38,019 --> 00:15:40,420
talking about some tiny decimal points
403
00:15:40,420 --> 00:15:41,829
in there to see what choices you have
404
00:15:41,829 --> 00:15:43,959
when you do various things and I'm gonna
405
00:15:43,959 --> 00:15:45,310
do something that the textbook writers
406
00:15:45,310 --> 00:15:48,279
don't namely mention equivalent issues
407
00:15:48,279 --> 00:15:50,050
when it comes to computer architectures
408
00:15:50,050 --> 00:15:51,579
things that happen
409
00:15:51,579 --> 00:15:54,249
inside the processor for example so it's
410
00:15:54,249 --> 00:15:55,389
kind of strange why would a
411
00:15:55,389 --> 00:15:57,220
distributed system have any similarity
412
00:15:57,220 --> 00:15:58,600
to what happens inside the processor well
413
00:15:58,600 --> 00:16:00,550
processors themselves are complex
414
00:16:00,550 --> 00:16:02,860
distributed systems nowadays believe it
415
00:16:02,860 --> 00:16:04,149
or not especially when it comes to
416
00:16:04,149 --> 00:16:05,769
multi-core and a lot of these issues
417
00:16:05,769 --> 00:16:07,389
have to be solved inside the processor
418
00:16:07,389 --> 00:16:08,199
you might have different solutions
419
00:16:08,199 --> 00:16:09,639
because you have a different trade-off
420
00:16:09,639 --> 00:16:11,649
but it's the same issue in the first place
421
00:16:11,649 --> 00:16:12,759
ok so I'll point out some of those
422
00:16:12,759 --> 00:16:16,990
things so when it comes to
423
00:16:16,990 --> 00:16:18,819
replication right there are a number of
424
00:16:18,819 --> 00:16:21,100
things you can do so the issue is
425
00:16:21,100 --> 00:16:25,660
the issue is when writes happen when the
426
00:16:25,660 --> 00:16:28,839
content changes how do you make the new
427
00:16:28,839 --> 00:16:32,800
content available and this I mean this
428
00:16:32,800 --> 00:16:34,179
is really the true solution because if
429
00:16:34,179 --> 00:16:35,949
you have a stable scenario in which
430
00:16:35,949 --> 00:16:38,559
nothing changes and you're done
431
00:16:38,559 --> 00:16:40,269
replicating whatever that means
432
00:16:40,269 --> 00:16:42,610
then it's relatively easy you can keep a
433
00:16:42,610 --> 00:16:44,350
personal copy you keep it very close to
434
00:16:44,350 --> 00:16:45,550
yourself as long as you have enough
435
00:16:45,550 --> 00:16:48,040
storage for that right and then all you
436
00:16:48,040 --> 00:16:50,889
need to do is access the local copy not
437
00:16:50,889 --> 00:16:54,670
a big issue at all ok so that happens
438
00:16:54,670 --> 00:16:57,730
essentially all the time I mean even
439
00:16:57,730 --> 00:16:59,290
basic stuff like for example when you
440
00:16:59,290 --> 00:17:00,549
install the operating system on your
441
00:17:00,549 --> 00:17:02,259
machine and you don't keep on getting
442
00:17:02,259 --> 00:17:04,779
the files from some internet
443
00:17:04,779 --> 00:17:07,029
website that in itself is some form of
444
00:17:07,029 --> 00:17:10,059
caching in fact you could actually with
445
00:17:10,059 --> 00:17:12,609
a very small kernel and a distributed
446
00:17:12,609 --> 00:17:14,679
file system in principle get all those
447
00:17:14,679 --> 00:17:16,179
things from the cloud but it would look
448
00:17:16,179 --> 00:17:20,109
so foolish right now the technology is
449
00:17:20,109 --> 00:17:23,109
in the favor of caching when it comes to
450
00:17:23,109 --> 00:17:26,020
size when it comes to capacity because
451
00:17:26,020 --> 00:17:27,849
hard drives are cheaper and cheaper
452
00:17:27,849 --> 00:17:31,840
right if I mean it's almost unheard of
453
00:17:31,840 --> 00:17:33,370
not to have at least half a terabyte
454
00:17:33,370 --> 00:17:35,649
hard drive in essentially any machine
455
00:17:35,649 --> 00:17:39,220
unless it's an iPad or an iPhone right
456
00:17:39,220 --> 00:17:41,200
I mean laptops without at least half a
457
00:17:41,200 --> 00:17:42,460
terabyte I mean unless you have an
458
00:17:42,460 --> 00:17:45,070
SSD maybe you get a 128 but in any case
459
00:17:45,070 --> 00:17:48,520
you have a large amount of storage by at
460
00:17:48,520 --> 00:17:50,409
least 10 years old standards when it
461
00:17:50,409 --> 00:17:52,419
comes to to local machines which
462
00:17:52,419 --> 00:17:54,010
essentially means that one of the big
463
00:17:54,010 --> 00:17:56,169
obstacles to replication is removed
464
00:17:56,169 --> 00:18:02,279
namely storage it's cheap right so yeah
465
00:18:02,279 --> 00:18:04,370
when it comes to
466
00:18:04,370 --> 00:18:06,770
like replication the more problems you
467
00:18:06,770 --> 00:18:09,260
have to solve the harder it gets and the
468
00:18:09,260 --> 00:18:11,840
bigger the mess a potential problem
469
00:18:11,840 --> 00:18:14,660
with replication is deciding what to
470
00:18:14,660 --> 00:18:18,940
replicate right so the first one is what
471
00:18:19,000 --> 00:18:23,440
right and I mean there is a how question
472
00:18:23,440 --> 00:18:25,970
right and which we are going to discuss
473
00:18:25,970 --> 00:18:29,960
but at least the what goes away if you
474
00:18:29,960 --> 00:18:31,760
have large storage for the most part
475
00:18:31,760 --> 00:18:34,010
because even if you're not sure that it's
476
00:18:34,010 --> 00:18:35,600
gonna help you can still replicate it if
477
00:18:35,600 --> 00:18:37,040
you got your hands on it you can still
478
00:18:37,040 --> 00:18:39,679
replicate it and that's it right so what
479
00:18:39,679 --> 00:18:42,620
I mean by that is in most devices if
480
00:18:42,620 --> 00:18:44,420
unless maybe you have very very
481
00:18:44,420 --> 00:18:46,520
small devices if you bothered to bring
482
00:18:46,520 --> 00:18:48,650
it you might as well keep it and you can
483
00:18:48,650 --> 00:18:51,170
keep it for a while and this is
484
00:18:51,170 --> 00:18:54,410
literally what caching in web
485
00:18:54,410 --> 00:18:56,570
browsers does what a lot of other
486
00:18:56,570 --> 00:18:59,630
caching techniques would would in fact
487
00:18:59,630 --> 00:19:03,710
do right now the how is still very
488
00:19:03,710 --> 00:19:06,050
important right especially if you are
489
00:19:06,050 --> 00:19:07,550
concerned about bandwidth and to some
490
00:19:07,550 --> 00:19:09,230
extent you have to be so bandwidth is
491
00:19:09,230 --> 00:19:10,670
going up so you don't need to be
492
00:19:10,670 --> 00:19:12,440
paranoid about bandwidth anymore but
493
00:19:12,440 --> 00:19:14,300
nevertheless there are some concerns now
494
00:19:14,300 --> 00:19:16,100
you don't need to be paranoid with
495
00:19:16,100 --> 00:19:17,690
respect to the bandwidth for the clients
496
00:19:17,690 --> 00:19:18,890
but you still have to be paranoid with
497
00:19:18,890 --> 00:19:20,360
respect to the bandwidth for the servers
498
00:19:20,360 --> 00:19:23,000
by the way depending on how many clients
499
00:19:23,000 --> 00:19:25,550
actually access the same server right so
500
00:19:25,550 --> 00:19:27,800
it's very easy to say yeah and we have
501
00:19:27,800 --> 00:19:29,360
now a lot of bandwidth so on and so
502
00:19:29,360 --> 00:19:32,270
forth well yes but not if for example
503
00:19:32,270 --> 00:19:34,150
you're serving a million people from
504
00:19:34,150 --> 00:19:37,250
just one server which is in principle
505
00:19:37,250 --> 00:19:39,440
possible and some people pull that off
506
00:19:39,440 --> 00:19:41,450
right so if that's the case you really
507
00:19:41,450 --> 00:19:44,630
want to have as little as possible
508
00:19:44,630 --> 00:19:47,300
messages going around ok so when it
509
00:19:47,300 --> 00:19:48,559
comes to this kind of replication and
510
00:19:48,559 --> 00:19:53,360
the server placements right in the
511
00:19:53,360 --> 00:19:54,830
middle we have this if you want the
512
00:19:54,830 --> 00:19:57,500
permanent replicas is where the content
513
00:19:57,500 --> 00:20:00,410
really lives in the outer ring we have
514
00:20:00,410 --> 00:20:02,240
the clients so somehow the clients have
515
00:20:02,240 --> 00:20:04,340
to get their hands on the data that
516
00:20:04,340 --> 00:20:06,950
resides in the permanent replicas right
517
00:20:06,950 --> 00:20:08,780
so the information has to be propagated
518
00:20:08,780 --> 00:20:11,059
correctly but such propagation can be
519
00:20:11,059 --> 00:20:12,800
initiated in two fundamentally different
520
00:20:12,800 --> 00:20:14,600
ways right one of them is client
521
00:20:14,600 --> 00:20:17,210
initiated replicas and the other one is
522
00:20:17,210 --> 00:20:18,169
server initiated
523
00:20:18,169 --> 00:20:22,009
replicas okay so this is pull when the
524
00:20:22,009 --> 00:20:23,539
clients pull the content when they need
525
00:20:23,539 --> 00:20:25,629
it this is push the servers will push
526
00:20:25,629 --> 00:20:28,279
whatever is it that they push to the
527
00:20:28,279 --> 00:20:31,659
clients okay and it turns out that in
528
00:20:31,659 --> 00:20:34,460
most sophisticated applications you in
529
00:20:34,460 --> 00:20:36,230
fact have to use a mix between a pull
530
00:20:36,230 --> 00:20:37,489
and a push I'm gonna explain a little
531
00:20:37,489 --> 00:20:38,749
bit why that's the case because for
532
00:20:38,749 --> 00:20:40,580
certain kind of things it's better to
533
00:20:40,580 --> 00:20:42,080
pull and for certain other kinds of
534
00:20:42,080 --> 00:20:44,359
things it's better to push okay now you
535
00:20:44,359 --> 00:20:47,029
can do a pure push based system and a
536
00:20:47,029 --> 00:20:49,970
pure pull-based system but you're
537
00:20:49,970 --> 00:20:55,059
essentially giving up on some big
538
00:20:55,059 --> 00:20:56,749
characteristics you might actually care
539
00:20:56,749 --> 00:20:59,899
about okay so let's just think about how
540
00:20:59,899 --> 00:21:02,210
this might actually work so the one
541
00:21:02,210 --> 00:21:03,710
you're most familiar with it's in fact
542
00:21:03,710 --> 00:21:05,659
the client initiated replicas because
543
00:21:05,659 --> 00:21:07,039
it's essentially what's happening by
544
00:21:07,039 --> 00:21:14,389
default in web access right so when you
545
00:21:14,389 --> 00:21:16,009
take your browser on any device you have
546
00:21:16,009 --> 00:21:17,749
and point it to some kind of a URL what
547
00:21:17,749 --> 00:21:19,039
is it that you're doing you're going to
548
00:21:19,039 --> 00:21:21,139
the server and say give me the content
549
00:21:21,139 --> 00:21:22,129
of this resource
550
00:21:22,129 --> 00:21:23,840
I mean resource usually means some sort
551
00:21:23,840 --> 00:21:26,210
of a URL which you can't even say now
552
00:21:26,210 --> 00:21:27,590
that's a file or something else I mean
553
00:21:27,590 --> 00:21:28,759
it could be something completely cooked
554
00:21:28,759 --> 00:21:32,840
up on the fly right and as a client you
555
00:21:32,840 --> 00:21:35,529
can decide to do some sort of
556
00:21:35,529 --> 00:21:38,239
replication I mean you have to I want
557
00:21:38,239 --> 00:21:40,249
you to understand that there is no ifs
558
00:21:40,249 --> 00:21:41,720
and buts about it you have to do some
559
00:21:41,720 --> 00:21:44,210
sort of replication because you can't
560
00:21:44,210 --> 00:21:47,210
get it every frame right so I mean if
561
00:21:47,210 --> 00:21:48,889
you want to be paranoid literally you
562
00:21:48,889 --> 00:21:50,539
can say you know what I want to have the
563
00:21:50,539 --> 00:21:53,690
freshest content so what I'll do is I
564
00:21:53,690 --> 00:21:55,239
will keep the content 30 milliseconds
565
00:21:55,239 --> 00:21:57,379
exactly how much it takes me to display
566
00:21:57,379 --> 00:22:01,309
a frame on the computer no more I mean
567
00:22:01,309 --> 00:22:03,679
why is 30 milliseconds again below 30
568
00:22:03,679 --> 00:22:05,809
milliseconds human eye cannot right if
569
00:22:05,809 --> 00:22:07,249
humans are your clients is 30
570
00:22:07,249 --> 00:22:08,889
milliseconds by the way if you're doing
571
00:22:08,889 --> 00:22:11,720
extremely fast trading using this fast
572
00:22:11,720 --> 00:22:13,580
boss you might decide to cache less than
573
00:22:13,580 --> 00:22:16,399
a millisecond okay so it's all kind of
574
00:22:16,399 --> 00:22:18,259
depending on the application right of
575
00:22:18,259 --> 00:22:19,399
course that would be crazy
576
00:22:19,399 --> 00:22:23,029
I mean imagine your device that you use
577
00:22:23,029 --> 00:22:25,460
to access the internet trying to get
578
00:22:25,460 --> 00:22:27,739
something every 30 milliseconds when the
579
00:22:27,739 --> 00:22:29,599
delays can easily be in half a second to
580
00:22:29,599 --> 00:22:31,430
a second to a lot of the servers right
581
00:22:31,430 --> 00:22:33,740
nothing would actually work so you can
582
00:22:33,740 --> 00:22:36,260
think about caching at many many many
583
00:22:36,260 --> 00:22:38,180
different levels including things like
584
00:22:38,180 --> 00:22:42,530
the sophisticated complicated UI
585
00:22:42,530 --> 00:22:44,900
that the web browser displays on the
586
00:22:44,900 --> 00:22:48,140
screen it's not only the basic
587
00:22:48,140 --> 00:22:50,120
replication that you might think happens
588
00:22:50,120 --> 00:22:52,280
right namely that you got the HTML page
589
00:22:52,280 --> 00:22:54,080
or whatever javascript code or whatever
590
00:22:54,080 --> 00:22:56,870
is it that you're doing but it's caching
591
00:22:56,870 --> 00:23:02,080
all the way between the server and
592
00:23:02,080 --> 00:23:06,020
consuming the content right so there is
593
00:23:06,020 --> 00:23:07,610
massive amount of caching that happens
594
00:23:07,610 --> 00:23:09,170
for example in the web browser itself I
595
00:23:09,170 --> 00:23:10,490
mean the web browser has to be as lazy
596
00:23:10,490 --> 00:23:12,380
as possible about what gets updated and
597
00:23:12,380 --> 00:23:14,000
what doesn't get updated and try to keep
598
00:23:14,000 --> 00:23:17,960
as much as possible static not changing
599
00:23:17,960 --> 00:23:20,210
because changing everything all the time
600
00:23:20,210 --> 00:23:22,700
right it's a big performance hog so web
601
00:23:22,700 --> 00:23:24,710
browsers got extremely sophisticated at
602
00:23:24,710 --> 00:23:28,100
determining what not to change okay it's
603
00:23:28,100 --> 00:23:30,260
very easy to change everything you just
604
00:23:30,260 --> 00:23:32,450
say fire up the whole processing from
605
00:23:32,450 --> 00:23:35,690
scratch but if you do so you suffer big
606
00:23:35,690 --> 00:23:37,340
performance problems so finding large
607
00:23:37,340 --> 00:23:39,140
areas of the screen that do not change
608
00:23:39,140 --> 00:23:41,330
for sure and not firing up any kind of
609
00:23:41,330 --> 00:23:43,370
updates in there it's a big performance
610
00:23:43,370 --> 00:23:46,160
improvement measure so that's some form
611
00:23:46,160 --> 00:23:48,890
of caching but you're caching now pixels
612
00:23:48,890 --> 00:23:52,550
on the screen if you want not actual
613
00:23:52,550 --> 00:23:54,890
content in the backend all right so it
614
00:23:54,890 --> 00:23:55,940
doesn't matter how you call this
615
00:23:55,940 --> 00:23:58,670
technique being lazy about how these
616
00:23:58,670 --> 00:24:01,520
changes get propagated right it's one of
617
00:24:01,520 --> 00:24:01,910
the
618
00:24:01,910 --> 00:24:04,000
best ways to improve performance
619
00:24:04,000 --> 00:24:07,550
okay so this is really what client
620
00:24:07,550 --> 00:24:09,650
initiated replicas are but then you have
621
00:24:09,650 --> 00:24:11,660
a big problem and you might not even
622
00:24:11,660 --> 00:24:13,370
realize but these are big questions and
623
00:24:13,370 --> 00:24:15,140
people that wrote even web browsers have
624
00:24:15,140 --> 00:24:17,630
to decide it's clear that you do want to
625
00:24:17,630 --> 00:24:20,480
cache things that tend not to change
626
00:24:20,480 --> 00:24:21,950
because that will significantly improve
627
00:24:21,950 --> 00:24:27,100
performance right so for example the
628
00:24:27,100 --> 00:24:29,420
these web applications right they have
629
00:24:29,420 --> 00:24:31,760
large amounts of JavaScript code I mean
630
00:24:31,760 --> 00:24:33,200
it makes a lot of sense to say I'm gonna
631
00:24:33,200 --> 00:24:35,030
cache that large amount of JavaScript
632
00:24:35,030 --> 00:24:37,520
code because bringing all that all the
633
00:24:37,520 --> 00:24:39,350
time I mean it's just it consumes
634
00:24:39,350 --> 00:24:42,290
resources on the other hand you might
635
00:24:42,290 --> 00:24:44,480
work on stale copies so a big question
636
00:24:44,480 --> 00:24:45,200
when it come
637
00:24:45,200 --> 00:24:46,760
always when it comes to replication is
638
00:24:46,760 --> 00:24:49,370
when to replicate right so we have a
639
00:24:49,370 --> 00:24:55,519
what how and when in particular the big
640
00:24:55,519 --> 00:24:58,700
question is when to invalidate whatever
641
00:24:58,700 --> 00:25:01,130
copy you have right now this problem
642
00:25:01,130 --> 00:25:03,649
it's in fact very old because one of the
643
00:25:03,649 --> 00:25:05,029
main techniques inside computer
644
00:25:05,029 --> 00:25:10,010
architecture systems is caching right so
645
00:25:10,010 --> 00:25:12,950
a big question there is how do you deal
646
00:25:12,950 --> 00:25:16,970
with information that's stale so there are
647
00:25:16,970 --> 00:25:19,610
two primary techniques one of them is
648
00:25:19,610 --> 00:25:21,919
before you use a copy for example there are
649
00:25:21,919 --> 00:25:23,210
multiple techniques but one of them is
650
00:25:23,210 --> 00:25:25,880
before you use the copy you could try to
651
00:25:25,880 --> 00:25:27,649
check and maybe the check is faster than
652
00:25:27,649 --> 00:25:29,899
than using the resource you can try to
653
00:25:29,899 --> 00:25:31,669
check to see if the resource is stale or
654
00:25:31,669 --> 00:25:34,429
not and that's still a push I'm sorry a
655
00:25:34,429 --> 00:25:35,750
pull kind of technique in which the
656
00:25:35,750 --> 00:25:38,240
client is trying to figure out do I have
657
00:25:38,240 --> 00:25:41,750
a stale copy or a good copy okay so
658
00:25:41,750 --> 00:25:49,899
some sort of a check check for freshness
659
00:25:51,649 --> 00:25:54,139
the other one is to say you know what
660
00:25:54,139 --> 00:25:57,139
I'm going to assume that my copy is
661
00:25:57,139 --> 00:26:00,529
fresh but I'm gonna have whoever could
662
00:26:00,529 --> 00:26:01,999
change the content send me some
663
00:26:01,999 --> 00:26:04,879
notification when the copy is invalid
664
00:26:04,879 --> 00:26:06,679
I think I actually have a picture for
665
00:26:06,679 --> 00:26:11,419
this let me write something like this
666
00:26:11,419 --> 00:26:12,590
I'm not going to go back to the other
667
00:26:12,590 --> 00:26:12,889
one
668
00:26:12,889 --> 00:26:19,129
right so this is notifications and then
669
00:26:19,129 --> 00:26:20,809
of course you have the full technique
670
00:26:20,809 --> 00:26:22,100
that corresponds to a notification is
671
00:26:22,100 --> 00:26:31,309
push full changes now when you're
672
00:26:31,309 --> 00:26:32,899
talking about check freshness you can
673
00:26:32,899 --> 00:26:34,159
have various algorithms to check
674
00:26:34,159 --> 00:26:36,409
freshness and in fact for example web
675
00:26:36,409 --> 00:26:39,379
browsers have at least a workable
676
00:26:39,379 --> 00:26:41,269
solution for that right so that solution
677
00:26:41,269 --> 00:26:42,289
is basically the following they are
678
00:26:42,289 --> 00:26:43,820
trying to be quite lazy about it and try
679
00:26:43,820 --> 00:26:46,220
to guess when they should check for
680
00:26:46,220 --> 00:26:47,509
freshness because the checking for
681
00:26:47,509 --> 00:26:49,549
freshness is itself expensive
682
00:26:49,549 --> 00:26:51,499
in terms of latency right the latency is
683
00:26:51,499 --> 00:26:53,509
what kills any of these things if you
684
00:26:53,509 --> 00:26:55,070
have enough bandwidth there might be
685
00:26:55,070 --> 00:26:56,869
almost no difference between checking
686
00:26:56,869 --> 00:26:58,429
for freshness and bringing in
687
00:26:58,429 --> 00:27:01,700
your copy okay so you're trying to be as
688
00:27:01,700 --> 00:27:02,929
lazy as possible for checking for
689
00:27:02,929 --> 00:27:04,460
freshness but in the web browsers for
690
00:27:04,460 --> 00:27:07,789
example if you reload the page several
691
00:27:07,789 --> 00:27:09,679
times you kick in a mechanism that will
692
00:27:09,679 --> 00:27:12,580
bring new copies of everything because
693
00:27:12,580 --> 00:27:16,369
they I mean you need some sort of you
694
00:27:16,369 --> 00:27:18,529
know don't give me the cached version
695
00:27:18,529 --> 00:27:20,389
because for some reason I'm sure that
696
00:27:20,389 --> 00:27:23,389
there is a better version right and that
697
00:27:23,389 --> 00:27:25,369
will force some sort of okay invalidate
698
00:27:25,369 --> 00:27:26,269
the caches and they'll bring new
699
00:27:26,269 --> 00:27:28,580
versions of these programs okay well I
700
00:27:28,580 --> 00:27:30,320
know this because when you do web
701
00:27:30,320 --> 00:27:31,850
development right this is your best
702
00:27:31,850 --> 00:27:34,700
friend otherwise you keep changing
703
00:27:34,700 --> 00:27:36,649
the backend code and the front-end
704
00:27:36,649 --> 00:27:38,330
doesn't change anything because it
705
00:27:38,330 --> 00:27:40,639
decides to cache for six days right in
706
00:27:40,639 --> 00:27:42,049
six days my application is going to be
707
00:27:42,049 --> 00:27:44,600
very different so okay now these two
708
00:27:44,600 --> 00:27:46,249
solutions are very different than
709
00:27:46,249 --> 00:27:47,480
this check for freshness so check for
710
00:27:47,480 --> 00:27:49,549
freshness is basically you lazily at
711
00:27:49,549 --> 00:27:51,110
whatever times you check for
712
00:27:51,110 --> 00:27:52,369
freshness or look for indicators that
713
00:27:52,369 --> 00:27:53,929
something changed and when it changed
714
00:27:53,929 --> 00:27:55,610
you can decide to bring only what you
715
00:27:55,610 --> 00:27:56,720
need okay
716
00:27:56,720 --> 00:27:58,789
so this is completely driven by what's
717
00:27:58,789 --> 00:27:59,899
in fact accessed
718
00:27:59,899 --> 00:28:03,889
okay now this one notifications and push
719
00:28:03,889 --> 00:28:05,210
changes
720
00:28:05,210 --> 00:28:10,580
are driven by the server and then you
721
00:28:10,580 --> 00:28:12,530
have hard problems to solve on the
722
00:28:12,530 --> 00:28:14,570
server for now okay so this hard problem
723
00:28:14,570 --> 00:28:17,300
is one of the hard problems is who
724
00:28:17,300 --> 00:28:19,400
should we push the change to either
725
00:28:19,400 --> 00:28:21,160
the notification or the change in
726
00:28:21,160 --> 00:28:23,810
particular who is interested in this
727
00:28:23,810 --> 00:28:25,790
resource so that's not clear at all by
728
00:28:25,790 --> 00:28:28,040
the way so for example web browsers are
729
00:28:28,040 --> 00:28:30,020
not designed to give any indication of
730
00:28:30,020 --> 00:28:31,340
who's interested in such a resource
731
00:28:31,340 --> 00:28:33,710
unless you do special programming right
732
00:28:33,710 --> 00:28:36,650
so the fact that I access some web page
733
00:28:36,650 --> 00:28:38,360
doesn't mean I need to be notified
734
00:28:38,360 --> 00:28:39,980
every time that web page changes and
735
00:28:39,980 --> 00:28:41,990
that would be madness if that's the
736
00:28:41,990 --> 00:28:44,750
solution used everywhere by most of the
737
00:28:44,750 --> 00:28:47,210
websites right hey CNN updated their
738
00:28:47,210 --> 00:28:49,310
main web page you should definitely do
739
00:28:49,310 --> 00:28:54,290
something about it but and for example
740
00:28:54,290 --> 00:28:58,670
imagine though that I am watching for
741
00:28:58,670 --> 00:29:01,370
example stocks right the prices on the
742
00:29:01,370 --> 00:29:05,630
stock market right then and while I'm
743
00:29:05,630 --> 00:29:07,130
kind of on the web web page of those
744
00:29:07,130 --> 00:29:09,200
guys whatever that means I might
745
00:29:09,200 --> 00:29:10,400
actually be interested in updates
746
00:29:10,400 --> 00:29:12,860
because I'm interested in those events or
747
00:29:12,860 --> 00:29:14,630
imagine a chat application for example
748
00:29:14,630 --> 00:29:16,700
right in a chat application I really
749
00:29:16,700 --> 00:29:19,390
want to know when somebody else chats
750
00:29:19,390 --> 00:29:23,660
without me keeping on saying ok so I waited
751
00:29:23,660 --> 00:29:25,340
already 3 seconds is there something
752
00:29:25,340 --> 00:29:26,360
more and that would be essentially
753
00:29:26,360 --> 00:29:28,430
invalidate my state which is I only know
754
00:29:28,430 --> 00:29:30,530
about my chats and go see if there is
755
00:29:30,530 --> 00:29:31,910
any other chat so for certain
756
00:29:31,910 --> 00:29:34,430
applications like chat notifications or
757
00:29:34,430 --> 00:29:36,380
in fact completely pushing changes make
758
00:29:36,380 --> 00:29:38,710
a lot of sense for other applications
759
00:29:38,710 --> 00:29:41,210
only check freshness might make any kind
760
00:29:41,210 --> 00:29:42,590
of sense right so this is the kind of
761
00:29:42,590 --> 00:29:43,850
things that you would have to ask
762
00:29:43,850 --> 00:29:45,650
yourself almost always when you design
763
00:29:45,650 --> 00:29:47,780
an application which one do you do you
764
00:29:47,780 --> 00:29:49,670
do now when it comes to notifications
765
00:29:49,670 --> 00:29:52,940
and push changes you in fact require a
766
00:29:52,940 --> 00:29:55,040
special facility in the server right
767
00:29:55,040 --> 00:29:57,170
which we did not talk extensively about
768
00:29:57,170 --> 00:29:58,910
but you require so-called stateful
769
00:29:58,910 --> 00:30:01,390
servers
770
00:30:04,940 --> 00:30:07,619
okay so stateful is the opposite of
771
00:30:07,619 --> 00:30:09,959
stateless so it's easier to explain what
772
00:30:09,959 --> 00:30:12,509
stateless means right there is some kind
773
00:30:12,509 --> 00:30:14,549
of gradation in between stateless is
774
00:30:14,549 --> 00:30:17,129
very easy you know right so a stateless
775
00:30:17,129 --> 00:30:24,869
server it's a server that keeps no
776
00:30:24,869 --> 00:30:27,749
information about the clients it gets a
777
00:30:27,749 --> 00:30:29,249
request satisfies the request sends the
778
00:30:29,249 --> 00:30:30,869
result and forgets about everything that
779
00:30:30,869 --> 00:30:32,429
happened now there is no such thing as
780
00:30:32,429 --> 00:30:34,019
purely stateless server because they all
781
00:30:34,019 --> 00:30:36,359
log something and whatever but they are
782
00:30:36,359 --> 00:30:37,859
really not gonna analyze their own logs
783
00:30:37,859 --> 00:30:39,809
so even if they log it's an external
784
00:30:39,809 --> 00:30:41,489
activity that goes on to see what was in
785
00:30:41,489 --> 00:30:44,429
the log right most of the web servers
786
00:30:44,429 --> 00:30:46,049
are designed to be stateless servers
787
00:30:46,049 --> 00:30:49,319
right that by the way that's a really
788
00:30:49,319 --> 00:30:50,669
big problem with some of the modern
789
00:30:50,669 --> 00:30:52,849
applications because the request comes
790
00:30:52,849 --> 00:30:55,139
every request is completely independent
791
00:30:55,139 --> 00:30:57,209
of all the other requests every request
792
00:30:57,209 --> 00:30:59,669
is treated as essentially a pure
793
00:30:59,669 --> 00:31:00,329
function
794
00:31:00,329 --> 00:31:02,369
you got some input you go you cook up
795
00:31:02,369 --> 00:31:04,139
whatever is it that you're doing and you
796
00:31:04,139 --> 00:31:09,619
send back the the result yes I'm sorry
797
00:31:09,619 --> 00:31:14,429
what cookie well okay we'll talk about
798
00:31:14,429 --> 00:31:16,799
the cookies in a second okay so that's
799
00:31:16,799 --> 00:31:18,749
blending things a little bit but cookies
800
00:31:18,749 --> 00:31:20,519
only help with one issue but it's not
801
00:31:20,519 --> 00:31:22,739
gonna add too much State okay one second
802
00:31:22,739 --> 00:31:25,349
right so for example Apache and other
803
00:31:25,349 --> 00:31:26,940
web servers are designed to be stateless
804
00:31:26,940 --> 00:31:28,499
servers now of course you can't
805
00:31:28,499 --> 00:31:30,389
really do anything with purely stateless
806
00:31:30,389 --> 00:31:32,879
servers and normally the usual
807
00:31:32,879 --> 00:31:33,690
architecture is
808
00:31:33,690 --> 00:31:34,889
I mean this is the classic architecture
809
00:31:34,889 --> 00:31:36,779
for a web server and it's extremely
810
00:31:36,779 --> 00:31:38,399
relevant for this kind of issues right
811
00:31:38,399 --> 00:31:43,079
so you have the web browser who talks
812
00:31:43,079 --> 00:31:46,709
through the internet whatever that means
813
00:31:46,709 --> 00:31:48,869
with the server and inside the server
814
00:31:48,869 --> 00:31:50,099
this is the architecture you're gonna
815
00:31:50,099 --> 00:31:51,479
have you're gonna have Apache let's say
816
00:31:51,479 --> 00:31:55,979
so let's say the web server right that
817
00:31:55,979 --> 00:31:58,619
actually gets a gets a request if
818
00:31:58,619 --> 00:32:01,829
multiple requests come you essentially
819
00:32:01,829 --> 00:32:03,359
fire up different processes or different
820
00:32:03,359 --> 00:32:04,679
threads depending on the implementation
821
00:32:04,679 --> 00:32:06,119
of the web server but they don't talk to
822
00:32:06,119 --> 00:32:08,159
each other this is the classic web
823
00:32:08,159 --> 00:32:11,099
server architecture the web server is
824
00:32:11,099 --> 00:32:14,069
gonna fire up some sort of a language
825
00:32:14,069 --> 00:32:15,570
behind the back I mean
826
00:32:15,570 --> 00:32:18,059
if the job is easy namely just serve a
827
00:32:18,059 --> 00:32:21,029
file it will grab the grab the file from
828
00:32:21,029 --> 00:32:22,919
the file system and serve it back so it
829
00:32:22,919 --> 00:32:24,779
could go let's say to the file system
830
00:32:24,779 --> 00:32:27,929
and then send it right back well it's
831
00:32:27,929 --> 00:32:31,289
really from the web server that happens
832
00:32:31,289 --> 00:32:32,970
or you could go to for example if you
833
00:32:32,970 --> 00:32:34,379
want to serve dynamic content you might
834
00:32:34,379 --> 00:32:40,080
go to something like PHP alright so PHP
835
00:32:40,080 --> 00:32:42,869
is gonna carry out some computation
836
00:32:42,869 --> 00:32:44,429
that's going to do something which I'll
837
00:32:44,429 --> 00:32:45,929
mention in a second the important thing
838
00:32:45,929 --> 00:32:49,649
is PHP itself it's gonna be stateless in
839
00:32:49,649 --> 00:32:52,409
the sense that you create a PHP session
840
00:32:52,409 --> 00:32:54,029
it does something and you destroy the
841
00:32:54,029 --> 00:32:56,549
PHP session so there is no state carried
842
00:32:56,549 --> 00:32:58,799
through the session itself even if you
843
00:32:58,799 --> 00:33:00,629
have tricks for example to keep the PHP
844
00:33:00,629 --> 00:33:02,999
alive more right you're still wiping out
845
00:33:02,999 --> 00:33:05,940
the state now this is not going to
846
00:33:05,940 --> 00:33:07,529
be very good if you need to for example
847
00:33:07,529 --> 00:33:11,580
keep track of transactions I mean moving
848
00:33:11,580 --> 00:33:13,619
money around buying items and so on so
849
00:33:13,619 --> 00:33:15,210
somebody must keep some sort of state
850
00:33:15,210 --> 00:33:16,979
but that in the traditional architecture
851
00:33:16,979 --> 00:33:18,539
is done through some sort of a database
852
00:33:18,539 --> 00:33:21,359
back-end so you have a database back-end
853
00:33:21,359 --> 00:33:23,190
and in fact what's going to happen is
854
00:33:23,190 --> 00:33:25,379
you have connection to the web server
855
00:33:25,379 --> 00:33:28,080
web server to PHP it's still stateless
856
00:33:28,080 --> 00:33:30,359
PHP is gonna do operations against the
857
00:33:30,359 --> 00:33:31,919
database and it's the only one that
858
00:33:31,919 --> 00:33:33,389
actually maintains state and it's the
859
00:33:33,389 --> 00:33:35,460
only one where states fight with each
860
00:33:35,460 --> 00:33:36,960
other and where you have concurrency
861
00:33:36,960 --> 00:33:41,639
problems and then the whole thing comes
862
00:33:41,639 --> 00:33:45,720
back so this doesn't keep any any big
863
00:33:45,720 --> 00:33:47,399
information this doesn't keep any big
864
00:33:47,399 --> 00:33:48,539
information this keeps all the
865
00:33:48,539 --> 00:33:50,369
information so this is stateful but it's
866
00:33:50,369 --> 00:33:52,649
consolidated in the database so any
867
00:33:52,649 --> 00:33:54,509
fighting that goes on and any
868
00:33:54,509 --> 00:33:56,429
consistency issues are already resolved
869
00:33:56,429 --> 00:33:58,200
at the database at the database level
870
00:33:58,200 --> 00:34:01,950
now I did mention before that database
871
00:34:01,950 --> 00:34:04,379
technology is many years ahead in terms
872
00:34:04,379 --> 00:34:06,749
of consistency than any anything else
873
00:34:06,749 --> 00:34:08,190
and this is really why it's the preferred
874
00:34:08,190 --> 00:34:11,280
way to if you want solve consistency
875
00:34:11,280 --> 00:34:13,020
problems so you have no consistency
876
00:34:13,020 --> 00:34:14,909
issues here and here for the most part
877
00:34:14,909 --> 00:34:16,949
because they are stateless and you push
878
00:34:16,949 --> 00:34:18,210
all the consistency issues in the
879
00:34:18,210 --> 00:34:20,010
database which is reasonably mature
880
00:34:20,010 --> 00:34:22,199
technology okay now why do I say almost
881
00:34:22,199 --> 00:34:26,849
because of this cookie stuff okay so I
882
00:34:26,849 --> 00:34:28,989
mean what are these cookies
883
00:34:28,989 --> 00:34:31,329
and why are they useful by the way they
884
00:34:31,329 --> 00:34:32,619
are kind of going away I mean there
885
00:34:32,619 --> 00:34:34,260
are better solutions than cookies now
886
00:34:34,260 --> 00:34:36,639
right I believe you can do much better
887
00:34:36,639 --> 00:34:39,579
than cookies well so this has to do with
888
00:34:39,579 --> 00:34:41,859
the fact that it's extremely annoying to
889
00:34:41,859 --> 00:34:43,629
repeat the same operation many times for
890
00:34:43,629 --> 00:34:45,909
for humans so what I mean by that is
891
00:34:45,909 --> 00:34:48,960
imagine that you need to restrict access
892
00:34:48,960 --> 00:34:51,579
so a big problem with any kind of
893
00:34:51,579 --> 00:34:52,839
security and we are going to talk
894
00:34:52,839 --> 00:34:54,460
extensively about this a big issue with
895
00:34:54,460 --> 00:34:56,500
any kind of security is the moment it
896
00:34:56,500 --> 00:35:00,010
becomes annoying people don't use it ok
897
00:35:00,010 --> 00:35:03,160
so how do you make security secure and
898
00:35:03,160 --> 00:35:05,829
not annoying at the same time all right
899
00:35:05,829 --> 00:35:07,960
so what I mean by that is imagine that
900
00:35:07,960 --> 00:35:12,760
for example you're on eBay yes but every
901
00:35:12,760 --> 00:35:14,349
time you click on another page I'm gonna
902
00:35:14,349 --> 00:35:17,380
ask again for the password validate for
903
00:35:17,380 --> 00:35:19,839
that one request that you are who you say you are
904
00:35:19,839 --> 00:35:21,760
and go and serve the page I mean first
905
00:35:21,760 --> 00:35:23,380
of all it's not clear at all that I have
906
00:35:23,380 --> 00:35:25,359
a single request on a webpage and I
907
00:35:25,359 --> 00:35:26,680
might have many many different requests
908
00:35:26,680 --> 00:35:27,760
but let's say not all of them need to be
909
00:35:27,760 --> 00:35:30,250
secure ok only one of the requests
910
00:35:30,250 --> 00:35:31,720
but even typing your password every time
911
00:35:31,720 --> 00:35:36,099
you're gonna go insane right not good so
912
00:35:36,099 --> 00:35:38,589
what cookies helped with is to create
913
00:35:38,589 --> 00:35:41,589
some temporary validation once you typed
914
00:35:41,589 --> 00:35:43,240
in one of these passwords to say for the
915
00:35:43,240 --> 00:35:45,250
next whatever amount of time or until
916
00:35:45,250 --> 00:35:48,099
such right is revoked you don't need
917
00:35:48,099 --> 00:35:49,809
to type a password ok now that's a
918
00:35:49,809 --> 00:35:51,940
technique used for example by Mac OS X
919
00:35:51,940 --> 00:35:55,630
once you log in as an admin
920
00:35:55,630 --> 00:35:57,790
for a couple of minutes you're gonna be admin
921
00:35:57,790 --> 00:35:58,930
now that's done through the sudo
922
00:35:58,930 --> 00:36:00,819
program right that's true also on
923
00:36:00,819 --> 00:36:03,220
Linux once you successfully log in to
924
00:36:03,220 --> 00:36:04,869
sudo depending on the policy set in
925
00:36:04,869 --> 00:36:06,220
the system for a couple of minutes you
926
00:36:06,220 --> 00:36:07,599
still have sudo rights and if you keep on
927
00:36:07,599 --> 00:36:09,609
doing things you're fine but the moment
928
00:36:09,609 --> 00:36:11,710
you have a gap of whatever set in the
929
00:36:11,710 --> 00:36:13,930
system the privilege goes away ok now
930
00:36:13,930 --> 00:36:15,910
usually cookies live a lot longer I mean
931
00:36:15,910 --> 00:36:17,920
some websites use cookies that stay
932
00:36:17,920 --> 00:36:21,130
alive a month now that's what cookies
933
00:36:21,130 --> 00:36:22,869
were designed for what they are used for
934
00:36:22,869 --> 00:36:26,559
it's a much more abusive thing ok keep
935
00:36:26,559 --> 00:36:30,010
track of people and whatnot okay so
936
00:36:30,010 --> 00:36:34,200
cookies are to a large extent poor man's
937
00:36:34,200 --> 00:36:38,650
state something right so the server
938
00:36:38,650 --> 00:36:40,270
needs to keep a little bit of state and
939
00:36:40,270 --> 00:36:42,760
they invented these cookies to allow
940
00:36:42,760 --> 00:36:44,590
state to be kept so what's
941
00:36:44,590 --> 00:36:46,150
happening with the cookie is by the way
942
00:36:46,150 --> 00:36:48,790
the server itself is still
943
00:36:48,790 --> 00:36:51,910
reasonably stateless the cookie is
944
00:36:51,910 --> 00:36:54,700
stored on the client side and the cookie
945
00:36:54,700 --> 00:36:56,430
is sent with every single request
946
00:36:56,430 --> 00:36:59,050
automatically so instead of having the
947
00:36:59,050 --> 00:37:00,880
user type all the time something you
948
00:37:00,880 --> 00:37:02,380
simply have the system present the
949
00:37:02,380 --> 00:37:04,120
cookie which is now cryptographically
950
00:37:04,120 --> 00:37:06,760
secure hopefully right so the cookie
951
00:37:06,760 --> 00:37:09,010
will fly with any request the server
952
00:37:09,010 --> 00:37:10,900
will check that the cookie is valid that
953
00:37:10,900 --> 00:37:13,390
will be if you want the certificate that
954
00:37:13,390 --> 00:37:14,800
yes you can access the website under
955
00:37:14,800 --> 00:37:17,050
certain credentials but it's not
956
00:37:17,050 --> 00:37:18,460
something that the user does which is
957
00:37:18,460 --> 00:37:21,130
important so a lot of issues related to
958
00:37:21,130 --> 00:37:22,660
security have to be solved from these
959
00:37:22,660 --> 00:37:24,970
automated mechanisms right because that
960
00:37:24,970 --> 00:37:26,710
will make it a lot more bearable for the
961
00:37:26,710 --> 00:37:27,850
user increases a little bit the
962
00:37:27,850 --> 00:37:29,650
bandwidth and maybe decreases mildly
963
00:37:29,650 --> 00:37:33,240
performance not even that much okay and
964
00:37:33,240 --> 00:37:35,620
the web server itself it basically is
965
00:37:35,620 --> 00:37:37,180
going to store the valid cookie
966
00:37:37,180 --> 00:37:38,710
somewhere and compare any cookie with
967
00:37:38,710 --> 00:37:39,970
the valid cookies to see if it's valid
968
00:37:39,970 --> 00:37:42,340
so it has a little bit of state okay now it
969
00:37:42,340 --> 00:37:43,690
turns out that this is not the only way
970
00:37:43,690 --> 00:37:45,460
you could do things you can in fact have
971
00:37:45,460 --> 00:37:48,190
and sometimes it is highly desirable to
972
00:37:48,190 --> 00:37:50,260
do so you can in fact have stateful
973
00:37:50,260 --> 00:37:52,810
servers okay
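To make the cookie check described a moment ago concrete, here is a minimal sketch in Python of a server that stores the valid cookies and compares any cookie it receives against them. The class name SessionStore, its methods and the 60 second lifetime are assumptions made up for illustration, they are not from any particular web framework, and a real deployment would also need secure transport and cookie attributes that this sketch leaves out.

```python
import secrets
import time


class SessionStore:
    """Hypothetical server-side table of currently valid cookies."""

    def __init__(self, lifetime_seconds):
        self.lifetime = lifetime_seconds
        self._valid = {}  # cookie value -> (user, expiry time)

    def login(self, user, now=None):
        """Called once, after the password was checked; returns the cookie value."""
        now = time.time() if now is None else now
        cookie = secrets.token_hex(16)  # random, hard to guess
        self._valid[cookie] = (user, now + self.lifetime)
        return cookie

    def check(self, cookie, now=None):
        """Called on every request; returns the user or None if the cookie is not valid."""
        now = time.time() if now is None else now
        entry = self._valid.get(cookie)
        if entry is None:
            return None
        user, expires = entry
        if now >= expires:
            del self._valid[cookie]  # expired: forget it
            return None
        return user

    def revoke(self, cookie):
        """Withdraw the right before it expires, for example on logout."""
        self._valid.pop(cookie, None)
```

The user types the password once, the browser presents the cookie on every later request, and the only state the server keeps is this small table.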
974
00:37:52,810 --> 00:37:55,210
so a stateful server is a server that
975
00:37:55,210 --> 00:37:57,180
will stay alive all the time by the way
976
00:37:57,180 --> 00:38:00,040
this server is only kind of alive it
977
00:38:00,040 --> 00:38:01,750
listens on the port but it doesn't have
978
00:38:01,750 --> 00:38:03,610
the state up right and in fact has
979
00:38:03,610 --> 00:38:05,170
to fire up a big machinery or maybe it's
980
00:38:05,170 --> 00:38:06,730
doing some kind of clever tricks not to
981
00:38:06,730 --> 00:38:08,530
fire up so much but for the most
982
00:38:08,530 --> 00:38:10,090
part it's wiping out its memory on every
983
00:38:10,090 --> 00:38:11,770
request you could in fact have servers
984
00:38:11,770 --> 00:38:14,350
that don't wipe out anything and in fact
985
00:38:14,350 --> 00:38:16,420
try to aggressively keep things in
986
00:38:16,420 --> 00:38:17,920
memory to speed things up so one
987
00:38:17,920 --> 00:38:19,180
particular technique for the server
988
00:38:19,180 --> 00:38:22,470
would be don't be so stateless
989
00:38:22,470 --> 00:38:25,570
cache yourself a lot of the things that
990
00:38:25,570 --> 00:38:28,360
otherwise would be used in memory right
991
00:38:28,360 --> 00:38:29,620
because then you don't have to go to
992
00:38:29,620 --> 00:38:31,300
your disk or something else and that
993
00:38:31,300 --> 00:38:32,680
will speed things up at least on the
994
00:38:32,680 --> 00:38:35,620
server side right and in fact that's
995
00:38:35,620 --> 00:38:38,170
what is needed for something like
996
00:38:38,170 --> 00:38:40,060
chatting right so when it comes to
997
00:38:40,060 --> 00:38:41,920
chatting you really want to do this push
998
00:38:41,920 --> 00:38:44,500
changes but if you do push changes then
999
00:38:44,500 --> 00:38:46,230
the server cannot be stateless really
1000
00:38:46,230 --> 00:38:48,970
right now you could still do something
1001
00:38:48,970 --> 00:38:50,350
like this in which all the state is
1002
00:38:50,350 --> 00:38:53,470
pushed into the database right but I mean a
1003
00:38:53,470 --> 00:38:53,940
big
1004
00:38:53,940 --> 00:38:56,550
you still I mean the question the big
1005
00:38:56,550 --> 00:38:59,300
question there is how do you push sake
1006
00:38:59,300 --> 00:39:03,120
so the pull is easy the client makes
1007
00:39:03,120 --> 00:39:05,480
the request opens a tcp/ip connection or
1008
00:39:05,480 --> 00:39:08,040
it goes through this guy who is listening
1009
00:39:08,040 --> 00:39:10,080
fire up one of this fire up one of this
1010
00:39:10,080 --> 00:39:11,580
depending on which part of the web site
1011
00:39:11,580 --> 00:39:13,560
you access fire up the connection to the
1012
00:39:13,560 --> 00:39:15,390
database the database is always up right
1013
00:39:15,390 --> 00:39:17,670
that guy stays up right and then the
1014
00:39:17,670 --> 00:39:19,650
reply goes back but if it comes to
1015
00:39:19,650 --> 00:39:23,100
pushing how do you push so you must have
1016
00:39:23,100 --> 00:39:26,760
built-in mechanisms to produce such such
1017
00:39:26,760 --> 00:39:28,050
okay
1018
00:39:28,050 --> 00:39:29,640
now before I go into details let me
1019
00:39:29,640 --> 00:39:30,900
point out an interesting use of
1020
00:39:30,900 --> 00:39:32,730
notifications which is highly
1021
00:39:32,730 --> 00:39:34,200
non-obvious so the difference
1022
00:39:34,200 --> 00:39:35,730
between notification and push changes is
1023
00:39:35,730 --> 00:39:38,220
you're not really saying here is
1024
00:39:38,220 --> 00:39:40,140
everything that changed in the
1025
00:39:40,140 --> 00:39:41,580
notification you just say the copy is no
1026
00:39:41,580 --> 00:39:42,510
longer valid
1027
00:39:42,510 --> 00:39:44,730
you're not saying how it changed you
1028
00:39:44,730 --> 00:39:46,500
just say your copy is not valid which
1029
00:39:46,500 --> 00:39:48,000
essentially means notifications have to
1030
00:39:48,000 --> 00:39:50,190
be paired up with a mechanism that pulls
1031
00:39:50,190 --> 00:39:52,440
another copy so it's a it's a push
1032
00:39:52,440 --> 00:39:54,690
notification pull request when you need
1033
00:39:54,690 --> 00:39:57,990
it right so such a mechanism would
1034
00:39:57,990 --> 00:40:00,420
essentially indicate that oh yeah use
1035
00:40:00,420 --> 00:40:02,250
that javascript file that you were
1036
00:40:02,250 --> 00:40:04,020
accessing before it's no longer fresh
1037
00:40:04,020 --> 00:40:05,280
you need to bring another one before you
1038
00:40:05,280 --> 00:40:06,000
do anything else
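As a rough illustration of pairing a push notification with a pull request, here is a small Python sketch. The names Origin and ClientCache and the file name app.js are invented for the example. The server only says that a copy is no longer valid, never how it changed, and the client fetches a fresh copy only when it actually needs one.

```python
class Origin:
    """Hypothetical server holding the authoritative copies."""

    def __init__(self):
        self.files = {}
        self.caches = []   # clients to notify on change
        self.pulls = 0     # how many full copies were actually transferred

    def write(self, name, content):
        self.files[name] = content
        # Push only a small notification: "your copy of `name` is stale".
        for cache in self.caches:
            cache.invalidate(name)

    def read(self, name):
        self.pulls += 1
        return self.files[name]


class ClientCache:
    """Hypothetical client that keeps copies until told they are stale."""

    def __init__(self, origin):
        self.origin = origin
        self.copies = {}
        origin.caches.append(self)

    def invalidate(self, name):
        # The notification carries no data, so just drop the stale copy.
        self.copies.pop(name, None)

    def get(self, name):
        # Pull a fresh copy only when somebody actually asks for it.
        if name not in self.copies:
            self.copies[name] = self.origin.read(name)
        return self.copies[name]
```

If the file changes ten times before anybody looks at it, ten tiny notifications go out but only one full copy is transferred.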
1039
00:40:06,000 --> 00:40:11,610
okay so notifications are in fact used
1040
00:40:11,610 --> 00:40:13,650
in the memory hierarchy of most
1041
00:40:13,650 --> 00:40:16,380
modern processors okay the reason is the
1042
00:40:16,380 --> 00:40:17,550
following so by the way it has a
1043
00:40:17,550 --> 00:40:19,470
completely different name and it's
1044
00:40:19,470 --> 00:40:21,480
called cache invalidation
1045
00:40:21,480 --> 00:40:23,070
and it's part of the cache coherency
1046
00:40:23,070 --> 00:40:24,570
protocol right so as part of the
1047
00:40:24,570 --> 00:40:26,960
architecture has something called cache
1048
00:40:26,960 --> 00:40:31,710
coherency and one way to do cache
1049
00:40:31,710 --> 00:40:36,260
coherence is to do cache invalidation
1050
00:40:38,910 --> 00:40:41,320
right so literally cache invalidation
1051
00:40:41,320 --> 00:40:43,600
it's a mechanism in which the processor
1052
00:40:43,600 --> 00:40:46,120
will say this thing that I'm caching is
1053
00:40:46,120 --> 00:40:47,800
definitely not good anymore I need to go
1054
00:40:47,800 --> 00:40:50,230
bring another copy when somebody
1055
00:40:50,230 --> 00:40:53,260
accesses it okay so it turns out I mean
1056
00:40:53,260 --> 00:40:55,180
the actual way things happen it's a
1057
00:40:55,180 --> 00:40:56,980
little bit different but in
1058
00:40:56,980 --> 00:41:00,570
fact processors are snooping on the
1059
00:41:00,570 --> 00:41:02,410
communication channel which is actually
1060
00:41:02,410 --> 00:41:04,030
very complicated so the snooping is a
1061
00:41:04,030 --> 00:41:06,430
complicated thing right and in fact are
1062
00:41:06,430 --> 00:41:11,080
listening for any activity that overlaps
1063
00:41:11,080 --> 00:41:12,610
with what they actually cache and if
1064
00:41:12,610 --> 00:41:14,080
they see any activity they invalidate
1065
00:41:14,080 --> 00:41:15,460
their own cache so they know then they
1066
00:41:15,460 --> 00:41:18,750
need to go and grab the real copy when
1067
00:41:18,750 --> 00:41:21,340
they need one now this is a better
1068
00:41:21,340 --> 00:41:23,710
mechanism than in fact paying attention
1069
00:41:23,710 --> 00:41:25,540
to what's going on and pushing their own
1070
00:41:25,540 --> 00:41:28,680
changes right because that's a lot more
1071
00:41:28,680 --> 00:41:31,240
resource consuming right so it's again
1072
00:41:31,240 --> 00:41:33,100
this difference between notification and
1073
00:41:33,100 --> 00:41:35,350
push changes so instead of pushing the
1074
00:41:35,350 --> 00:41:37,140
full changes forget about that
1075
00:41:37,140 --> 00:41:38,860
essentially what you're doing is you're
1076
00:41:38,860 --> 00:41:40,750
betting that you don't necessarily need
1077
00:41:40,750 --> 00:41:42,670
that resource right away you just want
1078
00:41:42,670 --> 00:41:44,050
to ensure a certain kind of correctness
1079
00:41:44,050 --> 00:41:46,930
for processors that share the same
1080
00:41:46,930 --> 00:41:48,730
memory for cores that share the same
1081
00:41:48,730 --> 00:41:50,590
memory this is extremely important right
1082
00:41:50,590 --> 00:41:52,000
because if you wouldn't enforce this
1083
00:41:52,000 --> 00:41:53,320
kind of cache invalidation and to make
1084
00:41:53,320 --> 00:41:55,030
it correct you're gonna have arbitrarily
1085
00:41:55,030 --> 00:41:56,620
bad behavior going on in there I mean
1086
00:41:56,620 --> 00:41:58,720
your work on somebody changes something
1087
00:41:58,720 --> 00:42:01,090
in memory but in fact some other
1088
00:42:01,090 --> 00:42:02,770
processor was caching it and it doesn't
1089
00:42:02,770 --> 00:42:06,040
even notice that's a disaster but I want
1090
00:42:06,040 --> 00:42:07,060
you to understand that this cache
1091
00:42:07,060 --> 00:42:08,320
coherency and cache invalidation
1092
00:42:08,320 --> 00:42:10,780
protocols have the same problems at a
1093
00:42:10,780 --> 00:42:12,910
high level as you have with these things
1094
00:42:12,910 --> 00:42:14,470
I mean a lot of it gives you just get to
1095
00:42:14,470 --> 00:42:15,880
keep up to date and this is why for
1096
00:42:15,880 --> 00:42:17,440
example you do see a shift now at
1097
00:42:17,440 --> 00:42:20,440
least for certain products away from
1098
00:42:20,440 --> 00:42:22,780
cache coherency protocols right so for
1099
00:42:22,780 --> 00:42:27,400
example the PlayStation the PlayStation
1100
00:42:27,400 --> 00:42:29,290
3 the Cell processor in PlayStation 3
1101
00:42:29,290 --> 00:42:32,200
the actual cores the computational
1102
00:42:32,200 --> 00:42:34,390
cores have no cache whatsoever the
1103
00:42:34,390 --> 00:42:36,100
reason is this is very expensive so they
1104
00:42:36,100 --> 00:42:38,170
have to do explicit transfers transfer
1105
00:42:38,170 --> 00:42:40,150
this big block of memory right so it's
1106
00:42:40,150 --> 00:42:42,280
more like not even check for
1107
00:42:42,280 --> 00:42:44,410
freshness but it's kind of planned you
1108
00:42:44,410 --> 00:42:46,300
do this I transfer this portion I do
1109
00:42:46,300 --> 00:42:48,430
something and I push it okay all right
1110
00:42:48,430 --> 00:42:50,770
now coming back to our power chat server
1111
00:42:50,770 --> 00:42:51,980
right of course you want
1112
00:42:51,980 --> 00:42:53,930
the full push changes well it kind of
1113
00:42:53,930 --> 00:42:56,380
depends right so you want to push
1114
00:42:56,380 --> 00:42:58,520
changes if they are relatively small but
1115
00:42:58,520 --> 00:43:01,359
maybe some notification or only a
1116
00:43:01,359 --> 00:43:03,470
skeleton for the changes if they are really
1117
00:43:03,470 --> 00:43:06,890
big right being lazy about these things
1118
00:43:06,890 --> 00:43:08,270
pays off big time
1119
00:43:08,270 --> 00:43:10,310
right so if all the time you have to
1120
00:43:10,310 --> 00:43:12,260
transport big things just in case the
1121
00:43:12,260 --> 00:43:13,670
user wants to use them that's not a
1122
00:43:13,670 --> 00:43:16,220
particularly good policy so always when
1123
00:43:16,220 --> 00:43:17,630
you design applications like this here I
1124
00:43:17,630 --> 00:43:19,480
have to think about how big the
1125
00:43:19,480 --> 00:43:22,010
propagated information is if it's
1126
00:43:22,010 --> 00:43:23,420
very big I might be better off by
1127
00:43:23,420 --> 00:43:24,890
knowing that it changed maybe not quite
1128
00:43:24,890 --> 00:43:26,390
notification the important stuff gets
1129
00:43:26,390 --> 00:43:28,190
pushed right so an intermediate solution
1130
00:43:28,190 --> 00:43:30,440
between a push change and a notification so
1131
00:43:30,440 --> 00:43:31,790
the things that are more visible get
1132
00:43:31,790 --> 00:43:33,619
pushed but the big content is not so if
1133
00:43:33,619 --> 00:43:34,970
you're trying to access it you already
1134
00:43:34,970 --> 00:43:37,369
in fact invalidated the cache for that
1135
00:43:37,369 --> 00:43:38,990
one and you do some sort of a separate
1136
00:43:38,990 --> 00:43:40,820
request to get that to get that content
1137
00:43:40,820 --> 00:43:42,109
and that seems to be a better solution
1138
00:43:42,109 --> 00:43:46,130
okay so in order to support this kind of
1139
00:43:46,130 --> 00:43:49,790
things right notification based kind of
1140
00:43:49,790 --> 00:43:52,400
activities for example there is a brand
1141
00:43:52,400 --> 00:43:54,470
well not quite brand new it's about five
1142
00:43:54,470 --> 00:43:55,850
years old maybe about seven years old
1143
00:43:55,850 --> 00:43:57,109
now right there is a new mechanism
1144
00:43:57,109 --> 00:44:00,710
supported by the web browsers to allow
1145
00:44:00,710 --> 00:44:03,170
you to in fact have a continuous
1146
00:44:03,170 --> 00:44:04,490
communication with the backend and get
1147
00:44:04,490 --> 00:44:07,340
this kind of notifications okay and this
1148
00:44:07,340 --> 00:44:10,160
is called not web services but
1149
00:44:10,160 --> 00:44:13,270
WebSocket
1150
00:44:15,270 --> 00:44:18,700
so the normal operation in a web browser
1151
00:44:18,700 --> 00:44:21,490
is a tcp/ip connection that goes from
1152
00:44:21,490 --> 00:44:24,460
the client makes the request and then
1153
00:44:24,460 --> 00:44:26,740
finishes with some gizmos to make it
1154
00:44:26,740 --> 00:44:28,630
faster potentially for example bonding
1155
00:44:28,630 --> 00:44:30,369
and so on but they are still treated
1156
00:44:30,369 --> 00:44:31,960
completely independently ok
1157
00:44:31,960 --> 00:44:33,670
WebSockets are completely different
1158
00:44:33,670 --> 00:44:35,200
WebSockets are a point-to-point
1159
00:44:35,200 --> 00:44:36,730
connection between the client and the
1160
00:44:36,730 --> 00:44:38,589
server in which the communication goes
1161
00:44:38,589 --> 00:44:41,980
both ways okay but in a strange way the
1162
00:44:41,980 --> 00:44:43,750
client itself is now a mini server you
1163
00:44:43,750 --> 00:44:45,640
literally have to listen on your own web
1164
00:44:45,640 --> 00:44:48,130
socket to see what goes on right so you
1165
00:44:48,130 --> 00:44:49,720
can in fact say you know what listen on
1166
00:44:49,720 --> 00:44:51,130
the web socket and when something comes
1167
00:44:51,130 --> 00:44:52,900
up I can look at what came up and then
1168
00:44:52,900 --> 00:44:54,310
do whatever it is that I'm doing now of
1169
00:44:54,310 --> 00:44:56,859
course this web socket is not possible
1170
00:44:56,859 --> 00:45:01,329
without JavaScript you must have an
1171
00:45:01,329 --> 00:45:03,400
active program here and not just some
1172
00:45:03,400 --> 00:45:07,569
sort of simple client that gets an HTML
1173
00:45:07,569 --> 00:45:09,790
page and renders it you must have some
1174
00:45:09,790 --> 00:45:11,980
program running right in order to do
1175
00:45:11,980 --> 00:45:13,869
something like WebSockets but if you do
1176
00:45:13,869 --> 00:45:16,869
then a lot of things become much more
1177
00:45:16,869 --> 00:45:18,369
straightforward for example like a chat
1178
00:45:18,369 --> 00:45:20,230
application without the chat application
1179
00:45:20,230 --> 00:45:21,339
all right in the good old days to
1180
00:45:21,339 --> 00:45:22,660
implement a chat application you had to
1181
00:45:22,660 --> 00:45:24,720
do something very nasty you had to
1182
00:45:24,720 --> 00:45:29,050
artificially invalidate the HTML page by
1183
00:45:29,050 --> 00:45:31,240
setting a very short time for the cache
1184
00:45:31,240 --> 00:45:33,970
of just a couple of seconds to make
1185
00:45:33,970 --> 00:45:35,710
another request and then the server
1186
00:45:35,710 --> 00:45:37,720
somehow had to figure out how to render the
1187
00:45:37,720 --> 00:45:39,579
new HTML page to reflect the new changes
1188
00:45:39,579 --> 00:45:42,579
right that produces extremely nasty
1189
00:45:42,579 --> 00:45:44,650
behavior I mean first of all when the
1190
00:45:44,650 --> 00:45:46,540
page gets invalidated most web
1191
00:45:46,540 --> 00:45:48,700
browsers will make the whole page flicker
1192
00:45:48,700 --> 00:45:51,490
and this is how you know if they do
1193
00:45:51,490 --> 00:45:53,440
magic with some of this stuff and maybe
1194
00:45:53,440 --> 00:45:55,660
JavaScript to render parts of the page
1195
00:45:55,660 --> 00:45:58,480
or they load the whole page if your
1196
00:45:58,480 --> 00:46:01,180
entire page flickers and you see it get
1197
00:46:01,180 --> 00:46:04,630
black I mean almost blank and again you
1198
00:46:04,630 --> 00:46:07,630
know they are loading a full page if you
1199
00:46:07,630 --> 00:46:09,609
click around and nothing flickers except
1200
00:46:09,609 --> 00:46:11,230
some content changes here and there you
1201
00:46:11,230 --> 00:46:12,099
know that they are doing JavaScript
1202
00:46:12,099 --> 00:46:14,260
magic and quite possibly using this kind
1203
00:46:14,260 --> 00:46:16,510
of WebSockets so by the way the
1204
00:46:16,510 --> 00:46:18,220
technology from the point of view of the
1205
00:46:18,220 --> 00:46:19,750
user is quite simple in the
1206
00:46:19,750 --> 00:46:21,490
WebSockets I mean very simple API will
1207
00:46:21,490 --> 00:46:22,750
allow you to basically register a
1208
00:46:22,750 --> 00:46:24,550
listener it's exactly like a TCP
1209
00:46:24,550 --> 00:46:27,010
connection you register a listener you
1210
00:46:27,010 --> 00:46:28,750
have your own way to speak on the
1211
00:46:28,750 --> 00:46:30,340
wire but you register a listener and
1212
00:46:30,340 --> 00:46:33,100
say hey when something comes get the
1213
00:46:33,100 --> 00:46:34,540
content and then do whatever you want
1214
00:46:34,540 --> 00:46:35,950
with the content I mean interpret it as a
1215
00:46:35,950 --> 00:46:37,570
string or more interestingly as some
1216
00:46:37,570 --> 00:46:39,010
kind of a JSON object that can become
1217
00:46:39,010 --> 00:46:41,800
data right you can do a lot of magic
1218
00:46:41,800 --> 00:46:43,630
with this kind of things right
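The register a listener idea can be sketched without any network at all. The Python below uses a made-up in-memory stand-in called FakeSocket, which is not the browser WebSocket API or any real library, just to show the shape of the pattern: you register a callback, and when something arrives you interpret it as a JSON object if possible and as a plain string otherwise.

```python
import json


class FakeSocket:
    """Hypothetical in-memory stand-in for a WebSocket-like channel."""

    def __init__(self):
        self._listeners = []

    def on_message(self, callback):
        # Register a listener; it is called once per incoming message.
        self._listeners.append(callback)

    def deliver(self, raw):
        # Simulates the other side pushing a message down the wire.
        for callback in self._listeners:
            callback(raw)


received = []


def handle(raw):
    """Listener: turn the payload into data and do something with it."""
    try:
        data = json.loads(raw)   # a JSON object becomes a dict
    except json.JSONDecodeError:
        data = raw               # otherwise keep it as a plain string
    received.append(data)
```

In a browser the same role is played by a JavaScript handler attached to the socket; the point here is only that the client sits and listens instead of asking.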
1219
00:46:43,630 --> 00:46:46,840
for example one of them would be to do a
1220
00:46:46,840 --> 00:46:49,450
notification mechanism right when things
1221
00:46:49,450 --> 00:46:52,720
changed right so imagine for example
1222
00:46:52,720 --> 00:46:55,660
you're watching stocks such a WebSocket
1223
00:46:55,660 --> 00:46:59,040
will allow so you're watching stocks and
1224
00:46:59,040 --> 00:47:01,660
you're only interested in four or five
1225
00:47:01,660 --> 00:47:04,240
tickers you could in fact tell the
1226
00:47:04,240 --> 00:47:05,890
server through whatever mechanism you
1227
00:47:05,890 --> 00:47:07,630
want either the WebSocket which is
1228
00:47:07,630 --> 00:47:09,160
bi-directional or through one of those
1229
00:47:09,160 --> 00:47:11,020
calls you can tell the server hey I'm
1230
00:47:11,020 --> 00:47:12,760
interested in this kind of things and
1231
00:47:12,760 --> 00:47:15,880
then the server maybe I mean
1232
00:47:15,880 --> 00:47:17,770
hopefully it remembers what you're
1233
00:47:17,770 --> 00:47:19,930
interested in and somehow only pushes
1234
00:47:19,930 --> 00:47:21,850
you changes that you're interested in ok
1235
00:47:21,850 --> 00:47:23,560
now we need to discuss about that
1236
00:47:23,560 --> 00:47:25,240
separately I want to come back to that
1237
00:47:25,240 --> 00:47:30,240
that's called a publish/subscribe system
1238
00:47:34,050 --> 00:47:36,550
but you see everything is tied up to
1239
00:47:36,550 --> 00:47:39,880
this notion of if you have a stale copy
1240
00:47:39,880 --> 00:47:41,650
I'd somehow have to make it available to
1241
00:47:41,650 --> 00:47:43,060
you either pull or push this is gonna
1242
00:47:43,060 --> 00:47:44,590
be a lot of pushing with a
1243
00:47:44,590 --> 00:47:48,880
publish/subscribe right so a more just
1244
00:47:48,880 --> 00:47:51,190
to give you a short preview a more
1245
00:47:51,190 --> 00:47:52,780
elaborate publish/subscribe system is
1246
00:47:52,780 --> 00:47:54,490
for example some sort of news feeds in
1247
00:47:54,490 --> 00:47:55,690
which you say I'm interested in this
1248
00:47:55,690 --> 00:47:56,980
kind of things and then you have a big
1249
00:47:56,980 --> 00:47:59,770
server and the server over thousands of
1250
00:47:59,770 --> 00:48:01,690
millions of clients is determining who
1251
00:48:01,690 --> 00:48:03,670
needs what at what time and through the
1252
00:48:03,670 --> 00:48:05,200
notification mechanism pushes things
1253
00:48:05,200 --> 00:48:10,900
right all right now of course the trick
1254
00:48:10,900 --> 00:48:12,790
there is only to let people know about
1255
00:48:12,790 --> 00:48:14,170
things they care about not about
1256
00:48:14,170 --> 00:48:14,710
everything
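Here is a minimal Python sketch of the publish/subscribe idea with the stock example: clients tell the server which tickers they care about, the server remembers that, and a change is pushed only to the clients interested in it. The Broker class, the client names and the ticker symbols and prices are invented for illustration, and the outbox lists stand in for whatever push channel would really be used.

```python
from collections import defaultdict


class Broker:
    """Hypothetical publish/subscribe server keeping per-topic interest."""

    def __init__(self):
        self._interested = defaultdict(set)   # topic -> clients that asked for it
        self.outbox = defaultdict(list)       # client -> messages pushed to it

    def subscribe(self, client, topics):
        # The server remembers what each client is interested in.
        for topic in topics:
            self._interested[topic].add(client)

    def publish(self, topic, payload):
        # Push only to clients that care about this topic, not to everybody.
        for client in self._interested.get(topic, ()):
            self.outbox[client].append((topic, payload))
```

A topic nobody subscribed to costs no messages at all, which is the difference from a broadcast.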
1257
00:48:14,710 --> 00:48:16,510
right so it's the difference between a
1258
00:48:16,510 --> 00:48:20,310
broadcast or some sort of multicast or
1259
00:48:20,310 --> 00:48:23,950
point-to-point connection all right but
1260
00:48:23,950 --> 00:48:27,760
it all comes back to this essentially
1261
00:48:27,760 --> 00:48:30,940
to this replication so obviously a
1262
00:48:30,940 --> 00:48:33,040
publish/subscribe system it's going to
1263
00:48:33,040 --> 00:48:34,180
be implemented in a very different way
1264
00:48:34,180 --> 00:48:37,060
than normal for example web content
1265
00:48:37,060 --> 00:48:40,800
delivery so any questions about this
1266
00:48:40,800 --> 00:48:42,660
right so this is the basic
1267
00:48:42,660 --> 00:48:44,279
you have the check for freshness and
1268
00:48:44,279 --> 00:48:46,470
whatever you have whatever algorithm to
1269
00:48:46,470 --> 00:48:48,450
detect for that notifications which are
1270
00:48:48,450 --> 00:48:51,299
small or push full changes now you can
1271
00:48:51,299 --> 00:48:53,789
be a little bit clever in between
1272
00:48:53,789 --> 00:48:55,950
especially for resources that tend to be
1273
00:48:55,950 --> 00:48:58,650
big but with small changes namely you
1274
00:48:58,650 --> 00:49:01,079
could send a set of operations that can
1275
00:49:01,079 --> 00:49:04,349
be applied at the other end in order to
1276
00:49:04,349 --> 00:49:06,950
bring the copy to a consistent copy
1277
00:49:06,950 --> 00:49:12,089
rather than the full copy right for
1278
00:49:12,089 --> 00:49:13,079
example these kind of things are very
1279
00:49:13,079 --> 00:49:16,559
good for let's say file editing when
1280
00:49:16,559 --> 00:49:18,299
you're editing the file especially
1281
00:49:18,299 --> 00:49:20,160
this new new trend right over there
1282
00:49:20,160 --> 00:49:22,109
collaborative editing you're usually
1283
00:49:22,109 --> 00:49:24,059
making relatively small changes at any
1284
00:49:24,059 --> 00:49:26,339
given moment of time this is going to be
1285
00:49:26,339 --> 00:49:28,019
foolish to say you know what we work on
1286
00:49:28,019 --> 00:49:29,609
this document and it's a megabyte in
1287
00:49:29,609 --> 00:49:31,470
size the moment somebody presses a key
1288
00:49:31,470 --> 00:49:34,829
and puts a space in the document I'm
1289
00:49:34,829 --> 00:49:35,940
going to send you the new version of the
1290
00:49:35,940 --> 00:49:39,660
document right so an interesting kind of
1291
00:49:39,660 --> 00:49:41,099
question to ask there is could i
1292
00:49:41,099 --> 00:49:42,930
propagate only the small changes and
1293
00:49:42,930 --> 00:49:43,950
then there is a question of how you
1294
00:49:43,950 --> 00:49:45,930
propagate those changes to keep multiple
1295
00:49:45,930 --> 00:49:48,319
consistent copies when you have multiple
1296
00:49:48,319 --> 00:49:53,759
if you want people or processes that
1297
00:49:53,759 --> 00:49:55,380
actually change change the thing right
1298
00:49:55,380 --> 00:49:58,319
so then you can be somewhere in the
1299
00:49:58,319 --> 00:49:59,940
middle in which you're not only
1300
00:49:59,940 --> 00:50:00,960
you're not invalidating
1301
00:50:00,960 --> 00:50:02,730
because you're not you're not saying hey
1302
00:50:02,730 --> 00:50:05,309
there is a change go grab a new
1303
00:50:05,309 --> 00:50:09,119
copy you're simply sending a compact
1304
00:50:09,119 --> 00:50:11,430
description of what changed right so
1305
00:50:11,430 --> 00:50:13,440
some sort of a differential between the
1306
00:50:13,440 --> 00:50:16,890
server copy and your own copy that could
1307
00:50:16,890 --> 00:50:18,809
potentially save a tremendous amount of
1308
00:50:18,809 --> 00:50:20,160
bandwidth and this is one of the core
1309
00:50:20,160 --> 00:50:25,369
problems for example that the box and
1310
00:50:25,369 --> 00:50:28,109
Dropbox people need to solve right
1311
00:50:28,109 --> 00:50:30,390
because most of the files are going to
1312
00:50:30,390 --> 00:50:32,039
have relatively small changes continuous
1313
00:50:32,039 --> 00:50:33,720
changes if you only send the dáil time
1314
00:50:33,720 --> 00:50:35,670
you resolve correctly the changes on the
1315
00:50:35,670 --> 00:50:37,170
other end you're gonna save tremendously
1316
00:50:37,170 --> 00:50:39,420
on the bandwidth which means more profit
1317
00:50:39,420 --> 00:50:41,130
to you because you're literally gonna be
1318
00:50:41,130 --> 00:50:44,069
driven by the bandwidth itself I mean
1319
00:50:44,069 --> 00:50:46,980
that's the big problem for Dropbox right
1320
00:50:46,980 --> 00:50:50,940
it's bandwidth okay now Dropbox is an
1321
00:50:50,940 --> 00:50:53,130
interesting example which would it's
1322
00:50:53,130 --> 00:50:54,900
probably worth discussing how exactly it
1323
00:50:54,900 --> 00:50:56,010
fits into the story
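The compact description of what changed mentioned above can be sketched in Python with the standard library difflib module. The operation names keep, drop and add and the two helper functions are made up for this example, and this is only an illustration of the idea, not a claim about how Dropbox or any other product actually computes its deltas.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe how to turn `old` into `new` as a short list of operations."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("keep", i2 - i1))      # copy this many chars unchanged
        else:
            if i2 > i1:
                ops.append(("drop", i2 - i1))  # skip chars that were removed
            if j2 > j1:
                ops.append(("add", new[j1:j2]))  # only new text is transmitted
    return ops


def apply_delta(old, ops):
    """Run on the other end: rebuild the new copy from the old one plus the delta."""
    out = []
    pos = 0
    for kind, arg in ops:
        if kind == "keep":
            out.append(old[pos:pos + arg])
            pos += arg
        elif kind == "drop":
            pos += arg
        else:  # "add"
            out.append(arg)
    return "".join(out)
```

For a large document in which somebody typed one space, the delta carries a couple of counts and that one character instead of the whole document.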
1324
00:50:56,010 --> 00:51:00,750
right the discussion should be probably
1325
00:51:00,750 --> 00:51:01,680
more in Ibraham when we talk about
1326
00:51:01,680 --> 00:51:03,990
distributed file systems but essentially
1327
00:51:03,990 --> 00:51:05,280
because you can have a lot of storage
1328
00:51:05,280 --> 00:51:07,200
locally right I mean I just argued that
1329
00:51:07,200 --> 00:51:09,570
hard drives are cheap most of the files
1330
00:51:09,570 --> 00:51:11,970
you're accessing especially if you don't
1331
00:51:11,970 --> 00:51:13,710
pay a lot of money to Dropbox to have a
1332
00:51:13,710 --> 00:51:15,359
very large storage which is the case
1333
00:51:15,359 --> 00:51:17,910
with those people right then storing
1334
00:51:17,910 --> 00:51:19,560
your own local copy is not a big issue
1335
00:51:19,560 --> 00:51:21,900
as long as these copies your local copy
1336
00:51:21,900 --> 00:51:23,730
and the server copy are kept in sync
1337
00:51:23,730 --> 00:51:25,680
right so the annoying thing would be if
1338
00:51:25,680 --> 00:51:29,130
I change the files from multiple
1339
00:51:29,130 --> 00:51:30,869
machines and the changes don't get
1340
00:51:30,869 --> 00:51:32,280
propagated I'm not really particularly
1341
00:51:32,280 --> 00:51:33,540
concerned about the fact that I only
1342
00:51:33,540 --> 00:51:35,850
have space for my own copy right but
1343
00:51:35,850 --> 00:51:37,440
most people keep less than something
1344
00:51:37,440 --> 00:51:38,940
like ten gigabytes now I
1345
00:51:38,940 --> 00:51:40,440
mean even phones can do ten
1346
00:51:40,440 --> 00:51:44,760
gigabytes right kind of right in those kind
1347
00:51:44,760 --> 00:51:49,520
of circumstances the big big issue is
1348
00:51:49,520 --> 00:51:53,340
how can you update those files right you
1349
00:51:53,340 --> 00:51:54,630
could use notification to invalidate
1350
00:51:54,630 --> 00:51:56,369
them and then you force somebody to go
1351
00:51:56,369 --> 00:51:58,200
and bring a new a new copy you can use
1352
00:51:58,200 --> 00:51:59,609
notification in other ways for example
1353
00:51:59,609 --> 00:52:02,220
in Dropbox and notices uses it you can
1354
00:52:02,220 --> 00:52:08,010
have this other applications that when
1355
00:52:08,010 --> 00:52:10,530
there is a notification instead of just
1356
00:52:10,530 --> 00:52:12,270
doing something in the file system or to
1357
00:52:12,270 --> 00:52:13,440
complement what you're doing the file
1358
00:52:13,440 --> 00:52:14,760
system you pop up something on users
1359
00:52:14,760 --> 00:52:16,050
screen and say hey somebody is changing
1360
00:52:16,050 --> 00:52:17,790
that file there is a new copy for this
1361
00:52:17,790 --> 00:52:19,920
file do you want me to bring it and you
1362
00:52:19,920 --> 00:52:22,470
still ask for some user action which
1363
00:52:22,470 --> 00:52:24,420
essentially means you're not
1364
00:52:24,420 --> 00:52:26,490
wasting bandwidth unless the user really
1365
00:52:26,490 --> 00:52:27,930
cares about that resource so that's one
1366
00:52:27,930 --> 00:52:30,240
way for I mean it's something that I
1367
00:52:30,240 --> 00:52:34,140
think at least initially neglected as a
1368
00:52:34,140 --> 00:52:38,100
solution user engagement if people have
1369
00:52:38,100 --> 00:52:41,280
to click right then you can save a lot
1370
00:52:41,280 --> 00:52:45,390
of bandwidth because people are gonna go
1371
00:52:45,390 --> 00:52:47,130
and click and click and click only for
1372
00:52:47,130 --> 00:52:48,420
the first few days and then they get
1373
00:52:48,420 --> 00:52:49,859
tired and they go and click only when
1374
00:52:49,859 --> 00:52:50,369
they need it
1375
00:52:50,369 --> 00:52:53,310
right you send a short notification to
1376
00:52:53,310 --> 00:52:55,560
the user he knows that it changed he has to
1377
00:52:55,560 --> 00:52:57,540
click on it to bring a new version saves
1378
00:52:57,540 --> 00:52:59,280
bandwidth as opposed to automatically
1379
00:52:59,280 --> 00:53:02,010
keep on pushing these changes right now
1380
00:53:02,010 --> 00:53:06,150
in fact a company like Dropbox would
1381
00:53:06,150 --> 00:53:08,460
have to push full changes for binary
1382
00:53:08,460 --> 00:53:09,420
files
1383
00:53:09,420 --> 00:53:12,600
it's usually very very hard to send
1384
00:53:12,600 --> 00:53:15,540
Delta changes right
1385
00:53:15,540 --> 00:53:16,950
especially when you have compressed
1386
00:53:16,950 --> 00:53:18,840
parts it's a complete disaster right
1387
00:53:18,840 --> 00:53:19,980
unless you know precisely the
1388
00:53:19,980 --> 00:53:21,270
compression format and you're trying to
1389
00:53:21,270 --> 00:53:23,370
do something probably too clever for
1390
00:53:23,370 --> 00:53:24,240
your own good
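A minimal sketch of the point about compressed files, assuming a naive delta scheme that compares fixed-size blocks by hash. The function name changed_blocks, the 64-byte block size, and the sample content are all made up for illustration; this is not a description of how Dropbox does it. The same one-byte edit is measured once on the plain file and once on its zlib-compressed form.

```python
import hashlib
import zlib


def changed_blocks(old: bytes, new: bytes, block_size: int = 64) -> int:
    """Count fixed-size blocks whose hashes differ between two versions."""
    n_blocks = (max(len(old), len(new)) + block_size - 1) // block_size
    changed = 0
    for i in range(n_blocks):
        a = old[i * block_size:(i + 1) * block_size]
        b = new[i * block_size:(i + 1) * block_size]
        if hashlib.sha256(a).digest() != hashlib.sha256(b).digest():
            changed += 1
    return changed


# Hypothetical file content: 2000 short numbered lines of text.
original = "".join(f"line {i}: some file content\n" for i in range(2000)).encode()
edited = b"L" + original[1:]  # change a single byte at the very start

# Plain file: only the block containing the edited byte has to be resent.
raw_changed = changed_blocks(original, edited)

# Compressed file: the edit can ripple through the rest of the stream.
compressed_changed = changed_blocks(zlib.compress(original), zlib.compress(edited))

print("blocks to resend, plain file:     ", raw_changed)
print("blocks to resend, compressed file:", compressed_changed)
```

On the plain file exactly one block differs. On the compressed stream the count is usually much larger relative to the stream size, which is the reason a block-level delta tends to stop paying off there.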
1391
00:53:24,240 --> 00:53:28,380
okay literally a small change in the
1392
00:53:28,380 --> 00:53:30,420
file could produce quite large changes
1393
00:53:30,420 --> 00:53:31,920
overall or if you are talking about
1394
00:53:31,920 --> 00:53:34,470
encrypted files it's a complete disaster
1395
00:53:34,470 --> 00:53:36,870
okay because possibly the entire file
1396
00:53:36,870 --> 00:53:39,000
changes at the smallest change in the
1397
00:53:39,000 --> 00:53:40,890
content if it's really encrypted even if
1398
00:53:40,890 --> 00:53:42,720
you use block encryption you're still
1399
00:53:42,720 --> 00:53:46,590
dealing with blocks right so there is
1400
00:53:46,590 --> 00:53:48,090
always some sort of a gradation between
1401
00:53:48,090 --> 00:53:49,650
these things and finding the right
1402
00:53:49,650 --> 00:53:51,120
trade-off between these things is one
1403
00:53:51,120 --> 00:53:53,070
tricky thing right for the specific
1404
00:53:53,070 --> 00:53:55,740
application finding how and when to
1405
00:53:55,740 --> 00:53:57,570
invalidate the caches is another tricky
1406
00:53:57,570 --> 00:53:59,490
tricky thing and almost always these
1407
00:53:59,490 --> 00:54:02,010
things have to be fine-tuned for the
1408
00:54:02,010 --> 00:54:03,840
specific application so don't look for
1409
00:54:03,840 --> 00:54:05,730
canned solutions that work in every
1410
00:54:05,730 --> 00:54:07,470
circumstance right it's a big struggle
1411
00:54:07,470 --> 00:54:09,840
to find exactly when to check for
1412
00:54:09,840 --> 00:54:11,550
freshness in particular applications and
1413
00:54:11,550 --> 00:54:13,380
when and how to propagate these changes
1414
00:54:13,380 --> 00:54:19,770
okay alright so big issue and some sort
1415
00:54:19,770 --> 00:54:22,490
of a continuum of solutions in that
1416
00:54:22,490 --> 00:54:27,510
discussion now we don't only serve this
1417
00:54:27,510 --> 00:54:28,530
is not the only issue right we are
1418
00:54:28,530 --> 00:54:29,610
talking about the more general
1419
00:54:29,610 --> 00:54:31,650
replication for the purpose of fault
1420
00:54:31,650 --> 00:54:33,240
tolerance when it comes to that we
1421
00:54:33,240 --> 00:54:35,130
really want to think a lot more about
1422
00:54:35,130 --> 00:54:37,470
the server replication right so it's not
1423
00:54:37,470 --> 00:54:39,150
only this this story with when you have
1424
00:54:39,150 --> 00:54:45,570
the client replication right but we have
1425
00:54:45,570 --> 00:54:46,950
multiple servers and you're trying to
1426
00:54:46,950 --> 00:54:48,390
pass the load on multiple servers for
1427
00:54:48,390 --> 00:54:50,190
performance reasons or you're simply
1428
00:54:50,190 --> 00:54:51,630
trying to have enough redundancy in the
1429
00:54:51,630 --> 00:54:53,160
system in case things get lost right
1430
00:54:53,160 --> 00:54:56,370
then the question is how should you
1431
00:54:56,370 --> 00:54:57,840
propagate information from one server to
1432
00:54:57,840 --> 00:55:00,480
another okay now let me tell you an
1433
00:55:00,480 --> 00:55:03,810
interesting kind of story about client
1434
00:55:03,810 --> 00:55:06,060
replication that can serve even as a
1435
00:55:06,060 --> 00:55:07,650
fault tolerance replication it's kind of
1436
00:55:07,650 --> 00:55:09,360
an interesting situation and that that
1437
00:55:09,360 --> 00:55:11,340
could happen to you but it happened to
1438
00:55:11,340 --> 00:55:17,190
the people at Pixar so it turns out
1439
00:55:17,190 --> 00:55:19,250
that
1440
00:55:20,020 --> 00:55:22,180
somebody deleted by mistake but
1441
00:55:22,180 --> 00:55:24,580
nevertheless deleted the entire creative
1442
00:55:24,580 --> 00:55:28,480
content for Toy Story 2 right is
1443
00:55:28,480 --> 00:55:29,860
somebody did some maintenance on the server
1444
00:55:29,860 --> 00:55:33,130
and simply destroyed everything they've
1445
00:55:33,130 --> 00:55:35,320
been doing for two years or something
1446
00:55:35,320 --> 00:55:37,120
like that that essentially means by the
1447
00:55:37,120 --> 00:55:39,160
way instant bankruptcy for a company
1448
00:55:39,160 --> 00:55:40,780
young company like Pixar at the time
1449
00:55:40,780 --> 00:55:43,360
because that means we don't deliver the
1450
00:55:43,360 --> 00:55:45,460
movie or we deliver it late you can't do
1451
00:55:45,460 --> 00:55:47,620
it twice okay we deliver it late and
1452
00:55:47,620 --> 00:55:49,720
that's it you might as well call it quits
1453
00:55:49,720 --> 00:55:51,880
they were saved by the fact that one of
1454
00:55:51,880 --> 00:55:54,340
the employees took a copy of the entire
1455
00:55:54,340 --> 00:55:56,650
content home so he can he can work on it
1456
00:55:56,650 --> 00:55:58,960
so this can write and then basically
1457
00:55:58,960 --> 00:56:00,820
they started crying and somebody
1458
00:56:00,820 --> 00:56:03,040
realized that they did something that
1459
00:56:03,040 --> 00:56:04,660
probably was against the company policy
1460
00:56:04,660 --> 00:56:06,550
and took all the files home that happens
1461
00:56:06,550 --> 00:56:09,040
for example when you're using let's say
1462
00:56:09,040 --> 00:56:11,440
subversion or git or one of these tools
1463
00:56:11,440 --> 00:56:16,120
in which you're essentially keeping your
1464
00:56:16,120 --> 00:56:18,070
own copy in which you can make changes
1465
00:56:18,070 --> 00:56:19,690
but you're synchronizing with a
1466
00:56:19,690 --> 00:56:21,970
server if the server runs amok which it
1467
00:56:21,970 --> 00:56:24,340
might right at least you have your own
1468
00:56:24,340 --> 00:56:26,200
your own copy which in fact contains the
1469
00:56:26,200 --> 00:56:28,240
full history and you can actually
1470
00:56:28,240 --> 00:56:30,280
recover most of what you had there maybe
1471
00:56:30,280 --> 00:56:33,220
minus a day or two from the back
1472
00:56:33,220 --> 00:56:35,320
end content so don't neglect this
1473
00:56:35,320 --> 00:56:37,140
possibility that you could recover
1474
00:56:37,140 --> 00:56:39,610
information in order to provide fault
1475
00:56:39,610 --> 00:56:42,730
tolerance from cached copies the cached
1476
00:56:42,730 --> 00:56:44,560
copies themselves are in fact copies if
1477
00:56:44,560 --> 00:56:46,990
you have a way to resolve how good they
1478
00:56:46,990 --> 00:56:48,520
are and what they are and can patch a
1479
00:56:48,520 --> 00:56:50,140
solution together it's not necessarily a
1480
00:56:50,140 --> 00:56:52,420
bad solution I mean the alternative
1481
00:56:52,420 --> 00:56:53,950
might be to just not do anything which
1482
00:56:53,950 --> 00:56:57,670
would be a complete disaster right it's
1483
00:56:57,670 --> 00:57:00,520
kind of an interesting not meant but
1484
00:57:00,520 --> 00:57:04,600
possible use of of caching okay right
1485
00:57:04,600 --> 00:57:05,980
now when it comes to servers the
1486
00:57:05,980 --> 00:57:08,890
question is how do changes propagate the
1487
00:57:08,890 --> 00:57:10,720
client is the client and from the
1488
00:57:10,720 --> 00:57:11,770
clients point of view you can think more
1489
00:57:11,770 --> 00:57:13,540
in terms of some sort of caching right
1490
00:57:13,540 --> 00:57:15,780
but with the server and especially when
1491
00:57:15,780 --> 00:57:17,800
database servers are involved right
1492
00:57:17,800 --> 00:57:19,450
things can become a lot a lot
1493
00:57:19,450 --> 00:57:20,680
trickier why because of the consistency
1494
00:57:20,680 --> 00:57:22,630
that needs to be enforced now there is some
1495
00:57:22,630 --> 00:57:24,160
sort of an implicit consistency when it
1496
00:57:24,160 --> 00:57:25,870
comes to the user but somehow everybody
1497
00:57:25,870 --> 00:57:28,300
bends the rules a little bit more right
1498
00:57:28,300 --> 00:57:33,010
so it's some some somewhat
1499
00:57:33,010 --> 00:57:34,510
more acceptable to bend the rules when
1500
00:57:34,510 --> 00:57:36,040
it comes to the client but that's almost
1501
00:57:36,040 --> 00:57:37,390
never acceptable when it comes to the
1502
00:57:37,390 --> 00:57:39,460
servers now when it comes to this kind
1503
00:57:39,460 --> 00:57:41,920
of replication right there are multiple
1504
00:57:41,920 --> 00:57:43,810
things you can actually do and they have
1505
00:57:43,810 --> 00:57:45,820
different properties right I mean this
1506
00:57:45,820 --> 00:57:48,150
is just a schematic that basically
1507
00:57:48,150 --> 00:57:52,800
provides some sort of an adaptive way to
1508
00:57:52,800 --> 00:57:55,780
solve the load problem right so I
1509
00:57:55,780 --> 00:57:57,220
mentioned the initial problem right
1510
00:57:57,220 --> 00:58:00,280
where do you place servers and how do
1511
00:58:00,280 --> 00:58:04,210
you pair them with clients well one
1512
00:58:04,210 --> 00:58:05,830
technique and this turns out to work
1513
00:58:05,830 --> 00:58:07,720
reasonably well is you know what forget
1514
00:58:07,720 --> 00:58:09,550
about trying to be smart from the
1515
00:58:09,550 --> 00:58:11,200
beginning how I place servers what I'm
1516
00:58:11,200 --> 00:58:14,200
gonna do is monitor how the server is
1517
00:58:14,200 --> 00:58:16,690
used when I detect that the server is
1518
00:58:16,690 --> 00:58:19,360
struggling I'm simply gonna initiate
1519
00:58:19,360 --> 00:58:22,500
some process of replication and
1520
00:58:22,500 --> 00:58:24,700
partitioning the set of clients that go
1521
00:58:24,700 --> 00:58:26,590
to the two different copies in
1522
00:58:26,590 --> 00:58:29,740
order to alleviate the load right now of
1523
00:58:29,740 --> 00:58:31,480
course in order for that to work you
1524
00:58:31,480 --> 00:58:32,950
need some mechanism that ensures that
1525
00:58:32,950 --> 00:58:36,210
that actually kicks in right it
1526
00:58:36,210 --> 00:58:39,160
basically and this is what I tell other
1527
00:58:39,160 --> 00:58:40,960
people in other circumstances is nothing
1528
00:58:40,960 --> 00:58:43,360
exists unless somebody builds it right
1529
00:58:43,360 --> 00:58:45,610
so the fact that it would be nice to
1530
00:58:45,610 --> 00:58:47,260
have that doesn't mean it exists and
1531
00:58:47,260 --> 00:58:51,250
it's there so if I mean some people
1532
00:58:51,250 --> 00:58:53,830
build mechanisms like this or sometimes
1533
00:58:53,830 --> 00:58:55,540
you have to initiate them by hand so for
1534
00:58:55,540 --> 00:58:58,390
example if you really want to rent more
1535
00:58:58,390 --> 00:59:01,000
capacity literally some human has to
1536
00:59:01,000 --> 00:59:03,040
determine that hey that's not enough and
1537
00:59:03,040 --> 00:59:04,930
let's add some capacity or maybe build
1538
00:59:04,930 --> 00:59:06,760
some kind of a semi-automatic mechanism
1539
00:59:06,760 --> 00:59:08,680
so this is one such mechanism automatic
1540
00:59:08,680 --> 00:59:10,540
mechanism right so what you could do is
1541
00:59:10,540 --> 00:59:13,270
measure the load that can mean anything
1542
00:59:13,270 --> 00:59:15,820
but maybe something as simple as simply
1543
00:59:15,820 --> 00:59:17,050
count how many clients how many
1544
00:59:17,050 --> 00:59:18,100
simultaneous clients you have at the
1545
00:59:18,100 --> 00:59:21,210
same time so keep some statistics or I
1546
00:59:21,210 --> 00:59:24,340
mean this is something that in my
1547
00:59:24,340 --> 00:59:25,720
opinion should happen all the time and
1548
00:59:25,720 --> 00:59:28,480
it happens almost never every large
1549
00:59:28,480 --> 00:59:30,550
system should monitor itself and monitor
1550
00:59:30,550 --> 00:59:32,950
its environment in order to detect am I
1551
00:59:32,950 --> 00:59:35,590
running into trouble for example as a
1552
00:59:35,590 --> 00:59:37,270
server what could you monitor to know
1553
00:59:37,270 --> 00:59:39,160
you're running into trouble well you can
1554
00:59:39,160 --> 00:59:40,870
look for all the signs that indicate
1555
00:59:40,870 --> 00:59:41,920
you're running into trouble for example
1556
00:59:41,920 --> 00:59:43,330
you can look for CPU utilization and if
1557
00:59:43,330 --> 00:59:44,290
you're all the time at a hundred percent
1558
00:59:44,290 --> 00:59:46,880
you're in trouble you can
1559
00:59:46,880 --> 00:59:49,309
I don't know look for delays to the hard
1560
00:59:49,309 --> 00:59:50,779
drive and if they are very large you're
1561
00:59:50,779 --> 00:59:52,519
in trouble right you can look for
1562
00:59:52,519 --> 00:59:55,880
network congestion right and then you're
1563
00:59:55,880 --> 00:59:56,480
in trouble
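A minimal sketch of such a self-monitoring check, assuming the three signals mentioned here (CPU utilization, disk delays, number of simultaneous clients). The names LoadSample and should_replicate and every threshold value are hypothetical; real limits would have to be tuned for the specific application. Requiring most recent samples to look bad is one way to avoid replicating on a single spike.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    cpu_utilization: float   # fraction between 0.0 and 1.0
    disk_latency_ms: float   # average delay of a disk request
    concurrent_clients: int  # simultaneous clients at sampling time


def should_replicate(samples, cpu_limit=0.95, disk_limit_ms=50.0,
                     client_limit=1000, min_fraction=0.8):
    """Return True when most recent samples show a sign of overload."""
    if not samples:
        return False
    overloaded = sum(
        1 for s in samples
        if s.cpu_utilization >= cpu_limit
        or s.disk_latency_ms >= disk_limit_ms
        or s.concurrent_clients >= client_limit
    )
    # A sustained problem triggers replication; a short spike does not.
    return overloaded / len(samples) >= min_fraction


busy = [LoadSample(0.99, 5.0, 10)] * 10
calm = [LoadSample(0.30, 5.0, 10)] * 10
print(should_replicate(busy))  # sustained 99% CPU
print(should_replicate(calm))  # nothing over a limit
```

How the replica is then created and how clients are split between copies is a separate mechanism that this check only triggers.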
1564
00:59:56,480 --> 00:59:57,799
and all these things can determine you
1565
00:59:57,799 --> 00:59:59,390
to say hey I'm gonna initiate some sort
1566
00:59:59,390 --> 01:00:01,309
of a replication mechanism right and
1567
01:00:01,309 --> 01:00:02,809
then how the replication happens is a
1568
01:00:02,809 --> 01:00:04,579
completely different thing so monitor
1569
01:00:04,579 --> 01:00:07,190
and replicate if you need to this seems
1570
01:00:07,190 --> 01:00:08,599
to be a technique that works reasonably
1571
01:00:08,599 --> 01:00:10,250
well now again it does not produce
1572
01:00:10,250 --> 01:00:13,099
optimal solutions but then I argued at
1573
01:00:13,099 --> 01:00:14,809
the beginning of the class that you
1574
01:00:14,809 --> 01:00:16,309
might not have such a thing as optimal
1575
01:00:16,309 --> 01:00:17,359
solution because you can't measure
1576
01:00:17,359 --> 01:00:21,170
things perfectly so there is a saying in
1577
01:00:21,170 --> 01:00:23,359
the database community is you don't want
1578
01:00:23,359 --> 01:00:24,529
the optimal solution you want good
1579
01:00:24,529 --> 01:00:26,089
enough solutions or you want non
1580
01:00:26,089 --> 01:00:29,210
disastrous solutions right so the
1581
01:00:29,210 --> 01:00:31,640
nightmare scenario is when such
1582
01:00:31,640 --> 01:00:34,069
protocols such algorithms that are doing
1583
01:00:34,069 --> 01:00:37,339
things go into really weird territory
1584
01:00:37,339 --> 01:00:38,990
and start doing very foolish things
1585
01:00:38,990 --> 01:00:40,880
right so for example replicating a
1586
01:00:40,880 --> 01:00:42,289
server for every client that'd be a
1587
01:00:42,289 --> 01:00:45,349
disaster by the way a single line of
1588
01:00:45,349 --> 01:00:47,750
code mistake could easily lead to yeah
1589
01:00:47,750 --> 01:00:49,910
may create a server for every client I'm
1590
01:00:49,910 --> 01:00:51,519
sure it happened to some people right
1591
01:00:51,519 --> 01:00:55,220
this is where defects and bugs come into
1592
01:00:55,220 --> 01:01:03,440
play right all right we talked about one
1593
01:01:03,440 --> 01:01:04,910
particular issue okay so if we've
1594
01:01:04,910 --> 01:01:06,410
replicated the data but the data is
1595
01:01:06,410 --> 01:01:08,089
read-write if it's read-only it's easy
1596
01:01:08,089 --> 01:01:11,150
right we've replicated it maybe we have
1597
01:01:11,150 --> 01:01:12,440
a clever algorithm maybe a dumb
1598
01:01:12,440 --> 01:01:14,119
algorithm to decide where we place
1599
01:01:14,119 --> 01:01:16,190
another replica but then that's kind of
1600
01:01:16,190 --> 01:01:19,130
the only issue right but when we have
1601
01:01:19,130 --> 01:01:22,660
changes these changes somehow need to be
1602
01:01:22,660 --> 01:01:24,710
propagated or the clients have to be
1603
01:01:24,710 --> 01:01:26,839
made aware of such changes right so if
1604
01:01:26,839 --> 01:01:29,390
the client will check maybe but again the
1605
01:01:29,390 --> 01:01:30,920
client connects to one of the servers if
1606
01:01:30,920 --> 01:01:32,000
this server doesn't know about the
1607
01:01:32,000 --> 01:01:36,049
change and the change was initiated in
1608
01:01:36,049 --> 01:01:37,759
another server then you're in trouble so
1609
01:01:37,759 --> 01:01:39,200
the servers themselves have to somehow
1610
01:01:39,200 --> 01:01:41,839
keep the information in sync okay there
1611
01:01:41,839 --> 01:01:43,069
are multiple techniques to do this and
1612
01:01:43,069 --> 01:01:45,109
this is exactly the notion of
1613
01:01:45,109 --> 01:01:46,519
consistency that we talked about before
1614
01:01:46,519 --> 01:01:49,759
right multiple techniques to do such
1615
01:01:49,759 --> 01:01:52,099
things and it really depends on what
1616
01:01:52,099 --> 01:01:55,359
blend you have between reads and writes
1617
01:01:55,359 --> 01:01:57,240
so
1618
01:01:57,240 --> 01:02:00,839
one such technique is called remote
1619
01:02:00,839 --> 01:02:02,460
write protocols in which you have a
1620
01:02:02,460 --> 01:02:05,730
primary copy okay so the idea is
1621
01:02:05,730 --> 01:02:08,599
basically the following you're gonna
1622
01:02:08,599 --> 01:02:13,549
replicate the information essentially on
1623
01:02:13,549 --> 01:02:15,960
on all the servers mostly for the
1624
01:02:15,960 --> 01:02:17,579
purpose of reading but when it comes to
1625
01:02:17,579 --> 01:02:19,260
writing only one server is
1626
01:02:19,260 --> 01:02:21,630
allowed to write and initiate maybe a
1627
01:02:21,630 --> 01:02:23,279
more global write propagation of
1628
01:02:23,279 --> 01:02:25,920
propagation of write but instead of
1629
01:02:25,920 --> 01:02:28,619
saying you can write anywhere for every
1630
01:02:28,619 --> 01:02:31,559
item you are allowed to write it only if
1631
01:02:31,559 --> 01:02:33,390
you are in the home base of the item
1632
01:02:33,390 --> 01:02:35,339
okay now it turns out that that
1633
01:02:35,339 --> 01:02:37,589
simplifies protocols between servers
1634
01:02:37,589 --> 01:02:41,309
right so in particular enforcing a
1635
01:02:41,309 --> 01:02:43,319
notion of consistency is now much easier
1636
01:02:43,319 --> 01:02:46,049
why remember before a lot of the
1637
01:02:46,049 --> 01:02:47,819
problems came from the fact that I write
1638
01:02:47,819 --> 01:02:49,440
in this server and there is a concurrent
1639
01:02:49,440 --> 01:02:51,420
write in another server and then you
1640
01:02:51,420 --> 01:02:54,450
can't put them together but if every
1641
01:02:54,450 --> 01:02:56,700
item has a home base then only one
1642
01:02:56,700 --> 01:02:58,710
server can actually write it all right
1643
01:02:58,710 --> 01:03:00,240
so all the write requests will go to
1644
01:03:00,240 --> 01:03:01,680
that server and that server has to
1645
01:03:01,680 --> 01:03:03,539
initiate any further propagation of the
1646
01:03:03,539 --> 01:03:05,279
information to other servers your only
1647
01:03:05,279 --> 01:03:06,510
issue is gonna be mildly stale
1648
01:03:06,510 --> 01:03:09,450
information in in the other servers but
1649
01:03:09,450 --> 01:03:11,609
you're really not gonna have most of the
1650
01:03:11,609 --> 01:03:13,260
conflict you would otherwise have if you
1651
01:03:13,260 --> 01:03:16,170
have a primary a primary base for where
1652
01:03:16,170 --> 01:03:18,240
the writes happen okay now that
1653
01:03:18,240 --> 01:03:20,190
immediately means that writing it's
1654
01:03:20,190 --> 01:03:23,039
gonna be a lot more expensive than
1655
01:03:23,039 --> 01:03:24,990
reading and a big question for the
1656
01:03:24,990 --> 01:03:26,730
client is the following so the client
1657
01:03:26,730 --> 01:03:28,140
initiates the write I mean the client
1658
01:03:28,140 --> 01:03:29,599
does something that initiates the write
1659
01:03:29,599 --> 01:03:32,130
when can you let the client continue
1660
01:03:32,130 --> 01:03:34,380
right now this is a big issue in general
1661
01:03:34,380 --> 01:03:37,500
right when it comes to asking any other
1662
01:03:37,500 --> 01:03:39,779
entity to do some activity for you and
1663
01:03:39,779 --> 01:03:41,339
it's the discussion we had throughout
1664
01:03:41,339 --> 01:03:42,990
the class synchronous versus
1665
01:03:42,990 --> 01:03:44,789
asynchronous operations right
1666
01:03:44,789 --> 01:03:46,859
asynchronous means I asked you to do
1667
01:03:46,859 --> 01:03:48,420
something and I go immediately and do
1668
01:03:48,420 --> 01:03:49,829
something else and you're eventually
1669
01:03:49,829 --> 01:03:52,079
gonna do it and whatever synchronous
1670
01:03:52,079 --> 01:03:54,029
means I'll wait until I'm sure that
1671
01:03:54,029 --> 01:03:55,349
you've done what I asked you to do
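A minimal in-memory sketch of a primary-copy (remote-write) scheme with both choices for the client, assuming no failures and no network. The classes Replica and Primary, the pending queue, and the flush method are hypothetical names used only for illustration. A synchronous write returns after every backup has the value; an asynchronous write returns at once and leaves the backups mildly stale until the primary pushes the update.

```python
class Replica:
    """A server that holds a readable copy of the data."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary(Replica):
    """The home base: the only server that accepts writes."""

    def __init__(self, backups):
        super().__init__()
        self.backups = backups
        self.pending = []  # updates accepted but not yet pushed to backups

    def write(self, key, value, synchronous=True):
        self.apply(key, value)
        if synchronous:
            # The caller continues only after every backup has the update.
            for backup in self.backups:
                backup.apply(key, value)
        else:
            # The caller continues immediately; propagation happens later.
            self.pending.append((key, value))

    def flush(self):
        """Push all queued asynchronous updates to the backups."""
        for key, value in self.pending:
            for backup in self.backups:
                backup.apply(key, value)
        self.pending.clear()


backups = [Replica(), Replica()]
primary = Primary(backups)

primary.write("x", 1)                     # synchronous
primary.write("y", 2, synchronous=False)  # asynchronous
stale_read = backups[0].data.get("y")     # backup has not seen "y" yet
primary.flush()
fresh_read = backups[0].data.get("y")
print(stale_read, fresh_read)
```

A real system would also need a way to tell the client later that an asynchronous write failed, which this sketch leaves out.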
1672
01:03:55,349 --> 01:03:58,760
partially or completely right now
1673
01:03:58,760 --> 01:04:01,980
synchronous writing synchronous programs
1674
01:04:01,980 --> 01:04:03,990
it's much easier when it comes to
1675
01:04:03,990 --> 01:04:05,400
reasoning about what goes on in the
1676
01:04:05,400 --> 01:04:10,020
system for example when I ask the system
1677
01:04:10,020 --> 01:04:10,890
to
1678
01:04:10,890 --> 01:04:14,600
to read something from the disk right
1679
01:04:14,600 --> 01:04:17,820
it's much easier to say read and when I
1680
01:04:17,820 --> 01:04:20,340
get back the control that I would get
1681
01:04:20,340 --> 01:04:21,750
for example in a synchronous system I
1682
01:04:21,750 --> 01:04:24,600
know that what I asked the operating
1683
01:04:24,600 --> 01:04:26,250
system or the server whatever to read is
1684
01:04:26,250 --> 01:04:28,950
there because I waited for it but now
1685
01:04:28,950 --> 01:04:30,750
imagine that you would write code in
1686
01:04:30,750 --> 01:04:32,370
which you say read but the read will
1687
01:04:32,370 --> 01:04:33,600
come in the future but immediately
1688
01:04:33,600 --> 01:04:34,710
you have control back what would you do
1689
01:04:34,710 --> 01:04:36,120
with it you already dealt with some
1690
01:04:36,120 --> 01:04:37,200
of these issues if you use for example
1691
01:04:37,200 --> 01:04:40,320
futures in Scala right so it can be
1692
01:04:40,320 --> 01:04:42,360
done but it's much harder to think about
1693
01:04:42,360 --> 01:04:44,430
the dynamic of the system it's the same
1694
01:04:44,430 --> 01:04:46,110
thing here so when you're saying write
1695
01:04:46,110 --> 01:04:49,440
the big question for the client is do
1696
01:04:49,440 --> 01:04:51,930
you wait until the write got propagated
1697
01:04:51,930 --> 01:04:53,280
everywhere and that you're sure that the
1698
01:04:53,280 --> 01:04:56,640
write happened or do you go immediately
1699
01:04:56,640 --> 01:04:58,950
and do something else but then you must
1700
01:04:58,950 --> 01:05:01,350
have a mechanism to notify you later
1701
01:05:01,350 --> 01:05:03,000
that the write might have failed there
1702
01:05:03,000 --> 01:05:04,170
are many reasons that the write could
1703
01:05:04,170 --> 01:05:07,170
actually have failed right and this is
1704
01:05:07,170 --> 01:05:08,520
one of the choices that have to be made
1705
01:05:08,520 --> 01:05:09,900
with these kinds of systems and it really
1706
01:05:09,900 --> 01:05:11,550
depends on what consistency model you're
1707
01:05:11,550 --> 01:05:14,250
trying to enforce right now a big issue
1708
01:05:14,250 --> 01:05:16,140
always when you have things like home
1709
01:05:16,140 --> 01:05:19,260
bases is fault tolerance the primary copy
1710
01:05:19,260 --> 01:05:21,840
goes down then what so you have to
1711
01:05:21,840 --> 01:05:24,090
augment this with mechanisms in which
1712
01:05:24,090 --> 01:05:26,370
you decide who's the primary copy based
1713
01:05:26,370 --> 01:05:28,230
on different protocols and we talked
1714
01:05:28,230 --> 01:05:29,340
already about the leader election
1715
01:05:29,340 --> 01:05:30,630
protocols right I'm going to talk more
1716
01:05:30,630 --> 01:05:35,280
about this right so that's one way to do
1717
01:05:35,280 --> 01:05:37,470
things again don't look for a silver
1718
01:05:37,470 --> 01:05:38,910
bullet here there's no such thing there
1719
01:05:38,910 --> 01:05:40,230
are solutions that have different
1720
01:05:40,230 --> 01:05:41,790
properties different compromises so
1721
01:05:41,790 --> 01:05:44,430
different kind of compromise would be
1722
01:05:44,430 --> 01:05:47,400
done by this local write protocol local
1723
01:05:47,400 --> 01:05:48,930
write protocol you're gonna write
1724
01:05:48,930 --> 01:05:51,300
locally right but essentially what you
1725
01:05:51,300 --> 01:05:53,070
what you're gonna do to stay sane I mean
1726
01:05:53,070 --> 01:05:54,420
everybody can write locally and then you
1727
01:05:54,420 --> 01:05:55,800
have to somehow resolve the conflicts
1728
01:05:55,800 --> 01:05:57,120
and that can have its own set of
1729
01:05:57,120 --> 01:05:58,890
problems but what you could do and this
1730
01:05:58,890 --> 01:06:00,270
is similar to those token based protocols
1731
01:06:00,270 --> 01:06:02,520
is to get permission to own the
1732
01:06:02,520 --> 01:06:08,010
local copy to have the right the
1733
01:06:08,010 --> 01:06:09,630
right to write this is the different
1734
01:06:09,630 --> 01:06:12,090
rights right on your local machine right
1735
01:06:12,090 --> 01:06:13,770
you say temporarily I'm the one that
1736
01:06:13,770 --> 01:06:15,210
owns this resource then I can do any
1737
01:06:15,210 --> 01:06:16,710
writes I want to and then later I'm gonna
1738
01:06:16,710 --> 01:06:19,370
give the ownership to somebody else okay
1739
01:06:19,370 --> 01:06:22,620
this could work and work nicely
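A minimal sketch of this local-write idea, assuming a single directory that remembers which site currently holds the write privilege. Site, OwnershipRegistry, acquire, and write are hypothetical names for illustration, and failures are ignored. Taking the privilege costs one transfer of the latest copy; after that, all writes are purely local.

```python
class Site:
    """A machine that holds a local copy of the data."""

    def __init__(self, name):
        self.name = name
        self.copy = {}


class OwnershipRegistry:
    """Tracks which site currently owns the write privilege."""

    def __init__(self, first_owner):
        self.owner = first_owner

    def acquire(self, site):
        """Move the write privilege, and the latest copy, to `site`."""
        if site is not self.owner:
            site.copy = dict(self.owner.copy)  # bring the newest state over
            self.owner = site

    def write(self, site, key, value):
        """Write locally; only the current owner is allowed to do so."""
        if site is not self.owner:
            raise PermissionError(f"{site.name} does not own the write privilege")
        site.copy[key] = value


laptop, desktop = Site("laptop"), Site("desktop")
registry = OwnershipRegistry(laptop)

registry.write(laptop, "report", "v1")   # fast local write on the owner
registry.acquire(desktop)                # move ownership to the new machine
registry.write(desktop, "report", "v2")  # now desktop writes locally
print(laptop.copy, desktop.copy)
```

After the hand-off the old owner still holds the older version, and if the current owner is lost before passing its copy on, the newest writes go with it.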
1740
01:06:22,620 --> 01:06:24,360
so essentially you say
1741
01:06:24,360 --> 01:06:27,240
if you think about it right even though
1742
01:06:27,240 --> 01:06:29,760
I use multiple clients if I have this
1743
01:06:29,760 --> 01:06:31,440
kind of local write protocol this might
1744
01:06:31,440 --> 01:06:34,770
work out reasonably well right so when I
1745
01:06:34,770 --> 01:06:37,170
move to a new machine I have to find out
1746
01:06:37,170 --> 01:06:41,100
who has the who's allowed to write and
1747
01:06:41,100 --> 01:06:43,980
say hey I want to grab that privilege
1748
01:06:43,980 --> 01:06:45,210
from you which means you don't have it
1749
01:06:45,210 --> 01:06:47,460
and I have it I might have a little bit
1750
01:06:47,460 --> 01:06:49,530
of mildly tedious protocol to do so but
1751
01:06:49,530 --> 01:06:52,080
beyond that point I can run very fast
1752
01:06:52,080 --> 01:06:53,910
because I'm just gonna write locally of
1753
01:06:53,910 --> 01:06:55,880
course the fault tolerance now suffers
1754
01:06:55,880 --> 01:06:59,700
right which again might or might not be
1755
01:06:59,700 --> 01:07:01,950
a big a big issue for some of the
1756
01:07:01,950 --> 01:07:04,320
activities you do okay so again trade-offs
1757
01:07:04,320 --> 01:07:09,690
trade-offs now when it comes to
1758
01:07:09,690 --> 01:07:12,300
writes so I mean there is a vast literature
1759
01:07:12,300 --> 01:07:15,510
on these issues right because things can
1760
01:07:15,510 --> 01:07:18,630
be very very complicated depending on
1761
01:07:18,630 --> 01:07:19,980
the consistency model you're trying to
1762
01:07:19,980 --> 01:07:21,840
enforce so one of the one of the things
1763
01:07:21,840 --> 01:07:23,580
that you could try to do is the
1764
01:07:23,580 --> 01:07:24,630
following so if you have multiple
1765
01:07:24,630 --> 01:07:31,020
servers then one way to consistently
1766
01:07:31,020 --> 01:07:33,300
propagate this write right is to be
1767
01:07:33,300 --> 01:07:34,740
careful who's allowed to write at what
1768
01:07:34,740 --> 01:07:36,060
moment of time and who's allowed to read
1769
01:07:36,060 --> 01:07:42,120
at what moment of time okay so one such
1770
01:07:42,120 --> 01:07:43,980
solution is so-called quorum based
1771
01:07:43,980 --> 01:07:44,760
protocols
1772
01:07:44,760 --> 01:07:46,260
so the quorum based protocols are
1773
01:07:46,260 --> 01:07:48,840
protocols in which you have multiple
1774
01:07:48,840 --> 01:07:51,750
participants and they all have to
1775
01:07:51,750 --> 01:07:53,820
participate in the act of either reading
1776
01:07:53,820 --> 01:07:55,920
or writing depending on how you actually
1777
01:07:55,920 --> 01:07:58,710
do things and of particular concern in
1778
01:07:58,710 --> 01:08:00,720
this circumstance is to make sure that
1779
01:08:00,720 --> 01:08:02,670
if somebody writes nobody can read and
1780
01:08:02,670 --> 01:08:04,590
or at least everybody's read copy is
1781
01:08:04,590 --> 01:08:07,200
invalidated right so then one such
1782
01:08:07,200 --> 01:08:08,730
possible solution so this is a solution
1783
01:08:08,730 --> 01:08:11,160
that's not good and this is basically a
1784
01:08:11,160 --> 01:08:16,620
solution that's good right so one one
1785
01:08:16,620 --> 01:08:19,529
way to make sure for example that you
1786
01:08:19,529 --> 01:08:22,170
invalidate everybody else's copy is to
1787
01:08:22,170 --> 01:08:24,779
basically say I need a let's say N
1788
01:08:24,779 --> 01:08:28,109
tokens to do any read or write in the
1789
01:08:28,109 --> 01:08:31,290
system as a client right now to do a
1790
01:08:31,290 --> 01:08:33,450
read I might and this is the situation
1791
01:08:33,450 --> 01:08:34,979
with a circle here right
1792
01:08:34,979 --> 01:08:37,618
if I get one token I can read
1793
01:08:37,618 --> 01:08:38,969
but in order to write I need all the
1794
01:08:38,969 --> 01:08:40,859
tokens now I mean of course the question
1795
01:08:40,859 --> 01:08:42,210
is you can come up with any protocol you
1796
01:08:42,210 --> 01:08:44,279
want but the question is does it enforce
1797
01:08:44,279 --> 01:08:46,770
the consistency model that you're
1798
01:08:46,770 --> 01:08:48,929
looking for right now it turns out that
1799
01:08:48,929 --> 01:08:52,198
this one will make sure that if anybody
1800
01:08:52,198 --> 01:08:54,210
writes nobody can read and you cannot
1801
01:08:54,210 --> 01:08:57,238
write if anybody reads is replicate but
1802
01:08:57,238 --> 01:08:59,238
notice that it's optimized for the reads
1803
01:08:59,238 --> 01:09:03,149
so if a single token is missing you
1804
01:09:03,149 --> 01:09:04,770
cannot write you have to wait until you
1805
01:09:04,770 --> 01:09:08,310
acquire all such tokens right clearly if
1806
01:09:08,310 --> 01:09:09,929
you have all the tokens nobody can read
1807
01:09:09,929 --> 01:09:13,649
right so it you can just reason through
1808
01:09:13,649 --> 01:09:15,569
the scenarios and you can see that well
1809
01:09:15,569 --> 01:09:19,049
the reads and the writes are not
1810
01:09:19,049 --> 01:09:20,729
going to be wrong of course the problem
1811
01:09:20,729 --> 01:09:23,969
is starvation for the writer right it
1812
01:09:23,969 --> 01:09:25,560
might take a long time for the writer to
1813
01:09:25,560 --> 01:09:27,509
acquire all the tokens in order to be
1814
01:09:27,509 --> 01:09:30,179
able to produce that write operation
1815
01:09:30,179 --> 01:09:33,139
okay yes
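The token rule just described (one token to read, all N tokens to write) can be sketched as follows. This is only an illustration under assumptions: the class `TokenManager` and its methods are hypothetical names, and the tokens are kept by one centralized manager that hands them out all-or-nothing.

```python
import threading


class TokenManager:
    """Hypothetical centralized manager: a read needs 1 token, a write needs all N."""

    def __init__(self, n_tokens):
        self.n_tokens = n_tokens
        self.free = n_tokens
        self._lock = threading.Lock()

    def try_acquire(self, count):
        """Atomically take `count` tokens, or take nothing and return False."""
        with self._lock:
            if self.free >= count:
                self.free -= count
                return True
            return False

    def release(self, count):
        """Give `count` tokens back to the pool."""
        with self._lock:
            self.free += count

    def try_read(self):
        return self.try_acquire(1)

    def try_write(self):
        return self.try_acquire(self.n_tokens)


mgr = TokenManager(3)
reader_ok = mgr.try_read()    # True: the reader holds one token
writer_ok = mgr.try_write()   # False: one token is missing, the writer must wait
mgr.release(1)                # the reader finishes
writer_ok = mgr.try_write()   # True: the writer now holds every token
late_reader = mgr.try_read()  # False: nobody can read during the write
mgr.release(3)
```

Handing out tokens all-or-nothing sidesteps writers that each hold part of the set, but it does nothing about writer starvation: a steady stream of readers can keep at least one token busy indefinitely.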
1816
01:09:42,038 --> 01:09:47,029
right so you can go so you can go into
1817
01:09:47,029 --> 01:09:51,229
such protocols and refine them now one
1818
01:09:51,229 --> 01:09:54,400
benefit of having separate so you see
1819
01:09:54,400 --> 01:09:56,659
multiple readers are great but then you
1820
01:09:56,659 --> 01:09:58,280
need the mechanism to keep track of how
1821
01:09:58,280 --> 01:10:00,170
many such readers exist or don't exist
1822
01:10:00,170 --> 01:10:01,999
right so that's a choice in itself
1823
01:10:01,999 --> 01:10:04,909
whether you allow multiple reads or you
1824
01:10:04,909 --> 01:10:06,499
don't allow multiple reads to happen and
1825
01:10:06,499 --> 01:10:08,599
also what exactly are these tokens
1826
01:10:08,599 --> 01:10:10,099
and where are they and how do you keep
1827
01:10:10,099 --> 01:10:11,690
track of them well one possible solution
1828
01:10:11,690 --> 01:10:13,999
is to have a single dedicated if you
1829
01:10:13,999 --> 01:10:15,199
want centralized server that just
1830
01:10:15,199 --> 01:10:17,119
manages these tokens it's very similar
1831
01:10:17,119 --> 01:10:20,300
to the centralized solution that we had
1832
01:10:20,300 --> 01:10:23,300
for the for the locking right and I mean
1833
01:10:23,300 --> 01:10:24,530
that's another way to do it you can you
1834
01:10:24,530 --> 01:10:26,300
can just use read locks and write locks
1835
01:10:26,300 --> 01:10:36,249
in a centralized solution yes
1836
01:10:52,320 --> 01:10:55,900
right so many many issues with this and
1837
01:10:55,900 --> 01:10:57,790
so this is what I'm trying to say with
1838
01:10:57,790 --> 01:10:59,320
these protocols you have to look at the
1839
01:10:59,320 --> 01:11:01,330
tiny details I mean if such a protocol
1840
01:11:01,330 --> 01:11:03,490
is published in a research paper almost
1841
01:11:03,490 --> 01:11:05,320
surely they show that it doesn't run
1842
01:11:05,320 --> 01:11:06,640
into this kind of issue so the issue is
1843
01:11:06,640 --> 01:11:08,380
here deadlock right so the deadlock
1844
01:11:08,380 --> 01:11:10,870
refers to the fact that if two writers
1845
01:11:10,870 --> 01:11:12,550
start at the same time they acquire some
1846
01:11:12,550 --> 01:11:14,620
of the tokens
1847
01:11:14,620 --> 01:11:16,360
and then none of them can make progress
1848
01:11:16,360 --> 01:11:17,650
because they are waiting for the other
1849
01:11:17,650 --> 01:11:20,290
guy's tokens too to be acquired right and
1850
01:11:20,290 --> 01:11:22,630
that's it you're stuck so you must
1851
01:11:22,630 --> 01:11:25,870
prevent these kinds of things from
1852
01:11:25,870 --> 01:11:28,180
happening right I mean what kind of
1853
01:11:28,180 --> 01:11:29,680
mechanisms could you have to do this
1854
01:11:29,680 --> 01:11:32,050
I mean one of them is to basically say
1855
01:11:32,050 --> 01:11:33,460
if you don't acquire all your tokens
1856
01:11:33,460 --> 01:11:34,480
within a certain amount of time you
1857
01:11:34,480 --> 01:11:36,070
release them that might or might not
1858
01:11:36,070 --> 01:11:37,780
work I mean all of these things could
1859
01:11:37,780 --> 01:11:39,400
potentially be problematic depending on
1860
01:11:39,400 --> 01:11:41,710
what is it that the token is I mean does
1861
01:11:41,710 --> 01:11:43,450
it require a certain message exchange or
1862
01:11:43,450 --> 01:11:46,950
doesn't require certain message exchange
1863
01:12:01,780 --> 01:12:05,510
well so what I want to point out I mean
1864
01:12:05,510 --> 01:12:07,579
the details don't matter that much okay
1865
01:12:07,579 --> 01:12:09,739
because you literally have thousands of
1866
01:12:09,739 --> 01:12:11,150
these protocols proposed I mean what am
1867
01:12:11,150 --> 01:12:12,260
I gonna do walk through all of them
1868
01:12:12,260 --> 01:12:14,719
right what matters is to be aware of
1869
01:12:14,719 --> 01:12:16,489
issues so one big issue with anything
1870
01:12:16,489 --> 01:12:18,709
that requires tokens or locks or things
1871
01:12:18,709 --> 01:12:21,199
of this sort is deadlocks right so
1872
01:12:21,199 --> 01:12:23,179
that's a tremendously big issue when it
1873
01:12:23,179 --> 01:12:24,590
comes to any kind of synchronization and
1874
01:12:24,590 --> 01:12:27,169
by the way that so there are two things
1875
01:12:27,169 --> 01:12:29,570
that eat you alive when you do parallel
1876
01:12:29,570 --> 01:12:33,320
processing right one of them is race
1877
01:12:33,320 --> 01:12:35,239
conditions you forgot to lock and you
1878
01:12:35,239 --> 01:12:37,669
access resources at the same time and
1879
01:12:37,669 --> 01:12:39,439
the other one is you locked enough to
1880
01:12:39,439 --> 01:12:43,099
deadlock right either of which are
1881
01:12:43,099 --> 01:12:45,439
completely disastrous right then they
1882
01:12:45,439 --> 01:12:47,479
have different ways of being disasters
1883
01:12:47,479 --> 01:12:49,400
but they're completely disastrous and by
1884
01:12:49,400 --> 01:12:51,380
the way even things that you might think
1885
01:12:51,380 --> 01:12:53,449
should not ever run into trouble do
1886
01:12:53,449 --> 01:12:55,969
occasionally run into trouble right in
1887
01:12:55,969 --> 01:12:57,949
particular for example the operating
1888
01:12:57,949 --> 01:13:00,019
system itself is in fact a mini
1889
01:13:00,019 --> 01:13:01,550
distributed system it needs to use
1890
01:13:01,550 --> 01:13:03,679
synchronization because even if you
1891
01:13:03,679 --> 01:13:04,969
don't have multi-threaded applications
1892
01:13:04,969 --> 01:13:06,559
running multiple processes is in fact
1893
01:13:06,559 --> 01:13:09,439
some sort of a multi-threaded activity
1894
01:13:09,439 --> 01:13:12,349
right it can actually happen and this is
1895
01:13:12,349 --> 01:13:15,019
for example what plagued operating
1896
01:13:15,019 --> 01:13:16,939
systems for a long time and this is
1897
01:13:16,939 --> 01:13:18,199
where the blue screen of death comes
1898
01:13:18,199 --> 01:13:21,409
into play and other such oops for
1899
01:13:21,409 --> 01:13:24,610
example in the Linux kernel in which
1900
01:13:24,610 --> 01:13:27,679
well I mean they're all humans right so
1901
01:13:27,679 --> 01:13:30,380
they wrote code and they were convinced
1902
01:13:30,380 --> 01:13:31,969
it's okay but at some point somebody
1903
01:13:31,969 --> 01:13:33,829
needed multiple locks to do an
1904
01:13:33,829 --> 01:13:35,719
operation acquired one of them was
1905
01:13:35,719 --> 01:13:37,039
waiting for the other one some other
1906
01:13:37,039 --> 01:13:40,939
application had the lock and requires
1907
01:13:40,939 --> 01:13:42,919
the second lock and none of them could
1908
01:13:42,919 --> 01:13:44,239
make progress because it waits for the
1909
01:13:44,239 --> 01:13:46,309
other one to release the lock right and
1910
01:13:46,309 --> 01:13:51,229
then what well the it's actually you
1911
01:13:51,229 --> 01:13:52,969
have a deadlock that means that resource
1912
01:13:52,969 --> 01:13:54,499
is completely locked out and nobody can
1913
01:13:54,499 --> 01:13:56,119
ever access it until you reboot the
1914
01:13:56,119 --> 01:13:57,829
operating system the Linux kernel for
1915
01:13:57,829 --> 01:14:00,199
example even has a I mean the ps
1916
01:14:00,199 --> 01:14:02,630
program in Linux or most UNIX operating
1917
01:14:02,630 --> 01:14:05,419
systems has a special symbol for this is
1918
01:14:05,419 --> 01:14:07,999
D which means it's deadlocked by the way
1919
01:14:07,999 --> 01:14:10,280
if you see any process that has a D next
1920
01:14:10,280 --> 01:14:10,660
to
1921
01:14:10,660 --> 01:14:12,910
that essentially means maybe by a
1922
01:14:12,910 --> 01:14:14,800
miracle somehow something happens in the
1923
01:14:14,800 --> 01:14:16,840
future and gets rid of it but if not
1924
01:14:16,840 --> 01:14:19,180
next time after you freshly reboot the
1925
01:14:19,180 --> 01:14:20,410
system is going to go away there is no
1926
01:14:20,410 --> 01:14:21,520
other way it's gonna go away
1927
01:14:21,520 --> 01:14:24,610
and if it's about a very important
1928
01:14:24,610 --> 01:14:26,350
resource that could essentially mean
1929
01:14:26,350 --> 01:14:28,330
you're forced to reboot the operating
1930
01:14:28,330 --> 01:14:31,870
system right these deadlocks do still
1931
01:14:31,870 --> 01:14:33,940
happen even in things that are so
1932
01:14:33,940 --> 01:14:36,070
pounded on for such a long time like
1933
01:14:36,070 --> 01:14:38,760
operating systems when it comes to user
1934
01:14:38,760 --> 01:14:41,290
user programs you can have lots of these
1935
01:14:41,290 --> 01:14:45,120
things right yes
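The two-lock scenario above, where each side holds one lock and waits for the other, goes away if everybody takes the locks in the same global order. The sketch below assumes that ordering technique, which the lecture does not spell out here; `acquire_in_order`, `release_all` and `worker` are hypothetical helper names.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
counter = 0


def acquire_in_order(*locks):
    """Acquire the locks sorted by id() so every thread uses one global order."""
    ordered = sorted(locks, key=id)
    for lk in ordered:
        lk.acquire()
    return ordered


def release_all(ordered):
    """Release in reverse order of acquisition."""
    for lk in reversed(ordered):
        lk.release()


def worker(first, second, rounds):
    """Needs both locks for every update, whatever order the caller names them in."""
    global counter
    for _ in range(rounds):
        held = acquire_in_order(first, second)
        try:
            counter += 1
        finally:
            release_all(held)


# The two threads name the locks in opposite orders, which is the classic
# recipe for deadlock if each simply acquired `first` and then `second`.
t1 = threading.Thread(target=worker, args=(lock_a, lock_b, 1000))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, 1000))
t1.start()
t2.start()
t1.join()
t2.join()
```

The price is the one mentioned in the lecture for prevention in general: a fixed order can force a thread to wait for a lock it does not need yet.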
1936
01:14:51,649 --> 01:14:56,329
right so I mean look look look for
1937
01:14:56,329 --> 01:15:00,349
everything right you can be very careful
1938
01:15:00,349 --> 01:15:02,510
and very safe but slow or very fast
1939
01:15:02,510 --> 01:15:04,669
and then run into some kind of trouble or
1940
01:15:04,669 --> 01:15:08,809
have solutions that I mean in I don't
1941
01:15:08,809 --> 01:15:09,979
want to go too much into these details
1942
01:15:09,979 --> 01:15:11,959
because they are discussed in an
1943
01:15:11,959 --> 01:15:13,699
undergrad operating systems class right I mean
1944
01:15:13,699 --> 01:15:15,649
I'm sure you took such a class most of
1945
01:15:15,649 --> 01:15:17,360
you and you had all that deadlock
1946
01:15:17,360 --> 01:15:19,610
prevention and deadlock detection right
1947
01:15:19,610 --> 01:15:21,800
deadlock detection is very
1948
01:15:21,800 --> 01:15:23,599
costly
1949
01:15:23,599 --> 01:15:27,289
deadlock prevention it robs you of
1950
01:15:27,289 --> 01:15:28,760
situations where you could have run in
1951
01:15:28,760 --> 01:15:29,840
parallel so you have to give up
1952
01:15:29,840 --> 01:15:33,919
something right and by the way you might
1953
01:15:33,919 --> 01:15:37,369
prevent deadlocks but have something
1954
01:15:37,369 --> 01:15:39,409
called livelocks in which you are still not
1955
01:15:39,409 --> 01:15:40,219
making progress
1956
01:15:40,219 --> 01:15:42,169
you're just doing busy work alright so
1957
01:15:42,169 --> 01:15:44,769
for example a livelock here would be
1958
01:15:44,769 --> 01:15:47,179
right you try to acquire tokens for the
1959
01:15:47,179 --> 01:15:49,760
write but if you cannot acquire enough
1960
01:15:49,760 --> 01:15:51,469
tokens within one second you give up and
1961
01:15:51,469 --> 01:15:53,449
start again but you see that still
1962
01:15:53,449 --> 01:15:55,129
doesn't guarantee that anybody will go
1963
01:15:55,129 --> 01:15:57,800
through and do anything why because if
1964
01:15:57,800 --> 01:15:59,749
you have a bunch of these aggressive guys
1965
01:15:59,749 --> 01:16:01,280
that want to do this they wait the
1966
01:16:01,280 --> 01:16:02,899
second and again go and steal some of
1967
01:16:02,899 --> 01:16:04,340
the tokens and again they give up and
1968
01:16:04,340 --> 01:16:05,840
steal tokens and again they give up and
1969
01:16:05,840 --> 01:16:09,619
steal tokens yeah but right right right
1970
01:16:09,619 --> 01:16:11,989
so you have to maybe but none of these
1971
01:16:11,989 --> 01:16:14,419
things are really guaranteed to really
1972
01:16:14,419 --> 01:16:16,010
really work a bit okay random
1973
01:16:16,010 --> 01:16:17,479
essentially means giving up performance
1974
01:16:17,479 --> 01:16:19,939
by the way all right so random restarts
1975
01:16:19,939 --> 01:16:21,679
which are used for wireless access most
1976
01:16:21,679 --> 01:16:22,849
of the time when you have collisions on
1977
01:16:22,849 --> 01:16:24,709
the channel right here so one solution
1978
01:16:24,709 --> 01:16:26,539
is to do this exponential
1979
01:16:26,539 --> 01:16:28,280
backoff right you start with one second
1980
01:16:28,280 --> 01:16:29,629
but if I don't succeed I go to two
1981
01:16:29,629 --> 01:16:31,579
seconds four seconds right ramp it up
1982
01:16:31,579 --> 01:16:35,419
exponentially you can actually run in
1983
01:16:35,419 --> 01:16:39,769
it's clear that somebody at some point
1984
01:16:39,769 --> 01:16:41,510
succeeds because you keep on going
1985
01:16:41,510 --> 01:16:43,489
exponentially backing off in time I mean you
1986
01:16:43,489 --> 01:16:44,809
can even do some sorts of analyses
1987
01:16:44,809 --> 01:16:46,369
depending on how many guys are trying to
1988
01:16:46,369 --> 01:16:49,010
jump on it and how fast on expectation
1989
01:16:49,010 --> 01:16:51,280
when you would you would do things but
1990
01:16:51,280 --> 01:16:54,289
keep in mind that if this is part of the
1991
01:16:54,289 --> 01:16:55,969
things you do all the time you might be
1992
01:16:55,969 --> 01:16:57,829
robbed of a tremendous amount of
1993
01:16:57,829 --> 01:17:01,760
performance to do this right so if for
1994
01:17:01,760 --> 01:17:03,590
every single item that you're accessing
1995
01:17:03,590 --> 01:17:05,690
you need to do this exponential backoff
1996
01:17:05,690 --> 01:17:07,160
it's potentially a disaster I mean
1997
01:17:07,160 --> 01:17:08,360
you're not doing anything except running
1998
01:17:08,360 --> 01:17:10,880
this exponential backoff
1999
01:17:10,880 --> 01:17:14,150
algorithms right so when it comes to
2000
01:17:14,150 --> 01:17:17,180
selecting such so okay let me back up
2001
01:17:17,180 --> 01:17:17,960
even more
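The retry scheme discussed above (start at one second, then two, then four, optionally randomized) can be sketched like this. The function names, the default values and the injected `try_acquire` and `sleep` callables are assumptions made for illustration, not part of any protocol named in the lecture.

```python
import random


def backoff_delays(base=1.0, factor=2.0, max_retries=5, jitter=True, rng=random):
    """Yield the wait before each retry: base, base*factor, base*factor**2, ...

    With jitter=True each wait is drawn uniformly from [0, current delay],
    which keeps competing writers from retrying in lockstep.
    """
    delay = base
    for _ in range(max_retries):
        yield rng.uniform(0, delay) if jitter else delay
        delay *= factor


def acquire_with_backoff(try_acquire, sleep, **kwargs):
    """Call try_acquire() until it succeeds or the retries run out."""
    if try_acquire():
        return True
    for wait in backoff_delays(**kwargs):
        sleep(wait)
        if try_acquire():
            return True
    return False


# Deterministic demo: the attempt succeeds on the third call and the
# waits are recorded instead of actually sleeping.
attempts = iter([False, False, True])
waits = []
succeeded = acquire_with_backoff(lambda: next(attempts), waits.append, jitter=False)
```

The waiting time is exactly the performance being given up: if every access has to go through this loop, most of the elapsed time is spent sleeping.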
2002
01:17:17,960 --> 01:17:20,360
I want you mostly to be aware of this
2003
01:17:20,360 --> 01:17:22,040
kind of things the fact that it's tricky
2004
01:17:22,040 --> 01:17:24,110
to get such a protocol running right
2005
01:17:24,110 --> 01:17:26,780
that you have to ask hard questions
2006
01:17:26,780 --> 01:17:28,760
about what is it that's going on and
2007
01:17:28,760 --> 01:17:31,730
does it fit the specific application I
2008
01:17:31,730 --> 01:17:33,410
have in mind because some of them can
2009
01:17:33,410 --> 01:17:34,850
work exceptionally well under certain
2010
01:17:34,850 --> 01:17:36,620
circumstances and be disasters under
2011
01:17:36,620 --> 01:17:38,660
other circumstances for example under
2012
01:17:38,660 --> 01:17:40,370
rare writes this will help this will
2013
01:17:40,370 --> 01:17:42,410
actually work nicely with very
2014
01:17:42,410 --> 01:17:43,760
aggressive writes this could actually be
2015
01:17:43,760 --> 01:17:47,510
a disaster right but it's like this for
2016
01:17:47,510 --> 01:17:49,850
everything else in this class right all
2017
01:17:49,850 --> 01:17:52,130
of them have good points and not so good
2018
01:17:52,130 --> 01:17:54,340
points depending on the circumstances so
2019
01:17:54,340 --> 01:17:56,540
when it comes to using distributed
2020
01:17:56,540 --> 01:17:58,880
systems the main question is which
2021
01:17:58,880 --> 01:18:01,900
compromise fits my application really
2022
01:18:01,900 --> 01:18:04,820
right out of all the compromises that
2023
01:18:04,820 --> 01:18:07,250
exist as opposed to knowing what the
2024
01:18:07,250 --> 01:18:10,760
good solution is in certain areas it's
2025
01:18:10,760 --> 01:18:12,140
known what the good solution is and that
2026
01:18:12,140 --> 01:18:13,310
that's it right you can prove that
2027
01:18:13,310 --> 01:18:15,290
that's the best solution and that that's
2028
01:18:15,290 --> 01:18:17,210
the end of the story right but it's not
2029
01:18:17,210 --> 01:18:19,520
the case here for example what's the
2030
01:18:19,520 --> 01:18:21,140
best sorting algorithm you know that's
2031
01:18:21,140 --> 01:18:23,210
murky now right it used to be the
2032
01:18:23,210 --> 01:18:24,680
case that the answer was easy yeah a
2033
01:18:24,680 --> 01:18:26,540
quicksort but it's not so easy anymore
2034
01:18:26,540 --> 01:18:29,030
but then ask yourself what's the penalty
2035
01:18:29,030 --> 01:18:31,040
of not picking the right algorithm well
2036
01:18:31,040 --> 01:18:33,260
it's usually just a factor of two unless
2037
01:18:33,260 --> 01:18:35,000
you do distributed sort when it can be many
2038
01:18:35,000 --> 01:18:37,880
orders of magnitude but right but things
2039
01:18:37,880 --> 01:18:39,380
are not easy anymore because everything
2040
01:18:39,380 --> 01:18:43,940
is so complicated put together so we
2041
01:18:43,940 --> 01:18:46,300
still have
2042
01:18:48,510 --> 01:18:52,989
I'm sorry about 20 minutes okay so when
2043
01:18:52,989 --> 01:18:59,260
it comes to right so when it comes to do
2044
01:18:59,260 --> 01:19:00,880
replication right we've seen I mean
2045
01:19:00,880 --> 01:19:03,820
replication is the worst to a large
2046
01:19:03,820 --> 01:19:05,770
extent when when it comes to the
2047
01:19:05,770 --> 01:19:07,510
decisions you actually have to you have
2048
01:19:07,510 --> 01:19:08,949
to make now you have to pair that up
2049
01:19:08,949 --> 01:19:11,949
also with some sort of backups right
2050
01:19:11,949 --> 01:19:13,750
in order to make the information really
2051
01:19:13,750 --> 01:19:15,760
if you want permanent and I mean
2052
01:19:15,760 --> 01:19:17,710
ultimately for example let's think about
2053
01:19:17,710 --> 01:19:19,210
chat applications right I mean for chat
2054
01:19:19,210 --> 01:19:20,440
applications again you have to ask
2055
01:19:20,440 --> 01:19:22,659
yourself the question is what am I
2056
01:19:22,659 --> 01:19:27,909
losing if everything goes away maybe in
2057
01:19:27,909 --> 01:19:30,929
fact not so much and I've actually okay
2058
01:19:30,929 --> 01:19:34,800
does anybody know anything about Snapchat
2059
01:19:34,800 --> 01:19:38,679
okay so this is I didn't look at all the
2060
01:19:38,679 --> 01:19:40,179
details but I think this is mostly about
2061
01:19:40,179 --> 01:19:43,060
so they built in the opposite of fault
2062
01:19:43,060 --> 01:19:44,920
tolerance into the protocol it goes away
2063
01:19:44,920 --> 01:19:47,050
after a while right now this is one way
2064
01:19:47,050 --> 01:19:49,030
to solve all your problems right so you
2065
01:19:49,030 --> 01:19:50,320
can essentially solve all your problems
2066
01:19:50,320 --> 01:19:52,840
with replication by not only not
2067
01:19:52,840 --> 01:19:54,580
worrying about replication but promising
2068
01:19:54,580 --> 01:19:57,190
that you're not gonna replicate right so
2069
01:19:57,190 --> 01:19:59,920
apparently Snapchat well first of all
2070
01:19:59,920 --> 01:20:03,040
it's valued at three billion dollars
2071
01:20:03,040 --> 01:20:07,600
which I mean whatever but apparently
2072
01:20:07,600 --> 01:20:09,520
Snapchat is basically pictures or
2073
01:20:09,520 --> 01:20:11,889
whatever they go and the system will get
2074
01:20:11,889 --> 01:20:13,300
rid of them in whatever short amount of
2075
01:20:13,300 --> 01:20:15,730
time all right so that basically means
2076
01:20:15,730 --> 01:20:20,350
that you do exactly the opposite
2077
01:20:20,350 --> 01:20:22,480
applications go exactly in the opposite
2078
01:20:22,480 --> 01:20:24,010
direction right so you're in fact
2079
01:20:24,010 --> 01:20:25,270
promising not only that you're not gonna
2080
01:20:25,270 --> 01:20:27,340
replicate but you're in fact promising
2081
01:20:27,340 --> 01:20:28,750
that even on the client you're going to
2082
01:20:28,750 --> 01:20:31,620
destroy any such copy within a short time
2083
01:20:31,620 --> 01:20:34,239
right so it's exactly right it's exactly
2084
01:20:34,239 --> 01:20:35,889
the opposite of caching I'm gonna cache
2085
01:20:35,889 --> 01:20:37,270
for this amount of time and then destroy
2086
01:20:37,270 --> 01:20:39,540
any copy I have and it's gone and
2087
01:20:39,540 --> 01:20:41,679
whatever it's cryptographically secure
2088
01:20:41,679 --> 01:20:43,449
or whatnot
2089
01:20:43,449 --> 01:20:44,889
I by the way that should do cryptography
2090
01:20:44,889 --> 01:20:48,250
in fact and that's I know I know and
2091
01:20:48,250 --> 01:20:49,630
they kind of just rely on the fact that
2092
01:20:49,630 --> 01:20:51,730
other developers are not smart enough to
2093
01:20:51,730 --> 01:20:53,139
write another application that watches
2094
01:20:53,139 --> 01:20:55,540
what Snapchat does right but for
2095
01:20:55,540 --> 01:20:58,350
example cryptographically
2096
01:20:59,170 --> 01:21:02,350
well by the way this is a real problem
2097
01:21:02,350 --> 01:21:04,600
we have to discuss this how do you mean
2098
01:21:04,600 --> 01:21:08,160
how do you make self-destructing
2099
01:21:08,160 --> 01:21:10,960
messages right the way you see in some
2100
01:21:10,960 --> 01:21:13,210
of the spy movies right this message
2101
01:21:13,210 --> 01:21:15,070
will self-destruct in whatever amount of
2102
01:21:15,070 --> 01:21:16,570
time now if you literally have a small
2103
01:21:16,570 --> 01:21:18,370
explosive device in a physical device I
2104
01:21:18,370 --> 01:21:26,830
mean that's true right but anyway this is
2105
01:21:26,830 --> 01:21:29,680
a real question and well most of all I
2106
01:21:29,680 --> 01:21:32,230
think people like Snapchat are mostly
2107
01:21:32,230 --> 01:21:34,510
irresponsible right they say yeah it's a
2108
01:21:34,510 --> 01:21:35,560
cool application we don't worry about
2109
01:21:35,560 --> 01:21:37,030
anything in the world they're not
2110
01:21:37,030 --> 01:21:38,320
promising anything they just kind of say
2111
01:21:38,320 --> 01:21:40,750
yeah it's kind of okay right but imagine
2112
01:21:40,750 --> 01:21:42,520
for example you're the US government and
2113
01:21:42,520 --> 01:21:44,110
you really want those things not to be
2114
01:21:44,110 --> 01:21:45,400
available beyond a certain amount of
2115
01:21:45,400 --> 01:21:48,220
time how would you do it these are very
2116
01:21:48,220 --> 01:21:51,040
hard questions to in fact answer and
2117
01:21:51,040 --> 01:21:52,870
to some extent you need to put together
2118
01:21:52,870 --> 01:21:54,880
multiple pieces of some sort of a key
2119
01:21:54,880 --> 01:21:57,790
and make that unavailable somehow I mean
2120
01:21:57,790 --> 01:22:00,040
the big problem is what if I could read
2121
01:22:00,040 --> 01:22:01,570
it once what would prevent me from
2122
01:22:01,570 --> 01:22:03,340
reading it again unless I have some
2123
01:22:03,340 --> 01:22:04,900
physical mechanism that prevents that
2124
01:22:04,900 --> 01:22:08,470
from happening all right so that's but
2125
01:22:08,470 --> 01:22:10,150
there are interesting questions with
2126
01:22:10,150 --> 01:22:11,680
respect to that for example also with
2127
01:22:11,680 --> 01:22:15,880
this when it comes to for example secure
2128
01:22:15,880 --> 01:22:19,030
secure access to resources right how many
2129
01:22:19,030 --> 01:22:20,770
of you and this is a kind of an
2130
01:22:20,770 --> 01:22:22,750
interesting maybe preview to to security
2131
01:22:22,750 --> 01:22:24,820
but how many of you have seen those tags
2132
01:22:24,820 --> 01:22:28,840
people wear on the keyring or on the on
2133
01:22:28,840 --> 01:22:30,760
the neck that have some sort of a weird
2134
01:22:30,760 --> 01:22:32,530
counter that keeps on counting and they
2135
01:22:32,530 --> 01:22:34,360
use that to login into a website and
2136
01:22:34,360 --> 01:22:36,340
that counter is valid for about 30
2137
01:22:36,340 --> 01:22:39,870
seconds or whatever to watch that stuff
2138
01:22:46,750 --> 01:22:49,490
right but the key for all those things
2139
01:22:49,490 --> 01:22:50,900
right is the fact that they're
2140
01:22:50,900 --> 01:22:57,650
unforgeable right look by the way the
2141
01:22:57,650 --> 01:22:59,720
way it works is very simple it
2142
01:22:59,720 --> 01:23:01,730
literally has a counter one two three
2143
01:23:01,730 --> 01:23:04,910
four and you push it through AES-256 or through
2144
01:23:04,910 --> 01:23:06,170
some kind of a public key cryptography
2145
01:23:06,170 --> 01:23:08,180
mechanism and you simply take what comes
2146
01:23:08,180 --> 01:23:10,310
out the other end right now remember
2147
01:23:10,310 --> 01:23:11,570
what I told you about public key
2148
01:23:11,570 --> 01:23:13,850
cryptography it's so good that even if
2149
01:23:13,850 --> 01:23:14,960
you know what you put in you
2150
01:23:14,960 --> 01:23:18,590
cannot predict what comes out right and
2151
01:23:18,590 --> 01:23:20,570
in fact it will produce a sequence that
2152
01:23:20,570 --> 01:23:22,190
cannot be forged without knowing the key
2153
01:23:22,190 --> 01:23:24,020
right it doesn't matter how many pairs
2154
01:23:24,020 --> 01:23:26,150
of input outputs you have unless you get
2155
01:23:26,150 --> 01:23:28,040
very very close to two to the two hundred or
2156
01:23:28,040 --> 01:23:29,180
something like that you're not gonna be
2157
01:23:29,180 --> 01:23:31,520
able to guess what the key is and you
2158
01:23:31,520 --> 01:23:32,780
need the key in order to predict the next
2159
01:23:32,780 --> 01:23:33,650
number in the sequence
2160
01:23:33,650 --> 01:23:35,540
so those are two logics and
2161
01:23:35,540 --> 01:23:38,240
unpredictable numbers unless you somehow
2162
01:23:38,240 --> 01:23:39,940
extract the key from the physical device
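The counter-plus-secret-key idea behind those login tags can be sketched as below. Note the substitution: the lecture mentions pushing the counter through a cipher such as AES-256, while this sketch uses HMAC-SHA256 as the keyed function because it ships with the Python standard library. The function names, the demo key and the 6-digit truncation are assumptions for illustration, not the scheme of any particular product.

```python
import hashlib
import hmac
import struct


def code_for_counter(key: bytes, counter: int, digits: int = 6) -> str:
    """Derive a short numeric code from a secret key and a counter value."""
    # Encode the counter as 8 big-endian bytes and run it through the keyed function.
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha256).digest()
    # Truncate to a human-typable number of decimal digits.
    number = int.from_bytes(mac[:4], "big") % (10 ** digits)
    return str(number).zfill(digits)


def verify(key: bytes, counter: int, candidate: str) -> bool:
    """Server side: recompute the code for the expected counter and compare."""
    return hmac.compare_digest(code_for_counter(key, counter), candidate)


secret = b"hypothetical-demo-key"      # assumption: shared by the tag and the server
shown_on_tag = code_for_counter(secret, 1234)
accepted = verify(secret, 1234, shown_on_tag)
```

Anyone who sees earlier codes but not the key has no practical way to compute the next one, which is the unforgeability property the lecture relies on.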
2163
01:23:39,940 --> 01:23:44,870
right okay more discussions about that
2164
01:23:44,870 --> 01:23:47,330
so when I discuss about security
2165
01:23:47,330 --> 01:23:49,670
somebody has to remind me to talk about
2166
01:23:49,670 --> 01:23:51,590
physical attacks on secure systems
2167
01:23:51,590 --> 01:23:53,240
because I have a number of cool stories
2168
01:23:53,240 --> 01:23:56,360
to tell okay physical attacks all right
2169
01:23:56,360 --> 01:23:58,520
so let's start talking a little bit
2170
01:23:58,520 --> 01:24:00,550
about fault tolerance so we've seen the
2171
01:24:00,550 --> 01:24:04,070
I mean concurrency replication one major
2172
01:24:04,070 --> 01:24:05,780
reason to do replication was fault
2173
01:24:05,780 --> 01:24:07,130
tolerance another one is increased
2174
01:24:07,130 --> 01:24:11,480
performance right all right
2175
01:24:11,480 --> 01:24:16,250
so when it comes to fault tolerance let
2176
01:24:16,250 --> 01:24:17,750
me just warm up the discussion about
2177
01:24:17,750 --> 01:24:21,860
fault tolerance with the issue of how do
2178
01:24:21,860 --> 01:24:25,820
you know something is faulty right let's
2179
01:24:25,820 --> 01:24:27,320
even forget about all the formalism and
2180
01:24:27,320 --> 01:24:28,730
all the stuff that needs to be discussed
2181
01:24:28,730 --> 01:24:30,200
when it comes to fault tolerance it's
2182
01:24:30,200 --> 01:24:34,310
basically okay you can say if something
2183
01:24:34,310 --> 01:24:35,960
is faulty I'll do something about it
2184
01:24:35,960 --> 01:24:37,970
replicate switch to other replicas switch
2185
01:24:37,970 --> 01:24:39,140
to another server how do you know
2186
01:24:39,140 --> 01:24:43,250
something is faulty all right so let me
2187
01:24:43,250 --> 01:24:45,790
put some some stuff on the board
2188
01:24:45,790 --> 01:24:49,840
question is what is faulty
2189
01:24:58,520 --> 01:25:00,990
so how do you know something is broken
2190
01:25:00,990 --> 01:25:04,730
I mean faulty means broken in some way
2191
01:25:04,730 --> 01:25:07,350
so all of these things need to be
2192
01:25:07,350 --> 01:25:10,170
somehow monitored I mean when it comes
2193
01:25:10,170 --> 01:25:13,110
to faulty you can think about self
2194
01:25:13,110 --> 01:25:15,390
diagnosis maybe works maybe doesn't
2195
01:25:15,390 --> 01:25:17,190
right so maybe self diagnosis would work
2196
01:25:17,190 --> 01:25:19,890
for this I monitor myself and I realize
2197
01:25:19,890 --> 01:25:21,300
I'm broken I mean some of the cars monitor
2198
01:25:21,300 --> 01:25:22,740
themselves and start screaming if
2199
01:25:22,740 --> 01:25:24,330
they don't like something sometimes it's
2200
01:25:24,330 --> 01:25:25,470
nothing and they just annoy you
2201
01:25:25,470 --> 01:25:27,420
right a red light comes on and hey
2202
01:25:27,420 --> 01:25:29,220
something is faulty so maybe self
2203
01:25:29,220 --> 01:25:36,810
monitoring maybe the other one can be
2204
01:25:36,810 --> 01:25:39,660
peer monitoring right for example peer
2205
01:25:39,660 --> 01:25:42,590
or time monitoring
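Peer monitoring with a timeout can be sketched as a heartbeat table: each peer reports in periodically, and a peer that stays silent longer than the timeout is suspected to be faulty. The class `HeartbeatMonitor` and its methods are hypothetical names, and timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Tracks when each peer was last heard from and flags the silent ones."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, peer, now):
        """Record that `peer` reported in at time `now` (in seconds)."""
        self.last_seen[peer] = now

    def suspected_faulty(self, now):
        """Return the peers whose last heartbeat is older than the timeout."""
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("server-a", now=0.0)
monitor.heartbeat("server-b", now=0.0)
monitor.heartbeat("server-a", now=4.0)         # server-a keeps reporting
suspects = monitor.suspected_faulty(now=5.0)   # server-b has been silent for 5 s
```

A timeout only yields a suspicion: a slow peer or a slow network looks exactly like a dead peer from the monitor's side.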
2206
01:25:47,199 --> 01:25:50,290
alright you could also have physical
2207
01:25:50,290 --> 01:25:52,330
devices that monitor for example this is
2208
01:25:52,330 --> 01:25:54,219
kind of interesting and you can actually
2209
01:25:54,219 --> 01:25:57,219
buy all this stuff they sell devices now
2210
01:25:57,219 --> 01:25:59,290
that can in fact monitor whether a
2211
01:25:59,290 --> 01:26:01,719
computer system has power doesn't have
2212
01:26:01,719 --> 01:26:03,640
power or is in some sort of a bad state
2213
01:26:03,640 --> 01:26:06,219
and then those devices can be used to
2214
01:26:06,219 --> 01:26:10,719
remotely reboot a machine see it's
2215
01:26:10,719 --> 01:26:13,150
important that the monitoring as much as
2216
01:26:13,150 --> 01:26:15,400
possible is an independent physical
2217
01:26:15,400 --> 01:26:17,110
device from the primary device because
2218
01:26:17,110 --> 01:26:18,760
the fault could be related to a global
2219
01:26:18,760 --> 01:26:20,739
system failure if you have partial
2220
01:26:20,739 --> 01:26:22,360
system failures you can imagine that you
2221
01:26:22,360 --> 01:26:23,710
have some sort of self monitoring going
2222
01:26:23,710 --> 01:26:25,330
on but if it's a global system failure you
2223
01:26:25,330 --> 01:26:26,920
must have another system to monitor this
2224
01:26:26,920 --> 01:26:29,380
system right now let's think about the
2225
01:26:29,380 --> 01:26:33,340
kind of faults that can happen right so
2226
01:26:33,340 --> 01:26:34,960
what kind of kind of faults could I have
2227
01:26:34,960 --> 01:26:38,230
well I mean all the servers or all these
2228
01:26:38,230 --> 01:26:40,239
computers run some programs so the
2229
01:26:40,239 --> 01:26:42,340
program itself can have bugs I mean so
2230
01:26:42,340 --> 01:26:45,690
causes of faults let's think about that
2231
01:26:46,230 --> 01:26:49,330
right so the I think the primary cause
2232
01:26:49,330 --> 01:26:53,080
for bugs is in fact software defects
2233
01:26:53,080 --> 01:27:01,710
bugs right okay so we can have bugs
2234
01:27:01,710 --> 01:27:05,080
software bugs so let's say software bugs
2235
01:27:05,080 --> 01:27:09,030
we can have hardware bugs less nowadays
2236
01:27:09,030 --> 01:27:11,530
by the way do you know what why bugs are
2237
01:27:11,530 --> 01:27:13,949
called bugs
2238
01:27:28,480 --> 01:27:31,480
right
2239
01:27:35,570 --> 01:27:38,060
right so for the people that probably
2240
01:27:38,060 --> 01:27:39,440
couldn't hear you is basically a
2241
01:27:39,440 --> 01:27:41,870
literally found a physical bug in one
2242
01:27:41,870 --> 01:27:43,190
of the circuit boards in one of the
2243
01:27:43,190 --> 01:27:44,540
early computers and they called the whole
2244
01:27:44,540 --> 01:27:46,100
process debugging finding the bug
2245
01:27:46,100 --> 01:27:48,770
okay all right so we can have software
2246
01:27:48,770 --> 01:27:50,720
bugs we can have hardware bugs now I
2247
01:27:50,720 --> 01:27:52,820
want you to understand that these
2248
01:27:52,820 --> 01:27:54,920
computer systems are not super fragile
2249
01:27:54,920 --> 01:27:58,390
but are somehow influenced by various
2250
01:27:58,390 --> 01:28:00,920
external things and a lot of these bugs
2251
01:28:00,920 --> 01:28:03,230
could be caused by external things I
2252
01:28:03,230 --> 01:28:05,720
mean for example when it comes to to for
2253
01:28:05,720 --> 01:28:09,230
to unfold it's basically the power goes
2254
01:28:09,230 --> 01:28:10,250
down I mean that's one of the bigger
2255
01:28:10,250 --> 01:28:11,210
faults right
2256
01:28:11,210 --> 01:28:13,130
power failure this definitely could do
2257
01:28:13,130 --> 01:28:21,920
it right but let me let me give you an
2258
01:28:21,920 --> 01:28:23,600
idea why it's impossible to have a
2259
01:28:23,600 --> 01:28:25,070
system that doesn't have at least some
2260
01:28:25,070 --> 01:28:26,960
eventually some kind of a hardware bugs
2261
01:28:26,960 --> 01:28:33,350
okay so so one of the things that almost
2262
01:28:33,350 --> 01:28:35,660
surely will start producing havoc here is
2263
01:28:35,660 --> 01:28:38,000
the temperature rising for whatever
2264
01:28:38,000 --> 01:28:41,030
reason so it turns out that well because
2265
01:28:41,030 --> 01:28:42,800
of thermodynamics right the higher
2266
01:28:42,800 --> 01:28:45,740
temperature means bigger agitation on
2267
01:28:45,740 --> 01:28:47,390
the on the atoms which essentially means
2268
01:28:47,390 --> 01:28:50,750
that the clean zero and one states that
2269
01:28:50,750 --> 01:28:53,780
can be kept separate from from each
2270
01:28:53,780 --> 01:28:55,190
other unless you want to do a state
2271
01:28:55,190 --> 01:28:56,690
transition can actually automatically
2272
01:28:56,690 --> 01:29:00,140
transition from a zero to one and by the
2273
01:29:00,140 --> 01:29:02,090
way this is actually I might as well
2274
01:29:02,090 --> 01:29:04,400
tell you this because this looks like a
2275
01:29:04,400 --> 01:29:06,320
bug but can be used as an interesting
2276
01:29:06,320 --> 01:29:07,520
feature when it comes to security
2277
01:29:07,520 --> 01:29:09,920
attacks so there is a group I'm gonna
2278
01:29:09,920 --> 01:29:11,060
tell you the whole story when I talk
2279
01:29:11,060 --> 01:29:13,340
about security but there is a group at I
2280
01:29:13,340 --> 01:29:15,260
think Harvard that's known for
2281
01:29:15,260 --> 01:29:17,300
completely crazy security attacks and
2282
01:29:17,300 --> 01:29:19,070
one of their attacks is based on
2283
01:29:19,070 --> 01:29:21,680
increasing the temperature so
2284
01:29:21,680 --> 01:29:22,820
essentially they showed how you can
2285
01:29:22,820 --> 01:29:24,290
exploit a machine or the Java virtual
2286
01:29:24,290 --> 01:29:25,820
machine by increasing the temperature
2287
01:29:25,820 --> 01:29:27,020
because when you increase the
2288
01:29:27,020 --> 01:29:28,400
temperature you produce these random
2289
01:29:28,400 --> 01:29:30,470
flips between zero and one and they
2290
01:29:30,470 --> 01:29:32,930
showed a particular layout in memory for
2291
01:29:32,930 --> 01:29:37,370
a Java program that will allow a single
2292
01:29:37,370 --> 01:29:39,350
bit flip to be exploited and to do
2293
01:29:39,350 --> 01:29:41,960
anything you want with the with the Java
2294
01:29:41,960 --> 01:29:44,660
Virtual Machine right so essentially all
2295
01:29:44,660 --> 01:29:47,530
you have to do is heat up the
2296
01:29:47,530 --> 01:29:49,280
temperature in the room
2297
01:29:49,280 --> 01:29:52,580
right the memory is the first one to
2298
01:29:52,580 --> 01:29:54,950
go when it comes to heating up because
2299
01:29:54,950 --> 01:29:56,540
you're going to start having those zero
2300
01:29:56,540 --> 01:29:58,160
to one transitions in the memory but
2301
01:29:58,160 --> 01:29:59,330
there are other things by the way that
2302
01:29:59,330 --> 01:30:00,710
transition automatically and this is why
2303
01:30:00,710 --> 01:30:03,800
servers have error correcting memory
2304
01:30:03,800 --> 01:30:07,070
right for example cosmic rays which do
2305
01:30:07,070 --> 01:30:10,130
happen right one goes through it will
2306
01:30:10,130 --> 01:30:12,440
actually produce a flip and they don't
2307
01:30:12,440 --> 01:30:14,330
happen too often but I mean a server
2308
01:30:14,330 --> 01:30:17,990
like this can stay up for a year right
2309
01:30:17,990 --> 01:30:20,030
it's bound to have at least one of those
2310
01:30:20,030 --> 01:30:21,260
going through the memory and flipping
2311
01:30:21,260 --> 01:30:22,970
something by the way this is where the
2312
01:30:22,970 --> 01:30:24,770
actor model in Scala comes into play and
2313
01:30:24,770 --> 01:30:26,390
actually the actor model in Erlang
2314
01:30:26,390 --> 01:30:28,310
was introduced for this rather than
2315
01:30:28,310 --> 01:30:30,320
design systems that never fail you
2316
01:30:30,320 --> 01:30:32,360
embrace failures and you embrace
2317
01:30:32,360 --> 01:30:34,640
monitoring either self or peer monitoring
2318
01:30:34,640 --> 01:30:36,200
and you can have actors monitoring other
2319
01:30:36,200 --> 01:30:39,020
actors and essentially you say this guy
2320
01:30:39,020 --> 01:30:41,540
it's in a weird state let's kill it and
2321
01:30:41,540 --> 01:30:43,640
start it again to recover some sort of a
2322
01:30:43,640 --> 01:30:45,560
good functioning of the system right so
2323
01:30:45,560 --> 01:30:48,140
one way to deal with fault tolerance is
2324
01:30:48,140 --> 01:30:49,730
to design systems that are more fault
2325
01:30:49,730 --> 01:30:51,590
tolerant and that's for example the
2326
01:30:51,590 --> 01:30:53,630
approach taken by Ericsson with
2327
01:30:53,630 --> 01:30:55,970
Erlang they design self-healing
2328
01:30:55,970 --> 01:30:57,440
systems or things things of this sort
2329
01:30:57,440 --> 01:31:00,770
and that definitely will alleviate the
2330
01:31:00,770 --> 01:31:02,420
fault tolerance problem or you could say
2331
01:31:02,420 --> 01:31:04,430
let it fail and I'll do something else
2332
01:31:04,430 --> 01:31:05,690
but then you still have this issue with
2333
01:31:05,690 --> 01:31:09,050
how you detect failure right so software
2334
01:31:09,050 --> 01:31:11,180
bugs hardware bugs power failure but there is
2335
01:31:11,180 --> 01:31:13,580
a special kind of failure that it's
2336
01:31:13,580 --> 01:31:15,770
extremely hard to protect against right
2337
01:31:15,770 --> 01:31:17,060
it's something I mentioned before it's
2338
01:31:17,060 --> 01:31:20,030
called Byzantine failures or let's call
2339
01:31:20,030 --> 01:31:21,680
them malicious failures and nobody
2340
01:31:21,680 --> 01:31:23,510
remembers why they are called Byzantine
2341
01:31:23,510 --> 01:31:25,070
failures there is no direct connection
2342
01:31:25,070 --> 01:31:27,770
with the Byzantine Empire or whatnot so
2343
01:31:27,770 --> 01:31:31,990
let's let's call them malicious failures
2344
01:31:35,940 --> 01:31:38,710
so the malicious failures are of the
2345
01:31:38,710 --> 01:31:40,630
following kind okay and this is what
2346
01:31:40,630 --> 01:31:42,160
it's really really really hard to
2347
01:31:42,160 --> 01:31:44,740
protect against them instead of having
2348
01:31:44,740 --> 01:31:46,360
one of these natural causes for the
2349
01:31:46,360 --> 01:31:48,460
failure and then the monitoring will do
2350
01:31:48,460 --> 01:31:52,150
it somebody goes in takes control of the
2351
01:31:52,150 --> 01:31:53,410
machine and makes it do slightly
2352
01:31:53,410 --> 01:31:56,260
different things if they really know
2353
01:31:56,260 --> 01:31:58,030
what they're getting into they could
2354
01:31:58,030 --> 01:32:01,120
wreak havoc on the entire distributed
2355
01:32:01,120 --> 01:32:02,920
system and not only that machine alone
2356
01:32:02,920 --> 01:32:05,440
right for example imagine any of the
2357
01:32:05,440 --> 01:32:08,290
distributed protocols running in which
2358
01:32:08,290 --> 01:32:10,180
one of the participants is not playing
2359
01:32:10,180 --> 01:32:12,640
by the book and it's not not playing by
2360
01:32:12,640 --> 01:32:13,810
the book because every now and then it
2361
01:32:13,810 --> 01:32:15,610
hiccups which could happen it's not
2362
01:32:15,610 --> 01:32:18,390
playing by the book intentionally to
2363
01:32:18,390 --> 01:32:20,890
wreak havoc on the distributed protocol
2364
01:32:20,890 --> 01:32:23,470
right so for example I mentioned here
2365
01:32:23,470 --> 01:32:25,210
with the writes right the fact that they
2366
01:32:25,210 --> 01:32:26,500
are going to do exponential backoff what
2367
01:32:26,500 --> 01:32:27,910
if one of the guys never does
2368
01:32:27,910 --> 01:32:29,320
exponential backoff and just takes one
2369
01:32:29,320 --> 01:32:30,970
of the N tokens and never gives it
2370
01:32:30,970 --> 01:32:32,320
away that essentially means nobody writes
2371
01:32:32,320 --> 01:32:34,120
by the way that's a form of distributed
2372
01:32:34,120 --> 01:32:37,690
denial of service attack right so this
2373
01:32:37,690 --> 01:32:40,000
kind especially malicious failures are a
2374
01:32:40,000 --> 01:32:43,360
tremendous problem right now in general
2375
01:32:43,360 --> 01:32:47,410
right we're going to discuss some of
2376
01:32:47,410 --> 01:32:51,010
these failure protection mechanisms that
2377
01:32:51,010 --> 01:32:52,960
can deal with malicious failures but in
2378
01:32:52,960 --> 01:32:54,970
fact you have to pay a dear price for
2379
01:32:54,970 --> 01:32:57,700
it right the same resource has to be
2380
01:32:57,700 --> 01:32:59,560
available in multiple places and somehow
2381
01:32:59,560 --> 01:33:01,390
you're monitoring to see if somebody is
2382
01:33:01,390 --> 01:33:03,160
lying and beyond a certain point there is
2383
01:33:03,160 --> 01:33:04,570
nothing you can detect so a crucial
2384
01:33:04,570 --> 01:33:06,670
question is for example let's think
2385
01:33:06,670 --> 01:33:10,330
about peer-to-peer systems right so let
2386
01:33:10,330 --> 01:33:11,830
me give you an idea just how how far
2387
01:33:11,830 --> 01:33:16,540
this can go so not only that some
2388
01:33:16,540 --> 01:33:18,190
participants can be malicious the whole
2389
01:33:18,190 --> 01:33:20,080
system can be malicious right so one of
2390
01:33:20,080 --> 01:33:22,270
the things that surfaced is that one
2391
01:33:22,270 --> 01:33:24,160
peer-to-peer system that started to be
2392
01:33:24,160 --> 01:33:25,510
quite popular I don't remember the name
2393
01:33:25,510 --> 01:33:27,460
in fact was only a honeypot for the
2394
01:33:27,460 --> 01:33:29,290
music industry to catch people that are
2395
01:33:29,290 --> 01:33:32,080
interested in sharing music to make a
2396
01:33:32,080 --> 01:33:35,260
list of who to sue right so to some
2397
01:33:35,260 --> 01:33:37,570
extent the whole system was faulty in a
2398
01:33:37,570 --> 01:33:38,950
certain sense and not only some
2399
01:33:38,950 --> 01:33:41,950
participants but another way you can
2400
01:33:41,950 --> 01:33:44,470
think about it is hey could I
2401
01:33:44,470 --> 01:33:47,020
participate or could I prevent this
2402
01:33:47,020 --> 01:33:48,400
could I participate in a peer-to-
2403
01:33:48,400 --> 01:33:52,120
peer system and by being maliciously
2404
01:33:52,120 --> 01:33:54,489
faulty disrupt the peer-to-peer protocol
2405
01:33:54,489 --> 01:33:55,750
right you just finished one of your
2406
01:33:55,750 --> 01:33:57,550
projects right imagine that some of
2407
01:33:57,550 --> 01:33:58,870
those guys would not play by the book
2408
01:33:58,870 --> 01:34:00,340
can you imagine ways in which they could
2409
01:34:00,340 --> 01:34:02,860
disrupt completing the system right they
2410
01:34:02,860 --> 01:34:04,270
can possibly disrupt the system by
2411
01:34:04,270 --> 01:34:06,070
taking any message that's relayed to one
2412
01:34:06,070 --> 01:34:08,830
and throw it as wrongly as possible
2413
01:34:08,830 --> 01:34:12,010
right that will slow things down a
2414
01:34:12,010 --> 01:34:13,570
little bit you can even compute how much
2415
01:34:13,570 --> 01:34:14,920
you can slow it down so crucial question
2416
01:34:14,920 --> 01:34:16,810
for example for that is how many
2417
01:34:16,810 --> 01:34:19,300
peer-to-peer participants would you need
2418
01:34:19,300 --> 01:34:21,670
to disrupt most of the function of a
2419
01:34:21,670 --> 01:34:22,900
peer-to-peer system if you use a
2420
01:34:22,900 --> 01:34:24,610
distributed hash table peer-to-peer
2421
01:34:24,610 --> 01:34:26,830
system all right I'm sure there is some
2422
01:34:26,830 --> 01:34:28,330
serious math that can be done there to
2423
01:34:28,330 --> 01:34:29,949
do some probabilistic computation to say
2424
01:34:29,949 --> 01:34:33,250
ah ten percent is enough for nobody to
2425
01:34:33,250 --> 01:34:36,130
get any work done right for example if
2426
01:34:36,130 --> 01:34:38,350
you really throw the requests all the
2427
01:34:38,350 --> 01:34:40,150
way around in the wrong direction right
2428
01:34:40,150 --> 01:34:42,520
if the probability that one of your guys
2429
01:34:42,520 --> 01:34:45,219
is hit is reasonably high then
2430
01:34:45,219 --> 01:34:46,989
essentially you can make sure that none
2431
01:34:46,989 --> 01:34:49,810
of these protocols finish or finish in a
2432
01:34:49,810 --> 01:34:51,790
reasonable amount of time basically take
2433
01:34:51,790 --> 01:34:53,290
the performance down by an order of
2434
01:34:53,290 --> 01:34:56,860
magnitude or so okay so this kind
2435
01:34:56,860 --> 01:34:58,270
of failures are going to be very hard to
2436
01:34:58,270 --> 01:35:00,370
deal with because it's hard to tell when
2437
01:35:00,370 --> 01:35:02,290
such a failure kicks in so a big
2438
01:35:02,290 --> 01:35:03,850
question is okay fine you do monitoring
2439
01:35:03,850 --> 01:35:05,530
but how do you know that somebody's
2440
01:35:05,530 --> 01:35:08,140
malicious so to some extent you need to
2441
01:35:08,140 --> 01:35:09,969
collect information in what you believe
2442
01:35:09,969 --> 01:35:12,760
are non malicious nodes or in a lot of
2443
01:35:12,760 --> 01:35:14,140
nodes some of them malicious some of them
2444
01:35:14,140 --> 01:35:16,330
non malicious and somehow to compute
2445
01:35:16,330 --> 01:35:18,520
some sort of are you malicious or non
2446
01:35:18,520 --> 01:35:20,170
malicious property to tell if somebody's
2447
01:35:20,170 --> 01:35:22,300
malicious once you detected such a
2448
01:35:22,300 --> 01:35:24,010
failure you can isolate that node
2449
01:35:24,010 --> 01:35:25,540
through whatever mechanisms for example
2450
01:35:25,540 --> 01:35:27,429
you could use multicast or some sort of
2451
01:35:27,429 --> 01:35:30,429
broadcast to say don't trust that guy
2452
01:35:30,429 --> 01:35:32,530
and kick it out of the network but again
2453
01:35:32,530 --> 01:35:34,300
the problem is how do you know it's
2454
01:35:34,300 --> 01:35:37,210
malicious okay and it's a cat-and-mouse
2455
01:35:37,210 --> 01:35:39,699
kind of situation which is true for
2456
01:35:39,699 --> 01:35:41,230
almost anything related to in fact
2457
01:35:41,230 --> 01:35:47,170
security okay right so again what is
2458
01:35:47,170 --> 01:35:49,420
faulty how do I know so I'm a different
2459
01:35:49,420 --> 01:35:51,640
system how do I know somebody else is
2460
01:35:51,640 --> 01:35:53,800
faulty let's not worry about malicious
2461
01:35:53,800 --> 01:35:55,330
because this is just hard we need to
2462
01:35:55,330 --> 01:35:57,159
discuss it separately but there are
2463
01:35:57,159 --> 01:35:59,139
participants and we talked about this
2464
01:35:59,139 --> 01:36:00,730
right bully algorithm and some
2465
01:36:00,730 --> 01:36:04,120
election algorithms right those
2466
01:36:04,120 --> 01:36:06,160
algorithms needed some sort of a leader
2467
01:36:06,160 --> 01:36:11,140
and without the leader the system
2468
01:36:11,140 --> 01:36:12,820
doesn't work so you have to somehow
2469
01:36:12,820 --> 01:36:15,040
determine when the leader is faulty to
2470
01:36:15,040 --> 01:36:16,630
elect another leader for example that's
2471
01:36:16,630 --> 01:36:17,880
one of the basic things you could do
2472
01:36:17,880 --> 01:36:22,050
right so what does monitor mean right
2473
01:36:22,050 --> 01:36:24,880
mechanisms need to be built-in in order
2474
01:36:24,880 --> 01:36:28,320
to be able to declare somebody faulty so
2475
01:36:28,320 --> 01:36:30,730
when you declare somebody faulty you
2476
01:36:30,730 --> 01:36:32,380
have other issues to deal with one of
2477
01:36:32,380 --> 01:36:34,000
them is the fact that okay I declare
2478
01:36:34,000 --> 01:36:35,980
somebody to be faulty I let other nodes
2479
01:36:35,980 --> 01:36:37,660
know that it's faulty but maybe the guy
2480
01:36:37,660 --> 01:36:39,370
did turn out not to be faulty in the end
2481
01:36:39,370 --> 01:36:41,800
and he comes alive for example in leader
2482
01:36:41,800 --> 01:36:42,790
election that could create problems
2483
01:36:42,790 --> 01:36:44,260
because if you have two leaders at the
2484
01:36:44,260 --> 01:36:46,810
same time then what the guy that wasn't
2485
01:36:46,810 --> 01:36:49,780
faulty or is not faulty anymore comes
2486
01:36:49,780 --> 01:36:52,590
back you already have another leader and
2487
01:36:52,590 --> 01:36:55,120
would your protocol run with two leaders
2488
01:36:55,120 --> 01:36:57,460
will this old guy figure out that there
2489
01:36:57,460 --> 01:37:00,280
is a new leader all these need some kind of
2490
01:37:00,280 --> 01:37:02,400
message exchanges so the classic example
2491
01:37:02,400 --> 01:37:06,340
of how you could detect at least normal
2492
01:37:06,340 --> 01:37:09,070
kind of faults is keepalive right
2493
01:37:09,070 --> 01:37:14,530
keepalive messages or heartbeat it's
2494
01:37:14,530 --> 01:37:15,970
actually heartbeat sorry but keepalive
2495
01:37:15,970 --> 01:37:21,660
is how these heartbeats go so heartbeat
2496
01:37:25,510 --> 01:37:31,010
okay so the heartbeat mechanism consists in
2497
01:37:31,010 --> 01:37:32,900
having special kinds of messages that
2498
01:37:32,900 --> 01:37:34,910
are sent at regular intervals of time
2499
01:37:34,910 --> 01:37:39,170
and the lack of those messages leads to
2500
01:37:39,170 --> 01:37:41,270
somebody being declared dead it's like
2501
01:37:41,270 --> 01:37:42,679
taking the pulse of somebody if they
2502
01:37:42,679 --> 01:37:44,989
don't have ten successive heartbeats in
2503
01:37:44,989 --> 01:37:46,340
a certain amount of time you say they
2504
01:37:46,340 --> 01:37:47,780
are dead okay
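The heartbeat rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `HeartbeatMonitor`, its fields, and the idea of passing the current time in explicitly (instead of reading a real clock or a socket) are all hypothetical choices made to keep the example small and deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: a peer is declared dead after too many missed heartbeats."""
    interval: float      # expected seconds between two heartbeats
    max_missed: int      # missed heartbeats tolerated before declaring a peer dead
    last_seen: dict = field(default_factory=dict)  # peer name -> time of last heartbeat

    def record(self, peer: str, now: float) -> None:
        # Called whenever a heartbeat message from `peer` arrives.
        self.last_seen[peer] = now

    def missed(self, peer: str, now: float) -> int:
        # Whole heartbeat intervals that have elapsed without any message.
        return int((now - self.last_seen[peer]) // self.interval)

    def dead_peers(self, now: float) -> list:
        # Peers whose silence has reached the threshold, in sorted order.
        return sorted(p for p in self.last_seen
                      if self.missed(p, now) >= self.max_missed)


# One heartbeat per second, declared dead after ten missed ones.
monitor = HeartbeatMonitor(interval=1.0, max_missed=10)
monitor.record("a", 0.0)
monitor.record("b", 0.0)
monitor.record("b", 8.0)            # "b" keeps sending, "a" goes silent
print(monitor.dead_peers(10.0))     # only "a" has been silent for ten intervals
```

Note that the monitor only sees silence. It cannot tell a crashed peer from one that is slow or temporarily unreachable, which is the weakness discussed next.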
2505
01:37:47,780 --> 01:37:49,880
so this is exactly what this is but that
2506
01:37:49,880 --> 01:37:51,290
essentially means that you're trading
2507
01:37:51,290 --> 01:37:53,060
performance for fault tolerance for
2508
01:37:53,060 --> 01:37:55,940
fault detection in particular you must
2509
01:37:55,940 --> 01:37:58,370
send these heartbeats because
2510
01:37:58,370 --> 01:38:00,770
if you're if you miss the heartbeat then
2511
01:38:00,770 --> 01:38:02,690
you will be declared faulty and
2512
01:38:02,690 --> 01:38:05,000
potentially kicked out of the the
2513
01:38:05,000 --> 01:38:15,739
network yes good so this is a very good
2514
01:38:15,739 --> 01:38:17,120
question so when it comes to heartbeats
2515
01:38:17,120 --> 01:38:20,210
there are two questions to ask one is
2516
01:38:20,210 --> 01:38:21,710
how often do you send them and the other
2517
01:38:21,710 --> 01:38:24,260
one is after how many missed heartbeats do
2518
01:38:24,260 --> 01:38:26,210
you declare somebody dead right and
2519
01:38:26,210 --> 01:38:28,190
these things no it's not enough to do
2520
01:38:28,190 --> 01:38:30,290
this right I mean ideally you would
2521
01:38:30,290 --> 01:38:31,969
treat these heartbeats as extremely
2522
01:38:31,969 --> 01:38:34,219
high priority messages right in the
2523
01:38:34,219 --> 01:38:35,600
sense that even if the machine is super
2524
01:38:35,600 --> 01:38:38,210
busy right you would like to still send
2525
01:38:38,210 --> 01:38:39,770
a heartbeat to say oh I'm alive don't
2526
01:38:39,770 --> 01:38:42,080
kill me no matter how busy I am right
2527
01:38:42,080 --> 01:38:44,449
which is not necessarily a simple or
2528
01:38:44,449 --> 01:38:47,239
easy thing so the fact that you can do
2529
01:38:47,239 --> 01:38:48,500
something like heartbeats it doesn't
2530
01:38:48,500 --> 01:38:51,260
mean it will just work without problems
2531
01:38:51,260 --> 01:38:53,719
all the time right and in particular a
2532
01:38:53,719 --> 01:38:55,670
big worrisome thing is you declare lots
2533
01:38:55,670 --> 01:38:57,620
of things as being dead when they're not
2534
01:38:57,620 --> 01:38:59,390
dead at all it's just some kind of a
2535
01:38:59,390 --> 01:39:02,420
temporary hiccup right so for example if
2536
01:39:02,420 --> 01:39:04,070
you have stormy weather outside you
2537
01:39:04,070 --> 01:39:05,900
easily can have no internet connectivity
2538
01:39:05,900 --> 01:39:09,650
for let's say half a second because in
2539
01:39:09,650 --> 01:39:10,940
that amount of time there is too much
2540
01:39:10,940 --> 01:39:12,140
electrical charge nothing goes through
2541
01:39:12,140 --> 01:39:14,570
if you require a heartbeat every 30
2542
01:39:14,570 --> 01:39:16,130
milliseconds and after three heartbeats
2543
01:39:16,130 --> 01:39:17,870
you declare dead then you declare dead
2544
01:39:17,870 --> 01:39:20,060
half the network that's not gonna be
2545
01:39:20,060 --> 01:39:21,650
good so it's tricky
2546
01:39:21,650 --> 01:39:23,750
exactly how you use it how you fine-tune it
2547
01:39:23,750 --> 01:39:25,219
and what to do with it but this is some
2548
01:39:25,219 --> 01:39:26,660
sort of a mechanism that could be used
2549
01:39:26,660 --> 01:39:30,080
to detect something look the easy way
2550
01:39:30,080 --> 01:39:31,489
out is to say everything is too
2551
01:39:31,489 --> 01:39:32,960
complicated I'm not gonna do anything
2552
01:39:32,960 --> 01:39:34,219
but that's not quite an option when it
2553
01:39:34,219 --> 01:39:35,390
comes to engineering right so that's
2554
01:39:35,390 --> 01:39:36,980
what I want you to keep in mind
2555
01:39:36,980 --> 01:39:38,810
all these things are compromises all of
2556
01:39:38,810 --> 01:39:41,780
them have black spots here and there
2557
01:39:41,780 --> 01:39:43,730
right dark corners sometimes they are
2558
01:39:43,730 --> 01:39:44,929
not doing exactly what they are supposed
2559
01:39:44,929 --> 01:39:46,370
to do but you still need some mechanism
2560
01:39:46,370 --> 01:39:48,650
to to deal with the situation so I'm
2561
01:39:48,650 --> 01:39:51,020
gonna obviously discuss more about this
2562
01:39:51,020 --> 01:39:53,000
on on Thursday and we're gonna continue
2563
01:39:53,000 --> 01:39:55,370
discussing fault tolerance with all
2564
01:39:55,370 --> 01:39:56,989
this replication in mind because that's
2565
01:39:56,989 --> 01:39:58,580
one reason to do that to do the fault
2566
01:39:58,580 --> 01:40:00,020
tolerance it's not enough to detect
2567
01:40:00,020 --> 01:40:01,610
failures you have to do something about
2568
01:40:01,610 --> 01:40:03,800
it and that almost always means some
2569
01:40:03,800 --> 00:00:00,000
solution around replication all right
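The storm example from the heartbeat discussion can be checked with a little arithmetic. The sketch below assumes a deliberately simplified model: a peer is declared dead once the silence lasts at least interval times the allowed number of missed heartbeats, and the exact phase of the heartbeats relative to the outage is ignored. The function names are hypothetical and only serve this illustration.

```python
def detection_time_ms(interval_ms: int, max_missed: int) -> int:
    # Silence needed before a peer is declared dead (simplified model).
    return interval_ms * max_missed


def falsely_declared_dead(outage_ms: int, interval_ms: int, max_missed: int) -> bool:
    # True if a healthy peer behind a temporary outage would be declared dead.
    return outage_ms >= detection_time_ms(interval_ms, max_missed)


def min_missed_to_survive(outage_ms: int, interval_ms: int) -> int:
    # Smallest miss threshold that rides out an outage of the given length.
    return outage_ms // interval_ms + 1


# Numbers from the lecture: a heartbeat every 30 ms, dead after three misses,
# and a storm that cuts connectivity for half a second.
print(detection_time_ms(30, 3))             # 90 ms of silence is enough
print(falsely_declared_dead(500, 30, 3))    # the 500 ms outage exceeds it
print(min_missed_to_survive(500, 30))       # threshold needed to tolerate it
```

Raising the threshold avoids the false alarm but also delays the detection of real failures by the same amount, which is the compromise the lecture points at.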