1
00:00:22,500 --> 00:00:25,949
all right so uh we need to talk more

2
00:00:25,949 --> 00:00:28,680
about replication and consistency of

3
00:00:28,680 --> 00:00:32,879
course it's in the picture well let me

4
00:00:32,879 --> 00:00:35,100
just point something out it's kind of

5
00:00:35,100 --> 00:00:37,050
interesting if we do no replication we

6
00:00:37,050 --> 00:00:39,570
essentially have no consistency issues

7
00:00:39,570 --> 00:00:42,180
well if you ignore the consistency

8
00:00:42,180 --> 00:00:44,129
issues within the same machine

9
00:00:44,129 --> 00:00:45,329
but nevertheless life would be much

10
00:00:45,329 --> 00:00:48,059
easier if you wouldn't replicate so this

11
00:00:48,059 --> 00:00:49,440
in fact begs the question why bother

12
00:00:49,440 --> 00:00:51,659
replicate and fault tolerance is

13
00:00:51,659 --> 00:00:54,600
the big big big issue together with

14
00:00:54,600 --> 00:00:56,640
performance enhancements but you have to

15
00:00:56,640 --> 00:00:57,930
be careful especially when you're

16
00:00:57,930 --> 00:00:59,699
selling replication as a performance

17
00:00:59,699 --> 00:01:02,220
enhancing technique you have to make

18
00:01:02,220 --> 00:01:03,629
sure that you're in fact achieving

19
00:01:03,629 --> 00:01:05,220
higher performance which is not

20
00:01:05,220 --> 00:01:07,110
necessarily the case right so that's the

21
00:01:07,110 --> 00:01:09,000
first question to ask when you need to

22
00:01:09,000 --> 00:01:11,040
pick solutions for replications all

23
00:01:11,040 --> 00:01:13,380
right am I really getting better

24
00:01:13,380 --> 00:01:14,970
performance I mean there are other ways

25
00:01:14,970 --> 00:01:18,090
to in fact get some sort of protection

26
00:01:18,090 --> 00:01:20,130
against faults availability is

27
00:01:20,130 --> 00:01:21,420
one of the issues but maybe it's not

28
00:01:21,420 --> 00:01:24,150
even your biggest problem you can do

29
00:01:24,150 --> 00:01:25,590
backups and other things that can help

30
00:01:25,590 --> 00:01:28,620
with things besides availability which

31
00:01:28,620 --> 00:01:29,880
you should be doing in the first place

32
00:01:29,880 --> 00:01:32,610
right it turns out that in general it's

33
00:01:32,610 --> 00:01:35,130
hard to achieve improved

34
00:01:35,130 --> 00:01:37,950
performance right we've seen this for

35
00:01:37,950 --> 00:01:39,900
example when you have to use some sort

36
00:01:39,900 --> 00:01:42,180
of total ordering of operations right

37
00:01:42,180 --> 00:01:44,190
it's so tedious in terms of how many

38
00:01:44,190 --> 00:01:45,840
messages need to be exchanged and so on

39
00:01:45,840 --> 00:01:48,210
that you can easily see that the

40
00:01:48,210 --> 00:01:51,380
performance could in fact dramatically

41
00:01:51,380 --> 00:01:54,299
spiral down right if that happens of

42
00:01:54,299 --> 00:01:56,970
course the benefits of doing replication

43
00:01:56,970 --> 00:01:58,760
disappear very fast so the fact that

44
00:01:58,760 --> 00:02:00,780
you're replicating doesn't mean

45
00:02:00,780 --> 00:02:02,610
necessarily you have a good solution all

46
00:02:02,610 --> 00:02:03,960
right so what I want to do in this class

47
00:02:03,960 --> 00:02:05,610
is start discussing some other

48
00:02:05,610 --> 00:02:07,860
issues related to replication and

49
00:02:07,860 --> 00:02:09,959
discuss some solutions for all those

50
00:02:09,959 --> 00:02:11,849
consistency concerns we had before right

51
00:02:11,849 --> 00:02:12,959
so we only talked about consistency

52
00:02:12,959 --> 00:02:15,390
models but we didn't have any solution

53
00:02:15,390 --> 00:02:16,830
so we only said you must take care of

54
00:02:16,830 --> 00:02:18,420
this without any indication of how you

55
00:02:18,420 --> 00:02:20,819
might do that okay so let's do some of

56
00:02:20,819 --> 00:02:23,750
these things so one of the issues is a

57
00:02:23,750 --> 00:02:26,610
replica server placement and you can

58
00:02:26,610 --> 00:02:28,739
spend as little or as much time on this

59
00:02:28,739 --> 00:02:30,420
issue I mean that's true with almost any

60
00:02:30,420 --> 00:02:32,790
issue in this area because there are

61
00:02:32,790 --> 00:02:35,720
no fixed determined

62
00:02:35,720 --> 00:02:38,200
provably good solutions in here right so

63
00:02:38,200 --> 00:02:40,910
the kind of situation you can have is

64
00:02:40,910 --> 00:02:42,110
the following right you have a

65
00:02:42,110 --> 00:02:44,330
non-uniform distribution of let's say

66
00:02:44,330 --> 00:02:45,410
clients I mean this could be

67
00:02:45,410 --> 00:02:47,920
geographical distribution or

68
00:02:47,920 --> 00:02:50,270
connectivity distribution or things of the

69
00:02:50,270 --> 00:02:54,620
sort and the question is how should you

70
00:02:54,620 --> 00:02:56,930
in fact allocate clients if you want to

71
00:02:56,930 --> 00:03:02,030
servers at least for the normal the

72
00:03:02,030 --> 00:03:03,980
normal execution the normal functioning

73
00:03:03,980 --> 00:03:05,930
of the system right so most of the time

74
00:03:05,930 --> 00:03:07,280
you would like the client to go to some

75
00:03:07,280 --> 00:03:09,950
sort of maybe closeby server and only if

76
00:03:09,950 --> 00:03:12,800
that server fails to use some sort of

77
00:03:12,800 --> 00:03:14,210
fault tolerance to move to another

78
00:03:14,210 --> 00:03:16,910
server so on and so forth now I want to

79
00:03:16,910 --> 00:03:19,640
point out that these problems in various

80
00:03:19,640 --> 00:03:21,650
forms have been encountered

81
00:03:21,650 --> 00:03:25,430
in other areas for example cell

82
00:03:25,430 --> 00:03:27,920
phone technology right this is a really

83
00:03:27,920 --> 00:03:30,110
big problem for cell phone technology we

84
00:03:30,110 --> 00:03:31,970
have to decide where you're gonna put

85
00:03:31,970 --> 00:03:36,950
those big antennas and what each of the

86
00:03:36,950 --> 00:03:40,340
cell phones has in fact through complex

87
00:03:40,340 --> 00:03:42,140
protocols has to determine which is the

88
00:03:42,140 --> 00:03:44,959
tower it actually talks to and if you

89
00:03:44,959 --> 00:03:46,730
don't have enough towers this is

90
00:03:46,730 --> 00:03:48,800
when you actually get congestion right

91
00:03:48,800 --> 00:03:50,450
and of course congestion in the cell

92
00:03:50,450 --> 00:03:51,590
phone network makes everybody very

93
00:03:51,590 --> 00:03:53,360
unhappy because calls get dropped and

94
00:03:53,360 --> 00:03:54,980
whatnot okay depending on how the

95
00:03:54,980 --> 00:03:56,450
technology is put together I mean you

96
00:03:56,450 --> 00:03:59,000
might get a crappy phone call or you might

97
00:03:59,000 --> 00:04:01,090
simply not get connectivity at all right

98
00:04:01,090 --> 00:04:03,530
but the same kind of issue can be put in

99
00:04:03,530 --> 00:04:04,720
different contexts

100
00:04:04,720 --> 00:04:07,970
I keep on mentioning Akamai right Akamai

101
00:04:07,970 --> 00:04:11,120
is placing servers closer to the users

102
00:04:11,120 --> 00:04:12,380
but for Akamai it's very important to

103
00:04:12,380 --> 00:04:13,880
determine where to place servers because

104
00:04:13,880 --> 00:04:17,209
none of this is free or particularly

105
00:04:17,209 --> 00:04:19,370
cheap right for Akamai to place a

106
00:04:19,370 --> 00:04:22,640
particular server in a particular if you

107
00:04:22,640 --> 00:04:24,050
want corner of the Internet I mean they

108
00:04:24,050 --> 00:04:25,480
have to spend money and rent space

109
00:04:25,480 --> 00:04:27,740
probably at the ISP which is gonna

110
00:04:27,740 --> 00:04:29,450
charge quite a lot because it's kind of

111
00:04:29,450 --> 00:04:33,290
prime real estate right so this

112
00:04:33,290 --> 00:04:36,050
replica server placement is very much

113
00:04:36,050 --> 00:04:39,080
tied with money with commercial

114
00:04:39,080 --> 00:04:40,970
interests right and this is one of the

115
00:04:40,970 --> 00:04:43,729
reasons why it's somewhat studied in the

116
00:04:43,729 --> 00:04:45,500
research literature but extremely well

117
00:04:45,500 --> 00:04:47,000
studied behind the scenes and the

118
00:04:47,000 --> 00:04:48,840
solutions kept secret by the various

119
00:04:48,840 --> 00:04:50,040
companies that need to do this kind of

120
00:04:50,040 --> 00:04:52,230
things right so Verizon and AT&T are

121
00:04:52,230 --> 00:04:53,850
very concerned about for example cell

122
00:04:53,850 --> 00:04:56,160
tower placement but now also about

123
00:04:56,160 --> 00:04:57,990
server placement Akamai is selling the

124
00:04:57,990 --> 00:04:59,340
service in which they determine where to

125
00:04:59,340 --> 00:05:03,780
place the servers each particular

126
00:05:03,780 --> 00:05:07,830
company can in fact do

127
00:05:07,830 --> 00:05:09,690
I mean pose the same kind of a problem

128
00:05:09,690 --> 00:05:12,270
and in fact there is a problem that has

129
00:05:12,270 --> 00:05:14,370
been studied for close to a hundred years

130
00:05:14,370 --> 00:05:17,340
which has nothing to do with I mean not

131
00:05:17,340 --> 00:05:20,190
in a direct way with necessarily

132
00:05:20,190 --> 00:05:23,910
replica placement but has to do with for

133
00:05:23,910 --> 00:05:26,130
example distribution center placement

134
00:05:26,130 --> 00:05:33,500
right so think for example about a large

135
00:05:33,500 --> 00:05:37,680
grocery store chain right Publix ok so

136
00:05:37,680 --> 00:05:39,750
Publix actually has to take when they

137
00:05:39,750 --> 00:05:41,430
expand and when they do various things they

138
00:05:41,430 --> 00:05:43,500
have to take some hard decisions with

139
00:05:43,500 --> 00:05:46,080
respect to where to place those big

140
00:05:46,080 --> 00:05:48,090
warehouses right so the trucks come to

141
00:05:48,090 --> 00:05:50,639
the warehouse and then smaller trucks or

142
00:05:50,639 --> 00:05:53,220
different trucks go and serve each of

143
00:05:53,220 --> 00:05:55,889
the stores and to a large extent it's the

144
00:05:55,889 --> 00:05:57,300
same kind of a problem so if you have a

145
00:05:57,300 --> 00:05:59,729
densely populated area right which has

146
00:05:59,729 --> 00:06:01,560
many many many stores then you better

147
00:06:01,560 --> 00:06:03,810
place a lot more of these warehouses and

148
00:06:03,810 --> 00:06:05,490
potentially of different sizes you can

149
00:06:05,490 --> 00:06:06,630
add size of the server as well

150
00:06:06,630 --> 00:06:08,610
capabilities of the server as well right

151
00:06:08,610 --> 00:06:10,110
because otherwise you're not gonna keep

152
00:06:10,110 --> 00:06:12,690
all those stores filled but if you have

153
00:06:12,690 --> 00:06:15,930
a sparsely populated area right you

154
00:06:15,930 --> 00:06:17,550
might actually place far fewer such

155
00:06:17,550 --> 00:06:20,280
warehouses and one of those warehouses

156
00:06:20,280 --> 00:06:22,650
is going to cost you I mean probably in

157
00:06:22,650 --> 00:06:24,419
the hundred million dollar range right

158
00:06:24,419 --> 00:06:27,000
so you don't really want to just

159
00:06:27,000 --> 00:06:28,410
willy-nilly place them in there now the

160
00:06:28,410 --> 00:06:30,720
problem is not so bad for cell phone

161
00:06:30,720 --> 00:06:32,460
towers and it's probably much cheaper

162
00:06:32,460 --> 00:06:34,200
when it comes to placement of the

163
00:06:34,200 --> 00:06:36,539
servers and to some extent you have a

164
00:06:36,539 --> 00:06:38,550
lot more flexibility in creating the new

165
00:06:38,550 --> 00:06:40,830
one right so the cost of creating such a

166
00:06:40,830 --> 00:06:44,910
resource has to be figured into

167
00:06:44,910 --> 00:06:46,380
the entire story which suggests the

168
00:06:46,380 --> 00:06:49,410
following kind of approach right these

169
00:06:49,410 --> 00:06:51,210
problems tend to be hard I mean it's

170
00:06:51,210 --> 00:06:53,910
known that optimal placement of such

171
00:06:53,910 --> 00:06:56,700
resources is almost always NP-hard

172
00:06:56,700 --> 00:06:59,310
you're all taking the algorithms class

173
00:06:59,310 --> 00:07:01,289
right I think it's on the list of the

174
00:07:01,289 --> 00:07:02,050
core courses

175
00:07:02,050 --> 00:07:04,509
right so trying to find the best

176
00:07:04,509 --> 00:07:06,759
solution is a little bit foolish in this

177
00:07:06,759 --> 00:07:08,620
kind of circumstances a lot of them

178
00:07:08,620 --> 00:07:09,960
don't even have a good approximation

179
00:07:09,960 --> 00:07:13,240
algorithm right so a good heuristic is

180
00:07:13,240 --> 00:07:14,770
kind of the best you can hope for that

181
00:07:14,770 --> 00:07:16,419
works in real life and by the way this is

182
00:07:16,419 --> 00:07:18,789
what Akamai is famous for so Akamai was

183
00:07:18,789 --> 00:07:21,840
founded by theoreticians working on

184
00:07:21,840 --> 00:07:24,280
placement problems and actually

185
00:07:24,280 --> 00:07:25,900
randomized algorithms for placement

186
00:07:25,900 --> 00:07:27,819
which were not which were essentially

187
00:07:27,819 --> 00:07:29,020
offering some kind of randomized

188
00:07:29,020 --> 00:07:31,479
guarantees the algorithms they proved

189
00:07:31,479 --> 00:07:32,590
nice things theoretically but they

190
00:07:32,590 --> 00:07:34,419
realized they're actually very good in

191
00:07:34,419 --> 00:07:36,310
practice as well and this is when they

192
00:07:36,310 --> 00:07:38,020
started transitioning towards hey maybe

193
00:07:38,020 --> 00:07:40,000
we can actually figure out how to play

194
00:07:40,000 --> 00:07:42,130
the game right now for a company like

195
00:07:42,130 --> 00:07:44,110
Akamai and you might be close to the

196
00:07:44,110 --> 00:07:45,520
situation potentially at least some of

197
00:07:45,520 --> 00:07:48,310
you for a company like Akamai what's

198
00:07:48,310 --> 00:07:50,280
important is to be as lean as possible

199
00:07:50,280 --> 00:07:53,050
right other companies that do this kind

200
00:07:53,050 --> 00:07:54,909
of server placement can

201
00:07:54,909 --> 00:07:57,580
actually pop up and what matters is who

202
00:07:57,580 --> 00:07:59,259
can save more money when it comes to

203
00:07:59,259 --> 00:08:00,550
this because you can offer them the

204
00:08:00,550 --> 00:08:04,719
service at a better cost so being wasteful

205
00:08:04,719 --> 00:08:06,909
in placement of the servers and so on

206
00:08:06,909 --> 00:08:09,250
right becomes very problematic I mean it

207
00:08:09,250 --> 00:08:11,199
literally can lead to bankruptcy very

208
00:08:11,199 --> 00:08:14,169
fast for a company like Akamai ok now

209
00:08:14,169 --> 00:08:16,419
similar problems are faced for example

210
00:08:16,419 --> 00:08:20,919
by the large if you want cloud companies

211
00:08:20,919 --> 00:08:23,440
which is about everybody now right for

212
00:08:23,440 --> 00:08:26,110
example when Google decides where to put

213
00:08:26,110 --> 00:08:28,150
another big data center right they have

214
00:08:28,150 --> 00:08:30,159
to think about where do they need more

215
00:08:30,159 --> 00:08:32,589
capacity if you want so these

216
00:08:32,589 --> 00:08:34,179
problems are called to some extent

217
00:08:34,179 --> 00:08:39,219
capacity problems right so one approach

218
00:08:39,219 --> 00:08:42,760
for example is to find congestion in the

219
00:08:42,760 --> 00:08:44,440
system and place more capacity right

220
00:08:44,440 --> 00:08:45,060
there

221
00:08:45,060 --> 00:08:47,770
this is known not to lead to optimal

222
00:08:47,770 --> 00:08:49,870
solutions but might be reasonably good
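A minimal sketch of that heuristic, under assumptions that are not in the lecture: congestion at a site is modelled as load divided by capacity, capacity is added in fixed-size increments, and the site names and numbers are made up for illustration. The function name `add_capacity_greedily` is hypothetical.

```python
def add_capacity_greedily(load, capacity, units, step=1.0):
    """Add `units` increments of size `step`, each at the most congested site.

    `load` and `capacity` map site name -> number. Congestion is modelled
    as utilization = load / capacity, which is an assumption of this sketch.
    """
    capacity = dict(capacity)  # copy so the caller's map is left untouched
    for _ in range(units):
        # Find the site that is currently the most congested.
        worst = max(capacity, key=lambda site: load[site] / capacity[site])
        # Place the next increment of capacity right there.
        capacity[worst] += step
    return capacity


# Made-up example: three sites with equal capacity but very uneven load.
load = {"east": 90.0, "west": 30.0, "south": 50.0}
capacity = {"east": 1.0, "west": 1.0, "south": 1.0}
result = add_capacity_greedily(load, capacity, units=3)
print(result)
```

With these made-up numbers the three increments go to east, then south, then east again. Nothing here claims optimality, it only follows the congestion.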

223
00:08:49,870 --> 00:08:55,779
right right so it's rumored for example

224
00:08:55,779 --> 00:08:57,670
now that Google is building some sort of

225
00:08:57,670 --> 00:09:01,029
a floating data center in the San Francisco

226
00:09:01,029 --> 00:09:04,300
area right people have seen some barges

227
00:09:04,300 --> 00:09:07,000
that look suspicious they ran out of

228
00:09:07,000 --> 00:09:08,589
land I suspect and there is I mean

229
00:09:08,589 --> 00:09:10,360
there's too much stuff going on in that

230
00:09:10,360 --> 00:09:10,840
area

231
00:09:10,840 --> 00:09:14,470
a lot of calm activity right but to a

232
00:09:14,470 --> 00:09:16,660
large extent it's the same

233
00:09:16,660 --> 00:09:19,510
kind of problem so I don't want to talk

234
00:09:19,510 --> 00:09:21,570
too much about this because again even

235
00:09:21,570 --> 00:09:24,280
theoretically all these problems are

236
00:09:24,280 --> 00:09:25,810
actually very hard and you can't really

237
00:09:25,810 --> 00:09:30,120
prove too much about them a reasonable

238
00:09:30,120 --> 00:09:32,350
heuristic solution might actually go a

239
00:09:32,350 --> 00:09:33,880
long way I mean one of the problems is

240
00:09:33,880 --> 00:09:35,050
the following again I want to point out

241
00:09:35,050 --> 00:09:38,830
the dangers of trying too hard on these

242
00:09:38,830 --> 00:09:43,390
kinds of problems so when you model this

243
00:09:43,390 --> 00:09:45,360
problem you can model it theoretically as

244
00:09:45,360 --> 00:09:47,710
an optimization problem this is the

245
00:09:47,710 --> 00:09:50,680
standard approach right an optimization

246
00:09:50,680 --> 00:09:52,390
problem requires some sort of a goodness

247
00:09:52,390 --> 00:09:54,490
measure or badness measure and then what

248
00:09:54,490 --> 00:09:58,030
you're trying to do is say where should

249
00:09:58,030 --> 00:10:01,570
I place the servers or whatnot in order

250
00:10:01,570 --> 00:10:03,790
to let you improve so you define a

251
00:10:03,790 --> 00:10:05,500
mathematical efficiency and say where

252
00:10:05,500 --> 00:10:07,000
should I place the servers to improve

253
00:10:07,000 --> 00:10:08,560
the efficiency okay
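One common way to write such an objective down, as a sketch only and not taken from the lecture: assume a set of clients C, a set of candidate sites S, a demand w_c for each client, a latency or distance d(c, s) between client c and site s, and a cost f_s for opening a server at site s. All of these symbols are assumptions introduced here for illustration. Choosing the set X of sites to open then has the shape of what is usually called a facility location problem:

```latex
\[
\min_{X \subseteq S,\; X \neq \emptyset} \;\;
\sum_{s \in X} f_s \;+\; \sum_{c \in C} w_c \, \min_{s \in X} d(c, s)
\]
```

The first term is what you pay to place the servers, the second is the badness the clients experience when each one goes to its closest open server.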

254
00:10:08,560 --> 00:10:10,120
now that sounds very nice and produces

255
00:10:10,120 --> 00:10:12,220
very complicated papers and whatnot but

256
00:10:12,220 --> 00:10:14,380
there is a really big problem with any

257
00:10:14,380 --> 00:10:17,980
such approaches and the problem is are

258
00:10:17,980 --> 00:10:21,220
you measuring the right thing so what I

259
00:10:21,220 --> 00:10:23,950
mean by that is okay fine so let's think

260
00:10:23,950 --> 00:10:26,590
about how do we define a cost of let's

261
00:10:26,590 --> 00:10:30,820
say yep or what a benefit measure so

262
00:10:30,820 --> 00:10:32,680
let's say I want to put a server and I

263
00:10:32,680 --> 00:10:34,780
need to determine how much do I gain if

264
00:10:34,780 --> 00:10:36,340
I place a server there I need some sort of

265
00:10:36,340 --> 00:10:37,570
a measure like this if I want to solve

266
00:10:37,570 --> 00:10:39,970
an optimization problem okay so how am I

267
00:10:39,970 --> 00:10:42,490
gonna actually measure how much better

268
00:10:42,490 --> 00:10:43,720
things are going to be if I place a

269
00:10:43,720 --> 00:10:45,610
server in a particular place this turns

270
00:10:45,610 --> 00:10:48,490
out to be very problematic right because

271
00:10:48,490 --> 00:10:50,740
it's not something that can be done with

272
00:10:50,740 --> 00:10:52,600
very simple equations I mean you could

273
00:10:52,600 --> 00:10:54,760
try to approximate it and usually these

274
00:10:54,760 --> 00:10:57,670
approximations are quite rough right to

275
00:10:57,670 --> 00:10:59,710
a large extent until you actually place

276
00:10:59,710 --> 00:11:02,860
the server there and you try to do

277
00:11:02,860 --> 00:11:04,450
whatever it is that you're doing and you

278
00:11:04,450 --> 00:11:05,710
measure how much money you spend you

279
00:11:05,710 --> 00:11:06,910
can't really tell how much money you

280
00:11:06,910 --> 00:11:08,920
spend or let's say you want to improve

281
00:11:08,920 --> 00:11:10,180
things like latency or other things

282
00:11:10,180 --> 00:11:12,190
until you really place it there are so

283
00:11:12,190 --> 00:11:13,780
many factors it's so complicated that

284
00:11:13,780 --> 00:11:16,450
the real setup is so complicated you

285
00:11:16,450 --> 00:11:18,250
can't know for sure what's going on in

286
00:11:18,250 --> 00:11:19,060
there okay

287
00:11:19,060 --> 00:11:20,740
by the way there is a counterpart

288
00:11:20,740 --> 00:11:22,690
problem in databases where you want to

289
00:11:22,690 --> 00:11:25,030
pick the best plan to execute a query

290
00:11:25,030 --> 00:11:27,700
of course that assumes that you can in

291
00:11:27,700 --> 00:11:29,950
fact tell how good the plan is if I just

292
00:11:29,950 --> 00:11:31,300
give you the plan which turns out to be

293
00:11:31,300 --> 00:11:34,690
absolutely not true okay so this points

294
00:11:34,690 --> 00:11:36,760
out something that's usually not

295
00:11:36,760 --> 00:11:38,680
mentioned especially in the more

296
00:11:38,680 --> 00:11:41,740
theoretical literature which is trying

297
00:11:41,740 --> 00:11:43,960
too hard is not worth it even if you can

298
00:11:43,960 --> 00:11:46,840
actually solve perfectly the

299
00:11:46,840 --> 00:11:48,460
optimization problem you might still

300
00:11:48,460 --> 00:11:49,810
solve the wrong problem because you

301
00:11:49,810 --> 00:11:51,940
can't measure you can't tell for sure

302
00:11:51,940 --> 00:11:53,770
how costly it's gonna be to do a certain

303
00:11:53,770 --> 00:11:55,810
activity in terms of anything you want

304
00:11:55,810 --> 00:11:59,320
right you can define measures for which

305
00:11:59,320 --> 00:12:01,990
you can solve some of these problems but

306
00:12:01,990 --> 00:12:03,520
they might not correspond to anything

307
00:12:03,520 --> 00:12:06,010
that's in real life right it's really

308
00:12:06,010 --> 00:12:07,270
one of the big problems plaguing

309
00:12:07,270 --> 00:12:09,130
databases and I think any complex

310
00:12:09,130 --> 00:12:11,860
system right so even ask yourself hey if

311
00:12:11,860 --> 00:12:13,360
I place another server let's say in the

312
00:12:13,360 --> 00:12:14,800
department I mean could I tell for sure

313
00:12:14,800 --> 00:12:16,360
that some latency is reduced or some

314
00:12:16,360 --> 00:12:17,560
other things well not really because

315
00:12:17,560 --> 00:12:19,030
it's so complicated things go

316
00:12:19,030 --> 00:12:20,710
through so many wires I mean so much

317
00:12:20,710 --> 00:12:22,000
stuff you can get some rough

318
00:12:22,000 --> 00:12:24,610
approximation right so that also

319
00:12:24,610 --> 00:12:26,680
suggests that if your approximation then

320
00:12:26,680 --> 00:12:28,030
by the way approximations within a

321
00:12:28,030 --> 00:12:30,190
constant factor are hard right so for

322
00:12:30,190 --> 00:12:32,170
example in databases nobody can

323
00:12:32,170 --> 00:12:33,610
guarantee that they can approximate the

324
00:12:33,610 --> 00:12:35,530
cost of a query within any constant

325
00:12:35,530 --> 00:12:38,320
factor right including let's say ten you

326
00:12:38,320 --> 00:12:40,300
might not even be able to tell roughly

327
00:12:40,300 --> 00:12:42,820
the order of magnitude for the time

328
00:12:42,820 --> 00:12:45,370
running a query it's that bad so if that's

329
00:12:45,370 --> 00:12:47,670
the case insisting on an actual

330
00:12:47,670 --> 00:12:49,930
optimal solution whatever that means for

331
00:12:49,930 --> 00:12:51,850
such problems it's foolish in my opinion

332
00:12:51,850 --> 00:12:55,060
okay so basically all this discussion

333
00:12:55,060 --> 00:12:58,570
justifies in a heavy way heuristics and

334
00:12:58,570 --> 00:13:00,100
this is in the end what this boils down

335
00:13:00,100 --> 00:13:01,720
to is a lot of common sense and some

336
00:13:01,720 --> 00:13:03,160
reasonable heuristics where you do it

337
00:13:03,160 --> 00:13:04,900
unless you're Akamai they have some

338
00:13:04,900 --> 00:13:06,670
secret sauce in which to do some at

339
00:13:06,670 --> 00:13:08,380
least they look cool whatever it is that

340
00:13:08,380 --> 00:13:11,980
they are doing right well it turns out

341
00:13:11,980 --> 00:13:13,510
that their algorithms are in fact

342
00:13:13,510 --> 00:13:15,520
approximation algorithms that are kind

343
00:13:15,520 --> 00:13:18,370
of self-adaptive that try to kind of

344
00:13:18,370 --> 00:13:21,820
sense how things happen measure as you

345
00:13:21,820 --> 00:13:25,270
go and change and whatever other stuff

346
00:13:25,270 --> 00:13:26,440
they are doing I'm not gonna go there I

347
00:13:26,440 --> 00:13:28,420
mean it's done by extremely

348
00:13:28,420 --> 00:13:30,760
sophisticated people from MIT and a lot

349
00:13:30,760 --> 00:13:32,050
of it is secret sauce they don't even

350
00:13:32,050 --> 00:13:34,660
bother to patent anything and so on all

351
00:13:34,660 --> 00:13:35,980
right so this is the story with a

352
00:13:35,980 --> 00:13:37,930
replica server placement but it's a real

353
00:13:37,930 --> 00:13:38,920
problem

354
00:13:38,920 --> 00:13:43,000
right now there is a bigger problem that

355
00:13:43,000 --> 00:13:45,610
in general has to be solved which is how

356
00:13:45,610 --> 00:13:48,010
do you grow capacity depending on things

357
00:13:48,010 --> 00:13:51,310
like the load right so this is not even

358
00:13:51,310 --> 00:13:53,050
so much the static scenario you have

359
00:13:53,050 --> 00:13:54,310
here in which you already know where

360
00:13:54,310 --> 00:13:55,779
people are and how they are and you

361
00:13:55,779 --> 00:13:57,940
decided to place another server you

362
00:13:57,940 --> 00:14:00,160
might need to add capacity just because

363
00:14:00,160 --> 00:14:01,540
there are spikes in the usage and this

364
00:14:01,540 --> 00:14:02,829
happens all the time I mean this is one

365
00:14:02,829 --> 00:14:06,310
of the big things that goes in favor of

366
00:14:06,310 --> 00:14:08,589
any cloud-based service right they say

367
00:14:08,589 --> 00:14:10,750
hey no problem you can grow capacity by

368
00:14:10,750 --> 00:14:12,040
essentially just clicking a couple of

369
00:14:12,040 --> 00:14:13,449
buttons and then you fire up more

370
00:14:13,449 --> 00:14:17,800
websites on a more basically on more

371
00:14:17,800 --> 00:14:19,510
servers and this is kind of the promise

372
00:14:19,510 --> 00:14:21,670
that Amazon EC2 is making and it's the

373
00:14:21,670 --> 00:14:24,310
promise that Microsoft Azure makes and

374
00:14:24,310 --> 00:14:26,740
so on right they say hey you can you can

375
00:14:26,740 --> 00:14:29,019
spawn off as many

376
00:14:29,019 --> 00:14:31,360
servers as you want maybe for a short period of

377
00:14:31,360 --> 00:14:34,060
time right so I mean that's one way to

378
00:14:34,060 --> 00:14:37,899
go about it but again all of these

379
00:14:37,899 --> 00:14:42,100
solutions have to be treated carefully a

380
00:14:42,100 --> 00:14:43,449
big problem of course with cloud-based

381
00:14:43,449 --> 00:14:45,370
services capacity growing you're

382
00:14:45,370 --> 00:14:48,220
essentially renting capacity is if you

383
00:14:48,220 --> 00:14:50,140
have an outage at the entire Amazon

384
00:14:50,140 --> 00:14:52,930
level which happened I think about a

385
00:14:52,930 --> 00:14:55,959
month ago then ha I mean a quarter of

386
00:14:55,959 --> 00:14:57,130
the internet goes down because they're

387
00:14:57,130 --> 00:15:00,519
all using EC2 right I mean when it's

388
00:15:00,519 --> 00:15:02,079
bad it's really bad I mean this is kind of

389
00:15:02,079 --> 00:15:05,680
the situation because an outage now in a

390
00:15:05,680 --> 00:15:07,660
big data center doesn't only affect that

391
00:15:07,660 --> 00:15:10,390
company it affects everybody who is using

392
00:15:10,390 --> 00:15:16,740
their cloud service all right now

393
00:15:16,740 --> 00:15:19,480
replication especially for performance

394
00:15:19,480 --> 00:15:21,730
reasons is one of the main ways to

395
00:15:21,730 --> 00:15:23,440
improve performance so it's extremely

396
00:15:23,440 --> 00:15:25,120
important to understand refinements in

397
00:15:25,120 --> 00:15:26,709
what can be done with replication and

398
00:15:26,709 --> 00:15:28,300
this replication it's everywhere this is

399
00:15:28,300 --> 00:15:29,890
what I've been arguing for all the class

400
00:15:29,890 --> 00:15:33,940
and it's mostly everywhere in the form of

401
00:15:33,940 --> 00:15:38,019
some sort of caching okay so it's worth

402
00:15:38,019 --> 00:15:40,420
talking about some tiny decimal points

403
00:15:40,420 --> 00:15:41,829
in there to see what choices you have

404
00:15:41,829 --> 00:15:43,959
when you do various things and I'm gonna

405
00:15:43,959 --> 00:15:45,310
do something that the textbook writers

406
00:15:45,310 --> 00:15:48,279
don't namely mention equivalent issues

407
00:15:48,279 --> 00:15:50,050
when it comes to computer architectures

408
00:15:50,050 --> 00:15:51,579
things that happen in

409
00:15:51,579 --> 00:15:54,249
inside the processor for example so it's

410
00:15:54,249 --> 00:15:55,389
kind of strange why would the

411
00:15:55,389 --> 00:15:57,220
distributed system have any similarity

412
00:15:57,220 --> 00:15:58,600
with what happens inside the processor well

413
00:15:58,600 --> 00:16:00,550
processors themselves are complex

414
00:16:00,550 --> 00:16:02,860
distributed systems nowadays believe it

415
00:16:02,860 --> 00:16:04,149
or not especially when it comes to

416
00:16:04,149 --> 00:16:05,769
multi-core and a lot of these issues

417
00:16:05,769 --> 00:16:07,389
have to be solved inside the processor

418
00:16:07,389 --> 00:16:08,199
you might have different solutions

419
00:16:08,199 --> 00:16:09,639
because you have a different trade-off

420
00:16:09,639 --> 00:16:11,649
but it's the same issue in the first place

421
00:16:11,649 --> 00:16:12,759
ok so I'll point out some of those

422
00:16:12,759 --> 00:16:16,990
things so when it comes to

423
00:16:16,990 --> 00:16:18,819
replication right there are a number of

424
00:16:18,819 --> 00:16:21,100
things you can do so the issue is what

425
00:16:21,100 --> 00:16:25,660
the issue is when writes happen when the

426
00:16:25,660 --> 00:16:28,839
content changes how do you make the new

427
00:16:28,839 --> 00:16:32,800
content available and this I mean this

428
00:16:32,800 --> 00:16:34,179
is really the true issue because if

429
00:16:34,179 --> 00:16:35,949
you have a stable scenario in which

430
00:16:35,949 --> 00:16:38,559
nothing changes and you're done

431
00:16:38,559 --> 00:16:40,269
replicating whatever that means

432
00:16:40,269 --> 00:16:42,610
then it's relatively easy you can keep a

433
00:16:42,610 --> 00:16:44,350
personal copy you keep it very close to

434
00:16:44,350 --> 00:16:45,550
yourself as long as you have enough

435
00:16:45,550 --> 00:16:48,040
storage for that right and then all you

436
00:16:48,040 --> 00:16:50,889
need to do is access the local copy not

437
00:16:50,889 --> 00:16:54,670
a big issue at all ok so that happens

438
00:16:54,670 --> 00:16:57,730
essentially all the time I mean even

439
00:16:57,730 --> 00:16:59,290
basic stuff like for example when you

440
00:16:59,290 --> 00:17:00,549
install the operating system on your

441
00:17:00,549 --> 00:17:02,259
machine and you don't keep on getting

442
00:17:02,259 --> 00:17:04,779
the files from some internet

443
00:17:04,779 --> 00:17:07,029
website that in itself is some form of

444
00:17:07,029 --> 00:17:10,059
caching in fact you could actually with

445
00:17:10,059 --> 00:17:12,609
a very small kernel and a distributed

446
00:17:12,609 --> 00:17:14,679
file system in principle get all those

447
00:17:14,679 --> 00:17:16,179
things from the cloud but that would look

448
00:17:16,179 --> 00:17:20,109
so foolish right now the technology is

449
00:17:20,109 --> 00:17:23,109
in favor of caching when it comes to

450
00:17:23,109 --> 00:17:26,020
size when it comes to capacity because

451
00:17:26,020 --> 00:17:27,849
hard drives are cheaper and cheaper

452
00:17:27,849 --> 00:17:31,840
right if I mean it's almost unheard of

453
00:17:31,840 --> 00:17:33,370
not to have at least half a terabyte

454
00:17:33,370 --> 00:17:35,649
hard drive in essentially any machine

455
00:17:35,649 --> 00:17:39,220
unless it's an iPad or an iPhone right

456
00:17:39,220 --> 00:17:41,200
I mean laptops without at least half a

457
00:17:41,200 --> 00:17:42,460
terabyte I mean unless you have an

458
00:17:42,460 --> 00:17:45,070
SSD maybe you get a 128 but in any case

459
00:17:45,070 --> 00:17:48,520
you have a large amount of storage by at

460
00:17:48,520 --> 00:17:50,409
least 10-year-old standards when it

461
00:17:50,409 --> 00:17:52,419
comes to local machines which

462
00:17:52,419 --> 00:17:54,010
essentially means that one of the big

463
00:17:54,010 --> 00:17:56,169
obstacles to replication is removed

464
00:17:56,169 --> 00:18:02,279
namely storage it's cheap right so yeah

465
00:18:02,279 --> 00:18:04,370
when it comes to

466
00:18:04,370 --> 00:18:06,770
like replication the more problems you

467
00:18:06,770 --> 00:18:09,260
have to solve the harder it gets and the

468
00:18:09,260 --> 00:18:11,840
bigger the mess a potential problem

469
00:18:11,840 --> 00:18:14,660
with replication is deciding what to

470
00:18:14,660 --> 00:18:18,940
replicate right so the first one is what

471
00:18:19,000 --> 00:18:23,440
right and I mean there is a how question

472
00:18:23,440 --> 00:18:25,970
right and which we are going to discuss

473
00:18:25,970 --> 00:18:29,960
but at least the what goes away if you

474
00:18:29,960 --> 00:18:31,760
have large storage for the most part

475
00:18:31,760 --> 00:18:34,010
because even if you're not sure that it's

476
00:18:34,010 --> 00:18:35,600
gonna help you can still replicate it if

477
00:18:35,600 --> 00:18:37,040
you got your hands on it you can still

478
00:18:37,040 --> 00:18:39,679
replicate it and that's it right so what

479
00:18:39,679 --> 00:18:42,620
I mean by that is in most devices

480
00:18:42,620 --> 00:18:44,420
unless maybe you have very very

481
00:18:44,420 --> 00:18:46,520
small devices if you bothered to bring

482
00:18:46,520 --> 00:18:48,650
it you might as well keep it and you can

483
00:18:48,650 --> 00:18:51,170
keep it for a while and this is

484
00:18:51,170 --> 00:18:54,410
literally what caching in web

485
00:18:54,410 --> 00:18:56,570
browsers does what a lot of other

486
00:18:56,570 --> 00:18:59,630
caching techniques would in fact

487
00:18:59,630 --> 00:19:03,710
do right now the how is still very

488
00:19:03,710 --> 00:19:06,050
important right especially if you are

489
00:19:06,050 --> 00:19:07,550
concerned about bandwidth and to some

490
00:19:07,550 --> 00:19:09,230
extent you have to be so bandwidth is

491
00:19:09,230 --> 00:19:10,670
going up so you don't need to be

492
00:19:10,670 --> 00:19:12,440
paranoid about bandwidth anymore but

493
00:19:12,440 --> 00:19:14,300
nevertheless there are some concerns now

494
00:19:14,300 --> 00:19:16,100
you don't need to be paranoid with

495
00:19:16,100 --> 00:19:17,690
respect to the bandwidth for the clients

496
00:19:17,690 --> 00:19:18,890
but you still have to be paranoid with

497
00:19:18,890 --> 00:19:20,360
respect to the bandwidth for the servers

498
00:19:20,360 --> 00:19:23,000
by the way depending on how many clients

499
00:19:23,000 --> 00:19:25,550
actually access the same server right so

500
00:19:25,550 --> 00:19:27,800
it's very easy to say yeah and we have

501
00:19:27,800 --> 00:19:29,360
now a lot of bandwidth so on and so

502
00:19:29,360 --> 00:19:32,270
forth well yes but not if for example

503
00:19:32,270 --> 00:19:34,150
you're serving a million people from

504
00:19:34,150 --> 00:19:37,250
just one server which is in principle

505
00:19:37,250 --> 00:19:39,440
possible and some people pull that off

506
00:19:39,440 --> 00:19:41,450
right so if that's the case you really

507
00:19:41,450 --> 00:19:44,630
want to have as few as possible

508
00:19:44,630 --> 00:19:47,300
messages going around ok so when it

509
00:19:47,300 --> 00:19:48,559
comes to this kind of replication and

510
00:19:48,559 --> 00:19:53,360
these server placements right in the

511
00:19:53,360 --> 00:19:54,830
middle we have this if you want the

512
00:19:54,830 --> 00:19:57,500
permanent replicas is where the content

513
00:19:57,500 --> 00:20:00,410
really lives in the outer ring we have

514
00:20:00,410 --> 00:20:02,240
the clients so somehow the clients have

515
00:20:02,240 --> 00:20:04,340
to get their hands on the data that

516
00:20:04,340 --> 00:20:06,950
resides in the permanent replicas right

517
00:20:06,950 --> 00:20:08,780
so the information has to be propagated

518
00:20:08,780 --> 00:20:11,059
correctly but such propagation can be

519
00:20:11,059 --> 00:20:12,800
initiated in two fundamentally different

520
00:20:12,800 --> 00:20:14,600
ways right one of them is client

521
00:20:14,600 --> 00:20:17,210
initiated replicas and the other one is

522
00:20:17,210 --> 00:20:18,169
server initiated

523
00:20:18,169 --> 00:20:22,009
replicas okay so this is pull when the

524
00:20:22,009 --> 00:20:23,539
clients pull the content when they need

525
00:20:23,539 --> 00:20:25,629
it this is push the servers will push

526
00:20:25,629 --> 00:20:28,279
whatever it is that they push to the

527
00:20:28,279 --> 00:20:31,659
clients okay and it turns out that in

528
00:20:31,659 --> 00:20:34,460
most sophisticated applications you in

529
00:20:34,460 --> 00:20:36,230
fact have to use a mix between a pull

530
00:20:36,230 --> 00:20:37,489
and a push I'm gonna explain a little

531
00:20:37,489 --> 00:20:38,749
bit why that's the case because for

532
00:20:38,749 --> 00:20:40,580
certain kind of things it's better to

533
00:20:40,580 --> 00:20:42,080
pull and for certain other kinds of

534
00:20:42,080 --> 00:20:44,359
things it's better to push okay now you

535
00:20:44,359 --> 00:20:47,029
can do a pure push based system and a

536
00:20:47,029 --> 00:20:49,970
pure pull-based system but you're

537
00:20:49,970 --> 00:20:55,059
essentially giving up on some big

538
00:20:55,059 --> 00:20:56,749
characteristics you might actually care

539
00:20:56,749 --> 00:20:59,899
about okay so let's just think about how

540
00:20:59,899 --> 00:21:02,210
this might actually work so the one

541
00:21:02,210 --> 00:21:03,710
you're most familiar with is in fact

542
00:21:03,710 --> 00:21:05,659
the client initiated replicas because

543
00:21:05,659 --> 00:21:07,039
it's essentially what's happening by

544
00:21:07,039 --> 00:21:14,389
default in web browsers right so when you

545
00:21:14,389 --> 00:21:16,009
take your browser on any device you have

546
00:21:16,009 --> 00:21:17,749
and point it to some kind of a URL what

547
00:21:17,749 --> 00:21:19,039
is it that you're doing you're going to

548
00:21:19,039 --> 00:21:21,139
the server and say give me the content

549
00:21:21,139 --> 00:21:22,129
of this resource

550
00:21:22,129 --> 00:21:23,840
I mean resource usually means some sort

551
00:21:23,840 --> 00:21:26,210
of a URL which you can't even say now

552
00:21:26,210 --> 00:21:27,590
that's a file or something else I mean

553
00:21:27,590 --> 00:21:28,759
it could be something completely cooked

554
00:21:28,759 --> 00:21:32,840
up on the fly right and as a client you

555
00:21:32,840 --> 00:21:35,529
can decide to do some sort of

556
00:21:35,529 --> 00:21:38,239
replication I mean you have to I want

557
00:21:38,239 --> 00:21:40,249
you to understand that there are no ifs

558
00:21:40,249 --> 00:21:41,720
and buts about it you have to do some

559
00:21:41,720 --> 00:21:44,210
sort of replication because you can't

560
00:21:44,210 --> 00:21:47,210
get it every frame right so I mean if

561
00:21:47,210 --> 00:21:48,889
you want to be paranoid literally you

562
00:21:48,889 --> 00:21:50,539
can say you know what I want to have the

563
00:21:50,539 --> 00:21:53,690
freshest content so what I'll do is I

564
00:21:53,690 --> 00:21:55,239
will keep the content 30 milliseconds

565
00:21:55,239 --> 00:21:57,379
exactly how much it takes me to display

566
00:21:57,379 --> 00:22:01,309
a frame on the computer no more I mean

567
00:22:01,309 --> 00:22:03,679
why is 30 milliseconds again below 30

568
00:22:03,679 --> 00:22:05,809
milliseconds the human eye cannot tell right if

569
00:22:05,809 --> 00:22:07,249
humans are your clients it's 30

570
00:22:07,249 --> 00:22:08,889
milliseconds by the way if you're doing

571
00:22:08,889 --> 00:22:11,720
extremely fast trading using these fast

572
00:22:11,720 --> 00:22:13,580
bots you might decide to cache less than

573
00:22:13,580 --> 00:22:16,399
a millisecond okay so it's all kind of

574
00:22:16,399 --> 00:22:18,259
depending on the application right of

575
00:22:18,259 --> 00:22:19,399
course that would be crazy

576
00:22:19,399 --> 00:22:23,029
I mean imagine your device that you use

577
00:22:23,029 --> 00:22:25,460
to access the internet trying to get

578
00:22:25,460 --> 00:22:27,739
something every 30 milliseconds when the

579
00:22:27,739 --> 00:22:29,599
delays can easily be in half a second to

580
00:22:29,599 --> 00:22:31,430
a second to a lot of the servers right

581
00:22:31,430 --> 00:22:33,740
nothing would actually work so you can

582
00:22:33,740 --> 00:22:36,260
think about caching at many many many

583
00:22:36,260 --> 00:22:38,180
different levels including things like

584
00:22:38,180 --> 00:22:42,530
the sophisticated complicated UI

585
00:22:42,530 --> 00:22:44,900
that the web browser displays on the

586
00:22:44,900 --> 00:22:48,140
screen it's not only the basic

587
00:22:48,140 --> 00:22:50,120
replication that you might think happens

588
00:22:50,120 --> 00:22:52,280
right namely that you got the HTML page

589
00:22:52,280 --> 00:22:54,080
or whatever JavaScript code or whatever

590
00:22:54,080 --> 00:22:56,870
it is that you're doing but it's caching

591
00:22:56,870 --> 00:23:02,080
all the way between the server and

592
00:23:02,080 --> 00:23:06,020
consuming the content right so there is

593
00:23:06,020 --> 00:23:07,610
massive amount of caching that happens

594
00:23:07,610 --> 00:23:09,170
for example in the web browser itself I

595
00:23:09,170 --> 00:23:10,490
mean the web browser has to be as lazy

596
00:23:10,490 --> 00:23:12,380
as possible about what gets updated and

597
00:23:12,380 --> 00:23:14,000
what doesn't get updated and try to keep

598
00:23:14,000 --> 00:23:17,960
as much as possible static not changing

599
00:23:17,960 --> 00:23:20,210
because changing everything all the time

600
00:23:20,210 --> 00:23:22,700
right it's a big performance hog so web

601
00:23:22,700 --> 00:23:24,710
browsers got extremely sophisticated at

602
00:23:24,710 --> 00:23:28,100
determining what not to change okay it's

603
00:23:28,100 --> 00:23:30,260
very easy to change everything you just

604
00:23:30,260 --> 00:23:32,450
say fire up the whole processing from

605
00:23:32,450 --> 00:23:35,690
scratch but if you do so you suffer big

606
00:23:35,690 --> 00:23:37,340
performance problems so finding large

607
00:23:37,340 --> 00:23:39,140
areas of the screen that do not change

608
00:23:39,140 --> 00:23:41,330
for sure and not firing up any kind of

609
00:23:41,330 --> 00:23:43,370
updates in there it's a big performance

610
00:23:43,370 --> 00:23:46,160
improvement measure so that's some form

611
00:23:46,160 --> 00:23:48,890
of caching but you're caching now pixels

612
00:23:48,890 --> 00:23:52,550
on the screen if you want not actual

613
00:23:52,550 --> 00:23:54,890
content in the backend all right so it

614
00:23:54,890 --> 00:23:55,940
doesn't matter what you call this

615
00:23:55,940 --> 00:23:58,670
technique being lazy about how these

616
00:23:58,670 --> 00:24:01,520
changes get propagated right it's one of

617
00:24:01,520 --> 00:24:01,910
the

618
00:24:01,910 --> 00:24:04,000
best ways to improve performance

619
00:24:04,000 --> 00:24:07,550
okay so this is really what client

620
00:24:07,550 --> 00:24:09,650
initiated replicas are but then you have

621
00:24:09,650 --> 00:24:11,660
a big problem and you might not even

622
00:24:11,660 --> 00:24:13,370
realize but these are big questions and

623
00:24:13,370 --> 00:24:15,140
people that wrote even web browsers have

624
00:24:15,140 --> 00:24:17,630
to decide it's clear that you do want to

625
00:24:17,630 --> 00:24:20,480
cache things that tend not to change

626
00:24:20,480 --> 00:24:21,950
because that will significantly improve

627
00:24:21,950 --> 00:24:27,100
performance right so for example the

628
00:24:27,100 --> 00:24:29,420
these web applications right they have

629
00:24:29,420 --> 00:24:31,760
large amounts of JavaScript code I mean

630
00:24:31,760 --> 00:24:33,200
it makes a lot of sense to say I'm gonna

631
00:24:33,200 --> 00:24:35,030
cache that large amount of JavaScript

632
00:24:35,030 --> 00:24:37,520
code because bringing all that all the

633
00:24:37,520 --> 00:24:39,350
time I mean it's just it consumes

634
00:24:39,350 --> 00:24:42,290
resources on the other hand you might

635
00:24:42,290 --> 00:24:44,480
work on stale copies so a big question

636
00:24:44,480 --> 00:24:45,200
when it come

637
00:24:45,200 --> 00:24:46,760
always when it comes to replication is

638
00:24:46,760 --> 00:24:49,370
when to replicate right so we have a

639
00:24:49,370 --> 00:24:55,519
what how and when in particular the big

640
00:24:55,519 --> 00:24:58,700
question is when to invalidate whatever

641
00:24:58,700 --> 00:25:01,130
copy you have right now this problem

642
00:25:01,130 --> 00:25:03,649
is in fact very old because one of the

643
00:25:03,649 --> 00:25:05,029
main techniques inside computer

644
00:25:05,029 --> 00:25:10,010
architecture systems is caching right so

645
00:25:10,010 --> 00:25:12,950
a big question there is how do you deal

646
00:25:12,950 --> 00:25:16,970
with information that's stale so there are

647
00:25:16,970 --> 00:25:19,610
two primary techniques one of them is

648
00:25:19,610 --> 00:25:21,919
before you use a copy I mean there are

649
00:25:21,919 --> 00:25:23,210
multiple techniques but one of them is

650
00:25:23,210 --> 00:25:25,880
before you use the copy you could try to

651
00:25:25,880 --> 00:25:27,649
check and maybe the check is faster than

652
00:25:27,649 --> 00:25:29,899
using the resource you can try to

653
00:25:29,899 --> 00:25:31,669
check to see if the resource is stale or

654
00:25:31,669 --> 00:25:34,429
not and that's still a push I'm sorry a

655
00:25:34,429 --> 00:25:35,750
pull kind of technique in which the

656
00:25:35,750 --> 00:25:38,240
client is trying to figure out do I have

657
00:25:38,240 --> 00:25:41,750
a stale copy or a good copy okay so

658
00:25:41,750 --> 00:25:49,899
some sort of a check check for freshness

659
00:25:51,649 --> 00:25:54,139
the other one is to say you know what

660
00:25:54,139 --> 00:25:57,139
I'm going to assume that my copy is

661
00:25:57,139 --> 00:26:00,529
fresh but I'm gonna have whoever could

662
00:26:00,529 --> 00:26:01,999
change the content send me some

663
00:26:01,999 --> 00:26:04,879
notification when the copy is invalid

664
00:26:04,879 --> 00:26:06,679
I think I actually have a picture for

665
00:26:06,679 --> 00:26:11,419
this let me write something like this

666
00:26:11,419 --> 00:26:12,590
I'm not going to go back to the other

667
00:26:12,590 --> 00:26:12,889
one

668
00:26:12,889 --> 00:26:19,129
right so this is notifications and then

669
00:26:19,129 --> 00:26:20,809
of course you have the full technique

670
00:26:20,809 --> 00:26:22,100
that corresponds to a notification which is

671
00:26:22,100 --> 00:26:31,309
push full changes now when you're

672
00:26:31,309 --> 00:26:32,899
talking about check freshness you can

673
00:26:32,899 --> 00:26:34,159
have various algorithms to check

674
00:26:34,159 --> 00:26:36,409
freshness and in fact for example web

675
00:26:36,409 --> 00:26:39,379
browsers have at least a workable

676
00:26:39,379 --> 00:26:41,269
solution for that right so that solution

677
00:26:41,269 --> 00:26:42,289
is basically the following they are

678
00:26:42,289 --> 00:26:43,820
trying to be quite lazy about it and try

679
00:26:43,820 --> 00:26:46,220
to guess when they should check for

680
00:26:46,220 --> 00:26:47,509
freshness because the checking for

681
00:26:47,509 --> 00:26:49,549
freshness is itself expensive

682
00:26:49,549 --> 00:26:51,499
in terms of latency right the latency is

683
00:26:51,499 --> 00:26:53,509
what kills any of these things if you

684
00:26:53,509 --> 00:26:55,070
have enough bandwidth there might be

685
00:26:55,070 --> 00:26:56,869
almost no difference between checking

686
00:26:56,869 --> 00:26:58,429
for freshness and bringing in

687
00:26:58,429 --> 00:27:01,700
your copy okay so you're trying to be as

688
00:27:01,700 --> 00:27:02,929
lazy as possible for checking for

689
00:27:02,929 --> 00:27:04,460
freshness but in the web browsers for

690
00:27:04,460 --> 00:27:07,789
example if you reload the page several

691
00:27:07,789 --> 00:27:09,679
times you kick in a mechanism that will

692
00:27:09,679 --> 00:27:12,580
bring new copies of everything because

693
00:27:12,580 --> 00:27:16,369
they I mean you need some sort of you

694
00:27:16,369 --> 00:27:18,529
know don't give me the cached version

695
00:27:18,529 --> 00:27:20,389
because for some reason I'm sure that

696
00:27:20,389 --> 00:27:23,389
there is a better version right and that

697
00:27:23,389 --> 00:27:25,369
will force some sort of okay invalidate

698
00:27:25,369 --> 00:27:26,269
the caches and they'll bring new

699
00:27:26,269 --> 00:27:28,580
versions of these programs okay well I

700
00:27:28,580 --> 00:27:30,320
know this because when you do web

701
00:27:30,320 --> 00:27:31,850
development right this is your best

702
00:27:31,850 --> 00:27:34,700
friend otherwise you keep changing

703
00:27:34,700 --> 00:27:36,649
the backend code and the front-end

704
00:27:36,649 --> 00:27:38,330
doesn't change anything because it

705
00:27:38,330 --> 00:27:40,639
decides you're cached for six days right in

706
00:27:40,639 --> 00:27:42,049
six days my application is going to be

707
00:27:42,049 --> 00:27:44,600
very different so okay now these two

708
00:27:44,600 --> 00:27:46,249
solutions are are very different than

709
00:27:46,249 --> 00:27:47,480
this check for fragment so check for

710
00:27:47,480 --> 00:27:49,549
freshness is basically you lazily at

711
00:27:49,549 --> 00:27:51,110
whatever times you check for fragment

712
00:27:51,110 --> 00:27:52,369
freshness or look for indicators that

713
00:27:52,369 --> 00:27:53,929
something changed and when it changed

714
00:27:53,929 --> 00:27:55,610
you can decide to bring only what you

715
00:27:55,610 --> 00:27:56,720
need okay

716
00:27:56,720 --> 00:27:58,789
so this is completely driven by what's

717
00:27:58,789 --> 00:27:59,899
in fact accessed

718
00:27:59,899 --> 00:28:03,889
okay now this one notifications and push

719
00:28:03,889 --> 00:28:05,210
changes

720
00:28:05,210 --> 00:28:10,580
are driven by the server and then you

721
00:28:10,580 --> 00:28:12,530
have hard problems to solve on the

722
00:28:12,530 --> 00:28:14,570
server for now okay so this hard problem

723
00:28:14,570 --> 00:28:17,300
is one of the hard problems is who

724
00:28:17,300 --> 00:28:19,400
should the change be pushed to either

725
00:28:19,400 --> 00:28:21,160
the notification or the change in

726
00:28:21,160 --> 00:28:23,810
particular who is interested in this

727
00:28:23,810 --> 00:28:25,790
resource so that's not clear at all by

728
00:28:25,790 --> 00:28:28,040
the way so for example web browsers are

729
00:28:28,040 --> 00:28:30,020
not designed to give any indication of

730
00:28:30,020 --> 00:28:31,340
who's interested in such a resource

731
00:28:31,340 --> 00:28:33,710
unless you do special programming right

732
00:28:33,710 --> 00:28:36,650
so the fact that I access some web page

733
00:28:36,650 --> 00:28:38,360
doesn't mean I need to be notified

734
00:28:38,360 --> 00:28:39,980
every time that web page changes and

735
00:28:39,980 --> 00:28:41,990
that would be madness if that's the

736
00:28:41,990 --> 00:28:44,750
solution used everywhere by most of the

737
00:28:44,750 --> 00:28:47,210
websites right hey CNN updated their

738
00:28:47,210 --> 00:28:49,310
main web page you should definitely do

739
00:28:49,310 --> 00:28:54,290
something about it but and for example

740
00:28:54,290 --> 00:28:58,670
imagine though that I am watching for

741
00:28:58,670 --> 00:29:01,370
example stocks right the prices on the

742
00:29:01,370 --> 00:29:05,630
stock market right then and while I'm

743
00:29:05,630 --> 00:29:07,130
kind of on the web page of those

744
00:29:07,130 --> 00:29:09,200
guys whatever that means I might

745
00:29:09,200 --> 00:29:10,400
actually be interested in updates

746
00:29:10,400 --> 00:29:12,860
because I'm interested in those events or

747
00:29:12,860 --> 00:29:14,630
imagine a chat application for example

748
00:29:14,630 --> 00:29:16,700
right in a chat application I really

749
00:29:16,700 --> 00:29:19,390
want to know when somebody else chats

750
00:29:19,390 --> 00:29:23,660
without me having to keep on saying ok so I waited

751
00:29:23,660 --> 00:29:25,340
already 3 seconds is there something

752
00:29:25,340 --> 00:29:26,360
more and that would be essentially

753
00:29:26,360 --> 00:29:28,430
invalidate my state which is I only know

754
00:29:28,430 --> 00:29:30,530
about my chats and go see if there is

755
00:29:30,530 --> 00:29:31,910
any other chat so for certain

756
00:29:31,910 --> 00:29:34,430
applications like chat notifications or

757
00:29:34,430 --> 00:29:36,380
in fact completely pushing changes make

758
00:29:36,380 --> 00:29:38,710
a lot of sense for other applications

759
00:29:38,710 --> 00:29:41,210
only check freshness might make any kind

760
00:29:41,210 --> 00:29:42,590
of sense right so this is the kind of

761
00:29:42,590 --> 00:29:43,850
things that you would have to ask

762
00:29:43,850 --> 00:29:45,650
yourself almost always when you design

763
00:29:45,650 --> 00:29:47,780
an application which one do you

764
00:29:47,780 --> 00:29:49,670
do now when it comes to notifications

765
00:29:49,670 --> 00:29:52,940
and push changes you in fact require a

766
00:29:52,940 --> 00:29:55,040
special facility in the server right

767
00:29:55,040 --> 00:29:57,170
which we did not talk extensively about

768
00:29:57,170 --> 00:29:58,910
but you require so-called stateful

769
00:29:58,910 --> 00:30:01,390
servers

770
00:30:04,940 --> 00:30:07,619
okay so stateful is the opposite of

771
00:30:07,619 --> 00:30:09,959
stateless so it's easier to explain what

772
00:30:09,959 --> 00:30:12,509
stateless means right there is some kind

773
00:30:12,509 --> 00:30:14,549
of gradation in between stateless is

774
00:30:14,549 --> 00:30:17,129
very easy you know right so a stateless

775
00:30:17,129 --> 00:30:24,869
server is a server that keeps no

776
00:30:24,869 --> 00:30:27,749
information about the clients it gets a

777
00:30:27,749 --> 00:30:29,249
request satisfies the request sends the

778
00:30:29,249 --> 00:30:30,869
result and forgets about everything that

779
00:30:30,869 --> 00:30:32,429
happened now there is no such thing as

780
00:30:32,429 --> 00:30:34,019
a purely stateless server because they all

781
00:30:34,019 --> 00:30:36,359
log something and whatever but they are

782
00:30:36,359 --> 00:30:37,859
really not gonna analyze their own logs

783
00:30:37,859 --> 00:30:39,809
so even if they log it's an external

784
00:30:39,809 --> 00:30:41,489
activity that goes on to see what was in

785
00:30:41,489 --> 00:30:44,429
the log right most of the web servers

786
00:30:44,429 --> 00:30:46,049
are designed to be stateless servers

787
00:30:46,049 --> 00:30:49,319
right that by the way that's a really

788
00:30:49,319 --> 00:30:50,669
big problem with some of the modern

789
00:30:50,669 --> 00:30:52,849
applications because the request comes

790
00:30:52,849 --> 00:30:55,139
every request is completely independent

791
00:30:55,139 --> 00:30:57,209
of all the other requests every request

792
00:30:57,209 --> 00:30:59,669
is treated as essentially a pure

793
00:30:59,669 --> 00:31:00,329
function

794
00:31:00,329 --> 00:31:02,369
you get some input you cook up

795
00:31:02,369 --> 00:31:04,139
whatever it is that you're doing and you

796
00:31:04,139 --> 00:31:09,619
send back the result yes I'm sorry

797
00:31:09,619 --> 00:31:14,429
what cookie well okay we'll talk about

798
00:31:14,429 --> 00:31:16,799
the cookies in a second okay so that's

799
00:31:16,799 --> 00:31:18,749
blending things a little bit but cookies

800
00:31:18,749 --> 00:31:20,519
only help with one issue but it's not

801
00:31:20,519 --> 00:31:22,739
gonna add too much state okay one second

802
00:31:22,739 --> 00:31:25,349
right so for example Apache and other

803
00:31:25,349 --> 00:31:26,940
web servers are designed to be stateless

804
00:31:26,940 --> 00:31:28,499
servers now of course you can't

805
00:31:28,499 --> 00:31:30,389
really do anything with purely stateless

806
00:31:30,389 --> 00:31:32,879
servers and normally the usual

807
00:31:32,879 --> 00:31:33,690
architecture is

808
00:31:33,690 --> 00:31:34,889
I mean this is the classic architecture

809
00:31:34,889 --> 00:31:36,779
for a web server and it's extremely

810
00:31:36,779 --> 00:31:38,399
relevant for this kind of issues right

811
00:31:38,399 --> 00:31:43,079
so you have the web browser who talks

812
00:31:43,079 --> 00:31:46,709
through the internet whatever that means

813
00:31:46,709 --> 00:31:48,869
with the server and inside the server

814
00:31:48,869 --> 00:31:50,099
this is the architecture you're gonna

815
00:31:50,099 --> 00:31:51,479
have you're gonna have Apache let's say

816
00:31:51,479 --> 00:31:55,979
so let's say the web server right that

817
00:31:55,979 --> 00:31:58,619
actually gets a request if

818
00:31:58,619 --> 00:32:01,829
multiple requests come you essentially

819
00:32:01,829 --> 00:32:03,359
fire up different processes or different

820
00:32:03,359 --> 00:32:04,679
threads depending on the implementation

821
00:32:04,679 --> 00:32:06,119
of the web server but they don't talk to

822
00:32:06,119 --> 00:32:08,159
each other this is the classic web

823
00:32:08,159 --> 00:32:11,099
server architecture the web server is

824
00:32:11,099 --> 00:32:14,069
gonna fire up some sort of a language

825
00:32:14,069 --> 00:32:15,570
behind the back I mean

826
00:32:15,570 --> 00:32:18,059
if the job is easy namely just serve a

827
00:32:18,059 --> 00:32:21,029
file it will grab the file from

828
00:32:21,029 --> 00:32:22,919
the file system and serve it back so it

829
00:32:22,919 --> 00:32:24,779
could go let's say to the file system

830
00:32:24,779 --> 00:32:27,929
and then send it right back well it's

831
00:32:27,929 --> 00:32:31,289
really from the web server that happens

832
00:32:31,289 --> 00:32:32,970
or you could go to for example if you

833
00:32:32,970 --> 00:32:34,379
want to serve dynamic content you might

834
00:32:34,379 --> 00:32:40,080
go to something like PHP alright so PHP

835
00:32:40,080 --> 00:32:42,869
is gonna carry out a computation

836
00:32:42,869 --> 00:32:44,429
that's going to do something which I'll

837
00:32:44,429 --> 00:32:45,929
mention in a second the important thing

838
00:32:45,929 --> 00:32:49,649
is PHP itself is gonna be stateless in

839
00:32:49,649 --> 00:32:52,409
the sense that you create a PHP session

840
00:32:52,409 --> 00:32:54,029
it does something and you destroy the

841
00:32:54,029 --> 00:32:56,549
PHP session so there is no state carried

842
00:32:56,549 --> 00:32:58,799
through the session itself even if you

843
00:32:58,799 --> 00:33:00,629
have tricks for example to keep the PHP

844
00:33:00,629 --> 00:33:02,999
alive more right you're still wiping out

845
00:33:02,999 --> 00:33:05,940
the state now this is not going to

846
00:33:05,940 --> 00:33:07,529
be very good if you need to for example

847
00:33:07,529 --> 00:33:11,580
keep track of transactions I mean moving

848
00:33:11,580 --> 00:33:13,619
money around buying items and so on so

849
00:33:13,619 --> 00:33:15,210
somebody must keep some sort of state

850
00:33:15,210 --> 00:33:16,979
but that in the traditional architecture

851
00:33:16,979 --> 00:33:18,539
is done through some sort of a database

852
00:33:18,539 --> 00:33:21,359
back-end so you have a database back-end

853
00:33:21,359 --> 00:33:23,190
and in fact what's going to happen is

854
00:33:23,190 --> 00:33:25,379
you have connection to the web server

855
00:33:25,379 --> 00:33:28,080
web server to PHP it's still stateless

856
00:33:28,080 --> 00:33:30,359
PHP is gonna do operations against the

857
00:33:30,359 --> 00:33:31,919
database and it's the only one that

858
00:33:31,919 --> 00:33:33,389
actually maintains state and it's the

859
00:33:33,389 --> 00:33:35,460
only one where states fight with each

860
00:33:35,460 --> 00:33:36,960
other and where you have concurrency

861
00:33:36,960 --> 00:33:41,639
problems and then the whole thing comes

862
00:33:41,639 --> 00:33:45,720
back so this doesn't keep any any big

863
00:33:45,720 --> 00:33:47,399
information this doesn't keep any big

864
00:33:47,399 --> 00:33:48,539
information this keeps all the

865
00:33:48,539 --> 00:33:50,369
information so this is stateful but it's

866
00:33:50,369 --> 00:33:52,649
consolidated in the database so any

867
00:33:52,649 --> 00:33:54,509
fighting that goes on and any

868
00:33:54,509 --> 00:33:56,429
consistency issues are already resolved

869
00:33:56,429 --> 00:33:58,200
at the database level

870
00:33:58,200 --> 00:34:01,950
now I did mention before that database

871
00:34:01,950 --> 00:34:04,379
technology is many years ahead in terms

872
00:34:04,379 --> 00:34:06,749
of consistency than anything else

873
00:34:06,749 --> 00:34:08,190
and this is really why it's the preferred

874
00:34:08,190 --> 00:34:11,280
way to if you want solve consistency

875
00:34:11,280 --> 00:34:13,020
problems so you have no consistency

876
00:34:13,020 --> 00:34:14,909
issues here and here for the most part

877
00:34:14,909 --> 00:34:16,949
because they are stateless and you push

878
00:34:16,949 --> 00:34:18,210
all the consistency issues in the

879
00:34:18,210 --> 00:34:20,010
database which is reasonably mature

880
00:34:20,010 --> 00:34:22,199
technology okay now why do I say almost

881
00:34:22,199 --> 00:34:26,849
because of this cookie stuff okay so I

882
00:34:26,849 --> 00:34:28,989
mean what are these cookies

883
00:34:28,989 --> 00:34:31,329
and why are they useful by the way they

884
00:34:31,329 --> 00:34:32,619
are kind of going away I mean there

885
00:34:32,619 --> 00:34:34,260
are better solutions than cookies now

886
00:34:34,260 --> 00:34:36,639
right I believe you can do much better

887
00:34:36,639 --> 00:34:39,579
than cookies well so this has to do with

888
00:34:39,579 --> 00:34:41,859
the fact that it's extremely annoying to

889
00:34:41,859 --> 00:34:43,629
repeat the same operation many times for

890
00:34:43,629 --> 00:34:45,909
humans so what I mean by that is

891
00:34:45,909 --> 00:34:48,960
imagine that you need to restrict access

892
00:34:48,960 --> 00:34:51,579
so a big problem with any kind of

893
00:34:51,579 --> 00:34:52,839
security and we are going to talk

894
00:34:52,839 --> 00:34:54,460
extensively about this a big issue with

895
00:34:54,460 --> 00:34:56,500
any kind of security is the moment it

896
00:34:56,500 --> 00:35:00,010
becomes annoying people don't use it ok

897
00:35:00,010 --> 00:35:03,160
so how do you make security secure and

898
00:35:03,160 --> 00:35:05,829
not annoying at the same time all right

899
00:35:05,829 --> 00:35:07,960
so what I mean by that is imagine that

900
00:35:07,960 --> 00:35:12,760
for example you're on eBay yes but every

901
00:35:12,760 --> 00:35:14,349
time you click on another page I'm gonna

902
00:35:14,349 --> 00:35:17,380
ask again for the password validate for

903
00:35:17,380 --> 00:35:19,839
that one request that you're who you are

904
00:35:19,839 --> 00:35:21,760
and go and serve the page I mean first

905
00:35:21,760 --> 00:35:23,380
of all it's not clear at all that I have

906
00:35:23,380 --> 00:35:25,359
a single request on a webpage and I

907
00:35:25,359 --> 00:35:26,680
might have many many different requests

908
00:35:26,680 --> 00:35:27,760
but let's say not all of them need to be

909
00:35:27,760 --> 00:35:30,250
secure ok only only one of the requests

910
00:35:30,250 --> 00:35:31,720
but even typing your password every time

911
00:35:31,720 --> 00:35:36,099
you're gonna go insane right not good so

912
00:35:36,099 --> 00:35:38,589
what cookies helped with is to create

913
00:35:38,589 --> 00:35:41,589
some temporary validation once you typed

914
00:35:41,589 --> 00:35:43,240
in one of these passwords to say for the

915
00:35:43,240 --> 00:35:45,250
next whatever amount of time or until

916
00:35:45,250 --> 00:35:48,099
such right is revoked you don't need

917
00:35:48,099 --> 00:35:49,809
to type a password ok now that's a

918
00:35:49,809 --> 00:35:51,940
technique used for example by Mac OS

919
00:35:51,940 --> 00:35:55,630
X once you log in as an admin

920
00:35:55,630 --> 00:35:57,790
for a couple of minutes you're an admin

921
00:35:57,790 --> 00:35:58,930
now that's done through the sudo

922
00:35:58,930 --> 00:36:00,819
program right that's true also on

923
00:36:00,819 --> 00:36:03,220
Linux once you successfully log in to

924
00:36:03,220 --> 00:36:04,869
sudo depending on the policy set in

925
00:36:04,869 --> 00:36:06,220
the system for a couple of minutes you

926
00:36:06,220 --> 00:36:07,599
still are sudo and if you keep on

927
00:36:07,599 --> 00:36:09,609
doing things you're fine but the moment

928
00:36:09,609 --> 00:36:11,710
you have a gap of whatever set in the

929
00:36:11,710 --> 00:36:13,930
system then the privilege goes away ok now

930
00:36:13,930 --> 00:36:15,910
usually cookies live a lot longer I mean

931
00:36:15,910 --> 00:36:17,920
some websites use cookies that stay

932
00:36:17,920 --> 00:36:21,130
alive a month now that's what cookies

933
00:36:21,130 --> 00:36:22,869
were designed for what they are used for

934
00:36:22,869 --> 00:36:26,559
it's a much more abusive thing ok keep

935
00:36:26,559 --> 00:36:30,010
track of people and whatnot okay so

936
00:36:30,010 --> 00:36:34,200
cookies are to a large extent poor man's

937
00:36:34,200 --> 00:36:38,650
state something right so the server

938
00:36:38,650 --> 00:36:40,270
needs to keep a little bit of state and

939
00:36:40,270 --> 00:36:42,760
they invented these cookies to allow

940
00:36:42,760 --> 00:36:44,590
state to be kept right so what's

941
00:36:44,590 --> 00:36:46,150
happening with the cookie is by the way

942
00:36:46,150 --> 00:36:48,790
is the server itself is still

943
00:36:48,790 --> 00:36:51,910
reasonably stateless the cookie is

944
00:36:51,910 --> 00:36:54,700
stored on the client side and the cookie

945
00:36:54,700 --> 00:36:56,430
is sent with every single request

946
00:36:56,430 --> 00:36:59,050
automatically so instead of having the

947
00:36:59,050 --> 00:37:00,880
user type all the time something you

948
00:37:00,880 --> 00:37:02,380
simply have the system present the

949
00:37:02,380 --> 00:37:04,120
cookie which is now cryptographically

950
00:37:04,120 --> 00:37:06,760
secure hopefully right so the cookie

951
00:37:06,760 --> 00:37:09,010
will fly with any request the server

952
00:37:09,010 --> 00:37:10,900
will check that the cookie's valid that

953
00:37:10,900 --> 00:37:13,390
will be if you want the certificate that

954
00:37:13,390 --> 00:37:14,800
yes you can access the website under

955
00:37:14,800 --> 00:37:17,050
certain credentials but it's not

956
00:37:17,050 --> 00:37:18,460
something that the user does which is

957
00:37:18,460 --> 00:37:21,130
important so a lot of issues related to

958
00:37:21,130 --> 00:37:22,660
security have to be solved from these

959
00:37:22,660 --> 00:37:24,970
automated mechanisms right because that

960
00:37:24,970 --> 00:37:26,710
will make it a lot more bearable for the

961
00:37:26,710 --> 00:37:27,850
user increases a little bit the

962
00:37:27,850 --> 00:37:29,650
bandwidth and maybe decreases mildly

963
00:37:29,650 --> 00:37:33,240
performance not even that much okay and

964
00:37:33,240 --> 00:37:35,620
the web server itself it basically is

965
00:37:35,620 --> 00:37:37,180
going to store the valid cookie

966
00:37:37,180 --> 00:37:38,710
somewhere and compare any cookie with

967
00:37:38,710 --> 00:37:39,970
the valid cookies to see if it's valid

968
00:37:39,970 --> 00:37:42,340
so has a little bit of state okay now it

969
00:37:42,340 --> 00:37:43,690
turns out that this is not the only way

970
00:37:43,690 --> 00:37:45,460
you could do things you can in fact have

971
00:37:45,460 --> 00:37:48,190
and sometimes it is highly desirable to

972
00:37:48,190 --> 00:37:50,260
do so you can in fact have stateful

973
00:37:50,260 --> 00:37:52,810
servers okay
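
Going back to the cookie check for a moment: a minimal sketch, in Python, of a server that remembers which cookies are valid and compares every incoming cookie against that table. The names SessionStore, login, check and revoke are made up for illustration, and the fixed lifetime is an assumption; real web frameworks handle sessions in their own ways.

```python
import secrets
import time
from typing import Optional


class SessionStore:
    """Hypothetical server-side table of valid cookies (the 'little bit of state')."""

    def __init__(self, lifetime_seconds: float):
        self.lifetime = lifetime_seconds
        self._valid = {}  # cookie value -> (user, expiry time)

    def login(self, user: str, now: Optional[float] = None) -> str:
        """Called once, after the password was checked; returns the cookie value."""
        now = time.time() if now is None else now
        cookie = secrets.token_hex(16)  # random, hard-to-guess token
        self._valid[cookie] = (user, now + self.lifetime)
        return cookie

    def check(self, cookie: str, now: Optional[float] = None) -> Optional[str]:
        """Called on every request; returns the user name, or None if not valid."""
        now = time.time() if now is None else now
        entry = self._valid.get(cookie)
        if entry is None:
            return None  # unknown cookie
        user, expires = entry
        if now >= expires:
            del self._valid[cookie]  # expired: the user has to log in again
            return None
        return user

    def revoke(self, cookie: str) -> None:
        """Explicitly take the right away before it expires."""
        self._valid.pop(cookie, None)
```

The user types the password once, at login; after that the browser presents the cookie on each request and the server only does a table lookup.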

974
00:37:52,810 --> 00:37:55,210
so a stateful server is a server that

975
00:37:55,210 --> 00:37:57,180
will stay alive all the time by the way

976
00:37:57,180 --> 00:38:00,040
this server is only kind of alive it

977
00:38:00,040 --> 00:38:01,750
listens on the port but it doesn't have

978
00:38:01,750 --> 00:38:03,610
that state up right and in fact has

979
00:38:03,610 --> 00:38:05,170
to fire up big machinery or maybe it's

980
00:38:05,170 --> 00:38:06,730
doing some kind of clever tricks not to

981
00:38:06,730 --> 00:38:08,530
fire up so much but for the most

982
00:38:08,530 --> 00:38:10,090
part it's wiping out its memory on every

983
00:38:10,090 --> 00:38:11,770
request you could in fact have servers

984
00:38:11,770 --> 00:38:14,350
that don't wipe out anything and in fact

985
00:38:14,350 --> 00:38:16,420
try to aggressively keep things in

986
00:38:16,420 --> 00:38:17,920
memory to speed things up so one

987
00:38:17,920 --> 00:38:19,180
particular technique for the server

988
00:38:19,180 --> 00:38:22,470
would be don't be so stateless

989
00:38:22,470 --> 00:38:25,570
cache yourself a lot of the things that

990
00:38:25,570 --> 00:38:28,360
otherwise would be used in memory right

991
00:38:28,360 --> 00:38:29,620
because then you don't have to go to

992
00:38:29,620 --> 00:38:31,300
your disk or something else and that

993
00:38:31,300 --> 00:38:32,680
will speed things up at least on the

994
00:38:32,680 --> 00:38:35,620
server side right and in fact that's

995
00:38:35,620 --> 00:38:38,170
what is needed for something like

996
00:38:38,170 --> 00:38:40,060
chatting right so when it comes to

997
00:38:40,060 --> 00:38:41,920
chatting you really want to do this push

998
00:38:41,920 --> 00:38:44,500
changes but if you do push changes then

999
00:38:44,500 --> 00:38:46,230
the server cannot be stateless really

1000
00:38:46,230 --> 00:38:48,970
right now you could still do something

1001
00:38:48,970 --> 00:38:50,350
like this in which all the state is

1002
00:38:50,350 --> 00:38:53,470
pushed into the database right but I mean a

1003
00:38:53,470 --> 00:38:53,940
big

1004
00:38:53,940 --> 00:38:56,550
you still I mean the question the big

1005
00:38:56,550 --> 00:38:59,300
question there is how do you push okay

1006
00:38:59,300 --> 00:39:03,120
so the pull is easy the client makes

1007
00:39:03,120 --> 00:39:05,480
the request opens a TCP/IP connection or

1008
00:39:05,480 --> 00:39:08,040
it goes through this guy who's listening

1009
00:39:08,040 --> 00:39:10,080
fire up one of these fire up one of these

1010
00:39:10,080 --> 00:39:11,580
depending on which part of the web site

1011
00:39:11,580 --> 00:39:13,560
you access fire up the connection to the

1012
00:39:13,560 --> 00:39:15,390
database the database is always up right

1013
00:39:15,390 --> 00:39:17,670
that guy stays up right and then the

1014
00:39:17,670 --> 00:39:19,650
reply goes back but if it comes to

1015
00:39:19,650 --> 00:39:23,100
pushing how do you push so you must have

1016
00:39:23,100 --> 00:39:26,760
built-in mechanisms to produce such a push

1017
00:39:26,760 --> 00:39:28,050
okay

1018
00:39:28,050 --> 00:39:29,640
now before I go into details I mean

1019
00:39:29,640 --> 00:39:30,900
point out an interesting use of

1020
00:39:30,900 --> 00:39:32,730
notifications which is highly

1021
00:39:32,730 --> 00:39:34,200
non-obvious so the difference

1022
00:39:34,200 --> 00:39:35,730
between notification and push changes is

1023
00:39:35,730 --> 00:39:38,220
you're not really saying here is

1024
00:39:38,220 --> 00:39:40,140
everything that changed in the

1025
00:39:40,140 --> 00:39:41,580
notification you just say the copy is no

1026
00:39:41,580 --> 00:39:42,510
longer valid

1027
00:39:42,510 --> 00:39:44,730
you're not saying how it changed you

1028
00:39:44,730 --> 00:39:46,500
just say your copy is not valid which

1029
00:39:46,500 --> 00:39:48,000
essentially means notifications have to

1030
00:39:48,000 --> 00:39:50,190
be paired up with a mechanism that pulls

1031
00:39:50,190 --> 00:39:52,440
another copy so it's a it's a push

1032
00:39:52,440 --> 00:39:54,690
notification pull request when you need

1033
00:39:54,690 --> 00:39:57,990
it right so such a mechanism would

1034
00:39:57,990 --> 00:40:00,420
essentially indicate that oh yeah you see

1035
00:40:00,420 --> 00:40:02,250
that javascript file that you were

1036
00:40:02,250 --> 00:40:04,020
accessing before it's no longer fresh

1037
00:40:04,020 --> 00:40:05,280
you need to bring another one before you

1038
00:40:05,280 --> 00:40:06,000
do anything else
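
As a rough sketch of that pairing, push a small notification and pull the content only when somebody actually needs it, here is a toy version in Python. Origin and CachedCopy are hypothetical names used only for this illustration; nothing here goes over a real network.

```python
class Origin:
    """Holds the real copy and notifies every cache when it changes."""

    def __init__(self, content):
        self.content = content
        self.caches = []

    def update(self, content):
        self.content = content
        for cache in self.caches:
            cache.invalidate()  # the notification: "your copy is no longer valid"


class CachedCopy:
    """Client-side copy that is pulled lazily, only on access."""

    def __init__(self, origin):
        self.origin = origin
        self.valid = False
        self.data = None
        self.pulls = 0  # counts how many times we had to fetch a full copy
        origin.caches.append(self)

    def invalidate(self):
        # Nothing is transferred here; we only remember the copy is stale.
        self.valid = False

    def read(self):
        if not self.valid:
            self.data = self.origin.content  # the pull: fetch a fresh copy
            self.pulls += 1
            self.valid = True
        return self.data
```

If the origin changes ten times between two reads, the client still pulls only once, which is the bet behind being lazy.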

1039
00:40:06,000 --> 00:40:11,610
okay so notifications are in fact used

1040
00:40:11,610 --> 00:40:13,650
in the memory hierarchy of most

1041
00:40:13,650 --> 00:40:16,380
modern processors okay the reason is the

1042
00:40:16,380 --> 00:40:17,550
following so by the way it has a

1043
00:40:17,550 --> 00:40:19,470
completely different name and and it's

1044
00:40:19,470 --> 00:40:21,480
called cache invalidation

1045
00:40:21,480 --> 00:40:23,070
and it's part of the cache coherency

1046
00:40:23,070 --> 00:40:24,570
protocol right so as part of the

1047
00:40:24,570 --> 00:40:26,960
architecture has something called cache

1048
00:40:26,960 --> 00:40:31,710
coherency and one way to do cache

1049
00:40:31,710 --> 00:40:36,260
coherence is to do cache invalidation

1050
00:40:38,910 --> 00:40:41,320
right so literally cache invalidation

1051
00:40:41,320 --> 00:40:43,600
it's a mechanism in which the processor

1052
00:40:43,600 --> 00:40:46,120
will say this thing that I'm caching is

1053
00:40:46,120 --> 00:40:47,800
definitely not good anymore I need to go

1054
00:40:47,800 --> 00:40:50,230
bring another copy when somebody

1055
00:40:50,230 --> 00:40:53,260
accesses it okay so it turns out I mean

1056
00:40:53,260 --> 00:40:55,180
the actual way things happen it's a

1057
00:40:55,180 --> 00:40:56,980
little bit different but in

1058
00:40:56,980 --> 00:41:00,570
fact processors are snooping on the

1059
00:41:00,570 --> 00:41:02,410
communication channel which is actually

1060
00:41:02,410 --> 00:41:04,030
very complicated so the snooping is a

1061
00:41:04,030 --> 00:41:06,430
complicated thing right and in fact are

1062
00:41:06,430 --> 00:41:11,080
listening for any activity that overlaps

1063
00:41:11,080 --> 00:41:12,610
with what they actually cache and if

1064
00:41:12,610 --> 00:41:14,080
they see any activity they invalidate

1065
00:41:14,080 --> 00:41:15,460
their own cache so they know that they

1066
00:41:15,460 --> 00:41:18,750
need to go and grab the real copy when

1067
00:41:18,750 --> 00:41:21,340
they need one now this is a better

1068
00:41:21,340 --> 00:41:23,710
mechanism than in fact paying attention

1069
00:41:23,710 --> 00:41:25,540
to what's going on and pushing their own

1070
00:41:25,540 --> 00:41:28,680
changes right because that's a lot more

1071
00:41:28,680 --> 00:41:31,240
resource consuming right so it's again

1072
00:41:31,240 --> 00:41:33,100
this difference between notification and

1073
00:41:33,100 --> 00:41:35,350
push changes so instead of pushing the

1074
00:41:35,350 --> 00:41:37,140
full changes forget about that

1075
00:41:37,140 --> 00:41:38,860
essentially what you're doing is you're

1076
00:41:38,860 --> 00:41:40,750
betting that you don't necessarily need

1077
00:41:40,750 --> 00:41:42,670
that resource right away you just want

1078
00:41:42,670 --> 00:41:44,050
to ensure a certain kind of correctness

1079
00:41:44,050 --> 00:41:46,930
for processors that share the same

1080
00:41:46,930 --> 00:41:48,730
memory for cores that share the same

1081
00:41:48,730 --> 00:41:50,590
memory this is extremely important right

1082
00:41:50,590 --> 00:41:52,000
because if you wouldn't enforce this

1083
00:41:52,000 --> 00:41:53,320
kind of cache invalidation and to make

1084
00:41:53,320 --> 00:41:55,030
it correct you're gonna have arbitrarily

1085
00:41:55,030 --> 00:41:56,620
bad behavior going on in there I mean

1086
00:41:56,620 --> 00:41:58,720
you work and somebody changes something

1087
00:41:58,720 --> 00:42:01,090
in memory but in fact some other

1088
00:42:01,090 --> 00:42:02,770
processor was caching it and it doesn't

1089
00:42:02,770 --> 00:42:06,040
even notice that's a disaster but I want

1090
00:42:06,040 --> 00:42:07,060
you to understand that this cache

1091
00:42:07,060 --> 00:42:08,320
coherency and cache invalidation

1092
00:42:08,320 --> 00:42:10,780
protocols have the same problems at a

1093
00:42:10,780 --> 00:42:12,910
high level as you have with these things

1094
00:42:12,910 --> 00:42:14,470
I mean a lot of it is you just have to

1095
00:42:14,470 --> 00:42:15,880
keep up to date and this is why for

1096
00:42:15,880 --> 00:42:17,440
example you do see a shift now at

1097
00:42:17,440 --> 00:42:20,440
least for certain products away from

1098
00:42:20,440 --> 00:42:22,780
cache coherency protocols right so for

1099
00:42:22,780 --> 00:42:27,400
example the PlayStation the PlayStation

1100
00:42:27,400 --> 00:42:29,290
3 the Cell processor in PlayStation 3

1101
00:42:29,290 --> 00:42:32,200
the actual cores the computational

1102
00:42:32,200 --> 00:42:34,390
cores have no cache whatsoever the

1103
00:42:34,390 --> 00:42:36,100
reason is this is very expensive so they

1104
00:42:36,100 --> 00:42:38,170
have to do explicit transfers transfer

1105
00:42:38,170 --> 00:42:40,150
this big block of memory right so it's

1106
00:42:40,150 --> 00:42:42,280
more like not even check for

1107
00:42:42,280 --> 00:42:44,410
freshness but it's kind of planned you

1108
00:42:44,410 --> 00:42:46,300
do this I transfer this portion I do

1109
00:42:46,300 --> 00:42:48,430
something and I push it okay all right

1110
00:42:48,430 --> 00:42:50,770
now coming back to our chat server

1111
00:42:50,770 --> 00:42:51,980
right of course you want

1112
00:42:51,980 --> 00:42:53,930
the full push changes well it kind of

1113
00:42:53,930 --> 00:42:56,380
depends right so you want to push

1114
00:42:56,380 --> 00:42:58,520
changes if they are relatively small but

1115
00:42:58,520 --> 00:43:01,359
maybe some notification or only a

1116
00:43:01,359 --> 00:43:03,470
skeleton of the changes if they are really

1117
00:43:03,470 --> 00:43:06,890
big right being lazy about these things

1118
00:43:06,890 --> 00:43:08,270
pays off big time

1119
00:43:08,270 --> 00:43:10,310
right so if all the time you have to

1120
00:43:10,310 --> 00:43:12,260
transport big things just in case the

1121
00:43:12,260 --> 00:43:13,670
user wants to use them that's not a

1122
00:43:13,670 --> 00:43:16,220
particularly good policy so always when

1123
00:43:16,220 --> 00:43:17,630
you design applications like this you

1124
00:43:17,630 --> 00:43:19,480
have to think about how big the

1125
00:43:19,480 --> 00:43:22,010
propagated information is if it's

1126
00:43:22,010 --> 00:43:23,420
very big I might be better off by

1127
00:43:23,420 --> 00:43:24,890
knowing that it changed maybe not quite

1128
00:43:24,890 --> 00:43:26,390
notification the important stuff gets

1129
00:43:26,390 --> 00:43:28,190
pushed right so an intermediate solution

1130
00:43:28,190 --> 00:43:30,440
between a push change and a notification so

1131
00:43:30,440 --> 00:43:31,790
the things that are more visible get

1132
00:43:31,790 --> 00:43:33,619
pushed but the big content is not so if

1133
00:43:33,619 --> 00:43:34,970
you're trying to access it you already

1134
00:43:34,970 --> 00:43:37,369
in fact invalidated the cache for that

1135
00:43:37,369 --> 00:43:38,990
one and you do some sort of a separate

1136
00:43:38,990 --> 00:43:40,820
request to get that to get that content

1137
00:43:40,820 --> 00:43:42,109
and that seems to be a better solution

1138
00:43:42,109 --> 00:43:46,130
okay so in order to support this kind of

1139
00:43:46,130 --> 00:43:49,790
things right notification based kind of

1140
00:43:49,790 --> 00:43:52,400
activities for example there is a brand

1141
00:43:52,400 --> 00:43:54,470
well not quite brand new it's about five

1142
00:43:54,470 --> 00:43:55,850
years old maybe about seven years old

1143
00:43:55,850 --> 00:43:57,109
now right there is a new mechanism

1144
00:43:57,109 --> 00:44:00,710
supported by the web browsers to allow

1145
00:44:00,710 --> 00:44:03,170
you to in fact have a continuous

1146
00:44:03,170 --> 00:44:04,490
communication with the backend and get

1147
00:44:04,490 --> 00:44:07,340
this kind of notifications okay and this

1148
00:44:07,340 --> 00:44:10,160
is called web services well is it

1149
00:44:10,160 --> 00:44:13,270
no WebSocket

1150
00:44:15,270 --> 00:44:18,700
so the normal operation in a web browser

1151
00:44:18,700 --> 00:44:21,490
is a TCP/IP connection that goes from

1152
00:44:21,490 --> 00:44:24,460
the client makes the request and then

1153
00:44:24,460 --> 00:44:26,740
finishes with some gizmos to make it

1154
00:44:26,740 --> 00:44:28,630
faster potentially for example bonding

1155
00:44:28,630 --> 00:44:30,369
and so on but they are still treated

1156
00:44:30,369 --> 00:44:31,960
completely independently ok

1157
00:44:31,960 --> 00:44:33,670
WebSockets are completely different

1158
00:44:33,670 --> 00:44:35,200
WebSockets are a point-to-point

1159
00:44:35,200 --> 00:44:36,730
connection between the client and the

1160
00:44:36,730 --> 00:44:38,589
server in which the communication goes

1161
00:44:38,589 --> 00:44:41,980
both ways okay but in a strange way the

1162
00:44:41,980 --> 00:44:43,750
client itself is now a mini server you

1163
00:44:43,750 --> 00:44:45,640
literally have to listen on your own web

1164
00:44:45,640 --> 00:44:48,130
socket to see what goes on right so you

1165
00:44:48,130 --> 00:44:49,720
can in fact say you know what listen on

1166
00:44:49,720 --> 00:44:51,130
the web socket and when something comes

1167
00:44:51,130 --> 00:44:52,900
up I can look at what came up and then

1168
00:44:52,900 --> 00:44:54,310
do whatever it is that I'm doing now of

1169
00:44:54,310 --> 00:44:56,859
course this web socket is not possible

1170
00:44:56,859 --> 00:45:01,329
without JavaScript you must have an

1171
00:45:01,329 --> 00:45:03,400
active program here and not just some

1172
00:45:03,400 --> 00:45:07,569
sort of simple client that gets an HTML

1173
00:45:07,569 --> 00:45:09,790
page and renders it you must have some

1174
00:45:09,790 --> 00:45:11,980
program running right in order to do

1175
00:45:11,980 --> 00:45:13,869
something like WebSockets but if you do

1176
00:45:13,869 --> 00:45:16,869
then a lot of things become much more

1177
00:45:16,869 --> 00:45:18,369
straightforward for example like a chat

1178
00:45:18,369 --> 00:45:20,230
application without the chat application

1179
00:45:20,230 --> 00:45:21,339
all right in the good old days to

1180
00:45:21,339 --> 00:45:22,660
implement a chat application you had to

1181
00:45:22,660 --> 00:45:24,720
do something very nasty you had to

1182
00:45:24,720 --> 00:45:29,050
artificially invalidate the HTML page by

1183
00:45:29,050 --> 00:45:31,240
setting a very short time for the cache

1184
00:45:31,240 --> 00:45:33,970
in just a couple of seconds to make

1185
00:45:33,970 --> 00:45:35,710
another request and then the server

1186
00:45:35,710 --> 00:45:37,720
somehow had to figure out how to render the

1187
00:45:37,720 --> 00:45:39,579
new HTML page to reflect the new changes

1188
00:45:39,579 --> 00:45:42,579
right that produces extremely nasty

1189
00:45:42,579 --> 00:45:44,650
behavior I mean first of all when the

1190
00:45:44,650 --> 00:45:46,540
page gets invalidated most web

1191
00:45:46,540 --> 00:45:48,700
browsers will make the whole page flicker

1192
00:45:48,700 --> 00:45:51,490
and this is how you know if they do

1193
00:45:51,490 --> 00:45:53,440
magic with some of this stuff and maybe

1194
00:45:53,440 --> 00:45:55,660
JavaScript to render parts of the page

1195
00:45:55,660 --> 00:45:58,480
or they load the whole page if your

1196
00:45:58,480 --> 00:46:01,180
entire page flickers and you see it get

1197
00:46:01,180 --> 00:46:04,630
blank I mean almost blank and again you

1198
00:46:04,630 --> 00:46:07,630
know they are loading a full page if you

1199
00:46:07,630 --> 00:46:09,609
click around and nothing flickers except

1200
00:46:09,609 --> 00:46:11,230
some content changes here and there you

1201
00:46:11,230 --> 00:46:12,099
know that they are doing JavaScript

1202
00:46:12,099 --> 00:46:14,260
magic and quite possibly using this kind

1203
00:46:14,260 --> 00:46:16,510
of WebSockets so by the way the

1204
00:46:16,510 --> 00:46:18,220
technology from the point of view of the

1205
00:46:18,220 --> 00:46:19,750
user is quite simple in the

1206
00:46:19,750 --> 00:46:21,490
WebSockets I mean very simple API will

1207
00:46:21,490 --> 00:46:22,750
allow you to basically register a

1208
00:46:22,750 --> 00:46:24,550
listener it's exactly like a TCP

1209
00:46:24,550 --> 00:46:27,010
connection you register a listener you

1210
00:46:27,010 --> 00:46:28,750
have your own way to speak on the

1211
00:46:28,750 --> 00:46:30,340
the wire but you register a listener and

1212
00:46:30,340 --> 00:46:33,100
say hey when something comes get the

1213
00:46:33,100 --> 00:46:34,540
content and then do whatever you want

1214
00:46:34,540 --> 00:46:35,950
with the content I mean interpret it as a

1215
00:46:35,950 --> 00:46:37,570
string or more interestingly as some

1216
00:46:37,570 --> 00:46:39,010
kind of a JSON object that can become

1217
00:46:39,010 --> 00:46:41,800
data right you can do a lot of magic

1218
00:46:41,800 --> 00:46:43,630
with this kind of things right

1219
00:46:43,630 --> 00:46:46,840
for example one of them would be to do a

1220
00:46:46,840 --> 00:46:49,450
notification mechanism right when things

1221
00:46:49,450 --> 00:46:52,720
changed right so imagine for example

1222
00:46:52,720 --> 00:46:55,660
you're watching stocks such a WebSocket

1223
00:46:55,660 --> 00:46:59,040
will allow so you're watching stocks and

1224
00:46:59,040 --> 00:47:01,660
you're only interested in four or five

1225
00:47:01,660 --> 00:47:04,240
tickers you could in fact tell the

1226
00:47:04,240 --> 00:47:05,890
server through whatever mechanism you

1227
00:47:05,890 --> 00:47:07,630
want either the WebSocket which is

1228
00:47:07,630 --> 00:47:09,160
bi-directional or through one of those

1229
00:47:09,160 --> 00:47:11,020
calls you can tell the server hey I'm

1230
00:47:11,020 --> 00:47:12,760
interested in this kind of things and

1231
00:47:12,760 --> 00:47:15,880
then the server maybe I mean

1232
00:47:15,880 --> 00:47:17,770
hopefully it remembers what you're

1233
00:47:17,770 --> 00:47:19,930
interested in and somehow only pushes

1234
00:47:19,930 --> 00:47:21,850
you changes that you're interested in ok

1235
00:47:21,850 --> 00:47:23,560
now we need to discuss about that

1236
00:47:23,560 --> 00:47:25,240
separately I want to come back to that

1237
00:47:25,240 --> 00:47:30,240
that's called publish/subscribe system

1238
00:47:34,050 --> 00:47:36,550
but you see everything is tied up to

1239
00:47:36,550 --> 00:47:39,880
this notion of if you have a stale copy

1240
00:47:39,880 --> 00:47:41,650
I somehow have to make it available to

1241
00:47:41,650 --> 00:47:43,060
you either pull or push this is gonna

1242
00:47:43,060 --> 00:47:44,590
be a lot of pushing with a

1243
00:47:44,590 --> 00:47:48,880
publish/subscribe right so a more just

1244
00:47:48,880 --> 00:47:51,190
to give you a short preview a more

1245
00:47:51,190 --> 00:47:52,780
elaborate publish/subscribe system is

1246
00:47:52,780 --> 00:47:54,490
for example some sort of news feeds in

1247
00:47:54,490 --> 00:47:55,690
which you say I'm interested in this

1248
00:47:55,690 --> 00:47:56,980
kind of things and then you have a big

1249
00:47:56,980 --> 00:47:59,770
server and the server over thousands of

1250
00:47:59,770 --> 00:48:01,690
millions of clients is determining who

1251
00:48:01,690 --> 00:48:03,670
needs what at what time and through the

1252
00:48:03,670 --> 00:48:05,200
notification mechanism pushes things

1253
00:48:05,200 --> 00:48:10,900
right all right now of course the trick

1254
00:48:10,900 --> 00:48:12,790
there is only to let people know about

1255
00:48:12,790 --> 00:48:14,170
things they care about not about

1256
00:48:14,170 --> 00:48:14,710
everything
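
A toy sketch of that idea in Python, assuming an in-memory server where clients register the topics they care about (for example stock tickers) and only get pushed the matching updates. Broker, subscribe, publish and inboxes are hypothetical names for this illustration; a real system would deliver over something like a WebSocket instead of appending to a list.

```python
from collections import defaultdict


class Broker:
    """Hypothetical publish/subscribe server that remembers client interests."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # topic -> set of client names
        self.inboxes = defaultdict(list)      # client name -> pushed messages

    def subscribe(self, client, topic):
        """The client tells the server what it is interested in."""
        self._subscribers[topic].add(client)

    def publish(self, topic, message):
        """Push the change only to clients that subscribed to this topic."""
        for client in self._subscribers.get(topic, ()):
            self.inboxes[client].append((topic, message))


broker = Broker()
broker.subscribe("alice", "IBM")
broker.subscribe("bob", "AAPL")
broker.publish("IBM", 101.5)   # goes to alice only
broker.publish("MSFT", 42.0)   # nobody asked for this one, nobody gets it
```

Note that the server here is stateful by construction: it has to remember who is interested in what.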

1257
00:48:14,710 --> 00:48:16,510
right so it's the difference between a

1258
00:48:16,510 --> 00:48:20,310
broadcast or some sort of multicast or

1259
00:48:20,310 --> 00:48:23,950
point-to-point connection all right but

1260
00:48:23,950 --> 00:48:27,760
it all comes back to this essentially

1261
00:48:27,760 --> 00:48:30,940
to this replication so obviously a

1262
00:48:30,940 --> 00:48:33,040
publish/subscribe system it's going to

1263
00:48:33,040 --> 00:48:34,180
be implemented in a very different way

1264
00:48:34,180 --> 00:48:37,060
than normal for example web content

1265
00:48:37,060 --> 00:48:40,800
delivery so any questions about this

1266
00:48:40,800 --> 00:48:42,660
right so this is the basic

1267
00:48:42,660 --> 00:48:44,279
you either check for freshness and

1268
00:48:44,279 --> 00:48:46,470
whatever you have whatever algorithm to

1269
00:48:46,470 --> 00:48:48,450
detect that, notifications which are

1270
00:48:48,450 --> 00:48:51,299
small or push full changes now you can

1271
00:48:51,299 --> 00:48:53,789
be a little bit clever in between

1272
00:48:53,789 --> 00:48:55,950
especially for resources that tend to be

1273
00:48:55,950 --> 00:48:58,650
big but with small changes namely you

1274
00:48:58,650 --> 00:49:01,079
could send a set of operations that can

1275
00:49:01,079 --> 00:49:04,349
be applied at the other end in order to

1276
00:49:04,349 --> 00:49:06,950
bring the copy to a consistent copy

1277
00:49:06,950 --> 00:49:12,089
rather than the full copy right for

1278
00:49:12,089 --> 00:49:13,079
example these kind of things are very

1279
00:49:13,079 --> 00:49:16,559
good for let's say file editing when

1280
00:49:16,559 --> 00:49:18,299
you're editing the file especially

1281
00:49:18,299 --> 00:49:20,160
this new trend right of

1282
00:49:20,160 --> 00:49:22,109
collaborative editing you're usually

1283
00:49:22,109 --> 00:49:24,059
making relatively small changes at any

1284
00:49:24,059 --> 00:49:26,339
given moment of time this is going to be

1285
00:49:26,339 --> 00:49:28,019
foolish to say you know what we work on

1286
00:49:28,019 --> 00:49:29,609
this document and it's a megabyte in

1287
00:49:29,609 --> 00:49:31,470
size the moment somebody presses a key

1288
00:49:31,470 --> 00:49:34,829
and puts a space in the document I'm

1289
00:49:34,829 --> 00:49:35,940
going to send you the new version of the

1290
00:49:35,940 --> 00:49:39,660
document right so an interesting kind of

1291
00:49:39,660 --> 00:49:41,099
question to ask there is could i

1292
00:49:41,099 --> 00:49:42,930
propagate only the small changes and

1293
00:49:42,930 --> 00:49:43,950
then there is a question of how you

1294
00:49:43,950 --> 00:49:45,930
propagate those changes to keep multiple

1295
00:49:45,930 --> 00:49:48,319
consistent copies when you have multiple

1296
00:49:48,319 --> 00:49:53,759
if you want people or processes that

1297
00:49:53,759 --> 00:49:55,380
actually change change the thing right

1298
00:49:55,380 --> 00:49:58,319
so then you can be somewhere in the

1299
00:49:58,319 --> 00:49:59,940
middle in which you're not only

1300
00:49:59,940 --> 00:50:00,960
I mean you're not invalidating

1301
00:50:00,960 --> 00:50:02,730
because you're not you're not saying hey

1302
00:50:02,730 --> 00:50:05,309
there is a change go grab a new

1303
00:50:05,309 --> 00:50:09,119
copy you're simply sending a compact

1304
00:50:09,119 --> 00:50:11,430
description of what changed right so

1305
00:50:11,430 --> 00:50:13,440
some sort of a differential between the

1306
00:50:13,440 --> 00:50:16,890
server copy and your own copy that could

1307
00:50:16,890 --> 00:50:18,809
potentially save a tremendous amount of

1308
00:50:18,809 --> 00:50:20,160
bandwidth and this is one of the core

1309
00:50:20,160 --> 00:50:25,369
problems for example that the Box and

1310
00:50:25,369 --> 00:50:28,109
Dropbox people need to solve right

1311
00:50:28,109 --> 00:50:30,390
because most of the files are going to

1312
00:50:30,390 --> 00:50:32,039
have relatively small changes continuous

1313
00:50:32,039 --> 00:50:33,720
changes if you only send the dáil time

1314
00:50:33,720 --> 00:50:35,670
you resolve correctly the changes on the

1315
00:50:35,670 --> 00:50:37,170
other end you're gonna save tremendously

1316
00:50:37,170 --> 00:50:39,420
on the bandwidth which means more profit

1317
00:50:39,420 --> 00:50:41,130
to you because you're literally gonna be

1318
00:50:41,130 --> 00:50:44,069
driven by the bandwidth itself I mean

1319
00:50:44,069 --> 00:50:46,980
that's the big problem for Dropbox right

1320
00:50:46,980 --> 00:50:50,940
it's bandwidth okay now Dropbox is an

1321
00:50:50,940 --> 00:50:53,130
interesting example and it's

1322
00:50:53,130 --> 00:50:54,900
probably worth discussing how exactly it

1323
00:50:54,900 --> 00:50:56,010
fits into the story
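A minimal sketch of the "send only the delta" idea from a moment ago, assuming plain text content and Python's standard difflib. The names make_delta, apply_delta and delta_payload_size are hypothetical, and this is not a claim about how Dropbox or Box actually synchronize files.

```python
import difflib


def make_delta(old: str, new: str):
    """Describe `new` relative to `old` as a list of small operations."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged region: send only the offsets into the old copy.
            ops.append(("copy", i1, i2))
        else:
            # replace / insert / delete: send only the new text (may be empty).
            ops.append(("data", new[j1:j2]))
    return ops


def apply_delta(old: str, ops) -> str:
    """Rebuild the new version on the receiving side from old copy + delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.append(old[op[1]:op[2]])
        else:
            out.append(op[1])
    return "".join(out)


def delta_payload_size(ops) -> int:
    """Number of characters of actual content carried by the delta."""
    return sum(len(op[1]) for op in ops if op[0] == "data")
```

For a small edit in a large text file the payload is a tiny fraction of the file. For compressed or encrypted content the "equal" regions mostly disappear, which is the disaster case described later in the lecture.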

1324
00:50:56,010 --> 00:51:00,750
right the discussion should be probably

1325
00:51:00,750 --> 00:51:01,680
more in depth when we talk about

1326
00:51:01,680 --> 00:51:03,990
distributed file systems but essentially

1327
00:51:03,990 --> 00:51:05,280
because you can have a lot of storage

1328
00:51:05,280 --> 00:51:07,200
locally right I mean I just argued that

1329
00:51:07,200 --> 00:51:09,570
hard drives are cheap most of the files

1330
00:51:09,570 --> 00:51:11,970
you're accessing especially if you don't

1331
00:51:11,970 --> 00:51:13,710
pay a lot of money to Dropbox to have a

1332
00:51:13,710 --> 00:51:15,359
very large storage which is the case

1333
00:51:15,359 --> 00:51:17,910
with those people right then storing

1334
00:51:17,910 --> 00:51:19,560
your own local copy is not a big issue

1335
00:51:19,560 --> 00:51:21,900
as long as these copies your local copy

1336
00:51:21,900 --> 00:51:23,730
and the server copy are kept in sync

1337
00:51:23,730 --> 00:51:25,680
right so the annoying thing would be if

1338
00:51:25,680 --> 00:51:29,130
I change the files from multiple

1339
00:51:29,130 --> 00:51:30,869
machines and the changes don't get

1340
00:51:30,869 --> 00:51:32,280
propagated I'm not really particularly

1341
00:51:32,280 --> 00:51:33,540
concerned about the fact that I only

1342
00:51:33,540 --> 00:51:35,850
have space for my own copy right but

1343
00:51:35,850 --> 00:51:37,440
most people keep less than something

1344
00:51:37,440 --> 00:51:38,940
like I'm thinking about ten gigabytes now I

1345
00:51:38,940 --> 00:51:40,440
mean even phones can do ten

1346
00:51:40,440 --> 00:51:44,760
gigabytes right kind of right in those kind

1347
00:51:44,760 --> 00:51:49,520
of circumstances the big big issue is

1348
00:51:49,520 --> 00:51:53,340
how can you update those files right you

1349
00:51:53,340 --> 00:51:54,630
could use notification to invalidate

1350
00:51:54,630 --> 00:51:56,369
them and then you force somebody to go

1351
00:51:56,369 --> 00:51:58,200
and bring a new a new copy you can use

1352
00:51:58,200 --> 00:51:59,609
notification in other ways for example

1353
00:51:59,609 --> 00:52:02,220
in Dropbox and notices uses it you can

1354
00:52:02,220 --> 00:52:08,010
have these other applications that when

1355
00:52:08,010 --> 00:52:10,530
there is a notification instead of just

1356
00:52:10,530 --> 00:52:12,270
doing something in the file system or to

1357
00:52:12,270 --> 00:52:13,440
complement what you're doing in the file

1358
00:52:13,440 --> 00:52:14,760
system you pop up something on the user's

1359
00:52:14,760 --> 00:52:16,050
screen and say hey somebody is changing

1360
00:52:16,050 --> 00:52:17,790
that file there is a new copy for this

1361
00:52:17,790 --> 00:52:19,920
file do you want me to bring it and you

1362
00:52:19,920 --> 00:52:22,470
still ask for some user action which

1363
00:52:22,470 --> 00:52:24,420
essentially means you're not

1364
00:52:24,420 --> 00:52:26,490
wasting bandwidth unless the user really

1365
00:52:26,490 --> 00:52:27,930
cares about that resource so that's one

1366
00:52:27,930 --> 00:52:30,240
way for I mean it's something that I

1367
00:52:30,240 --> 00:52:34,140
think is at least initially neglected as a

1368
00:52:34,140 --> 00:52:38,100
solution user engagement if people have

1369
00:52:38,100 --> 00:52:41,280
to click right then you can save a lot

1370
00:52:41,280 --> 00:52:45,390
of bandwidth because people are gonna go

1371
00:52:45,390 --> 00:52:47,130
and click and click and click only for

1372
00:52:47,130 --> 00:52:48,420
the first few days and then they get

1373
00:52:48,420 --> 00:52:49,859
tired and they go and click only when

1374
00:52:49,859 --> 00:52:50,369
they need it

1375
00:52:50,369 --> 00:52:53,310
right you send a short notification to

1376
00:52:53,310 --> 00:52:55,560
the user he notes that it changed he has to

1377
00:52:55,560 --> 00:52:57,540
click on it to bring a new version saves

1378
00:52:57,540 --> 00:52:59,280
bandwidth as opposed to automatically

1379
00:52:59,280 --> 00:53:02,010
keep on pushing these changes right now

1380
00:53:02,010 --> 00:53:06,150
in fact a company like Dropbox would

1381
00:53:06,150 --> 00:53:08,460
have to do full push changes for binary

1382
00:53:08,460 --> 00:53:09,420
files

1383
00:53:09,420 --> 00:53:12,600
it's usually very very hard to send

1384
00:53:12,600 --> 00:53:15,540
Delta changes right

1385
00:53:15,540 --> 00:53:16,950
especially when you have compressed

1386
00:53:16,950 --> 00:53:18,840
parts it's a complete disaster right

1387
00:53:18,840 --> 00:53:19,980
unless you know precisely the

1388
00:53:19,980 --> 00:53:21,270
compression format and you're trying to

1389
00:53:21,270 --> 00:53:23,370
do something probably too clever for

1390
00:53:23,370 --> 00:53:24,240
your own good

1391
00:53:24,240 --> 00:53:28,380
okay literally a small change in the

1392
00:53:28,380 --> 00:53:30,420
file could produce quite large changes

1393
00:53:30,420 --> 00:53:31,920
overall or if you are talking about

1394
00:53:31,920 --> 00:53:34,470
encrypted files it's a complete disaster

1395
00:53:34,470 --> 00:53:36,870
okay because possibly the entire file

1396
00:53:36,870 --> 00:53:39,000
changes with the smallest change in the

1397
00:53:39,000 --> 00:53:40,890
content if it's really encrypted even if

1398
00:53:40,890 --> 00:53:42,720
you use block encryption you're still

1399
00:53:42,720 --> 00:53:46,590
dealing with blocks right so there is

1400
00:53:46,590 --> 00:53:48,090
always some sort of a gradation between

1401
00:53:48,090 --> 00:53:49,650
these things and finding the right

1402
00:53:49,650 --> 00:53:51,120
trade-off between these things is one

1403
00:53:51,120 --> 00:53:53,070
tricky thing right for the specific

1404
00:53:53,070 --> 00:53:55,740
application finding how and when to

1405
00:53:55,740 --> 00:53:57,570
invalidate the caches is another tricky

1406
00:53:57,570 --> 00:53:59,490
tricky thing and almost always these

1407
00:53:59,490 --> 00:54:02,010
things have to be fine-tuned for the

1408
00:54:02,010 --> 00:54:03,840
specific application so don't look for

1409
00:54:03,840 --> 00:54:05,730
canned solutions that work in every

1410
00:54:05,730 --> 00:54:07,470
circumstance right it's a big struggle

1411
00:54:07,470 --> 00:54:09,840
to find exactly when to check for

1412
00:54:09,840 --> 00:54:11,550
freshness in particular applications and

1413
00:54:11,550 --> 00:54:13,380
when and how to propagate this changes

1414
00:54:13,380 --> 00:54:19,770
okay alright so big issue and some sort

1415
00:54:19,770 --> 00:54:22,490
of a continuum of solutions in that

1416
00:54:22,490 --> 00:54:27,510
discussion now we don't only serve this

1417
00:54:27,510 --> 00:54:28,530
is not the only issue right we are

1418
00:54:28,530 --> 00:54:29,610
talking about the more general

1419
00:54:29,610 --> 00:54:31,650
replication for the purpose of fault

1420
00:54:31,650 --> 00:54:33,240
tolerance when it comes to that we

1421
00:54:33,240 --> 00:54:35,130
really want to think a lot more about

1422
00:54:35,130 --> 00:54:37,470
the server replication right so it's not

1423
00:54:37,470 --> 00:54:39,150
only this story when you have

1424
00:54:39,150 --> 00:54:45,570
the client replication right but we have

1425
00:54:45,570 --> 00:54:46,950
multiple servers and you're trying to

1426
00:54:46,950 --> 00:54:48,390
pass the load on multiple servers for

1427
00:54:48,390 --> 00:54:50,190
performance reasons or you're simply

1428
00:54:50,190 --> 00:54:51,630
trying to have enough redundancy in the

1429
00:54:51,630 --> 00:54:53,160
system in case things get lost right

1430
00:54:53,160 --> 00:54:56,370
then the question is how should you

1431
00:54:56,370 --> 00:54:57,840
propagate information from one server to

1432
00:54:57,840 --> 00:55:00,480
another okay now let me tell you an

1433
00:55:00,480 --> 00:55:03,810
interesting kind of story about client

1434
00:55:03,810 --> 00:55:06,060
replication that can serve even as a

1435
00:55:06,060 --> 00:55:07,650
fault tolerance replication it's kind of

1436
00:55:07,650 --> 00:55:09,360
an interesting situation and that

1437
00:55:09,360 --> 00:55:11,340
could happen to you but it happened to

1438
00:55:11,340 --> 00:55:17,190
the people at Pixar so it turns out

1439
00:55:17,190 --> 00:55:19,250
that

1440
00:55:20,020 --> 00:55:22,180
somebody deleted by mistake but

1441
00:55:22,180 --> 00:55:24,580
nevertheless deleted the entire creative

1442
00:55:24,580 --> 00:55:28,480
content for Toy Story 2 right is

1443
00:55:28,480 --> 00:55:29,860
somebody did some maintenance on the server

1444
00:55:29,860 --> 00:55:33,130
and simply destroyed everything they've

1445
00:55:33,130 --> 00:55:35,320
been doing for two years or something

1446
00:55:35,320 --> 00:55:37,120
like that that essentially means by the

1447
00:55:37,120 --> 00:55:39,160
way instant bankruptcy for a company

1448
00:55:39,160 --> 00:55:40,780
young company like Pixar at the time

1449
00:55:40,780 --> 00:55:43,360
because that means we don't deliver the

1450
00:55:43,360 --> 00:55:45,460
movie or we deliver it late you can't do

1451
00:55:45,460 --> 00:55:47,620
it twice okay we deliver it late and

1452
00:55:47,620 --> 00:55:49,720
that's it you might as well call it quits

1453
00:55:49,720 --> 00:55:51,880
they were saved by the fact that one of

1454
00:55:51,880 --> 00:55:54,340
the employees took a copy of the entire

1455
00:55:54,340 --> 00:55:56,650
content home so he can work on it

1456
00:55:56,650 --> 00:55:58,960
so this can write and then basically

1457
00:55:58,960 --> 00:56:00,820
they started crying and somebody

1458
00:56:00,820 --> 00:56:03,040
realized that they did something that

1459
00:56:03,040 --> 00:56:04,660
probably was against the company policy

1460
00:56:04,660 --> 00:56:06,550
and took all the files home that happens

1461
00:56:06,550 --> 00:56:09,040
for example when you're using let's say

1462
00:56:09,040 --> 00:56:11,440
subversion or git or one of these tools

1463
00:56:11,440 --> 00:56:16,120
in which you're essentially keeping your

1464
00:56:16,120 --> 00:56:18,070
own copy in which you can make changes

1465
00:56:18,070 --> 00:56:19,690
but you're synchronizing with a

1466
00:56:19,690 --> 00:56:21,970
server if the server runs amok which it

1467
00:56:21,970 --> 00:56:24,340
might right at least you have your own

1468
00:56:24,340 --> 00:56:26,200
copy which with git in fact contains the

1469
00:56:26,200 --> 00:56:28,240
full history and you can actually

1470
00:56:28,240 --> 00:56:30,280
recover most of what you had there

1471
00:56:30,280 --> 00:56:33,220
maybe minus a day or two from the back

1472
00:56:33,220 --> 00:56:35,320
end content so don't neglect this

1473
00:56:35,320 --> 00:56:37,140
possibility that you could recover

1474
00:56:37,140 --> 00:56:39,610
information in order to provide fault

1475
00:56:39,610 --> 00:56:42,730
tolerance from cached copies the cached

1476
00:56:42,730 --> 00:56:44,560
copies themselves are in fact copies if

1477
00:56:44,560 --> 00:56:46,990
you have a way to resolve how good they

1478
00:56:46,990 --> 00:56:48,520
are and what they are and can patch a

1479
00:56:48,520 --> 00:56:50,140
solution together it's not necessarily a

1480
00:56:50,140 --> 00:56:52,420
bad solution I mean the alternative

1481
00:56:52,420 --> 00:56:53,950
might be to just not do anything which

1482
00:56:53,950 --> 00:56:57,670
would be a complete disaster right it's

1483
00:56:57,670 --> 00:57:00,520
kind of an interesting not meant but

1484
00:57:00,520 --> 00:57:04,600
possible use of caching okay right

1485
00:57:04,600 --> 00:57:05,980
now when it comes to servers the

1486
00:57:05,980 --> 00:57:08,890
question is how do changes propagate the

1487
00:57:08,890 --> 00:57:10,720
client is the client and from the

1488
00:57:10,720 --> 00:57:11,770
client's point of view you can think more

1489
00:57:11,770 --> 00:57:13,540
in terms of some sort of caching right

1490
00:57:13,540 --> 00:57:15,780
but with the server and especially when

1491
00:57:15,780 --> 00:57:17,800
database servers are involved right

1492
00:57:17,800 --> 00:57:19,450
things can become a lot

1493
00:57:19,450 --> 00:57:20,680
trickier why because of the consistency

1494
00:57:20,680 --> 00:57:22,630
that needs to be enforced now there is some

1495
00:57:22,630 --> 00:57:24,160
sort of an implicit consistency when it

1496
00:57:24,160 --> 00:57:25,870
comes to the user but somehow everybody

1497
00:57:25,870 --> 00:57:28,300
bends the rules a little bit more right

1498
00:57:28,300 --> 00:57:33,010
so it's somewhat

1499
00:57:33,010 --> 00:57:34,510
more acceptable to bend the rules when

1500
00:57:34,510 --> 00:57:36,040
it comes to the client but that's almost

1501
00:57:36,040 --> 00:57:37,390
never acceptable when it comes to the

1502
00:57:37,390 --> 00:57:39,460
servers now when it comes to this kind

1503
00:57:39,460 --> 00:57:41,920
of replication right there are multiple

1504
00:57:41,920 --> 00:57:43,810
things you can actually do and they have

1505
00:57:43,810 --> 00:57:45,820
different properties right I mean this

1506
00:57:45,820 --> 00:57:48,150
is just a schematic that basically

1507
00:57:48,150 --> 00:57:52,800
provides some sort of an adaptive way to

1508
00:57:52,800 --> 00:57:55,780
solve the server load problem right so I

1509
00:57:55,780 --> 00:57:57,220
mentioned the initial problem right

1510
00:57:57,220 --> 00:58:00,280
where do you place servers and how do

1511
00:58:00,280 --> 00:58:04,210
you pair them with clients well one

1512
00:58:04,210 --> 00:58:05,830
technique and this turns out to work

1513
00:58:05,830 --> 00:58:07,720
reasonably well is you know what forget

1514
00:58:07,720 --> 00:58:09,550
about trying to be smart from the

1515
00:58:09,550 --> 00:58:11,200
beginning about how I place servers what I'm

1516
00:58:11,200 --> 00:58:14,200
gonna do is monitor how the server is

1517
00:58:14,200 --> 00:58:16,690
used when I detect that the server is

1518
00:58:16,690 --> 00:58:19,360
struggling I'm simply gonna initiate

1519
00:58:19,360 --> 00:58:22,500
some process of replication and

1520
00:58:22,500 --> 00:58:24,700
partitioning the set of clients that go

1521
00:58:24,700 --> 00:58:26,590
to the two different copies in

1522
00:58:26,590 --> 00:58:29,740
order to alleviate the load right now of

1523
00:58:29,740 --> 00:58:31,480
course in order for that to work you

1524
00:58:31,480 --> 00:58:32,950
need some mechanism that ensures that

1525
00:58:32,950 --> 00:58:36,210
that actually kicks in right it

1526
00:58:36,210 --> 00:58:39,160
basically and this is what I tell other

1527
00:58:39,160 --> 00:58:40,960
people in other circumstances is nothing

1528
00:58:40,960 --> 00:58:43,360
exists unless somebody builds it right

1529
00:58:43,360 --> 00:58:45,610
so the fact that it would be nice to

1530
00:58:45,610 --> 00:58:47,260
have that doesn't mean it exists and

1531
00:58:47,260 --> 00:58:51,250
it's there so if I mean some people

1532
00:58:51,250 --> 00:58:53,830
build mechanisms like this or sometimes

1533
00:58:53,830 --> 00:58:55,540
you have to initiate them by hand so for

1534
00:58:55,540 --> 00:58:58,390
example if you really want to rent more

1535
00:58:58,390 --> 00:59:01,000
capacity literally some human has to

1536
00:59:01,000 --> 00:59:03,040
determine that hey that's not enough and

1537
00:59:03,040 --> 00:59:04,930
let's add some capacity or maybe build

1538
00:59:04,930 --> 00:59:06,760
some kind of a semi-automatic mechanism

1539
00:59:06,760 --> 00:59:08,680
so this is one such mechanism automatic

1540
00:59:08,680 --> 00:59:10,540
mechanism right so what you could do is

1541
00:59:10,540 --> 00:59:13,270
measure the load that can mean anything

1542
00:59:13,270 --> 00:59:15,820
but maybe something as simple as simply

1543
00:59:15,820 --> 00:59:17,050
count how many clients how many

1544
00:59:17,050 --> 00:59:18,100
simultaneous clients you have at the

1545
00:59:18,100 --> 00:59:21,210
same time so keep some statistics or I

1546
00:59:21,210 --> 00:59:24,340
mean this is something that in my

1547
00:59:24,340 --> 00:59:25,720
opinion should happen all the time and

1548
00:59:25,720 --> 00:59:28,480
it happens almost never every large

1549
00:59:28,480 --> 00:59:30,550
system should monitor itself and monitor

1550
00:59:30,550 --> 00:59:32,950
its environment in order to detect am I

1551
00:59:32,950 --> 00:59:35,590
running into trouble for example as a

1552
00:59:35,590 --> 00:59:37,270
server what could you monitor to know

1553
00:59:37,270 --> 00:59:39,160
you're running into trouble well you can

1554
00:59:39,160 --> 00:59:40,870
look for all the signs that indicate

1555
00:59:40,870 --> 00:59:41,920
you're running into trouble for example

1556
00:59:41,920 --> 00:59:43,330
you can look for CPU utilization and if

1557
00:59:43,330 --> 00:59:44,290
you're all the time at a hundred percent

1558
00:59:44,290 --> 00:59:46,880
you're in trouble you

1559
00:59:46,880 --> 00:59:49,309
I don't know look for delays to the hard

1560
00:59:49,309 --> 00:59:50,779
drive and if they are very large you're

1561
00:59:50,779 --> 00:59:52,519
in trouble right you can look for

1562
00:59:52,519 --> 00:59:55,880
network congestion right and then you're

1563
00:59:55,880 --> 00:59:56,480
in trouble

1564
00:59:56,480 --> 00:59:57,799
and all these things can lead you

1565
00:59:57,799 --> 00:59:59,390
to say hey I'm gonna initiate some sort

1566
00:59:59,390 --> 01:00:01,309
of a replication mechanism right and

1567
01:00:01,309 --> 01:00:02,809
then how the replication happens is a

1568
01:00:02,809 --> 01:00:04,579
completely different thing so monitor

1569
01:00:04,579 --> 01:00:07,190
and replicate if you need to this seems

1570
01:00:07,190 --> 01:00:08,599
to be a technique that works reasonably

1571
01:00:08,599 --> 01:00:10,250
well now again it does not produce

1572
01:00:10,250 --> 01:00:13,099
optimal solutions but then I argued at

1573
01:00:13,099 --> 01:00:14,809
the beginning of the class that you

1574
01:00:14,809 --> 01:00:16,309
might not have such a thing as an optimal

1575
01:00:16,309 --> 01:00:17,359
solution because you can't measure

1576
01:00:17,359 --> 01:00:21,170
things perfectly so there is a saying in

1577
01:00:21,170 --> 01:00:23,359
the database community is you don't want

1578
01:00:23,359 --> 01:00:24,529
optimal solutions you want good

1579
01:00:24,529 --> 01:00:26,089
enough solutions or you want non

1580
01:00:26,089 --> 01:00:29,210
disastrous solutions right so the

1581
01:00:29,210 --> 01:00:31,640
nightmare scenario is when such

1582
01:00:31,640 --> 01:00:34,069
protocols such algorithms that are doing

1583
01:00:34,069 --> 01:00:37,339
things go into really weird territory

1584
01:00:37,339 --> 01:00:38,990
and start doing very foolish things

1585
01:00:38,990 --> 01:00:40,880
right so for example replicating a

1586
01:00:40,880 --> 01:00:42,289
server for every client that'd be a

1587
01:00:42,289 --> 01:00:45,349
disaster by the way a single line of

1588
01:00:45,349 --> 01:00:47,750
code mistake could easily lead to yeah

1589
01:00:47,750 --> 01:00:49,910
let me create a server for every client I'm

1590
01:00:49,910 --> 01:00:51,519
sure it happened to some people right
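A minimal sketch of the monitor-and-replicate loop just described, under stated assumptions: the thresholds, the LoadSample fields and the max_replicas cap are all hypothetical values chosen for illustration. The cap is the guard against the "server for every client" failure mode.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    cpu_utilization: float    # fraction of CPU in use, 0.0 .. 1.0
    disk_latency_ms: float    # observed delay to the hard drive
    active_clients: int       # simultaneous clients at sampling time


def is_struggling(samples, cpu_limit=0.95, latency_limit_ms=50.0,
                  client_limit=1000):
    """True only if every recent sample breaches some limit.

    Requiring all samples to be bad avoids reacting to a single spike.
    """
    if not samples:
        return False
    return all(
        s.cpu_utilization >= cpu_limit
        or s.disk_latency_ms >= latency_limit_ms
        or s.active_clients >= client_limit
        for s in samples
    )


def desired_replicas(current, samples, max_replicas=8):
    """Add at most one replica per decision, never beyond the hard cap."""
    if is_struggling(samples) and current < max_replicas:
        return current + 1
    return current
```

The result is not optimal placement, only a good-enough reaction to measured load. How the new replica gets its data is a separate question.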

1591
01:00:51,519 --> 01:00:55,220
this is where defects and bugs come into

1592
01:00:55,220 --> 01:01:03,440
play right all right we talked about one

1593
01:01:03,440 --> 01:01:04,910
particular issue okay so if we've

1594
01:01:04,910 --> 01:01:06,410
replicated the data but the data is

1595
01:01:06,410 --> 01:01:08,089
read-write if it's read-only it's easy

1596
01:01:08,089 --> 01:01:11,150
right we've replicated it maybe we have

1597
01:01:11,150 --> 01:01:12,440
a clever algorithm maybe a dumb

1598
01:01:12,440 --> 01:01:14,119
algorithm to decide where we place

1599
01:01:14,119 --> 01:01:16,190
another replica but then that's kind of

1600
01:01:16,190 --> 01:01:19,130
the only issue right but when we have

1601
01:01:19,130 --> 01:01:22,660
changes these changes somehow need to be

1602
01:01:22,660 --> 01:01:24,710
propagated or the clients have to be

1603
01:01:24,710 --> 01:01:26,839
made aware of such changes right so if

1604
01:01:26,839 --> 01:01:29,390
the client will check maybe but again the

1605
01:01:29,390 --> 01:01:30,920
client connects to one of the servers if

1606
01:01:30,920 --> 01:01:32,000
this server doesn't know about the

1607
01:01:32,000 --> 01:01:36,049
change and the change was initiated in

1608
01:01:36,049 --> 01:01:37,759
another server then you're in trouble so

1609
01:01:37,759 --> 01:01:39,200
the servers themselves have to somehow

1610
01:01:39,200 --> 01:01:41,839
keep the information in sync okay there

1611
01:01:41,839 --> 01:01:43,069
are multiple techniques to do this and

1612
01:01:43,069 --> 01:01:45,109
this is exactly the notion of

1613
01:01:45,109 --> 01:01:46,519
consistency that we talked about before

1614
01:01:46,519 --> 01:01:49,759
right multiple techniques to do such

1615
01:01:49,759 --> 01:01:52,099
things and it really depends on what

1616
01:01:52,099 --> 01:01:55,359
blend you have between reads and writes

1617
01:01:55,359 --> 01:01:57,240
so

1618
01:01:57,240 --> 01:02:00,839
one such technique is called remote

1619
01:02:00,839 --> 01:02:02,460
write protocols in which you have a

1620
01:02:02,460 --> 01:02:05,730
primary copy okay so the idea is

1621
01:02:05,730 --> 01:02:08,599
basically the following you're gonna

1622
01:02:08,599 --> 01:02:13,549
replicate the information essentially on

1623
01:02:13,549 --> 01:02:15,960
on all the servers mostly for the

1624
01:02:15,960 --> 01:02:17,579
purpose of reading but when it comes to

1625
01:02:17,579 --> 01:02:19,260
writing only one server is

1626
01:02:19,260 --> 01:02:21,630
allowed to write and initiate maybe a

1627
01:02:21,630 --> 01:02:23,279
more global propagation of

1628
01:02:23,279 --> 01:02:25,920
the write but instead of

1629
01:02:25,920 --> 01:02:28,619
saying you can write anywhere for every

1630
01:02:28,619 --> 01:02:31,559
item you are allowed to write it only if

1631
01:02:31,559 --> 01:02:33,390
you are in the home base of the item

1632
01:02:33,390 --> 01:02:35,339
okay now it turns out that that

1633
01:02:35,339 --> 01:02:37,589
simplifies protocols between servers

1634
01:02:37,589 --> 01:02:41,309
right so in particular enforcing a

1635
01:02:41,309 --> 01:02:43,319
notion of consistency is now much easier

1636
01:02:43,319 --> 01:02:46,049
why remember before a lot of the

1637
01:02:46,049 --> 01:02:47,819
problems came from the fact that I write

1638
01:02:47,819 --> 01:02:49,440
in this server and there is a concurrent

1639
01:02:49,440 --> 01:02:51,420
write in another server and then you

1640
01:02:51,420 --> 01:02:54,450
can't put them together but if every

1641
01:02:54,450 --> 01:02:56,700
item has a home base then only one

1642
01:02:56,700 --> 01:02:58,710
server can actually write it all right

1643
01:02:58,710 --> 01:03:00,240
so all the write requests will go to

1644
01:03:00,240 --> 01:03:01,680
that server and that server has to

1645
01:03:01,680 --> 01:03:03,539
initiate any further propagation of the

1646
01:03:03,539 --> 01:03:05,279
information to other servers your only

1647
01:03:05,279 --> 01:03:06,510
issue is gonna be mildly stale

1648
01:03:06,510 --> 01:03:09,450
information in in the other servers but

1649
01:03:09,450 --> 01:03:11,609
you're really not gonna have most of the

1650
01:03:11,609 --> 01:03:13,260
conflict you would otherwise have if you

1651
01:03:13,260 --> 01:03:16,170
have a primary a primary base for where

1652
01:03:16,170 --> 01:03:18,240
the writes happen okay now that
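A minimal in-memory sketch of the primary-copy (remote-write) idea, assuming a hypothetical placement rule that maps each key to one home replica. The class and method names are illustrative, not a real library.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}

    def read(self, key):
        # Reads are served locally and may be mildly stale.
        return self.store.get(key)


class PrimaryCopyCluster:
    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]

    def home_of(self, key):
        # Hypothetical placement rule: a deterministic function of the key,
        # so every server agrees on which replica is the item's home base.
        index = sum(key.encode("utf-8")) % len(self.replicas)
        return self.replicas[index]

    def write(self, key, value):
        # Every write request for this key goes to its home base, so
        # concurrent writes to the same item are ordered in one place.
        primary = self.home_of(key)
        primary.store[key] = value
        return primary.name

    def propagate(self, key):
        # The home base pushes its current value to the other replicas.
        primary = self.home_of(key)
        for replica in self.replicas:
            if replica is not primary:
                replica.store[key] = primary.store[key]
```

Between write and propagate the other replicas return an old value or nothing. That window is the mildly stale information mentioned above. There are no conflicting concurrent writes to merge.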

1653
01:03:18,240 --> 01:03:20,190
immediately means that writing is

1654
01:03:20,190 --> 01:03:23,039
gonna be a lot more expensive than

1655
01:03:23,039 --> 01:03:24,990
reading and a big question for the

1656
01:03:24,990 --> 01:03:26,730
client is the following so the client

1657
01:03:26,730 --> 01:03:28,140
initiates the write I mean the client

1658
01:03:28,140 --> 01:03:29,599
does something that initiates the write

1659
01:03:29,599 --> 01:03:32,130
when can you let the client continue

1660
01:03:32,130 --> 01:03:34,380
right now this is a big issue in general

1661
01:03:34,380 --> 01:03:37,500
right when it comes to asking any other

1662
01:03:37,500 --> 01:03:39,779
entity to do some activity for you and

1663
01:03:39,779 --> 01:03:41,339
it's the discussion we had throughout

1664
01:03:41,339 --> 01:03:42,990
the class synchronous versus

1665
01:03:42,990 --> 01:03:44,789
asynchronous operations right

1666
01:03:44,789 --> 01:03:46,859
asynchronous means I asked you to do

1667
01:03:46,859 --> 01:03:48,420
something and I go immediately and do

1668
01:03:48,420 --> 01:03:49,829
something else and you're eventually

1669
01:03:49,829 --> 01:03:52,079
gonna do it and whatever synchronous

1670
01:03:52,079 --> 01:03:54,029
means I'll wait until I'm sure that

1671
01:03:54,029 --> 01:03:55,349
you've done what I asked you to do

1672
01:03:55,349 --> 01:03:58,760
partially or completely right now

1673
01:03:58,760 --> 01:04:01,980
synchronous writing synchronous programs

1674
01:04:01,980 --> 01:04:03,990
it's much easier when it comes to

1675
01:04:03,990 --> 01:04:05,400
reasoning about what goes on in the

1676
01:04:05,400 --> 01:04:10,020
system for example when I ask the system

1677
01:04:10,020 --> 01:04:10,890
to

1678
01:04:10,890 --> 01:04:14,600
to read something from the disk right

1679
01:04:14,600 --> 01:04:17,820
it's much easier to say read and when I

1680
01:04:17,820 --> 01:04:20,340
get back the control that I would get

1681
01:04:20,340 --> 01:04:21,750
for example in a synchronous system I

1682
01:04:21,750 --> 01:04:24,600
know that what I asked the operating

1683
01:04:24,600 --> 01:04:26,250
system or the server whatever to read is

1684
01:04:26,250 --> 01:04:28,950
there because I waited for it but now

1685
01:04:28,950 --> 01:04:30,750
imagine that you would write code in

1686
01:04:30,750 --> 01:04:32,370
which you say read but the read will

1687
01:04:32,370 --> 01:04:33,600
come in the future but in the meantime

1688
01:04:33,600 --> 01:04:34,710
you have control back what would you do

1689
01:04:34,710 --> 01:04:36,120
with it you already dealt with some

1690
01:04:36,120 --> 01:04:37,200
of these issues if you use for example

1691
01:04:37,200 --> 01:04:40,320
futures in Scala right so it can be

1692
01:04:40,320 --> 01:04:42,360
done but it's much harder to think about

1693
01:04:42,360 --> 01:04:44,430
the dynamic of the system it's the same

1694
01:04:44,430 --> 01:04:46,110
thing here so when you're saying write

1695
01:04:46,110 --> 01:04:49,440
the big question for the client is do

1696
01:04:49,440 --> 01:04:51,930
you wait until the write got propagated

1697
01:04:51,930 --> 01:04:53,280
everywhere and that you're sure that the

1698
01:04:53,280 --> 01:04:56,640
write happened or do you go immediately

1699
01:04:56,640 --> 01:04:58,950
and do something else but then you must

1700
01:04:58,950 --> 01:05:01,350
have a mechanism to notify you later

1701
01:05:01,350 --> 01:05:03,000
that the write might have failed there

1702
01:05:03,000 --> 01:05:04,170
are many reasons that the write could

1703
01:05:04,170 --> 01:05:07,170
actually have failed right and this is
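A minimal sketch of the two choices for the client, using Python's concurrent.futures in place of the Scala futures mentioned above. The functions replicate_write, write_sync and write_async are hypothetical, and the fail flag only simulates an unreachable replica.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared worker pool standing in for "the other entity" doing the write.
executor = ThreadPoolExecutor(max_workers=2)


def replicate_write(store, key, value, fail=False):
    """Pretend to propagate a write; `fail` simulates a lost replica."""
    if fail:
        raise IOError("replica unreachable")
    store[key] = value
    return True


def write_sync(store, key, value, fail=False):
    # Synchronous: block until the write is known to have happened.
    # Any failure is raised right here, where the client can react.
    return executor.submit(replicate_write, store, key, value, fail).result()


def write_async(store, key, value, on_failure, fail=False):
    # Asynchronous: return immediately and keep going. The client must
    # supply a way to be told later that the write did not happen.
    future = executor.submit(replicate_write, store, key, value, fail)

    def _check(done):
        error = done.exception()
        if error is not None:
            on_failure(key, error)

    future.add_done_callback(_check)
    return future
```

The synchronous version is easier to reason about because control returns only after the outcome is known. The asynchronous version is faster for the client but pushes failure handling into a callback that runs at some later time.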

1704
01:05:07,170 --> 01:05:08,520
one of the choices that have to be made

1705
01:05:08,520 --> 01:05:09,900
with this kind of systems and it really

1706
01:05:09,900 --> 01:05:11,550
depends on what consistency model you're

1707
01:05:11,550 --> 01:05:14,250
trying to enforce right now a big issue

1708
01:05:14,250 --> 01:05:16,140
always when you have things like home

1709
01:05:16,140 --> 01:05:19,260
bases is fault tolerance the primary copy

1710
01:05:19,260 --> 01:05:21,840
goes down then what so you have to

1711
01:05:21,840 --> 01:05:24,090
augment this with mechanisms in which

1712
01:05:24,090 --> 01:05:26,370
you decide who's the primary copy based

1713
01:05:26,370 --> 01:05:28,230
on different protocols and we talked

1714
01:05:28,230 --> 01:05:29,340
already about the leader election

1715
01:05:29,340 --> 01:05:30,630
protocols right I'm going to talk more

1716
01:05:30,630 --> 01:05:35,280
about this right so that's one way to do

1717
01:05:35,280 --> 01:05:37,470
things again don't look for a silver

1718
01:05:37,470 --> 01:05:38,910
bullet here there's no such thing there

1719
01:05:38,910 --> 01:05:40,230
are solutions that have different

1720
01:05:40,230 --> 01:05:41,790
properties different compromises so

1721
01:05:41,790 --> 01:05:44,430
a different kind of compromise would be

1722
01:05:44,430 --> 01:05:47,400
done by this local write protocol local

1723
01:05:47,400 --> 01:05:48,930
write protocol you're gonna write

1724
01:05:48,930 --> 01:05:51,300
locally right but essentially what you

1725
01:05:51,300 --> 01:05:53,070
what you're gonna do to stay sane I mean

1726
01:05:53,070 --> 01:05:54,420
everybody can write locally and then you

1727
01:05:54,420 --> 01:05:55,800
have to somehow resolve the conflicts

1728
01:05:55,800 --> 01:05:57,120
and that can have its own set of

1729
01:05:57,120 --> 01:05:58,890
problems but what you could do and this

1730
01:05:58,890 --> 01:06:00,270
is similar to those token based protocols

1731
01:06:00,270 --> 01:06:02,520
is to get permission to own the

1732
01:06:02,520 --> 01:06:08,010
local copy to have the right the

1733
01:06:08,010 --> 01:06:09,630
right to write this is the different

1734
01:06:09,630 --> 01:06:12,090
rights right on your local machine right

1735
01:06:12,090 --> 01:06:13,770
you say temporarily I'm the one that

1736
01:06:13,770 --> 01:06:15,210
owns this resource then I can do any

1737
01:06:15,210 --> 01:06:16,710
writes I want and then later I'm gonna

1738
01:06:16,710 --> 01:06:19,370
give the ownership to somebody else okay

1739
01:06:19,370 --> 01:06:22,620
this could work and work nicely

1740
01:06:22,620 --> 01:06:24,360
so essentially you say

1741
01:06:24,360 --> 01:06:27,240
if you think about it right even though

1742
01:06:27,240 --> 01:06:29,760
I use multiple clients if I have this

1743
01:06:29,760 --> 01:06:31,440
kind of local write protocol this might

1744
01:06:31,440 --> 01:06:34,770
work out reasonably well right so when I

1745
01:06:34,770 --> 01:06:37,170
move to a new machine I have to find out

1746
01:06:37,170 --> 01:06:41,100
who's allowed to write and

1747
01:06:41,100 --> 01:06:43,980
say hey I want to grab that privilege

1748
01:06:43,980 --> 01:06:45,210
from you which means you don't have it

1749
01:06:45,210 --> 01:06:47,460
and I have it I might have a little bit

1750
01:06:47,460 --> 01:06:49,530
of a mildly tedious protocol to do so but

1751
01:06:49,530 --> 01:06:52,080
beyond that point I can run very fast

1752
01:06:52,080 --> 01:06:53,910
because I'm just gonna write locally of

1753
01:06:53,910 --> 01:06:55,880
course the fault tolerance now suffers

1754
01:06:55,880 --> 01:06:59,700
right which again might or might not be

1755
01:06:59,700 --> 01:07:01,950
a big a big issue for some of the

1756
01:07:01,950 --> 01:07:04,320
activities you do okay so again trade-offs

1757
01:07:04,320 --> 01:07:09,690
vs. trade-offs now when it comes to

1758
01:07:09,690 --> 01:07:12,300
writes so I mean there is vast literature

1759
01:07:12,300 --> 01:07:15,510
on these issues right because things can

1760
01:07:15,510 --> 01:07:18,630
be very very complicated depending on

1761
01:07:18,630 --> 01:07:19,980
the consistency model you're trying to

1762
01:07:19,980 --> 01:07:21,840
enforce so one of the one of the things

1763
01:07:21,840 --> 01:07:23,580
that you could try to do is the

1764
01:07:23,580 --> 01:07:24,630
following so if you have multiple

1765
01:07:24,630 --> 01:07:31,020
servers then one way to consistently

1766
01:07:31,020 --> 01:07:33,300
propagate this write, right, is to be

1767
01:07:33,300 --> 01:07:34,740
careful who's allowed to write at what

1768
01:07:34,740 --> 01:07:36,060
moment of time and who's allowed to read

1769
01:07:36,060 --> 01:07:42,120
at what moment of time okay so one such

1770
01:07:42,120 --> 01:07:43,980
solution is so-called quorum based

1771
01:07:43,980 --> 01:07:44,760
protocols

1772
01:07:44,760 --> 01:07:46,260
so the quorum based protocols are

1773
01:07:46,260 --> 01:07:48,840
protocols in which you have multiple

1774
01:07:48,840 --> 01:07:51,750
participants and they all have to

1775
01:07:51,750 --> 01:07:53,820
participate in the act of either reading

1776
01:07:53,820 --> 01:07:55,920
or writing depending on how you actually

1777
01:07:55,920 --> 01:07:58,710
do things and of particular concern in

1778
01:07:58,710 --> 01:08:00,720
this circumstance is to make sure that

1779
01:08:00,720 --> 01:08:02,670
if somebody writes nobody can read and

1780
01:08:02,670 --> 01:08:04,590
or at least everybody's read copy is

1781
01:08:04,590 --> 01:08:07,200
invalidated right so then one such

1782
01:08:07,200 --> 01:08:08,730
possible solution so this is a solution

1783
01:08:08,730 --> 01:08:11,160
that's not good and this is basically a

1784
01:08:11,160 --> 01:08:16,620
solution that's good right so one one

1785
01:08:16,620 --> 01:08:19,529
way to make sure for example that you

1786
01:08:19,529 --> 01:08:22,170
invalidate everybody else's copy is to

1787
01:08:22,170 --> 01:08:24,779
basically say I need, let's say, N

1788
01:08:24,779 --> 01:08:28,109
tokens to do any read or write in the

1789
01:08:28,109 --> 01:08:31,290
system as a client right now to do a

1790
01:08:31,290 --> 01:08:33,450
read I might and this is the situation

1791
01:08:33,450 --> 01:08:34,979
with a circle here right

1792
01:08:34,979 --> 01:08:37,618
if I get one token I can read

1793
01:08:37,618 --> 01:08:38,969
but in order to write I need all the

1794
01:08:38,969 --> 01:08:40,859
tokens now I mean of course the question

1795
01:08:40,859 --> 01:08:42,210
is you can come up with any protocol you

1796
01:08:42,210 --> 01:08:44,279
want but the question is does it enforce

1797
01:08:44,279 --> 01:08:46,770
the consistency model that you're

1798
01:08:46,770 --> 01:08:48,929
looking for right now it turns out that

1799
01:08:48,929 --> 01:08:52,198
this one will make sure that if anybody

1800
01:08:52,198 --> 01:08:54,210
writes nobody can read and you cannot

1801
01:08:54,210 --> 01:08:57,238
write if anybody reads its replica but

1802
01:08:57,238 --> 01:08:59,238
notice that it's optimized for the reads

1803
01:08:59,238 --> 01:09:03,149
so if a single token is missing you

1804
01:09:03,149 --> 01:09:04,770
cannot write you have to wait until you

1805
01:09:04,770 --> 01:09:08,310
acquire all such tokens right clearly if

1806
01:09:08,310 --> 01:09:09,929
you have all the tokens nobody can read

1807
01:09:09,929 --> 01:09:13,649
right so it you can just reason through

1808
01:09:13,649 --> 01:09:15,569
the scenarios and you can see that well

1809
01:09:15,569 --> 01:09:19,049
the reads and the writes are not

1810
01:09:19,049 --> 01:09:20,729
going to be wrong of course the problem

1811
01:09:20,729 --> 01:09:23,969
is starvation for the writer right it

1812
01:09:23,969 --> 01:09:25,560
might take a long time for the writer to

1813
01:09:25,560 --> 01:09:27,509
acquire all the tokens in order to be

1814
01:09:27,509 --> 01:09:30,179
able to perform that write operation
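
The read-one, write-all token rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the class `TokenPool` and its method names are hypothetical, the tokens live in a single in-memory pool, and no networking or failure handling is modeled.

```python
class TokenPool:
    """N tokens guard one replicated item: a read needs 1, a write needs all N."""

    def __init__(self, n):
        self.n = n          # total number of tokens in the system
        self.free = n       # tokens nobody holds right now
        self.held = {}      # client name -> number of tokens it holds

    def acquire_read(self, client):
        # A reader needs a single token, so many readers can coexist.
        if self.free >= 1:
            self.free -= 1
            self.held[client] = self.held.get(client, 0) + 1
            return True
        return False

    def acquire_write(self, client):
        # A writer needs every token; one outstanding reader blocks it.
        if self.free == self.n:
            self.free = 0
            self.held[client] = self.n
            return True
        return False

    def release(self, client):
        # Return whatever this client holds to the pool.
        self.free += self.held.pop(client, 0)


pool = TokenPool(3)
read_ok = pool.acquire_read("reader")        # one token is enough to read
write_blocked = pool.acquire_write("writer") # a token is missing, so no write
pool.release("reader")
write_ok = pool.acquire_write("writer")      # all tokens are free again
read_blocked = pool.acquire_read("late")     # the writer holds everything
```

A steady stream of readers keeps `free` below `n`, which is exactly the writer starvation mentioned here.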

1815
01:09:30,179 --> 01:09:33,139
okay yes

1816
01:09:42,038 --> 01:09:47,029
right so you can go so you can go into

1817
01:09:47,029 --> 01:09:51,229
such protocols and refine them now one

1818
01:09:51,229 --> 01:09:54,400
benefit of having separate so you see

1819
01:09:54,400 --> 01:09:56,659
multiple readers are great but then you

1820
01:09:56,659 --> 01:09:58,280
need the mechanism to keep track of how

1821
01:09:58,280 --> 01:10:00,170
many such readers exist or don't exist

1822
01:10:00,170 --> 01:10:01,999
right so that's a choice in itself

1823
01:10:01,999 --> 01:10:04,909
whether you allow multiple reads or you

1824
01:10:04,909 --> 01:10:06,499
don't allow multiple reads to happen and

1825
01:10:06,499 --> 01:10:08,599
also what are exactly are these tokens

1826
01:10:08,599 --> 01:10:10,099
and where are they and how do you keep

1827
01:10:10,099 --> 01:10:11,690
track of them well one possible solution

1828
01:10:11,690 --> 01:10:13,999
is to have a single dedicated if you

1829
01:10:13,999 --> 01:10:15,199
want centralized server that just

1830
01:10:15,199 --> 01:10:17,119
manages these tokens it's very similar

1831
01:10:17,119 --> 01:10:20,300
to the centralized solution that we had

1832
01:10:20,300 --> 01:10:23,300
for the for the locking right and I mean

1833
01:10:23,300 --> 01:10:24,530
that's another way to do it you can you

1834
01:10:24,530 --> 01:10:26,300
can just use read locks and write locks

1835
01:10:26,300 --> 01:10:36,249
in in a centralized solution yes

1836
01:10:52,320 --> 01:10:55,900
right so many many issues with this and

1837
01:10:55,900 --> 01:10:57,790
so this is what I'm trying to say with

1838
01:10:57,790 --> 01:10:59,320
these protocols you have to look at the

1839
01:10:59,320 --> 01:11:01,330
tiny details I mean if such a protocol

1840
01:11:01,330 --> 01:11:03,490
is published in a research paper almost

1841
01:11:03,490 --> 01:11:05,320
surely they show that it doesn't run

1842
01:11:05,320 --> 01:11:06,640
into this kind of issue so the issue is

1843
01:11:06,640 --> 01:11:08,380
here deadlock right so the deadlock is

1844
01:11:08,380 --> 01:11:10,870
refers to the fact that if two writers

1845
01:11:10,870 --> 01:11:12,550
start at the same time they acquire some

1846
01:11:12,550 --> 01:11:14,620
of the some of the some of the tokens

1847
01:11:14,620 --> 01:11:16,360
and then none of them can make progress

1848
01:11:16,360 --> 01:11:17,650
because they are waiting for the other

1849
01:11:17,650 --> 01:11:20,290
guy's tokens to be acquired right and

1850
01:11:20,290 --> 01:11:22,630
that's it you're stuck so you must

1851
01:11:22,630 --> 01:11:25,870
prevent these kinds of things from

1852
01:11:25,870 --> 01:11:28,180
happening right I mean what kind of

1853
01:11:28,180 --> 01:11:29,680
mechanisms could you have to to do this

1854
01:11:29,680 --> 01:11:32,050
I mean one of them is to basically say

1855
01:11:32,050 --> 01:11:33,460
if you don't acquire all your tokens

1856
01:11:33,460 --> 01:11:34,480
within a certain amount of time you

1857
01:11:34,480 --> 01:11:36,070
release them, that might or might not

1858
01:11:36,070 --> 01:11:37,780
work I mean all of these things could

1859
01:11:37,780 --> 01:11:39,400
potentially be problematic depending on

1860
01:11:39,400 --> 01:11:41,710
what is it that the token is I mean does

1861
01:11:41,710 --> 01:11:43,450
it require a certain message exchange or

1862
01:11:43,450 --> 01:11:46,950
doesn't require certain message exchange
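
The "release everything if you cannot get all tokens in time" idea can be sketched as follows. It assumes each token is modeled by a local `threading.Lock`; the helper name `acquire_all` is hypothetical, and a real distributed token would need message exchanges that this sketch leaves out.

```python
import threading


def acquire_all(locks, timeout):
    """Try to take every lock; if any one times out, release what was taken."""
    taken = []
    for lock in locks:
        if lock.acquire(timeout=timeout):
            taken.append(lock)
        else:
            # Give back the partial set so another writer can make progress.
            for held in reversed(taken):
                held.release()
            return False
    return True


tokens = [threading.Lock() for _ in range(3)]

tokens[2].acquire()                       # pretend another writer holds token 2
first_try = acquire_all(tokens, 0.05)     # times out on token 2, backs off
partial_left = [t.locked() for t in tokens[:2]]

tokens[2].release()                       # the other writer is done
second_try = acquire_all(tokens, 0.05)    # now all three tokens are ours
```

This avoids holding a partial set forever, but as the lecture goes on to say, it does not by itself guarantee that anybody ever succeeds.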

1863
01:12:01,780 --> 01:12:05,510
well so what I want to point out I mean

1864
01:12:05,510 --> 01:12:07,579
the details don't matter that much okay

1865
01:12:07,579 --> 01:12:09,739
because you literally have thousands of

1866
01:12:09,739 --> 01:12:11,150
these protocols proposed I mean what am

1867
01:12:11,150 --> 01:12:12,260
I gonna do walk through all of them

1868
01:12:12,260 --> 01:12:14,719
right what matters is to be aware of

1869
01:12:14,719 --> 01:12:16,489
issues so one big issue with anything

1870
01:12:16,489 --> 01:12:18,709
that requires tokens or locks or things

1871
01:12:18,709 --> 01:12:21,199
of this sort is deadlocks right so

1872
01:12:21,199 --> 01:12:23,179
that's a tremendously big issue when it

1873
01:12:23,179 --> 01:12:24,590
comes to any kind of synchronization and

1874
01:12:24,590 --> 01:12:27,169
by the way that so there are two things

1875
01:12:27,169 --> 01:12:29,570
that eat you alive when you do parallel

1876
01:12:29,570 --> 01:12:33,320
processing right one of them is race

1877
01:12:33,320 --> 01:12:35,239
conditions you forgot to lock and you

1878
01:12:35,239 --> 01:12:37,669
access resources at the same time and

1879
01:12:37,669 --> 01:12:39,439
the other one is you locked enough to

1880
01:12:39,439 --> 01:12:43,099
deadlock right either of which are

1881
01:12:43,099 --> 01:12:45,439
completely disastrous right then they

1882
01:12:45,439 --> 01:12:47,479
have different ways of being disasters

1883
01:12:47,479 --> 01:12:49,400
but they're completely disastrous and by

1884
01:12:49,400 --> 01:12:51,380
the way even things that you might think

1885
01:12:51,380 --> 01:12:53,449
should not ever run into trouble do

1886
01:12:53,449 --> 01:12:55,969
occasionally run into trouble right in

1887
01:12:55,969 --> 01:12:57,949
particular for example the operating

1888
01:12:57,949 --> 01:13:00,019
system itself is in fact a mini

1889
01:13:00,019 --> 01:13:01,550
distributed system it needs to use

1890
01:13:01,550 --> 01:13:03,679
synchronization because even if you

1891
01:13:03,679 --> 01:13:04,969
don't have multi-threaded applications

1892
01:13:04,969 --> 01:13:06,559
running multiple processes is in fact

1893
01:13:06,559 --> 01:13:09,439
some sort of a multi-threaded activity

1894
01:13:09,439 --> 01:13:12,349
right it can actually happen and this is

1895
01:13:12,349 --> 01:13:15,019
for example what plagued operating

1896
01:13:15,019 --> 01:13:16,939
systems for a long time and this is

1897
01:13:16,939 --> 01:13:18,199
where the blue screen of death comes

1898
01:13:18,199 --> 01:13:21,409
into play and other such oops for

1899
01:13:21,409 --> 01:13:24,610
example in the Linux kernel in which

1900
01:13:24,610 --> 01:13:27,679
well I mean they're all humans right so

1901
01:13:27,679 --> 01:13:30,380
they wrote code and they were convinced

1902
01:13:30,380 --> 01:13:31,969
it's okay but at some point somebody

1903
01:13:31,969 --> 01:13:33,829
acquired needed multiple locks to do an

1904
01:13:33,829 --> 01:13:35,719
operation acquired one of them was

1905
01:13:35,719 --> 01:13:37,039
waiting for the other one some other

1906
01:13:37,039 --> 01:13:40,939
application had the lock and requires

1907
01:13:40,939 --> 01:13:42,919
the second lock and none of them could

1908
01:13:42,919 --> 01:13:44,239
make progress because it waits for the

1909
01:13:44,239 --> 01:13:46,309
other one to release the lock right and

1910
01:13:46,309 --> 01:13:51,229
then what well the it's actually you

1911
01:13:51,229 --> 01:13:52,969
have a deadlock that means that resource

1912
01:13:52,969 --> 01:13:54,499
is completely locked out and nobody can

1913
01:13:54,499 --> 01:13:56,119
ever access it until you reboot the

1914
01:13:56,119 --> 01:13:57,829
operating system the Linux kernel for

1915
01:13:57,829 --> 01:14:00,199
example even have a I mean the PS

1916
01:14:00,199 --> 01:14:02,630
program in Linux or most UNIX operating

1917
01:14:02,630 --> 01:14:05,419
systems has a special symbol for this is

1918
01:14:05,419 --> 01:14:07,999
D (uninterruptible sleep), which is how such a stuck process shows up; by the way

1919
01:14:07,999 --> 01:14:10,280
if you see any process that has a D next

1920
01:14:10,280 --> 01:14:10,660
to

1921
01:14:10,660 --> 01:14:12,910
that essentially means maybe by a

1922
01:14:12,910 --> 01:14:14,800
miracle somehow something happens in the

1923
01:14:14,800 --> 01:14:16,840
future and gets rid of it but if not

1924
01:14:16,840 --> 01:14:19,180
next time after you freshly reboot the

1925
01:14:19,180 --> 01:14:20,410
system is going to go away there is no

1926
01:14:20,410 --> 01:14:21,520
other way it's gonna go away

1927
01:14:21,520 --> 01:14:24,610
and if it's about a very important

1928
01:14:24,610 --> 01:14:26,350
resource that could essentially mean

1929
01:14:26,350 --> 01:14:28,330
you're forced to reboot the operating

1930
01:14:28,330 --> 01:14:31,870
system right these deadlocks do still

1931
01:14:31,870 --> 01:14:33,940
happen even in things that are so

1932
01:14:33,940 --> 01:14:36,070
pounded on for such a long time like

1933
01:14:36,070 --> 01:14:38,760
operating systems when it comes to user

1934
01:14:38,760 --> 01:14:41,290
user programs you can have lots of these

1935
01:14:41,290 --> 01:14:45,120
things right yes

1936
01:14:51,649 --> 01:14:56,329
right so I mean look look look for

1937
01:14:56,329 --> 01:15:00,349
everything right you can be very careful

1938
01:15:00,349 --> 01:15:02,510
and very safe but then slow, or very fast

1939
01:15:02,510 --> 01:15:04,669
and then run in some kind of trouble or

1940
01:15:04,669 --> 01:15:08,809
have solutions that I mean in I don't

1941
01:15:08,809 --> 01:15:09,979
want to go too much into these details

1942
01:15:09,979 --> 01:15:11,959
because they are discussed in an

1943
01:15:11,959 --> 01:15:13,699
undergrad operating systems class right I mean

1944
01:15:13,699 --> 01:15:15,649
I'm sure you took such a class most of

1945
01:15:15,649 --> 01:15:17,360
you and you had all that deadlock

1946
01:15:17,360 --> 01:15:19,610
prevention and deadlock detection right

1947
01:15:19,610 --> 01:15:21,800
deadlock detection is very

1948
01:15:21,800 --> 01:15:23,599
costly

1949
01:15:23,599 --> 01:15:27,289
deadlock prevention, it robs you of

1950
01:15:27,289 --> 01:15:28,760
situations where you could have run in

1951
01:15:28,760 --> 01:15:29,840
parallel so you have to give up

1952
01:15:29,840 --> 01:15:33,919
something right and by the way you might

1953
01:15:33,919 --> 01:15:37,369
prevent deadlocks but have something

1954
01:15:37,369 --> 01:15:39,409
called livelocks in which you are still not

1955
01:15:39,409 --> 01:15:40,219
making progress

1956
01:15:40,219 --> 01:15:42,169
you're just doing busy work alright so

1957
01:15:42,169 --> 01:15:44,769
for example a livelock here would be

1958
01:15:44,769 --> 01:15:47,179
right you try to acquire tokens for the

1959
01:15:47,179 --> 01:15:49,760
write but if you cannot acquire enough

1960
01:15:49,760 --> 01:15:51,469
tokens within one second you give up and

1961
01:15:51,469 --> 01:15:53,449
start again but you see that still

1962
01:15:53,449 --> 01:15:55,129
doesn't guarantee that anybody will go

1963
01:15:55,129 --> 01:15:57,800
through and do anything why because if

1964
01:15:57,800 --> 01:15:59,749
you have a bunch of these aggressive guys

1965
01:15:59,749 --> 01:16:01,280
that want to do this they wait the

1966
01:16:01,280 --> 01:16:02,899
second and again go and steal some of

1967
01:16:02,899 --> 01:16:04,340
the tokens and again they give up and

1968
01:16:04,340 --> 01:16:05,840
steal tokens and again they give up and

1969
01:16:05,840 --> 01:16:09,619
steal tokens yeah but right right right

1970
01:16:09,619 --> 01:16:11,989
so you have to maybe but none of these

1971
01:16:11,989 --> 01:16:14,419
things are really guaranteed to really

1972
01:16:14,419 --> 01:16:16,010
really work a bit okay random

1973
01:16:16,010 --> 01:16:17,479
essentially means giving up performance

1974
01:16:17,479 --> 01:16:19,939
by the way all right so random restarts

1975
01:16:19,939 --> 01:16:21,679
which are used for wireless access most

1976
01:16:21,679 --> 01:16:22,849
of the time when you have collisions on

1977
01:16:22,849 --> 01:16:24,709
the channel right here so one solution

1978
01:16:24,709 --> 01:16:26,539
is to do this exponential exponential

1979
01:16:26,539 --> 01:16:28,280
backoff, right, you start with one second

1980
01:16:28,280 --> 01:16:29,629
but if I don't succeed I go to two

1981
01:16:29,629 --> 01:16:31,579
seconds four seconds right ramp it up

1982
01:16:31,579 --> 01:16:35,419
exponentially you can actually run in

1983
01:16:35,419 --> 01:16:39,769
it's clear that somebody at some point

1984
01:16:39,769 --> 01:16:41,510
succeeds because you keep on going

1985
01:16:41,510 --> 01:16:43,489
exponentially back in time I mean you

1986
01:16:43,489 --> 01:16:44,809
can even do some sorts of analyses

1987
01:16:44,809 --> 01:16:46,369
depending on how many guys are trying to

1988
01:16:46,369 --> 01:16:49,010
jump on it and how fast on expectation

1989
01:16:49,010 --> 01:16:51,280
when you would you would do things but

1990
01:16:51,280 --> 01:16:54,289
keep in mind that if this is part of the

1991
01:16:54,289 --> 01:16:55,969
things you do all the time you might be

1992
01:16:55,969 --> 01:16:57,829
robbed of tremendous amount of

1993
01:16:57,829 --> 01:17:01,760
performance to do this right so if for

1994
01:17:01,760 --> 01:17:03,590
every single item that you're accessing

1995
01:17:03,590 --> 01:17:05,690
you need to do this exponential backoff

1996
01:17:05,690 --> 01:17:07,160
it's potentially a disaster I mean

1997
01:17:07,160 --> 01:17:08,360
you're not doing anything except running

1998
01:17:08,360 --> 01:17:10,880
this exponential backoff

1999
01:17:10,880 --> 01:17:14,150
algorithms right so when it comes to

2000
01:17:14,150 --> 01:17:17,180
selecting such so okay let me back up

2001
01:17:17,180 --> 01:17:17,960
even more
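
The exponential backoff mentioned a moment ago (one second, then two, then four) can be sketched like this. The function names `backoff_delays` and `retry` are hypothetical, the `jitter` parameter is an added assumption for randomized restarts, and the sleep function is injectable so the example runs without actually waiting.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, attempts=5, jitter=0.0, rng=random):
    """Return the waits for each failed attempt: base, base*factor, ..."""
    delays = []
    delay = base
    for _ in range(attempts):
        # Optional random extra wait so competing clients drift apart.
        delays.append(delay + rng.uniform(0, jitter * delay))
        delay *= factor
    return delays


def retry(operation, delays, sleep=time.sleep):
    """Call operation until it succeeds, sleeping between failed attempts."""
    for delay in delays:
        if operation():
            return True
        sleep(delay)
    return False


plain = backoff_delays()          # no jitter: 1, 2, 4, 8, 16 seconds
total_wait = sum(plain)           # time lost if all five attempts fail

outcomes = iter([False, False, True])   # fail twice, then succeed
waited = []                             # record waits instead of sleeping
succeeded = retry(lambda: next(outcomes), plain, sleep=waited.append)
```

The total of the delays makes the performance cost concrete: five failed attempts already cost 31 seconds of pure waiting.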

2002
01:17:17,960 --> 01:17:20,360
I want you mostly to be aware of this

2003
01:17:20,360 --> 01:17:22,040
kind of things the fact that it's tricky

2004
01:17:22,040 --> 01:17:24,110
to get such a protocol running right

2005
01:17:24,110 --> 01:17:26,780
that you have to ask hard questions

2006
01:17:26,780 --> 01:17:28,760
about what is it that's going on and

2007
01:17:28,760 --> 01:17:31,730
does it fit the specific application I

2008
01:17:31,730 --> 01:17:33,410
have in mind because some of them can

2009
01:17:33,410 --> 01:17:34,850
work exceptionally well under certain

2010
01:17:34,850 --> 01:17:36,620
circumstances and be disasters under

2011
01:17:36,620 --> 01:17:38,660
other circumstances for example under

2012
01:17:38,660 --> 01:17:40,370
rare writes this will help, this will

2013
01:17:40,370 --> 01:17:42,410
actually work nicely with very

2014
01:17:42,410 --> 01:17:43,760
aggressive writes this could actually be

2015
01:17:43,760 --> 01:17:47,510
a disaster right but it's like this for

2016
01:17:47,510 --> 01:17:49,850
everything else in this class right all

2017
01:17:49,850 --> 01:17:52,130
of them have good points and not so good

2018
01:17:52,130 --> 01:17:54,340
points depending on the circumstances so

2019
01:17:54,340 --> 01:17:56,540
when it comes to using distributed

2020
01:17:56,540 --> 01:17:58,880
systems, the main question is which

2021
01:17:58,880 --> 01:18:01,900
compromise fits my application really

2022
01:18:01,900 --> 01:18:04,820
right out of all the compromises that

2023
01:18:04,820 --> 01:18:07,250
exist as opposed to knowing what the

2024
01:18:07,250 --> 01:18:10,760
good solution is; in certain areas it is

2025
01:18:10,760 --> 01:18:12,140
known what the good solution is and that

2026
01:18:12,140 --> 01:18:13,310
that's it right you can prove that

2027
01:18:13,310 --> 01:18:15,290
that's the best solution and that that's

2028
01:18:15,290 --> 01:18:17,210
the end of the story right but it's not

2029
01:18:17,210 --> 01:18:19,520
the case here for example what's the

2030
01:18:19,520 --> 01:18:21,140
best sorting algorithm you know that's

2031
01:18:21,140 --> 01:18:23,210
murky now right it used to be the

2032
01:18:23,210 --> 01:18:24,680
case that the answer was easy yeah a

2033
01:18:24,680 --> 01:18:26,540
quicksort but it's not so easy anymore

2034
01:18:26,540 --> 01:18:29,030
but then ask yourself what's the penalty

2035
01:18:29,030 --> 01:18:31,040
of not picking the right algorithm well

2036
01:18:31,040 --> 01:18:33,260
it's usually just a factor of two unless

2037
01:18:33,260 --> 01:18:35,000
you do distributed sort when it can be many

2038
01:18:35,000 --> 01:18:37,880
orders of magnitude but right but things

2039
01:18:37,880 --> 01:18:39,380
are not easy anymore because everything

2040
01:18:39,380 --> 01:18:43,940
is so complicated put together so we

2041
01:18:43,940 --> 01:18:46,300
still have

2042
01:18:48,510 --> 01:18:52,989
I'm sorry about 20 minutes okay so when

2043
01:18:52,989 --> 01:18:59,260
it comes to right so when it comes to do

2044
01:18:59,260 --> 01:19:00,880
replication right we've seen I mean

2045
01:19:00,880 --> 01:19:03,820
replication is the worst to a large

2046
01:19:03,820 --> 01:19:05,770
extent when when it comes to the

2047
01:19:05,770 --> 01:19:07,510
decisions you actually have to you have

2048
01:19:07,510 --> 01:19:08,949
to make now you have to pair that

2049
01:19:08,949 --> 01:19:11,949
also with some sort of backups right

2050
01:19:11,949 --> 01:19:13,750
in order to make the information really

2051
01:19:13,750 --> 01:19:15,760
if you want permanent and I mean

2052
01:19:15,760 --> 01:19:17,710
ultimately for example let's think about

2053
01:19:17,710 --> 01:19:19,210
chat applications right I mean for chat

2054
01:19:19,210 --> 01:19:20,440
applications again you have to ask

2055
01:19:20,440 --> 01:19:22,659
yourself the question is what am I

2056
01:19:22,659 --> 01:19:27,909
losing if everything goes away maybe in

2057
01:19:27,909 --> 01:19:30,929
fact not so much and I've actually okay

2058
01:19:30,929 --> 01:19:34,800
does anybody know anything about Snapchat

2059
01:19:34,800 --> 01:19:38,679
okay so this is I didn't look at all the

2060
01:19:38,679 --> 01:19:40,179
details but I think this is mostly about

2061
01:19:40,179 --> 01:19:43,060
so they built in the opposite of fault

2062
01:19:43,060 --> 01:19:44,920
tolerance into the protocol it goes away

2063
01:19:44,920 --> 01:19:47,050
after a while right now this is one way

2064
01:19:47,050 --> 01:19:49,030
to solve all your problems right so you

2065
01:19:49,030 --> 01:19:50,320
can essentially solve all your problems

2066
01:19:50,320 --> 01:19:52,840
with replication by not only not

2067
01:19:52,840 --> 01:19:54,580
worrying about replication but promising

2068
01:19:54,580 --> 01:19:57,190
that you're not gonna replicate right so

2069
01:19:57,190 --> 01:19:59,920
apparently snapchat well first of all

2070
01:19:59,920 --> 01:20:03,040
it's valued at three billion dollars

2071
01:20:03,040 --> 01:20:07,600
which I mean whatever but apparently

2072
01:20:07,600 --> 01:20:09,520
snapchat is basically pictures or

2073
01:20:09,520 --> 01:20:11,889
whatever they go and the system will get

2074
01:20:11,889 --> 01:20:13,300
rid of them in whatever short amount of

2075
01:20:13,300 --> 01:20:15,730
time all right so that basically means

2076
01:20:15,730 --> 01:20:20,350
that you do exactly the opposite

2077
01:20:20,350 --> 01:20:22,480
applications go exactly in the opposite

2078
01:20:22,480 --> 01:20:24,010
direction right so you're in fact

2079
01:20:24,010 --> 01:20:25,270
promising not only that you're not gonna

2080
01:20:25,270 --> 01:20:27,340
replicate but you're in fact promising

2081
01:20:27,340 --> 01:20:28,750
that even on the client you're going to

2082
01:20:28,750 --> 01:20:31,620
destroy any such copy within a store

2083
01:20:31,620 --> 01:20:34,239
right so it's exactly right it's exactly

2084
01:20:34,239 --> 01:20:35,889
the opposite of caching I'm gonna cache

2085
01:20:35,889 --> 01:20:37,270
for this amount of time and then destroy

2086
01:20:37,270 --> 01:20:39,540
any copy I have and it's gone and

2087
01:20:39,540 --> 01:20:41,679
whatever it's cryptographically secure

2088
01:20:41,679 --> 01:20:43,449
or whatnot

2089
01:20:43,449 --> 01:20:44,889
I by the way that should do cryptography

2090
01:20:44,889 --> 01:20:48,250
in fact and that's I know I know and

2091
01:20:48,250 --> 01:20:49,630
they kind of just rely on the fact that

2092
01:20:49,630 --> 01:20:51,730
other developers are not smart enough to

2093
01:20:51,730 --> 01:20:53,139
write another application that watches

2094
01:20:53,139 --> 01:20:55,540
what the snapchat does right but for

2095
01:20:55,540 --> 01:20:58,350
example cryptographically

2096
01:20:59,170 --> 01:21:02,350
well by the way this is a real problem

2097
01:21:02,350 --> 01:21:04,600
we have to discuss this how do you mean

2098
01:21:04,600 --> 01:21:08,160
how do you make self-destructive

2099
01:21:08,160 --> 01:21:10,960
messages, right, the way you see in some

2100
01:21:10,960 --> 01:21:13,210
of the spy movies right if this message

2101
01:21:13,210 --> 01:21:15,070
will self-destruct in whatever amount of

2102
01:21:15,070 --> 01:21:16,570
time now if you literally have a small

2103
01:21:16,570 --> 01:21:18,370
explosive device in a physical device I

2104
01:21:18,370 --> 01:21:26,830
mean that's true right but Abbi this is

2105
01:21:26,830 --> 01:21:29,680
a real question and well most of all I

2106
01:21:29,680 --> 01:21:32,230
think people like snapchat are mostly

2107
01:21:32,230 --> 01:21:34,510
irresponsible right they say yeah it's a

2108
01:21:34,510 --> 01:21:35,560
cool application we don't worry about

2109
01:21:35,560 --> 01:21:37,030
anything in the world they're not

2110
01:21:37,030 --> 01:21:38,320
promising anything I just kind of say

2111
01:21:38,320 --> 01:21:40,750
yeah it's kind of okay right but imagine

2112
01:21:40,750 --> 01:21:42,520
for example you're the US government and

2113
01:21:42,520 --> 01:21:44,110
you really want those things not to be

2114
01:21:44,110 --> 01:21:45,400
available beyond a certain amount of

2115
01:21:45,400 --> 01:21:48,220
time how would you do it these are very

2116
01:21:48,220 --> 01:21:51,040
hard questions to to in fact answer and

2117
01:21:51,040 --> 01:21:52,870
to some extent you need to put together

2118
01:21:52,870 --> 01:21:54,880
multiple pieces of some sort of a key

2119
01:21:54,880 --> 01:21:57,790
and make that unavailable somehow I mean

2120
01:21:57,790 --> 01:22:00,040
the big problem is what if I could read

2121
01:22:00,040 --> 01:22:01,570
it once what would prevent me from

2122
01:22:01,570 --> 01:22:03,340
reading it again unless I have some

2123
01:22:03,340 --> 01:22:04,900
physical mechanism that prevents that

2124
01:22:04,900 --> 01:22:08,470
from happening all right so that's but

2125
01:22:08,470 --> 01:22:10,150
there are interesting questions with

2126
01:22:10,150 --> 01:22:11,680
respect to that for example also with

2127
01:22:11,680 --> 01:22:15,880
this when it comes to for example secure

2128
01:22:15,880 --> 01:22:19,030
secure access to resource right how many

2129
01:22:19,030 --> 01:22:20,770
of you and this is a kind of an

2130
01:22:20,770 --> 01:22:22,750
interesting maybe preview to to security

2131
01:22:22,750 --> 01:22:24,820
but how many of you have seen those tags

2132
01:22:24,820 --> 01:22:28,840
people wear on the keyring or on the on

2133
01:22:28,840 --> 01:22:30,760
the neck that have some sort of a weird

2134
01:22:30,760 --> 01:22:32,530
counter that keeps on counting and they

2135
01:22:32,530 --> 01:22:34,360
use that to login into a website and

2136
01:22:34,360 --> 01:22:36,340
that counter is valid for about 30

2137
01:22:36,340 --> 01:22:39,870
seconds or whatever to watch that stuff

2138
01:22:46,750 --> 01:22:49,490
right but the key for all those things

2139
01:22:49,490 --> 01:22:50,900
right is the fact that they're

2140
01:22:50,900 --> 01:22:57,650
unforgeable right look by the way the

2141
01:22:57,650 --> 01:22:59,720
way it worked is very simple it

2142
01:22:59,720 --> 01:23:01,730
literally has a counter one two three

2143
01:23:01,730 --> 01:23:04,910
four and you push it through AES-256 or through

2144
01:23:04,910 --> 01:23:06,170
some kind of a public key cryptography

2145
01:23:06,170 --> 01:23:08,180
mechanism and you simply take what comes

2146
01:23:08,180 --> 01:23:10,310
out the other end right now remember

2147
01:23:10,310 --> 01:23:11,570
what I told you about public key

2148
01:23:11,570 --> 01:23:13,850
cryptography it's so good that even if

2149
01:23:13,850 --> 01:23:14,960
you know what you put in you

2150
01:23:14,960 --> 01:23:18,590
cannot predict what comes out right and

2151
01:23:18,590 --> 01:23:20,570
in fact it will produce a sequence that

2152
01:23:20,570 --> 01:23:22,190
cannot be forged without knowing the key

2153
01:23:22,190 --> 01:23:24,020
right it doesn't matter how many pairs

2154
01:23:24,020 --> 01:23:26,150
of input outputs you have unless you get

2155
01:23:26,150 --> 01:23:28,040
very very close to two to the two hundred or

2156
01:23:28,040 --> 01:23:29,180
something like that you're not gonna be

2157
01:23:29,180 --> 01:23:31,520
able to guess what the key is and you

2158
01:23:31,520 --> 01:23:32,780
need a key in order to predict the next

2159
01:23:32,780 --> 01:23:33,650
number in the sequence

2160
01:23:33,650 --> 01:23:35,540
so those are two logics and

2161
01:23:35,540 --> 01:23:38,240
unpredictable numbers unless you somehow

2162
01:23:38,240 --> 01:23:39,940
extract the key from the physical device
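
The counter-plus-secret-key idea behind those tags can be sketched as below. This is a hedged illustration, not the actual scheme of any product: it swaps in HMAC-SHA256 from the Python standard library where the lecture mentions AES-256, and the names `code_for`, `verify`, and the `window` parameter are hypothetical.

```python
import hashlib
import hmac
import struct


def code_for(key: bytes, counter: int, digits: int = 6) -> str:
    """Derive a short numeric code from a secret key and a counter value."""
    # Keyed hash of the counter: without the key the output is unpredictable.
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha256).digest()
    number = int.from_bytes(mac[:4], "big") % (10 ** digits)
    return str(number).zfill(digits)


def verify(key: bytes, counter: int, candidate: str, window: int = 1) -> bool:
    """Accept the code for the server's counter or a few steps ahead of it."""
    return any(
        hmac.compare_digest(code_for(key, c), candidate)
        for c in range(counter, counter + window + 1)
    )


secret = b"key stored inside the tag"     # assumption: shared with the server
shown_on_tag = code_for(secret, 4)        # what the device displays at count 4
accepted = verify(secret, 4, shown_on_tag)
accepted_when_behind = verify(secret, 3, shown_on_tag)  # server one step behind
rejected = verify(secret, 4, "abc")       # malformed guess never matches
```

The server only needs the same key and a roughly synchronized counter; an observer who sees many codes still cannot compute the next one without the key.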

2163
01:23:39,940 --> 01:23:44,870
right okay more discussions about that

2164
01:23:44,870 --> 01:23:47,330
so when I discuss about security

2165
01:23:47,330 --> 01:23:49,670
somebody has to remind me to talk about

2166
01:23:49,670 --> 01:23:51,590
physical attacks on secure systems

2167
01:23:51,590 --> 01:23:53,240
because I have a number of cool stories

2168
01:23:53,240 --> 01:23:56,360
to tell okay physical attacks all right

2169
01:23:56,360 --> 01:23:58,520
so let's start talking a little bit

2170
01:23:58,520 --> 01:24:00,550
about fault tolerance so we've seen the

2171
01:24:00,550 --> 01:24:04,070
I mean concurrency replication one major

2172
01:24:04,070 --> 01:24:05,780
reason to do replication was fault

2173
01:24:05,780 --> 01:24:07,130
tolerance another one is increased

2174
01:24:07,130 --> 01:24:11,480
performance right all right

2175
01:24:11,480 --> 01:24:16,250
so when it comes to fault tolerance, let

2176
01:24:16,250 --> 01:24:17,750
me just warm up the discussion about

2177
01:24:17,750 --> 01:24:21,860
fault tolerance with the issue of how do

2178
01:24:21,860 --> 01:24:25,820
you know something is faulty right let's

2179
01:24:25,820 --> 01:24:27,320
even forget about all the formalism and

2180
01:24:27,320 --> 01:24:28,730
all the stuff that needs to be discussed

2181
01:24:28,730 --> 01:24:30,200
when it comes to fault tolerance it's

2182
01:24:30,200 --> 01:24:34,310
basically okay you can say if something

2183
01:24:34,310 --> 01:24:35,960
is faulty I'll do something about it

2184
01:24:35,960 --> 01:24:37,970
replicate, switch to other replicas, switch

2185
01:24:37,970 --> 01:24:39,140
to another server how do you know

2186
01:24:39,140 --> 01:24:43,250
something is faulty all right so let me

2187
01:24:43,250 --> 01:24:45,790
put some some stuff on the board

2188
01:24:45,790 --> 01:24:49,840
question is what is faulty

2189
01:24:58,520 --> 01:25:00,990
so how do you know something is broken

2190
01:25:00,990 --> 01:25:04,730
I mean faulty means broken in some way

2191
01:25:04,730 --> 01:25:07,350
so all of these things need to be

2192
01:25:07,350 --> 01:25:10,170
somehow monitored I mean when it comes

2193
01:25:10,170 --> 01:25:13,110
to faulty you can think about self

2194
01:25:13,110 --> 01:25:15,390
diagnosis maybe works maybe doesn't

2195
01:25:15,390 --> 01:25:17,190
right so maybe self diagnosis would work

2196
01:25:17,190 --> 01:25:19,890
for this I monitor myself and I realize

2197
01:25:19,890 --> 01:25:21,300
I'm broken I mean some of the cars monitor

2198
01:25:21,300 --> 01:25:22,740
themselves and start screaming if

2199
01:25:22,740 --> 01:25:24,330
they don't like something sometimes it's

2200
01:25:24,330 --> 01:25:25,470
nothing and they just annoy you

2201
01:25:25,470 --> 01:25:27,420
right a red light comes on and hey

2202
01:25:27,420 --> 01:25:29,220
something is faulty so maybe self

2203
01:25:29,220 --> 01:25:36,810
monitoring maybe the other one can be

2204
01:25:36,810 --> 01:25:39,660
peer monitoring right for example peer

2205
01:25:39,660 --> 01:25:42,590
or time monitoring

2206
01:25:47,199 --> 01:25:50,290
alright you could also have physical

2207
01:25:50,290 --> 01:25:52,330
devices that monitor for example this is

2208
01:25:52,330 --> 01:25:54,219
kind of interesting and you can actually

2209
01:25:54,219 --> 01:25:57,219
buy all this stuff they sell devices now

2210
01:25:57,219 --> 01:25:59,290
that can in fact monitor whether a

2211
01:25:59,290 --> 01:26:01,719
computer system has power doesn't have

2212
01:26:01,719 --> 01:26:03,640
power or is in some sort of a bad state

2213
01:26:03,640 --> 01:26:06,219
and then those devices can be used to

2214
01:26:06,219 --> 01:26:10,719
remotely reboot the machine see it's

2215
01:26:10,719 --> 01:26:13,150
important that the monitoring as much as

2216
01:26:13,150 --> 01:26:15,400
possible is an independent physical

2217
01:26:15,400 --> 01:26:17,110
device from the primary device because

2218
01:26:17,110 --> 01:26:18,760
the fault could be related to a global

2219
01:26:18,760 --> 01:26:20,739
system failure if you have partial

2220
01:26:20,739 --> 01:26:22,360
system failures you can imagine that you

2221
01:26:22,360 --> 01:26:23,710
have some sort of self monitoring going

2222
01:26:23,710 --> 01:26:25,330
on but if it's a global system failure you

2223
01:26:25,330 --> 01:26:26,920
must have another system to monitor this

2224
01:26:26,920 --> 01:26:29,380
system right now let's think about the

2225
01:26:29,380 --> 01:26:33,340
kind of faults that can happen right so

2226
01:26:33,340 --> 01:26:34,960
what kind of kind of faults could I have

2227
01:26:34,960 --> 01:26:38,230
well I mean all the servers or all these

2228
01:26:38,230 --> 01:26:40,239
computers run some programs so the

2229
01:26:40,239 --> 01:26:42,340
program itself can have bugs I mean so

2230
01:26:42,340 --> 01:26:45,690
causes of faults let's think about that

2231
01:26:46,230 --> 01:26:49,330
right so the I think the primary cause

2232
01:26:49,330 --> 01:26:53,080
for faults is in fact software defects

2233
01:26:53,080 --> 01:27:01,710
bugs right okay so we can have bugs

2234
01:27:01,710 --> 01:27:05,080
software bugs so let's say software bugs

2235
01:27:05,080 --> 01:27:09,030
we can have Hardware bugs less nowadays

2236
01:27:09,030 --> 01:27:11,530
by the way do you know why bugs are

2237
01:27:11,530 --> 01:27:13,949
called bugs

2238
01:27:28,480 --> 01:27:31,480
right

2239
01:27:35,570 --> 01:27:38,060
right so for the people that probably

2240
01:27:38,060 --> 01:27:39,440
couldn't hear you basically they

2241
01:27:39,440 --> 01:27:41,870
literally found a physical bug in one

2242
01:27:41,870 --> 01:27:43,190
of the circuit boards in one of the

2243
01:27:43,190 --> 01:27:44,540
early computers and they called the whole

2244
01:27:44,540 --> 01:27:46,100
process debugging finding the bug

2245
01:27:46,100 --> 01:27:48,770
okay all right so we can have software

2246
01:27:48,770 --> 01:27:50,720
bugs we can have Hardware bugs now I

2247
01:27:50,720 --> 01:27:52,820
want you to understand that these

2248
01:27:52,820 --> 01:27:54,920
computer systems are not super fragile

2249
01:27:54,920 --> 01:27:58,390
but are somehow influenced by various

2250
01:27:58,390 --> 01:28:00,920
external things and a lot of these bugs

2251
01:28:00,920 --> 01:28:03,230
could be caused by external things I

2252
01:28:03,230 --> 01:28:05,720
mean for example when it comes to

2253
01:28:05,720 --> 01:28:09,230
hardware faults it's basically the power goes

2254
01:28:09,230 --> 01:28:10,250
down I mean that's one of the bigger

2255
01:28:10,250 --> 01:28:11,210
faults right

2256
01:28:11,210 --> 01:28:13,130
power failure this definitely could do

2257
01:28:13,130 --> 01:28:21,920
it right but let me let me give you an

2258
01:28:21,920 --> 01:28:23,600
idea why it's impossible to have a

2259
01:28:23,600 --> 01:28:25,070
system that doesn't have at least some

2260
01:28:25,070 --> 01:28:26,960
eventually some kind of hardware bugs

2261
01:28:26,960 --> 01:28:33,350
okay so so one of the things that almost

2262
01:28:33,350 --> 01:28:35,660
surely will start producing havoc is

2263
01:28:35,660 --> 01:28:38,000
the temperature rising for whatever

2264
01:28:38,000 --> 01:28:41,030
reason so it turns out that well because

2265
01:28:41,030 --> 01:28:42,800
of thermodynamics right the higher

2266
01:28:42,800 --> 01:28:45,740
temperature means bigger agitation of

2267
01:28:45,740 --> 01:28:47,390
the atoms which essentially means

2268
01:28:47,390 --> 01:28:50,750
that clean zero and one states that

2269
01:28:50,750 --> 01:28:53,780
can be kept separate from each

2270
01:28:53,780 --> 01:28:55,190
other unless you want to do a state

2271
01:28:55,190 --> 01:28:56,690
transition can actually automatically

2272
01:28:56,690 --> 01:29:00,140
transition from a zero to one and by the

2273
01:29:00,140 --> 01:29:02,090
way this is actually I might as well

2274
01:29:02,090 --> 01:29:04,400
tell you this because this looks like a

2275
01:29:04,400 --> 01:29:06,320
bug but can be used as an interesting

2276
01:29:06,320 --> 01:29:07,520
feature when it comes to security

2277
01:29:07,520 --> 01:29:09,920
attacks so there is a group I'm gonna

2278
01:29:09,920 --> 01:29:11,060
tell you the whole story when I talk

2279
01:29:11,060 --> 01:29:13,340
about security but there is a group at I

2280
01:29:13,340 --> 01:29:15,260
think Harvard that's known for

2281
01:29:15,260 --> 01:29:17,300
completely crazy security attacks and

2282
01:29:17,300 --> 01:29:19,070
one of their attacks is based on

2283
01:29:19,070 --> 01:29:21,680
increasing the temperature so

2284
01:29:21,680 --> 01:29:22,820
essentially they showed how you can

2285
01:29:22,820 --> 01:29:24,290
exploit a machine or the java virtual

2286
01:29:24,290 --> 01:29:25,820
machine by increasing the temperature

2287
01:29:25,820 --> 01:29:27,020
because when you increase the

2288
01:29:27,020 --> 01:29:28,400
temperature you produce these random

2289
01:29:28,400 --> 01:29:30,470
flips between zero and one and they

2290
01:29:30,470 --> 01:29:32,930
showed a particular layout in memory for

2291
01:29:32,930 --> 01:29:37,370
a Java program that will allow a single

2292
01:29:37,370 --> 01:29:39,350
bit flip to be exploited and to do

2293
01:29:39,350 --> 01:29:41,960
anything you want with the with the Java

2294
01:29:41,960 --> 01:29:44,660
Virtual Machine right so essentially all

2295
01:29:44,660 --> 01:29:47,530
you have to do is heat up the

2296
01:29:47,530 --> 01:29:49,280
temperature in the room

2297
01:29:49,280 --> 01:29:52,580
right the memory is the first one to

2298
01:29:52,580 --> 01:29:54,950
go when it comes to heating up because

2299
01:29:54,950 --> 01:29:56,540
you're going to start having those zero

2300
01:29:56,540 --> 01:29:58,160
to one transitions in the memory but

2301
01:29:58,160 --> 01:29:59,330
there are other things by the way that

2302
01:29:59,330 --> 01:30:00,710
transition automatically and this is why

2303
01:30:00,710 --> 01:30:03,800
servers have error correcting memory

2304
01:30:03,800 --> 01:30:07,070
right for example cosmic rays which do

2305
01:30:07,070 --> 01:30:10,130
happen right one goes through it will

2306
01:30:10,130 --> 01:30:12,440
actually produce a flip and they don't

2307
01:30:12,440 --> 01:30:14,330
happen too often but I mean a server

2308
01:30:14,330 --> 01:30:17,990
like this can stay up for a year right

2309
01:30:17,990 --> 01:30:20,030
it's bound to have at least one of those

2310
01:30:20,030 --> 01:30:21,260
going through the memory and flipping

2311
01:30:21,260 --> 01:30:22,970
something by the way this is where the

2312
01:30:22,970 --> 01:30:24,770
actor model in Scala comes into play and

2313
01:30:24,770 --> 01:30:26,390
actually the actor model in the Erlang language

2314
01:30:26,390 --> 01:30:28,310
was introduced for this rather than

2315
01:30:28,310 --> 01:30:30,320
design systems that never fail you

2316
01:30:30,320 --> 01:30:32,360
embrace failures and you embrace

2317
01:30:32,360 --> 01:30:34,640
monitoring either self or peer monitoring

2318
01:30:34,640 --> 01:30:36,200
and you can have actors monitoring other

2319
01:30:36,200 --> 01:30:39,020
actors and essentially you say this guy

2320
01:30:39,020 --> 01:30:41,540
it's in a weird state let's kill it and

2321
01:30:41,540 --> 01:30:43,640
start it again to recover some sort of a

2322
01:30:43,640 --> 01:30:45,560
good functioning of the system right so

2323
01:30:45,560 --> 01:30:48,140
one way to deal with fault tolerance is

2324
01:30:48,140 --> 01:30:49,730
to design systems that are more fault

2325
01:30:49,730 --> 01:30:51,590
tolerant and that's for example the

2326
01:30:51,590 --> 01:30:53,630
approach taken by Ericsson with

2327
01:30:53,630 --> 01:30:55,970
Erlang they design self-healing

2328
01:30:55,970 --> 01:30:57,440
systems or things of this sort

2329
01:30:57,440 --> 01:31:00,770
and that definitely will alleviate the

2330
01:31:00,770 --> 01:31:02,420
fault tolerance problem or you could say

2331
01:31:02,420 --> 01:31:04,430
let it fail and I'll do something else

2332
01:31:04,430 --> 01:31:05,690
but then you still have this issue with

2333
01:31:05,690 --> 01:31:09,050
how you detect failure right so software

2334
01:31:09,050 --> 01:31:11,180
bugs hardware bugs power failure but there is

2335
01:31:11,180 --> 01:31:13,580
a special kind of failure that it's

2336
01:31:13,580 --> 01:31:15,770
extremely hard to protect against right

2337
01:31:15,770 --> 01:31:17,060
it's something I mentioned before it's

2338
01:31:17,060 --> 01:31:20,030
called Byzantine failures or let's call

2339
01:31:20,030 --> 01:31:21,680
them malicious failures and nobody

2340
01:31:21,680 --> 01:31:23,510
remembers why they are called Byzantine

2341
01:31:23,510 --> 01:31:25,070
failures there is no direct connection

2342
01:31:25,070 --> 01:31:27,770
with the Byzantine Empire or whatnot so

2343
01:31:27,770 --> 01:31:31,990
let's let's call them malicious failures

2344
01:31:35,940 --> 01:31:38,710
so the malicious failures are of the

2345
01:31:38,710 --> 01:31:40,630
following kind okay and this is what

2346
01:31:40,630 --> 01:31:42,160
it's really really really hard to

2347
01:31:42,160 --> 01:31:44,740
protect against them instead of having

2348
01:31:44,740 --> 01:31:46,360
one of these natural causes for the

2349
01:31:46,360 --> 01:31:48,460
failure and then the monitoring will do

2350
01:31:48,460 --> 01:31:52,150
it somebody goes in takes control of the

2351
01:31:52,150 --> 01:31:53,410
machine and makes it do slightly

2352
01:31:53,410 --> 01:31:56,260
different things if they really know

2353
01:31:56,260 --> 01:31:58,030
what they're getting into they could

2354
01:31:58,030 --> 01:32:01,120
wreak havoc on the entire distributed

2355
01:32:01,120 --> 01:32:02,920
system and not only that machine alone

2356
01:32:02,920 --> 01:32:05,440
right for example imagine any of the

2357
01:32:05,440 --> 01:32:08,290
distributed protocols running in which

2358
01:32:08,290 --> 01:32:10,180
one of the participants is not playing

2359
01:32:10,180 --> 01:32:12,640
by the book and it's not not playing by

2360
01:32:12,640 --> 01:32:13,810
the book because every now and then it

2361
01:32:13,810 --> 01:32:15,610
hiccups which could happen it's not

2362
01:32:15,610 --> 01:32:18,390
playing by the book intentionally to

2363
01:32:18,390 --> 01:32:20,890
wreak havoc on the distributed protocol

2364
01:32:20,890 --> 01:32:23,470
right so for example I mentioned here

2365
01:32:23,470 --> 01:32:25,210
with the writes right the fact that they

2366
01:32:25,210 --> 01:32:26,500
are going to do exponential backoff what

2367
01:32:26,500 --> 01:32:27,910
if one of the guys never does

2368
01:32:27,910 --> 01:32:29,320
exponential backoff and just takes one

2369
01:32:29,320 --> 01:32:30,970
of the n tokens and never gives it

2370
01:32:30,970 --> 01:32:32,320
away that essentially means nobody writes

2371
01:32:32,320 --> 01:32:34,120
by the way that's a form of distributed

2372
01:32:34,120 --> 01:32:37,690
denial of service attack right so this

2373
01:32:37,690 --> 01:32:40,000
kind especially malicious failures are a

2374
01:32:40,000 --> 01:32:43,360
tremendous problem right now in general

2375
01:32:43,360 --> 01:32:47,410
right we're going to discuss some of

2376
01:32:47,410 --> 01:32:51,010
these failure protection mechanisms that

2377
01:32:51,010 --> 01:32:52,960
can deal with malicious failures but in

2378
01:32:52,960 --> 01:32:54,970
fact you have to pay a dear price for

2379
01:32:54,970 --> 01:32:57,700
it right the same resource has to be

2380
01:32:57,700 --> 01:32:59,560
available in multiple places and somehow

2381
01:32:59,560 --> 01:33:01,390
you're monitoring to see if somebody is

2382
01:33:01,390 --> 01:33:03,160
lying and beyond a certain point there is

2383
01:33:03,160 --> 01:33:04,570
nothing you can detect so a crucial

2384
01:33:04,570 --> 01:33:06,670
question is for example let's think

2385
01:33:06,670 --> 01:33:10,330
about peer-to-peer systems right so let

2386
01:33:10,330 --> 01:33:11,830
me give you an idea just how how far

2387
01:33:11,830 --> 01:33:16,540
this can go so not only that some

2388
01:33:16,540 --> 01:33:18,190
participants can be malicious the whole

2389
01:33:18,190 --> 01:33:20,080
system can be malicious right so one of

2390
01:33:20,080 --> 01:33:22,270
the things that surfaced is that one

2391
01:33:22,270 --> 01:33:24,160
peer-to-peer system that started to be

2392
01:33:24,160 --> 01:33:25,510
quite popular I don't remember the name

2393
01:33:25,510 --> 01:33:27,460
in fact was only a honeypot for the

2394
01:33:27,460 --> 01:33:29,290
music industry to catch people that are

2395
01:33:29,290 --> 01:33:32,080
interested in sharing music to make a

2396
01:33:32,080 --> 01:33:35,260
list of who to sue right so to some

2397
01:33:35,260 --> 01:33:37,570
extent the whole system was faulty in a

2398
01:33:37,570 --> 01:33:38,950
certain sense and not only some

2399
01:33:38,950 --> 01:33:41,950
participants but another way you can

2400
01:33:41,950 --> 01:33:44,470
think about it is hey could I

2401
01:33:44,470 --> 01:33:47,020
participate or could I prevent this

2402
01:33:47,020 --> 01:33:48,400
could I participate in a peer-to-

2403
01:33:48,400 --> 01:33:52,120
peer system and by being maliciously

2404
01:33:52,120 --> 01:33:54,489
faulty disrupt the peer-to-peer protocol

2405
01:33:54,489 --> 01:33:55,750
right you just finished one of your

2406
01:33:55,750 --> 01:33:57,550
projects right imagine that some of

2407
01:33:57,550 --> 01:33:58,870
those guys would not play by the book

2408
01:33:58,870 --> 01:34:00,340
can you imagine ways in which they could

2409
01:34:00,340 --> 01:34:02,860
disrupt the system completely right they

2410
01:34:02,860 --> 01:34:04,270
can possibly disrupt the system by

2411
01:34:04,270 --> 01:34:06,070
taking any message that's relayed to them

2412
01:34:06,070 --> 01:34:08,830
and throw it as wrongly as possible

2413
01:34:08,830 --> 01:34:12,010
right that will slow things down a

2414
01:34:12,010 --> 01:34:13,570
little bit you can even compute how much

2415
01:34:13,570 --> 01:34:14,920
you can slow it down so crucial question

2416
01:34:14,920 --> 01:34:16,810
for example for that is how many

2417
01:34:16,810 --> 01:34:19,300
peer-to-peer participants would you need

2418
01:34:19,300 --> 01:34:21,670
to disrupt most of the function of a

2419
01:34:21,670 --> 01:34:22,900
peer-to-peer system if you use a

2420
01:34:22,900 --> 01:34:24,610
distributed hash table peer-to-peer

2421
01:34:24,610 --> 01:34:26,830
system all right I'm sure that there is some

2422
01:34:26,830 --> 01:34:28,330
serious math that can be done there to

2423
01:34:28,330 --> 01:34:29,949
do some probabilistic computation to say

2424
01:34:29,949 --> 01:34:33,250
ah ten percent is enough for nobody to

2425
01:34:33,250 --> 01:34:36,130
get any work done right for example if

2426
01:34:36,130 --> 01:34:38,350
you really throw the requests all the

2427
01:34:38,350 --> 01:34:40,150
way around in the wrong direction right

2428
01:34:40,150 --> 01:34:42,520
if the probability that one of your guys

2429
01:34:42,520 --> 01:34:45,219
is hit is reasonably high then

2430
01:34:45,219 --> 01:34:46,989
essentially you can make sure that none

2431
01:34:46,989 --> 01:34:49,810
of these protocols finish or finish in a

2432
01:34:49,810 --> 01:34:51,790
reasonable amount of time basically take

2433
01:34:51,790 --> 01:34:53,290
the performance down by an order of

2434
01:34:53,290 --> 01:34:56,860
magnitude or more okay so this kind

2435
01:34:56,860 --> 01:34:58,270
of failures are going to be very hard to

2436
01:34:58,270 --> 01:35:00,370
deal with because it's hard to tell when

2437
01:35:00,370 --> 01:35:02,290
such a failure kicks in so a big

2438
01:35:02,290 --> 01:35:03,850
question is okay fine you do monitoring

2439
01:35:03,850 --> 01:35:05,530
but how do you know that somebody's

2440
01:35:05,530 --> 01:35:08,140
malicious so to some extent you need to

2441
01:35:08,140 --> 01:35:09,969
collect information in what you believe

2442
01:35:09,969 --> 01:35:12,760
are non malicious nodes or in a lot of

2443
01:35:12,760 --> 01:35:14,140
nodes some of them malicious some of them

2444
01:35:14,140 --> 01:35:16,330
non malicious and somehow compute

2445
01:35:16,330 --> 01:35:18,520
some sort of are you malicious or non

2446
01:35:18,520 --> 01:35:20,170
malicious property to tell if somebody's

2447
01:35:20,170 --> 01:35:22,300
malicious once you detected such a

2448
01:35:22,300 --> 01:35:24,010
failure you can isolate that node

2449
01:35:24,010 --> 01:35:25,540
through whatever mechanisms for example

2450
01:35:25,540 --> 01:35:27,429
you could use multicast or some sort of

2451
01:35:27,429 --> 01:35:30,429
broadcast to say don't trust that guy

2452
01:35:30,429 --> 01:35:32,530
and kick it out of the network but again

2453
01:35:32,530 --> 01:35:34,300
the problem is how do you know it's

2454
01:35:34,300 --> 01:35:37,210
malicious okay and it's a cat-and-mouse

2455
01:35:37,210 --> 01:35:39,699
kind of situation which is true for

2456
01:35:39,699 --> 01:35:41,230
almost anything related to in fact

2457
01:35:41,230 --> 01:35:47,170
security okay right so again what is

2458
01:35:47,170 --> 01:35:49,420
faulty how do I know so I'm a different

2459
01:35:49,420 --> 01:35:51,640
system how do I know somebody else is

2460
01:35:51,640 --> 01:35:53,800
faulty let's not worry about malicious

2461
01:35:53,800 --> 01:35:55,330
because this is just hard we need the

2462
01:35:55,330 --> 01:35:57,159
discussion separately but there are

2463
01:35:57,159 --> 01:35:59,139
participants and we talked about this

2464
01:35:59,139 --> 01:36:00,730
right bully algorithm and some

2465
01:36:00,730 --> 01:36:04,120
election algorithms right those

2466
01:36:04,120 --> 01:36:06,160
algorithms needed some sort of a leader

2467
01:36:06,160 --> 01:36:11,140
and without the leader the system

2468
01:36:11,140 --> 01:36:12,820
doesn't work so you have to somehow

2469
01:36:12,820 --> 01:36:15,040
determine when the leader is faulty to

2470
01:36:15,040 --> 01:36:16,630
elect another leader for example that's

2471
01:36:16,630 --> 01:36:17,880
one of the basic things you could do

2472
01:36:17,880 --> 01:36:22,050
right so what does monitor mean right

2473
01:36:22,050 --> 01:36:24,880
mechanisms need to be built-in in order

2474
01:36:24,880 --> 01:36:28,320
to be able to declare somebody faulty so

2475
01:36:28,320 --> 01:36:30,730
when you declare somebody faulty you

2476
01:36:30,730 --> 01:36:32,380
have other issues to deal with one of

2477
01:36:32,380 --> 01:36:34,000
them is the fact that okay I declare

2478
01:36:34,000 --> 01:36:35,980
somebody to be faulty I let other nodes

2479
01:36:35,980 --> 01:36:37,660
know that it's faulty but maybe the guy

2480
01:36:37,660 --> 01:36:39,370
did turn out not to be faulty in the end

2481
01:36:39,370 --> 01:36:41,800
and he comes alive for example in leader

2482
01:36:41,800 --> 01:36:42,790
election that could create problems

2483
01:36:42,790 --> 01:36:44,260
because if you have two leaders at the

2484
01:36:44,260 --> 01:36:46,810
same time then what the guy that wasn't

2485
01:36:46,810 --> 01:36:49,780
faulty or is not faulty anymore comes

2486
01:36:49,780 --> 01:36:52,590
back you already have another leader and

2487
01:36:52,590 --> 01:36:55,120
would your protocol run with two leaders

2488
01:36:55,120 --> 01:36:57,460
will this old guy figure out that there

2489
01:36:57,460 --> 01:37:00,280
is a new leader all of these need some kind of

2490
01:37:00,280 --> 01:37:02,400
message exchanges so the classic example

2491
01:37:02,400 --> 01:37:06,340
of how you could detect at least normal

2492
01:37:06,340 --> 01:37:09,070
kind of faults is keepalive right

2493
01:37:09,070 --> 01:37:14,530
keepalive messages or heartbeat it's

2494
01:37:14,530 --> 01:37:15,970
actually heartbeat sorry but keepalive

2495
01:37:15,970 --> 01:37:21,660
is how these heartbeats go so heartbeat

2496
01:37:25,510 --> 01:37:31,010
okay so the heartbeat mechanism consists in

2497
01:37:31,010 --> 01:37:32,900
having special kinds of messages that

2498
01:37:32,900 --> 01:37:34,910
are sent at regular intervals of time

2499
01:37:34,910 --> 01:37:39,170
and the lack of those messages leads to

2500
01:37:39,170 --> 01:37:41,270
somebody being declared dead it's like

2501
01:37:41,270 --> 01:37:42,679
taking the pulse of somebody if they

2502
01:37:42,679 --> 01:37:44,989
don't have ten successive heartbeats in

2503
01:37:44,989 --> 01:37:46,340
a certain amount of time you say they

2504
01:37:46,340 --> 01:37:47,780
are dead okay

2505
01:37:47,780 --> 01:37:49,880
so this is exactly what this is but that

2506
01:37:49,880 --> 01:37:51,290
essentially means that you're trading

2507
01:37:51,290 --> 01:37:53,060
performance for fault tolerance for

2508
01:37:53,060 --> 01:37:55,940
fault detection in particular you must

2509
01:37:55,940 --> 01:37:58,370
send these heartbeats because

2510
01:37:58,370 --> 01:38:00,770
if you miss the heartbeat then

2511
01:38:00,770 --> 01:38:02,690
you will be declared faulty and

2512
01:38:02,690 --> 01:38:05,000
potentially kicked out of the

2513
01:38:05,000 --> 01:38:15,739
network yes good so this is a very good

2514
01:38:15,739 --> 01:38:17,120
question so when it comes to heartbeats

2515
01:38:17,120 --> 01:38:20,210
there are two questions to ask one is

2516
01:38:20,210 --> 01:38:21,710
how often do you send them and the other

2517
01:38:21,710 --> 01:38:24,260
one is after how many missed heartbeats do

2518
01:38:24,260 --> 01:38:26,210
you declare somebody dead right and

2519
01:38:26,210 --> 01:38:28,190
these things no it's not enough to do

2520
01:38:28,190 --> 01:38:30,290
this right I mean ideally you would

2521
01:38:30,290 --> 01:38:31,969
treat these heartbeats as extremely

2522
01:38:31,969 --> 01:38:34,219
high priority messages right in the

2523
01:38:34,219 --> 01:38:35,600
sense that even if the machine is super

2524
01:38:35,600 --> 01:38:38,210
busy right you would like to still send

2525
01:38:38,210 --> 01:38:39,770
a heartbeat to say oh I'm alive don't

2526
01:38:39,770 --> 01:38:42,080
kill me no matter how busy I am right

2527
01:38:42,080 --> 01:38:44,449
which is not necessarily a simple or

2528
01:38:44,449 --> 01:38:47,239
easy thing so the fact that you can do

2529
01:38:47,239 --> 01:38:48,500
something like heartbeats doesn't

2530
01:38:48,500 --> 01:38:51,260
mean it will just work without problems

2531
01:38:51,260 --> 01:38:53,719
all the time right and in particular a

2532
01:38:53,719 --> 01:38:55,670
big worrisome thing is you declare lots

2533
01:38:55,670 --> 01:38:57,620
of things as being dead when they're not

2534
01:38:57,620 --> 01:38:59,390
dead at all it's just some kind of a

2535
01:38:59,390 --> 01:39:02,420
temporary hiccup right so for example if

2536
01:39:02,420 --> 01:39:04,070
you have stormy weather outside you

2537
01:39:04,070 --> 01:39:05,900
can easily have no internet connectivity

2538
01:39:05,900 --> 01:39:09,650
for let's say half a second because in

2539
01:39:09,650 --> 01:39:10,940
that amount of time there is too much

2540
01:39:10,940 --> 01:39:12,140
electrical charge nothing goes through

2541
01:39:12,140 --> 01:39:14,570
if you require a heartbeat every 30

2542
01:39:14,570 --> 01:39:16,130
milliseconds and after three missed heartbeats

2543
01:39:16,130 --> 01:39:17,870
you declare dead then you declare dead

2544
01:39:17,870 --> 01:39:20,060
half the network that's not gonna be

2545
01:39:20,060 --> 01:39:21,650
good so it's tricky

2546
01:39:21,650 --> 01:39:23,750
exactly how you use it how you fine tune it

2547
01:39:23,750 --> 01:39:25,219
and what to do with it but this is some

2548
01:39:25,219 --> 01:39:26,660
sort of a mechanism that could be used

2549
01:39:26,660 --> 01:39:30,080
to detect something look the easy way

2550
01:39:30,080 --> 01:39:31,489
out is to say everything is too

2551
01:39:31,489 --> 01:39:32,960
complicated I'm not gonna do anything

2552
01:39:32,960 --> 01:39:34,219
but that's not quite an option when it

2553
01:39:34,219 --> 01:39:35,390
comes to engineering right so that's

2554
01:39:35,390 --> 01:39:36,980
what I want you to keep in mind

2555
01:39:36,980 --> 01:39:38,810
all these things are compromises all of

2556
01:39:38,810 --> 01:39:41,780
them have black spots here and there

2557
01:39:41,780 --> 01:39:43,730
right dark corners sometimes they are

2558
01:39:43,730 --> 01:39:44,929
not doing exactly what they are supposed

2559
01:39:44,929 --> 01:39:46,370
to do but you still need some mechanism

2560
01:39:46,370 --> 01:39:48,650
to to deal with the situation so I'm

2561
01:39:48,650 --> 01:39:51,020
gonna obviously discuss more about this

2562
01:39:51,020 --> 01:39:53,000
on on Thursday and we're gonna continue

2563
01:39:53,000 --> 01:39:55,370
discussing the fault tolerance with all

2564
01:39:55,370 --> 01:39:56,989
this replication in mind because that's

2565
01:39:56,989 --> 01:39:58,580
one reason to do that to do the fault

2566
01:39:58,580 --> 01:40:00,020
tolerance it's not enough to detect

2567
01:40:00,020 --> 01:40:01,610
failures you have to do something about

2568
01:40:01,610 --> 01:40:03,800
it and that almost always means some

2569
01:40:03,800 --> 00:00:00,000
solution around replication all right
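
The heartbeat trade-off described in the lecture (how often to send, and how many missed heartbeats to tolerate) can be sketched in a few lines of Python. This is only an illustrative sketch, not anything shown in the lecture: the names `HeartbeatMonitor` and `falsely_declared_dead` are hypothetical, time is simulated as integer milliseconds, and the 30 ms interval and half-second outage are taken from the spoken example.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor that suspects a peer after too many missed heartbeats."""
    interval_ms: int       # how often the peer is expected to send a heartbeat
    max_missed: int        # consecutive missed heartbeats before declaring the peer dead
    last_seen_ms: int = 0  # arrival time of the most recent heartbeat

    def on_heartbeat(self, now_ms: int) -> None:
        self.last_seen_ms = now_ms

    def is_suspected_dead(self, now_ms: int) -> bool:
        # Number of whole heartbeat intervals that passed without a heartbeat.
        missed = (now_ms - self.last_seen_ms) // self.interval_ms
        return missed >= self.max_missed


def falsely_declared_dead(interval_ms, max_missed, outage_start_ms, outage_ms, horizon_ms):
    """Simulate a healthy peer whose heartbeats are lost during a network outage.

    Returns True if the monitor ever declares the (actually healthy) peer dead.
    """
    monitor = HeartbeatMonitor(interval_ms, max_missed)
    for now in range(0, horizon_ms + 1, interval_ms):
        in_outage = outage_start_ms <= now < outage_start_ms + outage_ms
        if not in_outage:
            monitor.on_heartbeat(now)  # the heartbeat got through
        if monitor.is_suspected_dead(now):
            return True
    return False


# 30 ms heartbeats, dead after 3 misses: a 500 ms outage kills a healthy peer.
aggressive = falsely_declared_dead(30, 3, outage_start_ms=990, outage_ms=500, horizon_ms=3000)
# Tolerating 20 misses survives the outage, but a real crash now takes 600 ms to detect.
tolerant = falsely_declared_dead(30, 20, outage_start_ms=990, outage_ms=500, horizon_ms=3000)
print(aggressive, tolerant)
```

Under these assumptions the aggressive setting declares the healthy peer dead 90 ms into the outage, while the tolerant setting rides it out at the cost of a detection delay of `interval_ms * max_missed` for real failures, which is the compromise the lecture points at.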