1
00:00:25,700 --> 00:00:29,490
all right so we are gathering now two

2
00:00:29,490 --> 00:00:31,560
interesting things okay so I gave you

3
00:00:31,560 --> 00:00:35,910
a broad overview before high level what

4
00:00:35,910 --> 00:00:37,379
are the issues that distributed systems

5
00:00:37,379 --> 00:00:40,410
people care about and it was what mainly

6
00:00:40,410 --> 00:00:43,350
use lots of lots of lots of computers to

7
00:00:43,350 --> 00:00:45,239
do all kinds of great things right so

8
00:00:45,239 --> 00:00:47,910
let's see how that might actually be

9
00:00:47,910 --> 00:00:49,440
done so in order to talk about

10
00:00:49,440 --> 00:00:51,330
particular solutions we have to talk

11
00:00:51,330 --> 00:00:52,949
about this issue called architectures

12
00:00:52,949 --> 00:00:54,660
and architectures is gonna mean many

13
00:00:54,660 --> 00:00:56,129
different things in this lecture right

14
00:00:56,129 --> 00:00:58,170
it's gonna be to some extent hardware

15
00:00:58,170 --> 00:01:00,629
architectures but also organization of

16
00:01:00,629 --> 00:01:02,820
at the logical level of the distributed

17
00:01:02,820 --> 00:01:05,489
systems so maybe the first question to

18
00:01:05,489 --> 00:01:10,530
ask is why bother organize well because

19
00:01:10,530 --> 00:01:12,030
if you don't organize you're not gonna

20
00:01:12,030 --> 00:01:14,369
be able to articulate a nice enough

21
00:01:14,369 --> 00:01:15,780
solution you're not gonna be understood

22
00:01:15,780 --> 00:01:16,680
by other people

23
00:01:16,680 --> 00:01:18,600
you're gonna reinvent the wheel over and

24
00:01:18,600 --> 00:01:20,420
over all kinds of issues like this right

25
00:01:20,420 --> 00:01:23,970
now it's important to have ways to

26
00:01:23,970 --> 00:01:27,479
organize talk about in a certain way

27
00:01:27,479 --> 00:01:29,399
material but it's also important to

28
00:01:29,399 --> 00:01:31,469
recognize that you can break away from

29
00:01:31,469 --> 00:01:33,840
some of those issues and invent maybe

30
00:01:33,840 --> 00:01:36,509
your own terms all right so I call this

31
00:01:36,509 --> 00:01:39,960
a selective wheel reinvention right so you

32
00:01:39,960 --> 00:01:41,340
want to reinvent the wheel in a very

33
00:01:41,340 --> 00:01:44,820
selective way in a purposeful manner to

34
00:01:44,820 --> 00:01:46,079
achieve a certain thing you don't want

35
00:01:46,079 --> 00:01:47,670
to just randomly reinvent the wheel

36
00:01:47,670 --> 00:01:48,869
right it's good to know what other

37
00:01:48,869 --> 00:01:50,700
people did it's good to know when you

38
00:01:50,700 --> 00:01:53,249
might want to to break break free

39
00:01:53,249 --> 00:01:56,159
yourself okay so let's first see how

40
00:01:56,159 --> 00:01:58,289
already people in distributed systems

41
00:01:58,289 --> 00:02:01,079
have thought about this issue

42
00:02:01,079 --> 00:02:05,460
architectures okay so I'm not gonna go

43
00:02:05,460 --> 00:02:07,139
into what how computers work what do

44
00:02:07,139 --> 00:02:09,538
they do what networking does and almost

45
00:02:09,538 --> 00:02:10,710
anything involving distributed

46
00:02:10,710 --> 00:02:12,540
systems is going to obviously run on

47
00:02:12,540 --> 00:02:14,100
something that can run programs computers

48
00:02:14,100 --> 00:02:15,300
and is going to have some sort of

49
00:02:15,300 --> 00:02:17,580
interconnectivity okay now that might

50
00:02:17,580 --> 00:02:21,959
take the form of some sort of inside the

51
00:02:21,959 --> 00:02:24,300
machine connectivity or between machine

52
00:02:24,300 --> 00:02:26,459
connectivity networking people have

53
00:02:26,459 --> 00:02:28,050
worried about this for a long time so we

54
00:02:28,050 --> 00:02:29,370
are just going to use networking right

55
00:02:29,370 --> 00:02:31,710
we're never going to get to an extremely

56
00:02:31,710 --> 00:02:33,240
low level when it comes to communication

57
00:02:33,240 --> 00:02:34,830
we're going to assume that we do some

58
00:02:34,830 --> 00:02:37,680
kind of tcp/ip or maybe

59
00:02:37,680 --> 00:02:39,989
datagrams UDP well that's about all the

60
00:02:39,989 --> 00:02:42,030
level we actually care about we don't

61
00:02:42,030 --> 00:02:44,280
care about the bits going on on the wire

62
00:02:44,280 --> 00:02:47,939
in this class okay or for that matter in

63
00:02:47,939 --> 00:02:50,010
any class except a networking class when

64
00:02:50,010 --> 00:02:51,299
we only care about the bits going on

65
00:02:51,299 --> 00:02:54,290
the wire for the most part okay what so

66
00:02:54,290 --> 00:02:58,170
when it comes to architectural styles at

67
00:02:58,170 --> 00:03:00,269
least this textbook organizes them in

68
00:03:00,269 --> 00:03:01,590
the in the following way these are the

69
00:03:01,590 --> 00:03:03,299
four primary architectural styles talked

70
00:03:03,299 --> 00:03:05,670
about layered architecture object based

71
00:03:05,670 --> 00:03:07,739
architectures data-centered architectures

72
00:03:07,739 --> 00:03:09,780
and event based architectures now

73
00:03:09,780 --> 00:03:11,909
sometimes it's hard to say whether a

74
00:03:11,909 --> 00:03:13,139
particular system has one of these

75
00:03:13,139 --> 00:03:14,819
architectures or a blend between them or

76
00:03:14,819 --> 00:03:17,849
things of this sort and take them as

77
00:03:17,849 --> 00:03:18,900
only some sort of a high-level

78
00:03:18,900 --> 00:03:20,940
indication of how you might do things

79
00:03:20,940 --> 00:03:24,689
okay now since we are talking about the

80
00:03:24,689 --> 00:03:27,299
actor model and that's really what I

81
00:03:27,299 --> 00:03:29,760
wanted to use primarily in this class

82
00:03:29,760 --> 00:03:31,260
I'm gonna make lots of references to the

83
00:03:31,260 --> 00:03:33,170
actor model and interestingly enough

84
00:03:33,170 --> 00:03:35,730
actor model is gonna fit perfectly well

85
00:03:35,730 --> 00:03:37,950
some of these architectures but

86
00:03:37,950 --> 00:03:39,750
definitely be able to emulate all the

87
00:03:39,750 --> 00:03:42,750
other architectures thinking in

88
00:03:42,750 --> 00:03:44,400
different ways is still helpful even if

89
00:03:44,400 --> 00:03:47,069
you use a for example a paradigm like

90
00:03:47,069 --> 00:03:49,650
the actor model that naturally fits one

91
00:03:49,650 --> 00:03:50,940
of these and let's see which one it does

92
00:03:50,940 --> 00:03:51,959
fit okay

93
00:03:51,959 --> 00:03:54,959
so layered architecture so first how

94
00:03:54,959 --> 00:03:57,919
many people took a networking class here

95
00:03:57,919 --> 00:04:01,590
okay so the layered architecture really

96
00:04:01,590 --> 00:04:03,870
comes from networking I would even say

97
00:04:03,870 --> 00:04:05,549
the networking people are completely

98
00:04:05,549 --> 00:04:08,970
obsessed by layering any kind of network

99
00:04:08,970 --> 00:04:11,370
stack it's about layers right so they

100
00:04:11,370 --> 00:04:13,889
talk about seven layers and then they

101
00:04:13,889 --> 00:04:15,989
implement five and hope people notice

102
00:04:15,989 --> 00:04:20,070
three but in the end these layers right

103
00:04:20,070 --> 00:04:21,988
are gonna add various kind of

104
00:04:21,988 --> 00:04:24,360
functionality but then why stack them it

105
00:04:24,360 --> 00:04:27,360
makes the code it makes reasoning and

106
00:04:27,360 --> 00:04:28,560
the implementation a little bit easier

107
00:04:28,560 --> 00:04:30,750
to understand so they figure out and to

108
00:04:30,750 --> 00:04:34,080
some extent they are going along with

109
00:04:34,080 --> 00:04:36,810
the initial implementation of say tcp/ip

110
00:04:36,810 --> 00:04:40,110
right they're reasoning that separating

111
00:04:40,110 --> 00:04:41,820
various functionality in these layers

112
00:04:41,820 --> 00:04:43,349
and then maybe optionally including the

113
00:04:43,349 --> 00:04:44,729
layers or not including the layers is

114
00:04:44,729 --> 00:04:46,529
gonna make for a nice implementation for

115
00:04:46,529 --> 00:04:47,849
the network stack and virtually all

116
00:04:47,849 --> 00:04:49,710
network stacks have a layered

117
00:04:49,710 --> 00:04:50,650
architecture

118
00:04:50,650 --> 00:04:53,740
okay the very bottom layer in a network

119
00:04:53,740 --> 00:04:56,290
stack is essentially Ethernet like

120
00:04:56,290 --> 00:04:58,389
behavior the bits that go on an Ethernet

121
00:04:58,389 --> 00:05:00,009
like connection and the highest

122
00:05:00,009 --> 00:05:01,449
level might be something like the full

123
00:05:01,449 --> 00:05:04,180
fledged tcp/ip behavior right so maybe

124
00:05:04,180 --> 00:05:06,250
the end-to-end checksum or things of

125
00:05:06,250 --> 00:05:09,419
this sort okay so when it comes to

126
00:05:09,419 --> 00:05:11,530
protocol implementation especially

127
00:05:11,530 --> 00:05:13,300
low-level protocol implementation

128
00:05:13,300 --> 00:05:15,430
virtually the layering actually

129
00:05:15,430 --> 00:05:17,289
dominates but it's not particularly nice

130
00:05:17,289 --> 00:05:21,550
to reason about right for the most part

131
00:05:21,550 --> 00:05:23,289
when it comes to distributed systems

132
00:05:23,289 --> 00:05:26,289
since distributed systems people mostly

133
00:05:26,289 --> 00:05:28,330
just use networking as a tool and say oh

134
00:05:28,330 --> 00:05:31,600
yes let's do tcp/ip layering the kind of

135
00:05:31,600 --> 00:05:33,580
layered architecture is not particularly

136
00:05:33,580 --> 00:05:36,130
interesting it's too constraining so one

137
00:05:36,130 --> 00:05:37,840
particular constraining part in it is

138
00:05:37,840 --> 00:05:40,660
the fact that one layer can talk only

139
00:05:40,660 --> 00:05:43,300
with the layer right below right

140
00:05:43,300 --> 00:05:45,039
so for example in networking the

141
00:05:45,039 --> 00:05:47,349
highest level is going to make

142
00:05:47,349 --> 00:05:49,240
some kind of a tcp/ip request and then

143
00:05:49,240 --> 00:05:50,800
you go layer layer layer layer until

144
00:05:50,800 --> 00:05:55,780
oops smartboard okay until you

145
00:05:55,780 --> 00:05:56,979
essentially get to the bottommost layer

146
00:05:56,979 --> 00:05:59,139
that's where the request bits go on the

147
00:05:59,139 --> 00:06:00,940
wire and then when something happened or

148
00:06:00,940 --> 00:06:03,130
everything gets propagated back but you

149
00:06:03,130 --> 00:06:04,539
have to go through the intermediate

150
00:06:04,539 --> 00:06:05,889
layers that's the problem in the layered

151
00:06:05,889 --> 00:06:07,960
architecture why not jump so even

152
00:06:07,960 --> 00:06:09,490
network implementers have thought about

153
00:06:09,490 --> 00:06:12,900
better ways to do this and by the way

154
00:06:12,900 --> 00:06:16,360
some distributed systems people came up

155
00:06:16,360 --> 00:06:18,400
with interesting variants of the layered

156
00:06:18,400 --> 00:06:19,900
architecture to make it much higher

157
00:06:19,900 --> 00:06:21,789
performance right and that was

158
00:06:21,789 --> 00:06:23,139
essentially by saying you have

159
00:06:23,139 --> 00:06:24,610
essentially a layered architecture but

160
00:06:24,610 --> 00:06:26,760
you can optimize it and do jumps and

161
00:06:26,760 --> 00:06:28,990
right for example if you could jump from

162
00:06:28,990 --> 00:06:31,030
layer 1 to layer n which might be

163
00:06:31,030 --> 00:06:32,590
acceptable under certain circumstances

164
00:06:32,590 --> 00:06:34,479
the network stack is going to run faster

165
00:06:34,479 --> 00:06:37,599
ok but as it is it's too constraining it's

166
00:06:37,599 --> 00:06:38,889
very important though it's important to

167
00:06:38,889 --> 00:06:41,800
know that something like this is

168
00:06:41,800 --> 00:06:44,680
used ok now this is an object based

169
00:06:44,680 --> 00:06:46,389
architectural style okay you have

170
00:06:46,389 --> 00:06:48,909
various objects and the objects somehow

171
00:06:48,909 --> 00:06:51,880
communicate with each other this fits

172
00:06:51,880 --> 00:06:54,030
perfectly the object-oriented design

173
00:06:54,030 --> 00:06:59,260
right now while it's nice right so this

174
00:06:59,260 --> 00:07:01,029
kind of arrows means some sort of a

175
00:07:01,029 --> 00:07:03,370
method call if you would put

176
00:07:03,370 --> 00:07:03,919
objects on

177
00:07:03,919 --> 00:07:05,599
different machines then those arrows are

178
00:07:05,599 --> 00:07:06,860
still gonna be some sort of method calls

179
00:07:06,860 --> 00:07:09,889
but remote method calls sometimes known

180
00:07:09,889 --> 00:07:11,599
under the name remote procedure calls

181
00:07:11,599 --> 00:07:13,580
right it's a more generic not

182
00:07:13,580 --> 00:07:14,900
necessarily a method but a function in

183
00:07:14,900 --> 00:07:18,710
general right now because of the object

184
00:07:18,710 --> 00:07:23,169
oriented design taught in virtually all

185
00:07:23,169 --> 00:07:25,159
programming classes or almost all

186
00:07:25,159 --> 00:07:27,650
programming classes this feels like a

187
00:07:27,650 --> 00:07:29,900
very natural kind of model right the

188
00:07:29,900 --> 00:07:31,580
trouble is it's not quite clear who does

189
00:07:31,580 --> 00:07:34,580
what and when right namely any notion of

190
00:07:34,580 --> 00:07:36,349
parallelism is not obvious at all but

191
00:07:36,349 --> 00:07:38,330
this kind of a model would fit perfectly

192
00:07:38,330 --> 00:07:40,039
let's say an actor based model in which

193
00:07:40,039 --> 00:07:41,539
you replace objects with actors and

194
00:07:41,539 --> 00:07:43,719
method calls with some kind of a message

195
00:07:43,719 --> 00:07:46,189
exchange between actors and

196
00:07:46,189 --> 00:07:48,379
then this starts to make a lot of sense

197
00:07:48,379 --> 00:07:51,409
right but this is really not the intent

198
00:07:51,409 --> 00:07:53,060
of the object-based architecture right

199
00:07:53,060 --> 00:07:56,210
those objects quite often pass a lot

200
00:07:56,210 --> 00:07:58,669
move in between themselves large parts

201
00:07:58,669 --> 00:08:01,039
of the state and share share things and

202
00:08:01,039 --> 00:08:02,659
whatnot part of the actor model is

203
00:08:02,659 --> 00:08:04,999
really try to keep to yourself the state

204
00:08:04,999 --> 00:08:08,469
you're managing right and use

205
00:08:08,469 --> 00:08:12,229
message exchanges to act on your own

206
00:08:12,229 --> 00:08:14,240
state and to send more messages to

207
00:08:14,240 --> 00:08:15,889
somebody else not necessarily true in

208
00:08:15,889 --> 00:08:20,509
the object model okay now another one is the

209
00:08:20,509 --> 00:08:24,319
event based architectural style and you

210
00:08:24,319 --> 00:08:26,779
probably already say hey but actors have

211
00:08:26,779 --> 00:08:28,490
events so actors are really some sort of a

212
00:08:28,490 --> 00:08:30,259
combination you can see now of some kind

213
00:08:30,259 --> 00:08:32,328
of an event based architecture

214
00:08:32,328 --> 00:08:34,429
and an object based architecture in the

215
00:08:34,429 --> 00:08:35,929
event based architecture you have

216
00:08:35,929 --> 00:08:37,549
messages but not necessarily any kind of

217
00:08:37,549 --> 00:08:40,370
actors I want you to understand that the

218
00:08:40,370 --> 00:08:44,870
actor model right it's a separate idea

219
00:08:44,870 --> 00:08:47,000
you can take part of it just exchanging

220
00:08:47,000 --> 00:08:48,260
messages and then you have this event

221
00:08:48,260 --> 00:08:49,820
based architecture you can take the

222
00:08:49,820 --> 00:08:51,350
overall object organization and then you

223
00:08:51,350 --> 00:08:52,760
have an object based architecture you

224
00:08:52,760 --> 00:08:54,320
can combine both and then you have some

225
00:08:54,320 --> 00:08:56,360
sort of an actor model you might ask the

226
00:08:56,360 --> 00:08:57,800
question why is the actor model not in

227
00:08:57,800 --> 00:09:00,589
the textbook it's really not in the

228
00:09:00,589 --> 00:09:03,350
textbook well it looks to me that this

229
00:09:03,350 --> 00:09:05,779
is the biggest blunder of the

230
00:09:05,779 --> 00:09:07,519
distributed systems community they

231
00:09:07,519 --> 00:09:10,399
ignore this very nice well fitting actor

232
00:09:10,399 --> 00:09:14,600
model idea right why it's still not

233
00:09:14,600 --> 00:09:16,279
quite clear okay anyway

234
00:09:16,279 --> 00:09:17,570
the event based architectural

235
00:09:17,570 --> 00:09:20,570
style consists in some sort of a medium

236
00:09:20,570 --> 00:09:23,480
in which you're exchanging events now

237
00:09:23,480 --> 00:09:24,769
what you do with the events and how you

238
00:09:24,769 --> 00:09:28,990
behave and how you manage those events

239
00:09:28,990 --> 00:09:31,250
the architecture diagram doesn't care

240
00:09:31,250 --> 00:09:33,199
about that important thing is you have

241
00:09:33,199 --> 00:09:34,720
the ability to exchange these events

242
00:09:34,720 --> 00:09:37,660
interestingly if you look at modern

243
00:09:37,660 --> 00:09:40,819
computer architectures right to a large

244
00:09:40,819 --> 00:09:41,990
extent they are actually event

245
00:09:41,990 --> 00:09:44,600
exchanging architectures for example the

246
00:09:44,600 --> 00:09:46,009
way the cores talk to each other on

247
00:09:46,009 --> 00:09:48,440
an AMD processor is really this message

248
00:09:48,440 --> 00:09:49,940
passing and those are wrong it's exactly

249
00:09:49,940 --> 00:09:51,920
this they exchange these events the

250
00:09:51,920 --> 00:09:53,449
events being somebody wrote in this

251
00:09:53,449 --> 00:09:55,009
location in the cache update your cache

252
00:09:55,009 --> 00:09:56,269
it's all that cache coherency protocol

253
00:09:56,269 --> 00:09:57,620
kind of thing for the people that took

254
00:09:57,620 --> 00:09:59,209
some kind of an architecture class how

255
00:09:59,209 --> 00:10:00,470
many people took an architecture class

256
00:10:00,470 --> 00:10:02,300
or are taking an architecture class even

257
00:10:02,300 --> 00:10:05,149
more interesting but don't we have a

258
00:10:05,149 --> 00:10:06,649
core class where does this or maybe it's

259
00:10:06,649 --> 00:10:09,259
not core but it's not in it okay anyway

260
00:10:09,259 --> 00:10:10,810
it's potentially interesting to look at

261
00:10:10,810 --> 00:10:16,100
okay the other one is and this is really

262
00:10:16,100 --> 00:10:19,250
everywhere okay and I'm gonna talk

263
00:10:19,250 --> 00:10:22,010
extensively about this in class it's

264
00:10:22,010 --> 00:10:24,410
this idea of a shared data space and

265
00:10:24,410 --> 00:10:27,079
when you usually talk about data I want

266
00:10:27,079 --> 00:10:28,430
you to understand this you're talking

267
00:10:28,430 --> 00:10:31,250
about some sort of persistency right so

268
00:10:31,250 --> 00:10:33,709
data is supposed to outlive the

269
00:10:33,709 --> 00:10:36,440
computers that exchange the data so the

270
00:10:36,440 --> 00:10:37,850
trouble with all the other architectures

271
00:10:37,850 --> 00:10:41,269
we had before is things move around but

272
00:10:41,269 --> 00:10:44,050
they will disappear if the computers

273
00:10:44,050 --> 00:10:47,300
basically get turned off but the moment

274
00:10:47,300 --> 00:10:48,889
you start talking about persistency of

275
00:10:48,889 --> 00:10:51,889
the data the data is going to be pushed

276
00:10:51,889 --> 00:10:53,540
to some sort of a persistent medium for

277
00:10:53,540 --> 00:10:55,670
example a hard drive so even if the

278
00:10:55,670 --> 00:10:57,920
computer has issues goes down for any

279
00:10:57,920 --> 00:10:59,569
reason when it wakes up it still finds

280
00:10:59,569 --> 00:11:02,899
the data right that means the data now

281
00:11:02,899 --> 00:11:04,880
is really the core and everything

282
00:11:04,880 --> 00:11:08,540
revolves around this data right it's as

283
00:11:08,540 --> 00:11:11,000
if we would use basically some kind of a

284
00:11:11,000 --> 00:11:13,250
written record and if I want to do a

285
00:11:13,250 --> 00:11:15,170
transaction right I want to sell you

286
00:11:15,170 --> 00:11:18,019
something for example a house we simply

287
00:11:18,019 --> 00:11:19,430
write some kind of a document that goes

288
00:11:19,430 --> 00:11:20,959
in the archive saying we've done that

289
00:11:20,959 --> 00:11:22,730
transaction so if anything happens to us

290
00:11:22,730 --> 00:11:24,410
somebody can go and look at the document

291
00:11:24,410 --> 00:11:26,930
this is probably what revolutionized

292
00:11:26,930 --> 00:11:29,540
completely commerce and owning property

293
00:11:29,540 --> 00:11:30,440
ownership

294
00:11:30,440 --> 00:11:32,540
right you didn't have to defend all the

295
00:11:32,540 --> 00:11:34,940
time using some sort of a weapon what

296
00:11:34,940 --> 00:11:36,410
you owned you could prove you own

297
00:11:36,410 --> 00:11:37,910
something based on the written record so

298
00:11:37,910 --> 00:11:39,200
that's kind of the ultimate in

299
00:11:39,200 --> 00:11:41,450
persistent storage right a lot of it

300
00:11:41,450 --> 00:11:43,700
goes hundreds of years back or or more

301
00:11:43,700 --> 00:11:47,660
right well even going back to

302
00:11:47,660 --> 00:11:49,430
milliseconds it's tough in some of the

303
00:11:49,430 --> 00:11:51,050
distributed systems so when you're talking

304
00:11:51,050 --> 00:11:53,120
about data things change you're suddenly

305
00:11:53,120 --> 00:11:55,160
worried about doing things to the data

306
00:11:55,160 --> 00:11:57,200
and not so much what all those computers

307
00:11:57,200 --> 00:11:59,690
that are alive will not do okay so then

308
00:11:59,690 --> 00:12:01,430
everything gets phrased in terms of go and

309
00:12:01,430 --> 00:12:03,290
change this data and propagate the

310
00:12:03,290 --> 00:12:04,520
change to the data make sure everybody

311
00:12:04,520 --> 00:12:08,060
sees the same data okay right now this

312
00:12:08,060 --> 00:12:10,220
is really the I would say the prevalent

313
00:12:10,220 --> 00:12:12,020
architectural style especially when it

314
00:12:12,020 --> 00:12:14,240
comes to web related anything all right

315
00:12:14,240 --> 00:12:15,830
I'm gonna come back to this when I talk

316
00:12:15,830 --> 00:12:17,810
about classic architecture for web-based

317
00:12:17,810 --> 00:12:20,150
services distributed system services and

318
00:12:20,150 --> 00:12:23,570
data it's always gonna be there okay by

319
00:12:23,570 --> 00:12:26,980
the way this is why databases are a

320
00:12:26,980 --> 00:12:29,180
multibillion-dollar industry because

321
00:12:29,180 --> 00:12:30,710
everybody needs some sort of persistency

322
00:12:30,710 --> 00:12:33,680
for their data so okay we're gonna come

323
00:12:33,680 --> 00:12:35,840
back to this later so it's mostly about

324
00:12:35,840 --> 00:12:37,220
data delivery of course some machines

325
00:12:37,220 --> 00:12:38,390
have to deliver the data but the

326
00:12:38,390 --> 00:12:39,410
important thing is the data is

327
00:12:39,410 --> 00:12:43,460
persistent okay now I did mention the

328
00:12:43,460 --> 00:12:46,730
fact that there are a number of core

329
00:12:46,730 --> 00:12:49,730
obsessions for every area and distributed

330
00:12:49,730 --> 00:12:52,820
systems has quite a lot of them but one

331
00:12:52,820 --> 00:12:54,490
of the most important ones is this

332
00:12:54,490 --> 00:12:56,420
idea of a centralized

333
00:12:56,420 --> 00:12:58,160
architecture and not so much the

334
00:12:58,160 --> 00:13:01,730
centralized architecture but the deep

335
00:13:01,730 --> 00:13:03,170
opposition to a centralized architecture

336
00:13:03,170 --> 00:13:05,180
so distributed system research for

337
00:13:05,180 --> 00:13:07,820
example is mostly about removing the

338
00:13:07,820 --> 00:13:11,510
need for centralized anything right so

339
00:13:11,510 --> 00:13:14,000
whatever opposite of centralized you can

340
00:13:14,000 --> 00:13:16,070
find you can potentially do some

341
00:13:16,070 --> 00:13:17,270
interesting research in distributed

342
00:13:17,270 --> 00:13:18,740
systems or build an interesting

343
00:13:18,740 --> 00:13:20,840
distributed system in place in the end

344
00:13:20,840 --> 00:13:24,200
okay now what exactly does centralized

345
00:13:24,200 --> 00:13:26,570
architecture mean you have one or more

346
00:13:26,570 --> 00:13:28,790
clients and some sort of a server the

347
00:13:28,790 --> 00:13:30,080
server is going to do all the heavy

348
00:13:30,080 --> 00:13:33,800
lifting okay now obviously if the server

349
00:13:33,800 --> 00:13:36,880
goes down there is no more service

350
00:13:36,880 --> 00:13:41,510
whatsoever okay now it's easy to see how

351
00:13:41,510 --> 00:13:43,880
this would work because we are used to it

352
00:13:43,880 --> 00:13:45,490
this is the classic client-server model

353
00:13:45,490 --> 00:13:50,149
okay you have one or more clients

354
00:13:50,149 --> 00:13:51,680
and the server is usually a big powerful

355
00:13:51,680 --> 00:13:53,149
machine and you ask the server to do

356
00:13:53,149 --> 00:13:55,310
things the interesting question is

357
00:13:55,310 --> 00:13:58,069
what's on the other side what could you

358
00:13:58,069 --> 00:14:00,069
have that's not a client-server model

359
00:14:00,069 --> 00:14:02,269
right and this is the kind of questions

360
00:14:02,269 --> 00:14:04,720
that distributed systems tries to ask and

361
00:14:04,720 --> 00:14:11,589
tries to find solutions for okay now

362
00:14:11,589 --> 00:14:15,470
when it comes to any kind of system it's

363
00:14:15,470 --> 00:14:17,839
important to think about what the entire stack

364
00:14:17,839 --> 00:14:19,519
looks like so I mean one of the biggest

365
00:14:19,519 --> 00:14:21,170
problems in general when it comes to

366
00:14:21,170 --> 00:14:23,360
software design is not having a complete

367
00:14:23,360 --> 00:14:25,240
view of how everything comes together

368
00:14:25,240 --> 00:14:27,680
right this is not only about various

369
00:14:27,680 --> 00:14:29,389
subjects in computer science but this is

370
00:14:29,389 --> 00:14:32,660
a very important question is how the

371
00:14:32,660 --> 00:14:34,670
whole thing works all right so the

372
00:14:34,670 --> 00:14:36,139
interesting question is to start

373
00:14:36,139 --> 00:14:39,199
quizzing people to say okay so we have

374
00:14:39,199 --> 00:14:40,880
an application on let's say your cell

375
00:14:40,880 --> 00:14:43,519
phone let's get down to the lowest bits

376
00:14:43,519 --> 00:14:45,529
the lowest parts we can get and try to

377
00:14:45,529 --> 00:14:47,079
figure out how things actually work

378
00:14:47,079 --> 00:14:49,910
right now that kind of a question I mean

379
00:14:49,910 --> 00:14:52,810
first of all it can become overwhelming

380
00:14:52,810 --> 00:14:55,189
most people have very weird ideas about

381
00:14:55,189 --> 00:14:57,829
how things work alright so I tried to

382
00:14:57,829 --> 00:14:59,149
explain for example to my wife how the

383
00:14:59,149 --> 00:15:02,209
internet works well I had to give up

384
00:15:02,209 --> 00:15:04,639
after about five minutes because she said okay

385
00:15:04,639 --> 00:15:06,170
so I'm still gonna keep my opinion that

386
00:15:06,170 --> 00:15:09,290
there is some sort of a Holy Spirit that

387
00:15:09,290 --> 00:15:11,000
keeps everything together and just makes

388
00:15:11,000 --> 00:15:12,079
things work and that's good enough for

389
00:15:12,079 --> 00:15:14,269
me so that's it right but this is a

390
00:15:14,269 --> 00:15:15,380
legitimate question how does it

391
00:15:15,380 --> 00:15:19,250
actually work okay now by the way this

392
00:15:19,250 --> 00:15:22,120
is a very hard question to answer

393
00:15:22,120 --> 00:15:25,220
because it's easy to say what the

394
00:15:25,220 --> 00:15:26,990
mechanisms are but it's not quite clear

395
00:15:26,990 --> 00:15:28,670
what all the parts do together it's a

396
00:15:28,670 --> 00:15:31,730
very very complex system okay so the

397
00:15:31,730 --> 00:15:33,439
parts don't really describe the

398
00:15:33,439 --> 00:15:34,819
behavior of the entire system because

399
00:15:34,819 --> 00:15:37,069
they're not fully predictable alright so

400
00:15:37,069 --> 00:15:39,319
when it comes to design of an

401
00:15:39,319 --> 00:15:40,759
application there is some sort of a

402
00:15:40,759 --> 00:15:44,060
classic approach now I mean it kind

403
00:15:44,060 --> 00:15:46,399
of evolved over many decades but it's

404
00:15:46,399 --> 00:15:48,500
now about there and this is really what

405
00:15:48,500 --> 00:15:50,029
this is all about right so you have some

406
00:15:50,029 --> 00:15:51,769
sort of a user interface level a

407
00:15:51,769 --> 00:15:54,019
processing level and a data level the

408
00:15:54,019 --> 00:15:55,519
data level it's almost always there

409
00:15:55,519 --> 00:15:57,100
because you need to make things persist

410
00:15:57,100 --> 00:15:58,690
one way or another ultimately all

411
00:15:58,690 --> 00:16:00,550
computer systems are about exchanging

412
00:16:00,550 --> 00:16:03,430
some sort of information without that

413
00:16:03,430 --> 00:16:08,470
it's kind of a sport without purpose all

414
00:16:08,470 --> 00:16:10,000
right I mean who wants to create a

415
00:16:10,000 --> 00:16:12,190
distributed network just to measure how

416
00:16:12,190 --> 00:16:13,690
many messages we can send per second

417
00:16:13,690 --> 00:16:16,570
it's it's fun for a little bit but then

418
00:16:16,570 --> 00:16:19,480
what right so when it comes to getting

419
00:16:19,480 --> 00:16:21,070
things done with computers ultimately

420
00:16:21,070 --> 00:16:23,380
it's all about some sort of data

421
00:16:23,380 --> 00:16:27,250
exchange well it could be movement in a

422
00:16:27,250 --> 00:16:29,440
massively multiplayer game but is still

423
00:16:29,440 --> 00:16:32,320
data okay so it's data I mean it could

424
00:16:32,320 --> 00:16:33,970
be visual it could be audio it's still

425
00:16:33,970 --> 00:16:36,760
data okay good so the classic

426
00:16:36,760 --> 00:16:39,820
architecture might look like this so we

427
00:16:39,820 --> 00:16:41,770
have some sort of user interface here

428
00:16:41,770 --> 00:16:44,200
then this level the processing level

429
00:16:44,200 --> 00:16:45,970
might have all kinds of fancy things for

430
00:16:45,970 --> 00:16:48,910
example some sort of a query

431
00:16:48,910 --> 00:16:51,970
generator then the information goes to

432
00:16:51,970 --> 00:16:53,890
whatever a database is some sort of a

433
00:16:53,890 --> 00:16:57,190
data layer and then data goes back but

434
00:16:57,190 --> 00:16:59,230
raw data especially coming from

435
00:16:59,230 --> 00:17:01,390
relational databases it's it's really

436
00:17:01,390 --> 00:17:03,280
dry I mean it looks like some tables

437
00:17:03,280 --> 00:17:05,319
that are very boring now of course you

438
00:17:05,319 --> 00:17:07,089
could display tables right but that's

439
00:17:07,089 --> 00:17:10,000
really really boring so then what you

440
00:17:10,000 --> 00:17:12,640
might want to do is some sort of a nicer

441
00:17:12,640 --> 00:17:14,829
processing on top of the table maybe

442
00:17:14,829 --> 00:17:17,890
some ranking some nice HTML generation

443
00:17:17,890 --> 00:17:20,140
right and then present to the user in

444
00:17:20,140 --> 00:17:21,430
the user interface that's kind of a nice

445
00:17:21,430 --> 00:17:23,530
webpage as opposed to just some numbers

446
00:17:23,530 --> 00:17:28,630
in a table okay now you could see how

447
00:17:28,630 --> 00:17:30,250
this architecture might work

448
00:17:30,250 --> 00:17:31,810
but let's say you want to actually

449
00:17:31,810 --> 00:17:34,840
implement this the question is where

450
00:17:34,840 --> 00:17:36,820
would you put various components what

451
00:17:36,820 --> 00:17:39,640
runs where okay and this in fact is a

452
00:17:39,640 --> 00:17:40,990
very legitimate question let me see

453
00:17:40,990 --> 00:17:43,300
where my other slide is okay so I want

454
00:17:43,300 --> 00:17:44,500
to talk a little bit about this slide

455
00:17:44,500 --> 00:17:47,050
okay so in this kind of an

456
00:17:47,050 --> 00:17:48,070
architecture right you have the user

457
00:17:48,070 --> 00:17:49,300
interface the application and the

458
00:17:49,300 --> 00:17:50,620
database alright the processing layer is

459
00:17:50,620 --> 00:17:53,860
the application and the interesting

460
00:17:53,860 --> 00:17:57,310
question is where do you separate the

461
00:17:57,310 --> 00:17:59,740
code that runs on let's say the

462
00:17:59,740 --> 00:18:03,490
client let's say my phone or my my

463
00:18:03,490 --> 00:18:05,440
desktop machine or my laptop or whatever

464
00:18:05,440 --> 00:18:07,210
you want versus the code that runs on

465
00:18:07,210 --> 00:18:08,870
server whatever server means

466
00:18:08,870 --> 00:18:10,880
it may be the collection of machines

467
00:18:10,880 --> 00:18:13,970
that provide this virtual server okay

468
00:18:13,970 --> 00:18:17,570
and this drawing suggests that basically

469
00:18:17,570 --> 00:18:19,850
you can have anything from a very very

470
00:18:19,850 --> 00:18:21,799
shallow user interface and by the way

471
00:18:21,799 --> 00:18:24,950
this is what was happening in times of

472
00:18:24,950 --> 00:18:27,710
very expensive computers so if the

473
00:18:27,710 --> 00:18:29,240
computers are extremely expensive and

474
00:18:29,240 --> 00:18:31,640
this happened in the 60s right then

475
00:18:31,640 --> 00:18:33,380
essentially what you do is you produce

476
00:18:33,380 --> 00:18:35,720
lots of dumb terminals that can

477
00:18:35,720 --> 00:18:37,279
essentially just put letters on the

478
00:18:37,279 --> 00:18:38,840
screen and they can take the keystrokes

479
00:18:38,840 --> 00:18:40,460
and send them to the server this is the

480
00:18:40,460 --> 00:18:42,890
mainframe era right and then the server

481
00:18:42,890 --> 00:18:44,510
runs everything else and then switches

482
00:18:44,510 --> 00:18:47,960
between all kinds of various terminals

483
00:18:47,960 --> 00:18:49,159
and the terminals are really dumb

484
00:18:49,159 --> 00:18:50,990
terminals all right

485
00:18:50,990 --> 00:18:53,270
so for example the terminals you have

486
00:18:53,270 --> 00:18:55,159
now the terminal emulators you have now

487
00:18:55,159 --> 00:18:56,899
let's say in Linux or some other UNIX

488
00:18:56,899 --> 00:18:58,399
system they are really if you want

489
00:18:58,399 --> 00:19:00,620
software emulation of a real real

490
00:19:00,620 --> 00:19:03,669
physical device they had in the 60s and 70s a

491
00:19:03,669 --> 00:19:06,919
terminal was only about a thousand

492
00:19:06,919 --> 00:19:08,630
dollars but the mainframe was ten

493
00:19:08,630 --> 00:19:11,990
million dollars hence you had hundreds

494
00:19:11,990 --> 00:19:14,330
of terminals on top of the multi-million

495
00:19:14,330 --> 00:19:16,309
dollar mainframe so that means a really

496
00:19:16,309 --> 00:19:18,919
shallow user interface but you could go

497
00:19:18,919 --> 00:19:22,039
to the other extreme and by the way yeah

498
00:19:22,039 --> 00:19:24,500
so these guys drew the right picture but

499
00:19:24,500 --> 00:19:26,600
so let me give you an extreme situation

500
00:19:26,600 --> 00:19:29,419
like this right the other extreme

501
00:19:29,419 --> 00:19:30,740
situation is a situation in which

502
00:19:30,740 --> 00:19:33,350
virtually almost anything runs on the

503
00:19:33,350 --> 00:19:35,510
client and very little runs on the

504
00:19:35,510 --> 00:19:38,929
server by the way the more on the right

505
00:19:38,929 --> 00:19:41,929
side you are the more scalable the

506
00:19:41,929 --> 00:19:44,510
service is right why especially if you

507
00:19:44,510 --> 00:19:46,640
have very powerful clients and now even

508
00:19:46,640 --> 00:19:48,220
cell phones are very powerful clients

509
00:19:48,220 --> 00:19:50,480
that means basically you can get away

510
00:19:50,480 --> 00:19:53,240
with very little on the server side so

511
00:19:53,240 --> 00:19:56,620
then it's easy to serve hundreds

512
00:19:56,620 --> 00:19:58,730
thousands even millions of requests

513
00:19:58,730 --> 00:20:00,470
right now any problems if you do very

514
00:20:00,470 --> 00:20:05,080
little ok so let me actually give you

515
00:20:05,169 --> 00:20:07,399
well let me give you an example because

516
00:20:07,399 --> 00:20:09,409
nothing is better than examples ok let

517
00:20:09,409 --> 00:20:11,750
me switch to web browser I didn't really

518
00:20:11,750 --> 00:20:14,120
intend to do this but let me do that so

519
00:20:14,120 --> 00:20:16,250
let's try right so let me describe as I

520
00:20:16,250 --> 00:20:18,370
go on the web site so this is basically

521
00:20:18,370 --> 00:20:21,549
visualization of

522
00:20:22,149 --> 00:20:26,440
Medicare data in 2011 the federal

523
00:20:26,440 --> 00:20:28,989
government made available Medicare data

524
00:20:28,989 --> 00:20:31,029
in the form of a large table well they

525
00:20:31,029 --> 00:20:33,489
made it available the three four months

526
00:20:33,489 --> 00:20:35,769
ago a table with a hundred and seventy

527
00:20:35,769 --> 00:20:37,809
one hundred sixty-three thousand rows

528
00:20:37,809 --> 00:20:40,029
right so you remember those boring

529
00:20:40,029 --> 00:20:41,830
tables that I mentioned right that's one

530
00:20:41,830 --> 00:20:43,419
of those boring tables it tells you

531
00:20:43,419 --> 00:20:46,269
various hospitals that did various kind

532
00:20:46,269 --> 00:20:49,179
of procedures how much they asked for

533
00:20:49,179 --> 00:20:51,219
Medicare and how much Medicare paid for

534
00:20:51,219 --> 00:20:53,169
every procedure interesting information

535
00:20:53,169 --> 00:20:54,669
is just that the table is boring so what

536
00:20:54,669 --> 00:20:56,049
you do you pull it up in Excel for

537
00:20:56,049 --> 00:20:57,629
example but let's look at the web

538
00:20:57,629 --> 00:21:03,999
front-end for this it will take me a few

539
00:21:03,999 --> 00:21:11,830
seconds alright so something interesting

540
00:21:11,830 --> 00:21:14,349
happens look did you see the loading data

541
00:21:14,349 --> 00:21:16,389
that was what about two seconds right

542
00:21:16,389 --> 00:21:20,200
and now some disclaimer okay because hey

543
00:21:20,200 --> 00:21:23,710
right let me see if f11 works so

544
00:21:23,710 --> 00:21:25,210
basically this is a visualization on

545
00:21:25,210 --> 00:21:27,070
that data you can click on a state what

546
00:21:27,070 --> 00:21:29,710
did I pick I don't know some sort of New

547
00:21:29,710 --> 00:21:32,200
Mexico right this is a disease you can

548
00:21:32,200 --> 00:21:34,839
click on the disease things get computed

549
00:21:34,839 --> 00:21:36,429
in the browser is very nice you can

550
00:21:36,429 --> 00:21:40,119
highlight something right they select

551
00:21:40,119 --> 00:21:42,639
this so what you have is data processing

552
00:21:42,639 --> 00:21:46,899
at very very high speeds right sorts on

553
00:21:46,899 --> 00:21:50,409
the table select States literally in 100

554
00:21:50,409 --> 00:21:51,580
milliseconds you get the answer for

555
00:21:51,580 --> 00:21:52,149
everything

556
00:21:52,149 --> 00:21:55,029
so this makes that a hundred and sixty

557
00:21:55,029 --> 00:21:58,539
three thousand row table fun to navigate

558
00:21:58,539 --> 00:22:03,039
right okay now interesting question how

559
00:22:03,039 --> 00:22:07,349
is this done how come it works so fast

560
00:22:07,349 --> 00:22:10,029
so how many of you ran a query on a

561
00:22:10,029 --> 00:22:12,909
database engine over a table that

562
00:22:12,909 --> 00:22:14,379
has a hundred and sixty-three thousand

563
00:22:14,379 --> 00:22:17,919
rows or did in Excel something right

564
00:22:17,919 --> 00:22:20,139
so that was what about ten seconds right

565
00:22:20,139 --> 00:22:21,849
almost on no servers depending on how

566
00:22:21,849 --> 00:22:24,460
you set things up so how does it work so

567
00:22:24,460 --> 00:22:28,499
fast here what's going on in here yes

568
00:22:30,879 --> 00:22:32,720
so we're getting somewhere so a

569
00:22:32,720 --> 00:22:35,269
suggestion here is those two seconds we

570
00:22:35,269 --> 00:22:36,289
had at the beginning that was

571
00:22:36,289 --> 00:22:38,960
downloading all the data right so

572
00:22:38,960 --> 00:22:40,250
remember the architecture we had on the

573
00:22:40,250 --> 00:22:41,889
board that the extreme version was

574
00:22:41,889 --> 00:22:44,570
everything except maybe the very lowest

575
00:22:44,570 --> 00:22:46,970
part of the database was in the client

576
00:22:46,970 --> 00:22:48,710
it turns out that everything here is in

577
00:22:48,710 --> 00:22:51,950
the client except the raw table right

578
00:22:51,950 --> 00:22:54,740
now cleverly the raw table was converted

579
00:22:54,740 --> 00:22:56,899
from a boring table format to directly

580
00:22:56,899 --> 00:22:59,330
JSON because JavaScript likes JSON and

581
00:22:59,330 --> 00:23:00,919
essentially every time you click

582
00:23:00,919 --> 00:23:03,080
anything in here on this website it runs

583
00:23:03,080 --> 00:23:04,639
in the browser it does query processing

584
00:23:04,639 --> 00:23:07,039
in the browser so now you have query

585
00:23:07,039 --> 00:23:08,539
processing in JavaScript rather than

586
00:23:08,539 --> 00:23:11,269
send requests to the server right now

587
00:23:11,269 --> 00:23:13,309
this is good why for two reasons

588
00:23:13,309 --> 00:23:15,799
one of them the server does nothing

589
00:23:15,799 --> 00:23:18,889
except serve that initial data right it

590
00:23:18,889 --> 00:23:20,840
literally does nothing it served the raw

591
00:23:20,840 --> 00:23:23,210
JavaScript code the raw HTML there's no

592
00:23:23,210 --> 00:23:25,669
changing there is no HTTP request it

593
00:23:25,669 --> 00:23:26,779
doesn't matter what you click in

594
00:23:26,779 --> 00:23:29,090
here right the server has nothing to do if

595
00:23:29,090 --> 00:23:31,519
you want to literally have a million

596
00:23:31,519 --> 00:23:32,899
people using this as long as they can

597
00:23:32,899 --> 00:23:34,519
get it the data gets cached on

598
00:23:34,519 --> 00:23:35,990
their machine the data doesn't change

599
00:23:35,990 --> 00:23:37,659
because the feds don't publish more data

600
00:23:37,659 --> 00:23:39,980
right you don't even need to ask the

601
00:23:39,980 --> 00:23:41,840
server again data is cached on the

602
00:23:41,840 --> 00:23:44,870
client right so one way to achieve

603
00:23:44,870 --> 00:23:47,330
scalability is to push a lot of stuff in

604
00:23:47,330 --> 00:23:48,500
the client and this is an extreme

605
00:23:48,500 --> 00:23:49,789
situation in which virtually everything

606
00:23:49,789 --> 00:23:51,649
is pushed in the client ok but then of

607
00:23:51,649 --> 00:23:52,669
course you have to pull off lots of

608
00:23:52,669 --> 00:23:58,549
stunts in JavaScript ok so why am I showing

609
00:23:58,549 --> 00:24:03,769
you this oops one because it's

610
00:24:03,769 --> 00:24:06,679
possible two because that's how you get

611
00:24:06,679 --> 00:24:08,450
scalability you remember the other core

612
00:24:08,450 --> 00:24:13,460
obsession in distributed systems is how

613
00:24:13,460 --> 00:24:14,899
do you scale things up well you scale

614
00:24:14,899 --> 00:24:16,549
them up by asking the server to do as

615
00:24:16,549 --> 00:24:18,950
little as possible ok so it's possible

616
00:24:18,950 --> 00:24:21,919
now to be even more than this where

617
00:24:21,919 --> 00:24:25,399
there is almost no no data I'm sorry

618
00:24:25,399 --> 00:24:27,710
there's no almost no database that's

619
00:24:27,710 --> 00:24:29,179
possible if you have mostly read-only data
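A minimal sketch of the idea just described: the server hands over the whole table once, and every later click is answered by filtering and sorting rows already held in memory on the client. It is written in Python rather than browser JavaScript for brevity. The class name, the field names (state, procedure, charged, paid) and the sample rows are made up for illustration; they are not the real Medicare schema or the code behind the website shown.

```python
import json

# Hypothetical payload the server would send exactly once.
# The real table in the lecture has roughly 163,000 rows.
RAW_JSON = json.dumps([
    {"state": "NM", "procedure": "A", "charged": 1000.0, "paid": 400.0},
    {"state": "NM", "procedure": "B", "charged": 3000.0, "paid": 900.0},
    {"state": "FL", "procedure": "A", "charged": 2000.0, "paid": 500.0},
])


class ClientSideTable:
    """Holds the downloaded rows and answers every query locally."""

    def __init__(self, payload: str):
        self.rows = json.loads(payload)
        self.server_requests = 1  # only the initial download touches the server

    def select(self, **equals):
        # Keep rows whose fields match every key=value filter.
        return [r for r in self.rows
                if all(r[k] == v for k, v in equals.items())]

    def sort_by(self, field: str, descending: bool = False):
        # Sorting also happens on the client; no request is sent.
        return sorted(self.rows, key=lambda r: r[field], reverse=descending)


table = ClientSideTable(RAW_JSON)
new_mexico = table.select(state="NM")
most_expensive = table.sort_by("charged", descending=True)[0]
```

However many selections and sorts follow, server_requests stays at 1 in this sketch, which is the scalability argument made above.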

620
00:24:29,179 --> 00:24:35,320
okay all right good now

621
00:24:35,320 --> 00:24:37,910
when it comes to this kind of message

622
00:24:37,910 --> 00:24:39,289
exchanges you're gonna see a lot of

623
00:24:39,289 --> 00:24:40,760
these diagrams which you might have seen

624
00:24:40,760 --> 00:24:42,830
for example in the hardware class these

625
00:24:42,830 --> 00:24:46,299
literally are diagrams of kind of

626
00:24:46,299 --> 00:24:48,320
information exchange diagrams that come

627
00:24:48,320 --> 00:24:50,330
from architecture designers in which you

628
00:24:50,330 --> 00:24:52,309
show some sort of a clock and then

629
00:24:52,309 --> 00:24:54,590
things go in a certain way right so when

630
00:24:54,590 --> 00:24:56,240
it comes to we can borrow those same

631
00:24:56,240 --> 00:24:58,490
formalism it's kind of nice right so

632
00:24:58,490 --> 00:25:00,380
when it comes to how things could work

633
00:25:00,380 --> 00:25:02,240
right if you have this simple

634
00:25:02,240 --> 00:25:04,400
architecture the user interface could

635
00:25:04,400 --> 00:25:06,380
have some sort of a request operation to

636
00:25:06,380 --> 00:25:07,789
the application server which can have a

637
00:25:07,789 --> 00:25:09,289
request to the database server at some

638
00:25:09,289 --> 00:25:11,179
point the data is returned and result is

639
00:25:11,179 --> 00:25:12,590
returned now the interesting thing about

640
00:25:12,590 --> 00:25:15,620
this kind of an architecture is it

641
00:25:15,620 --> 00:25:17,570
doesn't really matter where you draw the

642
00:25:17,570 --> 00:25:19,340
line what runs on the client computer

643
00:25:19,340 --> 00:25:20,990
what runs on the server computer it's

644
00:25:20,990 --> 00:25:22,700
still healthy to think about this

645
00:25:22,700 --> 00:25:25,190
architecture in particular when you have

646
00:25:25,190 --> 00:25:26,630
to write that JavaScript code so

647
00:25:26,630 --> 00:25:28,970
virtually the JavaScript code now does

648
00:25:28,970 --> 00:25:30,620
almost everything in the interface I

649
00:25:30,620 --> 00:25:32,090
showed you when you're thinking about

650
00:25:32,090 --> 00:25:33,770
designing that JavaScript code you

651
00:25:33,770 --> 00:25:35,480
essentially still want to say I have the

652
00:25:35,480 --> 00:25:38,419
same architecture except now that any

653
00:25:38,419 --> 00:25:40,789
kind of message exchange is in fact some

654
00:25:40,789 --> 00:25:43,250
sort of a function call right so then

655
00:25:43,250 --> 00:25:44,570
selectively you can replace function

656
00:25:44,570 --> 00:25:47,600
calls by remote function calls or some

657
00:25:47,600 --> 00:25:50,360
sort of message exchange in one way or

658
00:25:50,360 --> 00:25:52,400
another and you essentially can decide

659
00:25:52,400 --> 00:25:54,049
where to actually place various

660
00:25:54,049 --> 00:25:55,280
functionality so I want you to

661
00:25:55,280 --> 00:25:57,230
understand that the architecture is

662
00:25:57,230 --> 00:26:00,260
dissociated in fact from the specific

663
00:26:00,260 --> 00:26:02,659
implementation where things run how they

664
00:26:02,659 --> 00:26:04,220
run and what's actually happening and

665
00:26:04,220 --> 00:26:06,230
this is one of the things you want to do

666
00:26:06,230 --> 00:26:08,090
when you discuss about any kind of

667
00:26:08,090 --> 00:26:09,770
software writing but distributed systems

668
00:26:09,770 --> 00:26:11,840
in particular right dissociating the

669
00:26:11,840 --> 00:26:13,580
ideas the fact that you're going to do

670
00:26:13,580 --> 00:26:14,990
layering that's a way to organize the

671
00:26:14,990 --> 00:26:17,450
data from specifically what is

672
00:26:17,450 --> 00:26:21,850
implemented where all right
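One way to picture this separation of the layering from where things run, as a hedged sketch: the processing layer below talks to a data layer through one interface, and the data layer can be an in-process function call or a stand-in for a remote call that passes JSON text across a pretend wire. All names here (DataLayer, LocalDataLayer, RemoteDataLayer, processing_layer, the hospital and paid fields) are hypothetical, and no real network is involved.

```python
from __future__ import annotations

import json
from typing import Protocol


class DataLayer(Protocol):
    def query(self, state: str) -> list[dict]: ...


class LocalDataLayer:
    """Data layer living in the same process: a plain function call."""

    def __init__(self, rows: list[dict]):
        self.rows = rows

    def query(self, state: str) -> list[dict]:
        return [r for r in self.rows if r["state"] == state]


class RemoteDataLayer:
    """Stand-in for a remote call: request and reply travel as JSON text."""

    def __init__(self, server: LocalDataLayer):
        self._server = server  # in a real system this would sit behind a socket

    def query(self, state: str) -> list[dict]:
        request = json.dumps({"op": "query", "state": state})
        reply = self._handle_on_server(request)
        return json.loads(reply)

    def _handle_on_server(self, request: str) -> str:
        message = json.loads(request)
        return json.dumps(self._server.query(message["state"]))


def processing_layer(data: DataLayer, state: str) -> list[str]:
    # Same ranking and formatting code no matter where the data layer runs.
    rows = sorted(data.query(state), key=lambda r: r["paid"], reverse=True)
    return [f'{r["hospital"]}: {r["paid"]}' for r in rows]


ROWS = [
    {"state": "NM", "hospital": "H1", "paid": 400.0},
    {"state": "NM", "hospital": "H2", "paid": 900.0},
    {"state": "FL", "hospital": "H3", "paid": 500.0},
]
local_view = processing_layer(LocalDataLayer(ROWS), "NM")
remote_view = processing_layer(RemoteDataLayer(LocalDataLayer(ROWS)), "NM")
```

The processing code is identical in both cases; only the object passed in decides whether the exchange is a function call or a message.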

673
00:26:23,080 --> 00:26:25,269
okay now I mentioned the fact that we

674
00:26:25,269 --> 00:26:26,769
have client-server architectures in

675
00:26:26,769 --> 00:26:28,239
which you have clients and servers and

676
00:26:28,239 --> 00:26:30,580
by the way that's how the web is powered

677
00:26:30,580 --> 00:26:33,369
at least in principle right so when you

678
00:26:33,369 --> 00:26:36,480
want to read your mail you go to

679
00:26:36,480 --> 00:26:38,200
gmail.com

680
00:26:38,200 --> 00:26:39,970
that means you access some sort of a

681
00:26:39,970 --> 00:26:42,940
server by the way there is a lot of Java

682
00:26:42,940 --> 00:26:44,830
Script magic happening in the in the

683
00:26:44,830 --> 00:26:48,549
Gmail application right literally Gmail

684
00:26:48,549 --> 00:26:52,419
drove the development of Chrome as a web

685
00:26:52,419 --> 00:26:55,389
browser a lot of features and a lot of

686
00:26:55,389 --> 00:26:57,369
speed in Chrome comes from the need to

687
00:26:57,369 --> 00:27:01,960
do the kind of advanced email interface

688
00:27:01,960 --> 00:27:04,779
that Gmail has okay and once Chrome

689
00:27:04,779 --> 00:27:06,460
got the speed everybody else had

690
00:27:06,460 --> 00:27:07,989
pressure on them to increase the speed

691
00:27:07,989 --> 00:27:09,429
and this is why we have good browsers

692
00:27:09,429 --> 00:27:11,919
now right because of this

693
00:27:11,919 --> 00:27:14,679
application pressure you want more

694
00:27:14,679 --> 00:27:16,029
advanced applications you put more

695
00:27:16,029 --> 00:27:18,730
pressure on the mid layer which is now

696
00:27:18,730 --> 00:27:22,080
the web browser right and then

697
00:27:22,080 --> 00:27:25,509
competition progress great browsers okay

698
00:27:25,509 --> 00:27:27,070
I mentioned the fact that I've seen a 3d

699
00:27:27,070 --> 00:27:28,509
game running in a browser right Unreal

700
00:27:28,509 --> 00:27:30,580
Tournament things are getting much

701
00:27:30,580 --> 00:27:30,999
better

702
00:27:30,999 --> 00:27:34,509
all right now the peer-to-peer networks

703
00:27:34,509 --> 00:27:37,049
are the opposite if you want of

704
00:27:37,049 --> 00:27:40,480
centralized systems in which there's

705
00:27:40,480 --> 00:27:42,129
really no more client and server

706
00:27:42,129 --> 00:27:43,600
everybody is both a client and a server

707
00:27:43,600 --> 00:27:46,179
at the same time and some sort of a more

708
00:27:46,179 --> 00:27:48,549
global collaboration now there are many

709
00:27:48,549 --> 00:27:50,440
reasons why you might want to do a

710
00:27:50,440 --> 00:27:52,629
peer-to-peer system right one of them is

711
00:27:52,629 --> 00:27:55,960
extreme resilience for example it's not

712
00:27:55,960 --> 00:27:59,289
enough to take down one or some

713
00:27:59,289 --> 00:28:00,609
of the servers to take down the entire

714
00:28:00,609 --> 00:28:02,409
service and this might be important for

715
00:28:02,409 --> 00:28:04,450
many many reasons so how many people

716
00:28:04,450 --> 00:28:08,919
know about Napster how many people know

717
00:28:08,919 --> 00:28:12,340
how Napster worked so the key question

718
00:28:12,340 --> 00:28:14,619
is one second the key question we have

719
00:28:14,619 --> 00:28:18,039
to ask is was Napster client-server

720
00:28:18,039 --> 00:28:19,899
architecture or was it some more

721
00:28:19,899 --> 00:28:23,940
peer-to-peer decentralized thing so

722
00:28:25,280 --> 00:28:29,580
right so Napster was the first service

723
00:28:29,580 --> 00:28:32,429
that wanted to be peer-to-peer but in

724
00:28:32,429 --> 00:28:36,539
fact part of it was client-server ok the

725
00:28:36,539 --> 00:28:37,650
question is which part we're going to

726
00:28:37,650 --> 00:28:38,850
come back to Napster later but which

727
00:28:38,850 --> 00:28:41,220
part was client-server so you see when

728
00:28:41,220 --> 00:28:42,900
it comes to finally when it comes to

729
00:28:42,900 --> 00:28:45,210
accessing resources we are going to see

730
00:28:45,210 --> 00:28:47,220
this later you need at least two kinds

731
00:28:47,220 --> 00:28:48,809
of activities one of them is to find the

732
00:28:48,809 --> 00:28:50,039
resource who has the resource and then

733
00:28:50,039 --> 00:28:52,590
to access it right now what Napster had

734
00:28:52,590 --> 00:28:55,470
is who had the resource who was the

735
00:28:55,470 --> 00:28:56,730
server that had the resource they were

736
00:28:56,730 --> 00:28:58,919
really having millions of servers

737
00:28:58,919 --> 00:29:00,480
because every client was a server as

738
00:29:00,480 --> 00:29:03,030
well ok if you wanted it to be but what it

739
00:29:03,030 --> 00:29:04,470
had as a pure client-server architecture

740
00:29:04,470 --> 00:29:07,860
was the name lookup the lookup between

741
00:29:07,860 --> 00:29:10,890
the name of the file some sort of a

742
00:29:10,890 --> 00:29:12,690
directory structure right the directory

743
00:29:12,690 --> 00:29:15,179
structure was client-server the content

744
00:29:15,179 --> 00:29:18,720
was peer-to-peer now you see if the

745
00:29:18,720 --> 00:29:20,640
directories so if you take care of the

746
00:29:20,640 --> 00:29:22,890
directory structure then you don't need

747
00:29:22,890 --> 00:29:24,570
much of an organization for where the

748
00:29:24,570 --> 00:29:26,309
content is because you just get a random

749
00:29:26,309 --> 00:29:27,900
IP address you go to that IP address and

750
00:29:27,900 --> 00:29:30,000
you're done and the TCP/IP protocol does

751
00:29:30,000 --> 00:29:32,490
its job and it's very easy to implement

752
00:29:32,490 --> 00:29:35,669
the content access just point-to-point

753
00:29:35,669 --> 00:29:39,390
connections by the way the network stack

754
00:29:39,390 --> 00:29:42,260
is designed so that every single

755
00:29:42,260 --> 00:29:44,610
computer can be both a client and a

756
00:29:44,610 --> 00:29:46,289
server at the same time namely it can

757
00:29:46,289 --> 00:29:49,500
both open connections and listen for

758
00:29:49,500 --> 00:29:51,059
connections to be opened and actually

759
00:29:51,059 --> 00:29:53,400
both are needed to even run what you

760
00:29:53,400 --> 00:29:55,260
would normally think of as being a normal

761
00:29:55,260 --> 00:29:59,400
client ok good now why is this important

762
00:29:59,400 --> 00:30:03,330
because there is then a very clear way

763
00:30:03,330 --> 00:30:05,159
to take down the Napster service

764
00:30:05,159 --> 00:30:06,900
what you do you shut down the name

765
00:30:06,900 --> 00:30:08,610
servers it doesn't matter that you have

766
00:30:08,610 --> 00:30:12,299
millions of actual servers that happen

767
00:30:12,299 --> 00:30:13,919
to also be clients and have content now

768
00:30:13,919 --> 00:30:17,659
nobody knows where the content is right
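The split described here can be sketched in a few lines, under stated assumptions: a central directory maps a file name to the addresses of peers that hold it (the client-server part), and the content is then fetched straight from one of those peers (the peer-to-peer part). The class names, the addresses, and the dictionary standing in for the network are all hypothetical; this is not Napster's actual protocol.

```python
class Peer:
    """A machine that is both client and server: it stores and serves files."""

    def __init__(self, address: str):
        self.address = address
        self.files = {}

    def serve(self, name: str) -> bytes:
        return self.files[name]


class Directory:
    """Central name lookup: file name -> addresses of peers that have it."""

    def __init__(self):
        self.index = {}
        self.online = True

    def register(self, name: str, address: str) -> None:
        self.index.setdefault(name, []).append(address)

    def lookup(self, name: str) -> list:
        if not self.online:
            raise ConnectionError("directory is down")
        return list(self.index.get(name, []))


def fetch(name: str, directory: Directory, network: dict) -> bytes:
    addresses = directory.lookup(name)          # step 1: ask the central server
    if not addresses:
        raise KeyError(name)
    return network[addresses[0]].serve(name)    # step 2: point-to-point transfer


directory = Directory()
peer = Peer("10.0.0.7")
peer.files["song.mp3"] = b"data"
directory.register("song.mp3", peer.address)
network = {peer.address: peer}  # stand-in for reaching a peer by IP address
```

Setting directory.online to False makes every fetch fail even though the peer still holds the file, which is the weakness pointed out above.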

769
00:30:17,659 --> 00:30:20,610
so that's in effect what happened I

770
00:30:20,610 --> 00:30:22,919
think in 2001 right the music industry

771
00:30:22,919 --> 00:30:24,510
was completely outraged

772
00:30:24,510 --> 00:30:26,220
I mean Napster was cool for a couple of

773
00:30:26,220 --> 00:30:28,919
years they started to have hundreds of

774
00:30:28,919 --> 00:30:32,250
millions of users then they shut down

775
00:30:32,250 --> 00:30:34,049
the name servers they only had about

776
00:30:34,049 --> 00:30:35,789
fourteen by the way and they were

777
00:30:35,789 --> 00:30:36,780
geographically distributed

778
00:30:36,780 --> 00:30:39,060
this is a classic approach to do some

779
00:30:39,060 --> 00:30:41,790
sort of balancing right load balancing

780
00:30:41,790 --> 00:30:45,660
to avoid having overwhelmed servers if you

781
00:30:45,660 --> 00:30:47,790
have millions of people looking up in

782
00:30:47,790 --> 00:30:51,630
in your name server right a single

783
00:30:51,630 --> 00:30:53,670
machine starts not to be so good even if

784
00:30:53,670 --> 00:30:55,320
you're just saying this machine with

785
00:30:55,320 --> 00:30:57,000
this IP address has the file that

786
00:30:57,000 --> 00:30:58,500
becomes overwhelming if you have

787
00:30:58,500 --> 00:31:00,690
millions of such requests flying at

788
00:31:00,690 --> 00:31:04,950
every every moment ok good so then what

789
00:31:04,950 --> 00:31:07,350
do you do you design systems that have

790
00:31:07,350 --> 00:31:10,560
absolutely no centralization really the

791
00:31:10,560 --> 00:31:16,130
goal there was not to do so because the

792
00:31:16,130 --> 00:31:18,210
Napster hit actually didn't work it

793
00:31:18,210 --> 00:31:20,460
worked perfectly I would say it was to

794
00:31:20,460 --> 00:31:21,780
do it because then it's virtually

795
00:31:21,780 --> 00:31:23,850
impossible to take it down so if you

796
00:31:23,850 --> 00:31:26,670
could remove the need for a centralized

797
00:31:26,670 --> 00:31:28,680
name service and you could have a true

798
00:31:28,680 --> 00:31:31,020
peer-to-peer means everybody can act

799
00:31:31,020 --> 00:31:32,640
both as a client and a server a true

800
00:31:32,640 --> 00:31:34,500
peer-to-peer system then essentially

801
00:31:34,500 --> 00:31:36,890
even if you take millions of nodes off

802
00:31:36,890 --> 00:31:39,450
the system supposedly is still running

803
00:31:39,450 --> 00:31:41,160
so a core interesting question we are

804
00:31:41,160 --> 00:31:42,690
going to see this later in the class is

805
00:31:42,690 --> 00:31:43,890
how do you design this peer-to-peer

806
00:31:43,890 --> 00:31:45,480
systems that can survive this massive

807
00:31:45,480 --> 00:31:47,730
removal of nodes and still kind of tick

808
00:31:47,730 --> 00:31:49,380
if you do that then it's virtually

809
00:31:49,380 --> 00:31:51,150
impossible to shut down the system right

810
00:31:51,150 --> 00:31:53,370
and in fact this is happening to a large

811
00:31:53,370 --> 00:31:55,050
extent with the torrent right you can

812
00:31:55,050 --> 00:31:56,790
shut one down another one springs up I

813
00:31:56,790 --> 00:31:58,260
mean it's everywhere or pure

814
00:31:58,260 --> 00:32:00,660
peer-to-peer systems ok now when it

815
00:32:00,660 --> 00:32:02,430
comes to these peer-to-peer systems you

816
00:32:02,430 --> 00:32:03,930
can ask a more structured question is

817
00:32:03,930 --> 00:32:05,780
how should we go about designing them

818
00:32:05,780 --> 00:32:08,250
can you be systematic in a certain way

819
00:32:08,250 --> 00:32:11,910
and this was really the darling

820
00:32:11,910 --> 00:32:13,710
research topic in distributed systems

821
00:32:13,710 --> 00:32:16,440
for a while so people came up with a

822
00:32:16,440 --> 00:32:18,870
number of interesting

823
00:32:18,870 --> 00:32:20,400
solutions for this one of them is this

824
00:32:20,400 --> 00:32:21,740
idea of a structured peer-to-peer

825
00:32:21,740 --> 00:32:24,480
network so in a peer-to-peer network you

826
00:32:24,480 --> 00:32:25,980
want to know about nodes but not about

827
00:32:25,980 --> 00:32:27,630
all the nodes you simply can't keep

828
00:32:27,630 --> 00:32:29,460
track of all the nodes in the system if

829
00:32:29,460 --> 00:32:32,240
you have a million nodes in the system

830
00:32:32,240 --> 00:32:34,290
accurately keeping track of a million

831
00:32:34,290 --> 00:32:36,300
nodes it's essentially an impossible

832
00:32:36,300 --> 00:32:38,280
task now I want you to understand that

833
00:32:38,280 --> 00:32:39,510
it's impossible not because you don't

834
00:32:39,510 --> 00:32:41,310
have enough memory to keep track of a

835
00:32:41,310 --> 00:32:42,510
million nodes so that's not a problem

836
00:32:42,510 --> 00:32:45,780
right now it's simply you can't possibly

837
00:32:45,780 --> 00:32:49,100
know what even ten guys for sure do let

838
00:32:49,100 --> 00:32:51,320
alone a million guys right so knowing

839
00:32:51,320 --> 00:32:53,090
which of the million are alive because

840
00:32:53,090 --> 00:32:55,009
if you contact servers that are not alive

841
00:32:55,009 --> 00:32:56,840
it's kind of tough or distributing that

842
00:32:56,840 --> 00:32:58,909
information right when another node

843
00:32:58,909 --> 00:33:00,769
joins if you really have a million nodes

844
00:33:00,769 --> 00:33:02,299
in and you have to tell this guy what

845
00:33:02,299 --> 00:33:04,610
all the million other guys do it starts

846
00:33:04,610 --> 00:33:06,940
to be problematic okay

847
00:33:06,940 --> 00:33:09,259
so the structured peer-to-peer networks

848
00:33:09,259 --> 00:33:13,820
came up with a very systematic way

849
00:33:13,820 --> 00:33:15,559
to decide which are the nodes you should

850
00:33:15,559 --> 00:33:17,600
know about which are the connections you

851
00:33:17,600 --> 00:33:20,029
should keep track of and in particular they

852
00:33:20,029 --> 00:33:21,139
were looking for just a logarithmic

853
00:33:21,139 --> 00:33:23,509
number of such connections right again

854
00:33:23,509 --> 00:33:26,059
by the way you can already see a little

855
00:33:26,059 --> 00:33:29,210
bit of big-O notation obsession creeping

856
00:33:29,210 --> 00:33:32,149
in why logarithmic this is a very

857
00:33:32,149 --> 00:33:35,690
legitimate question but logarithms are

858
00:33:35,690 --> 00:33:38,450
kind of very nice mostly because the

859
00:33:38,450 --> 00:33:40,490
theoreticians told us that logarithms

860
00:33:40,490 --> 00:33:42,139
are good and anything that's not a

861
00:33:42,139 --> 00:33:44,120
logarithm it's bad but I mean if you

862
00:33:44,120 --> 00:33:45,529
think about it even square root might be

863
00:33:45,529 --> 00:33:48,230
fine right square root of a million it's

864
00:33:48,230 --> 00:33:50,029
a thousand well a thousand is not such a

865
00:33:50,029 --> 00:33:52,159
bad number of nodes to keep track

866
00:33:52,159 --> 00:33:54,789
of I would say okay but nevertheless

867
00:33:54,789 --> 00:33:56,539
virtually all the structured peer-to-peer

868
00:33:56,539 --> 00:33:58,940
architectures like that logarithm and

869
00:33:58,940 --> 00:34:00,049
they're gonna keep track of only a

870
00:34:00,049 --> 00:34:02,600
logarithmic number of such nodes okay

871
00:34:02,600 --> 00:34:04,429
how they do that we are gonna go into

872
00:34:04,429 --> 00:34:06,169
many many details later in the class

873
00:34:06,169 --> 00:34:09,619
right yes by coming up with various tricks

874
00:34:09,619 --> 00:34:11,540
to be able to in fact compute which are

875
00:34:11,540 --> 00:34:13,969
the logarithmic number of neighbors

876
00:34:13,969 --> 00:34:19,099
you should actually have right almost

877
00:34:19,099 --> 00:34:20,719
all of them are based on an idea

878
00:34:20,719 --> 00:34:22,909
called the distributed hash table okay

879
00:34:22,909 --> 00:34:25,099
and that forms some kind of a virtual

880
00:34:25,099 --> 00:34:27,859
ring and jumping on this ring with steps

881
00:34:27,859 --> 00:34:29,359
of increasing size it essentially gives

882
00:34:29,359 --> 00:34:30,889
you the logarithmic number of steps okay

883
00:34:30,889 --> 00:34:32,418
can be proven has all kinds of nice

884
00:34:32,418 --> 00:34:34,040
properties as long as things don't

885
00:34:34,040 --> 00:34:35,540
change too fast when it becomes very

886
00:34:35,540 --> 00:34:37,129
problematic to analyze in any way shape

887
00:34:37,129 --> 00:34:37,750
or form
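The ring idea can be sketched in a few lines. This is a minimal illustration, not any particular system's algorithm: it assumes an idealized ring of 2**bits ids in which every id is occupied by a live node, and the names `finger_table` and `route` are made up for this sketch.

```python
def finger_table(node, bits):
    """Neighbors at clockwise distances 1, 2, 4, ... on a ring of 2**bits ids."""
    size = 1 << bits
    return [(node + (1 << k)) % size for k in range(bits)]


def route(start, target, bits):
    """Greedy routing: always take the largest jump that does not overshoot."""
    size = 1 << bits
    path = [start]
    current = start
    while current != target:
        distance = (target - current) % size
        # Largest power of two not exceeding the remaining clockwise distance.
        step = 1 << (distance.bit_length() - 1)
        current = (current + step) % size
        path.append(current)
    return path


if __name__ == "__main__":
    bits = 20  # roughly a million ids
    path = route(0, 999_999, bits)
    print(len(finger_table(0, bits)), "neighbors,", len(path) - 1, "hops")
```

Each node keeps only `bits` neighbors, and every hop clears the highest remaining bit of the distance, so a lookup takes at most `bits` hops. Nodes joining and leaving quickly is exactly what this idealized sketch leaves out.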

888
00:34:37,750 --> 00:34:41,449
okay another approach is to say you know

889
00:34:41,449 --> 00:34:43,639
what let's view the space of documents

890
00:34:43,639 --> 00:34:44,929
as some sort of a multi-dimensional

891
00:34:44,929 --> 00:34:46,879
space and let's cut it into pieces to

892
00:34:46,879 --> 00:34:49,668
determine who is responsible for every

893
00:34:49,668 --> 00:34:52,369
piece and then to find a way to navigate

894
00:34:52,369 --> 00:34:55,760
this space right so in any such network

895
00:34:55,760 --> 00:34:57,290
peer-to-peer or not peer-to-peer the

896
00:34:57,290 --> 00:34:59,180
question is how can you find a resource

897
00:34:59,180 --> 00:35:00,980
by name who actually has the resource by

898
00:35:00,980 --> 00:35:01,530
name and then

899
00:35:01,530 --> 00:35:03,630
how can you grab the resource grabbing

900
00:35:03,630 --> 00:35:04,890
the resource it's easy if you can do

901
00:35:04,890 --> 00:35:06,690
these point-to-point connections so I kind

902
00:35:06,690 --> 00:35:08,040
of keep on talking about point-to-point

903
00:35:08,040 --> 00:35:10,260
connections versus some sort of a lookup

904
00:35:10,260 --> 00:35:13,170
right point-to-point connections are by

905
00:35:13,170 --> 00:35:17,520
far the best approach right unless the

906
00:35:17,520 --> 00:35:19,200
network becomes overwhelming to get data

907
00:35:19,200 --> 00:35:21,090
torrents do something interesting there

908
00:35:21,090 --> 00:35:23,910
right which is files get cached in

909
00:35:23,910 --> 00:35:26,400
multiple places to alleviate network

910
00:35:26,400 --> 00:35:28,050
strangulation right we're gonna come

911
00:35:28,050 --> 00:35:29,640
back to this as well this is also some

912
00:35:29,640 --> 00:35:31,170
sort of a distributed systems strategy

913
00:35:31,170 --> 00:35:33,630
right so CAN is another such system in

914
00:35:33,630 --> 00:35:35,970
which some sort of a partitioning of a

915
00:35:35,970 --> 00:35:37,020
high dimensional space is actually

916
00:35:37,020 --> 00:35:39,870
happening and then things are easy for

917
00:35:39,870 --> 00:35:42,000
a while right for example if a new node

918
00:35:42,000 --> 00:35:45,540
joins in right you have this

919
00:35:45,540 --> 00:35:47,420
partitioning and a new node joins in and

920
00:35:47,420 --> 00:35:49,470
randomly picks a location in the space

921
00:35:49,470 --> 00:35:50,970
and it picks this location then you have

922
00:35:50,970 --> 00:35:53,480
to split this location you cut it in two

923
00:35:53,480 --> 00:35:55,770
so the new node and the old node that

924
00:35:55,770 --> 00:35:58,020
owned the entire location communicate and

925
00:35:58,020 --> 00:36:00,300
partition say the set of documents

926
00:36:00,300 --> 00:36:01,830
right and you partition the space but

927
00:36:01,830 --> 00:36:03,380
then the trouble is when this guy dies

928
00:36:03,380 --> 00:36:06,030
how do you put together such pieces

929
00:36:06,030 --> 00:36:07,740
they're not nice rectangles anymore all

930
00:36:07,740 --> 00:36:09,480
kinds of complications potentially right
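The join step described here can be sketched as follows. This is a simplified two-dimensional illustration under assumptions of its own: `Zone`, `split`, `owner_of` and `join` are hypothetical names, the space is the unit square, and the new node simply takes the second half of the split zone. Node departure, the hard part, is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def split(zone):
    """Cut a zone in two along its longer side; returns (kept, handed_over)."""
    if zone.x1 - zone.x0 >= zone.y1 - zone.y0:
        mid = (zone.x0 + zone.x1) / 2
        return (Zone(zone.x0, zone.y0, mid, zone.y1),
                Zone(mid, zone.y0, zone.x1, zone.y1))
    mid = (zone.y0 + zone.y1) / 2
    return (Zone(zone.x0, zone.y0, zone.x1, mid),
            Zone(zone.x0, mid, zone.x1, zone.y1))


def owner_of(zones, x, y):
    """Find the node whose zone contains the point (x, y)."""
    for node, zone in zones.items():
        if zone.contains(x, y):
            return node
    raise ValueError("point outside the space")


def join(zones, new_node, x, y):
    """New node picks a point; the current owner splits its zone with it."""
    old_node = owner_of(zones, x, y)
    kept, handed_over = split(zones[old_node])
    zones[old_node] = kept
    zones[new_node] = handed_over
    return old_node
```

Documents are mapped to points in the same space, so after a split each document's owner is whatever `owner_of` returns for its point; the old and new node only need to move the documents that fall in the handed-over half.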

931
00:36:09,480 --> 00:36:11,760
so inevitably what's going to happen

932
00:36:11,760 --> 00:36:13,860
is any such solutions are going to have

933
00:36:13,860 --> 00:36:15,330
some very nice properties under certain

934
00:36:15,330 --> 00:36:17,250
circumstances and some weird situations

935
00:36:17,250 --> 00:36:19,410
to deal with in other circumstances okay

936
00:36:19,410 --> 00:36:21,270
now one of your projects is going to be

937
00:36:21,270 --> 00:36:22,650
to implement simulate one of these

938
00:36:22,650 --> 00:36:24,180
things using actors so it's gonna

939
00:36:24,180 --> 00:36:26,480
be quite a lot of fun

940
00:36:26,480 --> 00:36:30,600
all right now something that you might

941
00:36:30,600 --> 00:36:32,190
have seen in a networking class whenever

942
00:36:32,190 --> 00:36:34,370
it comes to communication you have to

943
00:36:34,370 --> 00:36:37,950
establish a so-called protocol right for

944
00:36:37,950 --> 00:36:39,840
example it's easy to say an actor talks

945
00:36:39,840 --> 00:36:41,940
to another actor but the question of

946
00:36:41,940 --> 00:36:44,640
course is what do they talk about what's

947
00:36:44,640 --> 00:36:47,250
in that message what one actor tells the

948
00:36:47,250 --> 00:36:49,860
other actor now that's important why

949
00:36:49,860 --> 00:36:52,290
because if an actor tells something to

950
00:36:52,290 --> 00:36:54,360
another actor then the actor that sent

951
00:36:54,360 --> 00:36:55,830
the message can assume something about what the

952
00:36:55,830 --> 00:36:57,450
other guy at least will know after a

953
00:36:57,450 --> 00:37:00,600
while all right and a lot of distributed

954
00:37:00,600 --> 00:37:03,690
systems are about who knows what so what

955
00:37:03,690 --> 00:37:05,070
can you assume that the other party

956
00:37:05,070 --> 00:37:08,220
knows right because if you can assume

957
00:37:08,220 --> 00:37:10,440
about what the other party knows you can

958
00:37:10,440 --> 00:37:12,270
have a certain expectation for what

959
00:37:12,270 --> 00:37:13,380
would happen when you ask a certain

960
00:37:13,380 --> 00:37:14,940
question and ultimately it's about

961
00:37:14,940 --> 00:37:15,460
getting

962
00:37:15,460 --> 00:37:18,570
something from from somewhere right so

963
00:37:18,570 --> 00:37:21,280
for example in a structured peer-to-peer

964
00:37:21,280 --> 00:37:24,190
network you can essentially figure out

965
00:37:24,190 --> 00:37:25,360
who should have a certain document

966
00:37:25,360 --> 00:37:27,760
because it's placed using certain rules

967
00:37:27,760 --> 00:37:29,650
and then the protocol can reflect that

968
00:37:29,650 --> 00:37:35,920
right now at the opposite end well there

969
00:37:35,920 --> 00:37:37,690
are many opposite ends but you can

970
00:37:37,690 --> 00:37:39,070
have the structured peer-to-peer

971
00:37:39,070 --> 00:37:42,040
network and that requires using hashes

972
00:37:42,040 --> 00:37:43,360
in a very clever way one way or another

973
00:37:43,360 --> 00:37:44,770
and you can have purely unstructured

974
00:37:44,770 --> 00:37:48,730
ones in which it's truly ad-hoc I think

975
00:37:48,730 --> 00:37:52,750
Kazaa was like this right so Napster had a big

976
00:37:52,750 --> 00:37:54,370
problem with the servers and Kazaa came

977
00:37:54,370 --> 00:37:56,110
along in which you simply had a

978
00:37:56,110 --> 00:37:57,930
completely ad hoc

979
00:37:57,930 --> 00:37:59,800
association all you had to know is

980
00:37:59,800 --> 00:38:02,380
another client that was part of

981
00:38:02,380 --> 00:38:04,060
the network and from that guy you got

982
00:38:04,060 --> 00:38:05,860
information about what he knows partial

983
00:38:05,860 --> 00:38:07,810
information what he knows so you can

984
00:38:07,810 --> 00:38:10,540
come up with higher-level ideas right

985
00:38:10,540 --> 00:38:13,870
in particular this idea that what you

986
00:38:13,870 --> 00:38:16,990
know about other peers in a peer-to-peer

987
00:38:16,990 --> 00:38:18,900
system is some sort of a partial view

988
00:38:18,900 --> 00:38:21,700
right so essentially then you're saying

989
00:38:21,700 --> 00:38:23,770
hey there is some kind of a global view

990
00:38:23,770 --> 00:38:25,180
in the system it's what the collection of

991
00:38:25,180 --> 00:38:28,300
all the peers know but no specific peer

992
00:38:28,300 --> 00:38:30,310
is gonna have a global view each of

993
00:38:30,310 --> 00:38:31,450
them are going to have only a partial

994
00:38:31,450 --> 00:38:32,560
view so they know only about a

995
00:38:32,560 --> 00:38:34,690
subset of such nodes when they exchange

996
00:38:34,690 --> 00:38:36,070
messages they could exchange information

997
00:38:36,070 --> 00:38:38,290
about those partial views right so this

998
00:38:38,290 --> 00:38:41,080
is a particular strategy in which when a

999
00:38:41,080 --> 00:38:43,930
new node joins in or when you have one

1000
00:38:43,930 --> 00:38:46,570
want to talk about let's exchange a little

1001
00:38:46,570 --> 00:38:48,010
bit about what we know about in the

1002
00:38:48,010 --> 00:38:49,840
system and not really just going and

1003
00:38:49,840 --> 00:38:51,970
getting those files but trying to

1004
00:38:51,970 --> 00:38:53,290
maintain some sort of information about

1005
00:38:53,290 --> 00:38:55,360
connectivity right then essentially what

1006
00:38:55,360 --> 00:38:56,710
I can do is I can tell you what I know

1007
00:38:56,710 --> 00:38:58,990
about my partial view and you can

1008
00:38:58,990 --> 00:39:00,580
combine it with your partial view now

1009
00:39:00,580 --> 00:39:02,410
what you do need is some mechanism to

1010
00:39:02,410 --> 00:39:03,640
limit the amount of information you have

1011
00:39:03,640 --> 00:39:05,920
to keep track of you can say hey I want

1012
00:39:05,920 --> 00:39:07,900
to know about c peers c is some sort of

1013
00:39:07,900 --> 00:39:09,700
a constant okay maybe it's logarithmic

1014
00:39:09,700 --> 00:39:11,380
maybe it's more than logarithmic whatever it

1015
00:39:11,380 --> 00:39:13,690
is so essentially what I can do is say

1016
00:39:13,690 --> 00:39:15,070
hey when I communicate with you I'll

1017
00:39:15,070 --> 00:39:20,560
pick randomly c over two half of the

1018
00:39:20,560 --> 00:39:22,210
nodes I know about and I'm gonna send

1019
00:39:22,210 --> 00:39:23,650
you a message and tell you about those

1020
00:39:23,650 --> 00:39:25,420
nodes and essentially what you could do

1021
00:39:25,420 --> 00:39:27,430
is throw away half your information and

1022
00:39:27,430 --> 00:39:28,750
keep the half you got

1023
00:39:28,750 --> 00:39:30,130
from me and that will provide some kind

1024
00:39:30,130 --> 00:39:31,870
of a dynamic component into the system

1025
00:39:31,870 --> 00:39:34,120
that propagates the information not

1026
00:39:34,120 --> 00:39:36,430
quite clear why this is good but

1027
00:39:36,430 --> 00:39:38,320
potentially might be interesting right
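One way to picture this exchange is the sketch below. It is only an illustration of the "send half, keep half" idea as described; the function name `exchange`, the bound `c`, and the explicit `rng` argument are assumptions made for this example, not part of any specific protocol.

```python
import random


def exchange(sender_view, receiver_view, receiver_id, c, rng):
    """Return the receiver's new partial view after one gossip message.

    The sender picks c // 2 random peers it knows about; the receiver
    throws away all but c // 2 of its own entries and adds what it was sent.
    """
    half = c // 2
    sent = rng.sample(sorted(sender_view), min(half, len(sender_view)))
    kept = rng.sample(sorted(receiver_view), min(half, len(receiver_view)))
    merged = set(kept) | set(sent)
    merged.discard(receiver_id)  # a node does not list itself as a neighbor
    return merged


if __name__ == "__main__":
    rng = random.Random(42)
    print(exchange({1, 2, 3, 4}, {5, 6, 7, 8}, receiver_id=9, c=4, rng=rng))
```

The view never grows beyond `c` entries, yet every exchange replaces part of it with fresh entries, which is the dynamic component mentioned above.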

1028
00:39:38,320 --> 00:39:39,790
well it turns out that this kind of

1029
00:39:39,790 --> 00:39:41,470
protocols these randomized protocols

1030
00:39:41,470 --> 00:39:45,160
right in which you randomly select some

1031
00:39:45,160 --> 00:39:46,540
parts of information and possibly even

1032
00:39:46,540 --> 00:39:47,890
randomly select who you send the

1033
00:39:47,890 --> 00:39:49,750
information to these kinds of protocols

1034
00:39:49,750 --> 00:39:51,910
are extremely resilient now it's very

1035
00:39:51,910 --> 00:39:53,800
hard to understand intuitively why

1036
00:39:53,800 --> 00:39:54,490
that's the case

1037
00:39:54,490 --> 00:39:57,970
and it's hard theoretically to prove why

1038
00:39:57,970 --> 00:39:59,860
that's the case but it can be proved

1039
00:39:59,860 --> 00:40:01,480
theoretically right we are going to come

1040
00:40:01,480 --> 00:40:03,250
back to this later these are so-called

1041
00:40:03,250 --> 00:40:06,280
gossip-like algorithms that have

1042
00:40:06,280 --> 00:40:08,230
extremely good properties if you truly

1043
00:40:08,230 --> 00:40:10,600
pick a random node to exchange

1044
00:40:10,600 --> 00:40:12,940
information with then for example if you

1045
00:40:12,940 --> 00:40:18,310
just want to say want one single one

1046
00:40:18,310 --> 00:40:20,770
single message right let's all agree

1047
00:40:20,770 --> 00:40:23,290
that something happened you can have

1048
00:40:23,290 --> 00:40:25,780
this kind of virus-like dissemination

1049
00:40:25,780 --> 00:40:28,630
that will run at exponential speed and

1050
00:40:28,630 --> 00:40:30,970
there are a lot of circumstances and

1051
00:40:30,970 --> 00:40:32,050
we're going to come back to this when we

1052
00:40:32,050 --> 00:40:33,690
talk about various networks right

1053
00:40:33,690 --> 00:40:35,590
exponential speed is very good in

1054
00:40:35,590 --> 00:40:38,530
general right that means in a

1055
00:40:38,530 --> 00:40:39,970
logarithmic number of steps everybody

1056
00:40:39,970 --> 00:40:42,310
knows about what happened now this is a

1057
00:40:42,310 --> 00:40:44,470
good logarithm hopefully as long as it

1058
00:40:44,470 --> 00:40:46,500
doesn't have a large constant okay so
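A small simulation gives a feel for the logarithmic number of steps. This is a toy model under assumptions of its own: synchronous rounds, every informed node pushing the news to one node picked uniformly at random, and a hypothetical function name `rounds_to_inform_all`.

```python
import random


def rounds_to_inform_all(n, rng):
    """Count push-gossip rounds until all n nodes have heard the news."""
    informed = {0}  # node 0 starts the rumor
    rounds = 0
    while len(informed) < n:
        # Every informed node tells one uniformly random node this round.
        targets = {rng.randrange(n) for _ in informed}
        informed |= targets
        rounds += 1
    return rounds


if __name__ == "__main__":
    rng = random.Random(7)
    for n in (1_000, 10_000, 100_000):
        print(n, "nodes:", rounds_to_inform_all(n, rng), "rounds")
```

The informed set can at most double per round, so at least log2(n) rounds are needed; in this toy model the observed count stays within a small multiple of that, which is the constant the lecture is worried about.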

1059
00:40:46,500 --> 00:40:49,870
strange as it might seem a lot of these

1060
00:40:49,870 --> 00:40:53,980
distributed systems use purely

1061
00:40:53,980 --> 00:40:55,630
randomized algorithms because they tend

1062
00:40:55,630 --> 00:40:57,160
to have a very good average behavior

1063
00:40:57,160 --> 00:41:01,180
okay well I'm gonna mention this later

1064
00:41:01,180 --> 00:41:03,460
when we actually see specifics but in

1065
00:41:03,460 --> 00:41:05,500
fact so how many of you took any class

1066
00:41:05,500 --> 00:41:07,120
that had some kind of randomization in

1067
00:41:07,120 --> 00:41:09,520
it sometimes the randomized algorithms

1068
00:41:09,520 --> 00:41:11,080
are taught a little bit as part of the

1069
00:41:11,080 --> 00:41:14,590
algorithms class right there is even

1070
00:41:14,590 --> 00:41:16,210
weirder stuff with pseudo random but

1071
00:41:16,210 --> 00:41:16,960
that's a different story

1072
00:41:16,960 --> 00:41:20,640
okay well you'll see some in this class

1073
00:41:20,640 --> 00:41:23,440
the important thing is a relatively

1074
00:41:23,440 --> 00:41:24,790
small amount of information for each of

1075
00:41:24,790 --> 00:41:26,830
the peers together with that complicated

1076
00:41:26,830 --> 00:41:28,300
theory will guarantee that you have good

1077
00:41:28,300 --> 00:41:30,190
properties in the global system okay and

1078
00:41:30,190 --> 00:41:32,370
such an unstructured peer-to-peer

1079
00:41:32,370 --> 00:41:34,360
protocol could be this one right with

1080
00:41:34,360 --> 00:41:36,130
the push mode or pull mode and this one

1081
00:41:36,130 --> 00:41:37,660
actually stabilizes quite fast if you

1082
00:41:37,660 --> 00:41:40,000
share randomly with random neighbors I

1083
00:41:40,000 --> 00:41:42,400
share half my information very fast

1084
00:41:42,400 --> 00:41:44,319
I know about very faraway neighbors and

1085
00:41:44,319 --> 00:41:46,450
that's a good thing because if you know

1086
00:41:46,450 --> 00:41:48,010
about very faraway neighbors you're

1087
00:41:48,010 --> 00:41:49,779
gonna get that logarithmic behavior in

1088
00:41:49,779 --> 00:41:51,700
terms of finding something very fast

1089
00:41:51,700 --> 00:41:54,309
okay now when everything else fails if

1090
00:41:54,309 --> 00:41:55,690
you want to find something you can do

1091
00:41:55,690 --> 00:41:57,369
something that's considered to be the

1092
00:41:57,369 --> 00:41:59,349
worst of the worst which is flood the

1093
00:41:59,349 --> 00:42:01,150
network you essentially start yelling

1094
00:42:01,150 --> 00:42:04,359
who knows where this file is you tell

1095
00:42:04,359 --> 00:42:05,890
all your neighbors your neighbors tell

1096
00:42:05,890 --> 00:42:07,270
their neighbors and the neighbors of

1097
00:42:07,270 --> 00:42:08,829
the neighbors eventually it gets to who has

1098
00:42:08,829 --> 00:42:11,859
the file right that's one way to get

1099
00:42:11,859 --> 00:42:14,500
information of course to get that one

1100
00:42:14,500 --> 00:42:16,930
piece of the information with where is

1101
00:42:16,930 --> 00:42:19,359
that file you basically have to disrupt

1102
00:42:19,359 --> 00:42:21,730
everybody everybody has to know about

1103
00:42:21,730 --> 00:42:23,410
the fact that you are looking for a certain

1104
00:42:23,410 --> 00:42:24,609
item and now of course if the item

1105
00:42:24,609 --> 00:42:25,809
it's important maybe that's warranted

1106
00:42:25,809 --> 00:42:28,390
it's like an Amber Alert right but you

1107
00:42:28,390 --> 00:42:30,279
already start seeing it can

1108
00:42:30,279 --> 00:42:32,200
get worse than this right because the

1109
00:42:32,200 --> 00:42:33,819
yelling can propagate in waves and keep

1110
00:42:33,819 --> 00:42:37,089
on bouncing off the network if you

1111
00:42:37,089 --> 00:42:38,980
have no way to cut down on those

1112
00:42:38,980 --> 00:42:40,869
messages and make them die somehow they

1113
00:42:40,869 --> 00:42:44,650
can live for a very long time right for

1114
00:42:44,650 --> 00:42:45,970
example if you wouldn't remember that

1115
00:42:45,970 --> 00:42:48,760
you've seen this message if you send a

1116
00:42:48,760 --> 00:42:50,410
message to your neighbors your neighbors

1117
00:42:50,410 --> 00:42:51,640
to their neighbors but you're part of

1118
00:42:51,640 --> 00:42:52,869
the neighbors of they get you back the

1119
00:42:52,869 --> 00:42:54,390
message and everybody keeps on just

1120
00:42:54,390 --> 00:42:56,260
propagating these messages more and

1121
00:42:56,260 --> 00:42:57,910
more and more and more and more and then

1122
00:42:57,910 --> 00:43:00,460
everybody doesn't do I mean all the

1123
00:43:00,460 --> 00:43:01,869
messages in the system are just about

1124
00:43:01,869 --> 00:43:03,849
this flooding this doesn't even have a

1125
00:43:03,849 --> 00:43:06,609
way to die out right now you might say

1126
00:43:06,609 --> 00:43:09,490
hey we must do something about it but

1127
00:43:09,490 --> 00:43:10,839
sometimes it's tricky to do something

1128
00:43:10,839 --> 00:43:11,230
about it

1129
00:43:11,230 --> 00:43:13,599
so one particular solution for example

1130
00:43:13,599 --> 00:43:15,220
I'm just throwing it out there is some

1131
00:43:15,220 --> 00:43:18,010
sort of a time to live right for any

1132
00:43:18,010 --> 00:43:20,109
such message you include the counter and

1133
00:43:20,109 --> 00:43:22,150
you say this message can be

1134
00:43:22,150 --> 00:43:24,839
retransmitted only let's say 20 times

1135
00:43:24,839 --> 00:43:27,940
every time the message gets sent again

1136
00:43:27,940 --> 00:43:30,190
you decrease the counter the counter

1137
00:43:30,190 --> 00:43:31,960
gets to zero you stop sending a message

1138
00:43:31,960 --> 00:43:34,420
that will make the message die it cannot

1139
00:43:34,420 --> 00:43:37,750
propagate more than 20 rounds right now of

1140
00:43:37,750 --> 00:43:39,910
course the question is why 20 what's

1141
00:43:39,910 --> 00:43:43,529
special about 20 or how many do you need

1142
00:43:43,529 --> 00:43:47,200
right so then you have a solution but

1143
00:43:47,200 --> 00:43:48,730
it's only a partial solution because it

1144
00:43:48,730 --> 00:43:50,349
depends on magic parameters that you

1145
00:43:50,349 --> 00:43:51,339
still have to figure out how you

1146
00:43:51,339 --> 00:43:52,869
actually select so if that number is

1147
00:43:52,869 --> 00:43:54,430
large you're guaranteed that everybody

1148
00:43:54,430 --> 00:43:55,599
hears about it with

1149
00:43:55,599 --> 00:43:57,549
very high probability but then you do

1150
00:43:57,549 --> 00:43:59,109
more disruption in the network if the

1151
00:43:59,109 --> 00:44:01,839
number is small then maybe there are

1152
00:44:01,839 --> 00:44:03,910
circumstances in which the right people

1153
00:44:03,910 --> 00:44:05,229
don't hear about it but at least you

1154
00:44:05,229 --> 00:44:06,579
don't have so many messages going in the

1155
00:44:06,579 --> 00:44:08,589
system right it's very very tricky
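The time-to-live counter and its trade-off can be sketched like this. It is a minimal model, assuming the network is given as an adjacency dictionary and that nodes remember which query they have already seen; `flood` is a hypothetical name for this example.

```python
from collections import deque


def flood(graph, source, ttl):
    """Return (nodes reached, messages sent) for a flood limited by a TTL."""
    reached = {source}
    messages = 0
    queue = deque([(source, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue  # counter hit zero: stop retransmitting
        for neighbor in graph[node]:
            messages += 1
            if neighbor not in reached:  # nodes remember what they have seen
                reached.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return reached, messages


if __name__ == "__main__":
    # Eight nodes on a ring, each connected to its two neighbors.
    ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
    for ttl in (2, 4):
        reached, messages = flood(ring, 0, ttl)
        print("ttl", ttl, "reached", len(reached), "messages", messages)
```

A small counter keeps the message count down but can miss the node that has the file; a large one reaches everybody at the cost of more traffic. Picking the value is the magic parameter the lecture refers to.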

1156
00:44:08,589 --> 00:44:10,539
business that needs to be treated

1157
00:44:10,539 --> 00:44:13,749
possibly theoretically right by the way

1158
00:44:13,749 --> 00:44:17,170
the theory on such systems gets

1159
00:44:17,170 --> 00:44:19,390
extremely hard even under idealized

1160
00:44:19,390 --> 00:44:22,869
modelling conditions extremely hard even

1161
00:44:22,869 --> 00:44:24,910
partial success in doing some sort of a

1162
00:44:24,910 --> 00:44:26,589
theory and definitely distributed

1163
00:44:26,589 --> 00:44:27,970
systems literature is not the realm of

1164
00:44:27,970 --> 00:44:29,589
this right even partial successes are

1165
00:44:29,589 --> 00:44:33,039
kind of celebrated as a big

1166
00:44:33,039 --> 00:44:36,789
breakthrough right all right so this is

1167
00:44:36,789 --> 00:44:41,400
one such protocol right then you can

1168
00:44:41,400 --> 00:44:43,509
write some interesting research papers

1169
00:44:43,509 --> 00:44:44,950
in which you argue that even if you do

1170
00:44:44,950 --> 00:44:46,329
an unstructured protocol if you put an

1171
00:44:46,329 --> 00:44:47,739
extra condition it starts to look like a

1172
00:44:47,739 --> 00:44:49,180
structured protocol right I'm not gonna

1173
00:44:49,180 --> 00:44:50,890
go into all the details but if you're a

1174
00:44:50,890 --> 00:44:52,569
little bit picky about how you select

1175
00:44:52,569 --> 00:44:53,920
that half information that you send to

1176
00:44:53,920 --> 00:44:55,509
somebody else and you have some sort of

1177
00:44:55,509 --> 00:44:57,970
a measure for that then you can actually

1178
00:44:57,970 --> 00:45:00,519
see that from everybody with everybody

1179
00:45:00,519 --> 00:45:02,229
connection that the very unstructured

1180
00:45:02,229 --> 00:45:04,269
network might look like you can get more

1181
00:45:04,269 --> 00:45:05,559
and more structure and in the end look

1182
00:45:05,559 --> 00:45:07,420
very structured if you just have a

1183
00:45:07,420 --> 00:45:09,460
particular preference in how you select

1184
00:45:09,460 --> 00:45:10,989
the neighbors you want to retain over

1185
00:45:10,989 --> 00:45:14,499
time so the network can go from very

1186
00:45:14,499 --> 00:45:16,479
intertwined to a lot more structured

1187
00:45:16,479 --> 00:45:20,499
just by being picky about what

1188
00:45:20,499 --> 00:45:21,969
information you keep so it's kind of an

1189
00:45:21,969 --> 00:45:23,710
interesting result that says you don't

1190
00:45:23,710 --> 00:45:25,089
really need to be structured from the

1191
00:45:25,089 --> 00:45:26,289
beginning all you need to do is add

1192
00:45:26,289 --> 00:45:28,529
pickiness to unstructured and it will

1193
00:45:28,529 --> 00:45:31,660
form some sort of a structure now if you

1194
00:45:31,660 --> 00:45:33,369
think about this kind of a network then

1195
00:45:33,369 --> 00:45:34,989
it's not fantastic because to get from

1196
00:45:34,989 --> 00:45:36,609
one end to another for example this guy

1197
00:45:36,609 --> 00:45:38,950
wants to find a file that's here and you

1198
00:45:38,950 --> 00:45:42,099
have to go through many many hops to get

1199
00:45:42,099 --> 00:45:45,039
there so an interesting question would

1200
00:45:45,039 --> 00:45:46,630
be hey I might want to have a structured

1201
00:45:46,630 --> 00:45:49,809
network but with some other links how

1202
00:45:49,809 --> 00:45:51,219
many such links would I need to be able

1203
00:45:51,219 --> 00:45:52,960
to traverse the network fast that's a

1204
00:45:52,960 --> 00:45:54,190
theoretical question and there are some

1205
00:45:54,190 --> 00:45:56,079
interesting solutions to that okay but

1206
00:45:56,079 --> 00:45:57,460
all of them are hard I'm just gonna mention

1207
00:45:57,460 --> 00:45:58,569
this kind of results you're not gonna

1208
00:45:58,569 --> 00:46:00,400
start proving anything about random

1209
00:46:00,400 --> 00:46:03,520
networks in this class okay

1210
00:46:03,520 --> 00:46:06,610
all right now I said peer-to-peer

1211
00:46:06,610 --> 00:46:09,610
network the peer in peer to peer means

1212
00:46:09,610 --> 00:46:12,640
everybody is equal right but sometimes

1213
00:46:12,640 --> 00:46:14,980
it's healthy to have some peers to be

1214
00:46:14,980 --> 00:46:16,600
more equal than some other peers and

1215
00:46:16,600 --> 00:46:18,490
that's where the super peers come into

1216
00:46:18,490 --> 00:46:18,880
play

1217
00:46:18,880 --> 00:46:21,280
I want you to understand that a lot of

1218
00:46:21,280 --> 00:46:25,840
these solutions came from evolution of

1219
00:46:25,840 --> 00:46:28,360
particular systems right so I

1220
00:46:28,360 --> 00:46:30,160
think Kazaa initially was simply a

1221
00:46:30,160 --> 00:46:31,810
peer-to-peer system but then they figured

1222
00:46:31,810 --> 00:46:34,090
out that certain people have much better

1223
00:46:34,090 --> 00:46:35,740
computers than other people and much

1224
00:46:35,740 --> 00:46:37,270
better network connections which is even

1225
00:46:37,270 --> 00:46:40,420
more important hey why don't we give

1226
00:46:40,420 --> 00:46:42,220
more work to guys that have better

1227
00:46:42,220 --> 00:46:43,600
network connections and better computers

1228
00:46:43,600 --> 00:46:46,090
and that's how super peers popped up so

1229
00:46:46,090 --> 00:46:47,619
the super peer simply can take a lot

1230
00:46:47,619 --> 00:46:49,600
more of the load and it depends on if

1231
00:46:49,600 --> 00:46:51,190
you want to do some sort of a gradation

1232
00:46:51,190 --> 00:46:52,750
between peers and super peers or simply

1233
00:46:52,750 --> 00:46:54,580
have two classes peers and super peers

1234
00:46:54,580 --> 00:46:56,440
now the trick is the following to still

1235
00:46:56,440 --> 00:46:57,790
have enough super peers so you can't

1236
00:46:57,790 --> 00:47:00,250
shut down the system the original Napster

1237
00:47:00,250 --> 00:47:02,860
system had essentially 14 super peers

1238
00:47:02,860 --> 00:47:04,119
which were the name servers and

1239
00:47:04,119 --> 00:47:05,650
everybody else was a normal peer the

1240
00:47:05,650 --> 00:47:07,150
super peers were just serving the names

1241
00:47:07,150 --> 00:47:13,210
you shut down oh and the Napster company

1242
00:47:13,210 --> 00:47:15,250
owned all of the 14 so all you need to

1243
00:47:15,250 --> 00:47:17,380
do is get some sort of a court

1244
00:47:17,380 --> 00:47:20,050
injunction since that company has to

1245
00:47:20,050 --> 00:47:21,760
work in a legitimate way you shut down

1246
00:47:21,760 --> 00:47:23,230
the entire system now if you have a

1247
00:47:23,230 --> 00:47:26,650
million super peers and a hundred

1248
00:47:26,650 --> 00:47:27,880
million normal peers

1249
00:47:27,880 --> 00:47:29,830
good luck chasing down the million super

1250
00:47:29,830 --> 00:47:32,290
peers right it's like flies you kill

1251
00:47:32,290 --> 00:47:36,700
one more spring up right and this

1252
00:47:36,700 --> 00:47:38,140
literally happens with a lot of these

1253
00:47:38,140 --> 00:47:42,760
services okay now these are interesting

1254
00:47:42,760 --> 00:47:44,920
ideas that we might want to use even

1255
00:47:44,920 --> 00:47:46,810
beyond just normal peer-to-peer systems

1256
00:47:46,810 --> 00:47:48,250
right if you want to have these super

1257
00:47:48,250 --> 00:47:50,830
resilient networks for example an

1258
00:47:50,830 --> 00:47:52,000
interesting thing I didn't talk about

1259
00:47:52,000 --> 00:47:54,040
and it's interesting in itself is this

1260
00:47:54,040 --> 00:47:57,190
idea of sensor networks so we keep on

1261
00:47:57,190 --> 00:47:58,030
talking about computers computers

1262
00:47:58,030 --> 00:47:59,590
computers you have computers everywhere

1263
00:47:59,590 --> 00:48:01,480
computers mostly consume data and

1264
00:48:01,480 --> 00:48:03,240
somebody has to produce the data somehow

1265
00:48:03,240 --> 00:48:06,970
how well we can all say yes it

1266
00:48:06,970 --> 00:48:08,470
comes from databases and somebody

1267
00:48:08,470 --> 00:48:09,850
bothered to put it there but another way

1268
00:48:09,850 --> 00:48:12,580
to say is something measures it and

1269
00:48:12,580 --> 00:48:14,680
those are the sensors and that can be a

1270
00:48:14,680 --> 00:48:16,079
very interesting

1271
00:48:16,079 --> 00:48:19,859
kind of endeavor you don't device for

1272
00:48:19,859 --> 00:48:22,289
example by the way this is coming to

1273
00:48:22,289 --> 00:48:24,269
some extent it's already there but it's

1274
00:48:24,269 --> 00:48:26,130
possibly coming in a big way because

1275
00:48:26,130 --> 00:48:27,779
they're gonna run out of gizmos to put

1276
00:48:27,779 --> 00:48:30,209
in a cell phone soon your cell phone is

1277
00:48:30,209 --> 00:48:34,729
gonna be able to measure temperature

1278
00:48:34,729 --> 00:48:38,670
pressure possibly even your heart rate

1279
00:48:38,670 --> 00:48:40,199
when you hold it and stuff like that

1280
00:48:40,199 --> 00:48:42,779
right and then suddenly you you can ask

1281
00:48:42,779 --> 00:48:44,910
questions like what am I gonna do with

1282
00:48:44,910 --> 00:48:45,839
that information that will be

1283
00:48:45,839 --> 00:48:47,099
interesting by the way that could

1284
00:48:47,099 --> 00:48:48,599
provide very interesting data for

1285
00:48:48,599 --> 00:48:49,739
weather prediction the problem with

1286
00:48:49,739 --> 00:48:51,150
weather prediction now is you measure it

1287
00:48:51,150 --> 00:48:53,309
in too few points as you solve

1288
00:48:53,309 --> 00:48:54,479
differential equations and you simply

1289
00:48:54,479 --> 00:48:57,839
don't have enough points to know exactly

1290
00:48:57,839 --> 00:49:01,499
what's actually happening now if you

1291
00:49:01,499 --> 00:49:03,329
find any sort of application for example

1292
00:49:03,329 --> 00:49:06,089
for something a cool thing I

1293
00:49:06,089 --> 00:49:07,949
would like my phone to do is to tell me

1294
00:49:07,949 --> 00:49:10,079
what the outside temperature is because

1295
00:49:10,079 --> 00:49:12,179
that helps me determine whether I'm

1296
00:49:12,179 --> 00:49:14,880
justified in feeling cold or something

1297
00:49:14,880 --> 00:49:17,449
is wrong with me that I'm feeling cold

1298
00:49:17,449 --> 00:49:20,249
this is why we have thermometers really

1299
00:49:20,249 --> 00:49:22,140
right because I mean otherwise you would

1300
00:49:22,140 --> 00:49:23,939
feel it's just kind of to calibrate our

1301
00:49:23,939 --> 00:49:28,380
own senses to say I would love to have a

1302
00:49:28,380 --> 00:49:31,319
thermometer and hygrometer on my cell

1303
00:49:31,319 --> 00:49:33,179
phone to know it's humid outside or not

1304
00:49:33,179 --> 00:49:35,219
humid outside now the moment I have the

1305
00:49:35,219 --> 00:49:37,439
convenience of having those sensors you

1306
00:49:37,439 --> 00:49:39,119
could gather that information that's a

1307
00:49:39,119 --> 00:49:41,630
sensor network okay or you could ask

1308
00:49:41,630 --> 00:49:44,609
could I deploy such sensor networks for

1309
00:49:44,609 --> 00:49:48,900
example I do some kind of kind of ocean

1310
00:49:48,900 --> 00:49:51,150
related research I'm gonna throw a

1311
00:49:51,150 --> 00:49:53,160
million sensors measure things with a

1312
00:49:53,160 --> 00:49:54,719
million sensors getting the information

1313
00:49:54,719 --> 00:49:57,359
and then well I can imagine things I

1314
00:49:57,359 --> 00:49:59,249
could do with it how many people saw

1315
00:49:59,249 --> 00:50:01,619
this movie it must be at least 15 years

1316
00:50:01,619 --> 00:50:04,079
old I think it was called Twister with

1317
00:50:04,079 --> 00:50:06,209
those tornado researchers they had those

1318
00:50:06,209 --> 00:50:09,119
little flying sensors and the big

1319
00:50:09,119 --> 00:50:10,769
highlight of the movie is when they all

1320
00:50:10,769 --> 00:50:13,559
fly and they gather their data right

1321
00:50:13,559 --> 00:50:15,989
that's a sensor network oh that's

1322
00:50:15,989 --> 00:50:17,880
exactly a sensor network now

1323
00:50:17,880 --> 00:50:19,469
interesting question you have those

1324
00:50:19,469 --> 00:50:22,949
little gizmos they measure whatever they

1325
00:50:22,949 --> 00:50:26,939
measure and they send information now

1326
00:50:26,939 --> 00:50:28,319
that's the part that starts to become

1327
00:50:28,319 --> 00:50:29,860
troublesome what exactly does it mean

1328
00:50:29,860 --> 00:50:33,100
to send information right who gets the

1329
00:50:33,100 --> 00:50:35,710
information how do you capture that

1330
00:50:35,710 --> 00:50:37,750
information how do you store it how do

1331
00:50:37,750 --> 00:50:39,850
you talk so fast with those sensors now

1332
00:50:39,850 --> 00:50:42,220
if it's just ten of them you can imagine

1333
00:50:42,220 --> 00:50:43,900
one computer not having problems

1334
00:50:43,900 --> 00:50:47,650
keeping up with it but let's say a

1335
00:50:47,650 --> 00:50:49,300
decent number to really get the shape of

1336
00:50:49,300 --> 00:50:50,830
a tornado would be a hundred thousand

1337
00:50:50,830 --> 00:50:51,550
right

1338
00:50:51,550 --> 00:50:53,770
how could you capture sensor information

1339
00:50:53,770 --> 00:50:57,450
from a hundred thousand little devices

1340
00:50:57,810 --> 00:51:00,730
so that's really the realm of the sensor

1341
00:51:00,730 --> 00:51:02,410
networks it starts to be very

1342
00:51:02,410 --> 00:51:04,270
problematic because you can't have each

1343
00:51:04,270 --> 00:51:05,770
of those a hundred thousand talking to

1344
00:51:05,770 --> 00:51:08,260
some sort of a server right you simply

1345
00:51:08,260 --> 00:51:11,740
won't be able to keep up with so many

1346
00:51:11,740 --> 00:51:14,680
things flying around I mean at least a

1347
00:51:14,680 --> 00:51:16,810
hundred thousand TCP/IP connections of

1348
00:51:16,810 --> 00:51:18,730
some sort I mean imagine writing a

1349
00:51:18,730 --> 00:51:20,290
program that can keep track of a hundred

1350
00:51:20,290 --> 00:51:22,420
thousand separate Internet connections

1351
00:51:22,420 --> 00:51:25,330
in a single machine what are you going

1352
00:51:25,330 --> 00:51:27,430
to do build a server farm to take it

1353
00:51:27,430 --> 00:51:28,990
with you in the truck that has the sensors

1354
00:51:28,990 --> 00:51:32,620
right so the idea is good it's sound

1355
00:51:32,620 --> 00:51:33,790
they are going to have all kinds of

1356
00:51:33,790 --> 00:51:35,560
small sensors to track how the tornado

1357
00:51:35,560 --> 00:51:38,470
moves but then the distributed systems

1358
00:51:38,470 --> 00:51:40,980
component comes into play and how do you

1359
00:51:40,980 --> 00:51:43,180
how do you get information from them a

1360
00:51:43,180 --> 00:51:44,410
particularly interesting idea we're

1361
00:51:44,410 --> 00:51:48,010
going to explore this right is you're

1362
00:51:48,010 --> 00:51:49,360
going to have sensors talk to sensors

1363
00:51:49,360 --> 00:51:50,800
and somehow propagate information and

1364
00:51:50,800 --> 00:51:52,450
then you have a lot less information to

1365
00:51:52,450 --> 00:51:53,800
get from the system because ultimately

1366
00:51:53,800 --> 00:51:55,120
what you want is some sort of knowledge

1367
00:51:55,120 --> 00:51:56,230
that is not a hundred thousand

1368
00:51:56,230 --> 00:51:57,910
measurements may be a lot less will be

1369
00:51:57,910 --> 00:51:59,890
good right so sensor networks are all

1370
00:51:59,890 --> 00:52:01,600
about how do you cut down on the amount

1371
00:52:01,600 --> 00:52:03,760
of communication not to mention that you

1372
00:52:03,760 --> 00:52:06,010
need overly powerful antennas for all

1373
00:52:06,010 --> 00:52:07,330
those a hundred thousand guys to talk

1374
00:52:07,330 --> 00:52:10,840
long distance with the server if they

1375
00:52:10,840 --> 00:52:14,530
have peer nodes sensor nodes close

1376
00:52:14,530 --> 00:52:15,970
enough to them within few meters you can

1377
00:52:15,970 --> 00:52:18,130
use much much weaker signals and get

1378
00:52:18,130 --> 00:52:20,020
stuff done right so that's potentially

1379
00:52:20,020 --> 00:52:21,490
an interesting application of

1380
00:52:21,490 --> 00:52:23,770
distributed systems ideas that we need

1381
00:52:23,770 --> 00:52:25,630
to pursue and that's a particular kind

1382
00:52:25,630 --> 00:52:27,250
of if you want hardware architecture

1383
00:52:27,250 --> 00:52:30,790
okay in those circumstances you might in

1384
00:52:30,790 --> 00:52:33,160
fact use some sort of a super nodes and

1385
00:52:33,160 --> 00:52:36,010
normal nodes right the super nodes for

1386
00:52:36,010 --> 00:52:38,590
example could have a lot more battery

1387
00:52:38,590 --> 00:52:40,150
life a big problem in sensor networks is

1388
00:52:40,150 --> 00:52:42,250
battery life or you could even use the

1389
00:52:42,250 --> 00:52:43,210
following kind of trick

1390
00:52:43,210 --> 00:52:46,599
in which you switch roles but at certain

1391
00:52:46,599 --> 00:52:48,339
moments of time certain sensor nodes

1392
00:52:48,339 --> 00:52:49,510
are super nodes and then they switch

1393
00:52:49,510 --> 00:52:50,859
roles because they drain their battery

1394
00:52:50,859 --> 00:52:54,190
right by the way the battery life is one

1395
00:52:54,190 --> 00:52:55,270
of the biggest problems with any

1396
00:52:55,270 --> 00:52:59,859
autonomous sensor networks you can make

1397
00:52:59,859 --> 00:53:01,420
them small but then you have to put

1398
00:53:01,420 --> 00:53:02,619
small batteries in them
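
[Editor's sketch] The idea of sensors talking to nearby sensors, with super nodes forwarding a reduced result, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not a real sensor-network protocol: the names `Summary`, `summarize` and `aggregate_in_clusters` are hypothetical, clusters are simply consecutive slices of the readings, and radio, routing and battery costs are ignored. It only shows that the server receives one summary per cluster instead of one message per sensor.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Summary:
    """Compact description of many readings (hypothetical message format)."""
    count: int
    total: float
    lo: float
    hi: float

    def merge(self, other: "Summary") -> "Summary":
        # Combining two summaries never needs the raw readings.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.lo, other.lo),
            max(self.hi, other.hi),
        )


def summarize(readings: List[float]) -> Summary:
    """What a super node would compute from its nearby sensors."""
    return Summary(len(readings), sum(readings), min(readings), max(readings))


def aggregate_in_clusters(
    readings: List[float], cluster_size: int
) -> Tuple[Summary, int]:
    """Return the overall summary and how many messages reach the server."""
    summaries = [
        summarize(readings[i:i + cluster_size])
        for i in range(0, len(readings), cluster_size)
    ]
    overall = summaries[0]
    for s in summaries[1:]:
        overall = overall.merge(s)
    # One message per cluster reaches the server, not one per sensor.
    return overall, len(summaries)
```

With 100,000 readings and clusters of 100, the server in this sketch handles 1,000 summaries rather than 100,000 separate connections.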

1399
00:53:02,619 --> 00:53:06,280
that's a trouble ok good so lots of

1400
00:53:06,280 --> 00:53:08,560
interesting ideas another one is this

1401
00:53:08,560 --> 00:53:11,050
idea of edge servers edge server systems

1402
00:53:11,050 --> 00:53:14,740
ok so when we think about we have

1403
00:53:14,740 --> 00:53:16,330
clients and servers even if you have the

1404
00:53:16,330 --> 00:53:18,490
traditional architecture right the

1405
00:53:18,490 --> 00:53:19,780
problem is the server can be on the

1406
00:53:19,780 --> 00:53:22,240
other side of the of the planet and then

1407
00:53:22,240 --> 00:53:23,950
you have large latencies and whatnot and

1408
00:53:23,950 --> 00:53:25,599
a particular idea this is what Akamai

1409
00:53:25,599 --> 00:53:27,430
does I threw it up in the air in the

1410
00:53:27,430 --> 00:53:30,010
introduction was to place servers much

1411
00:53:30,010 --> 00:53:31,270
closer to where the client is in

1412
00:53:31,270 --> 00:53:34,810
particular at the ISP right now those

1413
00:53:34,810 --> 00:53:36,609
kind of servers are called edge servers

1414
00:53:36,609 --> 00:53:40,089
right because they really live close to

1415
00:53:40,089 --> 00:53:42,580
your actual connection to the Internet

1416
00:53:42,580 --> 00:53:44,920
you don't then have to traverse the

1417
00:53:44,920 --> 00:53:47,680
Internet and then what you have to make

1418
00:53:47,680 --> 00:53:50,349
sure is that all the edge servers

1419
00:53:50,349 --> 00:53:52,599
maintain consistent and maybe

1420
00:53:52,599 --> 00:53:55,540
synchronized in some way information so

1421
00:53:55,540 --> 00:53:56,859
for example you're serving the pages of

1422
00:53:56,859 --> 00:53:58,960
CNN and by the way CNN is an Akamai

1423
00:53:58,960 --> 00:54:01,480
client ok so it's not a hypothetical

1424
00:54:01,480 --> 00:54:04,680
scenario so you're serving pages for CNN

1425
00:54:04,680 --> 00:54:06,820
Akamai's algorithms essentially make

1426
00:54:06,820 --> 00:54:08,830
sure that when somebody at CNN publishes

1427
00:54:08,830 --> 00:54:10,599
an article they propagate fast enough to

1428
00:54:10,599 --> 00:54:12,280
all the edge servers and then the

1429
00:54:12,280 --> 00:54:13,839
connection is very fast all you all you

1430
00:54:13,839 --> 00:54:15,640
do is you talk to the server but the

1431
00:54:15,640 --> 00:54:18,099
distributed system aspect of it right it

1432
00:54:18,099 --> 00:54:22,380
is how do you keep these servers in sync

1433
00:54:22,380 --> 00:54:24,820
that's even more problematic for example

1434
00:54:24,820 --> 00:54:26,470
when you're actually doing transactions

1435
00:54:26,470 --> 00:54:29,530
you're Amazon and you have multiple such

1436
00:54:29,530 --> 00:54:32,470
edge servers so people can buy very fast which

1437
00:54:32,470 --> 00:54:36,369
means you sell more right you have to

1438
00:54:36,369 --> 00:54:37,480
ask for today and I want you to

1439
00:54:37,480 --> 00:54:39,040
understand that it's important to

1440
00:54:39,040 --> 00:54:40,390
realize there are two aspects you can

1441
00:54:40,390 --> 00:54:41,650
have two different solutions even though

1442
00:54:41,650 --> 00:54:43,240
it looks like a single system so one

1443
00:54:43,240 --> 00:54:45,130
aspect when it comes to let's say

1444
00:54:45,130 --> 00:54:48,640
ecommerce is what items are available

1445
00:54:48,640 --> 00:54:50,650
and what's the description for the items

1446
00:54:50,650 --> 00:54:52,450
and the other aspect is the actual

1447
00:54:52,450 --> 00:54:55,730
financial transaction right

1448
00:54:55,730 --> 00:54:59,150
now how many of you have bought an

1449
00:54:59,150 --> 00:55:01,790
item from Amazon for Amazon later to

1450
00:55:01,790 --> 00:55:03,710
send an email saying we apologize but we

1451
00:55:03,710 --> 00:55:05,990
don't actually have the item I bought a

1452
00:55:05,990 --> 00:55:07,339
keyboard then this happened to me and

1453
00:55:07,339 --> 00:55:08,180
it's very annoying

1454
00:55:08,180 --> 00:55:10,280
well they suggested that other sellers

1455
00:55:10,280 --> 00:55:13,099
might have it but I already wasted a week

1456
00:55:13,099 --> 00:55:17,180
right so you might ask hey amazon has

1457
00:55:17,180 --> 00:55:18,980
computers computers have storage how

1458
00:55:18,980 --> 00:55:20,329
come Amazon didn't know that they don't

1459
00:55:20,329 --> 00:55:21,220
have the keyboard

1460
00:55:21,220 --> 00:55:23,930
well because Amazon is cheating and they

1461
00:55:23,930 --> 00:55:26,180
are cheating with distributed system

1462
00:55:26,180 --> 00:55:30,070
techniques in order to alleviate

1463
00:55:30,070 --> 00:55:32,240
otherwise significant problems they would

1464
00:55:32,240 --> 00:55:34,579
have so in particular they did not

1465
00:55:34,579 --> 00:55:37,700
keep edge servers synchronized right they

1466
00:55:37,700 --> 00:55:40,579
said hey last night we had 10 keyboards

1467
00:55:40,579 --> 00:55:45,290
that's fine they're going to tell the edge

1468
00:55:45,290 --> 00:55:48,530
servers that we have 10 keyboards and

1469
00:55:48,530 --> 00:55:51,619
then all the ten keyboards were bought

1470
00:55:51,619 --> 00:55:53,599
without information propagating to the

1471
00:55:53,599 --> 00:55:55,820
edge servers I jump on one of the edge servers

1472
00:55:55,820 --> 00:55:56,810
by the way you don't even know if there

1473
00:55:56,810 --> 00:55:58,130
is an edge server because you have to go

1474
00:55:58,130 --> 00:56:02,300
through your ISP right and you think you

1475
00:56:02,300 --> 00:56:05,750
go to amazon.com but you go to the edge

1476
00:56:05,750 --> 00:56:07,700
server where your ISP sends you that's

1477
00:56:07,700 --> 00:56:09,579
all networking magic right it's just

1478
00:56:09,579 --> 00:56:12,980
routing and whatnot right that server

1479
00:56:12,980 --> 00:56:14,660
said oh yeah it sounds good

1480
00:56:14,660 --> 00:56:16,819
give me the money actually did the

1481
00:56:16,819 --> 00:56:20,060
financial transaction later when they

1482
00:56:20,060 --> 00:56:21,770
took the trouble to put together all of

1483
00:56:21,770 --> 00:56:23,690
this information they realize oops we

1484
00:56:23,690 --> 00:56:25,700
don't actually have the item no problem

1485
00:56:25,700 --> 00:56:27,680
we designed the system to be like this

1486
00:56:27,680 --> 00:56:31,460
so it's fault tolerant by mildly

1487
00:56:31,460 --> 00:56:33,410
annoying the user which is okay if you

1488
00:56:33,410 --> 00:56:35,869
know what you're doing right so what the

1489
00:56:35,869 --> 00:56:38,960
lesson here is you really have to take

1490
00:56:38,960 --> 00:56:40,730
tough choices even if you're one of the

1491
00:56:40,730 --> 00:56:42,200
big guys you might think hey there are

1492
00:56:42,200 --> 00:56:43,640
perfect solutions and all these big guys

1493
00:56:43,640 --> 00:56:45,170
use perfect solutions no they're using

1494
00:56:45,170 --> 00:56:46,760
imperfect solutions that are designed in a

1495
00:56:46,760 --> 00:56:49,250
very specific way right in this case

1496
00:56:49,250 --> 00:56:51,829
they know that if they don't do it too

1497
00:56:51,829 --> 00:56:55,069
often it's not a disaster as long as

1498
00:56:55,069 --> 00:56:57,109
they give me my money back if they don't

1499
00:56:57,109 --> 00:56:58,670
give me my money back it's a disaster

1500
00:56:58,670 --> 00:57:00,380
but if they give me my money back it's

1501
00:57:00,380 --> 00:57:04,089
not such a big tragedy right so

1502
00:57:04,089 --> 00:57:06,410
sometimes and this is really the key to

1503
00:57:06,410 --> 00:57:08,030
successful distributed systems it's

1504
00:57:08,030 --> 00:57:09,019
perfectly

1505
00:57:09,019 --> 00:57:12,049
fine not to have perfect information or

1506
00:57:12,049 --> 00:57:14,359
to have a little bit of lying going

1507
00:57:14,359 --> 00:57:16,849
around in there as long as it's done in

1508
00:57:16,849 --> 00:57:19,009
a controlled way right and Amazon is a

1509
00:57:19,009 --> 00:57:20,479
master in this by the way this is why

1510
00:57:20,479 --> 00:57:22,189
they are so big and so successful they

1511
00:57:22,189 --> 00:57:23,959
just found the right compromises this

1512
00:57:23,959 --> 00:57:26,509
being one of them okay instead of

1513
00:57:26,509 --> 00:57:28,369
insisting on really knowing what you

1514
00:57:28,369 --> 00:57:30,679
have in your inventory and then having a

1515
00:57:30,679 --> 00:57:33,019
very big distributed systems problem

1516
00:57:33,019 --> 00:57:35,809
and potentially limiting how many people

1517
00:57:35,809 --> 00:57:37,009
can do transactions at the same time

1518
00:57:37,009 --> 00:57:38,390
they say hey no let's just do

1519
00:57:38,390 --> 00:57:39,799
transactions and we'll figure out a

1520
00:57:39,799 --> 00:57:42,140
solution out of it right now that's not

1521
00:57:42,140 --> 00:57:44,959
true for the financial transaction if I

1522
00:57:44,959 --> 00:57:46,219
would do financial transaction and

1523
00:57:46,219 --> 00:57:47,719
sometimes it would fail that would be a

1524
00:57:47,719 --> 00:57:49,819
complete disaster by the way there are

1525
00:57:49,819 --> 00:57:51,499
strict regulations along these lines

1526
00:57:51,499 --> 00:57:56,749
right you cannot play with money without

1527
00:57:56,749 --> 00:57:58,399
giving very strict guarantees about how

1528
00:57:58,399 --> 00:58:00,819
what's happening with that money right

1529
00:58:00,819 --> 00:58:04,279
so what Amazon can do Visa and

1530
00:58:04,279 --> 00:58:07,669
MasterCard cannot do so for Visa and

1531
00:58:07,669 --> 00:58:09,109
MasterCard this is a tremendous problem

1532
00:58:09,109 --> 00:58:11,359
because when they say the credit card

1533
00:58:11,359 --> 00:58:13,249
transaction went through then it went

1534
00:58:13,249 --> 00:58:14,869
through and it's a real transaction
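
[Editor's sketch] The keyboard story can be sketched as edge servers that accept orders against a stale stock snapshot, with a later central step that fulfils what it can and refunds the rest. This is only an illustration under assumptions: `EdgeServer`, `place_order` and `reconcile` are hypothetical names, not Amazon's or Akamai's actual design, and the payment itself is not modeled.

```python
from typing import List, Tuple


class EdgeServer:
    """Accepts orders using only a local, possibly stale, stock snapshot."""

    def __init__(self, snapshot_stock: int) -> None:
        self.stock_seen = snapshot_stock  # e.g. last night's count
        self.orders: List[str] = []

    def place_order(self, customer: str) -> bool:
        # No coordination with other edge servers: fast, but may oversell.
        if self.stock_seen > 0:
            self.stock_seen -= 1
            self.orders.append(customer)
            return True
        return False


def reconcile(
    actual_stock: int, edges: List[EdgeServer]
) -> Tuple[List[str], List[str]]:
    """Later, centrally: fulfil while stock lasts, refund everyone else."""
    fulfilled: List[str] = []
    refunded: List[str] = []
    for edge in edges:
        for customer in edge.orders:
            if len(fulfilled) < actual_stock:
                fulfilled.append(customer)
            else:
                refunded.append(customer)  # the "we apologize" email
    return fulfilled, refunded
```

In this sketch three edge servers that each saw 10 keyboards can together accept up to 30 orders; reconciliation then ships 10 and refunds the others, which is the controlled compromise the lecture describes.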

1535
00:58:14,869 --> 00:58:17,089
right at the level of that financial

1536
00:58:17,089 --> 00:58:18,679
transaction you can't cheat but the

1537
00:58:18,679 --> 00:58:20,449
important thing is you focus now the

1538
00:58:20,449 --> 00:58:23,269
need to have a transaction everywhere to

1539
00:58:23,269 --> 00:58:24,649
the need to have a transaction only on

1540
00:58:24,649 --> 00:58:26,989
the very last step just the money

1541
00:58:26,989 --> 00:58:29,859
exchange by the way that's why

1542
00:58:29,859 --> 00:58:32,899
MasterCard and Visa charge at least two

1543
00:58:32,899 --> 00:58:35,749
point something percent because they do

1544
00:58:35,749 --> 00:58:37,519
this really fast sophisticated

1545
00:58:37,519 --> 00:58:39,019
distributed system their guarantee is

1546
00:58:39,019 --> 00:58:40,729
that you have real transactions right

1547
00:58:40,729 --> 00:58:42,619
and by the way this is why PayPal made a

1548
00:58:42,619 --> 00:58:44,269
lot of money because they figured out how

1549
00:58:44,269 --> 00:58:46,339
to play their own game in the same

1550
00:58:46,339 --> 00:58:49,969
space okay so to some extent PayPal

1551
00:58:49,969 --> 00:58:53,269
founders deserve the big wealth they got

1552
00:58:53,269 --> 00:58:55,189
because they managed to build a really

1553
00:58:55,189 --> 00:58:56,599
good distributed systems it's all about

1554
00:58:56,599 --> 00:58:58,849
distributed systems at this point okay

1555
00:58:58,849 --> 00:59:00,829
good so that's really the story with these

1556
00:59:00,829 --> 00:59:03,139
edge servers where you place them how they

1557
00:59:03,139 --> 00:59:05,630
communicate what they do this is why

1558
00:59:05,630 --> 00:59:08,299
people use Akamai and not in-house ad

1559
00:59:08,299 --> 00:59:09,829
hoc solutions this is why people pay a

1560
00:59:09,829 --> 00:59:11,359
lot of money to Akamai because they

1561
00:59:11,359 --> 00:59:12,529
already have figured out a lot of these

1562
00:59:12,529 --> 00:59:16,249
things okay or this is why the big

1563
00:59:16,249 --> 00:59:19,039
companies like Amazon designed their own

1564
00:59:19,039 --> 00:59:20,269
solution because it's potentially

1565
00:59:20,269 --> 00:59:21,930
expensive to go with somebody else's

1566
00:59:21,930 --> 00:59:23,520
solution who themselves want to make a lot

1567
00:59:23,520 --> 00:59:26,730
of money right essentially what this

1568
00:59:26,730 --> 00:59:28,559
means is that virtually anybody who's

1569
00:59:28,559 --> 00:59:30,059
medium-sized or larger is interested

1570
00:59:30,059 --> 00:59:31,380
in some sort of distributed system

1571
00:59:31,380 --> 00:59:33,809
implementation right so it's a big

1572
00:59:33,809 --> 00:59:35,279
market for people that understand

1573
00:59:35,279 --> 00:59:37,230
distributed systems because they all have

1574
00:59:37,230 --> 00:59:39,539
to face these issues at the minimum how

1575
00:59:39,539 --> 00:59:41,670
do you use certain software that already

1576
00:59:41,670 --> 00:59:43,980
does some distributed systems right to

1577
00:59:43,980 --> 00:59:45,059
know how to fix something you have to

1578
00:59:45,059 --> 00:59:46,020
understand how it works

1579
00:59:46,020 --> 00:59:48,299
otherwise it's basically just taking a

1580
00:59:48,299 --> 00:59:49,859
hammer and knocking it on the side right

1581
00:59:49,859 --> 00:59:53,730
it's not particularly good okay so this

1582
00:59:53,730 --> 00:59:56,819
is a schematic for example of torrents

1583
00:59:56,819 --> 00:59:58,710
BitTorrent and some other torrent sites

1584
00:59:58,710 --> 01:00:02,039
right in which you always go through

1585
01:00:02,039 --> 01:00:03,690
some sort of web interface even if you

1586
01:00:03,690 --> 01:00:06,000
don't see it by the way most of the apps

1587
01:00:06,000 --> 01:00:07,920
have a purely web-based interface they

1588
01:00:07,920 --> 01:00:11,490
contact through an HTTP-like protocol the

1589
01:00:11,490 --> 01:00:13,529
backend server even if they are actually

1590
01:00:13,529 --> 01:00:15,470
implemented on your iPhone or Android

1591
01:00:15,470 --> 01:00:17,700
okay that's very convenient because then

1592
01:00:17,700 --> 01:00:20,099
you can unify apps on multiple platforms

1593
01:00:20,099 --> 01:00:22,559
and web interfaces within the same kind

1594
01:00:22,559 --> 01:00:26,819
of big umbrella protocol okay so you

1595
01:00:26,819 --> 01:00:29,039
somehow go to some web page with

1596
01:00:29,039 --> 01:00:31,079
BitTorrent it tells you where a torrent

1597
01:00:31,079 --> 01:00:32,640
file is and the torrent file will tell you

1598
01:00:32,640 --> 01:00:34,500
where fragments of a particular file are

1599
01:00:34,500 --> 01:00:37,710
right as long as you make this resilient

1600
01:00:37,710 --> 01:00:40,380
as in multiple nodes know about torrent

1601
01:00:40,380 --> 01:00:40,710
files

1602
01:00:40,710 --> 01:00:42,029
you solve that problem it's not particularly

1603
01:00:42,029 --> 01:00:44,460
centralized and the torrent file will

1604
01:00:44,460 --> 01:00:46,529
indicate again where fragments of the

1605
01:00:46,529 --> 01:00:48,420
file are and the important thing is and

1606
01:00:48,420 --> 01:00:50,130
this is one interesting thing especially

1607
01:00:50,130 --> 01:00:53,099
if using encryption on top of this nobody

1608
01:00:53,099 --> 01:00:57,029
has the file every file is partitioned

1609
01:00:57,029 --> 01:00:58,380
into multiple pieces and you have to

1610
01:00:58,380 --> 01:01:00,390
gather pieces from multiple places it's

1611
01:01:00,390 --> 01:01:02,460
easier to replicate them to make sure

1612
01:01:02,460 --> 01:01:04,349
that copies of every fragment are on

1613
01:01:04,349 --> 01:01:06,630
multiple sites right and

1614
01:01:06,630 --> 01:01:08,789
makes for an interesting distributed

1615
01:01:08,789 --> 01:01:11,160
system in fact the torrent if you think

1616
01:01:11,160 --> 01:01:12,390
about them are some sort of a file

1617
01:01:12,390 --> 01:01:13,710
system and they are in fact a

1618
01:01:13,710 --> 01:01:15,839
distributed file system this is one big

1619
01:01:15,839 --> 01:01:17,130
topic we are going to study in this

1620
01:01:17,130 --> 01:01:18,839
class the distributed file systems and

1621
01:01:18,839 --> 01:01:21,839
one way or another all the big companies

1622
01:01:21,839 --> 01:01:23,520
have some version of this distributed

1623
01:01:23,520 --> 01:01:25,079
file system right so for example what

1624
01:01:25,079 --> 01:01:28,170
powers a lot of the services that Google

1625
01:01:28,170 --> 01:01:29,940
provides is GFS the Google file system

1626
01:01:29,940 --> 01:01:31,859
which is a distributed file system and

1627
01:01:31,859 --> 01:01:33,359
Google designed some 10 years back

1628
01:01:33,359 --> 01:01:35,839
right everything eventually goes to GFS

1629
01:01:35,839 --> 01:01:40,559
for for Google at least HDFS for let's

1630
01:01:40,559 --> 01:01:42,479
say Yahoo some other distributed file

1631
01:01:42,479 --> 01:01:45,539
systems okay now to some extent

1632
01:01:45,539 --> 01:01:46,739
distributed file systems are the

1633
01:01:46,739 --> 01:01:47,999
connection between the distributed

1634
01:01:47,999 --> 01:01:51,119
systems and operating systems sometimes

1635
01:01:51,119 --> 01:01:52,890
we call distributed systems distributed

1636
01:01:52,890 --> 01:01:54,569
operating systems even well for the last

1637
01:01:54,569 --> 01:01:55,739
10 years we kind of dropped the

1638
01:01:55,739 --> 01:01:57,660
operating part right it's really

1639
01:01:57,660 --> 01:01:59,789
distributed systems when it comes to

1640
01:01:59,789 --> 01:02:01,799
file systems that's really the realm of

1641
01:02:01,799 --> 01:02:03,779
operating systems and by providing a

1642
01:02:03,779 --> 01:02:04,920
distributed file system you're getting

1643
01:02:04,920 --> 01:02:06,749
close to what an operating system will

1644
01:02:06,749 --> 01:02:09,749
do ok but large discussion later to be

1645
01:02:09,749 --> 01:02:10,859
done about this but essentially

1646
01:02:10,859 --> 01:02:12,959
BitTorrent is some sort of distributed

1647
01:02:12,959 --> 01:02:14,249
file system arguably peer-to-peer

1648
01:02:14,249 --> 01:02:19,259
network system so that begs the

1649
01:02:19,259 --> 01:02:20,640
following interesting question what's a

1650
01:02:20,640 --> 01:02:28,589
file so what are these files so how many

1651
01:02:28,589 --> 01:02:31,469
people did not use files ever right

1652
01:02:31,469 --> 01:02:32,219
that's what I thought

1653
01:02:32,219 --> 01:02:36,509
so what's really a file so let's try to

1654
01:02:36,509 --> 01:02:39,359
go deep ok let's make it simple I'm

1655
01:02:39,359 --> 01:02:41,430
running an operating system let's say

1656
01:02:41,430 --> 01:02:43,650
Linux and I create a file

1657
01:02:43,650 --> 01:02:46,859
what exactly is created where what what

1658
01:02:46,859 --> 01:02:52,170
happens I mean it's some sort of bit

1659
01:02:52,170 --> 01:02:54,269
somewhere right it has to be built we

1660
01:02:54,269 --> 01:02:58,199
know that the hard drive stores bits but

1661
01:02:58,199 --> 01:03:00,089
they have to be organized on how right

1662
01:03:00,089 --> 01:03:02,130
it's all about organizing things well it

1663
01:03:02,130 --> 01:03:03,359
turns out that it's a little bit hard to

1664
01:03:03,359 --> 01:03:05,759
say what the file is and where it is and

1665
01:03:05,759 --> 01:03:07,759
how it is how many of you know about

1666
01:03:07,759 --> 01:03:11,999
in-memory file systems so what's an

1667
01:03:11,999 --> 01:03:15,269
in-memory file system alright so the

1668
01:03:15,269 --> 01:03:16,529
files are written in main memory so

1669
01:03:16,529 --> 01:03:17,609
they're not even associated with the

1670
01:03:17,609 --> 01:03:20,190
hard drives anymore right why would you

1671
01:03:20,190 --> 01:03:25,259
do this in memory file systems we have

1672
01:03:25,259 --> 01:03:26,719
access to the memory why do I need files

1673
01:03:26,719 --> 01:03:29,190
so it's clear that it's fast access but not

1674
01:03:29,190 --> 01:03:30,479
clear that I need files why would I

1675
01:03:30,479 --> 01:03:35,069
do files but there is no persistence

1676
01:03:35,069 --> 01:03:36,749
because it's an in-memory file system

1677
01:03:36,749 --> 01:03:38,069
and it's never written onto the disk with

1678
01:03:38,069 --> 01:03:39,779
no persistence literally no in-memory

1679
01:03:39,779 --> 01:03:41,430
file systems don't usually have

1680
01:03:41,430 --> 01:03:44,089
persistence

1681
01:03:45,130 --> 01:03:48,100
right so we're getting somewhere okay so

1682
01:03:48,100 --> 01:03:50,020
the file is some sort of a universal

1683
01:03:50,020 --> 01:03:52,630
abstraction that everybody likes to use

1684
01:03:52,630 --> 01:03:56,140
right so that's the problem there it has

1685
01:03:56,140 --> 01:03:57,250
nothing to do necessarily with

1686
01:03:57,250 --> 01:03:58,780
persistency because we throw away

1687
01:03:58,780 --> 01:04:00,370
persistency for the in-memory version it

1688
01:04:00,370 --> 01:04:01,840
has to do with the abstraction we are

1689
01:04:01,840 --> 01:04:03,040
used to the abstraction we like the

1690
01:04:03,040 --> 01:04:04,510
abstraction and all the programs use it

1691
01:04:04,510 --> 01:04:07,180
right so this is why distributed file

1692
01:04:07,180 --> 01:04:08,950
systems are suddenly extremely important

1693
01:04:08,950 --> 01:04:11,110
because any program written to deal with

1694
01:04:11,110 --> 01:04:13,060
files could use a distributed file

1695
01:04:13,060 --> 01:04:15,880
system and be a distributed system right

1696
01:04:15,880 --> 01:04:19,360
if you can distribute files all you need

1697
01:04:19,360 --> 01:04:20,770
is to read and write files and you can

1698
01:04:20,770 --> 01:04:23,050
have a distributed system so this might

1699
01:04:23,050 --> 01:04:25,240
sound weird but one way for me to talk

1700
01:04:25,240 --> 01:04:28,480
to you is I write a part of a file

1701
01:04:28,480 --> 01:04:30,520
that's a distributed file the distributed file

1702
01:04:30,520 --> 01:04:32,320
system somehow gets the information to

1703
01:04:32,320 --> 01:04:34,360
you it's not actively pushed to you but

1704
01:04:34,360 --> 01:04:35,200
if you know where to look

1705
01:04:35,200 --> 01:04:37,060
you suddenly talk to me you talk to me

1706
01:04:37,060 --> 01:04:40,570
through a file right how many people know

1707
01:04:40,570 --> 01:04:43,390
what the pipe is in an operating system

1708
01:04:43,390 --> 01:04:52,330
what's a pipe it really is a channel of

1709
01:04:52,330 --> 01:04:54,460
communication that's made to look like a

1710
01:04:54,460 --> 01:04:56,560
file so it's again the file abstraction

1711
01:04:56,560 --> 01:04:59,680
so a pipe it's a file except that it

1712
01:04:59,680 --> 01:05:01,210
never touches the hard drive and it's

1713
01:05:01,210 --> 01:05:03,790
really fast but it's still a file so

1714
01:05:03,790 --> 01:05:05,560
it's all about files so to a large

1715
01:05:05,560 --> 01:05:08,530
extent any modern operating system it's

1716
01:05:08,530 --> 01:05:14,110
about files right by by the way so this

1717
01:05:14,110 --> 01:05:16,480
might sound strange but you can even

1718
01:05:16,480 --> 01:05:19,870
access the network through files these

1719
01:05:19,870 --> 01:05:21,880
are the sockets especially named sockets

1720
01:05:21,880 --> 01:05:27,820
all right how does that work now it

1721
01:05:27,820 --> 01:05:29,830
starts to be very very weird there's

1722
01:05:29,830 --> 01:05:31,270
definitely no persistency whatsoever

1723
01:05:31,270 --> 01:05:34,570
because if nobody waits at the other end

1724
01:05:34,570 --> 01:05:35,980
first of all nothing gets transmitted or

1725
01:05:35,980 --> 01:05:37,360
if it waits and throws away the information

1726
01:05:37,360 --> 01:05:39,160
nothing ever made it in a persistent way

1727
01:05:39,160 --> 01:05:42,750
anyway so what's that all about well

1728
01:05:42,750 --> 01:05:45,820
it's piggybacking on one abstraction

1729
01:05:45,820 --> 01:05:47,830
which is the file system a completely

1730
01:05:47,830 --> 01:05:48,820
different thing this kind of

1731
01:05:48,820 --> 01:05:50,410
communication and it's in fact very

1732
01:05:50,410 --> 01:05:52,630
convenient because in that way so files

1733
01:05:52,630 --> 01:05:55,780
allow you to make a certain resource

1734
01:05:55,780 --> 01:05:58,240
available to other parties by just

1735
01:05:58,240 --> 01:05:58,810
agreeing

1736
01:05:58,810 --> 01:06:00,790
some sort of a name it's literally can

1737
01:06:00,790 --> 01:06:03,040
be used as a directory service right so

1738
01:06:03,040 --> 01:06:04,420
in that circumstance the file is a so-

1739
01:06:04,420 --> 01:06:06,730
called special file the file just

1740
01:06:06,730 --> 01:06:08,350
contains information about the fact that

1741
01:06:08,350 --> 01:06:09,670
I'm really a socket

1742
01:06:09,670 --> 01:06:11,830
and this is the information about the

1743
01:06:11,830 --> 01:06:13,840
socket the IP address I'm listening to

1744
01:06:13,840 --> 01:06:15,280
who I'm talking to and whatever else

1745
01:06:15,280 --> 01:06:18,580
right but then suddenly all of that

1746
01:06:18,580 --> 01:06:20,140
information that's about the socket has

1747
01:06:20,140 --> 01:06:22,180
a name and as far as the user is

1748
01:06:22,180 --> 01:06:25,300
concerned it looks like a file you open

1749
01:06:25,300 --> 01:06:26,410
the file you send something on the file
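
[Editor's sketch] A named socket can be tried out with a Unix domain socket, which appears as a special file in the file system. This is a minimal sketch under assumptions: it is POSIX-only, the path `demo.sock` and the function `named_socket_demo` are hypothetical, and in Python the client reaches the socket with `connect()` on the path rather than a plain `open()`. It shows that the name the two parties agree on is a file name, and that the entry is reported as a socket, not a regular file.

```python
import os
import socket
import stat
import tempfile
import threading
from typing import Tuple


def named_socket_demo(payload: bytes) -> Tuple[bool, bytes]:
    """Echo `payload` (upper-cased) over a socket that has a file name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical name

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)      # creates the special file at `path`
        server.listen(1)

        # The directory entry exists and is flagged as a socket.
        is_socket_file = stat.S_ISSOCK(os.stat(path).st_mode)

        def serve() -> None:
            conn, _ = server.accept()
            with conn:
                data = conn.recv(1024)
                conn.sendall(data.upper())

        worker = threading.Thread(target=serve)
        worker.start()

        # The client only needs to know the agreed-upon file name.
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)
            client.sendall(payload)
            reply = client.recv(1024)

        worker.join()
        server.close()
        return is_socket_file, reply
```

Nothing here is persistent: if no server is listening on the path, the connect fails and no data is stored anywhere.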

1750
01:06:26,410 --> 01:06:29,860
right so distributed file systems can be

1751
01:06:29,860 --> 01:06:32,380
regarded as some sort of view of a

1752
01:06:32,380 --> 01:06:34,270
connection as a file you put something

1753
01:06:34,270 --> 01:06:35,680
at one end and it pops up at the other

1754
01:06:35,680 --> 01:06:37,510
end in a very simple form with

1755
01:06:37,510 --> 01:06:39,250
persistency thrown in the middle to make

1756
01:06:39,250 --> 01:06:43,390
sure we can dissociate in time the

1757
01:06:43,390 --> 01:06:45,160
writing of the file and the reading of

1758
01:06:45,160 --> 01:06:47,770
the file right so you can write it and I

1759
01:06:47,770 --> 01:06:50,680
come back later and read it as long as I

1760
01:06:50,680 --> 01:06:52,630
come after it's fine that's persistency

1761
01:06:52,630 --> 01:06:54,010
if you don't have persistency you

1762
01:06:54,010 --> 01:06:56,380
can still have the file abstraction but

1763
01:06:56,380 --> 01:06:58,090
we have to be there at the same time I

1764
01:06:58,090 --> 01:07:00,430
start writing but I block until you

1765
01:07:00,430 --> 01:07:02,710
start reading and then I write you read

1766
01:07:02,710 --> 01:07:04,330
and then we are done we can still

1767
01:07:04,330 --> 01:07:05,740
pretend it was a file by the way this is

1768
01:07:05,740 --> 01:07:07,630
exactly what the pipe is all about right
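
The rendezvous behaviour of a pipe can be tried out directly. The sketch below is only an illustration, not something from the lecture: `pipe_rendezvous` and `writer_delay` are made-up names, and it uses an anonymous pipe inside one process with a helper thread. It shows the reader side of the story: the read blocks until the writer shows up. On typical systems a small write lands in a kernel buffer, so the writer in this sketch does not wait for the reader.

```python
import os
import threading
import time


def pipe_rendezvous(message: bytes, writer_delay: float = 0.2):
    """Send `message` through an anonymous pipe.

    Returns the bytes the reader received and how long the reader was
    blocked waiting for the writer (hypothetical helper for illustration).
    """
    read_fd, write_fd = os.pipe()

    def writer() -> None:
        time.sleep(writer_delay)       # the writer shows up late
        os.write(write_fd, message)
        os.close(write_fd)             # closing tells the reader there is no more data

    thread = threading.Thread(target=writer)
    start = time.monotonic()
    thread.start()
    data = os.read(read_fd, 1024)      # blocks until the writer has written something
    waited = time.monotonic() - start
    thread.join()
    os.close(read_fd)
    return data, waited


if __name__ == "__main__":
    data, waited = pipe_rendezvous(b"hello")
    print(data, f"reader was blocked for about {waited:.2f}s")
```

Nothing is stored anywhere after the read: without persistency, both parties have to be present for the exchange to happen.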

1769
01:07:07,630 --> 01:07:09,700
the pipes in traditional operating

1770
01:07:09,700 --> 01:07:11,770
systems require the reader and the

1771
01:07:11,770 --> 01:07:12,930
writer to be there at the same time

1772
01:07:12,930 --> 01:07:15,250
otherwise one blocks until the other one

1773
01:07:15,250 --> 01:07:18,430
shows up and does something right if you

1774
01:07:18,430 --> 01:07:20,590
think about persistency that allows you

1775
01:07:20,590 --> 01:07:23,140
to write now and somebody comes later to

1776
01:07:23,140 --> 01:07:24,910
pick it up so it allows you more

1777
01:07:24,910 --> 01:07:26,710
flexibility in what to do with that

1778
01:07:26,710 --> 01:07:29,380
so persistency might be there to just

1779
01:07:29,380 --> 01:07:31,510
break this requirement to be there at

1780
01:07:31,510 --> 01:07:32,920
the same time and not necessarily to

1781
01:07:32,920 --> 01:07:35,020
keep the information forever for

1782
01:07:35,020 --> 01:07:36,730
example temporary files with temporary

1783
01:07:36,730 --> 01:07:39,190
results would work like this right so

1784
01:07:39,190 --> 01:07:41,230
that's kind of an interesting view of

1785
01:07:41,230 --> 01:07:43,450
what files might or might not do by the

1786
01:07:43,450 --> 01:07:45,490
way in most modern operating systems

1787
01:07:45,490 --> 01:07:47,440
every resource essentially is viewed

1788
01:07:47,440 --> 01:07:50,110
through this kind of a file abstraction

1789
01:07:50,110 --> 01:07:53,800
including raw devices right

1790
01:07:53,800 --> 01:07:57,820
for example even when it comes to

1791
01:07:57,820 --> 01:08:00,670
file it up so how many of you know about

1792
01:08:00,670 --> 01:08:03,370
file systems within a file how do you

1793
01:08:03,370 --> 01:08:04,810
create a file system within a normal

1794
01:08:04,810 --> 01:08:09,970
file well probably all of you if you use

1795
01:08:09,970 --> 01:08:11,860
some sort of virtualization

1796
01:08:11,860 --> 01:08:12,580
with VMware

1797
01:08:12,580 --> 01:08:14,500
but for example a particularly

1798
01:08:14,500 --> 01:08:15,790
interesting approach to this would be

1799
01:08:15,790 --> 01:08:18,609
this kind of encrypted file systems so

1800
01:08:18,609 --> 01:08:20,229
TrueCrypt for example it's a very nice

1801
01:08:20,229 --> 01:08:21,760
program runs on all the platforms that

1802
01:08:21,760 --> 01:08:23,680
allows you to create an entire file

1803
01:08:23,680 --> 01:08:26,500
system within a file it's encrypted so

1804
01:08:26,500 --> 01:08:28,510
when you shut it down it's basically

1805
01:08:28,510 --> 01:08:32,380
sealed by a complex presumably very hard

1806
01:08:32,380 --> 01:08:34,300
to break encryption system then you take

1807
01:08:34,300 --> 01:08:35,979
that file you put it on USB stick you

1808
01:08:35,979 --> 01:08:37,479
transport it somewhere you send it in

1809
01:08:37,479 --> 01:08:39,040
an email you pick it up at the

1810
01:08:39,040 --> 01:08:41,350
other end and you mount the file as a

1811
01:08:41,350 --> 01:08:43,510
full file system so what exactly does

1812
01:08:43,510 --> 01:08:47,260
that mean that means the operating

1813
01:08:47,260 --> 01:08:48,760
system has the possibility to create

1814
01:08:48,760 --> 01:08:51,430
this view of files this abstraction of

1815
01:08:51,430 --> 01:08:54,189
files on top of any set of raw bits so

1816
01:08:54,189 --> 01:08:55,450
ultimately there is not much of a

1817
01:08:55,450 --> 01:08:56,800
distinction between a normal file that

1818
01:08:56,800 --> 01:08:58,899
keeps bits and a bit interpretation

1819
01:08:58,899 --> 01:09:01,479
that's a file system versus the original

1820
01:09:01,479 --> 01:09:03,160
implementation of the file system on top

1821
01:09:03,160 --> 01:09:04,660
of the raw bits that live on the medium

1822
01:09:04,660 --> 01:09:07,330
there is literally no difference in any

1823
01:09:07,330 --> 01:09:10,120
decent operating system and that

1824
01:09:10,120 --> 01:09:11,500
immediately suggests that the raw device

1825
01:09:11,500 --> 01:09:13,180
itself could be viewed as a

1826
01:09:13,180 --> 01:09:14,500
file and that's true in all modern

1827
01:09:14,500 --> 01:09:17,339
operating systems right /dev/sda

1828
01:09:17,339 --> 01:09:22,569
is in fact a file that allows you

1829
01:09:22,569 --> 01:09:26,500
access to the raw device right now why

1830
01:09:26,500 --> 01:09:29,200
is this important well this is important

1831
01:09:29,200 --> 01:09:30,520
especially if you want to do tricks

1832
01:09:30,520 --> 01:09:33,939
related to high performance file systems

1833
01:09:33,939 --> 01:09:35,380
because they are an abstraction they add

1834
01:09:35,380 --> 01:09:38,710
their own good and bad on top of it they

1835
01:09:38,710 --> 01:09:40,870
make things more convenient but at the

1836
01:09:40,870 --> 01:09:42,100
same time they could rob you of some

1837
01:09:42,100 --> 01:09:44,290
performance for example caching can

1838
01:09:44,290 --> 01:09:47,050
happen that it's undesirable now this

1839
01:09:47,050 --> 01:09:47,950
sounds strange

1840
01:09:47,950 --> 01:09:50,950
why would caching be undesirable because

1841
01:09:50,950 --> 01:09:52,779
and I'm not going to go into details but

1842
01:09:52,779 --> 01:09:54,820
I might later in the class sometimes

1843
01:09:54,820 --> 01:09:56,830
it's faster to write to some of the

1844
01:09:56,830 --> 01:10:00,070
modern devices than to cache we have

1845
01:10:00,070 --> 01:10:01,660
especially some of the SSD hard drives

1846
01:10:01,660 --> 01:10:06,490
can be extremely fast right so it's

1847
01:10:06,490 --> 01:10:08,920
strange but it might actually be faster

1848
01:10:08,920 --> 01:10:11,650
to ask the raw device to write a piece

1849
01:10:11,650 --> 01:10:13,330
of the memory than to have the processor

1850
01:10:13,330 --> 01:10:14,980
copy that thing from one part of the

1851
01:10:14,980 --> 01:10:16,630
memory to another part of the memory and

1852
01:10:16,630 --> 01:10:18,610
I've seen it happening ok and in those

1853
01:10:18,610 --> 01:10:19,810
circumstances you definitely want to

1854
01:10:19,810 --> 01:10:21,910
access the raw device so knowing that

1855
01:10:21,910 --> 01:10:23,710
you have this hierarchy of the files and

1856
01:10:23,710 --> 01:10:25,420
you have this idea of a virtual file

1857
01:10:25,420 --> 01:10:26,110
system that

1858
01:10:26,110 --> 01:10:28,090
simulates the abstraction at multiple

1859
01:10:28,090 --> 01:10:29,860
levels of a file right becomes

1860
01:10:29,860 --> 01:10:33,340
important if you want to obtain such a

1861
01:10:33,340 --> 01:10:36,070
performance it also provides this ultimate

1862
01:10:36,070 --> 01:10:38,170
convenience all you have to worry about

1863
01:10:38,170 --> 01:10:40,420
is a file-like interface which is what

1864
01:10:40,420 --> 01:10:43,420
open file right read and write parts of

1865
01:10:43,420 --> 01:10:44,620
the file if you have random access to it

1866
01:10:44,620 --> 01:10:47,110
pipes don't right close the file maybe

1867
01:10:47,110 --> 01:10:49,830
create file but it's a very simple

1868
01:10:49,830 --> 01:10:52,900
abstraction right very easy to write

1869
01:10:52,900 --> 01:10:54,970
applications you just open files close

1870
01:10:54,970 --> 01:10:55,330
files

1871
01:10:55,330 --> 01:10:59,010
right distributed file systems allow

1872
01:10:59,010 --> 01:11:02,200
simple application writing and to

1873
01:11:02,200 --> 01:11:03,550
have a distributed system like things

1874
01:11:03,550 --> 01:11:05,650
now they might not be everything

1875
01:11:05,650 --> 01:11:07,840
and you see more and more systems now

1876
01:11:07,840 --> 01:11:09,520
that use other paradigms for example

1877
01:11:09,520 --> 01:11:10,990
like this remote procedure calls or some

1878
01:11:10,990 --> 01:11:13,540
sort of a message exchange of

1879
01:11:13,540 --> 01:11:17,230
some form oh by the way how could you

1880
01:11:17,230 --> 01:11:19,120
implement the message queues using these

1881
01:11:19,120 --> 01:11:21,370
distributed file systems a very important

1882
01:11:21,370 --> 01:11:23,470
question is how one abstraction can

1883
01:11:23,470 --> 01:11:25,510
emulate other abstractions could you do

1884
01:11:25,510 --> 01:11:30,520
this well it is very simple you just do

1885
01:11:30,520 --> 01:11:31,870
file manipulation right if I want to

1886
01:11:31,870 --> 01:11:33,640
send you a message we designate a special

1887
01:11:33,640 --> 01:11:35,350
file that's a distributed file so if I

1888
01:11:35,350 --> 01:11:36,730
write in it the information is

1889
01:11:36,730 --> 01:11:38,320
propagated and I simply write to the

1890
01:11:38,320 --> 01:11:40,690
file I'm simply gonna put the message at

1891
01:11:40,690 --> 01:11:41,980
the end of the file and you delete

1892
01:11:41,980 --> 01:11:43,270
messages from the beginning of the file

1893
01:11:43,270 --> 01:11:44,590
now it's not particularly efficient

1894
01:11:44,590 --> 01:11:47,320
maybe but it can easily be done with the

1895
01:11:47,320 --> 01:11:50,130
distributed file system abstraction
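
The file-as-message-queue idea just described can be sketched in a few lines. This is a minimal sketch under stated assumptions, not an implementation from the lecture: `send` and `receive` are hypothetical names, a local file stands in for the distributed file, each message is one line with no embedded newline, and there is no locking, so only one party may touch the file at a time.

```python
import os
import tempfile
from typing import Optional


def send(queue_path: str, message: str) -> None:
    """Append one message (one line) to the end of the queue file."""
    with open(queue_path, "a", encoding="utf-8") as f:
        f.write(message + "\n")


def receive(queue_path: str) -> Optional[str]:
    """Remove and return the oldest message, or None if the queue is empty."""
    if not os.path.exists(queue_path):
        return None
    with open(queue_path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    if not lines:
        return None
    # Rewrite the file without its first line. Simple, but the cost of
    # every receive grows with the size of the queue.
    with open(queue_path, "w", encoding="utf-8") as f:
        f.writelines(lines[1:])
    return lines[0].rstrip("\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "queue.txt")
    send(path, "first")
    send(path, "second")
    print(receive(path), receive(path), receive(path))
```

Rewriting the whole file on every receive is where the "not particularly efficient" remark shows up in practice.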

1896
01:11:50,130 --> 01:11:53,470
okay now there are many distributed

1897
01:11:53,470 --> 01:11:55,690
systems some of them very popular some

1898
01:11:55,690 --> 01:11:58,510
of them esoteric right almost all of

1899
01:11:58,510 --> 01:12:00,640
them try to have a certain bent a

1900
01:12:00,640 --> 01:12:03,010
certain different approach and cater to

1901
01:12:03,010 --> 01:12:05,620
a different segment okay so in this

1902
01:12:05,620 --> 01:12:07,630
class I'm not really gonna talk about

1903
01:12:07,630 --> 01:12:10,210
this is better than this one I'm mostly

1904
01:12:10,210 --> 01:12:12,550
going to look at details of particular

1905
01:12:12,550 --> 01:12:14,800
systems what is it that they wanted

1906
01:12:14,800 --> 01:12:17,410
to cater to which specific

1907
01:12:17,410 --> 01:12:20,050
scenario what are some of the

1908
01:12:20,050 --> 01:12:21,910
interesting features of that particular

1909
01:12:21,910 --> 01:12:24,040
system how does the technology work in

1910
01:12:24,040 --> 01:12:26,050
the hope that when you're faced with a

1911
01:12:26,050 --> 01:12:29,680
particular problem to be solved right

1912
01:12:29,680 --> 01:12:32,320
you can find your way to sort out

1913
01:12:32,320 --> 01:12:34,660
through what kind of things you could

1914
01:12:34,660 --> 01:12:37,690
try to do to craft your own solution

1915
01:12:37,690 --> 01:12:39,219
now

1916
01:12:39,219 --> 01:12:41,239
something that the distributed systems

1917
01:12:41,239 --> 01:12:43,639
community looked at for a very long time

1918
01:12:43,639 --> 01:12:46,789
and I would say this is a core human

1919
01:12:46,789 --> 01:12:49,249
obsession to find the ultimate solution

1920
01:12:49,249 --> 01:12:52,460
right so an interesting question to ask

1921
01:12:52,460 --> 01:12:57,079
in any endeavor you want is

1922
01:12:57,079 --> 01:12:59,480
there such a thing as the best solution

1923
01:12:59,480 --> 01:13:00,260
for this problem

1924
01:13:00,260 --> 01:13:03,219
end of story no questions asked right

1925
01:13:03,219 --> 01:13:05,150
from a practical point of view that

1926
01:13:05,150 --> 01:13:06,829
would be fantastic because if you find

1927
01:13:06,829 --> 01:13:09,349
the best solution you're done you know

1928
01:13:09,349 --> 01:13:10,940
that's the best solution we all use the

1929
01:13:10,940 --> 01:13:12,829
best solution let's move on and do

1930
01:13:12,829 --> 01:13:18,260
something else right it's really

1931
01:13:18,260 --> 01:13:19,699
equivalent in theoretical computer

1932
01:13:19,699 --> 01:13:21,260
science to finding those matching lower

1933
01:13:21,260 --> 01:13:23,019
bounds you found them you're done

1934
01:13:23,019 --> 01:13:25,099
at least from a big-O notation point of

1935
01:13:25,099 --> 01:13:25,670
view okay

1936
01:13:25,670 --> 01:13:27,769
well it turns out that in distributed systems

1937
01:13:27,769 --> 01:13:33,050
there is definitely no best solution it's

1938
01:13:33,050 --> 01:13:35,239
all compromises and that's true in

1939
01:13:35,239 --> 01:13:37,969
almost everything right a particular

1940
01:13:37,969 --> 01:13:40,219
compromise might lead to a different kind

1941
01:13:40,219 --> 01:13:42,199
of solution all right

1942
01:13:42,199 --> 01:13:43,610
a particular solution might be better

1943
01:13:43,610 --> 01:13:46,360
under certain circumstances for example

1944
01:13:46,360 --> 01:13:49,039
you might not need any fault tolerance

1945
01:13:49,039 --> 01:13:50,480
whatsoever if your system is reliable

1946
01:13:50,480 --> 01:13:51,739
and I mentioned the fact that some

1947
01:13:51,739 --> 01:13:53,420
systems are just extremely reliable

1948
01:13:53,420 --> 01:13:56,659
right local networks are extremely

1949
01:13:56,659 --> 01:13:58,099
reliable you don't need to assume that

1950
01:13:58,099 --> 01:13:59,570
you're gonna have losses on the network

1951
01:13:59,570 --> 01:14:02,329
you simply don't right for example if

1952
01:14:02,329 --> 01:14:03,650
you don't get a reply within 10

1953
01:14:03,650 --> 01:14:04,969
milliseconds you know that the other guy

1954
01:14:04,969 --> 01:14:08,659
is in big trouble in a

1955
01:14:08,659 --> 01:14:10,670
local network and those assumptions can

1956
01:14:10,670 --> 01:14:12,289
actually be used to great effect to have

1957
01:14:12,289 --> 01:14:16,010
much faster systems right if you have to

1958
01:14:16,010 --> 01:14:18,829
send a message across the globe right to

1959
01:14:18,829 --> 01:14:20,749
some other server you're virtually

1960
01:14:20,749 --> 01:14:22,219
guaranteed that something will happen

1961
01:14:22,219 --> 01:14:24,019
and most of it is masked by TCP/IP this

1962
01:14:24,019 --> 01:14:27,110
is why you don't notice it right how many of

1963
01:14:27,110 --> 01:14:30,440
you have done this you try to go to a

1964
01:14:30,440 --> 01:14:31,969
website and it doesn't work you press

1965
01:14:31,969 --> 01:14:36,590
refresh and it instantly goes right one

1966
01:14:36,590 --> 01:14:38,179
second I mean it was the same Internet

1967
01:14:38,179 --> 01:14:40,760
right you can't assume that the internet

1968
01:14:40,760 --> 01:14:43,039
was very bad and then half a second

1969
01:14:43,039 --> 01:14:45,139
later I press refresh and it's very good so

1970
01:14:45,139 --> 01:14:46,249
what happened there well it's a very

1971
01:14:46,249 --> 01:14:48,860
complex system nobody quite knows right

1972
01:14:48,860 --> 01:14:51,170
so sometimes and this is an interesting

1973
01:14:51,170 --> 01:14:52,300
approach we're going to see this

1974
01:14:52,300 --> 01:14:54,520
right sometimes doing retries on the

1975
01:14:54,520 --> 01:14:56,020
same activity might actually

1976
01:14:56,020 --> 01:14:57,460
significantly improve your chance to get

1977
01:14:57,460 --> 01:14:59,500
the thing done faster but then you have

1978
01:14:59,500 --> 01:15:02,080
to somehow deal with the fact that you

1979
01:15:02,080 --> 01:15:04,120
already made another request now that's

1980
01:15:04,120 --> 01:15:05,740
not a problem if you ask for a refresh

1981
01:15:05,740 --> 01:15:08,620
on a page except if you try to do a

1982
01:15:08,620 --> 01:15:10,150
financial transaction when they actually

1983
01:15:10,150 --> 01:15:12,370
put a nice banner there please do not

1984
01:15:12,370 --> 01:15:17,860
press the button twice right until they

1985
01:15:17,860 --> 01:15:19,150
figured out you can do that in JavaScript

1986
01:15:19,150 --> 01:15:21,430
so once the button is pressed you simply

1987
01:15:21,430 --> 01:15:22,750
change the state in JavaScript and you

1988
01:15:22,750 --> 01:15:27,000
don't let people press it again except

1989
01:15:27,000 --> 01:15:29,830
that sometimes the first click did not

1990
01:15:29,830 --> 01:15:32,740
go through right three days later people

1991
01:15:32,740 --> 01:15:34,300
call the customer service and say I

1992
01:15:34,300 --> 01:15:35,890
clicked you told me not to click again

1993
01:15:35,890 --> 01:15:39,340
now what let me look right now that's

1994
01:15:39,340 --> 01:15:41,140
not good fault tolerance in the system

1995
01:15:41,140 --> 01:15:45,640
right so now that sounds funny but it's

1996
01:15:45,640 --> 01:15:48,700
in fact one of the core issues I

1997
01:15:48,700 --> 01:15:51,850
mean when can you reissue the same

1998
01:15:51,850 --> 01:15:53,440
same command again and that potentially

1999
01:15:53,440 --> 01:15:55,990
might make things go much faster right

2000
01:15:55,990 --> 01:15:58,720
and when you cannot by the way that

2001
01:15:58,720 --> 01:16:01,090
thing has a name it's called so a

2002
01:16:01,090 --> 01:16:02,920
particular activity that can be done

2003
01:16:02,920 --> 01:16:04,540
multiple times with no harm done is

2004
01:16:04,540 --> 01:16:07,330
called idempotent okay for

2005
01:16:07,330 --> 01:16:09,100
example give me this page it doesn't

2006
01:16:09,100 --> 01:16:11,680
matter if I say it 10 times I just ignore

2007
01:16:11,680 --> 01:16:12,850
the other ones and I display the last

2008
01:16:12,850 --> 01:16:14,620
one when you're talking about financial

2009
01:16:14,620 --> 01:16:15,850
transactions they are almost never

2010
01:16:15,850 --> 01:16:19,120
idempotent right now maybe if you

2011
01:16:19,120 --> 01:16:21,730
asked for what's the balance of my

2012
01:16:21,730 --> 01:16:23,710
account that's fine but if you say

2013
01:16:23,710 --> 01:16:25,270
transfer 100 thousand dollars then it's

2014
01:16:25,270 --> 01:16:26,530
definitely not fine if it happens

2015
01:16:26,530 --> 01:16:30,460
multiple times right except when you can

2016
01:16:30,460 --> 01:16:32,440
cheat when could you cheat this is exactly

2017
01:16:32,440 --> 01:16:35,620
the Amazon approach if you say look if

2018
01:16:35,620 --> 01:16:37,780
you allow me every now and then to do a

2019
01:16:37,780 --> 01:16:39,730
transaction but revert it within 24

2020
01:16:39,730 --> 01:16:43,300
hours then I'll go ahead and do it and if

2021
01:16:43,300 --> 01:16:44,080
you have enough money in the bank

2022
01:16:44,080 --> 01:16:45,550
account I'll do two transfers of a

2023
01:16:45,550 --> 01:16:46,900
hundred thousand overnight figure out

2024
01:16:46,900 --> 01:16:48,700
that one is wrong and put the money back

2025
01:16:48,700 --> 01:16:50,470
are you okay with that yes good let's do

2026
01:16:50,470 --> 01:16:53,650
that for most people that's not okay

2027
01:16:53,650 --> 01:16:57,550
right occasionally your bank tells you

2028
01:16:57,550 --> 01:17:00,640
that something happened to

2029
01:17:00,640 --> 01:17:02,080
your bank account and it is negative a

2030
01:17:02,080 --> 01:17:04,660
hundred thousand people have experienced

2031
01:17:04,660 --> 01:17:05,290
some of that stuff

2032
01:17:05,290 --> 01:17:08,530
which essentially means what that

2033
01:17:08,530 --> 01:17:09,880
everybody's cheating one way or another

2034
01:17:09,880 --> 01:17:11,410
as long as you can resolve these

2035
01:17:11,410 --> 01:17:12,640
conflicts as long as you can

2036
01:17:12,640 --> 01:17:15,640
re-establish this kind of a consistent

2037
01:17:15,640 --> 01:17:17,410
state you might be fine
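
The retry problem above can be made concrete. The lecture describes reverting a duplicated transfer after the fact. The sketch below shows a different and widely used technique, named plainly here as a swapped-in alternative: the client attaches a unique request identifier, and the server remembers which identifiers it has already executed, so a retried transfer is applied only once. The `Bank` class, `transfer` method and `request_id` parameter are hypothetical names for illustration.

```python
class Bank:
    """Toy in-memory bank that deduplicates transfers by request id."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.seen = {}  # request_id -> result of the first execution

    def transfer(self, request_id, src, dst, amount):
        # A repeated request id means the client retried: replay the stored
        # result instead of moving the money a second time.
        if request_id in self.seen:
            return self.seen[request_id]
        if self.balances[src] < amount:
            result = "rejected"
        else:
            self.balances[src] -= amount
            self.balances[dst] += amount
            result = "ok"
        self.seen[request_id] = result
        return result


if __name__ == "__main__":
    bank = Bank({"alice": 150_000, "bob": 0})
    # The client clicks twice (or retries after a timeout) with the same id.
    print(bank.transfer("req-1", "alice", "bob", 100_000))
    print(bank.transfer("req-1", "alice", "bob", 100_000))
    print(bank.balances)
```

With the identifier in place the transfer behaves like an idempotent operation from the client's point of view, so pressing the button twice is harmless.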

2038
01:17:17,410 --> 01:17:20,110
so some sort of ideas pop up in here is

2039
01:17:20,110 --> 01:17:23,440
this consistency thing what's a consistent

2040
01:17:23,440 --> 01:17:26,460
state what does it mean to be consistent

2041
01:17:26,460 --> 01:17:28,510
we are going to bump into this later

2042
01:17:28,510 --> 01:17:30,970
right you can say for example one

2043
01:17:30,970 --> 01:17:32,440
way to define transactions we mentioned

2044
01:17:32,440 --> 01:17:33,790
transactions last time right it's begin

2045
01:17:33,790 --> 01:17:35,740
transaction do something either abort or

2046
01:17:35,740 --> 01:17:38,110
commit and the requirement for a

2047
01:17:38,110 --> 01:17:40,390
transaction to be a transaction is to

2048
01:17:40,390 --> 01:17:41,890
leave the system in a consistent state

2049
01:17:41,890 --> 01:17:43,960
but you have a problem there what's a

2050
01:17:43,960 --> 01:17:47,230
consistent state all right what could be

2051
01:17:47,230 --> 01:17:49,780
a consistent state well why is this

2052
01:17:49,780 --> 01:17:51,940
problematic because so how many of you

2053
01:17:51,940 --> 01:17:53,380
know about verification hardware

2054
01:17:53,380 --> 01:17:55,480
verification or hardware in general

2055
01:17:55,480 --> 01:17:58,030
it is very hard to say if some device

2056
01:17:58,030 --> 01:18:00,010
really does what it's supposed to do

2057
01:18:00,010 --> 01:18:02,490
right

2058
01:18:02,920 --> 01:18:04,930
humans are not very good at designing

2059
01:18:04,930 --> 01:18:08,470
things that really always work there's

2060
01:18:08,470 --> 01:18:11,020
always that uncertainty maybe sometimes

2061
01:18:11,020 --> 01:18:14,260
it doesn't quite work right so when it

2062
01:18:14,260 --> 01:18:15,940
works we can call that some sort of a

2063
01:18:15,940 --> 01:18:18,280
consistent state it's something that we

2064
01:18:18,280 --> 01:18:20,440
expected we predicted and anything

2065
01:18:20,440 --> 01:18:22,180
that's out of what we predict you can

2066
01:18:22,180 --> 01:18:23,860
call it inconsistent state but we have a

2067
01:18:23,860 --> 01:18:26,010
lot of problems even writing any single

2068
01:18:26,010 --> 01:18:28,900
little program that for sure is

2069
01:18:28,900 --> 01:18:30,730
consistent so then how can you talk

2070
01:18:30,730 --> 01:18:31,930
about a distributed system that has

2071
01:18:31,930 --> 01:18:33,640
so many other things going on to be

2072
01:18:33,640 --> 01:18:35,710
actually consistent well you can always

2073
01:18:35,710 --> 01:18:37,720
try to give a definition with respect to

2074
01:18:37,720 --> 01:18:40,090
something that has less of a freedom

2075
01:18:40,090 --> 01:18:43,650
right you say your program is consistent

2076
01:18:43,650 --> 01:18:46,300
right if it does exactly what the

2077
01:18:46,300 --> 01:18:47,590
program running on one machine

2078
01:18:47,590 --> 01:18:49,930
would do now we don't know if the one

2079
01:18:49,930 --> 01:18:51,550
machine program is consistent or not but

2080
01:18:51,550 --> 01:18:52,930
if I do what the one machine does at

2081
01:18:52,930 --> 01:18:56,590
least I didn't make it less consistent

2082
01:18:56,590 --> 01:18:58,210
by doing it in a distributed fashion

2083
01:18:58,210 --> 01:19:02,110
right so that's one specific trickery

2084
01:19:02,110 --> 01:19:03,730
even when you provide the right

2085
01:19:03,730 --> 01:19:05,560
definition of what you're trying to

2086
01:19:05,560 --> 01:19:07,600
achieve okay not to mention that you can

2087
01:19:07,600 --> 01:19:09,100
actually prove negative results that

2088
01:19:09,100 --> 01:19:10,590
say you can never really achieve

2089
01:19:10,590 --> 01:19:14,500
consistency in certain ways and so on so

2090
01:19:14,500 --> 01:19:16,450
global is one such distributed system

2091
01:19:16,450 --> 01:19:18,900
right

2092
01:19:19,819 --> 01:19:21,679
one of the interesting characteristics

2093
01:19:21,679 --> 01:19:23,869
in global is that it can do redirects

2094
01:19:23,869 --> 01:19:26,299
so why would you do redirects well an

2095
01:19:26,299 --> 01:19:28,039
interesting idea is and this is actually

2096
01:19:28,039 --> 01:19:29,479
happening with the ad servers you think

2097
01:19:29,479 --> 01:19:31,339
you're accessing some server the

2098
01:19:31,339 --> 01:19:32,719
main CNN server but you are actually

2099
01:19:32,719 --> 01:19:34,339
accessing the site server that's

2100
01:19:34,339 --> 01:19:37,309
placed at your ISP right so these

2101
01:19:37,309 --> 01:19:41,449
redirects are crucial for dealing with

2102
01:19:41,449 --> 01:19:44,029
large loads for example when you're

2103
01:19:44,029 --> 01:19:46,099
really going let's say to eBay right

2104
01:19:46,099 --> 01:19:47,659
you go to eBay and try to do some kind

2105
01:19:47,659 --> 01:19:50,899
of a transaction right if you don't use

2106
01:19:50,899 --> 01:19:53,359
trickery literally a single server would

2107
01:19:53,359 --> 01:19:56,299
have to serve all the requests and

2108
01:19:56,299 --> 01:19:57,609
that's problematic at so many levels

2109
01:19:57,609 --> 01:19:59,509
okay especially when you have millions

2110
01:19:59,509 --> 01:20:01,699
of people pounding on it now what's the

2111
01:20:01,699 --> 01:20:04,489
trick that large companies use by

2112
01:20:04,489 --> 01:20:06,709
the way they might use these tricks even

2113
01:20:06,709 --> 01:20:08,689
in a geographically distributed way in

2114
01:20:08,689 --> 01:20:10,099
which they have multiple server centers in

2115
01:20:10,099 --> 01:20:11,659
multiple places they use multiple

2116
01:20:11,659 --> 01:20:13,489
servers to serve the requests I mean

2117
01:20:13,489 --> 01:20:16,219
when it comes to the HTML requests for

2118
01:20:16,219 --> 01:20:18,739
example for the web browser right most

2119
01:20:18,739 --> 01:20:19,969
of the difficulty is putting together

2120
01:20:19,969 --> 01:20:22,009
that HTML page that you get that

2121
01:20:22,009 --> 01:20:23,479
contains some information from the

2122
01:20:23,479 --> 01:20:24,919
backend database but a lot of other

2123
01:20:24,919 --> 01:20:27,199
fuzzy stuff in there okay

2124
01:20:27,199 --> 01:20:29,389
some of it is for example pictures and

2125
01:20:29,389 --> 01:20:30,679
other things that you can get in a mostly

2126
01:20:30,679 --> 01:20:32,329
read-only way I don't even necessarily

2127
01:20:32,329 --> 01:20:35,059
need to pull them from databases so

2128
01:20:35,059 --> 01:20:36,649
there is a lot of work to be done beyond

2129
01:20:36,649 --> 01:20:38,869
just the actual database work

2130
01:20:38,869 --> 01:20:40,699
and you easily could do it on multiple

2131
01:20:40,699 --> 01:20:42,199
machines but then your problem is the

2132
01:20:42,199 --> 01:20:44,839
following is how can you give the

2133
01:20:44,839 --> 01:20:47,209
illusion of a single entry point but to

2134
01:20:47,209 --> 01:20:48,979
have multiple machines that actually

2135
01:20:48,979 --> 01:20:51,079
serve the requests and these redirects

2136
01:20:51,079 --> 01:20:53,509
are crucial right so they're really one

2137
01:20:53,509 --> 01:20:57,649
of the main mechanisms through which you can

2138
01:20:57,649 --> 01:21:00,649
in fact provide a lot of the services we

2139
01:21:00,649 --> 01:21:02,089
want to provide in distributed systems

2140
01:21:02,089 --> 01:21:03,799
some sort of fault tolerance for example

2141
01:21:03,799 --> 01:21:06,589
one of the thousand front end servers

2142
01:21:06,589 --> 01:21:08,179
hiccups so what you just don't give it

2143
01:21:08,179 --> 01:21:10,459
requests by the way when you're doing

2144
01:21:10,459 --> 01:21:12,229
that it doesn't load and then you press

2145
01:21:12,229 --> 01:21:13,789
reload then it goes through it could be

2146
01:21:13,789 --> 01:21:15,649
that you got sent to one of these

2147
01:21:15,649 --> 01:21:17,839
servers that for some reason was stuck

2148
01:21:17,839 --> 01:21:19,939
and when you do a refresh you're sent to

2149
01:21:19,939 --> 01:21:21,499
another server so there's some sort of

2150
01:21:21,499 --> 01:21:23,029
component that can actually do this

2151
01:21:23,029 --> 01:21:25,279
redirect right now these redirects can

2152
01:21:25,279 --> 01:21:28,399
happen in multiple ways for example with

2153
01:21:28,399 --> 01:21:29,809
the actor model what you could do is

2154
01:21:29,809 --> 01:21:31,670
have a front actor that gets requests

2155
01:21:31,670 --> 01:21:33,110
it doesn't do anything except delegate

2156
01:21:33,110 --> 01:21:35,030
to another actor now of course the

2157
01:21:35,030 --> 01:21:36,230
question is how would you implement this

2158
01:21:36,230 --> 01:21:38,000
front actor now you could use Scala and

2159
01:21:38,000 --> 01:21:39,350
so on and that will take you to a

2160
01:21:39,350 --> 01:21:41,179
certain level at some point it becomes

2161
01:21:41,179 --> 01:21:45,530
overwhelming and literally the thing to

2162
01:21:45,530 --> 01:21:47,900
do would be to do it at an extremely low

2163
01:21:47,900 --> 01:21:49,280
level and this is where the networking

2164
01:21:49,280 --> 01:21:51,830
people can come in right so a lot of

2165
01:21:51,830 --> 01:21:53,929
these redirects can actually be done

2166
01:21:53,929 --> 01:21:56,300
deep in the networking stack and the

2167
01:21:56,300 --> 01:21:58,670
network protocol by the actual router so

2168
01:21:58,670 --> 01:22:01,280
these routers are extremely fast devices

2169
01:22:01,280 --> 01:22:03,530
that can get hundreds of millions of

2170
01:22:03,530 --> 01:22:06,560
packets now in and make hundreds of

2171
01:22:06,560 --> 01:22:08,540
millions going in other directions but

2172
01:22:08,540 --> 01:22:10,190
they can do this magic with redirects

2173
01:22:10,190 --> 01:22:12,920
also how do you do that well you simply

2174
01:22:12,920 --> 01:22:14,600
replace some information in the packet

2175
01:22:14,600 --> 01:22:16,760
with other information so you want it to

2176
01:22:16,760 --> 01:22:19,719
go to the main server that has a certain

2177
01:22:19,719 --> 01:22:21,380
IP address

2178
01:22:21,380 --> 01:22:23,270
well this router can actually change the

2179
01:22:23,270 --> 01:22:24,980
IP address and send it to one of the

2180
01:22:24,980 --> 01:22:26,360
thousand other servers that can actually

2181
01:22:26,360 --> 01:22:27,590
serve it right
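
The address rewriting described here can be modelled in a few lines. This is only a toy model under stated assumptions: `Packet` and `FrontEnd` are hypothetical names, the addresses are examples, packets are plain Python objects, and the round-robin choice with a health set is one possible policy among many. Real routers do this in the network stack or in hardware.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: bytes


class FrontEnd:
    """Single entry point that rewrites the destination of incoming packets."""

    def __init__(self, public_ip, backends):
        self.public_ip = public_ip
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to try (round robin)

    def mark_down(self, ip):
        """Stop sending requests to a backend that hiccuped."""
        self.healthy.discard(ip)

    def forward(self, packet):
        # Only traffic addressed to the public entry point is redirected.
        if packet.dst != self.public_ip:
            return packet
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in self.healthy:
                # Same packet, different destination address.
                return replace(packet, dst=candidate)
        raise RuntimeError("no healthy backend available")


if __name__ == "__main__":
    front = FrontEnd("203.0.113.10", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    request = Packet(src="198.51.100.7", dst="203.0.113.10", payload=b"GET /")
    print(front.forward(request).dst)
    front.mark_down("10.0.0.2")
    print(front.forward(request).dst)
```

The client only ever sees the public address, which is the illusion of a single entry point, while a server that is marked down simply stops receiving requests.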

2182
01:22:27,590 --> 01:22:30,290
in fact the tcp/ip protocol allows you

2183
01:22:30,290 --> 01:22:31,130
to do this

2184
01:22:31,130 --> 01:22:33,410
handover for a connection you want to

2185
01:22:33,410 --> 01:22:34,820
open a connection with this guy but I

2186
01:22:34,820 --> 01:22:36,020
hand you over to another guy and then

2187
01:22:36,020 --> 01:22:37,190
you establish the connection with the

2188
01:22:37,190 --> 01:22:39,530
other guy right if this is not done at

2189
01:22:39,530 --> 01:22:42,290
the very core of the network device you

2190
01:22:42,290 --> 01:22:44,000
have no chance to do large-scale load

2191
01:22:44,000 --> 01:22:45,980
balancing so that's an extreme version

2192
01:22:45,980 --> 01:22:48,949
of this redirect but this redirection

2193
01:22:48,949 --> 01:22:50,929
happens throughout the stack at a

2194
01:22:50,929 --> 01:22:52,219
higher and higher and higher and higher

2195
01:22:52,219 --> 01:22:55,520
level including in JavaScript so for

2196
01:22:55,520 --> 01:22:57,140
example when you install say a Linux

2197
01:22:57,140 --> 01:22:58,699
distribution right you have to pick a

2198
01:22:58,699 --> 01:23:02,540
mirror an automatic mirror selection

2199
01:23:02,540 --> 01:23:04,780
it's in fact a form of redirect right

2200
01:23:04,780 --> 01:23:08,090
because you just I mean literally this

2201
01:23:08,090 --> 01:23:09,920
you can you can do you go to the main

2202
01:23:09,920 --> 01:23:13,429
website and simply in JavaScript you

2203
01:23:13,429 --> 01:23:15,980
send a JSON object in which you say here

2204
01:23:15,980 --> 01:23:18,500
are 20 mirrors and then JavaScript flips

2205
01:23:18,500 --> 01:23:21,010
a coin and says I go to this mirror

2206
01:23:21,010 --> 01:23:23,989
that's a redirect happening extremely

2207
01:23:23,989 --> 01:23:26,360
high in the client as opposed to

2208
01:23:26,360 --> 01:23:28,370
happening deep in the network stack but

2209
01:23:28,370 --> 01:23:29,739
it's essentially the same mechanism

2210
01:23:29,739 --> 01:23:32,780
right it's very good for open source

2211
01:23:32,780 --> 01:23:35,120
projects that cannot buy those really

2212
01:23:35,120 --> 01:23:36,710
tough routers that can do this kind of

2213
01:23:36,710 --> 01:23:38,989
magic they are very expensive by the way

2214
01:23:38,989 --> 01:23:40,550
they can easily run into millions of

2215
01:23:40,550 --> 01:23:43,340
dollars okay I mean they're really

2216
01:23:43,340 --> 01:23:44,510
really tough

2217
01:23:44,510 --> 01:23:47,000
routers by the way this actually begs

2218
01:23:47,000 --> 01:23:48,199
the following kind of interesting

2219
01:23:48,199 --> 01:23:52,250
question if you want to let's say attack

2220
01:23:52,250 --> 01:23:56,179
one of the main websites what would you

2221
01:23:56,179 --> 01:23:58,489
attack so it's hard to overwhelm a

2222
01:23:58,489 --> 01:24:03,860
thousand front end servers all of them

2223
01:24:03,860 --> 01:24:06,019
can easily deal with ten thousand

2224
01:24:06,019 --> 01:24:08,170
simultaneous connections no problem

2225
01:24:08,170 --> 01:24:11,539
right so you're talking about 10 million

2226
01:24:11,539 --> 01:24:13,190
simultaneous connections you really need

2227
01:24:13,190 --> 01:24:15,500
10 I mean it's hard to keep them busy

2228
01:24:15,500 --> 01:24:17,119
you have to get all the computers on the

2229
01:24:17,119 --> 01:24:18,829
planet to really pound on them to to

2230
01:24:18,829 --> 01:24:21,800
really make them sweat or you could just

2231
01:24:21,800 --> 01:24:23,719
take down the front the front router

2232
01:24:23,719 --> 01:24:25,730
this one this is what happened in 2001

2233
01:24:25,730 --> 01:24:27,050
they figured out how to trick the front

2234
01:24:27,050 --> 01:24:29,780
router the one that was doing the load

2235
01:24:29,780 --> 01:24:31,219
balancing the tricks with the rewriting

2236
01:24:31,219 --> 01:24:33,739
the hijacking right if you take the main

2237
01:24:33,739 --> 01:24:36,260
guy down the whole thing is down it

2238
01:24:36,260 --> 01:24:37,699
doesn't matter that the backend servers

2239
01:24:37,699 --> 01:24:39,949
are up it's the same issue we discussed

2240
01:24:39,949 --> 01:24:42,829
with with Napster you took the name

2241
01:24:42,829 --> 01:24:44,269
servers down you took the whole thing

2242
01:24:44,269 --> 01:24:45,920
down even though the information was

2243
01:24:45,920 --> 01:24:47,179
there there was no way to get to it

2244
01:24:47,179 --> 01:24:51,019
right so in fact that main router at the

2245
01:24:51,019 --> 01:24:53,030
entrance point it's in a centralized

2246
01:24:53,030 --> 01:24:54,559
solution this is the kind of thing that

2247
01:24:54,559 --> 01:24:56,539
these distributed systems don't like distributed

2248
01:24:56,539 --> 01:24:58,250
systems people don't like right so

2249
01:24:58,250 --> 01:24:59,659
interesting questions are could you not

2250
01:24:59,659 --> 01:25:01,369
have that router there and still do well

2251
01:25:01,369 --> 01:25:03,260
and right but still doing such a good

2252
01:25:03,260 --> 01:25:05,150
job I mean maybe you can have a spare

2253
01:25:05,150 --> 01:25:08,210
and you'll flip tricky stuff by the way

2254
01:25:08,210 --> 01:25:09,590
these are important issues when it comes

2255
01:25:09,590 --> 01:25:11,420
to databases as well you have databases

2256
01:25:11,420 --> 01:25:14,210
that are now essentially some cater to

2257
01:25:14,210 --> 01:25:16,460
many many many clients you still have to

2258
01:25:16,460 --> 01:25:17,630
go and do sort of some kind of

2259
01:25:17,630 --> 01:25:20,570
transactions in the database right what

2260
01:25:20,570 --> 01:25:22,130
happens if the main database goes down

2261
01:25:22,130 --> 01:25:26,059
for example if that happens for Visa

2262
01:25:26,059 --> 01:25:28,099
it's a tragedy I mean people can't buy

2263
01:25:28,099 --> 01:25:31,670
their snacks right it's cash only for

2264
01:25:31,670 --> 01:25:34,000
the entire planet that's not good I mean

2265
01:25:34,000 --> 01:25:36,349
it's possible but this is something this

2266
01:25:36,349 --> 01:25:37,400
would happen right when the whole

2267
01:25:37,400 --> 01:25:40,789
network is down right now Google being

2268
01:25:40,789 --> 01:25:43,159
down everybody gets bored the credit card

2269
01:25:43,159 --> 01:25:44,809
machines being down and that's a tragedy

2270
01:25:44,809 --> 01:25:48,289
at least for me I don't can't have cash

2271
01:25:48,289 --> 01:25:51,130
with little right

2272
01:25:51,130 --> 01:25:54,159
right and so this is the kind of picture

2273
01:25:54,159 --> 01:25:55,270
you can imagine right you could

2274
01:25:55,270 --> 01:25:56,679
intercept at the request you can

2275
01:25:56,679 --> 01:25:58,600
intercept in the server you can

2276
01:25:58,600 --> 01:25:59,889
intercept at so many different levels

2277
01:25:59,889 --> 01:26:01,929
now why is that it's not true because

2278
01:26:01,929 --> 01:26:03,580
everything can be virtualized everybody

2279
01:26:03,580 --> 01:26:06,190
can can trick and present a certain view

2280
01:26:06,190 --> 01:26:07,510
but in fact do something very different

2281
01:26:07,510 --> 01:26:11,560
right if we would really have every

2282
01:26:11,560 --> 01:26:13,300
application talk directly on the wire

2283
01:26:13,300 --> 01:26:15,370
with the raw device and this was really

2284
01:26:15,370 --> 01:26:17,110
the case about twenty something years

2285
01:26:17,110 --> 01:26:19,060
ago more about 25 with the DOS operating

2286
01:26:19,060 --> 01:26:20,920
system and things like that then there

2287
01:26:20,920 --> 01:26:24,179
is no cheating to be done right these

2288
01:26:24,179 --> 01:26:27,610
abstractions allow cheating which is

2289
01:26:27,610 --> 01:26:31,030
good in this case so a lot of what

2290
01:26:31,030 --> 01:26:34,989
distributed systems would have to do right is

2291
01:26:34,989 --> 01:26:38,350
to use such virtualization and to use

2292
01:26:38,350 --> 01:26:40,960
such cheating and such this kind of

2293
01:26:40,960 --> 01:26:43,389
rewrite to mask certain kinds of

2294
01:26:43,389 --> 01:26:45,340
failures right the server is down well

2295
01:26:45,340 --> 01:26:46,600
I'll send you to another server and

2296
01:26:46,600 --> 01:26:48,010
don't even know what was the server you

2297
01:26:48,010 --> 01:26:49,210
wanted to talk to in the first place

2298
01:26:49,210 --> 01:26:50,760
because they all look the same together

2299
01:26:50,760 --> 01:26:55,900
then right so in some services if you

2300
01:26:55,900 --> 01:26:57,850
look carefully at the URL you might even

2301
01:26:57,850 --> 01:27:00,130
see where on which specific server

2302
01:27:00,130 --> 01:27:01,810
something gets stored right you go to

2303
01:27:01,810 --> 01:27:04,449
the main website you go www.flickr.com

2304
01:27:04,449 --> 01:27:06,699
and then when you look at your pictures

2305
01:27:06,699 --> 01:27:08,139
maybe they are on one particular server

2306
01:27:08,139 --> 01:27:10,480
so has a weird name okay some of them

2307
01:27:10,480 --> 01:27:13,179
completely mask it from the at the low

2308
01:27:13,179 --> 01:27:16,480
level in the networking right and by the

2309
01:27:16,480 --> 01:27:19,139
way when you access your gmail account

2310
01:27:19,139 --> 01:27:23,350
the actual emails live on some server

2311
01:27:23,350 --> 01:27:25,300
not all on the same server by the way

2312
01:27:25,300 --> 01:27:26,620
but they live on some server somewhere

2313
01:27:26,620 --> 01:27:28,449
so I mean some machine knows what your

2314
01:27:28,449 --> 01:27:31,330
email content is it's just all the magic

2315
01:27:31,330 --> 01:27:32,650
that happens behind that is this pure

2316
01:27:32,650 --> 01:27:34,330
distributed systems magic that using

2317
01:27:34,330 --> 01:27:36,489
file systems or some other means will

2318
01:27:36,489 --> 01:27:38,199
find where that file is and it makes it

2319
01:27:38,199 --> 01:27:39,699
back to you and you don't know better

2320
01:27:39,699 --> 01:27:41,739
and you definitely don't care which

2321
01:27:41,739 --> 01:27:44,590
which of the million Google servers has

2322
01:27:44,590 --> 01:27:47,590
your email as long as more than one has

2323
01:27:47,590 --> 01:27:48,820
your email so it's actually a

2324
01:27:48,820 --> 01:27:56,010
fault-tolerant right so

2325
01:27:56,130 --> 01:27:58,659
one of the main things you would like to

2326
01:27:58,659 --> 01:28:00,130
do in the context of distributed systems

2327
01:28:00,130 --> 01:28:03,219
is to write this adaptive software

2328
01:28:03,219 --> 01:28:05,730
adaptive is good right because it may be

2329
01:28:05,730 --> 01:28:08,469
self-healing self monitoring self

2330
01:28:08,469 --> 01:28:11,650
something right that sounds fantastic

2331
01:28:11,650 --> 01:28:15,690
because if it's self let's say healing

2332
01:28:15,690 --> 01:28:18,130
there is some damage to the system it fixes

2333
01:28:18,130 --> 01:28:21,040
itself right and then it's as good as as

2334
01:28:21,040 --> 01:28:22,570
new and you keep on going and never

2335
01:28:22,570 --> 01:28:24,940
shuts down okay I mentioned the fact

2336
01:28:24,940 --> 01:28:27,969
that Ericsson did precisely this with

2337
01:28:27,969 --> 01:28:31,030
Erlang the first large-scale language to

2338
01:28:31,030 --> 01:28:33,219
use actors to in fact design extremely

2339
01:28:33,219 --> 01:28:36,909
reliable telco systems I mean one way to

2340
01:28:36,909 --> 01:28:38,560
do it so I mentioned this before one way

2341
01:28:38,560 --> 01:28:40,270
to do extremely reliable is to never do

2342
01:28:40,270 --> 01:28:43,510
mistakes which we have yet to see or to

2343
01:28:43,510 --> 01:28:45,639
repair any mistakes you detect and

2344
01:28:45,639 --> 01:28:48,790
repair mistakes and then right if you

2345
01:28:48,790 --> 01:28:50,889
can find the bad part cut it off and put

2346
01:28:50,889 --> 01:28:55,210
another part in place the user might not

2347
01:28:55,210 --> 01:28:57,489
even notice anything more than a slight

2348
01:28:57,489 --> 01:29:02,050
delay in providing the service okay all

2349
01:29:02,050 --> 01:29:04,090
right so some of the approaches and this

2350
01:29:04,090 --> 01:29:07,510
again it's a way to be systematic about

2351
01:29:07,510 --> 01:29:09,310
how you talk about this but I mean these

2352
01:29:09,310 --> 01:29:10,449
are basically just fancy words

2353
01:29:10,449 --> 01:29:12,040
ultimately and you can do very many

2354
01:29:12,040 --> 01:29:14,170
variations on top of this is separation

2355
01:29:14,170 --> 01:29:16,420
of concerns computational reflection and

2356
01:29:16,420 --> 01:29:18,100
component based design so the easiest

2357
01:29:18,100 --> 01:29:20,560
one is component based design so why do

2358
01:29:20,560 --> 01:29:24,070
we even do component based design well

2359
01:29:24,070 --> 01:29:27,820
the trouble is we as humans and machines

2360
01:29:27,820 --> 01:29:30,520
are worse we are not particularly good

2361
01:29:30,520 --> 01:29:32,260
at keeping track of complicated things

2362
01:29:32,260 --> 01:29:33,790
unless we break them into simpler things

2363
01:29:33,790 --> 01:29:35,260
this is what engineering is all about

2364
01:29:35,260 --> 01:29:38,020
right get an extremely complicated

2365
01:29:38,020 --> 01:29:40,900
thing broken down into nuts and bolts

2366
01:29:40,900 --> 01:29:42,250
and how you put them together and how

2367
01:29:42,250 --> 01:29:43,810
you keep them together how you build it

2368
01:29:43,810 --> 01:29:45,730
and how you maintain it later right

2369
01:29:45,730 --> 01:29:47,650
that immediately suggests some sort of

2370
01:29:47,650 --> 01:29:50,350
components so if we would be required

2371
01:29:50,350 --> 01:29:52,300
all the time to express everything in

2372
01:29:52,300 --> 01:29:53,710
terms of transistors it would be a

2373
01:29:53,710 --> 01:29:55,420
disaster we would all just make those

2374
01:29:55,420 --> 01:29:57,219
little sirens that go with slightly

2375
01:29:57,219 --> 01:30:01,239
different sounds right but we build

2376
01:30:01,239 --> 01:30:02,889
components some of them are a full-size

2377
01:30:02,889 --> 01:30:04,350
processor that just does all the magic

2378
01:30:04,350 --> 01:30:05,560
right

2379
01:30:05,560 --> 01:30:07,239
and because of that we can accomplish a

2380
01:30:07,239 --> 01:30:08,289
lot more things now

2381
01:30:08,289 --> 01:30:09,579
component design it's to some extent

2382
01:30:09,579 --> 01:30:11,979
wasteful because you're not using the

2383
01:30:11,979 --> 01:30:14,379
hardware maybe at the highest potential

2384
01:30:14,379 --> 01:30:15,699
but at the same time allows you to do

2385
01:30:15,699 --> 01:30:16,780
complicated things that otherwise you

2386
01:30:16,780 --> 01:30:20,199
wouldn't be able to accomplish right for

2387
01:30:20,199 --> 01:30:22,449
example did you ever wonder how Intel is

2388
01:30:22,449 --> 01:30:27,189
putting together those processors so

2389
01:30:27,189 --> 01:30:30,280
first of all Intel has many thousands of

2390
01:30:30,280 --> 01:30:31,959
engineers that design those components

2391
01:30:31,959 --> 01:30:34,539
they were doing so for many years if you

2392
01:30:34,539 --> 01:30:35,769
think they are redesigning everything

2393
01:30:35,769 --> 01:30:38,499
from scratch every time it's impossible

2394
01:30:38,499 --> 01:30:40,149
the complexity is just out of the scale

2395
01:30:40,149 --> 01:30:41,979
plus they buy designs from other

2396
01:30:41,979 --> 01:30:44,649
companies it's like buying libraries

2397
01:30:44,649 --> 01:30:47,109
right little company in Silicon Valley

2398
01:30:47,109 --> 01:30:50,499
designs a very good floating-point unit

2399
01:30:50,499 --> 01:30:54,359
Intel will license it why it's cheaper

2400
01:30:54,359 --> 01:30:57,099
right you cannot really go back to those

2401
01:30:57,099 --> 01:30:59,889
transistors all the time ok and that

2402
01:30:59,889 --> 01:31:02,709
even at that level is component based so

2403
01:31:02,709 --> 01:31:04,449
libraries for example are one way to do

2404
01:31:04,449 --> 01:31:06,489
component based programming you simply

2405
01:31:06,489 --> 01:31:08,429
use prepackaged

2406
01:31:08,429 --> 01:31:12,159
existing framework to get the job done

2407
01:31:12,159 --> 01:31:14,769
TCP/IP it's an extreme example open

2408
01:31:14,769 --> 01:31:16,659
connection send information this is

2409
01:31:16,659 --> 01:31:18,519
extremely complex behavior going on on

2410
01:31:18,519 --> 01:31:20,679
TCP but it's a component it's a block

2411
01:31:20,679 --> 01:31:23,859
just use it right so it's obvious why a

2412
01:31:23,859 --> 01:31:25,179
component based design would actually

2413
01:31:25,179 --> 01:31:26,919
help and we if you don't do this you

2414
01:31:26,919 --> 01:31:29,010
don't get any anything accomplished ok

2415
01:31:29,010 --> 01:31:32,919
but the interesting story right the

2416
01:31:32,919 --> 01:31:34,449
importance of abstraction so I found an

2417
01:31:34,449 --> 01:31:35,919
interesting example when somebody got

2418
01:31:35,919 --> 01:31:37,349
away with an extremely low level

2419
01:31:37,349 --> 01:31:40,749
abstraction and I really admire the guy

2420
01:31:40,749 --> 01:31:43,089
right so you normally in what language

2421
01:31:43,089 --> 01:31:45,369
are most games written anybody happens

2422
01:31:45,369 --> 01:31:48,819
to know C++ or C I mean it's kind of

2423
01:31:48,819 --> 01:31:51,249
split now okay but it used to be pure C

2424
01:31:51,249 --> 01:31:56,249
but I found out about this game designer

2425
01:31:56,249 --> 01:31:59,349
that was writing games in assembly as

2426
01:31:59,349 --> 01:32:04,839
late as 2005 okay so this is a guy that

2427
01:32:04,839 --> 01:32:07,809
started with assembly on the 80

2428
01:32:07,809 --> 01:32:09,760
processor and really got into it and

2429
01:32:09,760 --> 01:32:11,619
then just went with assembly so let me

2430
01:32:11,619 --> 01:32:12,909
remember the names of the games he did

2431
01:32:12,909 --> 01:32:15,909
so he did Transport Tycoon and then

2432
01:32:15,909 --> 01:32:19,209
RollerCoaster Tycoon and then

2433
01:32:19,209 --> 01:32:22,059
whatever follow ups on this so he was

2434
01:32:22,059 --> 01:32:23,829
the only programmer for the game and he

2435
01:32:23,829 --> 01:32:26,439
wrote everything in assembly x86

2436
01:32:26,439 --> 01:32:28,569
assembly for the later games now

2437
01:32:28,569 --> 01:32:32,499
look some people just have it so they

2438
01:32:32,499 --> 01:32:34,449
can work with crazy abstractions and get

2439
01:32:34,449 --> 01:32:36,489
the work done most people can't so they

2440
01:32:36,489 --> 01:32:38,109
have to use higher-level components to

2441
01:32:38,109 --> 01:32:39,489
get things done in high-level languages

2442
01:32:39,489 --> 01:32:44,799
right so yeah I mean the exceptions

2443
01:32:44,799 --> 01:32:46,389
maybe strengthen the rules in some of

2444
01:32:46,389 --> 01:32:48,429
these this is the only known example of

2445
01:32:48,429 --> 01:32:51,459
an assembly complex assembly written

2446
01:32:51,459 --> 01:32:51,939
game

2447
01:32:51,939 --> 01:32:54,069
and trust me two people could not have

2448
01:32:54,069 --> 01:32:56,169
written that game it's only one mind can

2449
01:32:56,169 --> 01:32:58,479
keep that craziness under control right

2450
01:32:58,479 --> 01:33:01,029
it's two people cannot ever agree how to

2451
01:33:01,029 --> 01:33:02,489
do things at such a low level

2452
01:33:02,489 --> 01:33:04,929
okay separation of concerns this is

2453
01:33:04,929 --> 01:33:06,999
gonna be happening throughout this class

2454
01:33:06,999 --> 01:33:09,099
is how can you deal with issues

2455
01:33:09,099 --> 01:33:11,859
separately right so for example how can

2456
01:33:11,859 --> 01:33:13,209
you separate the basic functionality

2457
01:33:13,209 --> 01:33:14,469
from high-level functionality like

2458
01:33:14,469 --> 01:33:16,499
redundancy fault tolerance and whatnot

2459
01:33:16,499 --> 01:33:19,629
because usually low-level functionality

2460
01:33:19,629 --> 01:33:21,849
is more in the realm of the specifics

2461
01:33:21,849 --> 01:33:23,109
the application and the high level

2462
01:33:23,109 --> 01:33:25,719
functionality potentially can be bundled

2463
01:33:25,719 --> 01:33:28,209
up in a more uniform way so then you

2464
01:33:28,209 --> 01:33:29,799
might have more universal solutions for

2465
01:33:29,799 --> 01:33:31,389
how do you make something fault tolerant

2466
01:33:31,389 --> 01:33:33,369
and to some extent those high level

2467
01:33:33,369 --> 01:33:36,609
issues are harder to do right now

2468
01:33:36,609 --> 01:33:38,079
computational reflection this is

2469
01:33:38,079 --> 01:33:40,449
somewhat of a weird one because what

2470
01:33:40,449 --> 01:33:42,309
computational reflection really means is

2471
01:33:42,309 --> 01:33:46,749
the program can ask itself what behavior

2472
01:33:46,749 --> 01:33:48,249
it has I mean why would the program ask

2473
01:33:48,249 --> 01:33:50,229
well because the program was written by

2474
01:33:50,229 --> 01:33:54,069
the programmer and it's usually not the

2475
01:33:54,069 --> 01:33:55,959
same programmer throughout the time and

2476
01:33:55,959 --> 01:33:59,229
one way to diagnose yourself or one way

2477
01:33:59,229 --> 01:34:01,239
to know as a program what's going on is

2478
01:34:01,239 --> 01:34:05,379
to say what am i doing or how do i work

2479
01:34:05,379 --> 01:34:08,199
or what methods do I have so this is

2480
01:34:08,199 --> 01:34:09,129
something that happens in high-level

2481
01:34:09,129 --> 01:34:10,779
languages for example this is a very

2482
01:34:10,779 --> 01:34:15,189
very neat way to to program for example

2483
01:34:15,189 --> 01:34:17,289
this is one reason I like JavaScript you

2484
01:34:17,289 --> 01:34:19,149
can get a random library of whoever

2485
01:34:19,149 --> 01:34:22,299
wrote it and in JavaScript say hey what

2486
01:34:22,299 --> 01:34:25,029
methods do I have I don't I don't even

2487
01:34:25,029 --> 01:34:26,619
care how that guy wrote his library I'm

2488
01:34:26,619 --> 01:34:27,909
just gonna look at the methods using

2489
01:34:27,909 --> 01:34:29,559
this reflection what methods are there

2490
01:34:29,559 --> 01:34:30,909
and I can guess what it is especially

2491
01:34:30,909 --> 01:34:33,249
if I do it in a console but that can be

2492
01:34:33,249 --> 01:34:34,089
done completely

2493
01:34:34,089 --> 01:34:35,679
in the language for example in

2494
01:34:35,679 --> 01:34:37,510
JavaScript you can write a for loop to

2495
01:34:37,510 --> 01:34:40,629
go over your content as an object that

2496
01:34:40,629 --> 01:34:42,010
essentially means that the content as an

2497
01:34:42,010 --> 01:34:45,459
object is not fixed as is for example

2498
01:34:45,459 --> 01:34:48,069
the case in Java or C++ right oh by the

2499
01:34:48,069 --> 01:34:49,329
way in Scala it's also fixed because

2500
01:34:49,329 --> 01:34:51,039
it's strongly typed language but this

2501
01:34:51,039 --> 01:34:53,229
reflection can come in handy in a big

2502
01:34:53,229 --> 01:34:56,439
way and especially can be useful if you

2503
01:34:56,439 --> 01:34:58,030
have multiple versions of the same of

2504
01:34:58,030 --> 01:35:00,519
the same program because for example one

2505
01:35:00,519 --> 01:35:02,019
way to make sure that you can work

2506
01:35:02,019 --> 01:35:03,729
correctly is to say do I have this

2507
01:35:03,729 --> 01:35:05,199
functionality if yes I'm gonna do

2508
01:35:05,199 --> 01:35:08,050
something if not say hey this version is

2509
01:35:08,050 --> 01:35:10,079
too old maybe you should upgrade

2510
01:35:10,079 --> 01:35:12,010
especially when you get separate

2511
01:35:12,010 --> 01:35:13,479
components that are not necessarily

2512
01:35:13,479 --> 01:35:15,189
synchronized from a versioning point of

2513
01:35:15,189 --> 01:35:17,709
view right so this computational

2514
01:35:17,709 --> 01:35:20,289
reflection it's a very high-level kind

2515
01:35:20,289 --> 01:35:24,399
of feature that makes design of

2516
01:35:24,399 --> 01:35:26,559
distributed systems a little bit easier okay

2517
01:35:26,559 --> 01:35:27,939
so it's a kind of a nice thing to have

2518
01:35:27,939 --> 01:35:32,349
you definitely don't have in C++ so how

2519
01:35:32,349 --> 01:35:34,030
would you do this in C++ can you maybe

2520
01:35:34,030 --> 01:35:35,649
this is an interesting assignment write

2521
01:35:35,649 --> 01:35:37,510
a C++ program that enumerates all the

2522
01:35:37,510 --> 01:35:42,699
methods an object has in C++ you

2523
01:35:42,699 --> 01:35:46,539
just know because of the code what

2524
01:35:46,539 --> 01:35:48,339
methods you have but you can't say

2525
01:35:48,339 --> 01:35:51,760
enumerate what methods I have really the

2526
01:35:51,760 --> 01:35:53,379
new C++ standard has a little bit

2527
01:35:53,379 --> 01:35:55,780
of reflection but not significantly

2528
01:35:55,780 --> 01:35:58,119
right in PHP JavaScript you can really

2529
01:35:58,119 --> 01:36:03,189
say what methods do I have by the way

2530
01:36:03,189 --> 01:36:06,059
the original object-oriented language

2531
01:36:06,059 --> 01:36:08,229
which was the first object oriented

2532
01:36:08,229 --> 01:36:10,919
language anybody knows

2533
01:36:13,589 --> 01:36:19,539
okay so I have C++ I have Fortran well

2534
01:36:19,539 --> 01:36:20,889
it turns out that none of the above yes

2535
01:36:20,889 --> 01:36:25,479
I'm sorry Simula that one had some ideas in

2536
01:36:25,479 --> 01:36:26,649
object-oriented but the first pure

2537
01:36:26,649 --> 01:36:29,439
object-oriented I'm sorry oh no way Scala

2538
01:36:29,439 --> 01:36:32,379
is so new I mean C++ is older how many

2539
01:36:32,379 --> 01:36:36,819
people heard about Smalltalk Smalltalk

2540
01:36:36,819 --> 01:36:38,169
is the original object-oriented

2541
01:36:38,169 --> 01:36:39,819
programming language and had reflection

2542
01:36:39,819 --> 01:36:42,309
the designers of Smalltalk believed that

2543
01:36:42,309 --> 01:36:43,839
that's a crucial property of any

2544
01:36:43,839 --> 01:36:46,089
object-oriented language and then C++

2545
01:36:46,089 --> 01:36:47,260
came along and destroyed

2546
01:36:47,260 --> 01:36:51,489
right in Smalltalk an object could say

2547
01:36:51,489 --> 01:36:52,780
what methods do I have

2548
01:36:52,780 --> 01:36:54,970
and then selectively call them or not

2549
01:36:54,970 --> 01:36:56,920
call them Smalltalk was designed to do

2550
01:36:56,920 --> 01:36:59,200
graphic interfaces and it's crucial for

2551
01:36:59,200 --> 01:37:02,650
nice menu based system and whatnot to

2552
01:37:02,650 --> 01:37:04,060
have in fact some sort of an

2553
01:37:04,060 --> 01:37:06,790
object-oriented programming interface

2554
01:37:06,790 --> 01:37:08,920
makes it much much much easier all the

2555
01:37:08,920 --> 01:37:10,890
good interfaces like that have

2556
01:37:10,890 --> 01:37:12,460
significant characteristic of

2557
01:37:12,460 --> 01:37:14,650
object-oriented design okay with

2558
01:37:14,650 --> 01:37:17,890
reflection so why have the C++ designers

2559
01:37:17,890 --> 01:37:21,370
knocked off reflection oh because it

2560
01:37:21,370 --> 01:37:22,660
kills performance right they said forget

2561
01:37:22,660 --> 01:37:23,920
about it it kills performance we are

2562
01:37:23,920 --> 01:37:27,340
knocking it off right so to a large

2563
01:37:27,340 --> 01:37:30,340
extent C++ is not a true object-oriented

2564
01:37:30,340 --> 01:37:34,180
language it's a kind of object-oriented

2565
01:37:34,180 --> 01:37:37,989
organized language but not really an

2566
01:37:37,989 --> 01:37:39,790
object-oriented language not in the

2567
01:37:39,790 --> 01:37:42,190
Smalltalk sense okay that doesn't make

2568
01:37:42,190 --> 01:37:45,220
it bad they just it essentially says

2569
01:37:45,220 --> 01:37:47,260
there are many flavors of everything and

2570
01:37:47,260 --> 01:37:48,610
you have to be careful how you call

2571
01:37:48,610 --> 01:37:50,140
things and what exactly do they mean

2572
01:37:50,140 --> 01:37:51,070
right

2573
01:37:51,070 --> 01:37:54,520
so different kind of story all right now

2574
01:37:54,520 --> 01:37:56,110
the last thing I want to mention is this

2575
01:37:56,110 --> 01:37:57,880
idea of a feedback control model because

2576
01:37:57,880 --> 01:37:59,800
this is what will allow you to monitor

2577
01:37:59,800 --> 01:38:01,989
yourself any feedback loop this is a big

2578
01:38:01,989 --> 01:38:04,330
issue for example in in control theory

2579
01:38:04,330 --> 01:38:06,430
any feedback loop consists of you start

2580
01:38:06,430 --> 01:38:09,520
with some initial configuration and what

2581
01:38:09,520 --> 01:38:10,690
you're trying to do is do some

2582
01:38:10,690 --> 01:38:12,400
corrections they have to be based on

2583
01:38:12,400 --> 01:38:14,350
some sort of a loop in which you look at

2584
01:38:14,350 --> 01:38:16,120
what you were doing yeah that means you

2585
01:38:16,120 --> 01:38:18,190
measure it with some metric you do some

2586
01:38:18,190 --> 01:38:20,140
sort of analysis you figure out how to

2587
01:38:20,140 --> 01:38:21,310
adjust yourself and you apply the

2588
01:38:21,310 --> 01:38:23,290
adjustment and you keep staying in that

2589
01:38:23,290 --> 01:38:25,000
loop for example if you notice that a

2590
01:38:25,000 --> 01:38:26,680
part of it doesn't quite work and this

2591
01:38:26,680 --> 01:38:28,000
is one great thing you can do with

2592
01:38:28,000 --> 01:38:29,980
actors you have actors which monitor

2593
01:38:29,980 --> 01:38:31,660
parts of the system if they say notice

2594
01:38:31,660 --> 01:38:33,510
that something is hiccupping one way to

2595
01:38:33,510 --> 01:38:36,220
to heal the system is you kill those

2596
01:38:36,220 --> 01:38:39,180
components and create them anew all right

2597
01:38:39,180 --> 01:38:42,280
any of them ran in a weird state if you

2598
01:38:42,280 --> 01:38:44,430
can afford to do that and not overly

2599
01:38:44,430 --> 01:38:47,770
disrupt the system it's perfect for

2600
01:38:47,770 --> 01:38:49,440
example if you have a thousand servers

2601
01:38:49,440 --> 01:38:51,489
front-end servers some of them will run

2602
01:38:51,489 --> 01:38:54,520
amok for whatever reasons best way is to

2603
01:38:54,520 --> 01:38:56,590
use these hardware devices that can

2604
01:38:56,590 --> 01:38:58,780
simply cut the power of the machine and

2605
01:38:58,780 --> 01:39:00,270
reboot it in a hard way

2606
01:39:00,270 --> 01:39:03,240
and you start it again 30 seconds later

2607
01:39:03,240 --> 01:39:06,060
or whatever it's like new as opposed to

2608
01:39:06,060 --> 01:39:09,540
running on some sort of a problem so in

2609
01:39:09,540 --> 01:39:11,280
principle you could mask a lot a problem

2610
01:39:11,280 --> 01:39:13,650
a lot of problems I'm sorry you might

2611
01:39:13,650 --> 01:39:15,330
have for example you have a program that

2612
01:39:15,330 --> 01:39:17,790
has memory leaks you simply put an actor

2613
01:39:17,790 --> 01:39:19,710
in front of it that watches it and when

2614
01:39:19,710 --> 01:39:22,950
the problem is too severe you kill it

2615
01:39:22,950 --> 01:39:24,840
and it's like new so you can mask

2616
01:39:24,840 --> 01:39:29,100
any bad code with self-healing which

2617
01:39:29,100 --> 01:39:30,720
means you have to be able to be careful

2618
01:39:30,720 --> 01:39:31,860
how you use this self-healing but in

2619
01:39:31,860 --> 01:39:33,270
principle it could be something that

2620
01:39:33,270 --> 01:39:35,100
will make the system very robust as long

2621
01:39:35,100 --> 01:39:35,880
as you can do this

2622
01:39:35,880 --> 01:39:39,660
take it down put it back up and you can

2623
01:39:39,660 --> 01:39:41,430
keep the system running while this is

2624
01:39:41,430 --> 01:39:43,440
actually happening this is one of the

2625
01:39:43,440 --> 01:39:45,810
things Google does all the time okay

2626
01:39:45,810 --> 01:39:47,640
they monitor all the systems they simply

2627
01:39:47,640 --> 01:39:49,110
knock off the ones that don't work and

2628
01:39:49,110 --> 01:39:50,880
you don't even feel it because they mask

2629
01:39:50,880 --> 01:39:54,090
the failure actor actor model should

2630
01:39:54,090 --> 01:39:56,730
make this a much easier exercise alright

2631
01:39:56,730 --> 01:39:58,740
so this is it for this lecture I'm going

2632
01:39:58,740 --> 01:40:01,770
to go to my office and find the project

2633
01:40:01,770 --> 01:40:03,150
and post the first project so you can

2634
01:40:03,150 --> 01:40:04,080
start working on it

2635
01:40:04,080 --> 01:40:06,350
okay and we're gonna have more fun

2636
01:40:06,350 --> 01:40:08,610
Thursday and talk about distributed

2637
01:40:08,610 --> 00:00:00,000
systems