1
00:00:25,700 --> 00:00:29,490
All right, so now we are getting to
2
00:00:29,490 --> 00:00:31,560
interesting things. Okay, so I gave all of
3
00:00:31,560 --> 00:00:35,910
you before a broad, high-level view of what
4
00:00:35,910 --> 00:00:37,379
are the issues that distributed systems
5
00:00:37,379 --> 00:00:40,410
people care about, and it was mainly:
6
00:00:40,410 --> 00:00:43,350
use lots and lots of computers to
7
00:00:43,350 --> 00:00:45,239
do all kinds of great things, right? So
8
00:00:45,239 --> 00:00:47,910
let's see how that might actually be
9
00:00:47,910 --> 00:00:49,440
done. In order to talk about
10
00:00:49,440 --> 00:00:51,330
particular solutions we have to talk
11
00:00:51,330 --> 00:00:52,949
about this issue called architectures,
12
00:00:52,949 --> 00:00:54,660
and architectures is gonna mean many
13
00:00:54,660 --> 00:00:56,129
different things in this lecture, right?
14
00:00:56,129 --> 00:00:58,170
It's gonna be, to some extent, hardware
15
00:00:58,170 --> 00:01:00,629
architectures, but also the organization,
16
00:01:00,629 --> 00:01:02,820
at the logical level, of the simulated
17
00:01:02,820 --> 00:01:05,489
systems. So maybe the first question to
18
00:01:05,489 --> 00:01:10,530
ask is: why bother to organize? Well, because
19
00:01:10,530 --> 00:01:12,030
if you don't organize you're not gonna
20
00:01:12,030 --> 00:01:14,369
be able to articulate a nice enough
21
00:01:14,369 --> 00:01:15,780
solution, you're not gonna be understood
22
00:01:15,780 --> 00:01:16,680
by other people,
23
00:01:16,680 --> 00:01:18,600
you're gonna reinvent the wheel over and
24
00:01:18,600 --> 00:01:20,420
over, all kinds of issues like this, right?
25
00:01:20,420 --> 00:01:23,970
Now, it's important to have ways to
26
00:01:23,970 --> 00:01:27,479
organize and talk about material in a certain
27
00:01:27,479 --> 00:01:29,399
way, but it's also important to
28
00:01:29,399 --> 00:01:31,469
recognize that you can break away from
29
00:01:31,469 --> 00:01:33,840
some of those issues and invent maybe
30
00:01:33,840 --> 00:01:36,509
your own terms. All right, so I call this
31
00:01:36,509 --> 00:01:39,960
selective wheel reinvention, right? So you
32
00:01:39,960 --> 00:01:41,340
want to reinvent the wheel in a very
33
00:01:41,340 --> 00:01:44,820
selective way, in a purposeful manner, to
34
00:01:44,820 --> 00:01:46,079
achieve a certain thing; you don't want
35
00:01:46,079 --> 00:01:47,670
to just randomly reinvent the wheel,
36
00:01:47,670 --> 00:01:48,869
right? It's good to know what other
37
00:01:48,869 --> 00:01:50,700
people did; it's good to know when you
38
00:01:50,700 --> 00:01:53,249
might want to break
39
00:01:53,249 --> 00:01:56,159
free. Okay, so let's first see how
40
00:01:56,159 --> 00:01:58,289
people in distributed systems have already
41
00:01:58,289 --> 00:02:01,079
thought about this issue of
42
00:02:01,079 --> 00:02:05,460
architectures. Okay, so I'm not gonna go
43
00:02:05,460 --> 00:02:07,139
into how computers work, what
44
00:02:07,139 --> 00:02:09,538
they do, what networking does. Almost
45
00:02:09,538 --> 00:02:10,710
anything involving these distributed
46
00:02:10,710 --> 00:02:12,540
systems is going to obviously run on
47
00:02:12,540 --> 00:02:14,100
something that can run programs, computers,
48
00:02:14,100 --> 00:02:15,300
and is going to have some sort of
49
00:02:15,300 --> 00:02:17,580
interconnectivity. Okay, now that might
50
00:02:17,580 --> 00:02:21,959
take the form of connectivity inside the
51
00:02:21,959 --> 00:02:24,300
machine or connectivity between machines;
52
00:02:24,300 --> 00:02:26,459
networking people have
53
00:02:26,459 --> 00:02:28,050
worried about this for a long time, so we
54
00:02:28,050 --> 00:02:29,370
are just going to use networking, right?
55
00:02:29,370 --> 00:02:31,710
We're never going to get to an extremely
56
00:02:31,710 --> 00:02:33,240
low level when it comes to communication;
57
00:02:33,240 --> 00:02:34,830
we're going to assume that we use some
58
00:02:34,830 --> 00:02:37,680
kind of TCP/IP, or maybe
59
00:02:37,680 --> 00:02:39,989
UDP datagrams. Well, that's about all the
60
00:02:39,989 --> 00:02:42,030
level we actually care about. We don't
61
00:02:42,030 --> 00:02:44,280
care about the bits going on the wire
62
00:02:44,280 --> 00:02:47,939
in this class, okay, or for that matter in
63
00:02:47,939 --> 00:02:50,010
any class except a networking class, where
64
00:02:50,010 --> 00:02:51,299
we only care about the bits going on
65
00:02:51,299 --> 00:02:54,290
the wire, for the most part. Okay, so
66
00:02:54,290 --> 00:02:58,170
when it comes to architectural styles, at
67
00:02:58,170 --> 00:03:00,269
least this textbook organizes them in
68
00:03:00,269 --> 00:03:01,590
the following way. These are the
69
00:03:01,590 --> 00:03:03,299
four primary architectural styles talked
70
00:03:03,299 --> 00:03:05,670
about: layered architectures, object-based
71
00:03:05,670 --> 00:03:07,739
architectures, data-centered architectures,
72
00:03:07,739 --> 00:03:09,780
and event-based architectures. Now,
73
00:03:09,780 --> 00:03:11,909
sometimes it's hard to say whether a
74
00:03:11,909 --> 00:03:13,139
particular system has one of these
75
00:03:13,139 --> 00:03:14,819
architectures, or a blend between them, or
76
00:03:14,819 --> 00:03:17,849
things of this sort, so take them as
77
00:03:17,849 --> 00:03:18,900
only some sort of a high-level
78
00:03:18,900 --> 00:03:20,940
indication of how you might do things.
79
00:03:20,940 --> 00:03:24,689
Okay, now since we are talking about the
80
00:03:24,689 --> 00:03:27,299
actor model, and that's really what I
81
00:03:27,299 --> 00:03:29,760
wanted to use primarily in this class,
82
00:03:29,760 --> 00:03:31,260
I'm gonna make lots of references to the
83
00:03:31,260 --> 00:03:33,170
actor model, and interestingly enough the
84
00:03:33,170 --> 00:03:35,730
actor model is gonna fit perfectly well
85
00:03:35,730 --> 00:03:37,950
some of these architectures, but it will
86
00:03:37,950 --> 00:03:39,750
definitely be able to emulate all the
87
00:03:39,750 --> 00:03:42,750
other architectures. Thinking in
88
00:03:42,750 --> 00:03:44,400
different ways is still helpful even if
89
00:03:44,400 --> 00:03:47,069
you use, for example, a paradigm like
90
00:03:47,069 --> 00:03:49,650
the actor model that naturally fits one
91
00:03:49,650 --> 00:03:50,940
of these, and let's see which one it does
92
00:03:50,940 --> 00:03:51,959
fit. Okay,
93
00:03:51,959 --> 00:03:54,959
so, layered architecture. So first, how
94
00:03:54,959 --> 00:03:57,919
many people took a networking class here?
95
00:03:57,919 --> 00:04:01,590
Okay, so the layered architecture really
96
00:04:01,590 --> 00:04:03,870
comes from networking. I would even say
97
00:04:03,870 --> 00:04:05,549
the networking people are completely
98
00:04:05,549 --> 00:04:08,970
obsessed with layering. Any kind of network
99
00:04:08,970 --> 00:04:11,370
stack is about layers, right? So they
100
00:04:11,370 --> 00:04:13,889
talk about seven layers, and then they
101
00:04:13,889 --> 00:04:15,989
implement five and hope people notice
102
00:04:15,989 --> 00:04:20,070
three. But in the end these layers
103
00:04:20,070 --> 00:04:21,988
are gonna add various kinds of
104
00:04:21,988 --> 00:04:24,360
functionality. So why stack them? It
105
00:04:24,360 --> 00:04:27,360
makes the code, the reasoning, and
106
00:04:27,360 --> 00:04:28,560
the implementation a little bit easier
107
00:04:28,560 --> 00:04:30,750
to understand. So they figured, and to
108
00:04:30,750 --> 00:04:34,080
some extent they are going along with
109
00:04:34,080 --> 00:04:36,810
the initial implementation of, say, TCP/IP,
110
00:04:36,810 --> 00:04:40,110
right, their reasoning is that separating
111
00:04:40,110 --> 00:04:41,820
various functionality into these layers,
112
00:04:41,820 --> 00:04:43,349
and then maybe optionally including the
113
00:04:43,349 --> 00:04:44,729
layers or not including them, is
114
00:04:44,729 --> 00:04:46,529
gonna make for a nice implementation of
115
00:04:46,529 --> 00:04:47,849
the network stack, and virtually all
116
00:04:47,849 --> 00:04:49,710
network stacks have a layered
117
00:04:49,710 --> 00:04:50,650
architecture.
118
00:04:50,650 --> 00:04:53,740
Okay, the very bottom layer in a network
119
00:04:53,740 --> 00:04:56,290
stack is essentially Ethernet-like
120
00:04:56,290 --> 00:04:58,389
behavior, the basics of an Ethernet-style
121
00:04:58,389 --> 00:05:00,009
connection, and the uppermost
122
00:05:00,009 --> 00:05:01,449
level might be something like full-fledged
123
00:05:01,449 --> 00:05:04,180
TCP/IP behavior, right? So maybe
124
00:05:04,180 --> 00:05:06,250
the end-to-end checksum, or things of
125
00:05:06,250 --> 00:05:09,419
this sort. Okay, so when it comes to
126
00:05:09,419 --> 00:05:11,530
protocol implementation, especially
127
00:05:11,530 --> 00:05:13,300
low-level protocol implementation,
128
00:05:13,300 --> 00:05:15,430
layering virtually
129
00:05:15,430 --> 00:05:17,289
dominates, but it's not particularly nice
130
00:05:17,289 --> 00:05:21,550
to reason about, right? For the most part,
131
00:05:21,550 --> 00:05:23,289
when it comes to distributed systems,
132
00:05:23,289 --> 00:05:26,289
since distributed systems people mostly
133
00:05:26,289 --> 00:05:28,330
just use networking as a tool and say, oh,
134
00:05:28,330 --> 00:05:31,600
yes, let's do TCP/IP layering, the kind of
135
00:05:31,600 --> 00:05:33,580
layered architecture is not particularly
136
00:05:33,580 --> 00:05:36,130
interesting; it's too constraining. One
137
00:05:36,130 --> 00:05:37,840
particularly constraining part of it is
138
00:05:37,840 --> 00:05:40,660
the fact that one layer can talk only
139
00:05:40,660 --> 00:05:43,300
with the layer right below it. So
140
00:05:43,300 --> 00:05:45,039
for example, in networking the
141
00:05:45,039 --> 00:05:47,349
uppermost level is going to make
142
00:05:47,349 --> 00:05:49,240
some kind of a TCP/IP request, and then
143
00:05:49,240 --> 00:05:50,800
you go layer, layer, layer, layer until,
144
00:05:50,800 --> 00:05:55,780
oops, Smartboard, okay, until you
145
00:05:55,780 --> 00:05:56,979
essentially get to the bottommost layer,
146
00:05:56,979 --> 00:05:59,139
where the request's bits go on the
147
00:05:59,139 --> 00:06:00,940
wire, and then when something happens
148
00:06:00,940 --> 00:06:03,130
everything gets propagated back, but you
149
00:06:03,130 --> 00:06:04,539
have to go through the intermediate
150
00:06:04,539 --> 00:06:05,889
layers. That's the problem in the layered
151
00:06:05,889 --> 00:06:07,960
architecture: why not jump? So even
152
00:06:07,960 --> 00:06:09,490
network implementers have thought about
153
00:06:09,490 --> 00:06:12,900
better ways to do this, and by the way
154
00:06:12,900 --> 00:06:16,360
some distributed systems people came up
155
00:06:16,360 --> 00:06:18,400
with interesting variants of the layered
156
00:06:18,400 --> 00:06:19,900
architecture to make it much higher
157
00:06:19,900 --> 00:06:21,789
performance, right? And that was
158
00:06:21,789 --> 00:06:23,139
essentially by saying you have
159
00:06:23,139 --> 00:06:24,610
essentially a layered architecture, but
160
00:06:24,610 --> 00:06:26,760
you can optimize it and do jumps, and
161
00:06:26,760 --> 00:06:28,990
so, for example, if you could jump from
162
00:06:28,990 --> 00:06:31,030
layer 1 to layer n, which might be
163
00:06:31,030 --> 00:06:32,590
acceptable under certain circumstances,
164
00:06:32,590 --> 00:06:34,479
the network stack is going to run faster.
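To make the rule that a layer talks only to the layer right below it, and the "jump from layer 1 to layer n" optimization, concrete, here is a minimal Python sketch. The four layer names and the string wrapping that stands in for protocol headers are illustrative assumptions, not a real network stack.

```python
from typing import Callable, List, Set

# Hypothetical layers loosely named after a network stack. Each one "adds its
# header" by wrapping the payload it gets from the layer above; this only
# illustrates the control flow, not any real protocol.
Layer = Callable[[str], str]

def application(payload: str) -> str:
    return f"APP[{payload}]"

def transport(payload: str) -> str:
    return f"TCP[{payload}]"

def network(payload: str) -> str:
    return f"IP[{payload}]"

def link(payload: str) -> str:
    return f"ETH[{payload}]"

STACK: List[Layer] = [application, transport, network, link]  # top to bottom

def send_strict(payload: str) -> str:
    """Strict layering: every layer is visited, each handing off to the next one down."""
    for layer in STACK:
        payload = layer(payload)
    return payload

def send_with_bypass(payload: str, skip: Set[str]) -> str:
    """Optimized variant: layers named in `skip` are jumped over. Only safe
    when their work is known to be unnecessary for this message."""
    for layer in STACK:
        if layer.__name__ in skip:
            continue
        payload = layer(payload)
    return payload

print(send_strict("hi"))                                  # ETH[IP[TCP[APP[hi]]]]
print(send_with_bypass("hi", {"transport", "network"}))  # ETH[APP[hi]]
```

The bypass does less work per message, which is where the speedup comes from, but it also silently drops whatever the skipped layers would have added; that trade-off is why such jumps are acceptable only under certain circumstances.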
165
00:06:34,479 --> 00:06:37,599
Okay, but as it is, it's too constraining. It's
166
00:06:37,599 --> 00:06:38,889
very important, though; it's important to
167
00:06:38,889 --> 00:06:41,800
know that something like this is
168
00:06:41,800 --> 00:06:44,680
used. Okay, now this is an object-based
169
00:06:44,680 --> 00:06:46,389
architectural style. Okay, you have
170
00:06:46,389 --> 00:06:48,909
various objects, and the objects somehow
171
00:06:48,909 --> 00:06:51,880
communicate with each other. This fits
172
00:06:51,880 --> 00:06:54,030
object-oriented design perfectly,
173
00:06:54,030 --> 00:06:59,260
right? Now, while it's nice, right, so these
174
00:06:59,260 --> 00:07:01,029
kinds of arrows mean some sort of a
175
00:07:01,029 --> 00:07:03,370
method call. If you put the
176
00:07:03,370 --> 00:07:03,919
objects on
177
00:07:03,919 --> 00:07:05,599
different machines, then those arrows are
178
00:07:05,599 --> 00:07:06,860
still gonna be some sort of method calls,
179
00:07:06,860 --> 00:07:09,889
but remote method calls, sometimes known
180
00:07:09,889 --> 00:07:11,599
under the name remote procedure calls,
181
00:07:11,599 --> 00:07:13,580
right? It's more generic: not
182
00:07:13,580 --> 00:07:14,900
necessarily a method, but a function in
183
00:07:14,900 --> 00:07:18,710
general, right? Now, because object-oriented
184
00:07:18,710 --> 00:07:23,169
design is taught in virtually all
185
00:07:23,169 --> 00:07:25,159
programming classes, or almost all
186
00:07:25,159 --> 00:07:27,650
programming classes, this feels like a
187
00:07:27,650 --> 00:07:29,900
very natural kind of model, right? The
188
00:07:29,900 --> 00:07:31,580
trouble is it's not quite clear who does
189
00:07:31,580 --> 00:07:34,580
what and when, right? Namely, any notion of
190
00:07:34,580 --> 00:07:36,349
parallelism is not obvious at all. But
191
00:07:36,349 --> 00:07:38,330
this kind of model would fit perfectly,
192
00:07:38,330 --> 00:07:40,039
let's say, an actor-based model in which
193
00:07:40,039 --> 00:07:41,539
you replace objects with actors and
194
00:07:41,539 --> 00:07:43,719
method calls with some kind of message
195
00:07:43,719 --> 00:07:46,189
exchange between actors, and
196
00:07:46,189 --> 00:07:48,379
then this starts to make a lot of sense.
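The mapping just described (objects become actors, method calls become message exchanges) might look roughly like this in Python. `CounterActor` and its message format are hypothetical, not any particular actor library; the point is that the state stays private to one thread and changes only in response to messages taken from a mailbox, one at a time.

```python
import queue
import threading
from typing import Any, Tuple

class CounterActor:
    """Minimal actor sketch: private state, a mailbox, and a single thread
    that handles one message at a time, so the state is never touched
    concurrently."""

    def __init__(self) -> None:
        self._count = 0                # private state, never shared directly
        self._mailbox = queue.Queue()  # messages wait here until handled
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message: Tuple[str, Any]) -> None:
        """Asynchronous send: the caller does not wait for the message to be handled."""
        self._mailbox.put(message)

    def _run(self) -> None:
        while True:
            kind, arg = self._mailbox.get()
            if kind == "add":
                self._count += arg
            elif kind == "get":
                # No return value: the actor replies by sending a message
                # to the reply channel the sender supplied.
                arg.put(self._count)
            elif kind == "stop":
                return

counter = CounterActor()
for _ in range(5):
    counter.send(("add", 1))
reply_box = queue.Queue()
counter.send(("get", reply_box))
print(reply_box.get(timeout=1))  # 5
counter.send(("stop", None))
```

Because the mailbox is FIFO and only one thread reads it, the "get" is handled after all five "add" messages, so the reply is 5 without any locks around `_count`.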
197
00:07:48,379 --> 00:07:51,409
But this is really not the intent
198
00:07:51,409 --> 00:07:53,060
of the object-based architecture, right?
199
00:07:53,060 --> 00:07:56,210
Those objects quite often pass a lot of
200
00:07:56,210 --> 00:07:58,669
stuff between themselves, large parts
201
00:07:58,669 --> 00:08:01,039
of the state, and share things and
202
00:08:01,039 --> 00:08:02,659
whatnot. Part of the actor model is
203
00:08:02,659 --> 00:08:04,999
really trying to keep to yourself the state
204
00:08:04,999 --> 00:08:08,469
you're managing, right, and to use
205
00:08:08,469 --> 00:08:12,229
message exchanges to act on your own
206
00:08:12,229 --> 00:08:14,240
state and to send more messages to
207
00:08:14,240 --> 00:08:15,889
somebody else. That's not necessarily true in the
208
00:08:15,889 --> 00:08:20,509
object model. Okay, now another one is the
209
00:08:20,509 --> 00:08:24,319
event-based architectural style, and you
210
00:08:24,319 --> 00:08:26,779
probably already say, hey, but actors have
211
00:08:26,779 --> 00:08:28,490
events. So actors are really some sort of a
212
00:08:28,490 --> 00:08:30,259
combination, you can see now, of some kind
213
00:08:30,259 --> 00:08:32,328
of event-based architecture
214
00:08:32,328 --> 00:08:34,429
and an object-based architecture. In the
215
00:08:34,429 --> 00:08:35,929
event-based architecture you have
216
00:08:35,929 --> 00:08:37,549
messages but not necessarily any kind of
217
00:08:37,549 --> 00:08:40,370
actors. I want you to understand that the
218
00:08:40,370 --> 00:08:44,870
actor model, right, is a separate idea:
219
00:08:44,870 --> 00:08:47,000
you can take part of it, just exchanging
220
00:08:47,000 --> 00:08:48,260
messages, and then you have this event-based
221
00:08:48,260 --> 00:08:49,820
architecture; you can take the
222
00:08:49,820 --> 00:08:51,350
overall object organization, and then you
223
00:08:51,350 --> 00:08:52,760
have an object-based architecture; you
224
00:08:52,760 --> 00:08:54,320
can combine both, and then you have some
225
00:08:54,320 --> 00:08:56,360
sort of an actor model. You might ask the
226
00:08:56,360 --> 00:08:57,800
question: why is the actor model not in
227
00:08:57,800 --> 00:09:00,589
the textbook? It's really not in the
228
00:09:00,589 --> 00:09:03,350
textbook. Well, it looks to me like this
229
00:09:03,350 --> 00:09:05,779
is the biggest blunder of the
230
00:09:05,779 --> 00:09:07,519
distributed systems community: they
231
00:09:07,519 --> 00:09:10,399
ignored this very nice, well-fitting actor
232
00:09:10,399 --> 00:09:14,600
model idea, right? Why, it's still not
233
00:09:14,600 --> 00:09:16,279
quite clear. Okay, anyway,
234
00:09:16,279 --> 00:09:17,570
the event-based architectural
235
00:09:17,570 --> 00:09:20,570
style consists of some sort of a medium
236
00:09:20,570 --> 00:09:23,480
in which you're exchanging events. Now,
237
00:09:23,480 --> 00:09:24,769
what you do with the events, how you
238
00:09:24,769 --> 00:09:28,990
behave, and how you manage those events,
239
00:09:28,990 --> 00:09:31,250
the architecture diagram doesn't care
240
00:09:31,250 --> 00:09:33,199
about that; the important thing is you have
241
00:09:33,199 --> 00:09:34,720
the ability to exchange these events.
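A minimal sketch of the event-based style: a shared medium, here a hypothetical in-process `EventBus` class, through which components publish and receive events without ever calling each other directly. The usage is only a toy echo of the processor-cache example; it is not a real cache coherency protocol.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Any], None]

class EventBus:
    """The shared 'medium': components publish events on a topic and never
    reference each other directly."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        # Synchronous delivery keeps the sketch simple; a distributed event
        # bus would deliver asynchronously over the network.
        for handler in list(self._subscribers.get(topic, [])):
            handler(event)

# Toy usage: a "write" event tells every core to update its copy of that address.
bus = EventBus()
caches: Dict[str, Dict[int, int]] = {"core0": {}, "core1": {}}
for cache in caches.values():
    bus.subscribe("write", lambda ev, c=cache: c.update({ev[0]: ev[1]}))

bus.publish("write", (0x10, 42))
print(caches)  # {'core0': {16: 42}, 'core1': {16: 42}}
```

Note that the publisher never learns who, if anyone, handled the event; that decoupling is exactly what the architecture diagram leaves unspecified.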
242
00:09:34,720 --> 00:09:37,660
Interestingly, if you look at modern
243
00:09:37,660 --> 00:09:40,819
computer architectures, right, to a large
244
00:09:40,819 --> 00:09:41,990
extent they are actually event-exchanging
245
00:09:41,990 --> 00:09:44,600
architectures. For example, the
246
00:09:44,600 --> 00:09:46,009
way the cores talk to each other on
247
00:09:46,009 --> 00:09:48,440
an AMD processor is really message
248
00:09:48,440 --> 00:09:49,940
passing; it's exactly
249
00:09:49,940 --> 00:09:51,920
this. They exchange these events, the
250
00:09:51,920 --> 00:09:53,449
events being: somebody wrote to this
251
00:09:53,449 --> 00:09:55,009
location in the cache, update your cache.
252
00:09:55,009 --> 00:09:56,269
It's all that cache coherency protocol
253
00:09:56,269 --> 00:09:57,620
kind of thing, for the people that took
254
00:09:57,620 --> 00:09:59,209
some kind of an architecture class. How
255
00:09:59,209 --> 00:10:00,470
many people took an architecture class
256
00:10:00,470 --> 00:10:02,300
or are taking an architecture class? Even
257
00:10:02,300 --> 00:10:05,149
more interesting. But don't we have a
258
00:10:05,149 --> 00:10:06,649
core class that does this? Or maybe it's
259
00:10:06,649 --> 00:10:09,259
not core, but it's not in it. Okay, anyway,
260
00:10:09,259 --> 00:10:10,810
it's potentially interesting to look at.
261
00:10:10,810 --> 00:10:16,100
Okay, the other one, and this is really
262
00:10:16,100 --> 00:10:19,250
everywhere, okay, and I'm gonna talk
263
00:10:19,250 --> 00:10:22,010
extensively about this in class, is
264
00:10:22,010 --> 00:10:24,410
this idea of a shared data space. And
265
00:10:24,410 --> 00:10:27,079
when you usually talk about data, I want
266
00:10:27,079 --> 00:10:28,430
you to understand this, you're talking
267
00:10:28,430 --> 00:10:31,250
about some sort of persistence, right? So
268
00:10:31,250 --> 00:10:33,709
data is supposed to outlive the
269
00:10:33,709 --> 00:10:36,440
computers that exchange the data. So the
270
00:10:36,440 --> 00:10:37,850
trouble with all the other architectures
271
00:10:37,850 --> 00:10:41,269
we had before is things move around, but
272
00:10:41,269 --> 00:10:44,050
they will disappear if the computers
273
00:10:44,050 --> 00:10:47,300
basically get turned off. But the moment
274
00:10:47,300 --> 00:10:48,889
you start talking about persistence of
275
00:10:48,889 --> 00:10:51,889
the data, the data is going to be pushed
276
00:10:51,889 --> 00:10:53,540
to some sort of a persistent medium, for
277
00:10:53,540 --> 00:10:55,670
example a hard drive, so even if the
278
00:10:55,670 --> 00:10:57,920
computer has issues and goes down for any
279
00:10:57,920 --> 00:10:59,569
reason, when it wakes up it still finds
280
00:10:59,569 --> 00:11:02,899
the data, right? That means the data now
281
00:11:02,899 --> 00:11:04,880
is really the core, and everything
282
00:11:04,880 --> 00:11:08,540
revolves around this data, right? It's as
283
00:11:08,540 --> 00:11:11,000
if we would basically use some kind of a
284
00:11:11,000 --> 00:11:13,250
written record: if I want to do a
285
00:11:13,250 --> 00:11:15,170
transaction, right, I want to sell you
286
00:11:15,170 --> 00:11:18,019
something, for example a house, we simply
287
00:11:18,019 --> 00:11:19,430
write some kind of a document that goes
288
00:11:19,430 --> 00:11:20,959
in the archive saying we've done that
289
00:11:20,959 --> 00:11:22,730
transaction, so if anything happens to us
290
00:11:22,730 --> 00:11:24,410
somebody can go and look at the document.
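The data-centered idea, where processes communicate only through records that outlive them, can be sketched with a store kept on disk. The `SharedDataSpace` class and its JSON-file backend are illustrative assumptions; a real system would use a database with proper locking and transactions.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional

class SharedDataSpace:
    """Data-centered sketch: processes interact only by reading and writing
    records in a store kept on disk, so the records outlive any one process."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _load(self) -> Dict[str, Any]:
        if not os.path.exists(self.path):
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def write(self, key: str, value: Any) -> None:
        data = self._load()
        data[key] = value
        # Write a temporary file, then swap it in with os.replace (atomic on
        # POSIX when both paths are on the same filesystem), so readers never
        # see a half-written store. There is no locking: concurrent writers
        # could still lose updates, which a real database would prevent.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp_path, self.path)

    def read(self, key: str) -> Optional[Any]:
        return self._load().get(key)

# The "written record" from the house-sale example.
path = os.path.join(tempfile.mkdtemp(), "records.json")
seller_side = SharedDataSpace(path)
seller_side.write("house-42", {"owner": "buyer"})
del seller_side  # the party that wrote the record goes away...

registry = SharedDataSpace(path)  # ...and a later one still finds the record
print(registry.read("house-42"))  # {'owner': 'buyer'}
```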
291
00:11:24,410 --> 00:11:26,930
This is probably what completely revolutionized
292
00:11:26,930 --> 00:11:29,540
commerce and property
293
00:11:29,540 --> 00:11:30,440
ownership,
294
00:11:30,440 --> 00:11:32,540
right? You didn't have to defend all the
295
00:11:32,540 --> 00:11:34,940
time, using some sort of a weapon, what
296
00:11:34,940 --> 00:11:36,410
you owned; you could prove you owned
297
00:11:36,410 --> 00:11:37,910
something based on the written record. So
298
00:11:37,910 --> 00:11:39,200
that's kind of the ultimate in
299
00:11:39,200 --> 00:11:41,450
persistent storage, right? A lot of it
300
00:11:41,450 --> 00:11:43,700
goes hundreds of years back, or more,
301
00:11:43,700 --> 00:11:47,660
right? Well, even going back
302
00:11:47,660 --> 00:11:49,430
milliseconds is tough in some of the
303
00:11:49,430 --> 00:11:51,050
distributed systems. So when you're talking
304
00:11:51,050 --> 00:11:53,120
about data, things change: you're suddenly
305
00:11:53,120 --> 00:11:55,160
worried about doing things to the data
306
00:11:55,160 --> 00:11:57,200
and not so much about what all those computers
307
00:11:57,200 --> 00:11:59,690
that are alive will do. Okay, so then
308
00:11:59,690 --> 00:12:01,430
everything gets expressed in terms of: go and
309
00:12:01,430 --> 00:12:03,290
change this data, propagate the
310
00:12:03,290 --> 00:12:04,520
change to the data, make sure everybody
311
00:12:04,520 --> 00:12:08,060
sees the same data. Okay, right now this
312
00:12:08,060 --> 00:12:10,220
is really, I would say, the prevalent
313
00:12:10,220 --> 00:12:12,020
architectural style, especially when it
314
00:12:12,020 --> 00:12:14,240
comes to anything web-related. All right,
315
00:12:14,240 --> 00:12:15,830
I'm gonna come back to this when I talk
316
00:12:15,830 --> 00:12:17,810
about the classic architecture for web-based
317
00:12:17,810 --> 00:12:20,150
services, distributed system services, and
318
00:12:20,150 --> 00:12:23,570
data: it's always gonna be there. Okay, by
319
00:12:23,570 --> 00:12:26,980
the way, this is why databases are a
320
00:12:26,980 --> 00:12:29,180
multibillion-dollar industry: because
321
00:12:29,180 --> 00:12:30,710
everybody needs some sort of persistence
322
00:12:30,710 --> 00:12:33,680
for their data. So, okay, we're gonna come
323
00:12:33,680 --> 00:12:35,840
back to this later. So it's mostly about
324
00:12:35,840 --> 00:12:37,220
data delivery; of course some machines
325
00:12:37,220 --> 00:12:38,390
have to deliver the data, but the
326
00:12:38,390 --> 00:12:39,410
important thing is the data is
327
00:12:39,410 --> 00:12:43,460
persistent. Okay, now I did mention the
328
00:12:43,460 --> 00:12:46,730
fact that there are a number of core
329
00:12:46,730 --> 00:12:49,730
obsessions for every area, and distributed
330
00:12:49,730 --> 00:12:52,820
systems has quite a lot of them, but one
331
00:12:52,820 --> 00:12:54,490
of the most important ones is this
332
00:12:54,490 --> 00:12:56,420
idea of centralized
333
00:12:56,420 --> 00:12:58,160
architecture, and not so much the
334
00:12:58,160 --> 00:13:01,730
centralized architecture itself but the deep
335
00:13:01,730 --> 00:13:03,170
opposition to a centralized architecture.
336
00:13:03,170 --> 00:13:05,180
So distributed systems research, for
337
00:13:05,180 --> 00:13:07,820
example, is mostly about removing the
338
00:13:07,820 --> 00:13:11,510
need for centralized anything, right? So
339
00:13:11,510 --> 00:13:14,000
whatever opposite of centralized you can
340
00:13:14,000 --> 00:13:16,070
find, you can potentially do some
341
00:13:16,070 --> 00:13:17,270
interesting research in distributed
342
00:13:17,270 --> 00:13:18,740
systems or build an interesting
343
00:13:18,740 --> 00:13:20,840
distributed system in place in the end.
344
00:13:20,840 --> 00:13:24,200
Okay, now what exactly does centralized
345
00:13:24,200 --> 00:13:26,570
architecture mean? You have one or more
346
00:13:26,570 --> 00:13:28,790
clients and some sort of a server; the
347
00:13:28,790 --> 00:13:30,080
server is going to do all the heavy
348
00:13:30,080 --> 00:13:33,800
lifting. Okay, now obviously if the server
349
00:13:33,800 --> 00:13:36,880
goes down there is no more service
350
00:13:36,880 --> 00:13:41,510
whatsoever. Okay, now it's easy to see how
351
00:13:41,510 --> 00:13:43,880
this would work because we are used to it:
352
00:13:43,880 --> 00:13:45,490
this is the classic client-server model.
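A minimal client-server sketch using Python's standard `socketserver` module: the server does the "heavy lifting" (here only upper-casing one line of text), the client just sends a request and reads the reply, and once the server is shut down nobody gets any service. The handler and the one-line request format are hypothetical choices for illustration.

```python
import socket
import socketserver
import threading

class UpperHandler(socketserver.StreamRequestHandler):
    """Server side: does the 'heavy lifting' (here, only upper-casing one line)."""

    def handle(self) -> None:
        line = self.rfile.readline()
        self.wfile.write(line.upper())

def request(addr, text: str) -> str:
    """Client side: connect, send one line, read back one reply line."""
    with socket.create_connection(addr, timeout=5) as sock:
        sock.sendall(text.encode() + b"\n")
        with sock.makefile("rb") as reply_stream:
            return reply_stream.readline().decode().rstrip("\n")

# Port 0 asks the OS for any free port on the loopback interface.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
addr = server.server_address

reply = request(addr, "hello")
print(reply)  # HELLO

# The single point of failure: once the server is gone, nobody gets service.
server.shutdown()
server.server_close()
try:
    request(addr, "hello again")
    service_available = True
except OSError:  # e.g. ConnectionRefusedError
    service_available = False
print(service_available)  # False
```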
353
00:13:45,490 --> 00:13:50,149
Okay, you have one or more clients,
354
00:13:50,149 --> 00:13:51,680
and the server is usually a big, powerful
355
00:13:51,680 --> 00:13:53,149
machine, and you ask the server to do
356
00:13:53,149 --> 00:13:55,310
things. The interesting question is
357
00:13:55,310 --> 00:13:58,069
what's on the other side: what could you
358
00:13:58,069 --> 00:14:00,069
have that's not a client-server model,
359
00:14:00,069 --> 00:14:02,269
right? And this is the kind of question
360
00:14:02,269 --> 00:14:04,720
that distributed systems try to ask and
361
00:14:04,720 --> 00:14:11,589
try to find solutions for. Okay, now
362
00:14:11,589 --> 00:14:15,470
when it comes to any kind of system, it's
363
00:14:15,470 --> 00:14:17,839
important to think about what the entire stack
364
00:14:17,839 --> 00:14:19,519
looks like. I mean, one of the biggest
365
00:14:19,519 --> 00:14:21,170
problems in general when it comes to
366
00:14:21,170 --> 00:14:23,360
software design is not having a complete
367
00:14:23,360 --> 00:14:25,240
view of how everything comes together,
368
00:14:25,240 --> 00:14:27,680
right? This is not only about various
369
00:14:27,680 --> 00:14:29,389
subjects in computer science, but this is
370
00:14:29,389 --> 00:14:32,660
a very important question: how does the
371
00:14:32,660 --> 00:14:34,670
whole thing work? All right, so the
372
00:14:34,670 --> 00:14:36,139
interesting exercise is to start
373
00:14:36,139 --> 00:14:39,199
quizzing people: say, okay, so we have
374
00:14:39,199 --> 00:14:40,880
an application on, let's say, your cell
375
00:14:40,880 --> 00:14:43,519
phone; let's get down to the lowest bits,
376
00:14:43,519 --> 00:14:45,529
the lowest parts we can get to, and try to
377
00:14:45,529 --> 00:14:47,079
figure out how things actually work,
378
00:14:47,079 --> 00:14:49,910
right? Now, that kind of question, I mean,
379
00:14:49,910 --> 00:14:52,810
first of all it can become overwhelming;
380
00:14:52,810 --> 00:14:55,189
most people have very weird ideas about
381
00:14:55,189 --> 00:14:57,829
how things work. All right, so I tried to
382
00:14:57,829 --> 00:14:59,149
explain, for example, to my wife how the
383
00:14:59,149 --> 00:15:02,209
internet works. Well, I had to give up after
384
00:15:02,209 --> 00:15:04,639
about five minutes because she said, okay,
385
00:15:04,639 --> 00:15:06,170
so I'm still gonna keep my opinion that
386
00:15:06,170 --> 00:15:09,290
there is some sort of a Holy Spirit that
387
00:15:09,290 --> 00:15:11,000
keeps everything together and just makes
388
00:15:11,000 --> 00:15:12,079
things work, and that's good enough for
389
00:15:12,079 --> 00:15:14,269
me. So that's it, right? But this is a
390
00:15:14,269 --> 00:15:15,380
legitimate question: how does it
391
00:15:15,380 --> 00:15:19,250
actually work? Okay, now, by the way, this
392
00:15:19,250 --> 00:15:22,120
is a very hard question to answer,
393
00:15:22,120 --> 00:15:25,220
because it's easy to say what the
394
00:15:25,220 --> 00:15:26,990
mechanisms are, but it's not quite clear
395
00:15:26,990 --> 00:15:28,670
what all the parts do together; it's a
396
00:15:28,670 --> 00:15:31,730
very, very complex system. Okay, so the
397
00:15:31,730 --> 00:15:33,439
parts don't really describe the
398
00:15:33,439 --> 00:15:34,819
behavior of the entire system, because
399
00:15:34,819 --> 00:15:37,069
they're not fully predictable. All right, so
400
00:15:37,069 --> 00:15:39,319
when it comes to design of an
401
00:15:39,319 --> 00:15:40,759
application there is some sort of a
402
00:15:40,759 --> 00:15:44,060
classic approach now I mean it's it kind
403
00:15:44,060 --> 00:15:46,399
of evolved over many decades but it's
404
00:15:46,399 --> 00:15:48,500
now about there and this is really what
405
00:15:48,500 --> 00:15:50,029
this is all about right so you have some
406
00:15:50,029 --> 00:15:51,769
sort of a user interface level a
407
00:15:51,769 --> 00:15:54,019
processing level and a data level the
408
00:15:54,019 --> 00:15:55,519
data level it's almost always there
409
00:15:55,519 --> 00:15:57,100
because you need to make things
410
00:15:57,100 --> 00:15:58,690
one way or another ultimately all
411
00:15:58,690 --> 00:16:00,550
computer systems are about exchanging
412
00:16:00,550 --> 00:16:03,430
some sort of information without that
413
00:16:03,430 --> 00:16:08,470
it's kind of a sport without purpose all
414
00:16:08,470 --> 00:16:10,000
right I mean who wants to create a
415
00:16:10,000 --> 00:16:12,190
distributed network just to measure how
416
00:16:12,190 --> 00:16:13,690
many messages we can send per second
417
00:16:13,690 --> 00:16:16,570
it's it's fun for a little bit but then
418
00:16:16,570 --> 00:16:19,480
what right so when it comes to getting
419
00:16:19,480 --> 00:16:21,070
things done with computers ultimately
420
00:16:21,070 --> 00:16:23,380
it's all about some sort of data
421
00:16:23,380 --> 00:16:27,250
exchange well it could be movement in a
422
00:16:27,250 --> 00:16:29,440
massively multiplayer game but is still
423
00:16:29,440 --> 00:16:32,320
data okay so it's data I mean it could
424
00:16:32,320 --> 00:16:33,970
be visual it could be all it's still
425
00:16:33,970 --> 00:16:36,760
data okay good so the classic
426
00:16:36,760 --> 00:16:39,820
architecture might look like this so we
427
00:16:39,820 --> 00:16:41,770
have some sort of user interface here
428
00:16:41,770 --> 00:16:44,200
then this level the processing level
429
00:16:44,200 --> 00:16:45,970
might have all kinds of fancy things for
430
00:16:45,970 --> 00:16:48,910
example some core sort of a query
431
00:16:48,910 --> 00:16:51,970
generator then the information goes to
432
00:16:51,970 --> 00:16:53,890
whatever a database a is some sort of a
433
00:16:53,890 --> 00:16:57,190
data layer and then data goes back but
434
00:16:57,190 --> 00:16:59,230
raw data especially coming from
435
00:16:59,230 --> 00:17:01,390
relational databases it's it's really
436
00:17:01,390 --> 00:17:03,280
dry I mean it looks like some tables
437
00:17:03,280 --> 00:17:05,319
that are very boring now of course you
438
00:17:05,319 --> 00:17:07,089
could display tables right but that's
439
00:17:07,089 --> 00:17:10,000
really really boring so then what you
440
00:17:10,000 --> 00:17:12,640
might want to do is some sort of a nicer
441
00:17:12,640 --> 00:17:14,829
processing on top of the table maybe
442
00:17:14,829 --> 00:17:17,890
some ranking some nice HTML generation
443
00:17:17,890 --> 00:17:20,140
oh right and then present the user in
444
00:17:20,140 --> 00:17:21,430
the user interface that's kind of a nice
445
00:17:21,430 --> 00:17:23,530
webpage as opposed to just some numbers
446
00:17:23,530 --> 00:17:28,630
in a table okay now you could see how
447
00:17:28,630 --> 00:17:30,250
this Artic the architecture might work
448
00:17:30,250 --> 00:17:31,810
but let's say you want to actually
449
00:17:31,810 --> 00:17:34,840
implement this the question is where
450
00:17:34,840 --> 00:17:36,820
would you put various components what
451
00:17:36,820 --> 00:17:39,640
runs where okay and this in fact is a
452
00:17:39,640 --> 00:17:40,990
very legitimate question let me see
453
00:17:40,990 --> 00:17:43,300
where my other slide is okay so I want
454
00:17:43,300 --> 00:17:44,500
to talk a little bit about this slide
455
00:17:44,500 --> 00:17:47,050
okay so in this kind of layered
456
00:17:47,050 --> 00:17:48,070
architecture right you have the user
457
00:17:48,070 --> 00:17:49,300
interface the application and the
458
00:17:49,300 --> 00:17:50,620
database alright the processing layer is
459
00:17:50,620 --> 00:17:53,860
the application and the interesting
460
00:17:53,860 --> 00:17:57,310
question is where do you separate the
461
00:17:57,310 --> 00:17:59,740
code that runs on let's say the
462
00:17:59,740 --> 00:18:03,490
client let's say my phone or my
463
00:18:03,490 --> 00:18:05,440
desktop machine or my laptop or whatever
464
00:18:05,440 --> 00:18:07,210
you want versus the code that runs on
465
00:18:07,210 --> 00:18:08,870
the server whatever server means
466
00:18:08,870 --> 00:18:10,880
it may be the collection of machines
467
00:18:10,880 --> 00:18:13,970
that provide this virtual server okay
468
00:18:13,970 --> 00:18:17,570
and this drawing suggests that basically
469
00:18:17,570 --> 00:18:19,850
you can have anything from a very very
470
00:18:19,850 --> 00:18:21,799
shallow user interface and by the way
471
00:18:21,799 --> 00:18:24,950
this is what was happening in times of
472
00:18:24,950 --> 00:18:27,710
very expensive computers so if the
473
00:18:27,710 --> 00:18:29,240
computers are extremely expensive and
474
00:18:29,240 --> 00:18:31,640
this happened in the 60s right then
475
00:18:31,640 --> 00:18:33,380
essentially what you do is you produce
476
00:18:33,380 --> 00:18:35,720
lots of dumb terminals that can
477
00:18:35,720 --> 00:18:37,279
essentially just put letters on the
478
00:18:37,279 --> 00:18:38,840
screen and they can take the keystrokes
479
00:18:38,840 --> 00:18:40,460
and send them to the server this is the
480
00:18:40,460 --> 00:18:42,890
mainframe era right and then the server
481
00:18:42,890 --> 00:18:44,510
runs everything else and then switches
482
00:18:44,510 --> 00:18:47,960
between all kinds of various terminals
483
00:18:47,960 --> 00:18:49,159
and the terminals are really dumb
484
00:18:49,159 --> 00:18:50,990
terminals all right
485
00:18:50,990 --> 00:18:53,270
so for example the terminals you have
486
00:18:53,270 --> 00:18:55,159
now the terminal emulators you have now
487
00:18:55,159 --> 00:18:56,899
let's say in Linux or some other UNIX
488
00:18:56,899 --> 00:18:58,399
system they are really if you want
489
00:18:58,399 --> 00:19:00,620
a software emulation of a real
490
00:19:00,620 --> 00:19:03,669
physical device that you had in the 60s and 70s a
491
00:19:03,669 --> 00:19:06,919
terminal was only about a thousand
492
00:19:06,919 --> 00:19:08,630
dollars but the mainframe was ten
493
00:19:08,630 --> 00:19:11,990
million dollars hence you had hundreds
494
00:19:11,990 --> 00:19:14,330
of terminals on top of the multi-million
495
00:19:14,330 --> 00:19:16,309
dollar mainframe so that means a really
496
00:19:16,309 --> 00:19:18,919
shallow user interface but you could go
497
00:19:18,919 --> 00:19:22,039
to the other extreme and by the way yeah
498
00:19:22,039 --> 00:19:24,500
so these guys drew the right picture but
499
00:19:24,500 --> 00:19:26,600
so let me give you an extreme situation
500
00:19:26,600 --> 00:19:29,419
like this right the other extreme
501
00:19:29,419 --> 00:19:30,740
situation is a situation in which
502
00:19:30,740 --> 00:19:33,350
virtually everything runs on the
503
00:19:33,350 --> 00:19:35,510
client and very little runs on the
504
00:19:35,510 --> 00:19:38,929
server by the way the more on the right
505
00:19:38,929 --> 00:19:41,929
side you are the more scalable the
506
00:19:41,929 --> 00:19:44,510
service is right why especially if you
507
00:19:44,510 --> 00:19:46,640
have very powerful clients and now even
508
00:19:46,640 --> 00:19:48,220
cell phones are very powerful clients
509
00:19:48,220 --> 00:19:50,480
that means basically you can get away
510
00:19:50,480 --> 00:19:53,240
with very little on the server side so
511
00:19:53,240 --> 00:19:56,620
then it's easy to serve hundreds
512
00:19:56,620 --> 00:19:58,730
thousands even millions of requests
513
00:19:58,730 --> 00:20:00,470
right now are there any problems if you do very
514
00:20:00,470 --> 00:20:05,080
little ok so let me actually give you
515
00:20:05,169 --> 00:20:07,399
well let me give you an example because
516
00:20:07,399 --> 00:20:09,409
nothing is better than examples ok let
517
00:20:09,409 --> 00:20:11,750
me switch to web browser I didn't really
518
00:20:11,750 --> 00:20:14,120
intend to do this but let me do that so
519
00:20:14,120 --> 00:20:16,250
let's try right so let me describe as I
520
00:20:16,250 --> 00:20:18,370
go on the web site so this is basically
521
00:20:18,370 --> 00:20:21,549
a visualization of
522
00:20:22,149 --> 00:20:26,440
Medicare data in 2011 the federal
523
00:20:26,440 --> 00:20:28,989
government made available Medicare data
524
00:20:28,989 --> 00:20:31,029
in the form of a large table well they
525
00:20:31,029 --> 00:20:33,489
made it available three or four months
526
00:20:33,489 --> 00:20:35,769
ago a table with
527
00:20:35,769 --> 00:20:37,809
one hundred sixty-three thousand rows
528
00:20:37,809 --> 00:20:40,029
right so you remember those boring
529
00:20:40,029 --> 00:20:41,830
tables that I mentioned right that's one
530
00:20:41,830 --> 00:20:43,419
of those boring tables it tells you
531
00:20:43,419 --> 00:20:46,269
various hospitals that did various kinds
532
00:20:46,269 --> 00:20:49,179
of procedures how much they asked for
533
00:20:49,179 --> 00:20:51,219
Medicare and how much Medicare paid for
534
00:20:51,219 --> 00:20:53,169
every procedure interesting information
535
00:20:53,169 --> 00:20:54,669
is just that the table is boring so what
536
00:20:54,669 --> 00:20:56,049
you do you pull it up in Excel for
537
00:20:56,049 --> 00:20:57,629
example but let's look at the web
538
00:20:57,629 --> 00:21:03,999
front-end for this it will take me a few
539
00:21:03,999 --> 00:21:11,830
seconds alright so something interesting
540
00:21:11,830 --> 00:21:14,349
happens did you see the loading data
541
00:21:14,349 --> 00:21:16,389
that was about two seconds right
542
00:21:16,389 --> 00:21:20,200
and now some disclaimer okay because hey
543
00:21:20,200 --> 00:21:23,710
right let me see if f11 works so
544
00:21:23,710 --> 00:21:25,210
basically this is a visualization on
545
00:21:25,210 --> 00:21:27,070
that data you can click on a state what
546
00:21:27,070 --> 00:21:29,710
did I pick I don't know some sort of New
547
00:21:29,710 --> 00:21:32,200
Mexico right this is a disease you can
548
00:21:32,200 --> 00:21:34,839
click on the disease things get computed
549
00:21:34,839 --> 00:21:36,429
in the browser which is very nice you can
550
00:21:36,429 --> 00:21:40,119
highlight something right then select
551
00:21:40,119 --> 00:21:42,639
this so what you have is data processing
552
00:21:42,639 --> 00:21:46,899
at very very high speeds right sorts on
553
00:21:46,899 --> 00:21:50,409
the table select states literally in 100
554
00:21:50,409 --> 00:21:51,580
milliseconds you get the answer for
555
00:21:51,580 --> 00:21:52,149
everything
556
00:21:52,149 --> 00:21:55,029
so this makes that a hundred and sixty
557
00:21:55,029 --> 00:21:58,539
three thousand row table fun to navigate
558
00:21:58,539 --> 00:22:03,039
right okay now interesting question how
559
00:22:03,039 --> 00:22:07,349
is this done how come it works so fast
560
00:22:07,349 --> 00:22:10,029
so how many of you ran a query on a
561
00:22:10,029 --> 00:22:12,909
database engine over a table that
562
00:22:12,909 --> 00:22:14,379
has a hundred and sixty-three thousand
563
00:22:14,379 --> 00:22:17,919
rows or did it in Excel or something right
564
00:22:17,919 --> 00:22:20,139
so that was what about ten seconds right
565
00:22:20,139 --> 00:22:21,849
roughly on some servers depending on how
566
00:22:21,849 --> 00:22:24,460
you set things up so how does it work so
567
00:22:24,460 --> 00:22:28,499
fast here what's going on in here yes
568
00:22:30,879 --> 00:22:32,720
so we're getting somewhere so a
569
00:22:32,720 --> 00:22:35,269
suggestion here is those two seconds we
570
00:22:35,269 --> 00:22:36,289
had at the beginning that was
571
00:22:36,289 --> 00:22:38,960
downloading all the data right you
572
00:22:38,960 --> 00:22:40,250
remember the architecture we had on the
573
00:22:40,250 --> 00:22:41,889
board that the extreme version was
574
00:22:41,889 --> 00:22:44,570
everything except maybe the very lowest
575
00:22:44,570 --> 00:22:46,970
part of the database was in the client
576
00:22:46,970 --> 00:22:48,710
it turns out that everything here is in
577
00:22:48,710 --> 00:22:51,950
the client except the raw table right
578
00:22:51,950 --> 00:22:54,740
now cleverly the raw table was converted
579
00:22:54,740 --> 00:22:56,899
from a boring table format to directly
580
00:22:56,899 --> 00:22:59,330
JSON because JavaScript likes JSON and
581
00:22:59,330 --> 00:23:00,919
essentially every time you click
582
00:23:00,919 --> 00:23:03,080
anything in here on this website it runs
583
00:23:03,080 --> 00:23:04,639
in the browser it does query processing
584
00:23:04,639 --> 00:23:07,039
in the browser so now you have query
585
00:23:07,039 --> 00:23:08,539
processing in JavaScript rather than
586
00:23:08,539 --> 00:23:11,269
sending requests to the server right now
587
00:23:11,269 --> 00:23:13,309
this is good why for two reasons
588
00:23:13,309 --> 00:23:15,799
one of them the server does nothing
589
00:23:15,799 --> 00:23:18,889
except serve that initial data right it
590
00:23:18,889 --> 00:23:20,840
literally does nothing it serves the raw
591
00:23:20,840 --> 00:23:23,210
JavaScript code the raw HTML there's no
592
00:23:23,210 --> 00:23:25,669
change there is no HTTP request it
593
00:23:25,669 --> 00:23:26,779
doesn't matter what you click in
594
00:23:26,779 --> 00:23:29,090
here right the server has nothing to do if
595
00:23:29,090 --> 00:23:31,519
you want to literally have a million
596
00:23:31,519 --> 00:23:32,899
people using this as long as they can
597
00:23:32,899 --> 00:23:34,519
get it the data gets cached on their
598
00:23:34,519 --> 00:23:35,990
their machine the data doesn't change
599
00:23:35,990 --> 00:23:37,659
because the feds don't publish more data
600
00:23:37,659 --> 00:23:39,980
right you don't even need to ask the
601
00:23:39,980 --> 00:23:41,840
server again data is cached on the
602
00:23:41,840 --> 00:23:44,870
client right so one way to achieve
603
00:23:44,870 --> 00:23:47,330
scalability is to push a lot of stuff in
604
00:23:47,330 --> 00:23:48,500
the client and this is an extreme
605
00:23:48,500 --> 00:23:49,789
situation in which virtually everything
606
00:23:49,789 --> 00:23:51,649
is pushed in the client ok but then of
607
00:23:51,649 --> 00:23:52,669
course you have to pull off lots of
608
00:23:52,669 --> 00:23:58,549
stunts in JavaScript ok so why am I showing
609
00:23:58,549 --> 00:24:03,769
you this oops one because it's
610
00:24:03,769 --> 00:24:06,679
possible two because that's how you get
611
00:24:06,679 --> 00:24:08,450
scalability you remember the other core
612
00:24:08,450 --> 00:24:13,460
obsession in distributed systems is how
613
00:24:13,460 --> 00:24:14,899
do you scale things up well you scale
614
00:24:14,899 --> 00:24:16,549
them up by asking the server to do as
615
00:24:16,549 --> 00:24:18,950
little as possible ok so it's possible
616
00:24:18,950 --> 00:24:21,919
now to go even further than this where
617
00:24:21,919 --> 00:24:25,399
there is almost no data I'm sorry
618
00:24:25,399 --> 00:24:27,710
there's almost no database that's
619
00:24:27,710 --> 00:24:29,179
possible if you have mostly read-only data
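The demo's trick (download the raw table once as JSON, then answer every click locally) can be sketched roughly as follows. The real site ran JavaScript in the browser; this is a minimal Python sketch of the same idea, and the class `ClientSideTable`, the field names, and the toy rows are hypothetical illustrations, not the site's code and not real Medicare figures.

```python
import json

# Hypothetical payload standing in for the table the server ships exactly once.
# Field names and numbers are made up for illustration only.
PAYLOAD = """[
  {"state": "NM", "drg": "A", "provider": "H1", "charges": 30000, "paid": 9000},
  {"state": "NM", "drg": "A", "provider": "H2", "charges": 45000, "paid": 8500},
  {"state": "NM", "drg": "B", "provider": "H1", "charges": 12000, "paid": 4000},
  {"state": "TX", "drg": "A", "provider": "H3", "charges": 50000, "paid": 9500}
]"""


class ClientSideTable:
    """Keeps the whole dataset in client memory; every query is answered locally."""

    def __init__(self, payload: str):
        # The only server interaction: one download, parsed once.
        self.rows = json.loads(payload)

    def select(self, **filters):
        # Like SELECT * WHERE col1 = v1 AND col2 = v2 ..., with no round trip.
        return [r for r in self.rows if all(r.get(k) == v for k, v in filters.items())]

    def sort_by(self, rows, key, descending=False):
        return sorted(rows, key=lambda r: r[key], reverse=descending)

    def average(self, rows, key):
        return sum(r[key] for r in rows) / len(rows) if rows else 0.0


table = ClientSideTable(PAYLOAD)
nm_a = table.select(state="NM", drg="A")
ranked = table.sort_by(nm_a, "charges", descending=True)
print([r["provider"] for r in ranked])  # ['H2', 'H1']
print(table.average(nm_a, "paid"))      # 8750.0
```

Because the data is read-only, the parsed rows can stay cached on the client, which is what lets the server do almost nothing after the first request.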
620
00:24:29,179 --> 00:24:35,320
okay all right good now
621
00:24:35,320 --> 00:24:37,910
when it comes to this kind of message
622
00:24:37,910 --> 00:24:39,289
exchanges you're gonna see a lot of
623
00:24:39,289 --> 00:24:40,760
these diagrams which you might have seen
624
00:24:40,760 --> 00:24:42,830
for example in the hardware class these
625
00:24:42,830 --> 00:24:46,299
literally are diagrams of kind of
626
00:24:46,299 --> 00:24:48,320
information exchange diagrams that come
627
00:24:48,320 --> 00:24:50,330
from architecture designers in which you
628
00:24:50,330 --> 00:24:52,309
show some sort of a clock and then
629
00:24:52,309 --> 00:24:54,590
things go in a certain way right so when
630
00:24:54,590 --> 00:24:56,240
it comes to this we can borrow that same
631
00:24:56,240 --> 00:24:58,490
formalism it's kind of nice right so
632
00:24:58,490 --> 00:25:00,380
when it comes to how things could work
633
00:25:00,380 --> 00:25:02,240
right if you have this simple
634
00:25:02,240 --> 00:25:04,400
architecture the user interface could
635
00:25:04,400 --> 00:25:06,380
have some sort of a request operation to
636
00:25:06,380 --> 00:25:07,789
the application server which can have a
637
00:25:07,789 --> 00:25:09,289
request to the database server at some
638
00:25:09,289 --> 00:25:11,179
point the data is returned and the result is
639
00:25:11,179 --> 00:25:12,590
returned now the interesting thing about
640
00:25:12,590 --> 00:25:15,620
this kind of an architecture is it
641
00:25:15,620 --> 00:25:17,570
doesn't really matter where you draw the
642
00:25:17,570 --> 00:25:19,340
line what runs on the client computer
643
00:25:19,340 --> 00:25:20,990
what runs on the server computer it's
644
00:25:20,990 --> 00:25:22,700
still healthy to think about this
645
00:25:22,700 --> 00:25:25,190
architecture in particular when you have
646
00:25:25,190 --> 00:25:26,630
to write that JavaScript code so
647
00:25:26,630 --> 00:25:28,970
virtually the JavaScript code now does
648
00:25:28,970 --> 00:25:30,620
almost everything in the interface I
649
00:25:30,620 --> 00:25:32,090
showed you when you're thinking about
650
00:25:32,090 --> 00:25:33,770
designing that JavaScript code you
651
00:25:33,770 --> 00:25:35,480
essentially still want to say I have the
652
00:25:35,480 --> 00:25:38,419
same architecture except that now any
653
00:25:38,419 --> 00:25:40,789
kind of message exchange is in fact some
654
00:25:40,789 --> 00:25:43,250
sort of a function call right so then
655
00:25:43,250 --> 00:25:44,570
selectively you can replace function
656
00:25:44,570 --> 00:25:47,600
calls by remote function calls or some
657
00:25:47,600 --> 00:25:50,360
sort of message exchange in one way or
658
00:25:50,360 --> 00:25:52,400
another and you essentially can decide
659
00:25:52,400 --> 00:25:54,049
where to actually place various
660
00:25:54,049 --> 00:25:55,280
functionality so I want you to
661
00:25:55,280 --> 00:25:57,230
understand that the architecture is
662
00:25:57,230 --> 00:26:00,260
dissociated in fact from the specific
663
00:26:00,260 --> 00:26:02,659
implementation where things run how they
664
00:26:02,659 --> 00:26:04,220
run and what's actually happening and
665
00:26:04,220 --> 00:26:06,230
this is one of the things you want to do
666
00:26:06,230 --> 00:26:08,090
when you discuss any kind of
667
00:26:08,090 --> 00:26:09,770
software writing but distributed systems
668
00:26:09,770 --> 00:26:11,840
in particular right dissociating the
669
00:26:11,840 --> 00:26:13,580
ideas the fact that you're going to do
670
00:26:13,580 --> 00:26:14,990
layering that's a way to organize the
671
00:26:14,990 --> 00:26:17,450
data from specifically what is
672
00:26:17,450 --> 00:26:21,850
implemented where all right
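A minimal sketch of that separation, assuming a three-layer split (data, processing, user interface) like the one on the board. The processing layer only sees a callable, so the same code works whether that call is a local function call or goes through a stand-in for a remote call. `DataLayer`, `ProcessingLayer`, and `RemoteStub` are hypothetical names, and the JSON round trip only imitates a network hop; a real deployment would put HTTP or an RPC framework there.

```python
import json
from typing import Callable, List

Row = dict


class DataLayer:
    """Bottom layer: owns the rows and answers simple queries."""
    def __init__(self, rows: List[Row]):
        self.rows = rows

    def query(self, state: str) -> List[Row]:
        return [r for r in self.rows if r["state"] == state]


class ProcessingLayer:
    """Middle layer: ranks rows and renders HTML. It only holds a `fetch`
    callable, so it cannot tell whether the data layer is local or remote."""
    def __init__(self, fetch: Callable[[str], List[Row]]):
        self.fetch = fetch

    def ranked_html(self, state: str) -> str:
        rows = sorted(self.fetch(state), key=lambda r: r["cost"], reverse=True)
        items = "".join(f"<li>{r['name']}: {r['cost']}</li>" for r in rows)
        return f"<ul>{items}</ul>"


class RemoteStub:
    """Hypothetical stand-in for a remote call: request and reply are
    serialized to JSON as if they crossed a network."""
    def __init__(self, server: DataLayer):
        self.server = server

    def __call__(self, state: str) -> List[Row]:
        request = json.dumps({"op": "query", "state": state})
        # "Server side": decode the message, run the query, encode the reply.
        reply = json.dumps(self.server.query(json.loads(request)["state"]))
        # "Client side": decode the reply.
        return json.loads(reply)


db = DataLayer([
    {"state": "NM", "name": "H1", "cost": 300},
    {"state": "NM", "name": "H2", "cost": 450},
    {"state": "TX", "name": "H3", "cost": 500},
])

# Same architecture, two placements: a plain function call vs. a "remote" one.
all_local = ProcessingLayer(db.query)
split = ProcessingLayer(RemoteStub(db))
print(all_local.ranked_html("NM"))  # <ul><li>H2: 450</li><li>H1: 300</li></ul>
print(all_local.ranked_html("NM") == split.ranked_html("NM"))  # True
```

Moving the line between client and server then amounts to choosing which calls get wrapped in a stub; the layering itself does not change.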
673
00:26:23,080 --> 00:26:25,269
okay now I mentioned the fact that we
674
00:26:25,269 --> 00:26:26,769
have client-server architectures in
675
00:26:26,769 --> 00:26:28,239
which you have clients and servers and
676
00:26:28,239 --> 00:26:30,580
by the way that's how the web is powered
677
00:26:30,580 --> 00:26:33,369
at least in principle right so when you
678
00:26:33,369 --> 00:26:36,480
want to read your mail you go to
679
00:26:36,480 --> 00:26:38,200
gmail.com
680
00:26:38,200 --> 00:26:39,970
that means you access some sort of a
681
00:26:39,970 --> 00:26:42,940
server by the way there is a lot of
682
00:26:42,940 --> 00:26:44,830
JavaScript magic happening in the
683
00:26:44,830 --> 00:26:48,549
Gmail application right literally Gmail
684
00:26:48,549 --> 00:26:52,419
drove the development of Chrome as a web
685
00:26:52,419 --> 00:26:55,389
browser a lot of features and a lot of
686
00:26:55,389 --> 00:26:57,369
speed in Chrome comes from the need to
687
00:26:57,369 --> 00:27:01,960
do the kind of advanced email interface
688
00:27:01,960 --> 00:27:04,779
that Gmail has okay and once Chrome
689
00:27:04,779 --> 00:27:06,460
got the speed everybody else had
690
00:27:06,460 --> 00:27:07,989
pressure on them to increase the speed
691
00:27:07,989 --> 00:27:09,429
and this is why we have good browsers
692
00:27:09,429 --> 00:27:11,919
now right because of this application
693
00:27:11,919 --> 00:27:14,679
application pressure you want more
694
00:27:14,679 --> 00:27:16,029
advanced applications you put more
695
00:27:16,029 --> 00:27:18,730
pressure on the mid layer which is now
696
00:27:18,730 --> 00:27:22,080
the web browser right and then
697
00:27:22,080 --> 00:27:25,509
competition produces great browsers okay
698
00:27:25,509 --> 00:27:27,070
I mentioned the fact that I've seen a 3d
699
00:27:27,070 --> 00:27:28,509
game running in a browser right Unreal
700
00:27:28,509 --> 00:27:30,580
Tournament things are getting much
701
00:27:30,580 --> 00:27:30,999
better
702
00:27:30,999 --> 00:27:34,509
all right now the peer-to-peer networks
703
00:27:34,509 --> 00:27:37,049
are the opposite if you want of
704
00:27:37,049 --> 00:27:40,480
centralized systems in which there's
705
00:27:40,480 --> 00:27:42,129
really no more client and server
706
00:27:42,129 --> 00:27:43,600
everybody is both a client and a server
707
00:27:43,600 --> 00:27:46,179
at the same time and some sort of a more
708
00:27:46,179 --> 00:27:48,549
global collaboration now there are many
709
00:27:48,549 --> 00:27:50,440
reasons why you might want to do a
710
00:27:50,440 --> 00:27:52,629
peer-to-peer system right one of them is
711
00:27:52,629 --> 00:27:55,960
extreme resilience for example it's not
712
00:27:55,960 --> 00:27:59,289
enough to take down one or some
713
00:27:59,289 --> 00:28:00,609
of the servers to take down the entire
714
00:28:00,609 --> 00:28:02,409
service and this might be important for
715
00:28:02,409 --> 00:28:04,450
many many reasons so how many people
716
00:28:04,450 --> 00:28:08,919
know about Napster how many people know
717
00:28:08,919 --> 00:28:12,340
how Napster worked so the key question
718
00:28:12,340 --> 00:28:14,619
is one second the key question we have
719
00:28:14,619 --> 00:28:18,039
to ask is was Napster a client-server
720
00:28:18,039 --> 00:28:19,899
architecture or was it some more
721
00:28:19,899 --> 00:28:23,940
peer-to-peer decentralized thing so
722
00:28:25,280 --> 00:28:29,580
right so Napster was the first service
723
00:28:29,580 --> 00:28:32,429
that wanted to be peer-to-peer but in
724
00:28:32,429 --> 00:28:36,539
fact part of it was client-server ok the
725
00:28:36,539 --> 00:28:37,650
question is which part we're going to
726
00:28:37,650 --> 00:28:38,850
come back to Napster later but which
727
00:28:38,850 --> 00:28:41,220
part was client-server so you see when
728
00:28:41,220 --> 00:28:42,900
it comes to finally when it comes to
729
00:28:42,900 --> 00:28:45,210
accessing resources we are going to see
730
00:28:45,210 --> 00:28:47,220
this later you need at least two kinds
731
00:28:47,220 --> 00:28:48,809
of activities one of them is to find the
732
00:28:48,809 --> 00:28:50,039
resource who has the resource and then
733
00:28:50,039 --> 00:28:52,590
to access it right now what Napster had
734
00:28:52,590 --> 00:28:55,470
is who had the resource who was the
735
00:28:55,470 --> 00:28:56,730
server that had the resource they were
736
00:28:56,730 --> 00:28:58,919
really had millions of servers
737
00:28:58,919 --> 00:29:00,480
because every client was a server as
738
00:29:00,480 --> 00:29:03,030
well ok if you want it to be but what it
739
00:29:03,030 --> 00:29:04,470
had as a pure client-server architecture
740
00:29:04,470 --> 00:29:07,860
was the name lookup the lookup between
741
00:29:07,860 --> 00:29:10,890
the name of the file some sort of a
742
00:29:10,890 --> 00:29:12,690
directory structure right the directory
743
00:29:12,690 --> 00:29:15,179
structure was client-server the content
744
00:29:15,179 --> 00:29:18,720
was peer-to-peer now you see if the
745
00:29:18,720 --> 00:29:20,640
directories so if you take care of the
746
00:29:20,640 --> 00:29:22,890
directory structure then you don't need
747
00:29:22,890 --> 00:29:24,570
much of an organization for where the
748
00:29:24,570 --> 00:29:26,309
content is because you just get a random
749
00:29:26,309 --> 00:29:27,900
IP address you go to that IP address and
750
00:29:27,900 --> 00:29:30,000
you're done and the TCP/IP protocol does
751
00:29:30,000 --> 00:29:32,490
its job and it's very easy to implement
752
00:29:32,490 --> 00:29:35,669
the content access just point-to-point
753
00:29:35,669 --> 00:29:39,390
connections by the way the network stack
754
00:29:39,390 --> 00:29:42,260
is designed so that every single
755
00:29:42,260 --> 00:29:44,610
computer can be both a client and a
756
00:29:44,610 --> 00:29:46,289
server at the same time namely it can
757
00:29:46,289 --> 00:29:49,500
both open connections and listen for
758
00:29:49,500 --> 00:29:51,059
connections to be opened and actually
759
00:29:51,059 --> 00:29:53,400
both are needed to even run what you
760
00:29:53,400 --> 00:29:55,260
would normally think of as being a normal
761
00:29:55,260 --> 00:29:59,400
client ok good now why is this important
762
00:29:59,400 --> 00:30:03,330
because there is then a very clear way
763
00:30:03,330 --> 00:30:05,159
to take down the Napster service
764
00:30:05,159 --> 00:30:06,900
what you do you shut down the name
765
00:30:06,900 --> 00:30:08,610
servers it doesn't matter that you have
766
00:30:08,610 --> 00:30:12,299
millions of actual servers that happen
767
00:30:12,299 --> 00:30:13,919
to also be clients and have content now
768
00:30:13,919 --> 00:30:17,659
nobody knows where the content is right
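Napster's split as described here (a client-server directory for name lookup plus peer-to-peer content transfer) can be sketched like this. All class names and addresses are hypothetical (the addresses come from the 192.0.2.0/24 documentation range), and the "transfer" is a method call standing in for a direct point-to-point TCP connection.

```python
from typing import Dict, Optional, Set


class DirectoryServer:
    """Centralized index: file name -> set of peer addresses that hold it."""
    def __init__(self):
        self.index: Dict[str, Set[str]] = {}
        self.online = True

    def register(self, peer: str, files) -> None:
        for name in files:
            self.index.setdefault(name, set()).add(peer)

    def lookup(self, name: str) -> Set[str]:
        if not self.online:
            raise ConnectionError("directory server is down")
        return self.index.get(name, set())


class Peer:
    """Every peer is also a server for the files it holds."""
    def __init__(self, addr: str, files: Dict[str, bytes]):
        self.addr = addr
        self.files = files

    def serve(self, name: str) -> bytes:
        return self.files[name]


def download(name: str, directory: DirectoryServer, peers: Dict[str, Peer]) -> Optional[bytes]:
    holders = directory.lookup(name)   # step 1: client-server name lookup
    if not holders:
        return None
    addr = sorted(holders)[0]          # pick any holder (here: deterministically)
    return peers[addr].serve(name)     # step 2: point-to-point transfer from a peer


peers = {
    "192.0.2.1": Peer("192.0.2.1", {"song.mp3": b"la la la"}),
    "192.0.2.2": Peer("192.0.2.2", {"song.mp3": b"la la la", "talk.mp3": b"blah"}),
}
directory = DirectoryServer()
for p in peers.values():
    directory.register(p.addr, p.files)

print(download("song.mp3", directory, peers))  # b'la la la'

directory.online = False  # shut down the name service only
try:
    download("song.mp3", directory, peers)
except ConnectionError as e:
    # The content is still sitting on the peers, but nobody can find it.
    print("lookup failed:", e)
```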
769
00:30:17,659 --> 00:30:20,610
and that's effectively what happened I
770
00:30:20,610 --> 00:30:22,919
think in 2001 right the music industry
771
00:30:22,919 --> 00:30:24,510
was completely outraged
772
00:30:24,510 --> 00:30:26,220
I mean Napster was cool for a couple of
773
00:30:26,220 --> 00:30:28,919
years they started to have hundreds of
774
00:30:28,919 --> 00:30:32,250
millions of users then they shut down
775
00:30:32,250 --> 00:30:34,049
the name servers they only had about
776
00:30:34,049 --> 00:30:35,789
fourteen by the way and they were
777
00:30:35,789 --> 00:30:36,780
geographically distributed
778
00:30:36,780 --> 00:30:39,060
this is a classic approach to do some
779
00:30:39,060 --> 00:30:41,790
sort of balancing right load balancing
780
00:30:41,790 --> 00:30:45,660
to avoid having overwhelmed servers if you
781
00:30:45,660 --> 00:30:47,790
have millions of people looking up in
782
00:30:47,790 --> 00:30:51,630
your name server right a single
783
00:30:51,630 --> 00:30:53,670
machine starts not to be so good even if
784
00:30:53,670 --> 00:30:55,320
you're just saying this machine with
785
00:30:55,320 --> 00:30:57,000
this IP address has the file if that
786
00:30:57,000 --> 00:30:58,500
becomes overwhelming if you have
787
00:30:58,500 --> 00:31:00,690
millions of such requests flying at
788
00:31:00,690 --> 00:31:04,950
every moment ok good so then what
789
00:31:04,950 --> 00:31:07,350
do you do you design systems that have
790
00:31:07,350 --> 00:31:10,560
absolutely no centralization really the
791
00:31:10,560 --> 00:31:16,130
goal there was not to do so because the
792
00:31:16,130 --> 00:31:18,210
Napster design actually didn't work it
793
00:31:18,210 --> 00:31:20,460
worked perfectly I would say it was to
794
00:31:20,460 --> 00:31:21,780
do it because then it's virtually
795
00:31:21,780 --> 00:31:23,850
impossible to take it down so if you
796
00:31:23,850 --> 00:31:26,670
could remove the need for a centralized
797
00:31:26,670 --> 00:31:28,680
name service and you could have a true
798
00:31:28,680 --> 00:31:31,020
peer-to-peer means everybody can act
799
00:31:31,020 --> 00:31:32,640
both as a client and a server a true
800
00:31:32,640 --> 00:31:34,500
peer-to-peer system then essentially
801
00:31:34,500 --> 00:31:36,890
even if you take millions of nodes off
802
00:31:36,890 --> 00:31:39,450
the system supposedly is still running
803
00:31:39,450 --> 00:31:41,160
so a core interesting question we are
804
00:31:41,160 --> 00:31:42,690
going to see this later in the class is
805
00:31:42,690 --> 00:31:43,890
how do you design this peer-to-peer
806
00:31:43,890 --> 00:31:45,480
systems that can survive this massive
807
00:31:45,480 --> 00:31:47,730
removal of nodes and still kind of tick
808
00:31:47,730 --> 00:31:49,380
if you do that then it's virtually
809
00:31:49,380 --> 00:31:51,150
impossible to shut down the system right
810
00:31:51,150 --> 00:31:53,370
and in fact this is happening to a large
811
00:31:53,370 --> 00:31:55,050
extent with torrents right you can
812
00:31:55,050 --> 00:31:56,790
shut one down another one springs up I
813
00:31:56,790 --> 00:31:58,260
mean it's everywhere these are pure
814
00:31:58,260 --> 00:32:00,660
peer-to-peer systems ok now when it
815
00:32:00,660 --> 00:32:02,430
comes to these peer-to-peer systems you
816
00:32:02,430 --> 00:32:03,930
can ask a more structured question
817
00:32:03,930 --> 00:32:05,780
how should we go about designing them
818
00:32:05,780 --> 00:32:08,250
can you be systematic in a certain way
819
00:32:08,250 --> 00:32:11,910
and this was really the darling
820
00:32:11,910 --> 00:32:13,710
research topic in distributed systems
821
00:32:13,710 --> 00:32:16,440
for a while so people came up with a
822
00:32:16,440 --> 00:32:18,870
number of interesting
823
00:32:18,870 --> 00:32:20,400
solutions for this one of them is this
824
00:32:20,400 --> 00:32:21,740
idea of a structured peer-to-peer
825
00:32:21,740 --> 00:32:24,480
network so in a peer-to-peer network you
826
00:32:24,480 --> 00:32:25,980
want to know about nodes but not about
827
00:32:25,980 --> 00:32:27,630
all the nodes you simply can't keep
828
00:32:27,630 --> 00:32:29,460
track of all the nodes in the system if
829
00:32:29,460 --> 00:32:32,240
you have a million nodes in the system
830
00:32:32,240 --> 00:32:34,290
accurately keeping track of a million
831
00:32:34,290 --> 00:32:36,300
nodes is essentially an impossible
832
00:32:36,300 --> 00:32:38,280
task now I want you to understand that
833
00:32:38,280 --> 00:32:39,510
it's impossible not because you don't
834
00:32:39,510 --> 00:32:41,310
have enough memory to keep track of a
835
00:32:41,310 --> 00:32:42,510
million nodes so that's not a problem
836
00:32:42,510 --> 00:32:45,780
right now it's simply that you can't possibly
837
00:32:45,780 --> 00:32:49,100
know for sure what even ten guys do let alone
838
00:32:49,100 --> 00:32:51,320
a million guys right so knowing
839
00:32:51,320 --> 00:32:53,090
which of the million are alive because
840
00:32:53,090 --> 00:32:55,009
if you contact servers that are not alive
841
00:32:55,009 --> 00:32:56,840
it's kind of tough or distributing that
842
00:32:56,840 --> 00:32:58,909
information right when another node
843
00:32:58,909 --> 00:33:00,769
joins if you really have a million nodes
844
00:33:00,769 --> 00:33:02,299
and you have to tell this guy what
845
00:33:02,299 --> 00:33:04,610
all the million other guys do it starts
846
00:33:04,610 --> 00:33:06,940
to be problematic okay
847
00:33:06,940 --> 00:33:09,259
so the structured peer-to-peer networks
848
00:33:09,259 --> 00:33:13,820
came up with a very systematic way
849
00:33:13,820 --> 00:33:15,559
to decide which are the nodes you should
850
00:33:15,559 --> 00:33:17,600
know about which are the connections you
851
00:33:17,600 --> 00:33:20,029
should keep track of in particular they
852
00:33:20,029 --> 00:33:21,139
were looking for just a logarithmic
853
00:33:21,139 --> 00:33:23,509
number of such connections right again
854
00:33:23,509 --> 00:33:26,059
by the way you can already see a little
855
00:33:26,059 --> 00:33:29,210
bit of big-O notation obsession creeping
856
00:33:29,210 --> 00:33:32,149
in why logarithmic this is a very
857
00:33:32,149 --> 00:33:35,690
legitimate question but logarithms are
858
00:33:35,690 --> 00:33:38,450
kind of very nice mostly because the
859
00:33:38,450 --> 00:33:40,490
theoreticians told us that logarithms
860
00:33:40,490 --> 00:33:42,139
are good and anything that's not
861
00:33:42,139 --> 00:33:44,120
logarithmic is bad but I mean if you
862
00:33:44,120 --> 00:33:45,529
think about it even square root might be
863
00:33:45,529 --> 00:33:48,230
fine right square root of a million it's
864
00:33:48,230 --> 00:33:50,029
a thousand well a thousand is not such a
865
00:33:50,029 --> 00:33:52,159
bad number of nodes to keep track
866
00:33:52,159 --> 00:33:54,789
of I would say okay but nevertheless
867
00:33:54,789 --> 00:33:56,539
virtually all the structured peer-to-peer
868
00:33:56,539 --> 00:33:58,940
architectures like that logarithm and
869
00:33:58,940 --> 00:34:00,049
they're gonna keep track of only a
870
00:34:00,049 --> 00:34:02,600
logarithmic number of such nodes okay
871
00:34:02,600 --> 00:34:04,429
how they do that we are gonna go into
872
00:34:04,429 --> 00:34:06,169
many many details later in the class
873
00:34:06,169 --> 00:34:09,619
right yes by coming up with various tricks
874
00:34:09,619 --> 00:34:11,540
to be able to in fact compute which are
875
00:34:11,540 --> 00:34:13,969
the logarithmic number of neighbors
876
00:34:13,969 --> 00:34:19,099
you should actually have right almost
877
00:34:19,099 --> 00:34:20,719
all of them are based on an idea
878
00:34:20,719 --> 00:34:22,909
called the distributed hash table okay
879
00:34:22,909 --> 00:34:25,099
and that forms some kind of a virtual
880
00:34:25,099 --> 00:34:27,859
ring and jumping on this ring with steps
881
00:34:27,859 --> 00:34:29,359
of increasing size it essentially gives
882
00:34:29,359 --> 00:34:30,889
you the logarithmic number of steps okay
884
00:34:30,889 --> 00:34:32,418
it can be proven it has all kinds of nice
884
00:34:32,418 --> 00:34:34,040
properties as long as things don't
885
00:34:34,040 --> 00:34:35,540
change too fast when it becomes very
886
00:34:35,540 --> 00:34:37,129
problematic to analyze in any way shape
887
00:34:37,129 --> 00:34:37,750
or form
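The ring with jumps of increasing size can be sketched as follows. This is a minimal Chord-style sketch (Chord is one well-known DHT built this way; the class will cover its own choice of systems later). The identifier size `M`, the SHA-1 hashing, and the name `ChordSketch` are assumptions, and the sketch ignores joins, failures, and churn entirely. Each node keeps at most `M` fingers, at distances 1, 2, 4, ..., and only about log2 N of them are distinct; for a million nodes log2 N is about 20, versus a square root of 1000.

```python
import hashlib
import math
from bisect import bisect_left

M = 16           # assumed identifier size in bits
RING = 2 ** M    # positions 0 .. 2**M - 1 on the virtual ring


def ring_id(name: str) -> int:
    """Hash a node address or file name onto the ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING


class ChordSketch:
    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))

    def successor(self, x: int) -> int:
        """First node at or clockwise after position x; that node owns x."""
        i = bisect_left(self.nodes, x % RING)
        return self.nodes[i % len(self.nodes)]

    def fingers(self, n: int):
        """Node n's routing table: successor(n + 2**i) for i = 0..M-1."""
        return [self.successor(n + 2 ** i) for i in range(M)]

    def lookup(self, start: int, key: int):
        """Route greedily from `start`; return (owner of key, forwarding hops)."""
        n, hops = start, 0
        while True:
            if key == n:
                return n, hops
            succ = self.successor(n + 1)
            # Done if key lies in (n, succ]; a lone node owns the whole ring.
            if succ == n or 0 < (key - n) % RING <= (succ - n) % RING:
                return succ, hops
            # Otherwise jump to the farthest finger that still precedes the key.
            for f in reversed(self.fingers(n)):
                if 0 < (f - n) % RING < (key - n) % RING:
                    n = f
                    break
            hops += 1


dht = ChordSketch(ring_id(f"node-{i}") for i in range(1000))
keys = [ring_id(f"file-{i}") for i in range(200)]
hops = [dht.lookup(dht.nodes[0], k)[1] for k in keys]
print(len(dht.nodes), "nodes, max hops:", max(hops), "log2(N):", round(math.log2(len(dht.nodes)), 1))
```

Each jump lands strictly between the current node and the key, so routing always terminates; the logarithmic hop count holds only as long as every finger table is accurate, which is exactly what stops being easy once nodes come and go quickly.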
888
00:34:37,750 --> 00:34:41,449
okay another approach is to say you know
889
00:34:41,449 --> 00:34:43,639
what let's view the space of documents
890
00:34:43,639 --> 00:34:44,929
as some sort of a multi-dimensional
891
00:34:44,929 --> 00:34:46,879
space and let's cut it into pieces to
892
00:34:46,879 --> 00:34:49,668
determine who is responsible for every
893
00:34:49,668 --> 00:34:52,369
piece and then to find a way to navigate
894
00:34:52,369 --> 00:34:55,760
this space right so in any such network
895
00:34:55,760 --> 00:34:57,290
peer-to-peer or not peer-to-peer the
896
00:34:57,290 --> 00:34:59,180
question is how can you find a resource
897
00:34:59,180 --> 00:35:00,980
by name who actually has the resource by
898
00:35:00,980 --> 00:35:01,530
name and then
899
00:35:01,530 --> 00:35:03,630
how can you grab the resource? Grabbing
900
00:35:03,630 --> 00:35:04,890
the resource is easy if you can do
901
00:35:04,890 --> 00:35:06,690
these point-to-point connections, so we kind
902
00:35:06,690 --> 00:35:08,040
of keep on talking about point-to-point
903
00:35:08,040 --> 00:35:10,260
connections versus some sort of a lookup
904
00:35:10,260 --> 00:35:13,170
right? Point-to-point connections are by
905
00:35:13,170 --> 00:35:17,520
far the best approach right unless the
906
00:35:17,520 --> 00:35:19,200
network becomes overwhelmed getting data.
907
00:35:19,200 --> 00:35:21,090
torrents do something interesting there
908
00:35:21,090 --> 00:35:23,910
right which is files get cached in
909
00:35:23,910 --> 00:35:26,400
multiple places to alleviate network
910
00:35:26,400 --> 00:35:28,050
strangulation right we're gonna come
911
00:35:28,050 --> 00:35:29,640
back to this as well this is also some
912
00:35:29,640 --> 00:35:31,170
sort of a distributed systems strategy
913
00:35:31,170 --> 00:35:33,630
right? So CAN is one such system, in
914
00:35:33,630 --> 00:35:35,970
which some sort of a partitioning of a
915
00:35:35,970 --> 00:35:37,020
high dimensional space is actually
916
00:35:37,020 --> 00:35:39,870
happening and then things are easy for
917
00:35:39,870 --> 00:35:42,000
a while. Right, for example, if a new node
918
00:35:42,000 --> 00:35:45,540
joins in right you have this
919
00:35:45,540 --> 00:35:47,420
partitioning, and a new node joins in and
920
00:35:47,420 --> 00:35:49,470
randomly picks a location in the space
921
00:35:49,470 --> 00:35:50,970
and it picks this location then you have
922
00:35:50,970 --> 00:35:53,480
to split this location you cut it in two
923
00:35:53,480 --> 00:35:55,770
so the new node and the old node that
924
00:35:55,770 --> 00:35:58,020
owned the entire location communicate and
925
00:35:58,020 --> 00:36:00,300
partition, say, the set of documents
926
00:36:00,300 --> 00:36:01,830
right and you partition the space but
927
00:36:01,830 --> 00:36:03,380
then the trouble is when this guy dies
928
00:36:03,380 --> 00:36:06,030
how do you put together such pieces
929
00:36:06,030 --> 00:36:07,740
They're not nice rectangles anymore; all
930
00:36:07,740 --> 00:36:09,480
kinds of complications potentially right
931
00:36:09,480 --> 00:36:11,760
so inevitably what's going to happen
932
00:36:11,760 --> 00:36:13,860
is any such solution is going to have
933
00:36:13,860 --> 00:36:15,330
some very nice properties under certain
934
00:36:15,330 --> 00:36:17,250
circumstances and some weird situations
935
00:36:17,250 --> 00:36:19,410
to deal with in other circumstances okay
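The join step described above (pick a random point, find the zone containing it, cut that zone in two, and hand one half to the newcomer) can be sketched in two dimensions. This is a simplified toy in the spirit of CAN (Content-Addressable Network), assuming a unit square, SHA-256 to map document names to points, and always splitting the longer side; `Zone`, `point_for`, `owner_of`, and `join` are hypothetical names, and the hard part mentioned in the lecture, merging zones when a node dies, is deliberately left out.

```python
import hashlib
import random
from dataclasses import dataclass

@dataclass
class Zone:
    """Axis-aligned rectangle [x0, x1) x [y0, y1) owned by one node."""
    x0: float
    y0: float
    x1: float
    y1: float
    owner: str

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

def point_for(name: str) -> tuple[float, float]:
    """Hash a document name to a point in the unit square."""
    h = hashlib.sha256(name.encode()).digest()
    x = (int.from_bytes(h[:8], "big") >> 11) / 2**53   # 53 bits: exact and < 1.0
    y = (int.from_bytes(h[8:16], "big") >> 11) / 2**53
    return x, y

def owner_of(zones: list[Zone], name: str) -> str:
    """The node whose zone contains the document's point is responsible for it."""
    x, y = point_for(name)
    return next(z.owner for z in zones if z.contains(x, y))

def join(zones: list[Zone], newcomer: str, rng: random.Random) -> None:
    """Newcomer picks a random point; the zone holding it is split in half."""
    x, y = rng.random(), rng.random()
    zone = next(z for z in zones if z.contains(x, y))
    if zone.x1 - zone.x0 >= zone.y1 - zone.y0:    # split the longer side
        mid = (zone.x0 + zone.x1) / 2
        zones.append(Zone(mid, zone.y0, zone.x1, zone.y1, newcomer))  # upper half
        zone.x1 = mid                                                 # old owner keeps the rest
    else:
        mid = (zone.y0 + zone.y1) / 2
        zones.append(Zone(zone.x0, mid, zone.x1, zone.y1, newcomer))
        zone.y1 = mid
```

After a split, documents whose points now fall in the newcomer's half would be handed over by the old owner; `owner_of` simply recomputes that from the zones.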
936
00:36:19,410 --> 00:36:21,270
now one of your projects is going to be
937
00:36:21,270 --> 00:36:22,650
to implement, or simulate, one of these
938
00:36:22,650 --> 00:36:24,180
things using actors, so it's gonna
939
00:36:24,180 --> 00:36:26,480
be quite a lot of fun
940
00:36:26,480 --> 00:36:30,600
all right now something that you might
941
00:36:30,600 --> 00:36:32,190
have seen in a networking class whenever
942
00:36:32,190 --> 00:36:34,370
it comes to communication you have to
943
00:36:34,370 --> 00:36:37,950
establish a so-called protocol right for
944
00:36:37,950 --> 00:36:39,840
example it's easy to say an actor talks
945
00:36:39,840 --> 00:36:41,940
to another actor but the question of
946
00:36:41,940 --> 00:36:44,640
course is what do they talk about what's
947
00:36:44,640 --> 00:36:47,250
in that message what one actor tells the
948
00:36:47,250 --> 00:36:49,860
other actor now that's important why
949
00:36:49,860 --> 00:36:52,290
because if an actor tells something to
950
00:36:52,290 --> 00:36:54,360
another actor, then the actor that sent
951
00:36:54,360 --> 00:36:55,830
the message can make assumptions about what the
952
00:36:55,830 --> 00:36:57,450
other guy will at least know after a
953
00:36:57,450 --> 00:37:00,600
while all right and a lot of distributed
954
00:37:00,600 --> 00:37:03,690
systems are about who knows what so what
955
00:37:03,690 --> 00:37:05,070
can you assume that the other party
956
00:37:05,070 --> 00:37:08,220
knows right because if you can assume
957
00:37:08,220 --> 00:37:10,440
about what the other party knows you can
958
00:37:10,440 --> 00:37:12,270
have a certain expectation for what
959
00:37:12,270 --> 00:37:13,380
would happen when you ask a certain
960
00:37:13,380 --> 00:37:14,940
question and ultimately it's about
961
00:37:14,940 --> 00:37:15,460
getting
962
00:37:15,460 --> 00:37:18,570
something from from somewhere right so
963
00:37:18,570 --> 00:37:21,280
for example, in a structured peer-to-peer
964
00:37:21,280 --> 00:37:24,190
network you can essentially figure out
965
00:37:24,190 --> 00:37:25,360
who should have a certain document
966
00:37:25,360 --> 00:37:27,760
because it's placed using certain rules
967
00:37:27,760 --> 00:37:29,650
and then the protocol can reflect that
968
00:37:29,650 --> 00:37:35,920
right now at the opposite end well there
969
00:37:35,920 --> 00:37:37,690
are many opposite ends but you can
970
00:37:37,690 --> 00:37:39,070
have the structured peer-to-peer
971
00:37:39,070 --> 00:37:42,040
network and that requires using hashes
972
00:37:42,040 --> 00:37:43,360
in a very clever way one way or another
973
00:37:43,360 --> 00:37:44,770
and you can have purely unstructured
974
00:37:44,770 --> 00:37:48,730
ones in which it's truly ad-hoc I think
975
00:37:48,730 --> 00:37:52,750
Kazaa was like this, right? So Napster's big
976
00:37:52,750 --> 00:37:54,370
problem was the servers, and Kazaa came
977
00:37:54,370 --> 00:37:56,110
along, in which you simply had a
978
00:37:56,110 --> 00:37:57,930
completely ad-hoc
979
00:37:57,930 --> 00:37:59,800
association; all you had to know was
980
00:37:59,800 --> 00:38:02,380
another client that was part of
981
00:38:02,380 --> 00:38:04,060
the network and from that guy you got
982
00:38:04,060 --> 00:38:05,860
information about what he knows partial
983
00:38:05,860 --> 00:38:07,810
information about what he knows, so you can
984
00:38:07,810 --> 00:38:10,540
come up with higher-level ideas right
985
00:38:10,540 --> 00:38:13,870
in particular, this idea that what you
986
00:38:13,870 --> 00:38:16,990
know about other peers in a peer-to-peer
987
00:38:16,990 --> 00:38:18,900
system is some sort of a partial view
988
00:38:18,900 --> 00:38:21,700
right so essentially then you're saying
989
00:38:21,700 --> 00:38:23,770
hey there is some kind of a global view
990
00:38:23,770 --> 00:38:25,180
in the system is what the collection of
991
00:38:25,180 --> 00:38:28,300
all the peers know, but no specific peer
992
00:38:28,300 --> 00:38:30,310
is gonna have a global view; each of
993
00:38:30,310 --> 00:38:31,450
them is going to have only a partial
994
00:38:31,450 --> 00:38:32,560
view, so they know only about a
995
00:38:32,560 --> 00:38:34,690
subset of such nodes when they exchange
996
00:38:34,690 --> 00:38:36,070
messages they could exchange information
997
00:38:36,070 --> 00:38:38,290
about those partial views right so this
998
00:38:38,290 --> 00:38:41,080
is a particular strategy in which when a
999
00:38:41,080 --> 00:38:43,930
new node joins in, or when you
1001
00:38:43,930 --> 00:38:46,570
want to talk, let's exchange a little
1002
00:38:46,570 --> 00:38:48,010
bit of what we know about the
1002
00:38:48,010 --> 00:38:49,840
system and not really just going and
1003
00:38:49,840 --> 00:38:51,970
getting those files but trying to
1004
00:38:51,970 --> 00:38:53,290
maintain some sort of information about
1005
00:38:53,290 --> 00:38:55,360
connectivity right then essentially what
1006
00:38:55,360 --> 00:38:56,710
I can do is I can tell you what I know
1007
00:38:56,710 --> 00:38:58,990
about my partial view and you can
1008
00:38:58,990 --> 00:39:00,580
combine mine with your partial view. Now
1009
00:39:00,580 --> 00:39:02,410
what you do need is some mechanism to
1010
00:39:02,410 --> 00:39:03,640
limit the amount of information you have
1011
00:39:03,640 --> 00:39:05,920
to keep track of you can say hey I want
1012
00:39:05,920 --> 00:39:07,900
to know about c peers, c some sort of
1013
00:39:07,900 --> 00:39:09,700
a constant okay maybe it's logarithmic
1014
00:39:09,700 --> 00:39:11,380
maybe it's more than logarithmic, whatever it
1015
00:39:11,380 --> 00:39:13,690
is so essentially what I can do is say
1016
00:39:13,690 --> 00:39:15,070
hey when I communicate with you I'll
1017
00:39:15,070 --> 00:39:20,560
pick randomly c/2, half of the
1018
00:39:20,560 --> 00:39:22,210
nodes I know about and I'm gonna send
1019
00:39:22,210 --> 00:39:23,650
you a message and tell you about those
1020
00:39:23,650 --> 00:39:25,420
nodes and essentially what you could do
1021
00:39:25,420 --> 00:39:27,430
is throw away half your information and
1022
00:39:27,430 --> 00:39:28,750
keep the half you get
1023
00:39:28,750 --> 00:39:30,130
from me and that will provide some kind
1024
00:39:30,130 --> 00:39:31,870
of a dynamic component into the system
1025
00:39:31,870 --> 00:39:34,120
that propagates the information not
1026
00:39:34,120 --> 00:39:36,430
quite clear why this is good but
1027
00:39:36,430 --> 00:39:38,320
potentially might be interesting right
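A minimal sketch of the partial-view exchange just described, under stated assumptions: every peer keeps a view of exactly `C` peer ids, and in each exchange both sides send `C // 2` randomly chosen entries (push-pull). The receiver keeps everything it was sent and fills the rest with a random subset of its old view, throwing the remainder away. Real protocols of this family add details such as entry ages, which are omitted here; `C`, `merge`, and `exchange` are hypothetical names.

```python
import random

C = 6  # assumed view size

def merge(own: list[int], received: list[int], me: int, rng: random.Random) -> list[int]:
    """Keep all received entries (minus ourselves), then refill from our old view."""
    incoming = [p for p in dict.fromkeys(received) if p != me]
    leftovers = [p for p in own if p not in incoming]
    kept = rng.sample(leftovers, min(len(leftovers), C - len(incoming)))
    return incoming + kept

def exchange(views: dict[int, list[int]], a: int, rng: random.Random) -> None:
    """Peer a gossips with a random peer b from its own view; both update."""
    b = rng.choice(views[a])
    to_b = rng.sample(views[a], len(views[a]) // 2)   # half of what a knows
    to_a = rng.sample(views[b], len(views[b]) // 2)   # half of what b knows
    views[a] = merge(views[a], to_a, a, rng)
    views[b] = merge(views[b], to_b, b, rng)
```

Starting from views that only cover ring neighbours, for example `views = {i: [(i + d) % N for d in range(1, C + 1)] for i in range(N)}`, a few rounds of `exchange` already give peers entries far away on the ring, while every view stays at size `C`.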
1028
00:39:38,320 --> 00:39:39,790
well it turns out that this kind of
1029
00:39:39,790 --> 00:39:41,470
protocols, randomized protocols,
1030
00:39:41,470 --> 00:39:45,160
right in which you randomly select some
1031
00:39:45,160 --> 00:39:46,540
parts of information and possibly even
1032
00:39:46,540 --> 00:39:47,890
randomly select who you send the
1033
00:39:47,890 --> 00:39:49,750
information to these kind of protocols
1034
00:39:49,750 --> 00:39:51,910
are extremely resilient now it's very
1035
00:39:51,910 --> 00:39:53,800
hard to understand intuitively why
1036
00:39:53,800 --> 00:39:54,490
that's the case
1037
00:39:54,490 --> 00:39:57,970
and it's hard theoretically to prove why
1038
00:39:57,970 --> 00:39:59,860
that's the case but it can be proved
1039
00:39:59,860 --> 00:40:01,480
theoretically right we are going to come
1040
00:40:01,480 --> 00:40:03,250
back to this later these are so-called
1041
00:40:03,250 --> 00:40:06,280
gossiping like algorithms that have
1042
00:40:06,280 --> 00:40:08,230
extremely good properties if you truly
1043
00:40:08,230 --> 00:40:10,600
pick a random node to exchange
1044
00:40:10,600 --> 00:40:12,940
information with then for example if you
1045
00:40:12,940 --> 00:40:18,310
just want to spread one
1046
00:40:18,310 --> 00:40:20,770
single message, right: let's all agree
1047
00:40:20,770 --> 00:40:23,290
that something happened you can have
1048
00:40:23,290 --> 00:40:25,780
this kind of virus-like dissemination
1049
00:40:25,780 --> 00:40:28,630
that will run at exponential speed and
1050
00:40:28,630 --> 00:40:30,970
there are a lot of circumstances and
1051
00:40:30,970 --> 00:40:32,050
we're going to come back to this when we
1052
00:40:32,050 --> 00:40:33,690
talk about various networks right
1053
00:40:33,690 --> 00:40:35,590
exponential speed is very good in
1054
00:40:35,590 --> 00:40:38,530
general right that means in a
1055
00:40:38,530 --> 00:40:39,970
logarithmic number of steps everybody
1056
00:40:39,970 --> 00:40:42,310
knows about what happened now this is a
1057
00:40:42,310 --> 00:40:44,470
good logarithm hopefully as long as it
1058
00:40:44,470 --> 00:40:46,500
doesn't have a large constant okay so
1059
00:40:46,500 --> 00:40:49,870
strange as it might seem a lot of these
1060
00:40:49,870 --> 00:40:53,980
distributed systems use purely
1061
00:40:53,980 --> 00:40:55,630
randomized algorithms because they tend
1062
00:40:55,630 --> 00:40:57,160
to have a very good average behavior
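The exponential, virus-like spread and its logarithmic number of steps can be checked with a tiny simulation. This sketch assumes the simplest push model: in each round every informed node tells one peer chosen uniformly at random from all `n` nodes, ignoring partial views and failures; `push_gossip_rounds` is a hypothetical name. Because the informed set can at most double per round, at least ceil(log2 n) rounds are needed, and in practice the count stays within a small multiple of that.

```python
import math
import random

def push_gossip_rounds(n: int, rng: random.Random) -> int:
    """Rounds until all n nodes have heard a rumour that starts at node 0."""
    informed = {0}
    rounds = 0
    while len(informed) < n:
        # every node informed at the start of this round pushes to one random node
        for _ in range(len(informed)):
            informed.add(rng.randrange(n))
        rounds += 1
    return rounds
```

Calling `push_gossip_rounds(10_000, random.Random(42))` and comparing the result with `math.log2(10_000)` shows the "good logarithm with a small constant" the lecture mentions.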
1063
00:40:57,160 --> 00:41:01,180
okay well I'm gonna mention this later
1064
00:41:01,180 --> 00:41:03,460
when we actually see specifics but in
1065
00:41:03,460 --> 00:41:05,500
fact so how many of you took any class
1066
00:41:05,500 --> 00:41:07,120
that had some kind of randomization in
1067
00:41:07,120 --> 00:41:09,520
it sometimes the randomized algorithms
1068
00:41:09,520 --> 00:41:11,080
are taught a little bit as part of the
1069
00:41:11,080 --> 00:41:14,590
algorithms class right there is even
1070
00:41:14,590 --> 00:41:16,210
weirder stuff with pseudorandomness, but
1071
00:41:16,210 --> 00:41:16,960
that's a different story
1072
00:41:16,960 --> 00:41:20,640
okay well you'll see some in this class
1073
00:41:20,640 --> 00:41:23,440
the important thing is a relatively
1074
00:41:23,440 --> 00:41:24,790
small amount of information for each of
1075
00:41:24,790 --> 00:41:26,830
the peers together with that complicated
1076
00:41:26,830 --> 00:41:28,300
theory will guarantee that you have good
1077
00:41:28,300 --> 00:41:30,190
properties in the global system okay and
1078
00:41:30,190 --> 00:41:32,370
such an unstructured peer-to-peer
1079
00:41:32,370 --> 00:41:34,360
protocol could be this one right with
1080
00:41:34,360 --> 00:41:36,130
the push-pull mode, and this one
1081
00:41:36,130 --> 00:41:37,660
actually stabilizes quite fast if you
1082
00:41:37,660 --> 00:41:40,000
share randomly with random neighbors. If I
1084
00:41:40,000 --> 00:41:42,400
share half my information, very fast I
1085
00:41:42,400 --> 00:41:44,319
know about very faraway neighbors, and
1085
00:41:44,319 --> 00:41:46,450
that's a good thing because if you know
1086
00:41:46,450 --> 00:41:48,010
about very faraway neighbors you're
1087
00:41:48,010 --> 00:41:49,779
gonna get that logarithmic behavior in
1088
00:41:49,779 --> 00:41:51,700
terms of finding something very fast
1089
00:41:51,700 --> 00:41:54,309
okay now when everything else fails if
1090
00:41:54,309 --> 00:41:55,690
you want to find something you can do
1091
00:41:55,690 --> 00:41:57,369
something that's considered to be the
1092
00:41:57,369 --> 00:41:59,349
worst of the worst which is flood the
1093
00:41:59,349 --> 00:42:01,150
network you essentially start yelling
1094
00:42:01,150 --> 00:42:04,359
who knows where this file is you tell
1095
00:42:04,359 --> 00:42:05,890
all your neighbors your neighbors tell
1096
00:42:05,890 --> 00:42:07,270
their neighbors, their neighbors tell
1098
00:42:07,270 --> 00:42:08,829
their neighbors, and eventually it gets to who has
1098
00:42:08,829 --> 00:42:11,859
the file right that's one way to get
1099
00:42:11,859 --> 00:42:14,500
information of course to get that one
1100
00:42:14,500 --> 00:42:16,930
piece of information, where
1102
00:42:16,930 --> 00:42:19,359
that file is, you basically have to disrupt
1102
00:42:19,359 --> 00:42:21,730
everybody everybody has to know about
1103
00:42:21,730 --> 00:42:23,410
the fact that you are looking for a certain
1105
00:42:23,410 --> 00:42:24,609
item. And now of course if the item
1106
00:42:24,609 --> 00:42:25,809
is important, maybe that's warranted;
1107
00:42:25,809 --> 00:42:28,390
it's like an Amber Alert, right? And you
1108
00:42:28,390 --> 00:42:30,279
can already start to see that it can
1108
00:42:30,279 --> 00:42:32,200
get worse than this right because the
1109
00:42:32,200 --> 00:42:33,819
yelling can propagate in waves and keep
1110
00:42:33,819 --> 00:42:37,089
on bouncing off the network if you
1111
00:42:37,089 --> 00:42:38,980
have no way to cut down on those
1112
00:42:38,980 --> 00:42:40,869
messages and make them die somehow they
1113
00:42:40,869 --> 00:42:44,650
can live for a very long time right for
1114
00:42:44,650 --> 00:42:45,970
example, if you didn't remember that
1115
00:42:45,970 --> 00:42:48,760
you've seen this message if you send a
1116
00:42:48,760 --> 00:42:50,410
message to your neighbors your neighbors
1117
00:42:50,410 --> 00:42:51,640
to their neighbors, but you're one of
1119
00:42:51,640 --> 00:42:52,869
their neighbors, so they send you back the
1119
00:42:52,869 --> 00:42:54,390
message and everybody keeps on just
1120
00:42:54,390 --> 00:42:56,260
propagating this message more and
1121
00:42:56,260 --> 00:42:57,910
more and more and more and more and then
1122
00:42:57,910 --> 00:43:00,460
everybody does nothing else; I mean all the
1123
00:43:00,460 --> 00:43:01,869
messages in the system are just about
1124
00:43:01,869 --> 00:43:03,849
this flooding this doesn't even have a
1125
00:43:03,849 --> 00:43:06,609
way to die out right now you might say
1126
00:43:06,609 --> 00:43:09,490
hey we must do something about it but
1127
00:43:09,490 --> 00:43:10,839
sometimes it's tricky to do something
1128
00:43:10,839 --> 00:43:11,230
about it
1129
00:43:11,230 --> 00:43:13,599
so one particular solution for example
1130
00:43:13,599 --> 00:43:15,220
I'm just throwing it out there is some
1131
00:43:15,220 --> 00:43:18,010
sort of a time to live right for any
1132
00:43:18,010 --> 00:43:20,109
such message you include a counter and
1133
00:43:20,109 --> 00:43:22,150
you say this message can be
1134
00:43:22,150 --> 00:43:24,839
retransmitted only let's say 20 times
1135
00:43:24,839 --> 00:43:27,940
every time the message gets sent again
1136
00:43:27,940 --> 00:43:30,190
you decrease the counter the counter
1137
00:43:30,190 --> 00:43:31,960
gets to zero you stop sending a message
1138
00:43:31,960 --> 00:43:34,420
that will make the message die it cannot
1139
00:43:34,420 --> 00:43:37,750
propagate more than 20 hops. Right, now of
1140
00:43:37,750 --> 00:43:39,910
course the question is why 20 what's
1141
00:43:39,910 --> 00:43:43,529
special about 20 or how many do you need
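Both fixes discussed here, remembering messages you have already seen and a time-to-live counter, fit in a few lines. The sketch below floods a query over a graph given as an adjacency dict and counts messages, so flooding with and without duplicate suppression can be compared; the function name `flood` and the `suppress_duplicates` flag are assumptions for illustration.

```python
from collections import deque

def flood(graph: dict[int, list[int]], source: int, ttl: int,
          suppress_duplicates: bool = True) -> tuple[set[int], int]:
    """Flood from `source`; return (nodes reached, messages sent)."""
    reached = {source}
    messages = 0
    queue = deque([(source, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue                      # counter hit zero: the message dies here
        for nb in graph[node]:
            messages += 1
            first_time = nb not in reached
            reached.add(nb)
            # forward only first arrivals, unless we "forget" what we have seen
            if first_time or not suppress_duplicates:
                queue.append((nb, remaining - 1))
    return reached, messages
```

On a 10-node ring (`{i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}`) with a TTL of 5, every node is reached either way, but duplicate suppression cuts the message count from 62 to 18. With a TTL of 4 the node opposite the source never hears the query, which is exactly the trade-off of picking that magic number.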
1142
00:43:43,529 --> 00:43:47,200
right so then you have a solution but
1143
00:43:47,200 --> 00:43:48,730
it's only a partial solution because it
1144
00:43:48,730 --> 00:43:50,349
depends on magic parameters that you
1145
00:43:50,349 --> 00:43:51,339
still have to figure out how to
1147
00:43:51,339 --> 00:43:52,869
actually select. So if that number is
1147
00:43:52,869 --> 00:43:54,430
large you're guaranteed that everybody
1148
00:43:54,430 --> 00:43:55,599
hears about it with
1149
00:43:55,599 --> 00:43:57,549
very high probability but then you do
1150
00:43:57,549 --> 00:43:59,109
more disruption in the network if the
1151
00:43:59,109 --> 00:44:01,839
number is small then maybe there are
1152
00:44:01,839 --> 00:44:03,910
circumstances in which the right people
1153
00:44:03,910 --> 00:44:05,229
don't hear about it but at least you
1154
00:44:05,229 --> 00:44:06,579
don't have so many messages going in the
1155
00:44:06,579 --> 00:44:08,589
system right it's very very tricky
1156
00:44:08,589 --> 00:44:10,539
business that needs to be treated
1157
00:44:10,539 --> 00:44:13,749
possibly theoretically right by the way
1158
00:44:13,749 --> 00:44:17,170
the theory on such systems gets
1159
00:44:17,170 --> 00:44:19,390
extremely hard even under idealized
1160
00:44:19,390 --> 00:44:22,869
modelling conditions, extremely hard. Even
1161
00:44:22,869 --> 00:44:24,910
partial success in doing some sort of a
1162
00:44:24,910 --> 00:44:26,589
theory and definitely distributed
1163
00:44:26,589 --> 00:44:27,970
systems literature is not the realm of
1164
00:44:27,970 --> 00:44:29,589
this, right? Even partial successes are
1166
00:44:29,589 --> 00:44:33,039
kind of celebrated as a big
1166
00:44:33,039 --> 00:44:36,789
breakthrough right all right so this is
1167
00:44:36,789 --> 00:44:41,400
one such protocol right then you can
1168
00:44:41,400 --> 00:44:43,509
write some interesting research papers
1169
00:44:43,509 --> 00:44:44,950
in which you argue that even if you do
1170
00:44:44,950 --> 00:44:46,329
an unstructured protocol, if you put an
1171
00:44:46,329 --> 00:44:47,739
extra condition it starts to look like a
1172
00:44:47,739 --> 00:44:49,180
structured protocol right I'm not gonna
1173
00:44:49,180 --> 00:44:50,890
go into all the details but if you're a
1174
00:44:50,890 --> 00:44:52,569
little bit picky about how you select
1175
00:44:52,569 --> 00:44:53,920
that half information that you send to
1176
00:44:53,920 --> 00:44:55,509
somebody else and you have some sort of
1177
00:44:55,509 --> 00:44:57,970
a measure for that then you can actually
1178
00:44:57,970 --> 00:45:00,519
see that from the everybody-with-everybody
1180
00:45:00,519 --> 00:45:02,229
connections that a very unstructured
1181
00:45:02,229 --> 00:45:04,269
network might look like, you can get more
1182
00:45:04,269 --> 00:45:05,559
and more structure, and in the end it looks
1182
00:45:05,559 --> 00:45:07,420
very structured if you just have a
1183
00:45:07,420 --> 00:45:09,460
particular preference in how you select
1184
00:45:09,460 --> 00:45:10,989
the neighbors you want to retain over
1185
00:45:10,989 --> 00:45:14,499
time so the network can go from very
1186
00:45:14,499 --> 00:45:16,479
intertwined to a lot more structured
1187
00:45:16,479 --> 00:45:20,499
just by being picky about what
1188
00:45:20,499 --> 00:45:21,969
information you keep so it's kind of an
1189
00:45:21,969 --> 00:45:23,710
interesting result that says you don't
1190
00:45:23,710 --> 00:45:25,089
really need to be structured from the
1191
00:45:25,089 --> 00:45:26,289
beginning all you need to do is add
1192
00:45:26,289 --> 00:45:28,529
pickiness to unstructured and it will
1193
00:45:28,529 --> 00:45:31,660
form some sort of a structure now if you
1194
00:45:31,660 --> 00:45:33,369
think about this kind of a network then
1195
00:45:33,369 --> 00:45:34,989
it's not fantastic because to get from
1196
00:45:34,989 --> 00:45:36,609
one end to another for example this guy
1197
00:45:36,609 --> 00:45:38,950
wants to find a file that's here and you
1198
00:45:38,950 --> 00:45:42,099
have to go through many many hops to get
1199
00:45:42,099 --> 00:45:45,039
there so an interesting question would
1200
00:45:45,039 --> 00:45:46,630
be: hey, I might want to have a structured
1202
00:45:46,630 --> 00:45:49,809
network but with some other links; how
1202
00:45:49,809 --> 00:45:51,219
many such links would I need to be able
1203
00:45:51,219 --> 00:45:52,960
to traverse the network fast that's a
1204
00:45:52,960 --> 00:45:54,190
theoretical question and there are some
1205
00:45:54,190 --> 00:45:56,079
interesting solutions to that okay but
1206
00:45:56,079 --> 00:45:57,460
all of them are hard; I'm just gonna mention
1208
00:45:57,460 --> 00:45:58,569
these kinds of results; we're not gonna
1208
00:45:58,569 --> 00:46:00,400
start proving anything about random
1209
00:46:00,400 --> 00:46:03,520
networks in this class okay
1210
00:46:03,520 --> 00:46:06,610
all right now I said peer-to-peer
1211
00:46:06,610 --> 00:46:09,610
network the peer in peer to peer means
1212
00:46:09,610 --> 00:46:12,640
everybody is equal, right? But sometimes
1213
00:46:12,640 --> 00:46:14,980
it's healthy to have some peers to be
1214
00:46:14,980 --> 00:46:16,600
more equal than some other peers and
1215
00:46:16,600 --> 00:46:18,490
that's where the super peers come into
1216
00:46:18,490 --> 00:46:18,880
play
1217
00:46:18,880 --> 00:46:21,280
I want you to understand that a lot of
1218
00:46:21,280 --> 00:46:25,840
these solutions came from the evolution of
1219
00:46:25,840 --> 00:46:28,360
particular systems, right? So I
1220
00:46:28,360 --> 00:46:30,160
think Kazaa initially was simply a
1221
00:46:30,160 --> 00:46:31,810
peer-to-peer system, but then they figured
1222
00:46:31,810 --> 00:46:34,090
out that certain people have much better
1223
00:46:34,090 --> 00:46:35,740
computers than other people and much
1224
00:46:35,740 --> 00:46:37,270
better network connections which is even
1225
00:46:37,270 --> 00:46:40,420
more important hey why don't we give
1226
00:46:40,420 --> 00:46:42,220
more work to guys that have better
1227
00:46:42,220 --> 00:46:43,600
network connections and better computers
1228
00:46:43,600 --> 00:46:46,090
and that's how super peers popped up so
1229
00:46:46,090 --> 00:46:47,619
the super peer simply can take a lot
1230
00:46:47,619 --> 00:46:49,600
more of the load and it depends on if
1231
00:46:49,600 --> 00:46:51,190
you want to do some sort of a gradation
1232
00:46:51,190 --> 00:46:52,750
between peers and super peers or simply
1233
00:46:52,750 --> 00:46:54,580
have two classes peers and super peers
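A rough sketch of "give more work to the peers with better network connections", assuming each peer reports a single bandwidth number: the top fraction become super peers, and every ordinary peer attaches to the super peer whose load relative to its bandwidth would stay lowest. The election rule, the `fraction` parameter, and the names `elect_super_peers` and `attach_peers` are hypothetical; this is not how any particular system did it.

```python
def elect_super_peers(bandwidth: dict[str, float], fraction: float) -> list[str]:
    """Pick the best-connected peers as super peers (at least one)."""
    k = max(1, round(len(bandwidth) * fraction))
    return sorted(bandwidth, key=bandwidth.get, reverse=True)[:k]

def attach_peers(bandwidth: dict[str, float], supers: list[str]) -> dict[str, str]:
    """Map each ordinary peer to a super peer, weighting load by bandwidth."""
    load = {s: 0 for s in supers}
    assignment = {}
    for peer in bandwidth:
        if peer in load:
            continue  # super peers serve others; they do not attach to anyone
        best = min(supers, key=lambda s: (load[s] + 1) / bandwidth[s])
        load[best] += 1
        assignment[peer] = best
    return assignment
```

Raising `fraction` is what makes such a system hard to shut down: with many super peers, no single operator controls the name service.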
1234
00:46:54,580 --> 00:46:56,440
now the trick is the following to still
1235
00:46:56,440 --> 00:46:57,790
have enough super peers so you can't
1236
00:46:57,790 --> 00:47:00,250
shut down the system. The original Napster
1237
00:47:00,250 --> 00:47:02,860
system had essentially 14 super peers
1238
00:47:02,860 --> 00:47:04,119
which were the name servers and
1239
00:47:04,119 --> 00:47:05,650
everybody else was a normal peer. The
1241
00:47:05,650 --> 00:47:07,150
super peers were just serving the names;
1242
00:47:07,150 --> 00:47:13,210
you shut those down... oh, and the Napster company
1242
00:47:13,210 --> 00:47:15,250
owned all of the 14 so all you need to
1243
00:47:15,250 --> 00:47:17,380
do is get some sort of a court
1244
00:47:17,380 --> 00:47:20,050
injunction since that company has to
1245
00:47:20,050 --> 00:47:21,760
work in a legitimate way you shut down
1246
00:47:21,760 --> 00:47:23,230
the entire system now if you have a
1247
00:47:23,230 --> 00:47:26,650
million super peers and a hundred
1248
00:47:26,650 --> 00:47:27,880
million normal peers
1249
00:47:27,880 --> 00:47:29,830
good luck chasing down the million super
1250
00:47:29,830 --> 00:47:32,290
peers, right? It's like: you kill
1252
00:47:32,290 --> 00:47:36,700
one, more spring up, right? And this
1252
00:47:36,700 --> 00:47:38,140
literally happens with a lot of these
1253
00:47:38,140 --> 00:47:42,760
services okay now these are interesting
1254
00:47:42,760 --> 00:47:44,920
ideas that we might want to use even
1255
00:47:44,920 --> 00:47:46,810
beyond just normal peer-to-peer systems
1256
00:47:46,810 --> 00:47:48,250
right? If you want to have these super
1257
00:47:48,250 --> 00:47:50,830
resilient networks for example an
1258
00:47:50,830 --> 00:47:52,000
interesting thing I didn't talk about
1259
00:47:52,000 --> 00:47:54,040
and it's interesting in itself is this
1260
00:47:54,040 --> 00:47:57,190
idea of sensor networks so we keep on
1261
00:47:57,190 --> 00:47:58,030
talking about computers, computers,
1263
00:47:58,030 --> 00:47:59,590
computers; you have computers everywhere
1263
00:47:59,590 --> 00:48:01,480
computers mostly consume data and
1264
00:48:01,480 --> 00:48:03,240
somebody has to produce the data somehow
1265
00:48:03,240 --> 00:48:06,970
How? Well, we can all say yes, it
1267
00:48:06,970 --> 00:48:08,470
comes from databases and somebody
1267
00:48:08,470 --> 00:48:09,850
bothered to put it there but another way
1268
00:48:09,850 --> 00:48:12,580
to say it is that something measures it, and
1269
00:48:12,580 --> 00:48:14,680
those are the sensors and that can be a
1270
00:48:14,680 --> 00:48:16,079
very interesting
1271
00:48:16,079 --> 00:48:19,859
kind of endeavor. Your own device, for
1272
00:48:19,859 --> 00:48:22,289
example by the way this is coming to
1273
00:48:22,289 --> 00:48:24,269
some extent it's already there but it's
1274
00:48:24,269 --> 00:48:26,130
possibly coming in a big way because
1275
00:48:26,130 --> 00:48:27,779
they're gonna run out of gizmos to put
1276
00:48:27,779 --> 00:48:30,209
in a cell phone soon your cell phone is
1277
00:48:30,209 --> 00:48:34,729
gonna be able to measure temperature
1278
00:48:34,729 --> 00:48:38,670
pressure possibly even your heart rate
1279
00:48:38,670 --> 00:48:40,199
when you hold it and stuff like that
1280
00:48:40,199 --> 00:48:42,779
right and then suddenly you you can ask
1281
00:48:42,779 --> 00:48:44,910
questions like what am I gonna do with
1282
00:48:44,910 --> 00:48:45,839
that information that will be
1283
00:48:45,839 --> 00:48:47,099
interesting by the way that could
1284
00:48:47,099 --> 00:48:48,599
provide very interesting data for
1285
00:48:48,599 --> 00:48:49,739
weather prediction the problem with
1286
00:48:49,739 --> 00:48:51,150
weather prediction now is you measure it
1287
00:48:51,150 --> 00:48:53,309
in too few points; you solve
1288
00:48:53,309 --> 00:48:54,479
differential equations and you simply
1289
00:48:54,479 --> 00:48:57,839
don't have enough points to know exactly
1290
00:48:57,839 --> 00:49:01,499
what's actually happening now if you
1291
00:49:01,499 --> 00:49:03,329
find any sort of application for example
1292
00:49:03,329 --> 00:49:06,089
for something, a cool thing I
1293
00:49:06,089 --> 00:49:07,949
would like my phone to do is to tell me
1294
00:49:07,949 --> 00:49:10,079
what the outside temperature is because
1295
00:49:10,079 --> 00:49:12,179
that helps me determine whether I'm
1296
00:49:12,179 --> 00:49:14,880
justified in feeling cold or something
1297
00:49:14,880 --> 00:49:17,449
is wrong with me that I'm feeling cold
1298
00:49:17,449 --> 00:49:20,249
this is why we have thermometers really
1299
00:49:20,249 --> 00:49:22,140
right because I mean otherwise you would
1300
00:49:22,140 --> 00:49:23,939
feel it's just kind of to calibrate our
1301
00:49:23,939 --> 00:49:28,380
own senses. So I would love to have a
1302
00:49:28,380 --> 00:49:31,319
thermometer and hygrometer on my cell
1303
00:49:31,319 --> 00:49:33,179
phone to know if it's humid outside or not
1304
00:49:33,179 --> 00:49:35,219
humid outside now the moment I have the
1305
00:49:35,219 --> 00:49:37,439
convenience of having those sensors you
1306
00:49:37,439 --> 00:49:39,119
could gather that information that's a
1307
00:49:39,119 --> 00:49:41,630
sensor network, okay. Or you could ask
1308
00:49:41,630 --> 00:49:44,609
could I deploy such sensor networks for
1309
00:49:44,609 --> 00:49:48,900
example, I do some kind of ocean
1310
00:49:48,900 --> 00:49:51,150
related research I'm gonna throw a
1311
00:49:51,150 --> 00:49:53,160
million sensors measure things with a
1312
00:49:53,160 --> 00:49:54,719
million sensors getting the information
1313
00:49:54,719 --> 00:49:57,359
and then well I can imagine things I
1314
00:49:57,359 --> 00:49:59,249
could do with it. How many people saw
1316
00:49:59,249 --> 00:50:01,619
this movie? It must be at least 15 years
1317
00:50:01,619 --> 00:50:04,079
old. I think it was called Twister, with
1317
00:50:04,079 --> 00:50:06,209
those tornado researchers they had those
1318
00:50:06,209 --> 00:50:09,119
little flying sensors and the big
1319
00:50:09,119 --> 00:50:10,769
highlight of the movie is when they all
1320
00:50:10,769 --> 00:50:13,559
fly and they gather their data right
1321
00:50:13,559 --> 00:50:15,989
that's a sensor network. Oh, that's
1323
00:50:15,989 --> 00:50:17,880
exactly a sensor network. Now
1323
00:50:17,880 --> 00:50:19,469
interesting question you have those
1324
00:50:19,469 --> 00:50:22,949
little gizmos they measure whatever they
1325
00:50:22,949 --> 00:50:26,939
measure and they send information now
1326
00:50:26,939 --> 00:50:28,319
that's the part that starts to become
1327
00:50:28,319 --> 00:50:29,860
troublesome what exactly does it mean
1328
00:50:29,860 --> 00:50:33,100
to send information, right? Who gets the
1329
00:50:33,100 --> 00:50:35,710
information how do you capture that
1330
00:50:35,710 --> 00:50:37,750
information how do you store it how do
1331
00:50:37,750 --> 00:50:39,850
you talk so fast with those sensors now
1332
00:50:39,850 --> 00:50:42,220
if it's just ten of them you can imagine
1333
00:50:42,220 --> 00:50:43,900
one computer not having problems
1334
00:50:43,900 --> 00:50:47,650
keeping up with it but let's say a
1335
00:50:47,650 --> 00:50:49,300
decent number to really get the shape of
1336
00:50:49,300 --> 00:50:50,830
a tornado would be a hundred thousand
1337
00:50:50,830 --> 00:50:51,550
right
1338
00:50:51,550 --> 00:50:53,770
how could you capture sensor information
1339
00:50:53,770 --> 00:50:57,450
from a hundred thousand little devices
1340
00:50:57,810 --> 00:51:00,730
so that's really the realm of the sensor
1341
00:51:00,730 --> 00:51:02,410
networks it starts to be very
1342
00:51:02,410 --> 00:51:04,270
problematic because you can't have each
1343
00:51:04,270 --> 00:51:05,770
of those hundred thousand talking to
1344
00:51:05,770 --> 00:51:08,260
some sort of a server right you simply
1345
00:51:08,260 --> 00:51:11,740
won't be able to keep up with so many
1346
00:51:11,740 --> 00:51:14,680
things flying around I mean at least a
1347
00:51:14,680 --> 00:51:16,810
hundred thousand TCP/IP connections or
1348
00:51:16,810 --> 00:51:18,730
some sort I mean imagine writing a
1349
00:51:18,730 --> 00:51:20,290
program that can keep track of a hundred
1350
00:51:20,290 --> 00:51:22,420
thousand separate Internet connections
1351
00:51:22,420 --> 00:51:25,330
in a single machine what are you going
1352
00:51:25,330 --> 00:51:27,430
to do build a server farm to take it
1353
00:51:27,430 --> 00:51:28,990
with you in the truck that has the
1354
00:51:28,990 --> 00:51:32,620
sensors right so the idea is good it's sound
1355
00:51:32,620 --> 00:51:33,790
they are going to have all kinds of
1356
00:51:33,790 --> 00:51:35,560
small sensors to track how the tornado
1357
00:51:35,560 --> 00:51:38,470
moves but then the distributed systems
1358
00:51:38,470 --> 00:51:40,980
component comes into play and how do you
1359
00:51:40,980 --> 00:51:43,180
how do you get information from them a
1360
00:51:43,180 --> 00:51:44,410
particularly interesting idea we're
1361
00:51:44,410 --> 00:51:48,010
going to explore this right is you're
1362
00:51:48,010 --> 00:51:49,360
going to have sensors talk to sensors
1363
00:51:49,360 --> 00:51:50,800
and somehow propagate information and
1364
00:51:50,800 --> 00:51:52,450
then you have a lot less information to
1365
00:51:52,450 --> 00:51:53,800
get from the system because ultimately
1366
00:51:53,800 --> 00:51:55,120
what you want is some sort of knowledge
1367
00:51:55,120 --> 00:51:56,230
that is not a hundred thousand
1368
00:51:56,230 --> 00:51:57,910
measurements maybe a lot less will be
1369
00:51:57,910 --> 00:51:59,890
good right so sensor networks are all
1370
00:51:59,890 --> 00:52:01,600
about how do you cut down on the amount
1371
00:52:01,600 --> 00:52:03,760
of communication not to mention that you
1372
00:52:03,760 --> 00:52:06,010
need very powerful antennas for all
1373
00:52:06,010 --> 00:52:07,330
those hundred thousand guys to talk
1374
00:52:07,330 --> 00:52:10,840
long distance with the server if they
1375
00:52:10,840 --> 00:52:14,530
have peer nodes sensor nodes close
1376
00:52:14,530 --> 00:52:15,970
enough to them within a few meters you can
1377
00:52:15,970 --> 00:52:18,130
use much much weaker signals and get
1378
00:52:18,130 --> 00:52:20,020
stuff done right so that's potentially
1379
00:52:20,020 --> 00:52:21,490
an interesting application of
1380
00:52:21,490 --> 00:52:23,770
distributed systems ideas that we need
1381
00:52:23,770 --> 00:52:25,630
to pursue and that's a particular kind
1382
00:52:25,630 --> 00:52:27,250
of if you want hardware architecture
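A minimal sketch of the sensors-talk-to-sensors idea, assuming the nodes have already organized themselves into a routing tree rooted next to the base station (the lecture does not say how such a tree is built). Each node merges its children's partial results with its own reading and forwards a single (count, sum, max) summary, so the base station receives one message per neighbour rather than one per sensor. `SensorNode` and `aggregate` are hypothetical names; this illustrates the general technique of tree-based in-network aggregation, not a specific protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SensorNode:
    reading: float                                   # this node's own measurement
    children: list[SensorNode] = field(default_factory=list)


def aggregate(node: SensorNode) -> tuple[int, float, float]:
    """Return (count, sum, max) for the subtree rooted at `node`.

    Each node sends its parent one small summary instead of relaying
    every raw reading, so the root ends up with a single tuple.
    """
    count, total, maximum = 1, node.reading, node.reading
    for child in node.children:
        c, t, m = aggregate(child)                   # one message per child link
        count += c
        total += t
        maximum = max(maximum, m)
    return count, total, maximum


# Hypothetical six-sensor deployment; `root` is the node next to the base station.
relay_1 = SensorNode(11.0, [SensorNode(12.0), SensorNode(15.5)])
relay_2 = SensorNode(10.0, [SensorNode(9.0)])
root = SensorNode(13.0, [relay_1, relay_2])

count, total, maximum = aggregate(root)
print(count, total / count, maximum)
```

Averages, counts and maxima combine cleanly this way; a query that needs every raw value (for example an exact median) would not shrink as much. The recursion suits shallow trees; a very deep chain of relays would need an iterative traversal.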
1383
00:52:27,250 --> 00:52:30,790
okay in those circumstances you might in
1384
00:52:30,790 --> 00:52:33,160
fact use some sort of super nodes and
1385
00:52:33,160 --> 00:52:36,010
normal nodes right the super nodes for
1386
00:52:36,010 --> 00:52:38,590
example could have a lot more battery
1387
00:52:38,590 --> 00:52:40,150
life a big problem in sensor networks is
1388
00:52:40,150 --> 00:52:42,250
battery life or you could even use the
1389
00:52:42,250 --> 00:52:43,210
following kind of trick
1390
00:52:43,210 --> 00:52:46,599
in which you switch roles so at certain
1391
00:52:46,599 --> 00:52:48,339
moments of time certain sensor nodes
1392
00:52:48,339 --> 00:52:49,510
are super nodes and then they switch
1393
00:52:49,510 --> 00:52:50,859
roles because they drain their battery
1394
00:52:50,859 --> 00:52:54,190
right by the way the battery life is one
1395
00:52:54,190 --> 00:52:55,270
of the biggest problems with any
1396
00:52:55,270 --> 00:52:59,859
autonomous sensor networks you can make
1397
00:52:59,859 --> 00:53:01,420
them small but then you have to put
1398
00:53:01,420 --> 00:53:02,619
small batteries in them
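A minimal sketch of role rotation, assuming a simple greedy rule: each round the node with the most remaining battery serves as super node and pays a higher energy cost for long-range transmission, while every other node pays only for short hops. The costs and the function name are made up for illustration; published protocols such as LEACH rotate cluster heads in a similar spirit but typically elect them at random.

```python
from __future__ import annotations


def rotate_super_node(batteries: dict[str, float], rounds: int,
                      super_cost: float = 5.0, normal_cost: float = 1.0) -> list[str]:
    """Pick a super node each round and drain batteries accordingly.

    The node with the most remaining charge becomes super node (ties go to
    the first such node in dict order) and pays `super_cost`; every other
    node pays `normal_cost`. Mutates `batteries` in place and returns the
    sequence of super nodes.
    """
    schedule: list[str] = []
    for _ in range(rounds):
        leader = max(batteries, key=lambda name: batteries[name])
        schedule.append(leader)
        for name in batteries:
            batteries[name] -= super_cost if name == leader else normal_cost
    return schedule


charge = {"a": 20.0, "b": 20.0, "c": 20.0}
print(rotate_super_node(charge, rounds=3))
print(charge)
```

Because the expensive role always moves to whoever has the most charge left, the drain is spread evenly instead of exhausting one fixed super node first.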
1399
00:53:02,619 --> 00:53:06,280
that's the trouble ok good so lots of
1400
00:53:06,280 --> 00:53:08,560
interesting ideas another one is this
1401
00:53:08,560 --> 00:53:11,050
idea of edge servers edge server systems
1402
00:53:11,050 --> 00:53:14,740
ok so when we think about it we have
1403
00:53:14,740 --> 00:53:16,330
clients and servers even if you have the
1404
00:53:16,330 --> 00:53:18,490
traditional architecture right the
1405
00:53:18,490 --> 00:53:19,780
problem is the server can be on the
1406
00:53:19,780 --> 00:53:22,240
other side of the planet and then
1407
00:53:22,240 --> 00:53:23,950
you have large latencies and whatnot and
1408
00:53:23,950 --> 00:53:25,599
a particular idea this is what Akamai
1409
00:53:25,599 --> 00:53:27,430
does I threw it up in the air in the
1410
00:53:27,430 --> 00:53:30,010
introduction is to place servers much
1411
00:53:30,010 --> 00:53:31,270
closer to where the client is in
1412
00:53:31,270 --> 00:53:34,810
particularly at the ISP right now those
1413
00:53:34,810 --> 00:53:36,609
kind of servers are called edge servers
1414
00:53:36,609 --> 00:53:40,089
right because they really live close to
1415
00:53:40,089 --> 00:53:42,580
your actual connection to the Internet
1416
00:53:42,580 --> 00:53:44,920
you don't then have to traverse the
1417
00:53:44,920 --> 00:53:47,680
Internet and then what you have to make
1418
00:53:47,680 --> 00:53:50,349
sure is that all the edge servers
1419
00:53:50,349 --> 00:53:52,599
remain consistent and maybe
1420
00:53:52,599 --> 00:53:55,540
synchronize information in some way so
1421
00:53:55,540 --> 00:53:56,859
for example you're serving the pages of
1422
00:53:56,859 --> 00:53:58,960
CNN and by the way CNN is an Akamai
1423
00:53:58,960 --> 00:54:01,480
client ok so it's not a hypothetical
1424
00:54:01,480 --> 00:54:04,680
scenario so you're serving pages for CNN
1425
00:54:04,680 --> 00:54:06,820
Akamai's algorithms essentially make
1426
00:54:06,820 --> 00:54:08,830
sure that when somebody at CNN publishes
1427
00:54:08,830 --> 00:54:10,599
an article it propagates fast enough to
1428
00:54:10,599 --> 00:54:12,280
all the edge servers and then the
1429
00:54:12,280 --> 00:54:13,839
connection is very fast all you
1430
00:54:13,839 --> 00:54:15,640
do is you talk to the server but the
1431
00:54:15,640 --> 00:54:18,099
distributed system aspect of it right it
1432
00:54:18,099 --> 00:54:22,380
is how do you keep these servers in sync
1433
00:54:22,380 --> 00:54:24,820
that's even more problematic for example
1434
00:54:24,820 --> 00:54:26,470
when you're actually doing transactions
1435
00:54:26,470 --> 00:54:29,530
you're Amazon and you have multiple such
1436
00:54:29,530 --> 00:54:32,470
servers so people can buy very fast which
1437
00:54:32,470 --> 00:54:36,369
means you sell more right you have to
1438
00:54:36,369 --> 00:54:37,480
ask for today and I want you to
1439
00:54:37,480 --> 00:54:39,040
understand that it's important to
1440
00:54:39,040 --> 00:54:40,390
realize there are two aspects you can
1441
00:54:40,390 --> 00:54:41,650
have two different solutions even though
1442
00:54:41,650 --> 00:54:43,240
it looks like a single system so one
1443
00:54:43,240 --> 00:54:45,130
aspect when it comes to let's say
1444
00:54:45,130 --> 00:54:48,640
ecommerce is what items are available
1445
00:54:48,640 --> 00:54:50,650
and what's the description for the items
1446
00:54:50,650 --> 00:54:52,450
and the other aspect is the actual
1447
00:54:52,450 --> 00:54:55,730
financial transaction right
1448
00:54:55,730 --> 00:54:59,150
now how many of you bought an
1449
00:54:59,150 --> 00:55:01,790
item from Amazon only for Amazon later to
1450
00:55:01,790 --> 00:55:03,710
send an email saying we apologize but we
1451
00:55:03,710 --> 00:55:05,990
don't actually have the item I bought a
1452
00:55:05,990 --> 00:55:07,339
keyboard and this happened to me and
1453
00:55:07,339 --> 00:55:08,180
it's very annoying
1454
00:55:08,180 --> 00:55:10,280
well they suggested other sellers that
1455
00:55:10,280 --> 00:55:13,099
might have it but I already wasted a week
1456
00:55:13,099 --> 00:55:17,180
right so you might ask hey Amazon has
1457
00:55:17,180 --> 00:55:18,980
computers computers have storage how
1458
00:55:18,980 --> 00:55:20,329
come Amazon didn't know that they don't
1459
00:55:20,329 --> 00:55:21,220
have the keyboard
1460
00:55:21,220 --> 00:55:23,930
well because Amazon is cheating and they
1461
00:55:23,930 --> 00:55:26,180
are cheating with distributed systems
1462
00:55:26,180 --> 00:55:30,070
techniques in order to alleviate
1463
00:55:30,070 --> 00:55:32,240
otherwise significant problems they would
1464
00:55:32,240 --> 00:55:34,579
have so in particular they did not
1465
00:55:34,579 --> 00:55:37,700
keep the edge servers synchronized right they
1466
00:55:37,700 --> 00:55:40,579
said hey last night we had 10 keyboards
1467
00:55:40,579 --> 00:55:45,290
that's fine they're going to tell the edge
1468
00:55:45,290 --> 00:55:48,530
servers that we have 10 keyboards and
1469
00:55:48,530 --> 00:55:51,619
then all the time keyboards were bought
1470
00:55:51,619 --> 00:55:53,599
without information propagating to the
1471
00:55:53,599 --> 00:55:55,820
edge servers I jump on one of them
1472
00:55:55,820 --> 00:55:56,810
by the way you don't even know if there
1473
00:55:56,810 --> 00:55:58,130
is an edge server because you have to go
1474
00:55:58,130 --> 00:56:02,300
through your ISP right and you think you
1475
00:56:02,300 --> 00:56:05,750
go to amazon.com but you go to the edge
1476
00:56:05,750 --> 00:56:07,700
server where your ISP sends you that's
1477
00:56:07,700 --> 00:56:09,579
all networking magic right just
1478
00:56:09,579 --> 00:56:12,980
routing and whatnot right that server
1479
00:56:12,980 --> 00:56:14,660
said oh yeah it sounds good
1480
00:56:14,660 --> 00:56:16,819
give me the money actually did the
1481
00:56:16,819 --> 00:56:20,060
financial transaction later when they
1482
00:56:20,060 --> 00:56:21,770
took the trouble to put together all of
1483
00:56:21,770 --> 00:56:23,690
this information they realized oops we
1484
00:56:23,690 --> 00:56:25,700
don't actually have the item no problem
1485
00:56:25,700 --> 00:56:27,680
we designed the system to be like this
1486
00:56:27,680 --> 00:56:31,460
so it's fault tolerant by mildly
1487
00:56:31,460 --> 00:56:33,410
annoying the user which is okay if you
1488
00:56:33,410 --> 00:56:35,869
know what you're doing right so the
1489
00:56:35,869 --> 00:56:38,960
lesson here is you really have to take
1490
00:56:38,960 --> 00:56:40,730
tough choices even if you're one of the
1491
00:56:40,730 --> 00:56:42,200
big guys you might think hey there are
1492
00:56:42,200 --> 00:56:43,640
perfect solutions and all these big guys
1493
00:56:43,640 --> 00:56:45,170
use perfect solutions no they're using
1494
00:56:45,170 --> 00:56:46,760
imperfect solutions that are designed in a
1495
00:56:46,760 --> 00:56:49,250
very specific way right in this case
1496
00:56:49,250 --> 00:56:51,829
they know that if they don't do it too
1497
00:56:51,829 --> 00:56:55,069
often it's not a disaster as long as
1498
00:56:55,069 --> 00:56:57,109
they give me my money back if they don't
1499
00:56:57,109 --> 00:56:58,670
give me my money back it's a disaster
1500
00:56:58,670 --> 00:57:00,380
but if they give me my money back it's
1501
00:57:00,380 --> 00:57:04,089
not such a big tragedy right so
1502
00:57:04,089 --> 00:57:06,410
sometimes and this is really the key to
1503
00:57:06,410 --> 00:57:08,030
successful distributed systems it's
1504
00:57:08,030 --> 00:57:09,019
perfectly happy
1505
00:57:09,019 --> 00:57:12,049
fine not to have perfect information or
1506
00:57:12,049 --> 00:57:14,359
to have a little bit of lying going
1507
00:57:14,359 --> 00:57:16,849
around in there as long as it's done in
1508
00:57:16,849 --> 00:57:19,009
a controlled way right and Amazon is a
1509
00:57:19,009 --> 00:57:20,479
master in this by the way this is why
1510
00:57:20,479 --> 00:57:22,189
they are so big and so successful they
1511
00:57:22,189 --> 00:57:23,959
just found the right compromises this
1512
00:57:23,959 --> 00:57:26,509
being one of them okay instead of
1513
00:57:26,509 --> 00:57:28,369
insisting on really knowing what you
1514
00:57:28,369 --> 00:57:30,679
have in your inventory and then having a
1515
00:57:30,679 --> 00:57:33,019
very big distributed systems problem
1516
00:57:33,019 --> 00:57:35,809
and potentially limiting how many people
1517
00:57:35,809 --> 00:57:37,009
can do transactions at the same time
1518
00:57:37,009 --> 00:57:38,390
they say hey no let's just do
1519
00:57:38,390 --> 00:57:39,799
transactions and we'll figure out a
1520
00:57:39,799 --> 00:57:42,140
solution out of it right now that's not
1521
00:57:42,140 --> 00:57:44,959
true for the financial transaction if I
1522
00:57:44,959 --> 00:57:46,219
would do financial transactions and
1523
00:57:46,219 --> 00:57:47,719
sometimes it would fail that would be a
1524
00:57:47,719 --> 00:57:49,819
complete disaster by the way there are
1525
00:57:49,819 --> 00:57:51,499
strict regulations along these lines
1526
00:57:51,499 --> 00:57:56,749
right you cannot play with money without
1527
00:57:56,749 --> 00:57:58,399
giving very strict guarantees about
1528
00:57:58,399 --> 00:58:00,819
what's happening with that money right
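A minimal sketch of the keyboard story, with hypothetical names throughout: each edge server receives last night's stock count, sells against its own copy without coordinating with the others, and a later reconciliation step merges all orders and refunds whatever exceeds the real stock. Merging by server (east before west) is an arbitrary assumption standing in for a real tie-breaking rule such as order timestamps. The payment itself, which the lecture says must be a strict transaction, is deliberately not modelled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EdgeServer:
    name: str
    believed_stock: int                        # stale snapshot pushed from the central inventory
    orders: list[str] = field(default_factory=list)

    def buy(self, customer: str) -> bool:
        """Accept an order using only local, possibly stale, information."""
        if self.believed_stock > 0:
            self.believed_stock -= 1
            self.orders.append(customer)
            return True
        return False


def reconcile(actual_stock: int, edges: list[EdgeServer]) -> tuple[list[str], list[str]]:
    """Merge orders from all edge servers; orders beyond the real stock are refunded."""
    merged = [customer for edge in edges for customer in edge.orders]
    return merged[:actual_stock], merged[actual_stock:]


# Last night's snapshot said 10 keyboards, and each edge server was told 10.
edges = [EdgeServer("east", 10), EdgeServer("west", 10)]
for i in range(7):
    edges[0].buy(f"east-{i}")
    edges[1].buy(f"west-{i}")

shipped, refunded = reconcile(10, edges)
print(len(shipped), len(refunded))
```

Every purchase succeeds immediately at its edge server, which is the point: the system trades a few apologetic refund emails for never making buyers wait on cross-server coordination.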
1529
00:58:00,819 --> 00:58:04,279
so what Amazon can do Visa and
1530
00:58:04,279 --> 00:58:07,669
MasterCard cannot do so for them
1531
00:58:07,669 --> 00:58:09,109
this is a tremendous problem
1532
00:58:09,109 --> 00:58:11,359
because when they say the credit card
1533
00:58:11,359 --> 00:58:13,249
transaction went through then it went
1534
00:58:13,249 --> 00:58:14,869
through and it's a real transaction
1535
00:58:14,869 --> 00:58:17,089
right at the level of that financial
1536
00:58:17,089 --> 00:58:18,679
transaction you can't cheat but the
1537
00:58:18,679 --> 00:58:20,449
important thing is you reduce the
1538
00:58:20,449 --> 00:58:23,269
need to have a transaction everywhere to
1539
00:58:23,269 --> 00:58:24,649
the need to have a transaction only on
1540
00:58:24,649 --> 00:58:26,989
the very last step just the money
1541
00:58:26,989 --> 00:58:29,859
exchange by the way that's why
1542
00:58:29,859 --> 00:58:32,899
MasterCard and Visa charge at least two
1543
00:58:32,899 --> 00:58:35,749
point something percent because they do
1544
00:58:35,749 --> 00:58:37,519
this really fast sophisticated
1545
00:58:37,519 --> 00:58:39,019
distributed system that guarantees
1546
00:58:39,019 --> 00:58:40,729
that you have real transactions right
1547
00:58:40,729 --> 00:58:42,619
and by the way this is why PayPal made a
1548
00:58:42,619 --> 00:58:44,269
lot of money because they figured out how
1549
00:58:44,269 --> 00:58:46,339
to play their own game in the same
1550
00:58:46,339 --> 00:58:49,969
space okay so to some extent PayPal
1551
00:58:49,969 --> 00:58:53,269
founders deserve the big wealth they got
1552
00:58:53,269 --> 00:58:55,189
because they managed to build a really
1553
00:58:55,189 --> 00:58:56,599
good distributed system it's all about
1554
00:58:56,599 --> 00:58:58,849
distributed systems at this point okay
1555
00:58:58,849 --> 00:59:00,829
good so that's the story with these
1556
00:59:00,829 --> 00:59:03,139
edge servers where you place them how they
1557
00:59:03,139 --> 00:59:05,630
communicate what they do this is why
1558
00:59:05,630 --> 00:59:08,299
people use Akamai and not in-house ad
1559
00:59:08,299 --> 00:59:09,829
hoc solutions this is why people pay a
1560
00:59:09,829 --> 00:59:11,359
lot of money to Akamai because they
1561
00:59:11,359 --> 00:59:12,529
already have figured out a lot of these
1562
00:59:12,529 --> 00:59:16,249
things okay or this is why the big
1563
00:59:16,249 --> 00:59:19,039
companies like Amazon designed their own
1564
00:59:19,039 --> 00:59:20,269
solution because it's potentially
1565
00:59:20,269 --> 00:59:21,930
expensive to go with somebody else's
1566
00:59:21,930 --> 00:59:23,520
and they themselves want to make a lot
1567
00:59:23,520 --> 00:59:26,730
of money right essentially what this
1568
00:59:26,730 --> 00:59:28,559
means is that virtually anybody who's
1569
00:59:28,559 --> 00:59:30,059
medium-sized or larger is interested
1570
00:59:30,059 --> 00:59:31,380
in some sort of distributed system
1571
00:59:31,380 --> 00:59:33,809
implementation right so it's a big
1572
00:59:33,809 --> 00:59:35,279
market for people that understand
1573
00:59:35,279 --> 00:59:37,230
distributed systems because they all have
1574
00:59:37,230 --> 00:59:39,539
to face these issues at the minimum how
1575
00:59:39,539 --> 00:59:41,670
do you use certain software that already
1576
00:59:41,670 --> 00:59:43,980
does some distributed systems right to
1577
00:59:43,980 --> 00:59:45,059
know how to fix something you have to
1578
00:59:45,059 --> 00:59:46,020
understand how it works
1579
00:59:46,020 --> 00:59:48,299
otherwise it's basically just taking a
1580
00:59:48,299 --> 00:59:49,859
hammer and knocking it on the side right
1581
00:59:49,859 --> 00:59:53,730
it's not particularly good okay so this
1582
00:59:53,730 --> 00:59:56,819
is a schematic of for example torrent
1583
00:59:56,819 --> 00:59:58,710
BitTorrent and some other torrent sites
1584
00:59:58,710 --> 01:00:02,039
right in which you always go through
1585
01:00:02,039 --> 01:00:03,690
some sort of web interface even if you
1586
01:00:03,690 --> 01:00:06,000
don't see it by the way most of the apps
1587
01:00:06,000 --> 01:00:07,920
have a purely web-based interface they
1588
01:00:07,920 --> 01:00:11,490
contact through an HTTP-like protocol the
1589
01:00:11,490 --> 01:00:13,529
backend server even if they are actually
1590
01:00:13,529 --> 01:00:15,470
implemented on your iPhone or Android
1591
01:00:15,470 --> 01:00:17,700
okay that's very convenient because then
1592
01:00:17,700 --> 01:00:20,099
you can unify apps on multiple platforms
1593
01:00:20,099 --> 01:00:22,559
and web interfaces within the same kind
1594
01:00:22,559 --> 01:00:26,819
of big umbrella protocol okay so you
1595
01:00:26,819 --> 01:00:29,039
somehow go to some web page with
1596
01:00:29,039 --> 01:00:31,079
torrents it tells you where a torrent
1597
01:00:31,079 --> 01:00:32,640
file is and the torrent file will tell you
1598
01:00:32,640 --> 01:00:34,500
where fragments of a particular file are
1599
01:00:34,500 --> 01:00:37,710
right as long as you make this resilient
1600
01:00:37,710 --> 01:00:40,380
as in multiple nodes know about torrent
1601
01:00:40,380 --> 01:00:40,710
files
1602
01:00:40,710 --> 01:00:42,029
you solve that problem it's not particularly
1603
01:00:42,029 --> 01:00:44,460
centralized and the torrent file will
1604
01:00:44,460 --> 01:00:46,529
indicate again where fragments of the
1605
01:00:46,529 --> 01:00:48,420
file are and the important thing is and
1606
01:00:48,420 --> 01:00:50,130
this is one interesting thing especially
1607
01:00:50,130 --> 01:00:53,099
if you use encryption on top of this nobody
1608
01:00:53,099 --> 01:00:57,029
has the file every file is partitioned
1609
01:00:57,029 --> 01:00:58,380
into multiple pieces and you have to
1610
01:00:58,380 --> 01:01:00,390
gather pieces from multiple places it's
1611
01:01:00,390 --> 01:01:02,460
easier to replicate them to make sure
1612
01:01:02,460 --> 01:01:04,349
that copies of every fragment are on
1613
01:01:04,349 --> 01:01:06,630
multiple sites right and it
1614
01:01:06,630 --> 01:01:08,789
makes for an interesting distributed
1615
01:01:08,789 --> 01:01:11,160
system in fact torrents if you think
1616
01:01:11,160 --> 01:01:12,390
about them are some sort of a file
1617
01:01:12,390 --> 01:01:13,710
system and they are in fact a
1618
01:01:13,710 --> 01:01:15,839
distributed file system this is one big
1619
01:01:15,839 --> 01:01:17,130
topic we are going to study in this
1620
01:01:17,130 --> 01:01:18,839
class the distributed file systems and
1621
01:01:18,839 --> 01:01:21,839
one way or another all the big companies
1622
01:01:21,839 --> 01:01:23,520
have some version of this distributed
1623
01:01:23,520 --> 01:01:25,079
file system right so for example what
1624
01:01:25,079 --> 01:01:28,170
powers a lot of the services that Google
1625
01:01:28,170 --> 01:01:29,940
provides is GFS the Google file system
1626
01:01:29,940 --> 01:01:31,859
which is a distributed file system and
1627
01:01:31,859 --> 01:01:33,359
Google designed some 10 years back
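The fragment idea behind BitTorrent, and the chunking behind distributed file systems such as GFS, can be sketched in a few lines: a file is split into pieces, each piece is stored on several peers, and a reader reassembles it from whichever peers are online, checking every piece against a known hash. In this sketch the `digests` list plays the role of the SHA-1 piece hashes a .torrent file carries, and the `holdings` map stands in for the peer discovery that trackers or the DHT provide; the piece size, peer names and replica count are arbitrary assumptions.

```python
from __future__ import annotations

import hashlib
import random

PIECE_SIZE = 4  # bytes per fragment; tiny for readability (real pieces are far larger)


def split(data: bytes) -> list[bytes]:
    """Partition a file into fixed-size fragments (the last one may be shorter)."""
    return [data[i:i + PIECE_SIZE] for i in range(0, len(data), PIECE_SIZE)]


def place(fragments: list[bytes], peers: list[str], replicas: int,
          rng: random.Random) -> dict[str, dict[int, bytes]]:
    """Store every fragment on `replicas` distinct peers chosen at random."""
    holdings: dict[str, dict[int, bytes]] = {p: {} for p in peers}
    for index, frag in enumerate(fragments):
        for peer in rng.sample(peers, replicas):
            holdings[peer][index] = frag
    return holdings


def fetch(digests: list[str], holdings: dict[str, dict[int, bytes]],
          offline: set[str]) -> bytes:
    """Reassemble the file, taking each fragment from any online peer with a valid copy."""
    pieces = []
    for index, expected in enumerate(digests):
        for peer, store in holdings.items():
            frag = store.get(index)
            if peer in offline or frag is None:
                continue
            if hashlib.sha1(frag).hexdigest() == expected:  # reject corrupted copies
                pieces.append(frag)
                break
        else:  # no online peer could supply this fragment
            raise LookupError(f"fragment {index} is unavailable")
    return b"".join(pieces)


data = b"distributed systems are fun"
fragments = split(data)
digests = [hashlib.sha1(f).hexdigest() for f in fragments]
holdings = place(fragments, ["p1", "p2", "p3", "p4", "p5"], replicas=2, rng=random.Random(0))
print(fetch(digests, holdings, offline={"p3"}) == data)
```

With two replicas per fragment, any single peer can go offline and the file is still recoverable; losing more peers at once may leave some fragment with no online copy, which `fetch` reports as a `LookupError`.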
1628
01:01:33,359 --> 01:01:35,839
right everything eventually goes to GFS
1629
01:01:35,839 --> 01:01:40,559
for Google at least HDFS for let's
1630
01:01:40,559 --> 01:01:42,479
say Yahoo some other distributed file
1631
01:01:42,479 --> 01:01:45,539
systems okay now to some extent
1632
01:01:45,539 --> 01:01:46,739
distributed file systems are the
1633
01:01:46,739 --> 01:01:47,999
connection between the distributed
1634
01:01:47,999 --> 01:01:51,119
systems and operating systems sometimes
1635
01:01:51,119 --> 01:01:52,890
we call distributed systems distributed
1636
01:01:52,890 --> 01:01:54,569
operating systems even well for the last
1637
01:01:54,569 --> 01:01:55,739
10 years we kind of dropped the
1638
01:01:55,739 --> 01:01:57,660
operating part right it's really
1639
01:01:57,660 --> 01:01:59,789
distributed systems when it comes to
1640
01:01:59,789 --> 01:02:01,799
file systems that's really the realm of
1641
01:02:01,799 --> 01:02:03,779
operating systems and by providing a
1642
01:02:03,779 --> 01:02:04,920
distributed file system you're getting
1643
01:02:04,920 --> 01:02:06,749
close to what an operating system will
1644
01:02:06,749 --> 01:02:09,749
do ok but there's a larger discussion to be
1645
01:02:09,749 --> 01:02:10,859
had later about this but essentially
1646
01:02:10,859 --> 01:02:12,959
BitTorrent is some sort of distributed
1647
01:02:12,959 --> 01:02:14,249
file system arguably peer-to-peer
1648
01:02:14,249 --> 01:02:19,259
network system so that begs the
1649
01:02:19,259 --> 01:02:20,640
following interesting question what's a
1650
01:02:20,640 --> 01:02:28,589
file so what are these files so how many
1651
01:02:28,589 --> 01:02:31,469
people did not use files ever right
1652
01:02:31,469 --> 01:02:32,219
that's what I thought
1653
01:02:32,219 --> 01:02:36,509
so what's really a file so let's try to
1654
01:02:36,509 --> 01:02:39,359
go deep ok let's make it simple I'm
1655
01:02:39,359 --> 01:02:41,430
running an operating system let's say
1656
01:02:41,430 --> 01:02:43,650
Linux and I create a file
1657
01:02:43,650 --> 01:02:46,859
what exactly is created where what what
1658
01:02:46,859 --> 01:02:52,170
happens I mean it's some sort of bits
1659
01:02:52,170 --> 01:02:54,269
somewhere right it has to be bits we
1660
01:02:54,269 --> 01:02:58,199
know that the hard drive stores bits but
1661
01:02:58,199 --> 01:03:00,089
they have to be organized somehow right
1662
01:03:00,089 --> 01:03:02,130
it's all about organizing things well it
1663
01:03:02,130 --> 01:03:03,359
turns out that it's a little bit hard to
1664
01:03:03,359 --> 01:03:05,759
say what the file is and where it is and
1665
01:03:05,759 --> 01:03:07,759
how it is how many of you know about
1666
01:03:07,759 --> 01:03:11,999
in-memory file systems so what's an
1667
01:03:11,999 --> 01:03:15,269
in-memory file system alright so the
1668
01:03:15,269 --> 01:03:16,529
files are written in memory so
1669
01:03:16,529 --> 01:03:17,609
they're not even associated with the
1670
01:03:17,609 --> 01:03:20,190
hard drives anymore right why would you
1671
01:03:20,190 --> 01:03:25,259
do this with in-memory file systems we have
1672
01:03:25,259 --> 01:03:26,719
access to the memory why do I need files
1673
01:03:26,719 --> 01:03:29,190
so it's clear that it's fast but it's not
1674
01:03:29,190 --> 01:03:30,479
clear why I need files why would I
1675
01:03:30,479 --> 01:03:35,069
do files but there is no persistence
1676
01:03:35,069 --> 01:03:36,749
because it's an in-memory file system
1677
01:03:36,749 --> 01:03:38,069
and it's never written to disk so
1678
01:03:38,069 --> 01:03:39,779
no persistence literally no in-memory
1679
01:03:39,779 --> 01:03:41,430
file systems don't usually have
1680
01:03:41,430 --> 01:03:44,089
persistence
1681
01:03:45,130 --> 01:03:48,100
right so we're getting somewhere okay so
1682
01:03:48,100 --> 01:03:50,020
the file is some sort of a universal
1683
01:03:50,020 --> 01:03:52,630
abstraction that everybody likes to use
1684
01:03:52,630 --> 01:03:56,140
right so that's the point there it has
1685
01:03:56,140 --> 01:03:57,250
nothing to do necessarily with
1686
01:03:57,250 --> 01:03:58,780
persistency because we throw away
1687
01:03:58,780 --> 01:04:00,370
persistency for the in-memory version it
1688
01:04:00,370 --> 01:04:01,840
has to do with the abstraction we are
1689
01:04:01,840 --> 01:04:03,040
used to the abstraction we like the
1690
01:04:03,040 --> 01:04:04,510
abstraction and all the programs use it
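To illustrate why the abstraction matters more than persistence: code written only against the file interface runs unchanged whatever backs the file. In this sketch, `io.BytesIO` (an in-process memory buffer, used here as a stand-in rather than a true in-memory file system such as tmpfs) and a regular on-disk file both satisfy the same interface; `write_report` and `count_lines` are hypothetical names.

```python
from __future__ import annotations

import io
import os
import tempfile
from typing import BinaryIO


def write_report(f: BinaryIO, lines: list[str]) -> None:
    """Uses nothing but the file interface: write() calls."""
    for line in lines:
        f.write(line.encode() + b"\n")


def count_lines(f: BinaryIO) -> int:
    """Rewind and count lines, again through the file interface only."""
    f.seek(0)
    return sum(1 for _ in f)


# Memory-backed "file": fast, and nothing survives the process.
mem = io.BytesIO()
write_report(mem, ["alpha", "beta"])
print(count_lines(mem))

# Disk-backed file: same functions, and the data persists until deleted.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "report.txt"), "w+b") as disk:
        write_report(disk, ["alpha", "beta", "gamma"])
        print(count_lines(disk))
```

Swapping the backing store never touched `write_report` or `count_lines`, which is the same property that lets a program written for local files run over a distributed file system.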
1691
01:04:04,510 --> 01:04:07,180
right so this is why distributed file
1692
01:04:07,180 --> 01:04:08,950
systems are suddenly extremely important
1693
01:04:08,950 --> 01:04:11,110
because any program written to deal with
1694
01:04:11,110 --> 01:04:13,060
files could use a distributed file
1695
01:04:13,060 --> 01:04:15,880
system and be a distributed system right
1696
01:04:15,880 --> 01:04:19,360
if you can distribute files all you need
1697
01:04:19,360 --> 01:04:20,770
is to read and write files and you can
1698
01:04:20,770 --> 01:04:23,050
have a distributed system so this might
1699
01:04:23,050 --> 01:04:25,240
sound weird but one way for me to talk
1700
01:04:25,240 --> 01:04:28,480
to you is I write a part of a file
1701
01:04:28,480 --> 01:04:30,520
that's a distributed file the distributed file
1702
01:04:30,520 --> 01:04:32,320
system somehow gets the information to
1703
01:04:32,320 --> 01:04:34,360
you it's not actively pushed to you but
1704
01:04:34,360 --> 01:04:35,200
if you know where to look
1705
01:04:35,200 --> 01:04:37,060
suddenly I talk to you I talk to you
1706
01:04:37,060 --> 01:04:40,570
through a file right how many people know
1707
01:04:40,570 --> 01:04:43,390
what a pipe is in an operating system
1708
01:04:43,390 --> 01:04:52,330
what's a pipe it really is a channel of
1709
01:04:52,330 --> 01:04:54,460
communication that's made to look like a
1710
01:04:54,460 --> 01:04:56,560
file so it's again the file abstraction
1711
01:04:56,560 --> 01:04:59,680
so a pipe is a file except that it
1712
01:04:59,680 --> 01:05:01,210
never touches the hard drive and it's
1713
01:05:01,210 --> 01:05:03,790
really fast but it's still a file so
1714
01:05:03,790 --> 01:05:05,560
it's all about files so to a large
1715
01:05:05,560 --> 01:05:08,530
extent any modern operating system is
1716
01:05:08,530 --> 01:05:14,110
about files right by the way so this
1717
01:05:14,110 --> 01:05:16,480
might sound strange but you can even
1718
01:05:16,480 --> 01:05:19,870
access the network through files these
1719
01:05:19,870 --> 01:05:21,880
are the sockets especially named sockets
1720
01:05:21,880 --> 01:05:27,820
all right how does that work now it
1721
01:05:27,820 --> 01:05:29,830
starts to be very very weird there's
1722
01:05:29,830 --> 01:05:31,270
definitely no persistency whatsoever
1723
01:05:31,270 --> 01:05:34,570
because if nobody waits at the other end
1724
01:05:34,570 --> 01:05:35,980
first of all nothing gets transmitted or
1725
01:05:35,980 --> 01:05:37,360
if it waits and throws away the information
1726
01:05:37,360 --> 01:05:39,160
nothing ever made it in a persistent way
1727
01:05:39,160 --> 01:05:42,750
anyway so what's that all about well
1728
01:05:42,750 --> 01:05:45,820
it's piggybacking on one abstraction
1729
01:05:45,820 --> 01:05:47,830
which is the file system for a completely
1730
01:05:47,830 --> 01:05:48,820
different thing this kind of
1731
01:05:48,820 --> 01:05:50,410
communication and it's in fact very
1732
01:05:50,410 --> 01:05:52,630
convenient because in that way files
1733
01:05:52,630 --> 01:05:55,780
allow you to make a certain resource
1734
01:05:55,780 --> 01:05:58,240
available to other parties by just
1735
01:05:58,240 --> 01:05:58,810
agreeing on
1736
01:05:58,810 --> 01:06:00,790
some sort of a name it literally can
1737
01:06:00,790 --> 01:06:03,040
be used as a directory service right so
1738
01:06:03,040 --> 01:06:04,420
in that circumstance the file is a so
1739
01:06:04,420 --> 01:06:06,730
called a special file the file just
1740
01:06:06,730 --> 01:06:08,350
contains information about the fact that
1741
01:06:08,350 --> 01:06:09,670
I'm really a socket
1742
01:06:09,670 --> 01:06:11,830
and this is the information about the
1743
01:06:11,830 --> 01:06:13,840
socket the IP address I'm listening to
1744
01:06:13,840 --> 01:06:15,280
who I'm talking to and whatever else
1745
01:06:15,280 --> 01:06:18,580
right but then suddenly all of that
1746
01:06:18,580 --> 01:06:20,140
information that's about the socket has
1747
01:06:20,140 --> 01:06:22,180
a name and as far as the user is
1748
01:06:22,180 --> 01:06:25,300
concerned it looks like a file you open
1749
01:06:25,300 --> 01:06:26,410
the file you send something on the file
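The socket-as-file idea can be tried with Unix domain sockets, which on Unix-like systems are bound to a path name: `bind()` creates a socket special file at that path, another party connects by that name, and `makefile()` lets both sides read and write the connection like an ordinary file. A minimal sketch using Python's standard `socket` module; the upper-casing service and the name `ask_over_unix_socket` are made up for illustration, and this covers only processes on the same machine.

```python
from __future__ import annotations

import os
import socket
import stat
import tempfile
import threading


def ask_over_unix_socket(message: str) -> tuple[bool, str]:
    """Serve and query a tiny upper-casing service reachable through a path name.

    Returns (path_is_a_socket_file, reply). Unix-like systems only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "upper.sock")       # the name both sides agree on
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)                            # creates a socket special file at `path`
        server.listen(1)
        is_socket_file = stat.S_ISSOCK(os.stat(path).st_mode)

        def serve() -> None:
            conn, _ = server.accept()
            with conn, conn.makefile("rw") as f:     # treat the connection like a file
                f.write(f.readline().upper())
                f.flush()

        worker = threading.Thread(target=serve)
        worker.start()

        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)                         # find the service by its file name
        with client, client.makefile("rw") as f:
            f.write(message + "\n")
            f.flush()
            reply = f.readline().rstrip("\n")
        worker.join()
        server.close()
        return is_socket_file, reply


print(ask_over_unix_socket("hello"))
```

Nothing is persisted: the file system only supplies the name, and the data exists solely while both ends are connected.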
1750
01:06:26,410 --> 01:06:29,860
right so distributed file systems can be
1751
01:06:29,860 --> 01:06:32,380
regarded as follows you view a
1752
01:06:32,380 --> 01:06:34,270
connection as a file you put something
1753
01:06:34,270 --> 01:06:35,680
at one end and it pops up at the other
1754
01:06:35,680 --> 01:06:37,510
end in a very simple form with
1755
01:06:37,510 --> 01:06:39,250
persistency thrown in the middle to make
1756
01:06:39,250 --> 01:06:43,390
sure we can dissociate in time the
1757
01:06:43,390 --> 01:06:45,160
writing of the file and the reading of
1758
01:06:45,160 --> 01:06:47,770
the file right so you can write it and I
1759
01:06:47,770 --> 01:06:50,680
come back later and read it as long as I
1760
01:06:50,680 --> 01:06:52,630
come after it's fine that's persistency
1761
01:06:52,630 --> 01:06:54,010
if you don't have persistency you
1762
01:06:54,010 --> 01:06:56,380
can still have the file abstraction but
1763
01:06:56,380 --> 01:06:58,090
we have to be there at the same time I
1764
01:06:58,090 --> 01:07:00,430
start writing but I block until you
1765
01:07:00,430 --> 01:07:02,710
start reading and then I write you read
1766
01:07:02,710 --> 01:07:04,330
and then we are done we can still
1767
01:07:04,330 --> 01:07:05,740
pretend it was a file by the way this is
1768
01:07:05,740 --> 01:07:07,630
exactly what the pipe is all about right
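The difference between the two cases described above can be seen with a named pipe (FIFO) versus a regular file. In the sketch, opening a FIFO blocks until both a reader and a writer have it open, so the two sides must be present at the same time, whereas a regular file lets the writer finish before any reader appears. This assumes a Unix-like system (`os.mkfifo` is not available on Windows), and the function names are hypothetical. Anonymous pipes differ slightly: the kernel buffers a limited amount of data, and a writer blocks only once that buffer is full.

```python
from __future__ import annotations

import os
import tempfile
import threading


def fifo_roundtrip(message: bytes) -> bytes:
    """Send bytes through a named pipe; reader and writer must overlap in time."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "channel")
        os.mkfifo(path)                  # a special file: a name, but no data stored on disk

        def writer() -> None:
            with open(path, "wb") as w:  # blocks until a reader opens the FIFO
                w.write(message)

        t = threading.Thread(target=writer)
        t.start()
        with open(path, "rb") as r:      # blocks until the writer has opened its end
            data = r.read()              # returns at EOF, once the writer closes
        t.join()
        return data


def file_roundtrip(message: bytes) -> bytes:
    """Write now, read later: a regular file decouples writer and reader in time."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "note.bin")
        with open(path, "wb") as w:
            w.write(message)             # completes with no reader present
        # any amount of time could pass here
        with open(path, "rb") as r:
            return r.read()


print(fifo_roundtrip(b"hello via pipe"))
print(file_roundtrip(b"hello via file"))
```

The FIFO version needs a second thread precisely because neither end can make progress alone; the file version runs in a single sequential thread.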
1769
01:07:07,630 --> 01:07:09,700
the pipes in a traditional operating
1770
01:07:09,700 --> 01:07:11,770
systems require the reader and the
1771
01:07:11,770 --> 01:07:12,930
writer to be there at the same time
1772
01:07:12,930 --> 01:07:15,250
otherwise one blocks until the other one
1773
01:07:15,250 --> 01:07:18,430
shows up and does something right if you
1774
01:07:18,430 --> 01:07:20,590
think about persistency that allows you
1775
01:07:20,590 --> 01:07:23,140
to write now and somebody comes later to
1776
01:07:23,140 --> 01:07:24,910
pick it up so allows you more
1777
01:07:24,910 --> 01:07:26,710
flexibility in what to do with that
1778
01:07:26,710 --> 01:07:29,380
so persistency might be there to just
1779
01:07:29,380 --> 01:07:31,510
break this requirement to be there at
1780
01:07:31,510 --> 01:07:32,920
the same time and not necessarily to
1781
01:07:32,920 --> 01:07:35,020
keep the information forever for
1782
01:07:35,020 --> 01:07:36,730
example temporary files with temporary
1783
01:07:36,730 --> 01:07:39,190
results would work like this right so
1784
01:07:39,190 --> 01:07:41,230
that's kind of an interesting view of
1785
01:07:41,230 --> 01:07:43,450
what files might or might not do by the
1786
01:07:43,450 --> 01:07:45,490
way in most modern operating systems
1787
01:07:45,490 --> 01:07:47,440
every resource essentially is viewed
1788
01:07:47,440 --> 01:07:50,110
through this kind of a file abstraction
1789
01:07:50,110 --> 01:07:53,800
including raw devices right
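A small sketch of a device seen through the ordinary file interface, assuming a Linux or other Unix-like system. It uses /dev/urandom and /dev/zero instead of a raw disk such as /dev/sda, because reading a disk device normally requires root privileges. The helper names read_device and is_char_device are hypothetical.

```python
import os
import stat

def read_device(path: str, n: int) -> bytes:
    # The same open/read calls used for regular files work on a device node.
    with open(path, "rb") as dev:
        return dev.read(n)

def is_char_device(path: str) -> bool:
    # The file system metadata records that this "file" is a character device.
    return stat.S_ISCHR(os.stat(path).st_mode)

if __name__ == "__main__":
    print(is_char_device("/dev/urandom"), len(read_device("/dev/urandom", 16)))
```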
1790
01:07:53,800 --> 01:07:57,820
for example even when it comes to
1791
01:07:57,820 --> 01:08:00,670
files so how many of you know about
1792
01:08:00,670 --> 01:08:03,370
file systems within a file how do you
1793
01:08:03,370 --> 01:08:04,810
create a file system within a normal
1794
01:08:04,810 --> 01:08:09,970
file well probably all of you if you use
1795
01:08:09,970 --> 01:08:11,860
some sort of virtualization
1796
01:08:11,860 --> 01:08:12,580
with VMware
1797
01:08:12,580 --> 01:08:14,500
but for example a particularly
1798
01:08:14,500 --> 01:08:15,790
interesting approach to this would be
1799
01:08:15,790 --> 01:08:18,609
this kind of encrypted file system so
1800
01:08:18,609 --> 01:08:20,229
TrueCrypt for example is a very nice
1801
01:08:20,229 --> 01:08:21,760
program that runs on all the platforms and
1802
01:08:21,760 --> 01:08:23,680
allows you to create an entire file
1803
01:08:23,680 --> 01:08:26,500
system within a file it's encrypted so
1804
01:08:26,500 --> 01:08:28,510
when you shut it down it's basically
1805
01:08:28,510 --> 01:08:32,380
sealed by a complex presumably very hard
1806
01:08:32,380 --> 01:08:34,300
to break encryption system then you take
1807
01:08:34,300 --> 01:08:35,979
that file you put it on USB stick you
1808
01:08:35,979 --> 01:08:37,479
transport it somewhere you send it in
1809
01:08:37,479 --> 01:08:39,040
an email you pick it up at the
1810
01:08:39,040 --> 01:08:41,350
other end and you mount the file as a
1811
01:08:41,350 --> 01:08:43,510
full file system so what exactly does
1812
01:08:43,510 --> 01:08:47,260
that mean that means the operating
1813
01:08:47,260 --> 01:08:48,760
system has the possibility to create
1814
01:08:48,760 --> 01:08:51,430
this view of files this abstraction of
1815
01:08:51,430 --> 01:08:54,189
files on top of any set of raw bits so
1816
01:08:54,189 --> 01:08:55,450
ultimately there is not much of a
1817
01:08:55,450 --> 01:08:56,800
distinction between a normal file that
1818
01:08:56,800 --> 01:08:58,899
keeps bits and a bit interpretation
1819
01:08:58,899 --> 01:09:01,479
that's a file system versus the original
1820
01:09:01,479 --> 01:09:03,160
implementation of the file system on top
1821
01:09:03,160 --> 01:09:04,660
of the raw bits that live on the medium
1822
01:09:04,660 --> 01:09:07,330
there is literally no difference in any
1823
01:09:07,330 --> 01:09:10,120
decent operating system and that
1824
01:09:10,120 --> 01:09:11,500
immediately suggests that the raw device
1825
01:09:11,500 --> 01:09:13,180
itself could be viewed as a
1826
01:09:13,180 --> 01:09:14,500
file and that's true in all modern
1827
01:09:14,500 --> 01:09:17,339
operating systems right /dev/sda
1828
01:09:17,339 --> 01:09:22,569
it's in fact a file that allows you
1829
01:09:22,569 --> 01:09:26,500
access to the raw device right now why
1830
01:09:26,500 --> 01:09:29,200
is this important well this is important
1831
01:09:29,200 --> 01:09:30,520
especially if you want to do tricks
1832
01:09:30,520 --> 01:09:33,939
related to high performance file systems
1833
01:09:33,939 --> 01:09:35,380
because they are an abstraction they add
1834
01:09:35,380 --> 01:09:38,710
their own good and bad on top of it they
1835
01:09:38,710 --> 01:09:40,870
make things more convenient but at the
1836
01:09:40,870 --> 01:09:42,100
same time they could rob you of some
1837
01:09:42,100 --> 01:09:44,290
performance for example caching can
1838
01:09:44,290 --> 01:09:47,050
happen when it's undesirable now this
1839
01:09:47,050 --> 01:09:47,950
sounds strange
1840
01:09:47,950 --> 01:09:50,950
why would caching be undesirable because
1841
01:09:50,950 --> 01:09:52,779
and I'm not going to go into details but
1842
01:09:52,779 --> 01:09:54,820
I might later in the class sometimes
1843
01:09:54,820 --> 01:09:56,830
it's faster to write to some of the
1844
01:09:56,830 --> 01:10:00,070
modern devices than to cache we have
1845
01:10:00,070 --> 01:10:01,660
especially some of the SSD drives that
1846
01:10:01,660 --> 01:10:06,490
can be extremely fast right so it's
1847
01:10:06,490 --> 01:10:08,920
strange but it might actually be faster
1848
01:10:08,920 --> 01:10:11,650
to ask the raw device to write a piece
1849
01:10:11,650 --> 01:10:13,330
of the memory than to have the processor
1850
01:10:13,330 --> 01:10:14,980
copy that thing from one part of the
1851
01:10:14,980 --> 01:10:16,630
memory to another part of the memory and
1852
01:10:16,630 --> 01:10:18,610
I've seen it happen ok and in those
1853
01:10:18,610 --> 01:10:19,810
circumstances you definitely want to
1854
01:10:19,810 --> 01:10:21,910
access the raw device so knowing that
1855
01:10:21,910 --> 01:10:23,710
you have this hierarchy of the files and
1856
01:10:23,710 --> 01:10:25,420
you have this idea of a virtual file
1857
01:10:25,420 --> 01:10:26,110
system that
1858
01:10:26,110 --> 01:10:28,090
simulates the abstraction at multiple
1859
01:10:28,090 --> 01:10:29,860
levels of a file right it becomes
1860
01:10:29,860 --> 01:10:33,340
important if you want to obtain such
1861
01:10:33,340 --> 01:10:36,070
performance it also provides this ultimate
1862
01:10:36,070 --> 01:10:38,170
convenience all you have to worry about
1863
01:10:38,170 --> 01:10:40,420
is the file-like interface which is what
1864
01:10:40,420 --> 01:10:43,420
open file read and write parts of
1865
01:10:43,420 --> 01:10:44,620
the file if you have random access to it
1866
01:10:44,620 --> 01:10:47,110
pipes don't right close the file maybe
1867
01:10:47,110 --> 01:10:49,830
create file but it's a very simple
1868
01:10:49,830 --> 01:10:52,900
abstraction right very easy to write
1869
01:10:52,900 --> 01:10:54,970
applications you just open files close
1870
01:10:54,970 --> 01:10:55,330
files
1871
01:10:55,330 --> 01:10:59,010
right distributed file systems allow
1872
01:10:59,010 --> 01:11:02,200
simple application writing and to
1873
01:11:02,200 --> 01:11:03,550
have a distributed system like things
1874
01:11:03,550 --> 01:11:05,650
now they might not be everything
1875
01:11:05,650 --> 01:11:07,840
and you see more and more systems now
1876
01:11:07,840 --> 01:11:09,520
that use other paradigms for example
1877
01:11:09,520 --> 01:11:10,990
like remote procedure calls or some
1878
01:11:10,990 --> 01:11:13,540
sort of message exchange of some
1879
01:11:13,540 --> 01:11:17,230
form oh by the way how could you
1880
01:11:17,230 --> 01:11:19,120
implement message queues using these
1881
01:11:19,120 --> 01:11:21,370
distributed file systems a very important
1882
01:11:21,370 --> 01:11:23,470
question is how one abstraction can
1883
01:11:23,470 --> 01:11:25,510
emulate other abstractions could you do
1884
01:11:25,510 --> 01:11:30,520
this well it's very simple you just do
1885
01:11:30,520 --> 01:11:31,870
file manipulation right if I want to
1886
01:11:31,870 --> 01:11:33,640
send you a message we designate a special
1887
01:11:33,640 --> 01:11:35,350
file that's a distributed file so if I
1888
01:11:35,350 --> 01:11:36,730
write in it the information is
1889
01:11:36,730 --> 01:11:38,320
propagated and I simply write to the
1890
01:11:38,320 --> 01:11:40,690
file I'm simply gonna put the message at
1891
01:11:40,690 --> 01:11:41,980
the end of the file and you delete
1892
01:11:41,980 --> 01:11:43,270
messages from the beginning of the file
1893
01:11:43,270 --> 01:11:44,590
now it's not particularly efficient
1894
01:11:44,590 --> 01:11:47,320
maybe but it can easily be done with the
1895
01:11:47,320 --> 01:11:50,130
distributed file system abstraction
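The message-queue-over-a-file idea can be sketched roughly as follows: the sender appends each message at the end of a designated file, and the receiver removes messages from the beginning. This is a minimal single-process sketch under loud assumptions: a local file stands in for the distributed file, there is no locking between concurrent senders and receivers, each message is one line of text, and the function names send and receive are hypothetical.

```python
import os
import tempfile
from typing import Optional

def send(queue_path: str, message: str) -> None:
    """Producer: append the message as one line at the end of the file."""
    if "\n" in message:
        raise ValueError("messages must not contain newlines")
    with open(queue_path, "a", encoding="utf-8") as f:
        f.write(message + "\n")

def receive(queue_path: str) -> Optional[str]:
    """Consumer: take the first line and delete it from the front of the file."""
    if not os.path.exists(queue_path):
        return None
    with open(queue_path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    if not lines:
        return None
    # Rewrite the file without the consumed message: simple, not efficient.
    with open(queue_path, "w", encoding="utf-8") as f:
        f.writelines(lines[1:])
    return lines[0].rstrip("\n")

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "queue.txt")
    send(path, "first")
    send(path, "second")
    print(receive(path), receive(path), receive(path))
```

Rewriting the whole file on every receive mirrors the "not particularly efficient" remark; a real design would more likely keep a read offset instead of physically deleting lines.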
1896
01:11:50,130 --> 01:11:53,470
okay now there are many distributed
1897
01:11:53,470 --> 01:11:55,690
systems some of them very popular some
1898
01:11:55,690 --> 01:11:58,510
of them esoteric right almost all of
1899
01:11:58,510 --> 01:12:00,640
them try to have a certain bent a
1900
01:12:00,640 --> 01:12:03,010
certain different approach and cater to
1901
01:12:03,010 --> 01:12:05,620
a different segment okay so in this
1902
01:12:05,620 --> 01:12:07,630
class I'm not really gonna talk about
1903
01:12:07,630 --> 01:12:10,210
that this is better than that one I'm mostly
1904
01:12:10,210 --> 01:12:12,550
going to look at details of particular
1905
01:12:12,550 --> 01:12:14,800
systems what is it that they wanted
1906
01:12:14,800 --> 01:12:17,410
to cater to which specific
1907
01:12:17,410 --> 01:12:20,050
scenario what are some of the
1908
01:12:20,050 --> 01:12:21,910
interesting features of that particular
1909
01:12:21,910 --> 01:12:24,040
system how does the technology work in
1910
01:12:24,040 --> 01:12:26,050
the hope that when you're faced with a
1911
01:12:26,050 --> 01:12:29,680
particular problem to be solved right
1912
01:12:29,680 --> 01:12:32,320
you can find your way to sort out
1913
01:12:32,320 --> 01:12:34,660
through what kind of things you could
1914
01:12:34,660 --> 01:12:37,690
try to do to craft your own solution
1915
01:12:37,690 --> 01:12:39,219
now
1916
01:12:39,219 --> 01:12:41,239
something that the distributed systems
1917
01:12:41,239 --> 01:12:43,639
community looked at for a very long time
1918
01:12:43,639 --> 01:12:46,789
and I would say this is a core human
1919
01:12:46,789 --> 01:12:49,249
obsession to find the ultimate solution
1920
01:12:49,249 --> 01:12:52,460
right so an interesting question to ask
1921
01:12:52,460 --> 01:12:57,079
in any endeavor you want is
1922
01:12:57,079 --> 01:12:59,480
there such a thing as the best solution
1923
01:12:59,480 --> 01:13:00,260
for this problem
1924
01:13:00,260 --> 01:13:03,219
end of story no questions asked right
1925
01:13:03,219 --> 01:13:05,150
from a practical point of view that
1926
01:13:05,150 --> 01:13:06,829
would be fantastic because if you find
1927
01:13:06,829 --> 01:13:09,349
the best solution you're done you know
1928
01:13:09,349 --> 01:13:10,940
that's the best solution we all use the
1929
01:13:10,940 --> 01:13:12,829
best solution let's move on and do
1930
01:13:12,829 --> 01:13:18,260
something else right it's really
1931
01:13:18,260 --> 01:13:19,699
equivalent in theoretical computer
1932
01:13:19,699 --> 01:13:21,260
science to finding those matching lower
1933
01:13:21,260 --> 01:13:23,019
bounds you found them you're done
1934
01:13:23,019 --> 01:13:25,099
at least from a big-O notation point of
1935
01:13:25,099 --> 01:13:25,670
view okay
1936
01:13:25,670 --> 01:13:27,769
well it turns out that in distributed systems
1937
01:13:27,769 --> 01:13:33,050
there is definitely no best solution it's
1938
01:13:33,050 --> 01:13:35,239
all compromises and that's true in
1939
01:13:35,239 --> 01:13:37,969
almost everything right a particular
1940
01:13:37,969 --> 01:13:40,219
compromise might lead to a different kind
1941
01:13:40,219 --> 01:13:42,199
of solution all right
1942
01:13:42,199 --> 01:13:43,610
a particular solution might be better
1943
01:13:43,610 --> 01:13:46,360
under certain circumstances for example
1944
01:13:46,360 --> 01:13:49,039
you might not need any fault tolerance
1945
01:13:49,039 --> 01:13:50,480
whatsoever if your system is reliable
1946
01:13:50,480 --> 01:13:51,739
and I mentioned the fact that some
1947
01:13:51,739 --> 01:13:53,420
systems are just extremely reliable
1948
01:13:53,420 --> 01:13:56,659
right local networks are extremely
1949
01:13:56,659 --> 01:13:58,099
reliable you don't need to assume that
1950
01:13:58,099 --> 01:13:59,570
you're gonna have losses on the network
1951
01:13:59,570 --> 01:14:02,329
you simply don't right for example if
1952
01:14:02,329 --> 01:14:03,650
you don't get a reply within 10
1953
01:14:03,650 --> 01:14:04,969
milliseconds you know that the other guy
1954
01:14:04,969 --> 01:14:08,659
is in big trouble in a
1955
01:14:08,659 --> 01:14:10,670
local network and those assumptions can
1956
01:14:10,670 --> 01:14:12,289
actually be used to great effect to have
1957
01:14:12,289 --> 01:14:16,010
much faster systems right if you have to
1958
01:14:16,010 --> 01:14:18,829
send a message across the globe right to
1959
01:14:18,829 --> 01:14:20,749
some other server you're virtually
1960
01:14:20,749 --> 01:14:22,219
guaranteed that something will happen
1961
01:14:22,219 --> 01:14:24,019
and most of it is masked by TCP/IP this
1962
01:14:24,019 --> 01:14:27,110
is why you don't notice it right how many of
1963
01:14:27,110 --> 01:14:30,440
you have done this you try to go to a
1964
01:14:30,440 --> 01:14:31,969
website and it doesn't work you press
1965
01:14:31,969 --> 01:14:36,590
refresh and it instantly goes right one
1966
01:14:36,590 --> 01:14:38,179
second I mean it was the same Internet
1967
01:14:38,179 --> 01:14:40,760
right you can't assume that the internet
1968
01:14:40,760 --> 01:14:43,039
was very bad and then half a second
1969
01:14:43,039 --> 01:14:45,139
later when I press refresh it's very good so
1970
01:14:45,139 --> 01:14:46,249
what happened there well it's a very
1971
01:14:46,249 --> 01:14:48,860
complex system nobody quite knows right
1972
01:14:48,860 --> 01:14:51,170
so sometimes and this is an interesting
1973
01:14:51,170 --> 01:14:52,300
approach we're going to see this
1974
01:14:52,300 --> 01:14:54,520
right sometimes doing retries on the
1975
01:14:54,520 --> 01:14:56,020
same activity might actually
1976
01:14:56,020 --> 01:14:57,460
significantly improve your chance to get
1977
01:14:57,460 --> 01:14:59,500
the thing done faster but then you have
1978
01:14:59,500 --> 01:15:02,080
to somehow deal with the fact that you
1979
01:15:02,080 --> 01:15:04,120
already made another request now that's
1980
01:15:04,120 --> 01:15:05,740
not a problem if you ask for a refresh
1981
01:15:05,740 --> 01:15:08,620
on a page except if you try to do a
1982
01:15:08,620 --> 01:15:10,150
financial transaction where they actually
1983
01:15:10,150 --> 01:15:12,370
put a nice banner there please do not
1984
01:15:12,370 --> 01:15:17,860
press the button twice right until they
1985
01:15:17,860 --> 01:15:19,150
figure out you can do that in JavaScript
1986
01:15:19,150 --> 01:15:21,430
so once the button is pressed you simply
1987
01:15:21,430 --> 01:15:22,750
change the state in JavaScript and you
1988
01:15:22,750 --> 01:15:27,000
don't let people press it again except
1989
01:15:27,000 --> 01:15:29,830
that sometimes the first click did not
1990
01:15:29,830 --> 01:15:32,740
go through right three days later people
1991
01:15:32,740 --> 01:15:34,300
call the customer service and say I
1992
01:15:34,300 --> 01:15:35,890
clicked you told me not to click again
1993
01:15:35,890 --> 01:15:39,340
now what let me look right now that's
1994
01:15:39,340 --> 01:15:41,140
not good fault tolerance in the system
1995
01:15:41,140 --> 01:15:45,640
right so now that sounds funny but it's
1996
01:15:45,640 --> 01:15:48,700
in fact one of the core issues how can I
1997
01:15:48,700 --> 01:15:51,850
know when you can reissue the same
1998
01:15:51,850 --> 01:15:53,440
same command again and that potentially
1999
01:15:53,440 --> 01:15:55,990
might make things go much faster right
2000
01:15:55,990 --> 01:15:58,720
and when you cannot by the way that
2001
01:15:58,720 --> 01:16:01,090
thing has a name
2002
01:16:01,090 --> 01:16:02,920
activities that can be done
2003
01:16:02,920 --> 01:16:04,540
multiple times with no harm done
2004
01:16:04,540 --> 01:16:07,330
are called idempotent okay for
2005
01:16:07,330 --> 01:16:09,100
example give me this page it doesn't
2006
01:16:09,100 --> 01:16:11,680
matter if I ask 10 times I just ignore
2007
01:16:11,680 --> 01:16:12,850
the other ones and I display the last
2008
01:16:12,850 --> 01:16:14,620
one when you're talking about financial
2009
01:16:14,620 --> 01:16:15,850
transactions they are almost never
2010
01:16:15,850 --> 01:16:19,120
idempotent right now maybe if you
2011
01:16:19,120 --> 01:16:21,730
asked for what's the balance of my
2012
01:16:21,730 --> 01:16:23,710
account that's fine but if you say
2013
01:16:23,710 --> 01:16:25,270
transfer 100 thousand dollars then it's
2014
01:16:25,270 --> 01:16:26,530
definitely not fine if it happens
2015
01:16:26,530 --> 01:16:30,460
multiple times right except when you can
2016
01:16:30,460 --> 01:16:32,440
cheat when could you cheat is exactly
2017
01:16:32,440 --> 01:16:35,620
the Amazon approach if you say look if
2018
01:16:35,620 --> 01:16:37,780
you allow me every now and then to do a
2019
01:16:37,780 --> 01:16:39,730
transaction but revert it within 24
2020
01:16:39,730 --> 01:16:43,300
hours then I'll go ahead and do it and if
2021
01:16:43,300 --> 01:16:44,080
you have enough money in the bank
2022
01:16:44,080 --> 01:16:45,550
account I'll do two transfers over a
2023
01:16:45,550 --> 01:16:46,900
hundred thousand overnight figure out
2024
01:16:46,900 --> 01:16:48,700
that one is wrong and put the money back
2025
01:16:48,700 --> 01:16:50,470
are you okay with that yes good let's do
2026
01:16:50,470 --> 01:16:53,650
that for most people that's not okay
2027
01:16:53,650 --> 01:16:57,550
right occasionally your bank tells you
2028
01:16:57,550 --> 01:17:00,640
that something happened to
2029
01:17:00,640 --> 01:17:02,080
your bank account and it's negative a
2030
01:17:02,080 --> 01:17:04,660
hundred thousand people have experienced
2031
01:17:04,660 --> 01:17:05,290
some of that stuff
2032
01:17:05,290 --> 01:17:08,530
which essentially means what that
2033
01:17:08,530 --> 01:17:09,880
everybody's cheating one way or another
2034
01:17:09,880 --> 01:17:11,410
as long as you can resolve these
2035
01:17:11,410 --> 01:17:12,640
conflicts as long as you can
2036
01:17:12,640 --> 01:17:15,640
re-establish this kind of a consistent
2037
01:17:15,640 --> 01:17:17,410
state you might be fine
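Going back to idempotency and the "please do not press the button twice" problem, here is a toy sketch of the difference. The Bank class and its methods are hypothetical. A balance query is idempotent, a plain transfer is not, and attaching a client-chosen request ID (often called an idempotency key; this is a common technique, not something described in the lecture) lets the server absorb a retried transfer instead of applying it twice.

```python
import uuid
from typing import Dict, Set

class Bank:
    """Toy in-memory bank (hypothetical) contrasting idempotent and non-idempotent requests."""

    def __init__(self, balances: Dict[str, int]) -> None:
        self.balances = dict(balances)
        self.applied: Set[str] = set()  # request IDs already executed

    def balance(self, account: str) -> int:
        # Idempotent: asking 10 times changes nothing.
        return self.balances[account]

    def transfer(self, src: str, dst: str, amount: int) -> None:
        # Not idempotent: every retry moves the money again.
        self.balances[src] -= amount
        self.balances[dst] += amount

    def transfer_once(self, request_id: str, src: str, dst: str, amount: int) -> None:
        # Made safe to retry: a request ID that was already applied is ignored.
        if request_id in self.applied:
            return
        self.transfer(src, dst, amount)
        self.applied.add(request_id)

if __name__ == "__main__":
    bank = Bank({"alice": 300_000, "bob": 0})
    for _ in range(2):  # the user presses the button twice
        bank.transfer("alice", "bob", 100_000)
    print(bank.balance("alice"))  # 100000: the transfer happened twice

    bank = Bank({"alice": 300_000, "bob": 0})
    rid = str(uuid.uuid4())
    for _ in range(2):  # same double click, same request ID
        bank.transfer_once(rid, "alice", "bob", 100_000)
    print(bank.balance("alice"))  # 200000: the retry was absorbed
```

The "cheat" described in the lecture is the opposite trade-off: let the duplicate go through and reconcile it later, rather than detecting it up front.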
2038
01:17:17,410 --> 01:17:20,110
so some ideas pop up in here
2039
01:17:20,110 --> 01:17:23,440
this consistency thing what's a consistent
2040
01:17:23,440 --> 01:17:26,460
state what does it mean to be consistent
2041
01:17:26,460 --> 01:17:28,510
we are going to bump into this later
2042
01:17:28,510 --> 01:17:30,970
right you can say for example one
2043
01:17:30,970 --> 01:17:32,440
way to define transactions we mentioned
2044
01:17:32,440 --> 01:17:33,790
transactions last time right begin
2045
01:17:33,790 --> 01:17:35,740
transaction do something either abort or
2046
01:17:35,740 --> 01:17:38,110
commit and the requirement for a
2047
01:17:38,110 --> 01:17:40,390
transaction to be a transaction is to
2048
01:17:40,390 --> 01:17:41,890
leave the system in a consistent state
2049
01:17:41,890 --> 01:17:43,960
but you have a problem there what's a
2050
01:17:43,960 --> 01:17:47,230
consistent state all right what could be
2051
01:17:47,230 --> 01:17:49,780
a consistent state well why is this
2052
01:17:49,780 --> 01:17:51,940
problematic because so how many of you
2053
01:17:51,940 --> 01:17:53,380
know about hardware
2054
01:17:53,380 --> 01:17:55,480
verification or hardware in general it
2055
01:17:55,480 --> 01:17:58,030
is very hard to say if some device
2056
01:17:58,030 --> 01:18:00,010
really does what it's supposed to do
2057
01:18:00,010 --> 01:18:02,490
right
2058
01:18:02,920 --> 01:18:04,930
humans are not very good at designing
2059
01:18:04,930 --> 01:18:08,470
things that really always work there's
2060
01:18:08,470 --> 01:18:11,020
always that uncertainty maybe sometimes
2061
01:18:11,020 --> 01:18:14,260
it doesn't quite work right so when it
2062
01:18:14,260 --> 01:18:15,940
works we can call that some sort of a
2063
01:18:15,940 --> 01:18:18,280
consistent state it's something that we
2064
01:18:18,280 --> 01:18:20,440
expected we predicted and anything
2065
01:18:20,440 --> 01:18:22,180
that's out of what we predict you can
2066
01:18:22,180 --> 01:18:23,860
call it an inconsistent state but we have a
2067
01:18:23,860 --> 01:18:26,010
lot of problems even writing any single
2068
01:18:26,010 --> 01:18:28,900
little program that for sure is
2069
01:18:28,900 --> 01:18:30,730
consistent so then how can you talk
2070
01:18:30,730 --> 01:18:31,930
about a distributed system that has
2071
01:18:31,930 --> 01:18:33,640
so many other things going on to be
2072
01:18:33,640 --> 01:18:35,710
actually consistent well you can always
2073
01:18:35,710 --> 01:18:37,720
try to give a definition with respect to
2074
01:18:37,720 --> 01:18:40,090
something that has less of a freedom
2075
01:18:40,090 --> 01:18:43,650
right you say your program is consistent
2076
01:18:43,650 --> 01:18:46,300
right if it does exactly what the
2077
01:18:46,300 --> 01:18:47,590
program running on one machine
2078
01:18:47,590 --> 01:18:49,930
would do now we don't know if the one
2079
01:18:49,930 --> 01:18:51,550
machine program is consistent or not but
2080
01:18:51,550 --> 01:18:52,930
if I do what the one machine does at
2081
01:18:52,930 --> 01:18:56,590
least I didn't make it less consistent
2082
01:18:56,590 --> 01:18:58,210
by doing it in a distributed fashion
2083
01:18:58,210 --> 01:19:02,110
right so that's one specific trickery
2084
01:19:02,110 --> 01:19:03,730
even when you provide the right
2085
01:19:03,730 --> 01:19:05,560
definition of what you're trying to
2086
01:19:05,560 --> 01:19:07,600
achieve okay not to mention that you can
2087
01:19:07,600 --> 01:19:09,100
actually prove negative results let's
2088
01:19:09,100 --> 01:19:10,590
say you can never really achieve
2089
01:19:10,590 --> 01:19:14,500
consistency in certain ways and so on so
2090
01:19:14,500 --> 01:19:16,450
Globule is one such distributed system
2091
01:19:16,450 --> 01:19:18,900
right
2092
01:19:19,819 --> 01:19:21,679
one of the interesting characteristics
2093
01:19:21,679 --> 01:19:23,869
in Globule is that it can do redirects
2094
01:19:23,869 --> 01:19:26,299
so why would you do redirects well an
2095
01:19:26,299 --> 01:19:28,039
interesting idea is and this is actually
2096
01:19:28,039 --> 01:19:29,479
happening with the ad servers you think
2097
01:19:29,479 --> 01:19:31,339
you're accessing some server the
2098
01:19:31,339 --> 01:19:32,719
main CNN server but you are actually
2099
01:19:32,719 --> 01:19:34,339
accessing a server that's
2100
01:19:34,339 --> 01:19:37,309
placed at your ISP right so these
2101
01:19:37,309 --> 01:19:41,449
redirects are crucial for dealing with
2102
01:19:41,449 --> 01:19:44,029
large loads for example when you're
2103
01:19:44,029 --> 01:19:46,099
really going let's say to eBay right
2104
01:19:46,099 --> 01:19:47,659
you go to eBay and try to do some kind
2105
01:19:47,659 --> 01:19:50,899
of a transaction right if you don't use
2106
01:19:50,899 --> 01:19:53,359
trickery literally a single server would
2107
01:19:53,359 --> 01:19:56,299
have to serve all the requests and
2108
01:19:56,299 --> 01:19:57,609
that's problematic at so many levels
2109
01:19:57,609 --> 01:19:59,509
okay especially when you have millions
2110
01:19:59,509 --> 01:20:01,699
of people pounding on it now what's the
2111
01:20:01,699 --> 01:20:04,489
trick that large companies use by
2112
01:20:04,489 --> 01:20:06,709
the way they might use these tricks even
2113
01:20:06,709 --> 01:20:08,689
in a geographically distributed way in
2114
01:20:08,689 --> 01:20:10,099
which they have multiple data centers in
2115
01:20:10,099 --> 01:20:11,659
multiple places they use multiple
2116
01:20:11,659 --> 01:20:13,489
servers to serve the requests I mean
2117
01:20:13,489 --> 01:20:16,219
when it comes to the HTML requests for
2118
01:20:16,219 --> 01:20:18,739
example for the web browser right most
2119
01:20:18,739 --> 01:20:19,969
of the difficulty is putting together
2120
01:20:19,969 --> 01:20:22,009
that HTML page that you get that
2121
01:20:22,009 --> 01:20:23,479
contains some information from the
2122
01:20:23,479 --> 01:20:24,919
backend database but a lot of other
2123
01:20:24,919 --> 01:20:27,199
fuzzy stuff in there okay
2124
01:20:27,199 --> 01:20:29,389
some of it is for example pictures and
2125
01:20:29,389 --> 01:20:30,679
other things that you can get in mostly
2126
01:20:30,679 --> 01:20:32,329
read-only way you don't even necessarily
2127
01:20:32,329 --> 01:20:35,059
need to pull them from databases so
2128
01:20:35,059 --> 01:20:36,649
there is a lot of work to be done beyond
2129
01:20:36,649 --> 01:20:38,869
just the actual database work
2130
01:20:38,869 --> 01:20:40,699
and you easily could do it on multiple
2131
01:20:40,699 --> 01:20:42,199
machines but then your problem is the
2132
01:20:42,199 --> 01:20:44,839
following how can you give the
2133
01:20:44,839 --> 01:20:47,209
illusion of a single entry point but to
2134
01:20:47,209 --> 01:20:48,979
have multiple machines that actually
2135
01:20:48,979 --> 01:20:51,079
serve the requests and these redirects
2136
01:20:51,079 --> 01:20:53,509
are crucial right so they're really one
2137
01:20:53,509 --> 01:20:57,649
of the main mechanisms by which you can
2138
01:20:57,649 --> 01:21:00,649
in fact provide a lot of the services we
2139
01:21:00,649 --> 01:21:02,089
want to provide in distributed systems
2140
01:21:02,089 --> 01:21:03,799
some sort of fault tolerance for example
2141
01:21:03,799 --> 01:21:06,589
one of the thousand front end servers
2142
01:21:06,589 --> 01:21:08,179
hiccups so what you just don't give it
2143
01:21:08,179 --> 01:21:10,459
requests by the way when you're doing
2144
01:21:10,459 --> 01:21:12,229
that it doesn't load and then you press
2145
01:21:12,229 --> 01:21:13,789
reload then it goes through it could be
2146
01:21:13,789 --> 01:21:15,649
that you got sent to one of these
2147
01:21:15,649 --> 01:21:17,839
servers that for some reason was stuck
2148
01:21:17,839 --> 01:21:19,939
and when you do a refresh you're sent to
2149
01:21:19,939 --> 01:21:21,499
another server so there's some sort of
2150
01:21:21,499 --> 01:21:23,029
component that can actually do this
2151
01:21:23,029 --> 01:21:25,279
redirect right now these redirects can
2152
01:21:25,279 --> 01:21:28,399
happen in multiple ways for example with
2153
01:21:28,399 --> 01:21:29,809
the actor model what you could do is
2154
01:21:29,809 --> 01:21:31,670
have a front actor that receives requests
2155
01:21:31,670 --> 01:21:33,110
and doesn't do anything except delegate them
2156
01:21:33,110 --> 01:21:35,030
to another actor now of course the
2157
01:21:35,030 --> 01:21:36,230
question is how would you implement this
2158
01:21:36,230 --> 01:21:38,000
front actor now you could use Scala and
2159
01:21:38,000 --> 01:21:39,350
so on and that will take you to a
2160
01:21:39,350 --> 01:21:41,179
certain level at some point it becomes
2161
01:21:41,179 --> 01:21:45,530
overwhelming and literally the thing to
2162
01:21:45,530 --> 01:21:47,900
do would be to do it at an extremely low
2163
01:21:47,900 --> 01:21:49,280
level and this is where the networking
2164
01:21:49,280 --> 01:21:51,830
people can come in right so a lot of
2165
01:21:51,830 --> 01:21:53,929
these redirects can actually be done
2166
01:21:53,929 --> 01:21:56,300
deep in the networking stack and the
2167
01:21:56,300 --> 01:21:58,670
network protocol by the actual router so
2168
01:21:58,670 --> 01:22:01,280
these routers are extremely fast devices
2169
01:22:01,280 --> 01:22:03,530
that can get hundreds of millions of
2170
01:22:03,530 --> 01:22:06,560
packets in and send hundreds of
2171
01:22:06,560 --> 01:22:08,540
millions out in other directions but
2172
01:22:08,540 --> 01:22:10,190
they can do this magic with redirects
2173
01:22:10,190 --> 01:22:12,920
also how do you do that well you simply
2174
01:22:12,920 --> 01:22:14,600
replace some information in the packet
2175
01:22:14,600 --> 01:22:16,760
with other information so you want it to
2176
01:22:16,760 --> 01:22:19,719
go to the main server that has a certain
2177
01:22:19,719 --> 01:22:21,380
IP address
2178
01:22:21,380 --> 01:22:23,270
well this router can actually change the
2179
01:22:23,270 --> 01:22:24,980
IP address and send it to one of the
2180
01:22:24,980 --> 01:22:26,360
thousand other servers that can actually
2181
01:22:26,360 --> 01:22:27,590
serve it right
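The router trick of rewriting the destination address can be sketched in a few lines. Everything here is an illustrative assumption: the Packet type is hypothetical, the backend list is made up, and choosing a server by hashing the client's source IP and port (so all packets of one connection keep going to the same server) is one common way to do it, not necessarily what any particular router does. Servers that are known to be down are simply skipped, matching the idea that a hiccuping front-end server just stops getting requests.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import List, Set

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def pick_backend(pkt: Packet, backends: List[str], down: Set[str]) -> str:
    """Hash the client's flow so every packet of a connection maps to the same server."""
    alive = [b for b in backends if b not in down]
    if not alive:
        raise RuntimeError("no backend available")
    key = f"{pkt.src_ip}:{pkt.src_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return alive[digest % len(alive)]

def rewrite(pkt: Packet, backends: List[str], down: Set[str]) -> Packet:
    """Replace the public destination address with the chosen backend's address."""
    return replace(pkt, dst_ip=pick_backend(pkt, backends, down))

if __name__ == "__main__":
    servers = [f"10.0.0.{i}" for i in range(1, 5)]
    incoming = Packet("198.51.100.7", 40000, "203.0.113.10", 443)
    print(rewrite(incoming, servers, down=set()))
```

One caveat of plain modulo hashing: when a server is marked down, flows that were mapped to healthy servers can move as well. Production balancers typically keep a connection table or use consistent hashing to avoid that.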
2182
01:22:27,590 --> 01:22:30,290
in fact the TCP/IP protocol allows you
2183
01:22:30,290 --> 01:22:31,130
to do this
2184
01:22:31,130 --> 01:22:33,410
handover of a connection you want to
2185
01:22:33,410 --> 01:22:34,820
open a connection with this guy but I
2186
01:22:34,820 --> 01:22:36,020
hand you over to another guy and then
2187
01:22:36,020 --> 01:22:37,190
you establish the connection with the
2188
01:22:37,190 --> 01:22:39,530
other guy right if this is not done at
2189
01:22:39,530 --> 01:22:42,290
the very core of the network device you
2190
01:22:42,290 --> 01:22:44,000
have no chance to do large-scale load
2191
01:22:44,000 --> 01:22:45,980
balancing so that's an extreme version
2192
01:22:45,980 --> 01:22:48,949
of this redirect but this redirection
2193
01:22:48,949 --> 01:22:50,929
happens throughout the stack at a
2194
01:22:50,929 --> 01:22:52,219
higher and higher and higher and higher
2195
01:22:52,219 --> 01:22:55,520
level including in JavaScript so for
2196
01:22:55,520 --> 01:22:57,140
example when you install say a Linux
2197
01:22:57,140 --> 01:22:58,699
distribution right you have to pick a
2198
01:22:58,699 --> 01:23:02,540
mirror an automatic mirror selection
2199
01:23:02,540 --> 01:23:04,780
is in fact a form of redirect right
2200
01:23:04,780 --> 01:23:08,090
because you just I mean literally this
2201
01:23:08,090 --> 01:23:09,920
you can do you go to the main
2202
01:23:09,920 --> 01:23:13,429
website and simply in JavaScript you
2203
01:23:13,429 --> 01:23:15,980
send a JSON object in which you say here
2204
01:23:15,980 --> 01:23:18,500
are 20 mirrors and then JavaScript flips
2205
01:23:18,500 --> 01:23:21,010
a coin and says I go to this mirror
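The client-side mirror redirect described here happens in JavaScript in the browser; the sketch below shows the same logic in Python to keep a single language for the examples. The JSON shape with a "mirrors" key and the example.org URLs are placeholders, not any real project's mirror list, and choose_mirror is a hypothetical name.

```python
import json
import random
from typing import List, Optional

# Hypothetical response from the project's main site: a JSON list of mirrors.
MIRROR_LIST_JSON = json.dumps({
    "mirrors": [
        "https://mirror-a.example.org/linux/",
        "https://mirror-b.example.org/linux/",
        "https://mirror-c.example.org/linux/",
    ]
})

def choose_mirror(payload: str, rng: Optional[random.Random] = None) -> str:
    """Client-side redirect: parse the mirror list and pick one at random."""
    mirrors: List[str] = json.loads(payload)["mirrors"]
    if not mirrors:
        raise ValueError("empty mirror list")
    # Flip the coin: a uniform random choice spreads load across mirrors.
    return (rng or random).choice(mirrors)

if __name__ == "__main__":
    print(choose_mirror(MIRROR_LIST_JSON))
```

The main site only serves a tiny JSON document; the heavy download traffic then goes straight to whichever mirror the client picked, which is why this works for projects that cannot afford load-balancing routers.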
2206
01:23:21,010 --> 01:23:23,989
that's a redirect happening extremely
2207
01:23:23,989 --> 01:23:26,360
high in the client as opposed to
2208
01:23:26,360 --> 01:23:28,370
happening deep in the network stack but
2209
01:23:28,370 --> 01:23:29,739
it's essentially the same mechanism
2210
01:23:29,739 --> 01:23:32,780
right it's very good for open source
2211
01:23:32,780 --> 01:23:35,120
projects that cannot buy those really
2212
01:23:35,120 --> 01:23:36,710
tough routers that can do this kind of
2213
01:23:36,710 --> 01:23:38,989
magic they are very expensive by the way
2214
01:23:38,989 --> 01:23:40,550
they can run into millions of
2215
01:23:40,550 --> 01:23:43,340
dollars okay I mean they're really
2216
01:23:43,340 --> 01:23:44,510
really tough
2217
01:23:44,510 --> 01:23:47,000
routers by the way this actually begs
2218
01:23:47,000 --> 01:23:48,199
the following kind of interesting
2219
01:23:48,199 --> 01:23:52,250
question if you want to let's say attack
2220
01:23:52,250 --> 01:23:56,179
one of the main websites what would you
2221
01:23:56,179 --> 01:23:58,489
attack so it's hard to overwhelm a
2222
01:23:58,489 --> 01:24:03,860
thousand front end servers all of them
2223
01:24:03,860 --> 01:24:06,019
can easily deal with ten thousand
2224
01:24:06,019 --> 01:24:08,170
simultaneous connections no problem
2225
01:24:08,170 --> 01:24:11,539
right so you're talking about 10 million
2226
01:24:11,539 --> 01:24:13,190
simultaneous connections I mean it's
2227
01:24:13,190 --> 01:24:15,500
really hard to keep them all busy
2228
01:24:15,500 --> 01:24:17,119
you have to get all the computers on the
2229
01:24:17,119 --> 01:24:18,829
planet to really pound on them to
2230
01:24:18,829 --> 01:24:21,800
really make them sweat or you could just
2231
01:24:21,800 --> 01:24:23,719
take down the front router
2232
01:24:23,719 --> 01:24:25,730
this one this is what happened in 2001
2233
01:24:25,730 --> 01:24:27,050
they figured out how to attack the front
2234
01:24:27,050 --> 01:24:29,780
router the one that was doing the load
2235
01:24:29,780 --> 01:24:31,219
balancing the tricks with the rewriting
2236
01:24:31,219 --> 01:24:33,739
the hijacking right if you take the main
2237
01:24:33,739 --> 01:24:36,260
guy down the whole thing is down it
2238
01:24:36,260 --> 01:24:37,699
doesn't matter that the backend servers
2239
01:24:37,699 --> 01:24:39,949
are up it's the same issue we discussed
2240
01:24:39,949 --> 01:24:42,829
with Napster you took the main
2241
01:24:42,829 --> 01:24:44,269
servers down you took the whole thing
2242
01:24:44,269 --> 01:24:45,920
down even though the information was
2243
01:24:45,920 --> 01:24:47,179
there there was no way to get to it
2244
01:24:47,179 --> 01:24:51,019
right so in fact that main router at the
2245
01:24:51,019 --> 01:24:53,030
entrance point is a centralized
2246
01:24:53,030 --> 01:24:54,559
solution this is the kind of thing that
2247
01:24:54,559 --> 01:24:56,539
distributed systems don't like distributed
2248
01:24:56,539 --> 01:24:58,250
systems people don't like right so
2249
01:24:58,250 --> 01:24:59,659
interesting questions are could you not
2250
01:24:59,659 --> 01:25:01,369
have that router there and still do well
2251
01:25:01,369 --> 01:25:03,260
but still do such a good
2252
01:25:03,260 --> 01:25:05,150
job I mean maybe you can have a spare
2253
01:25:05,150 --> 01:25:08,210
and flip over to it tricky stuff by the way
2254
01:25:08,210 --> 01:25:09,590
these are important issues when it comes
2255
01:25:09,590 --> 01:25:11,420
to databases as well you have databases
2256
01:25:11,420 --> 01:25:14,210
that now essentially cater to
2257
01:25:14,210 --> 01:25:16,460
many many many clients you still have to
2258
01:25:16,460 --> 01:25:17,630
go and do sort of some kind of
2259
01:25:17,630 --> 01:25:20,570
transactions in the database right what
2260
01:25:20,570 --> 01:25:22,130
happens if the main database goes down
2261
01:25:22,130 --> 01:25:26,059
for example if that happens for Visa
2262
01:25:26,059 --> 01:25:28,099
it's a tragedy I mean people can't buy
2263
01:25:28,099 --> 01:25:31,670
their snacks right it's cash only for
2264
01:25:31,670 --> 01:25:34,000
the entire planet that's not good I mean
2265
01:25:34,000 --> 01:25:36,349
it's possible but this is something this
2266
01:25:36,349 --> 01:25:37,400
would happen right when the whole
2267
01:25:37,400 --> 01:25:40,789
network is down right now if Google is
2268
01:25:40,789 --> 01:25:43,159
down everybody gets bored the credit card
2269
01:25:43,159 --> 01:25:44,809
machines being down that's a tragedy
2270
01:25:44,809 --> 01:25:48,289
at least for me I don't carry cash
2271
01:25:48,289 --> 01:25:51,130
with me right
2272
01:25:51,130 --> 01:25:54,159
right and so this is the kind of picture
2273
01:25:54,159 --> 01:25:55,270
you can imagine right you could
2274
01:25:55,270 --> 01:25:56,679
intercept the request you can
2275
01:25:56,679 --> 01:25:58,600
intercept it in the server you can
2276
01:25:58,600 --> 01:25:59,889
intercept at so many different levels
2277
01:25:59,889 --> 01:26:01,929
now why is that it's true because
2278
01:26:01,929 --> 01:26:03,580
everything can be virtualized everybody
2279
01:26:03,580 --> 01:26:06,190
can trick and present a certain view
2280
01:26:06,190 --> 01:26:07,510
but in fact do something very different
2281
01:26:07,510 --> 01:26:11,560
right if we really had every
2282
01:26:11,560 --> 01:26:13,300
application talk directly on the wire
2283
01:26:13,300 --> 01:26:15,370
with the raw device and this was really
2284
01:26:15,370 --> 01:26:17,110
the case about twenty something years
2285
01:26:17,110 --> 01:26:19,060
ago more like 25 with the DOS operating
2286
01:26:19,060 --> 01:26:20,920
system and things like that then there
2287
01:26:20,920 --> 01:26:24,179
is no cheating to be done right these
2288
01:26:24,179 --> 01:26:27,610
abstractions allow cheating which is
2289
01:26:27,610 --> 01:26:31,030
good in this case so a lot of what
2290
01:26:31,030 --> 01:26:34,989
distributed systems have to do right is
2291
01:26:34,989 --> 01:26:38,350
to use such virtualization and to use
2292
01:26:38,350 --> 01:26:40,960
such cheating and this kind of
2293
01:26:40,960 --> 01:26:43,389
rewrite to mask certain kinds of
2294
01:26:43,389 --> 01:26:45,340
failures right the server is down well
2295
01:26:45,340 --> 01:26:46,600
I'll send you to another server and
2296
01:26:46,600 --> 01:26:48,010
you don't even know which server you
2297
01:26:48,010 --> 01:26:49,210
wanted to talk to in the first place
2298
01:26:49,210 --> 01:26:50,760
because they all look the same to you
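A small sketch of failure masking at the front end: a round-robin dispatcher that silently skips a dead server, so the client never learns which one it talked to. The `Backend`/`FrontEnd` classes and the `alive` flag are hypothetical stand-ins; a real setup would detect failures via timeouts or health checks, and the lecture's example does this in a router rather than in application code.

```python
from itertools import cycle


class Backend:
    """Hypothetical server; `alive` simulates whether it is reachable."""
    def __init__(self, name: str):
        self.name = name
        self.alive = True

    def handle(self, request: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class FrontEnd:
    """Round-robin dispatcher that hides dead backends from the client."""
    def __init__(self, backends):
        self.backends = list(backends)
        self._ring = cycle(self.backends)

    def dispatch(self, request: str) -> str:
        # Try each backend at most once, starting from the next one in rotation.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            try:
                return backend.handle(request)
            except ConnectionError:
                continue  # mask the failure: move on to the next server
        raise RuntimeError("all backends are down")
```

The single `FrontEnd` object is itself the centralized choke point discussed earlier; if it goes down, the healthy backends are unreachable.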
2299
01:26:50,760 --> 01:26:55,900
anyway so in some services if you
2300
01:26:55,900 --> 01:26:57,850
look carefully at the URL you might even
2301
01:26:57,850 --> 01:27:00,130
see where on which specific server
2302
01:27:00,130 --> 01:27:01,810
something gets stored right you go to
2303
01:27:01,810 --> 01:27:04,449
the main website you go to www.flickr.com
2304
01:27:04,449 --> 01:27:06,699
and then when you look at your pictures
2305
01:27:06,699 --> 01:27:08,139
maybe they are on one particular server
2306
01:27:08,139 --> 01:27:10,480
that has a weird name okay some of them
2307
01:27:10,480 --> 01:27:13,179
completely mask it at the low
2308
01:27:13,179 --> 01:27:16,480
level in the networking right and by the
2309
01:27:16,480 --> 01:27:19,139
way when you access your Gmail account
2310
01:27:19,139 --> 01:27:23,350
the actual emails live on some server
2311
01:27:23,350 --> 01:27:25,300
not all on the same server by the way
2312
01:27:25,300 --> 01:27:26,620
but they live on some server somewhere
2313
01:27:26,620 --> 01:27:28,449
so I mean some machine knows what your
2314
01:27:28,449 --> 01:27:31,330
email content is it's just all the magic
2315
01:27:31,330 --> 01:27:32,650
that happens behind the scenes this is pure
2316
01:27:32,650 --> 01:27:34,330
distributed systems magic distributed
2317
01:27:34,330 --> 01:27:36,489
file systems or some other means will
2318
01:27:36,489 --> 01:27:38,199
find where that file is and bring it
2319
01:27:38,199 --> 01:27:39,699
back to you and you don't know better
2320
01:27:39,699 --> 01:27:41,739
and you definitely don't care which
2321
01:27:41,739 --> 01:27:44,590
of the million Google servers has
2322
01:27:44,590 --> 01:27:47,590
your email as long as more than one has
2323
01:27:47,590 --> 01:27:48,820
your email so it's actually
2324
01:27:48,820 --> 01:27:56,010
fault-tolerant right so
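A toy sketch of the point that it does not matter which server holds your email "as long as more than one has it": each key is placed on two servers chosen by hashing, and a read still succeeds when one of them is down. The `ReplicatedStore` class, the hash-based placement, and the in-memory "servers" are illustrative assumptions, not how Gmail actually stores mail.

```python
import hashlib


class ReplicatedStore:
    """Keeps each key on `replicas` distinct servers chosen by hashing the key."""
    def __init__(self, num_servers: int, replicas: int = 2):
        if not 1 <= replicas <= num_servers:
            raise ValueError("need 1 <= replicas <= num_servers")
        self.servers = [dict() for _ in range(num_servers)]  # simulated machines
        self.up = [True] * num_servers
        self.replicas = replicas

    def _homes(self, key: str) -> list:
        # Deterministic start position from the key's hash, then the next servers.
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(self.servers)
        return [(start + i) % len(self.servers) for i in range(self.replicas)]

    def put(self, key: str, value: str) -> None:
        # This sketch writes to every replica and ignores failures during writes.
        for s in self._homes(key):
            self.servers[s][key] = value

    def get(self, key: str) -> str:
        # Any live replica can answer; the caller never learns which one did.
        for s in self._homes(key):
            if self.up[s] and key in self.servers[s]:
                return self.servers[s][key]
        raise KeyError(key)
```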
2325
01:27:56,130 --> 01:27:58,659
one of the main things you would like to
2326
01:27:58,659 --> 01:28:00,130
do in the context of distributed systems
2327
01:28:00,130 --> 01:28:03,219
is to write this adaptive software
2328
01:28:03,219 --> 01:28:05,730
adaptive is good right because it means
2329
01:28:05,730 --> 01:28:08,469
self-healing self monitoring self
2330
01:28:08,469 --> 01:28:11,650
something right that sounds fantastic
2331
01:28:11,650 --> 01:28:15,690
because if it's self let's say healing
2332
01:28:15,690 --> 01:28:18,130
and there is some damage to the system it fixes
2333
01:28:18,130 --> 01:28:21,040
itself right and then it's as good as
2334
01:28:21,040 --> 01:28:22,570
new and you keep on going and never
2335
01:28:22,570 --> 01:28:24,940
shuts down okay I mentioned the fact
2336
01:28:24,940 --> 01:28:27,969
that Ericsson did precisely this with
2337
01:28:27,969 --> 01:28:31,030
Erlang the first large-scale language to
2338
01:28:31,030 --> 01:28:33,219
use actors and in fact designed extremely
2339
01:28:33,219 --> 01:28:36,909
reliable telco systems I mean one way to
2340
01:28:36,909 --> 01:28:38,560
do it so I mentioned this before one way
2341
01:28:38,560 --> 01:28:40,270
to be extremely reliable is to never make
2342
01:28:40,270 --> 01:28:43,510
mistakes which we have yet to see or to
2343
01:28:43,510 --> 01:28:45,639
repair any mistakes you detect and
2344
01:28:45,639 --> 01:28:48,790
repair mistakes and then right if you
2345
01:28:48,790 --> 01:28:50,889
can find the bad part cut it off and put
2346
01:28:50,889 --> 01:28:55,210
another part in place the user might not
2347
01:28:55,210 --> 01:28:57,489
even notice anything more than a slight
2348
01:28:57,489 --> 01:29:02,050
delay in providing the service okay all
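A minimal sketch of "find the bad part, cut it off, and put another part in place", loosely in the spirit of Erlang/OTP supervisors. The `Worker` and `Supervisor` classes and the restart limit are hypothetical and this is not OTP's API; it only shows a crashed worker being replaced while the service keeps answering later requests.

```python
class Worker:
    """Hypothetical worker that crashes on bad input."""
    def __init__(self, generation: int):
        self.generation = generation

    def process(self, item: int) -> int:
        if item < 0:
            raise ValueError(f"cannot process {item}")
        return item * 2


class Supervisor:
    """Replaces a crashed worker with a fresh one, up to a restart limit."""
    def __init__(self, max_restarts: int = 3):
        self.generation = 0
        self.worker = Worker(self.generation)
        self.restarts = 0
        self.max_restarts = max_restarts

    def _restart(self) -> None:
        self.restarts += 1
        if self.restarts > self.max_restarts:
            # Too many failures: stop healing and escalate instead.
            raise RuntimeError("restart limit exceeded; escalating")
        self.generation += 1
        self.worker = Worker(self.generation)  # the new part, as good as new

    def submit(self, item: int):
        try:
            return self.worker.process(item)
        except Exception:
            self._restart()
            return None  # the failing request is dropped; the service keeps going
```

The restart limit matters: without it, a worker that fails on every request would be restarted forever and hide a real bug.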
2349
01:29:02,050 --> 01:29:04,090
right so some of the approaches and this
2350
01:29:04,090 --> 01:29:07,510
again it's a way to be systematic about
2351
01:29:07,510 --> 01:29:09,310
how you talk about this but I mean these
2352
01:29:09,310 --> 01:29:10,449
are basically just fancy words
2353
01:29:10,449 --> 01:29:12,040
ultimately and you can do very many
2354
01:29:12,040 --> 01:29:14,170
variations on top of this separation
2355
01:29:14,170 --> 01:29:16,420
of concerns computational reflection and
2356
01:29:16,420 --> 01:29:18,100
component based design so the easiest
2357
01:29:18,100 --> 01:29:20,560
one is component based design so why do
2358
01:29:20,560 --> 01:29:24,070
we even do component based design well
2359
01:29:24,070 --> 01:29:27,820
the trouble is we as humans and machines
2360
01:29:27,820 --> 01:29:30,520
are worse we are not particularly good
2361
01:29:30,520 --> 01:29:32,260
at keeping track of complicated things
2362
01:29:32,260 --> 01:29:33,790
unless we break them into simpler things
2363
01:29:33,790 --> 01:29:35,260
this is what engineering is all about
2364
01:29:35,260 --> 01:29:38,020
right get an extremely complicated
2365
01:29:38,020 --> 01:29:40,900
thing broken down into nuts and bolts
2366
01:29:40,900 --> 01:29:42,250
and how you put them together and how
2367
01:29:42,250 --> 01:29:43,810
you keep them together how you build it
2368
01:29:43,810 --> 01:29:45,730
and how you maintain it later right
2369
01:29:45,730 --> 01:29:47,650
that immediately suggests some sort of
2370
01:29:47,650 --> 01:29:50,350
components so if we were required
2371
01:29:50,350 --> 01:29:52,300
all the time to express everything in
2372
01:29:52,300 --> 01:29:53,710
terms of transistors it would be a
2373
01:29:53,710 --> 01:29:55,420
disaster we would all just make those
2374
01:29:55,420 --> 01:29:57,219
little sirens that go with slightly
2375
01:29:57,219 --> 01:30:01,239
different sounds right but we build
2376
01:30:01,239 --> 01:30:02,889
components some of them are a whole
2377
01:30:02,889 --> 01:30:04,350
processor that just does all the magic
2378
01:30:04,350 --> 01:30:05,560
right
2379
01:30:05,560 --> 01:30:07,239
and because of that we can accomplish a
2380
01:30:07,239 --> 01:30:08,289
lot more things now
2381
01:30:08,289 --> 01:30:09,579
component design is to some extent
2382
01:30:09,579 --> 01:30:11,979
wasteful because you're not using the
2383
01:30:11,979 --> 01:30:14,379
hardware maybe at the highest potential
2384
01:30:14,379 --> 01:30:15,699
but at the same time allows you to do
2385
01:30:15,699 --> 01:30:16,780
complicated things that otherwise you
2386
01:30:16,780 --> 01:30:20,199
wouldn't be able to accomplish right for
2387
01:30:20,199 --> 01:30:22,449
example did you ever wonder how Intel is
2388
01:30:22,449 --> 01:30:27,189
putting together those processors so
2389
01:30:27,189 --> 01:30:30,280
first of all Intel has many thousands of
2390
01:30:30,280 --> 01:30:31,959
engineers that design those components
2391
01:30:31,959 --> 01:30:34,539
they have been doing so for many years if you
2392
01:30:34,539 --> 01:30:35,769
think they are redesigning everything
2393
01:30:35,769 --> 01:30:38,499
from scratch every time it's impossible
2394
01:30:38,499 --> 01:30:40,149
the complexity is just off the scale
2395
01:30:40,149 --> 01:30:41,979
plus they buy designs from other
2396
01:30:41,979 --> 01:30:44,649
companies it's like buying libraries
2397
01:30:44,649 --> 01:30:47,109
right little company in Silicon Valley
2398
01:30:47,109 --> 01:30:50,499
designs a very good floating-point unit
2399
01:30:50,499 --> 01:30:54,359
Intel will license it why it's cheaper
2400
01:30:54,359 --> 01:30:57,099
right you cannot really go back to those
2401
01:30:57,099 --> 01:30:59,889
transistors all the time ok so
2402
01:30:59,889 --> 01:31:02,709
even at that level it's component based so
2403
01:31:02,709 --> 01:31:04,449
libraries for example are one way to do
2404
01:31:04,449 --> 01:31:06,489
component based programming you simply
2405
01:31:06,489 --> 01:31:08,429
use a prepackaged
2406
01:31:08,429 --> 01:31:12,159
existing framework to get the job done
2407
01:31:12,159 --> 01:31:14,769
TCP/IP is an extreme example open
2408
01:31:14,769 --> 01:31:16,659
connection send information this is
2409
01:31:16,659 --> 01:31:18,519
extremely complex behavior going on in
2410
01:31:18,519 --> 01:31:20,679
TCP but it's a component it's a block
2411
01:31:20,679 --> 01:31:23,859
just use it right so it's obvious why a
2412
01:31:23,859 --> 01:31:25,179
component based design would actually
2413
01:31:25,179 --> 01:31:26,919
help and if you don't do this you
2414
01:31:26,919 --> 01:31:29,010
don't get anything accomplished ok
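To make "TCP is a component, just use it" concrete, here is a sketch using Python's standard `socket` module over the loopback interface: open a connection, send bytes, read them back. Everything TCP does underneath (handshake, segmentation, acknowledgements, retransmission) stays hidden inside the block. The helper name `tcp_roundtrip` is our own, and the single-threaded flow assumes a small payload that fits in the kernel's socket buffers.

```python
import socket


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send `payload` over a real TCP connection on localhost and return what arrives."""
    # Port 0 asks the OS for any free port; create_server also starts listening.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)        # small payloads only in this sketch
                client.shutdown(socket.SHUT_WR)  # tell the peer no more data is coming
                received = b""
                while chunk := conn.recv(4096):  # recv returns b"" at end of stream
                    received += chunk
    return received


if __name__ == "__main__":
    print(tcp_roundtrip(b"open connection, send information"))
```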
2415
01:31:29,010 --> 01:31:32,919
but the interesting story right the
2416
01:31:32,919 --> 01:31:34,449
importance of abstraction so I found an
2417
01:31:34,449 --> 01:31:35,919
interesting example where somebody got
2418
01:31:35,919 --> 01:31:37,349
away with an extremely low level
2419
01:31:37,349 --> 01:31:40,749
abstraction and I really admire the guy
2420
01:31:40,749 --> 01:31:43,089
right so normally in what language
2421
01:31:43,089 --> 01:31:45,369
are most games written anybody happens
2422
01:31:45,369 --> 01:31:48,819
to know C++ or C I mean it's kind of
2423
01:31:48,819 --> 01:31:51,249
split now okay but it used to be pure C
2424
01:31:51,249 --> 01:31:56,249
but I found out about this game designer
2425
01:31:56,249 --> 01:31:59,349
that was writing games in assembly as
2426
01:31:59,349 --> 01:32:04,839
late as 2005 okay so this is a guy that
2427
01:32:04,839 --> 01:32:07,809
started with assembly on the Z80
2428
01:32:07,809 --> 01:32:09,760
processor and really got into it and
2429
01:32:09,760 --> 01:32:11,619
then just went with assembly so let me
2430
01:32:11,619 --> 01:32:12,909
remember the names of the games he did
2431
01:32:12,909 --> 01:32:15,909
so he did Transport Tycoon and then
2432
01:32:15,909 --> 01:32:19,209
RollerCoaster Tycoon and then
2433
01:32:19,209 --> 01:32:22,059
whatever follow-ups there were so he was
2434
01:32:22,059 --> 01:32:23,829
the only programmer for the game and he
2435
01:32:23,829 --> 01:32:26,439
wrote everything in assembly x86
2436
01:32:26,439 --> 01:32:28,569
assembly even for the later games now
2437
01:32:28,569 --> 01:32:32,499
look some people just have it so they
2438
01:32:32,499 --> 01:32:34,449
can work with crazy abstractions and get
2439
01:32:34,449 --> 01:32:36,489
the work done most people can't so they
2440
01:32:36,489 --> 01:32:38,109
have to use higher-level components to
2441
01:32:38,109 --> 01:32:39,489
get things done in high-level languages
2442
01:32:39,489 --> 01:32:44,799
right so yeah I mean the exceptions
2443
01:32:44,799 --> 01:32:46,389
maybe strengthen the rules in some of
2444
01:32:46,389 --> 01:32:48,429
these this is the best-known example of
2445
01:32:48,429 --> 01:32:51,459
a complex assembly-written
2446
01:32:51,459 --> 01:32:51,939
game
2447
01:32:51,939 --> 01:32:54,069
and trust me two people could not have
2448
01:32:54,069 --> 01:32:56,169
written that game only one mind can
2449
01:32:56,169 --> 01:32:58,479
keep that craziness under control right
2450
01:32:58,479 --> 01:33:01,029
two people can never agree how to
2451
01:33:01,029 --> 01:33:02,489
do things at such a low level
2452
01:33:02,489 --> 01:33:04,929
okay separation of concerns this is
2453
01:33:04,929 --> 01:33:06,999
gonna be happening throughout this class
2454
01:33:06,999 --> 01:33:09,099
how can you deal with issues
2455
01:33:09,099 --> 01:33:11,859
separately right so for example how can
2456
01:33:11,859 --> 01:33:13,209
you separate the basic functionality
2457
01:33:13,209 --> 01:33:14,469
from high-level functionality like
2458
01:33:14,469 --> 01:33:16,499
redundancy fault tolerance and whatnot
2459
01:33:16,499 --> 01:33:19,629
because usually low-level functionality
2460
01:33:19,629 --> 01:33:21,849
is more in the realm of the specifics
2461
01:33:21,849 --> 01:33:23,109
of the application and the high level
2462
01:33:23,109 --> 01:33:25,719
functionality potentially can be bundled
2463
01:33:25,719 --> 01:33:28,209
up in a more uniform way so then you
2464
01:33:28,209 --> 01:33:29,799
might have more universal solutions for
2465
01:33:29,799 --> 01:33:31,389
how to make something fault-tolerant
2466
01:33:31,389 --> 01:33:33,369
and to some extent those high level
2467
01:33:33,369 --> 01:33:36,609
issues are harder to do right now
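One common way to separate the basic functionality from a fault-tolerance concern is a wrapper that adds retries without touching the business logic. The sketch below uses a Python decorator; `with_retries`, `FlakyInventory`, and the choice to treat `ConnectionError` as transient are all illustrative assumptions.

```python
import functools


def with_retries(attempts: int):
    """Fault-tolerance concern, written once and applicable to any function."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError as err:  # treated as transient: try again
                    last_error = err
            raise last_error
        return wrapper
    return decorate


class FlakyInventory:
    """Hypothetical back end that fails its first `failures` calls, then works."""
    def __init__(self, failures: int):
        self.remaining_failures = failures

    def count(self, item: str) -> int:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("transient network error")
        return {"apples": 3}.get(item, 0)


inventory = FlakyInventory(failures=2)


@with_retries(attempts=3)
def apples_in_stock() -> int:
    # Basic functionality only: no retry logic lives here.
    return inventory.count("apples")
```

The same decorator could wrap any other call, which is the sense in which the high-level concern can be bundled up "in a more uniform way".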
2468
01:33:36,609 --> 01:33:38,079
computational reflection this is
2469
01:33:38,079 --> 01:33:40,449
somewhat of a weird one because what
2470
01:33:40,449 --> 01:33:42,309
computational reflection really means is
2471
01:33:42,309 --> 01:33:46,749
the program can ask itself what behavior
2472
01:33:46,749 --> 01:33:48,249
it has I mean why would the program ask
2473
01:33:48,249 --> 01:33:50,229
well because the program was written by
2474
01:33:50,229 --> 01:33:54,069
the programmer and it's usually not the
2475
01:33:54,069 --> 01:33:55,959
same programmer throughout the time and
2476
01:33:55,959 --> 01:33:59,229
one way to diagnose yourself or one way
2477
01:33:59,229 --> 01:34:01,239
to know as a program what's going on is
2478
01:34:01,239 --> 01:34:05,379
to say what am i doing or how do i work
2479
01:34:05,379 --> 01:34:08,199
or what methods do I have so this is
2480
01:34:08,199 --> 01:34:09,129
something that happens in high-level
2481
01:34:09,129 --> 01:34:10,779
languages for example this is a very
2482
01:34:10,779 --> 01:34:15,189
very neat way to program for example
2483
01:34:15,189 --> 01:34:17,289
this is one reason I like JavaScript you
2484
01:34:17,289 --> 01:34:19,149
can get a random library of whoever
2485
01:34:19,149 --> 01:34:22,299
wrote it and in JavaScript say hey what
2486
01:34:22,299 --> 01:34:25,029
methods do I have I don't I don't even
2487
01:34:25,029 --> 01:34:26,619
care how that guy wrote his library I'm
2488
01:34:26,619 --> 01:34:27,909
just gonna look at the methods using
2489
01:34:27,909 --> 01:34:29,559
this reflection what methods does it
2490
01:34:29,559 --> 01:34:30,909
have I can guess what they are especially
2491
01:34:30,909 --> 01:34:33,249
if I do it in a console but that can be
2492
01:34:33,249 --> 01:34:34,089
done completely
2493
01:34:34,089 --> 01:34:35,679
in the language for example in
2494
01:34:35,679 --> 01:34:37,510
JavaScript you can write a for loop to
2495
01:34:37,510 --> 01:34:40,629
go over the contents of an object that
2496
01:34:40,629 --> 01:34:42,010
essentially means that the contents of an
2497
01:34:42,010 --> 01:34:45,459
object are not fixed as is for example
2498
01:34:45,459 --> 01:34:48,069
the case in Java or C++ right oh by the
2499
01:34:48,069 --> 01:34:49,329
way in Scala it's also fixed because
2500
01:34:49,329 --> 01:34:51,039
it's a strongly typed language but this
2501
01:34:51,039 --> 01:34:53,229
reflection can come in handy in a big
2502
01:34:53,229 --> 01:34:56,439
way and especially can be useful if you
2503
01:34:56,439 --> 01:34:58,030
have multiple versions of
2504
01:34:58,030 --> 01:35:00,519
the same program because for example one
2505
01:35:00,519 --> 01:35:02,019
way to make sure that you can work
2506
01:35:02,019 --> 01:35:03,729
correctly is to say do I have this
2507
01:35:03,729 --> 01:35:05,199
functionality if yes I'm gonna do
2508
01:35:05,199 --> 01:35:08,050
something if not say hey this version is
2509
01:35:08,050 --> 01:35:10,079
too old maybe you should upgrade
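A sketch of both uses of reflection just described: asking an object which methods it has, and checking for a capability at run time before using it. The lecture's example language is JavaScript; here Python's built-in `dir`, `getattr`, and `callable` play the same role. The `StorageV1`/`StorageV2` classes and the `put_many` method are hypothetical.

```python
class StorageV1:
    """Hypothetical library class, old version: single-item writes only."""
    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value


class StorageV2(StorageV1):
    """Hypothetical newer version that adds a batch write."""
    def put_many(self, items):
        self.data.update(items)


def public_methods(obj) -> list:
    """Ask the object at run time which public methods it has."""
    return sorted(name for name in dir(obj)
                  if not name.startswith("_") and callable(getattr(obj, name)))


def save_all(store, items: dict) -> str:
    """Use the batch API if this version has it, otherwise degrade gracefully."""
    if callable(getattr(store, "put_many", None)):
        store.put_many(items)
        return "batch"
    for key, value in items.items():
        store.put(key, value)
    return "fallback: this version is too old, consider upgrading"
```

The caller never consults a version number; it asks the object directly, which is what helps when separately released components drift out of sync.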
2510
01:35:10,079 --> 01:35:12,010
especially when you get separate
2511
01:35:12,010 --> 01:35:13,479
components that are not necessarily
2512
01:35:13,479 --> 01:35:15,189
synchronized from a versioning point of
2513
01:35:15,189 --> 01:35:17,709
view right so this computational
2514
01:35:17,709 --> 01:35:20,289
reflection it's a very high-level kind
2515
01:35:20,289 --> 01:35:24,399
of feature that makes design of these
2516
01:35:24,399 --> 01:35:26,559
distributed systems a little bit easier okay
2517
01:35:26,559 --> 01:35:27,939
so it's a kind of nice thing to have that
2518
01:35:27,939 --> 01:35:32,349
you definitely don't have in C++ so how
2519
01:35:32,349 --> 01:35:34,030
would you do this in C++ can you maybe
2520
01:35:34,030 --> 01:35:35,649
this is an interesting assignment write
2521
01:35:35,649 --> 01:35:37,510
a C++ program that enumerates all the
2522
01:35:37,510 --> 01:35:42,699
methods an object has in C++ you
2523
01:35:42,699 --> 01:35:46,539
just know because of the code what
2524
01:35:46,539 --> 01:35:48,339
methods you have but you can't say
2525
01:35:48,339 --> 01:35:51,760
enumerate what methods I have really the
2526
01:35:51,760 --> 01:35:53,379
newer C++ standard has a little bit
2527
01:35:53,379 --> 01:35:55,780
of reflection but not significantly
2528
01:35:55,780 --> 01:35:58,119
right in PHP or JavaScript you can really
2529
01:35:58,119 --> 01:36:03,189
say what methods do I have by the way
2530
01:36:03,189 --> 01:36:06,059
the original object-oriented language
2531
01:36:06,059 --> 01:36:08,229
which was the first object oriented
2532
01:36:08,229 --> 01:36:10,919
language anybody knows
2533
01:36:13,589 --> 01:36:19,539
okay so I hear C++ I hear Fortran well
2534
01:36:19,539 --> 01:36:20,889
it turns out that none of the above yes
2535
01:36:20,889 --> 01:36:25,479
I'm sorry Simula yes that one had some ideas in
2536
01:36:25,479 --> 01:36:26,649
object-oriented but the first pure
2537
01:36:26,649 --> 01:36:29,439
object-oriented I'm sorry oh no Scala
2538
01:36:29,439 --> 01:36:32,379
is so new I mean C++ is older how many
2539
01:36:32,379 --> 01:36:36,819
people heard about Smalltalk Smalltalk
2540
01:36:36,819 --> 01:36:38,169
is the original object-oriented
2541
01:36:38,169 --> 01:36:39,819
programming language and had reflection
2542
01:36:39,819 --> 01:36:42,309
the designers of Smalltalk believed that
2543
01:36:42,309 --> 01:36:43,839
that's a crucial property of any
2544
01:36:43,839 --> 01:36:46,089
object-oriented language and then C++
2545
01:36:46,089 --> 01:36:47,260
came along and destroyed it
2546
01:36:47,260 --> 01:36:51,489
right in Smalltalk an object could say
2547
01:36:51,489 --> 01:36:52,780
what methods do I have
2548
01:36:52,780 --> 01:36:54,970
and then selectively call them or not
2549
01:36:54,970 --> 01:36:56,920
call them Smalltalk was designed to do
2550
01:36:56,920 --> 01:36:59,200
graphical interfaces and it's crucial for
2551
01:36:59,200 --> 01:37:02,650
nice menu-based systems and whatnot to
2552
01:37:02,650 --> 01:37:04,060
have in fact some sort of an
2553
01:37:04,060 --> 01:37:06,790
object-oriented programming interface
2554
01:37:06,790 --> 01:37:08,920
makes it much much much easier all the
2555
01:37:08,920 --> 01:37:10,890
good interfaces like that have
2556
01:37:10,890 --> 01:37:12,460
significant characteristics of
2557
01:37:12,460 --> 01:37:14,650
object-oriented design okay with
2558
01:37:14,650 --> 01:37:17,890
reflection so why did the C++ designers
2559
01:37:17,890 --> 01:37:21,370
knock off reflection oh because it
2560
01:37:21,370 --> 01:37:22,660
costs performance right so forget
2561
01:37:22,660 --> 01:37:23,920
about it it costs performance we are
2562
01:37:23,920 --> 01:37:27,340
knocking it off right so to a large
2563
01:37:27,340 --> 01:37:30,340
extent C++ is not a true object-oriented
2564
01:37:30,340 --> 01:37:34,180
language it's a kind of object-oriented
2565
01:37:34,180 --> 01:37:37,989
organized language but not really an
2566
01:37:37,989 --> 01:37:39,790
object-oriented language not in the
2567
01:37:39,790 --> 01:37:42,190
Smalltalk sense okay that doesn't make
2568
01:37:42,190 --> 01:37:45,220
it bad it essentially says
2569
01:37:45,220 --> 01:37:47,260
there are many flavors of everything and
2570
01:37:47,260 --> 01:37:48,610
you have to be careful how you call
2571
01:37:48,610 --> 01:37:50,140
things and what exactly do they mean
2572
01:37:50,140 --> 01:37:51,070
right
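A sketch of the Smalltalk-style idea above: an object is asked what it can do, and a menu is built and invoked from the answer, so the menu code never hard-codes method names. This is Python rather than Smalltalk, and the `cmd_` naming convention, the `Document` class, and `build_menu` are assumptions made for the illustration.

```python
class Document:
    """Hypothetical object whose user-visible commands are methods named cmd_*."""
    def __init__(self, text: str):
        self.text = text

    def cmd_upper(self):
        self.text = self.text.upper()

    def cmd_reverse(self):
        self.text = self.text[::-1]

    def helper(self):  # not a command, so it never appears in the menu
        return len(self.text)


def build_menu(obj) -> dict:
    """Discover commands by reflection and map menu labels to bound methods."""
    return {name[len("cmd_"):]: getattr(obj, name)
            for name in dir(obj)
            if name.startswith("cmd_") and callable(getattr(obj, name))}


if __name__ == "__main__":
    doc = Document("hello")
    menu = build_menu(doc)
    print(sorted(menu))  # the entries a GUI would display
    menu["upper"]()      # the user selects "upper"
    print(doc.text)
```

Adding a new `cmd_*` method to the class makes it show up in the menu automatically, which is the kind of flexibility the lecture credits to reflection.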
2573
01:37:51,070 --> 01:37:54,520
so different kind of story all right now
2574
01:37:54,520 --> 01:37:56,110
the last thing I want to mention is this
2575
01:37:56,110 --> 01:37:57,880
idea of a feedback control model because
2576
01:37:57,880 --> 01:37:59,800
this is what will allow you to monitor
2577
01:37:59,800 --> 01:38:01,989
yourself any feedback loop this is a big
2578
01:38:01,989 --> 01:38:04,330
issue for example in control theory
2579
01:38:04,330 --> 01:38:06,430
in any feedback loop you start
2580
01:38:06,430 --> 01:38:09,520
with some initial configuration and what
2581
01:38:09,520 --> 01:38:10,690
you're trying to do is do some
2582
01:38:10,690 --> 01:38:12,400
corrections they have to be based on
2583
01:38:12,400 --> 01:38:14,350
some sort of a loop in which you look at
2584
01:38:14,350 --> 01:38:16,120
what you are doing that means you
2585
01:38:16,120 --> 01:38:18,190
measure it with some metric you do some
2586
01:38:18,190 --> 01:38:20,140
sort of analysis you figure out how to
2587
01:38:20,140 --> 01:38:21,310
adjust yourself and you apply the
2588
01:38:21,310 --> 01:38:23,290
adjustment and you keep staying in that
2589
01:38:23,290 --> 01:38:25,000
loop for example if you notice that a
2590
01:38:25,000 --> 01:38:26,680
part of it doesn't quite work and this
2591
01:38:26,680 --> 01:38:28,000
is one great thing you can do with
2592
01:38:28,000 --> 01:38:29,980
actors you have actors monitoring
2593
01:38:29,980 --> 01:38:31,660
parts of the system if they notice
2594
01:38:31,660 --> 01:38:33,510
that something is hiccupping one way to
2595
01:38:33,510 --> 01:38:36,220
heal the system is you kill those
2596
01:38:36,220 --> 01:38:39,180
components and create them anew all right
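A minimal sketch of the measure, analyze, adjust loop described above, applied to sizing a pool of workers. The metric (utilization), the 70% target, the per-worker capacity, and the sizing rule are all assumptions for illustration; real controllers usually add smoothing so they do not overreact to a single noisy measurement.

```python
def autoscale(demand_per_step, capacity_per_worker=100, target_util_pct=70,
              workers=1, max_workers=50):
    """Run a measure -> analyze -> adjust loop once per time step.

    Returns a list of (measured_utilization, workers_after_adjustment) pairs.
    """
    history = []
    for demand in demand_per_step:
        # Measure: how loaded is the current pool?
        utilization = demand / (workers * capacity_per_worker)
        # Analyze: how many workers would bring utilization down to the target?
        # (scaled by 100 so the ceiling division stays in integer arithmetic)
        per_worker_at_target = capacity_per_worker * target_util_pct
        desired = (demand * 100 + per_worker_at_target - 1) // per_worker_at_target
        # Adjust: apply the correction, clamped to sane bounds, and loop again.
        workers = max(1, min(max_workers, desired))
        history.append((round(utilization, 2), workers))
    return history


if __name__ == "__main__":
    for util, n in autoscale([50, 300, 300, 70]):
        print(f"measured utilization {util:.2f} -> now running {n} workers")
```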
2597
01:38:39,180 --> 01:38:42,280
if any of them run into a weird state if you
2598
01:38:42,280 --> 01:38:44,430
can afford to do that and not overly
2599
01:38:44,430 --> 01:38:47,770
disrupt the system it's perfect for
2600
01:38:47,770 --> 01:38:49,440
example if you have a thousand servers
2601
01:38:49,440 --> 01:38:51,489
front-end servers some of them will run
2602
01:38:51,489 --> 01:38:54,520
amok for whatever reasons best way is to
2603
01:38:54,520 --> 01:38:56,590
use these hardware devices that can
2604
01:38:56,590 --> 01:38:58,780
simply cut the power to the machine and
2605
01:38:58,780 --> 01:39:00,270
reboot it in a hard way
2606
01:39:00,270 --> 01:39:03,240
and you start it again 30 seconds later
2607
01:39:03,240 --> 01:39:06,060
or whatever it's like new as opposed to
2608
01:39:06,060 --> 01:39:09,540
running with some sort of a problem so in
2609
01:39:09,540 --> 01:39:11,280
principle you could mask a lot of
2610
01:39:11,280 --> 01:39:13,650
problems you might
2611
01:39:13,650 --> 01:39:15,330
have for example you have a program that
2612
01:39:15,330 --> 01:39:17,790
has memory leaks you simply put an actor
2613
01:39:17,790 --> 01:39:19,710
in front of it that watches it and when
2614
01:39:19,710 --> 01:39:22,950
the problem is too severe you kill it
2615
01:39:22,950 --> 01:39:24,840
and it's like new so you can mask
2616
01:39:24,840 --> 01:39:29,100
any bad code with self-healing which
2617
01:39:29,100 --> 01:39:30,720
means you have to be careful
2618
01:39:30,720 --> 01:39:31,860
how you use this self-healing but in
2619
01:39:31,860 --> 01:39:33,270
principle it could be something that
2620
01:39:33,270 --> 01:39:35,100
will make the system very robust as long
2621
01:39:35,100 --> 01:39:35,880
as you can do this
2622
01:39:35,880 --> 01:39:39,660
take it down put it back up and you can
2623
01:39:39,660 --> 01:39:41,430
keep the system running while this is
2624
01:39:41,430 --> 01:39:43,440
actually happening this is one of the
2625
01:39:43,440 --> 01:39:45,810
things Google does all the time okay
2626
01:39:45,810 --> 01:39:47,640
they monitor all the systems they simply
2627
01:39:47,640 --> 01:39:49,110
knock off the ones that don't work and
2628
01:39:49,110 --> 01:39:50,880
you don't even feel it because they mask
2629
01:39:50,880 --> 01:39:54,090
the failure the actor model should
2630
01:39:54,090 --> 01:39:56,730
make this a much easier exercise alright
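A sketch of the memory-leak example: a watchdog sits in front of a leaky server, checks a health metric after each request, and replaces the server with a fresh instance once the metric crosses a limit. `LeakyServer`, `Watchdog`, and the megabyte numbers are simulated, hypothetical values; a real watchdog would read actual process memory and restart a real process or machine.

```python
class LeakyServer:
    """Hypothetical server that leaks memory on every request."""
    def __init__(self):
        self.memory_mb = 100
        self.handled = 0

    def handle(self) -> None:
        self.handled += 1
        self.memory_mb += 30  # the leak: this memory is never released


class Watchdog:
    """Monitors a server and replaces it when its memory use gets too high."""
    def __init__(self, limit_mb: int):
        self.limit_mb = limit_mb
        self.server = LeakyServer()
        self.restarts = 0

    def request(self) -> None:
        self.server.handle()
        if self.server.memory_mb > self.limit_mb:
            # Kill it and start a fresh instance; callers never see the leak.
            self.server = LeakyServer()
            self.restarts += 1


if __name__ == "__main__":
    wd = Watchdog(limit_mb=200)
    for _ in range(10):
        wd.request()
    print(f"restarts: {wd.restarts}, current memory: {wd.server.memory_mb} MB")
```

As the lecture cautions, this masks the bug rather than fixing it: the leak is still there, it just never gets the chance to take the service down.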
2631
01:39:56,730 --> 01:39:58,740
so this is it for this lecture I'm going
2632
01:39:58,740 --> 01:40:01,770
to go to my office and find the project
2633
01:40:01,770 --> 01:40:03,150
and post the first project so you can
2634
01:40:03,150 --> 01:40:04,080
start working on it
2635
01:40:04,080 --> 01:40:06,350
okay and we're gonna have more fun
2636
01:40:06,350 --> 01:40:08,610
Thursday and talk about distributed
2637
01:40:08,610 --> 00:00:00,000
systems