1
00:00:19,040 --> 00:00:23,730
all right so I want to finish that

2
00:00:23,730 --> 00:00:25,680
introduction to distributed systems this

3
00:00:25,680 --> 00:00:27,330
should give you a very fast overview of

4
00:00:27,330 --> 00:00:29,220
what kind of things we want what kind of

5
00:00:29,220 --> 00:00:30,660
problems we want to solve that's the

6
00:00:30,660 --> 00:00:32,729
biggest question in any area right

7
00:00:32,729 --> 00:00:35,070
and what I'm going to try to do is try

8
00:00:35,070 --> 00:00:37,950
to point out the obsessions that these

9
00:00:37,950 --> 00:00:41,370
distributed systems will have why do I

10
00:00:41,370 --> 00:00:43,200
call them obsessions so at least this is

11
00:00:43,200 --> 00:00:45,090
the way of thinking about things every

12
00:00:45,090 --> 00:00:47,040
area of computer science or other kind

13
00:00:47,040 --> 00:00:48,899
of Sciences is obsessed about certain

14
00:00:48,899 --> 00:00:51,239
topics in order to think like them you

15
00:00:51,239 --> 00:00:53,309
have to switch at least temporarily to

16
00:00:53,309 --> 00:00:55,530
the same kind of obsessions so they have

17
00:00:55,530 --> 00:00:57,420
a deep regard for certain things and a

18
00:00:57,420 --> 00:00:59,129
deep disregard for others and this is

19
00:00:59,129 --> 00:01:01,890
exactly how you understand the specific

20
00:01:01,890 --> 00:01:05,069
area so for example how many

21
00:01:05,069 --> 00:01:06,540
people here take algorithms the

22
00:01:06,540 --> 00:01:08,490
algorithms class right all of you have

23
00:01:08,490 --> 00:01:10,560
to right you know it's unavoidable okay

24
00:01:10,560 --> 00:01:12,540
so what's the obsession in algorithms

25
00:01:12,540 --> 00:01:14,940
the Big O notation and in particular the

26
00:01:14,940 --> 00:01:18,570
chasing of logarithms right to the point

27
00:01:18,570 --> 00:01:20,070
that they prove certain things are a bit

28
00:01:20,070 --> 00:01:22,470
better than logarithms right log

29
00:01:22,470 --> 00:01:24,930
star union-find is log star which is

30
00:01:24,930 --> 00:01:27,090
impossible to tell apart from O(1)

31
00:01:27,090 --> 00:01:31,110
but it's a fancy way to say a little

32
00:01:31,110 --> 00:01:33,390
bit a little worse but that doesn't

33
00:01:33,390 --> 00:01:34,500
mean necessarily that in practice the

34
00:01:34,500 --> 00:01:35,880
program runs fast or doesn't run fast

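A minimal Python sketch of the iterated logarithm (log*, base 2 assumed) makes the point about the union-find bound concrete: log* counts how many times you must apply log2 before the value drops to 1 or below, and it stays tiny even for astronomically large inputs, which is why it is practically indistinguishable from O(1). The function name `log_star` is illustrative.

```python
import math


def log_star(n) -> int:
    """Iterated logarithm (base 2): number of times log2 is applied before n <= 1."""
    count = 0
    while n > 1:
        # math.log2 accepts arbitrarily large Python ints, so 2**65536 works
        n = math.log2(n)
        count += 1
    return count


examples = {"2": 2, "16": 16, "2**16": 2**16, "2**65536": 2**65536}
for label, n in examples.items():
    print(label, log_star(n))
# 2 1
# 16 3
# 2**16 4
# 2**65536 5
```

An input of 2**65536 has about 19,729 decimal digits, and log* of it is still only 5.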
35
00:01:35,880 --> 00:01:37,980
it's just a particular kind of obsession

36
00:01:37,980 --> 00:01:40,050
in that area so the important thing here

37
00:01:40,050 --> 00:01:41,430
is gonna be what are the obsessions in

38
00:01:41,430 --> 00:01:43,320
distributed systems what are the things

39
00:01:43,320 --> 00:01:45,030
distributed systems like to talk about

40
00:01:45,030 --> 00:01:49,530
and I'll try to explain why okay all

41
00:01:49,530 --> 00:01:50,460
right so one of them is this

42
00:01:50,460 --> 00:01:53,160
transparency deal right make as many

43
00:01:53,160 --> 00:01:55,500
things as possible transparent and as I

44
00:01:55,500 --> 00:01:58,560
mentioned three lectures back this

45
00:01:58,560 --> 00:02:00,690
obsession has even been standardized

46
00:02:00,690 --> 00:02:02,550
right there is an ISO standard of how

47
00:02:02,550 --> 00:02:04,020
exactly should you talk about the

48
00:02:04,020 --> 00:02:05,700
transparency obsession I'm not gonna go

49
00:02:05,700 --> 00:02:07,950
for that again another one is the

50
00:02:07,950 --> 00:02:12,150
scalability it's the central if you want

51
00:02:12,150 --> 00:02:16,709
idea in distributed systems any central

52
00:02:16,709 --> 00:02:18,989
centrality in the algorithm it's evil

53
00:02:18,989 --> 00:02:22,110
let's get rid of it none of the things

54
00:02:22,110 --> 00:02:24,030
that we do should be centralized in any

55
00:02:24,030 --> 00:02:26,270
way shape or form and then this is the

56
00:02:26,270 --> 00:02:29,780
and the reasons why I mean this is the

57
00:02:29,780 --> 00:02:31,070
standard things that are going to happen

58
00:02:31,070 --> 00:02:33,230
if you standardize if you centralize and

59
00:02:33,230 --> 00:02:35,360
they're all considered bad things right

60
00:02:35,360 --> 00:02:37,460
so centralized services means a single

61
00:02:37,460 --> 00:02:39,200
server for all users what if that single

62
00:02:39,200 --> 00:02:41,090
server goes down let's put more servers

63
00:02:41,090 --> 00:02:43,460
centralized data a single online

64
00:02:43,460 --> 00:02:46,540
telephone book this is similar to the

65
00:02:46,540 --> 00:02:48,950
centralized services centralized data

66
00:02:48,950 --> 00:02:50,990
might actually mean data that's on a server farm

67
00:02:50,990 --> 00:02:53,060
available from multiple web sites but is

68
00:02:53,060 --> 00:02:54,770
still owned by a single let's say entity

69
00:02:54,770 --> 00:02:56,660
and that could be also a problem for

70
00:02:56,660 --> 00:02:58,960
different reasons centralized algorithms

71
00:02:58,960 --> 00:03:01,610
right doing routing based on complete

72
00:03:01,610 --> 00:03:03,740
information this is a much much much

73
00:03:03,740 --> 00:03:06,500
bigger kind of evil decentralized

74
00:03:06,500 --> 00:03:09,460
algorithms are really the core of

75
00:03:09,460 --> 00:03:11,570
distributed systems can you design

76
00:03:11,570 --> 00:03:12,830
algorithms that don't have complete

77
00:03:12,830 --> 00:03:14,180
knowledge we're going to come back to

78
00:03:14,180 --> 00:03:16,400
this later now what I want you to

79
00:03:16,400 --> 00:03:18,110
understand though okay and I have to be

80
00:03:18,110 --> 00:03:19,370
honest with you from the beginning what

81
00:03:19,370 --> 00:03:21,230
I want you to understand this it's good

82
00:03:21,230 --> 00:03:23,270
to be obsessed don't get me wrong but at

83
00:03:23,270 --> 00:03:24,890
some point it comes back and haunts you

84
00:03:24,890 --> 00:03:27,650
being overly obsessed about things is

85
00:03:27,650 --> 00:03:30,380
not healthy either right and in certain

86
00:01:30,380 --> 00:01:31,880
circumstances it might be perfectly

87
00:03:31,880 --> 00:03:34,160
acceptable to design centralized

88
00:03:34,160 --> 00:03:35,900
algorithms for example if the total

89
00:03:35,900 --> 00:03:37,490
amount of knowledge that's required is

90
00:03:37,490 --> 00:03:39,050
relatively small you're better off by

91
00:03:39,050 --> 00:03:40,730
telling everybody that knowledge and let

92
00:03:40,730 --> 00:03:42,230
them run whatever centralized algorithm

93
00:03:42,230 --> 00:03:45,410
they want also centralized services if

94
00:03:45,410 --> 00:03:48,170
you have a hundred customers you really

95
00:03:48,170 --> 00:03:49,370
don't care about the centralized

96
00:03:49,370 --> 00:03:52,010
services you don't so if you're sure

97
00:03:52,010 --> 00:03:53,209
that you're only gonna have a hundred

98
00:03:53,209 --> 00:03:54,890
customers do the centralized solution

99
00:03:54,890 --> 00:03:56,600
and move on and do something else

100
00:03:56,600 --> 00:03:58,580
there's no need to design a super

101
00:03:58,580 --> 00:04:00,860
scalable solution if you're not sure

102
00:04:00,860 --> 00:04:05,060
that the quantity and quality demands of

103
00:04:05,060 --> 00:04:06,380
the users are going to get to the level

104
00:04:06,380 --> 00:04:08,150
where that's warranted right so treat

105
00:04:08,150 --> 00:04:10,340
them exactly like obsessions that you

106
00:04:10,340 --> 00:04:12,920
can turn on or off okay now of course

107
00:04:12,920 --> 00:04:14,120
the interesting question in this class

108
00:04:14,120 --> 00:04:15,920
is gonna be if I do want to achieve

109
00:04:15,920 --> 00:04:18,320
these goals which are important goals

110
00:04:18,320 --> 00:04:20,329
under certain circumstances what do I do

111
00:04:20,329 --> 00:04:23,510
then right but you don't necessarily

112
00:04:23,510 --> 00:04:26,270
always want to do that and I'm gonna

113
00:04:26,270 --> 00:04:27,740
argue throughout the class that

114
00:04:27,740 --> 00:04:30,919
solutions already in existence try to

115
00:04:30,919 --> 00:04:32,360
knock off some of these obsessions

116
00:04:32,360 --> 00:04:33,680
because engineers said hey it's

117
00:04:33,680 --> 00:04:35,540
ridiculous to do this let's not do it

118
00:04:35,540 --> 00:04:37,910
right so a lot of for example the

119
00:04:37,910 --> 00:04:40,010
systems and

120
00:04:40,010 --> 00:04:42,050
Hadoop is like this right a MapReduce

121
00:04:42,050 --> 00:04:44,060
engine in fact has quite a lot of

122
00:04:44,060 --> 00:04:47,270
centralization right which immediately

123
00:04:47,270 --> 00:04:49,820
makes it a not good solution in the eyes

124
00:04:49,820 --> 00:04:52,010
of the distributed systems community and

125
00:04:52,010 --> 00:04:53,300
hey let's improve it and you have

126
00:04:53,300 --> 00:04:55,190
thousands of papers that talk about how

127
00:04:55,190 --> 00:04:57,590
you could remove that centralization

128
00:04:57,590 --> 00:04:59,150
requirement in Hadoop but of course

129
00:04:59,150 --> 00:05:00,830
nobody bothers to run any of those in

130
00:05:00,830 --> 00:05:02,240
practice and they still run and they are

131
00:05:02,240 --> 00:05:04,790
happy with the actually reasonably

132
00:05:04,790 --> 00:05:06,620
centralized at least this decision maker

133
00:05:06,620 --> 00:05:10,460
in Hadoop okay all right so these are

134
00:05:10,460 --> 00:05:12,560
the so-called scalability problems I

135
00:05:12,560 --> 00:05:13,910
mean what scalability for so we're going

136
00:05:13,910 --> 00:05:15,170
to talk about scalability up and down

137
00:05:15,170 --> 00:05:16,580
you already have used the word

138
00:05:16,580 --> 00:05:18,980
scalability I'm sure but it's important

139
00:05:18,980 --> 00:05:21,170
to say what scalability is okay

140
00:05:21,170 --> 00:05:25,280
and this is an obsession that's good but

141
00:05:25,280 --> 00:05:27,050
not as a single obsession

142
00:05:27,050 --> 00:05:29,360
okay so scalability refers to the fact

143
00:05:29,360 --> 00:05:31,730
that if I add more hardware to the

144
00:05:31,730 --> 00:05:35,480
problem ideally if I double let's forget

145
00:05:35,480 --> 00:05:37,160
about hardware because I even hate

146
00:05:37,160 --> 00:05:38,900
things like if I double the number of

147
00:05:38,900 --> 00:05:40,970
processors something else I mean the

148
00:05:40,970 --> 00:05:43,460
performance doubles what I like to think

149
00:05:43,460 --> 00:05:45,530
in general and I think this is the only

150
00:05:45,530 --> 00:05:47,600
reasonable obsession is to obsess about

151
00:05:47,600 --> 00:05:49,910
money because it's the scarcest

152
00:05:49,910 --> 00:05:53,150
of all resources and I mean replace

153
00:05:53,150 --> 00:05:54,560
money with anything that costs money and

154
00:05:54,560 --> 00:05:56,630
is the same right gold or whatever you

155
00:05:56,630 --> 00:06:00,050
want okay so the question to me to a

156
00:06:00,050 --> 00:06:01,760
large extent and this is the ultimate

157
00:06:01,760 --> 00:06:04,820
scalability here I have problems

158
00:06:04,820 --> 00:06:08,000
because the system is too slow I would

159
00:06:08,000 --> 00:06:09,320
like to improve the speed of the system

160
00:06:09,320 --> 00:06:11,810
by a factor of two can I just throw the

161
00:06:11,810 --> 00:06:13,280
same amount of money as I already have

162
00:06:13,280 --> 00:06:15,740
invested in the system and get an

163
00:06:15,740 --> 00:06:17,480
addition to the system that allows me

164
00:06:17,480 --> 00:06:20,180
now to run twice as fast if yes I

165
00:06:20,180 --> 00:06:22,370
achieved at least money scalability but

166
00:06:22,370 --> 00:06:23,390
there are many other kinds of

167
00:06:23,390 --> 00:06:25,880
scalability but it's always about if I

168
00:06:25,880 --> 00:06:28,070
put more resources in is the running

169
00:06:28,070 --> 00:06:30,170
time going down number of users go

170
00:06:30,170 --> 00:06:31,760
proportionally up and it's always that

171
00:06:31,760 --> 00:06:34,850
chase of the proportionality ideal

172
00:06:34,850 --> 00:06:38,260
scalability it's always thought of as

173
00:06:38,260 --> 00:06:40,720
proportional increase in resources

174
00:06:40,720 --> 00:06:42,830
proportional increase in how good the

175
00:06:42,830 --> 00:06:46,010
result is and you're gonna see lots and

176
00:06:46,010 --> 00:06:47,240
lots and lots of papers in which they

177
00:06:47,240 --> 00:06:49,550
say hey I designed an algorithm my

178
00:06:49,550 --> 00:06:51,110
algorithm is scalable how do I prove

179
00:06:51,110 --> 00:06:53,660
this well let's put on the x-axis

180
00:06:53,660 --> 00:06:55,310
the number of cores used to run the

181
00:06:55,310 --> 00:06:57,230
computation on the y-axis the running

182
00:06:57,230 --> 00:06:59,120
time or whatever the throughput and

183
00:06:59,120 --> 00:07:02,570
then you show linear increase when you

184
00:07:02,570 --> 00:07:04,220
have more cores things get linearly

185
00:07:04,220 --> 00:07:06,110
better then you draw with the dotted

186
00:07:06,110 --> 00:07:08,150
line the ideal curve right so this is

187
00:07:08,150 --> 00:07:10,790
how a good distributed systems paper looks

188
00:07:10,790 --> 00:07:12,860
like you say let's say throughput how

189
00:07:12,860 --> 00:07:14,390
many requests I can do per second you

190
00:07:14,390 --> 00:07:19,250
say ideal scale up my scale up you see

191
00:07:19,250 --> 00:07:21,410
I'm very very close it's so good what's

192
00:07:21,410 --> 00:07:23,450
not so good a not so good scale up we've

193
00:07:23,450 --> 00:07:25,220
seen some pictures like this is it goes

194
00:07:25,220 --> 00:07:27,290
like this and then flat this is number

195
00:07:27,290 --> 00:07:30,650
of cores let's say all right not good

196
00:07:30,650 --> 00:07:33,950
good scalable not scalable or so they

197
00:07:33,950 --> 00:07:38,450
say now again having this it's a good

198
00:07:38,450 --> 00:07:40,400
thing but it's not a solution for

199
00:07:40,400 --> 00:07:41,690
everything I want you to understand this

200
00:07:41,690 --> 00:07:43,190
throughout the class and this is my my

201
00:07:43,190 --> 00:07:44,620
biggest problem with a lot of the

202
00:07:44,620 --> 00:07:47,030
distributed systems they are scalable but

203
00:07:47,030 --> 00:07:48,110
they're not particularly efficient in

204
00:07:48,110 --> 00:07:52,160
the first place so imagine that I'm not

205
00:07:52,160 --> 00:07:53,630
going to only show you the scalability

206
00:07:53,630 --> 00:07:54,980
I'll show you that a system that a

207
00:07:54,980 --> 00:07:57,170
single system can deal with 10 million

208
00:07:57,170 --> 00:07:59,120
users has no scalability whatsoever I

209
00:07:59,120 --> 00:08:01,220
can deal with 10 million users versus

210
00:08:01,220 --> 00:08:02,810
your scalable system that can support per

211
00:08:02,810 --> 00:08:05,210
system with only 1 million I mean with

212
00:08:05,210 --> 00:08:06,560
only a hundred thousand users you

213
00:08:06,560 --> 00:08:08,720
literally need a hundred machines even

214
00:08:08,720 --> 00:08:10,250
if your system is scalable to get to

215
00:08:10,250 --> 00:08:11,780
the point where you compare to my

216
00:08:11,780 --> 00:08:13,370
machine and you might never grow beyond

217
00:08:13,370 --> 00:08:15,470
those 10 million people so scalability

218
00:08:15,470 --> 00:08:17,870
is not everything and that I want you

219
00:08:17,870 --> 00:08:20,060
to keep in mind from the beginning and

220
00:08:20,060 --> 00:08:21,050
now of course if you have a highly

221
00:08:21,050 --> 00:08:22,400
efficient algorithm that's also scalable

222
00:08:22,400 --> 00:08:23,570
hey that's fantastic

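The comparison above (one efficient machine serving 10 million users versus a scalable system serving 100,000 users per machine) is simple arithmetic. A minimal sketch, assuming a constant per-machine capacity and a constant scaling efficiency, where 1.0 is the ideal linear scale-up drawn as the dotted line; `machines_needed` is a hypothetical helper, not a standard function:

```python
import math


def machines_needed(target_users: int, users_per_machine: int,
                    efficiency: float = 1.0) -> int:
    """Machines needed if every machine adds users_per_machine * efficiency users.

    efficiency = 1.0 is ideal linear scale-up; smaller values model a system
    that scales, but less than proportionally (a simplifying assumption).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(target_users / (users_per_machine * efficiency))


# One efficient machine that handles 10 million users:
print(machines_needed(10_000_000, 10_000_000))       # 1
# A scalable design at 100,000 users per machine, perfect scaling:
print(machines_needed(10_000_000, 100_000))          # 100
# The same design at a hypothetical 50% scaling efficiency:
print(machines_needed(10_000_000, 100_000, 0.5))     # 200
```

Even with perfect scaling the scalable design needs 100 machines just to match the single efficient one, which is the lecturer's point that scalability alone is not everything.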
223
00:08:23,570 --> 00:08:27,740
ok so things about scalability I mean

224
00:08:27,740 --> 00:08:29,060
there are many obstacles to achieve

225
00:08:29,060 --> 00:08:30,830
scalability we are gonna see this the

226
00:08:30,830 --> 00:08:32,840
part these are some of the reasons no

227
00:08:32,840 --> 00:08:34,039
machine has complete information about

228
00:08:34,039 --> 00:08:36,110
the system state when you look at the

229
00:08:36,110 --> 00:08:37,429
system as a whole and we're gonna do

230
00:08:37,429 --> 00:08:39,650
this you draw a nice graph and you say

231
00:08:39,650 --> 00:08:41,000
this machine does this this is the

232
00:08:41,000 --> 00:08:42,590
exchanges and whatever they do the

233
00:08:42,590 --> 00:08:45,170
trouble is nobody can see that that

234
00:08:45,170 --> 00:08:46,940
graph right so think about for example

235
00:08:46,940 --> 00:08:50,060
all the computers and the routers and

236
00:08:50,060 --> 00:08:53,330
whatnot in the internet by the way you

237
00:08:53,330 --> 00:08:56,000
can buy programs that cost hundreds of

238
00:08:56,000 --> 00:08:57,980
thousands to millions of dollars to just

239
00:08:57,980 --> 00:09:01,070
draw the topology that might exist at

240
00:09:01,070 --> 00:09:02,510
some point in some corner of the

241
00:09:02,510 --> 00:09:04,700
Internet right you might know the

242
00:09:04,700 --> 00:09:05,990
machines you control but you don't know

243
00:09:05,990 --> 00:09:07,100
the machines somebody else

244
00:09:07,100 --> 00:09:08,210
controls and the Internet is made

245
00:09:08,210 --> 00:09:09,980
ultimately of many many many parts and

246
00:09:09,980 --> 00:09:11,870
knowing how parts come together might

247
00:09:11,870 --> 00:09:14,330
allow you to optimize the network but

248
00:09:14,330 --> 00:09:16,970
there are no good tools to just give you

249
00:09:16,970 --> 00:09:19,010
that global picture so global picture is

250
00:09:19,010 --> 00:09:20,750
very hard to achieve any algorithm that

251
00:09:20,750 --> 00:09:22,280
depends on that is gonna be in trouble

252
00:09:22,280 --> 00:09:26,330
right machines make decisions based only

253
00:09:26,330 --> 00:09:28,130
on local information now you can make

254
00:09:28,130 --> 00:09:30,110
decisions based on non-local information

255
00:09:30,110 --> 00:09:31,280
but you have to exchange information

256
00:09:31,280 --> 00:09:34,130
with other entities and what's the

257
00:09:34,130 --> 00:09:36,140
problem with that that takes time by the

258
00:09:36,140 --> 00:09:38,210
time the answer comes back the state of

259
00:09:38,210 --> 00:09:40,730
the system changed already right so if

260
00:09:40,730 --> 00:09:43,100
you have a very dynamic environment and

261
00:09:43,100 --> 00:09:44,090
that happens for example with these

262
00:09:44,090 --> 00:09:47,270
mobile devices right for example imagine

263
00:09:47,270 --> 00:09:49,400
that you would like to say which

264
00:09:49,400 --> 00:09:50,870
students are on campus and you're just

265
00:09:50,870 --> 00:09:52,400
watching the MAC addresses on their

266
00:09:52,400 --> 00:09:53,480
phone by the way this is kind of

267
00:09:53,480 --> 00:09:56,210
possible now it's scarily possible to some

268
00:09:56,210 --> 00:09:58,280
extent but in order to have a very good

269
00:09:58,280 --> 00:09:59,960
idea where everybody is you have to move

270
00:09:59,960 --> 00:10:01,550
fast enough with the messages and what

271
00:10:01,550 --> 00:10:02,570
you track because otherwise the people

272
00:10:02,570 --> 00:10:05,480
already moved right and if people are on

273
00:10:05,480 --> 00:10:06,920
a train you're in deep trouble because

274
00:10:06,920 --> 00:10:09,220
the train moves very fast right

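The staleness problem described above can be quantified with one multiplication: whatever is being tracked keeps moving while the messages are in flight. A sketch with made-up speeds and an assumed 2-second collect-and-report round trip (all numbers and the helper name `staleness_error_m` are hypothetical):

```python
def staleness_error_m(speed_m_per_s: float, round_trip_s: float) -> float:
    """How far a tracked object can move while a location report is in flight."""
    return speed_m_per_s * round_trip_s


# Hypothetical speeds, hypothetical 2-second collect-and-report round trip.
for label, speed in [("walking", 1.5), ("train", 30.0)]:
    print(label, staleness_error_m(speed, 2.0), "m")
# walking 3.0 m
# train 60.0 m
```

The same delay that is harmless for someone walking across campus leaves the estimate for a train passenger tens of meters out of date.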
275
00:10:09,220 --> 00:10:12,050
failure of one machine does not ruin the

276
00:10:12,050 --> 00:10:14,540
algorithm this is you remember the

277
00:10:14,540 --> 00:10:15,890
transparency one of the transparencies

278
00:10:15,890 --> 00:10:17,900
was failure transparency hide any kind

279
00:10:17,900 --> 00:10:20,210
of failure and please make it look as if

280
00:10:20,210 --> 00:10:23,780
nothing happened if you do that that's

281
00:10:23,780 --> 00:10:26,990
great for the user right but this is

282
00:10:26,990 --> 00:10:28,700
probably one of the very hard things to

283
00:10:28,700 --> 00:10:31,010
deal with how do you mask these failures

284
00:10:31,010 --> 00:10:32,750
how do you make sure that one machine

285
00:10:32,750 --> 00:10:35,930
that fails does not ruin the rest of the

286
00:10:35,930 --> 00:10:37,430
algorithm that essentially is going to

287
00:10:37,430 --> 00:10:38,380
mean that you need to build some

288
00:10:38,380 --> 00:10:40,760
redundancies in the system actually a

289
00:10:40,760 --> 00:10:43,760
lot of redundancies in the system ok so

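Why redundancy masks failures can be sketched with a basic probability calculation. This assumes each replica is up independently with probability `p_up`; real systems often violate that through correlated failures (shared power, network, or software bugs), so treat these numbers as optimistic. The helper name `availability` is illustrative.

```python
def availability(p_up: float, replicas: int) -> float:
    """P(at least one replica is up), assuming independent failures."""
    if not 0 <= p_up <= 1 or replicas < 1:
        raise ValueError("need 0 <= p_up <= 1 and replicas >= 1")
    # The service is down only if every replica is down at the same time.
    return 1 - (1 - p_up) ** replicas


for k in (1, 2, 3):
    print(k, round(availability(0.99, k), 6))
# 1 0.99
# 2 0.9999
# 3 0.999999
```

Each extra replica multiplies the chance of total failure by another 0.01, which is why masking failures usually means adding a lot of redundancy.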
290
00:10:43,760 --> 00:10:44,870
we are going to come back to all of

291
00:10:44,870 --> 00:10:45,830
these things as we talk about

292
00:10:45,830 --> 00:10:47,510
distributed systems algorithms there is no

293
00:10:47,510 --> 00:10:49,250
implicit assumption about a

294
00:10:49,250 --> 00:10:51,740
global clock now this is a weird one ok

295
00:10:51,740 --> 00:10:53,810
this is a really really weird one what

296
00:10:53,810 --> 00:10:55,370
exactly does it mean that we don't have

297
00:10:55,370 --> 00:10:57,680
a global clock we all have watches right

298
00:10:57,680 --> 00:10:59,210
there is an official time what does it

299
00:10:59,210 --> 00:11:02,090
mean not to have a global clock ok but

300
00:11:02,090 --> 00:11:04,010
here's the problem the problem is we

301
00:11:04,010 --> 00:11:05,690
need an extremely precise global clock

302
00:11:05,690 --> 00:11:09,860
right so when you're doing operations on

303
00:11:09,860 --> 00:11:11,330
a computer they happen at gigahertz

304
00:11:11,330 --> 00:11:13,430
right especially if you get that nice

305
00:11:13,430 --> 00:11:16,040
close to assembly code so everything is

306
00:11:16,040 --> 00:11:18,740
measured in nanoseconds

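When operations take nanoseconds, even a small disagreement between clocks can reverse the apparent order of events. A small sketch with two hypothetical machines whose clocks differ by 50 ns: event A truly happens before event B, yet sorting by local timestamps reports B first (`Machine` and the skew values are illustrative assumptions):

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    skew_ns: int  # how far this machine's clock is ahead (+) or behind (-) true time

    def timestamp(self, true_time_ns: int) -> int:
        """Local clock reading when an event happens at true_time_ns."""
        return true_time_ns + self.skew_ns


a = Machine("A", skew_ns=50)  # hypothetical: A's clock runs 50 ns fast
b = Machine("B", skew_ns=0)   # B's clock happens to be exact

# In true time, A's event (t = 100 ns) happens before B's event (t = 120 ns).
events = [(a.name, a.timestamp(100)), (b.name, b.timestamp(120))]
order_by_timestamp = [name for name, ts in sorted(events, key=lambda e: e[1])]
print(events)              # [('A', 150), ('B', 120)]
print(order_by_timestamp)  # ['B', 'A']  (the timestamps wrongly say B came first)
```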
307
00:11:18,740 --> 00:11:21,260
all right so wouldn't it be fantastic if

308
00:11:21,260 --> 00:11:23,360
we had coordinated clocks with the

309
00:11:23,360 --> 00:11:26,840
precision of a nanosecond right if that

310
00:11:26,840 --> 00:11:29,680
would happen then a lot of good stuff

311
00:11:29,680 --> 00:11:32,630
it's it's gonna it's gonna work in our

312
00:11:32,630 --> 00:11:33,980
distributed algorithms because any

313
00:11:33,980 --> 00:11:35,330
activity we do we can just put the

314
00:11:35,330 --> 00:11:36,950
timestamp of when it happened and we

315
00:11:36,950 --> 00:11:38,330
would have a clearer idea of who was

316
00:11:38,330 --> 00:11:40,490
first somebody asked in class yeah but

317
00:11:40,490 --> 00:11:42,350
what if this guy sends the message first

318
00:11:42,350 --> 00:11:43,760
and this guy received a second or

319
00:11:43,760 --> 00:11:45,920
whatever how do we even know what first

320
00:11:45,920 --> 00:11:48,800
and second means why is this important

321
00:11:48,800 --> 00:11:50,330
I'm gonna talk extensively about this

322
00:11:50,330 --> 00:11:53,120
it's extremely hard to get highly

323
00:11:53,120 --> 00:11:55,580
coordinated clocks now how many people

324
00:11:55,580 --> 00:11:58,340
how many people here have heard about

325
00:11:58,340 --> 00:12:01,400
the GPS well you have it on your phone

326
00:12:01,400 --> 00:12:03,110
you better have heard about it how does

327
00:12:03,110 --> 00:12:08,150
in principle how does the GPS work good

328
00:12:08,150 --> 00:12:09,500
so that's very nice right we put

329
00:12:09,500 --> 00:12:12,110
satellites so what well so here's a

330
00:12:12,110 --> 00:12:15,110
here's a trick with GPS the little GPS

331
00:12:15,110 --> 00:12:17,710
in your phone is in fact solving

332
00:12:17,710 --> 00:12:20,270
trigonometric problem all the time it's

333
00:12:20,270 --> 00:12:22,550
a spatial 3-d trigonometric problem what

334
00:12:22,550 --> 00:12:24,530
it's doing is it listens to signals

335
00:12:24,530 --> 00:12:26,930
coming from satellites those signals

336
00:12:26,930 --> 00:12:30,500
contain in them the time at which the

337
00:12:30,500 --> 00:12:32,600
signal was produced it takes a while for

338
00:12:32,600 --> 00:12:34,280
the system to for the signal to

339
00:12:34,280 --> 00:12:37,610
propagate right so if you get a signal

340
00:12:37,610 --> 00:12:39,380
from four or more satellites you can

341
00:12:39,380 --> 00:12:41,330
actually compute how long it took the

342
00:12:41,330 --> 00:12:43,460
light to go from the satellite to you

343
00:12:43,460 --> 00:12:44,870
you know where the position of the

344
00:12:44,870 --> 00:12:46,460
satellite is because that's encoded in

345
00:12:46,460 --> 00:12:48,650
these messages you can solve that

346
00:12:48,650 --> 00:12:50,360
trigonometric problem and say this is my

347
00:12:50,360 --> 00:12:52,640
3d position on earth so that sounds

348
00:12:52,640 --> 00:12:54,680
fantastic except that it requires a

349
00:12:54,680 --> 00:12:57,650
magic ingredient the magic ingredient is

350
00:12:57,650 --> 00:13:00,110
the clocks on all the satellites have to

351
00:13:00,110 --> 00:13:02,680
be coordinated at the nanosecond

352
00:13:02,680 --> 00:13:06,590
resolution right so those satellites

353
00:13:06,590 --> 00:13:10,640
have so by the way the precision of the

354
00:13:10,640 --> 00:13:14,660
GPS increased significantly even

355
00:13:14,660 --> 00:13:15,770
then in the last few years I mean the

356
00:13:15,770 --> 00:13:17,300
coolest thing I've heard is people that

357
00:13:17,300 --> 00:13:20,480
have tractors farms right they now can

358
00:13:20,480 --> 00:13:22,850
guide the tractor via satellite it

359
00:13:22,850 --> 00:13:25,820
apparently costs about $1500 to sign up

360
00:13:25,820 --> 00:13:28,070
for that special GPS program because the

361
00:13:28,070 --> 00:13:31,250
resolution of that is one inch they can

362
00:13:31,250 --> 00:13:32,329
tell the position

363
00:13:32,329 --> 00:13:34,999
of the truck with a precision of one inch

364
00:13:34,999 --> 00:13:38,629
which means they can guide that big tool

365
00:13:38,629 --> 00:13:40,970
that harvest or whatever very precisely

366
00:13:40,970 --> 00:13:42,649
not to miss anything because otherwise

367
00:13:42,649 --> 00:13:44,869
is an extremely boring job and the hell

368
00:13:44,869 --> 00:13:46,759
this fancy algorithms is the best way to

369
00:13:46,759 --> 00:13:48,889
design algorithms right design a nice

370
00:13:48,889 --> 00:13:51,019
algorithm that can go on a weirdly

371
00:13:51,019 --> 00:13:53,360
shaped field so that in the shortest

372
00:13:53,360 --> 00:13:54,920
amount of time you pick up all the

373
00:13:54,920 --> 00:13:56,449
whatever you're supposed to pick up but

374
00:13:56,449 --> 00:13:58,879
you need that one inch resolution on the

375
00:13:58,879 --> 00:14:02,179
GPS right by the way the GPS that I have

376
00:14:02,179 --> 00:14:05,149
in my car sometimes I wonder if it even

377
00:14:05,149 --> 00:14:08,989
hits the right town right let alone an

378
00:14:08,989 --> 00:14:10,459
inch so how do you get such a small

379
00:14:10,459 --> 00:14:13,399
resolution well the secret is not in

380
00:14:13,399 --> 00:14:15,019
better ways to compute something the

381
00:14:15,019 --> 00:14:17,149
secret is really in better coordinated

382
00:14:17,149 --> 00:14:20,269
clocks right by the way all those

383
00:14:20,269 --> 00:14:21,980
satellites have extremely expensive

384
00:14:21,980 --> 00:14:24,019
clocks they are atomic clocks cesium

385
00:14:24,019 --> 00:14:26,959
based atomic clocks and this is the kind

386
00:14:26,959 --> 00:14:28,249
of technological problem that had to be

387
00:14:28,249 --> 00:14:30,319
solved on the satellites now is it

388
00:14:30,319 --> 00:14:31,819
possible to achieve extremely

389
00:14:31,819 --> 00:14:34,369
coordinated clocks yes but it's

390
00:14:34,369 --> 00:14:35,869
extremely expensive you're not going to

391
00:14:35,869 --> 00:14:38,379
see one in your phone anytime soon right

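The reason the satellite clocks must agree to within nanoseconds follows from the speed of light: the receiver converts each signal's travel time into a distance, so a timing error of Δt becomes a range error of about c × Δt. A sketch that ignores every other error source (the helper name `range_error_m` is illustrative):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by definition of the metre


def range_error_m(clock_error_s: float) -> float:
    """Distance error when a signal's travel time is misjudged by clock_error_s."""
    return SPEED_OF_LIGHT_M_PER_S * clock_error_s


for label, dt in [("1 ns", 1e-9), ("1 us", 1e-6), ("1 ms", 1e-3)]:
    print(label, round(range_error_m(dt), 3), "m")
# 1 ns 0.3 m
# 1 us 299.792 m
# 1 ms 299792.458 m
```

At roughly 0.3 m per nanosecond, the timing error equivalent to one inch (2.54 cm) is under 0.1 ns.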
392
00:14:38,379 --> 00:14:40,669
now you're gonna see a lot of algorithms

393
00:14:40,669 --> 00:14:43,129
depend on some sort of a clock there and

394
00:14:43,129 --> 00:14:45,110
the clocks make it much much easier much

395
00:14:45,110 --> 00:14:47,540
simpler but we can't put a multi-million

396
00:14:47,540 --> 00:14:50,419
dollar cesium clock that's this big in

397
00:14:50,419 --> 00:14:53,809
every phone right to get those clocks so

398
00:14:53,809 --> 00:14:55,189
you might as well just give up and say I

399
00:14:55,189 --> 00:14:57,139
can't coordinate clocks it's so bad

400
00:14:57,139 --> 00:15:00,470
actually that different cores in a

401
00:15:00,470 --> 00:15:03,220
single chip don't run on the same clock

402
00:15:03,220 --> 00:15:05,660
literally they all have their own clock

403
00:15:05,660 --> 00:15:08,389
their own beating and they cannot tell

404
00:15:08,389 --> 00:15:11,540
if two things happened at the same time

405
00:15:11,540 --> 00:15:13,759
and what does it even mean for two

406
00:15:13,759 --> 00:15:14,809
things to happen at the same time

407
00:15:14,809 --> 00:15:16,189
because any information about that thing

408
00:15:16,189 --> 00:15:17,989
happening takes at least the speed of

409
00:15:17,989 --> 00:15:19,249
light to propagate to the other place

410
00:15:19,249 --> 00:15:21,799
right so things are really messed up

411
00:15:21,799 --> 00:15:24,410
when it comes to the same time okay oh

412
00:15:24,410 --> 00:15:27,290
by the way this is the coolest thing

413
00:15:27,290 --> 00:15:27,730
ever

414
00:15:27,730 --> 00:15:30,769
with the GPS it turns out that the

415
00:15:30,769 --> 00:15:33,290
precision of the GPS would be only about

416
00:15:33,290 --> 00:15:35,660
two miles as opposed to this kind of

417
00:15:35,660 --> 00:15:37,100
inches if you would not do a

418
00:15:37,100 --> 00:15:39,199
relativistic correction so you have to

419
00:15:39,199 --> 00:15:42,470
use Einstein's general relativity to

420
00:15:42,470 --> 00:15:44,749
account for the fact that the time goes

421
00:15:44,749 --> 00:15:46,370
a little bit faster

422
00:15:46,370 --> 00:15:48,260
where the satellites are because of the

423
00:15:48,260 --> 00:15:50,810
gravity the gravity is a little bit

424
00:15:50,810 --> 00:15:53,750
smaller there and this is probably you

425
00:15:53,750 --> 00:15:55,250
haven't bumped into this right Einstein

426
00:15:55,250 --> 00:15:56,960
wrote some very nice very fancy

427
00:15:56,960 --> 00:15:58,450
equations to describe how

428
00:15:58,450 --> 00:16:01,090
gravitational fields influence the time

429
00:16:01,090 --> 00:16:03,800
it's completely messed up and upside

430
00:16:03,800 --> 00:16:05,510
down I mean you have to know so much to

431
00:16:05,510 --> 00:16:08,470
just implement this kind of a GPS okay

432
00:16:08,470 --> 00:16:14,420
and yeah no cell phone cesium clocks

433
00:16:14,420 --> 00:16:19,010
anytime soon for sure okay all right now

434
00:16:19,010 --> 00:16:22,220
when it comes to scalability there are

435
00:16:22,220 --> 00:16:24,440
many more traditional and nice

436
00:16:24,440 --> 00:16:25,610
traditional techniques you're going to

437
00:16:25,610 --> 00:16:27,020
we are going to see in the class and

438
00:16:27,020 --> 00:16:28,460
these techniques become very very

439
00:16:28,460 --> 00:16:30,290
important all right let me give you just

440
00:16:30,290 --> 00:16:32,300
one idea and this is something that

441
00:16:32,300 --> 00:16:34,370
you're almost guaranteed to bump into

442
00:16:34,370 --> 00:16:38,089
right you have so the model that's

443
00:16:38,089 --> 00:16:40,460
everywhere now is you have front ends

444
00:16:40,460 --> 00:16:43,370
that are either for example applications

445
00:16:43,370 --> 00:16:45,950
in cell phones or web interfaces for a

446
00:16:45,950 --> 00:16:47,300
computer or whatnot and some sort of

447
00:16:47,300 --> 00:16:50,330
back end some sort of combination of

448
00:16:50,330 --> 00:16:54,170
database and web servers and whatnot in

449
00:16:54,170 --> 00:16:56,330
the back end and one of the big

450
00:16:56,330 --> 00:16:58,610
questions is how could you design such a

451
00:16:58,610 --> 00:16:59,990
system that can support millions of

452
00:16:59,990 --> 00:17:01,100
users we're going to talk a little bit

453
00:17:01,100 --> 00:17:03,200
later about

454
00:17:03,200 --> 00:17:04,310
Instagram how many people have heard about

455
00:17:04,310 --> 00:17:06,230
Instagram these guys where you click

456
00:17:06,230 --> 00:17:07,609
pictures forget about it and just put them

457
00:17:07,609 --> 00:17:10,220
there okay the trouble Instagram ran

458
00:17:10,220 --> 00:17:14,480
into is they went from 50,000 users to a

459
00:17:14,480 --> 00:17:17,929
million users in a week they panicked

460
00:17:17,929 --> 00:17:21,230
okay they were just basically some kids

461
00:17:21,230 --> 00:17:23,569
that took some off-the-shelf tools to

462
00:17:23,569 --> 00:17:26,480
design this it was going fine at 50,000

463
00:17:26,480 --> 00:17:29,120
users and the system started to crawl

464
00:17:29,120 --> 00:17:30,830
the moment they reached a million and it

465
00:17:30,830 --> 00:17:32,810
was going up because it just exploded

466
00:17:32,810 --> 00:17:34,730
now why it exploded in terms of

467
00:17:34,730 --> 00:17:38,090
popularity I don't know they've done

468
00:17:38,090 --> 00:17:39,560
something right but the problem is it

469
00:17:39,560 --> 00:17:42,020
exploded so they had to redesign the

470
00:17:42,020 --> 00:17:43,880
backend completely rewrite completely

471
00:17:43,880 --> 00:17:45,350
the backend and they've done it in three

472
00:17:45,350 --> 00:17:45,800
days

473
00:17:45,800 --> 00:17:48,470
okay I call that the scalability wall a

474
00:17:48,470 --> 00:17:50,750
lot of startups die because they cannot

475
00:17:50,750 --> 00:17:53,270
go through the scalability wall the

476
00:17:53,270 --> 00:17:55,880
initial solution they design is not

477
00:17:55,880 --> 00:17:58,010
going to allow them to take those

478
00:17:58,010 --> 00:18:00,410
millions of users right

479
00:18:00,410 --> 00:18:02,450
if you don't solve that problem your app

480
00:18:02,450 --> 00:18:03,800
dies that's it

481
00:18:03,800 --> 00:18:05,420
I mean the moment it becomes sluggish

482
00:18:05,420 --> 00:18:07,250
nobody uses it nobody's going to have

483
00:18:07,250 --> 00:18:09,110
patience people with cell phones

484
00:18:09,110 --> 00:18:11,750
have absolutely no patience right

485
00:18:11,750 --> 00:18:14,210
absolutely no patience you have exactly

486
00:18:14,210 --> 00:18:15,950
30 seconds to convince me this app is

487
00:18:15,950 --> 00:18:18,560
worth anything if not delete move on to

488
00:18:18,560 --> 00:18:21,500
the next one all right I'm the same I

489
00:18:21,500 --> 00:18:23,510
don't all right so some of the scaling

490
00:18:23,510 --> 00:18:26,600
techniques are move computation away

491
00:18:26,600 --> 00:18:28,040
move things away from the server and

492
00:18:28,040 --> 00:18:29,270
this is one of them now this is

493
00:18:29,270 --> 00:18:30,860
something that people usually don't talk

494
00:18:30,860 --> 00:18:33,830
too much about the traditional way is for

495
00:18:33,830 --> 00:18:36,410
example web forms right fill

496
00:18:36,410 --> 00:18:38,720
in information in a web page and then

497
00:18:38,720 --> 00:18:40,100
send it to the back end and then the

498
00:18:40,100 --> 00:18:41,690
back end computes all of that I mean

499
00:18:41,690 --> 00:18:42,530
first of all you send a lot of

500
00:18:42,530 --> 00:18:45,560
information it does analysis and says hey

501
00:18:45,560 --> 00:18:47,240
this one is good not good send a message

502
00:18:47,240 --> 00:18:48,560
back and if it's good maybe do whatever

503
00:18:48,560 --> 00:18:50,660
the form was supposed to tell you to

504
00:18:50,660 --> 00:18:52,070
do a transaction in a database or things

505
00:18:52,070 --> 00:18:55,190
of this sort okay now what's the problem

506
00:18:55,190 --> 00:18:57,500
with this the server does too much all

507
00:18:57,500 --> 00:18:59,890
right now I mentioned this before

508
00:18:59,890 --> 00:19:03,500
JavaScript right it's now at an amazing

509
00:19:03,500 --> 00:19:05,870
level by the way even phones run it well

510
00:19:05,870 --> 00:19:08,180
like desktops I've seen Unreal

511
00:19:08,180 --> 00:19:09,320
Tournament how many people know about

512
00:19:09,320 --> 00:19:11,630
Unreal Tournament none come on it's not the

513
00:19:11,630 --> 00:19:15,880
same ok Unreal Tournament is this well

514
00:19:15,880 --> 00:19:19,150
somewhat violent but very free the game

515
00:19:19,150 --> 00:19:21,830
it has been ported to the web browser

516
00:19:21,830 --> 00:19:24,380
this is because of a very very nice tool

517
00:19:24,380 --> 00:19:26,360
that can take C++ code and generate a

518
00:19:26,360 --> 00:19:27,740
very specific kind of JavaScript code

519
00:19:27,740 --> 00:19:30,320
but the point is JavaScript runs so well

520
00:19:30,320 --> 00:19:32,140
in the browser now you can run 3d games

521
00:19:32,140 --> 00:19:34,640
right now in my opinion this is a

522
00:19:34,640 --> 00:19:36,320
tremendous opportunity why because if

523
00:19:36,320 --> 00:19:37,850
you can run 3d games you can do so much

524
00:19:37,850 --> 00:19:40,190
more in JavaScript which means you

525
00:19:40,190 --> 00:19:42,680
should move a lot of stuff to the front

526
00:19:42,680 --> 00:19:44,840
end in the browser all the validation

527
00:19:44,840 --> 00:19:47,000
for the forms and whatnot especially the

528
00:19:47,000 --> 00:19:49,370
really stupid mistakes right now you

529
00:19:49,370 --> 00:19:51,290
cannot prevent attacks this way

530
00:19:51,290 --> 00:19:52,490
because then they're going to replace

531
00:19:52,490 --> 00:19:53,960
your JavaScript code and do low-level

532
00:19:53,960 --> 00:19:56,630
codes but at least things that help the

533
00:19:56,630 --> 00:19:58,760
user you can definitely delegate them to

534
00:19:58,760 --> 00:20:01,130
the front end that immediately means

535
00:20:01,130 --> 00:20:02,600
you're more scalable in the backend

536
00:20:02,600 --> 00:20:04,310
because the server does less the server

537
00:20:04,310 --> 00:20:06,560
is just providing some data it is not doing

538
00:20:06,560 --> 00:20:07,310
this crazy

539
00:20:07,310 --> 00:20:09,350
other computation that's needed this is

540
00:20:09,350 --> 00:20:10,580
extremely important for example for

541
00:20:10,580 --> 00:20:11,990
visualization in which all the

542
00:20:11,990 --> 00:20:14,000
computation to run the visualization

543
00:20:14,000 --> 00:20:16,160
is running in JavaScript in the front

544
00:20:16,160 --> 00:20:18,650
end and the server just says huh here's

545
00:20:18,650 --> 00:20:20,180
the data and then you can support

546
00:20:20,180 --> 00:20:22,220
thousands millions of users no problem

547
00:20:22,220 --> 00:20:24,230
you just need bandwidth and there

548
00:20:24,230 --> 00:20:25,610
are standard tricks like compression and

549
00:20:25,610 --> 00:20:27,080
other things to make it even more

550
00:20:27,080 --> 00:20:29,390
scalable or whatnot you move to

551
00:20:29,390 --> 00:20:32,150
another data center or whatever alright

552
00:20:32,150 --> 00:20:35,660
alright so many many techniques for this

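Here is a minimal sketch of moving form checks to the front end. It assumes a hypothetical sign-up form with `name`, `email` and `age` fields. In a browser these checks would be written in JavaScript. Python is used here only to keep one language across the examples. The same rules do two jobs: they give the user instant feedback on the client, and they provide a cheap re-check on the server. The server must never trust the client, because an attacker can replace the front-end code.

```python
import re

# Hypothetical validation rules shared by client and server (field names are illustrative).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_signup(form: dict) -> list[str]:
    """Return human-readable errors; an empty list means the form looks valid."""
    errors = []
    if not form.get("name", "").strip():
        errors.append("name is required")
    if not EMAIL_RE.match(form.get("email", "")):
        errors.append("email looks malformed")
    age = form.get("age")
    if not isinstance(age, int) or not (13 <= age <= 130):
        errors.append("age must be an integer between 13 and 130")
    return errors


def handle_submit(form: dict) -> tuple[int, object]:
    """Server side: re-run the same checks, because the client can be bypassed."""
    errors = validate_signup(form)
    if errors:
        return 400, errors  # reject cheaply, before any database work
    return 200, "queued for database transaction"


ok = handle_submit({"name": "Ada", "email": "ada@example.com", "age": 36})
bad = handle_submit({"name": "", "email": "nope", "age": 7})
print(ok)
print(bad)
```

The server still pays for each rejected request, but a malformed request costs it only a 400 and no database work. Because the client-side copy of the rules catches honest mistakes first, most such requests never reach the server at all.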
553
00:20:35,660 --> 00:20:38,000
another one for example and this is a

554
00:20:38,000 --> 00:20:40,220
about geographical scalability the

555
00:20:40,220 --> 00:20:42,860
trouble is light and electrical signals

556
00:20:42,860 --> 00:20:45,470
in general they need a lot of time to go

557
00:20:45,470 --> 00:20:47,540
from one end to the other at the scale

558
00:20:47,540 --> 00:20:52,340
of these very fast messages right so that

559
00:20:52,340 --> 00:20:54,770
already puts a lot of constraints in how

560
00:20:54,770 --> 00:20:58,460
fast you can talk so how do you solve

561
00:20:58,460 --> 00:21:00,860
latency problems so scalability can be

562
00:21:00,860 --> 00:21:02,330
in terms of many aspects one of them is

563
00:21:02,330 --> 00:21:06,830
latency right humans tolerate about 300

564
00:21:06,830 --> 00:21:09,470
milliseconds total latency anything above

565
00:21:09,470 --> 00:21:11,450
that it's very annoying and I'm sure you

566
00:21:11,450 --> 00:21:12,890
yourself have been annoyed right web

567
00:21:12,890 --> 00:21:16,190
page that loads very slowly or you try to

568
00:21:16,190 --> 00:21:17,990
do something in an app and it seems to

569
00:21:17,990 --> 00:21:20,960
kind of grind and then you have bad

570
00:21:20,960 --> 00:21:22,340
things to say about the web designers

571
00:21:22,340 --> 00:21:25,250
right it's the backend for sure right so

572
00:21:25,250 --> 00:21:26,960
how do you solve these things especially

573
00:21:26,960 --> 00:21:29,180
in a geographical way well for example

574
00:21:29,180 --> 00:21:30,920
if you could tell from what country

575
00:21:30,920 --> 00:21:34,220
people come from assuming that the

576
00:21:34,220 --> 00:21:35,870
network layout kind of goes with the

577
00:21:35,870 --> 00:21:37,580
country which is not necessarily true

578
00:21:37,580 --> 00:21:40,580
then you can in fact position multiple

579
00:21:40,580 --> 00:21:43,100
servers for your service in various

580
00:21:43,100 --> 00:21:46,010
countries or at various ISPs intercept

581
00:21:46,010 --> 00:21:47,540
then the requests that normally would

582
00:21:47,540 --> 00:21:49,250
have gone to your main website that's

583
00:21:49,250 --> 00:21:51,440
maybe in the US and try to serve it from

584
00:21:51,440 --> 00:21:54,290
that closer server by the way I mean

585
00:21:54,290 --> 00:21:56,330
this sounds like a cool thing to do

586
00:21:56,330 --> 00:21:59,300
right it has been done Akamai how many

587
00:21:59,300 --> 00:22:01,730
people have heard about Akamai a few

588
00:22:01,730 --> 00:22:03,380
of you why the rest of the people did

589
00:22:03,380 --> 00:22:06,290
not hear about Akamai because they don't

590
00:22:06,290 --> 00:22:09,620
care about normal users they have as

591
00:22:09,620 --> 00:22:12,500
customers the big guys you know about

592
00:22:12,500 --> 00:22:15,950
right Google and Facebook and CNN and

593
00:22:15,950 --> 00:22:17,930
whatnot so what do they do it's called a

594
00:22:17,930 --> 00:22:22,130
web accelerator instead of you going all

595
00:22:22,130 --> 00:22:23,810
the way to the end of the world to talk

596
00:22:23,810 --> 00:22:27,340
to the CNN web server that might be

597
00:22:27,340 --> 00:22:29,380
where their servers are they put a very

598
00:22:29,380 --> 00:22:32,410
nice server at your ISP they propagate

599
00:22:32,410 --> 00:22:34,270
the content that CNN wants to have and

600
00:22:34,270 --> 00:22:36,640
maybe it's delayed by a few seconds but

601
00:22:36,640 --> 00:22:38,440
nobody's gonna notice and then when you

602
00:22:38,440 --> 00:22:40,540
go to cnn.com it's intercepted by your

603
00:22:40,540 --> 00:22:42,790
ISP and served from that Akamai server so

604
00:22:42,790 --> 00:22:45,130
Akamai pays for hosting servers at the

605
00:22:45,130 --> 00:22:48,040
ISP side and this is why certain web

606
00:22:48,040 --> 00:22:49,390
sites are so fast

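The web-accelerator idea can be modeled as an edge cache that serves a copy of the origin's content that may be slightly stale. The sketch below is a toy. The class name, the `ttl_seconds` freshness window and the injectable clock are illustrative assumptions, not a description of how Akamai actually works. Requests that arrive inside the freshness window are answered locally. Only misses and stale entries pay for the long trip to the origin.

```python
import time


class EdgeCache:
    """Toy edge server at an ISP that caches an origin site's pages for ttl_seconds."""

    def __init__(self, fetch_from_origin, ttl_seconds=5.0, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # slow call to the far-away origin
        self.ttl = ttl_seconds                      # how stale a copy we accept
        self.clock = clock                          # injectable for testing
        self.store = {}                             # url -> (content, fetched_at)
        self.origin_fetches = 0

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]                         # fresh enough: serve locally
        content = self.fetch_from_origin(url)       # miss or stale: go to the origin
        self.origin_fetches += 1
        self.store[url] = (content, now)
        return content


# Demo with a fake clock: four requests over six seconds.
fake_now = [0.0]
edge = EdgeCache(lambda url: f"<html>{url} v1</html>",
                 ttl_seconds=5.0, clock=lambda: fake_now[0])
for t in (0.0, 1.0, 4.9, 6.0):
    fake_now[0] = t
    edge.get("cnn.com/index")
print(edge.origin_fetches)
```

In the demo, only the first request (a miss) and the last one (stale after five seconds) reach the origin. The two requests in between are answered by the edge. The TTL is the knob that trades freshness for load on the origin and latency for the user.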
607
00:22:49,390 --> 00:22:52,890
right because of Akamai because of this

608
00:22:52,890 --> 00:22:54,700
scalability through geographical

609
00:22:54,700 --> 00:22:56,110
techniques in which you don't only have

610
00:22:56,110 --> 00:22:59,110
multiple servers to take the load but

611
00:22:59,110 --> 00:23:00,670
they are much closer to you in terms of

612
00:23:00,670 --> 00:23:03,850
time to access right now an extreme

613
00:23:03,850 --> 00:23:06,850
version of this by the way is millisecond

614
00:23:06,850 --> 00:23:08,320
trading how many people know

615
00:23:08,320 --> 00:23:09,640
about millisecond trading this is about

616
00:23:09,640 --> 00:23:11,620
making money very fast or losing it

617
00:23:11,620 --> 00:23:12,340
very fast

618
00:23:12,340 --> 00:23:14,950
so what millisecond trading means is I

619
00:23:14,950 --> 00:23:18,100
want to buy and sell stock or other

620
00:23:18,100 --> 00:23:20,410
things that have money associated with

621
00:23:20,410 --> 00:23:23,770
them fast at a millisecond level right

622
00:23:23,770 --> 00:23:29,230
now here's a problem a server in Japan

623
00:23:29,230 --> 00:23:31,660
right or even in Gainesville might be a

624
00:23:31,660 --> 00:23:34,120
hundred milliseconds away from the New

625
00:23:34,120 --> 00:23:38,130
York Stock Exchange's very fast trading

626
00:23:38,130 --> 00:23:42,460
server in the time it takes the message

627
00:23:42,460 --> 00:23:45,730
that says stocks of Microsoft have been

628
00:23:45,730 --> 00:23:47,530
sold at this price point in

629
00:23:47,530 --> 00:23:49,030
the time it takes to make it to my

630
00:23:49,030 --> 00:23:50,470
server let alone the time it takes me to

631
00:23:50,470 --> 00:23:52,900
process it and so on the guys that have

632
00:23:52,900 --> 00:23:57,310
the servers in Manhattan right where the

633
00:23:57,310 --> 00:23:58,360
servers of the New York Stock Exchange

634
00:23:58,360 --> 00:24:02,140
are got a message already decided to buy

635
00:24:02,140 --> 00:24:04,360
or to sell what to do and send a buy or

636
00:24:04,360 --> 00:24:06,400
sell request and they already have the

637
00:24:06,400 --> 00:24:08,350
order fulfilled by the time the initial

638
00:24:08,350 --> 00:24:10,810
message gets to me right now why is that

639
00:24:10,810 --> 00:24:12,820
important because you can make a lot of

640
00:24:12,820 --> 00:24:14,950
money by just riding small fluctuations

641
00:24:14,950 --> 00:24:17,830
right you see people try to predict the

642
00:24:17,830 --> 00:24:19,150
stock market for the longest time and

643
00:24:19,150 --> 00:24:20,320
this guy said hey you don't need to

644
00:24:20,320 --> 00:24:21,970
predict the stock market you only have

645
00:24:21,970 --> 00:24:23,170
to predict what's gonna happen in the

646
00:24:23,170 --> 00:24:26,350
next 100 milliseconds if it takes time

647
00:24:26,350 --> 00:24:28,420
for other people to react right if they

648
00:24:28,420 --> 00:24:29,800
see the market going up if it takes

649
00:24:29,800 --> 00:24:31,570
them 300 milliseconds to see that it's

650
00:24:31,570 --> 00:24:34,360
going up if I can trade 10 times faster

651
00:24:34,360 --> 00:24:35,590
than the 300 milliseconds I'll make

652
00:24:35,590 --> 00:24:37,660
money and you're making money by riding

653
00:24:37,660 --> 00:24:39,250
this wave you see it going up you start

654
00:24:39,250 --> 00:24:40,810
buying you see it going down you sell

655
00:24:40,810 --> 00:24:41,080
early

656
00:24:41,080 --> 00:24:44,220
and do it faster than anybody else no

657
00:24:44,220 --> 00:24:47,230
question you want to do millisecond

658
00:24:47,230 --> 00:24:49,239
trading where should you have your

659
00:24:49,239 --> 00:24:54,820
servers so here's what's happening out

660
00:24:54,820 --> 00:24:56,350
here it at the stock exchange but you

661
00:24:56,350 --> 00:24:58,659
can have it across the street from the

662
00:24:58,659 --> 00:25:00,820
stock exchange so the most expensive

663
00:25:00,820 --> 00:25:03,399
real estate in New York it's across the

664
00:25:03,399 --> 00:25:04,960
street from New York Stock Exchange it's

665
00:25:04,960 --> 00:25:08,739
a building that has a big fiber going to

666
00:25:08,739 --> 00:25:11,080
that building right where you can place

667
00:25:11,080 --> 00:25:13,600
your millisecond trading servers and

668
00:25:13,600 --> 00:25:16,690
that's where 99-plus percent of all

669
00:25:16,690 --> 00:25:18,489
the millisecond trading servers are and

670
00:25:18,489 --> 00:25:20,590
people are paying thousands of dollars

671
00:25:20,590 --> 00:25:23,830
rent per month for a couple of square feet

672
00:25:23,830 --> 00:25:25,080
right

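Physical proximity matters because latency has a hard lower bound. A signal in optical fiber travels at roughly c/n, where n is the refractive index of the glass. The refractive index of 1.47 and the route distances below are rough assumptions. They are great-circle distances, real cables are longer, and routers and queues add further delay. Treat the results as floors, not measurements.

```python
C_VACUUM_KM_S = 299_792.458  # km/s, speed of light in vacuum
FIBER_INDEX = 1.47           # assumed refractive index of the fiber's glass core


def one_way_delay_ms(distance_km, index=FIBER_INDEX):
    """Lower bound on one-way latency over fiber: distance / (c / n), in milliseconds."""
    return distance_km / (C_VACUUM_KM_S / index) * 1000


# Illustrative, approximate great-circle distances (assumptions, not measured cable routes).
routes = {
    "across the street": 0.1,
    "Gainesville -> New York": 1_400,
    "New York -> Tokyo": 10_900,
}
for name, km in routes.items():
    one_way = one_way_delay_ms(km)
    print(f"{name:>25}: one-way >= {one_way:8.4f} ms, round trip >= {2 * one_way:8.4f} ms")
```

A co-located server gets the exchange's message in well under a microsecond of propagation time. A distant one is tens of milliseconds behind before it has done any processing at all. That gap is exactly the head start the lecture describes.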
673
00:25:25,080 --> 00:25:27,730
so it's an extreme version of

674
00:25:27,730 --> 00:25:30,909
scalability you scale in terms of

675
00:25:30,909 --> 00:25:32,889
distance because if you don't do so it's

676
00:25:32,889 --> 00:25:34,509
not what's the point of trading if those

677
00:25:34,509 --> 00:25:35,919
guys are riding the wave and you're not

678
00:25:35,919 --> 00:25:38,649
right by the way I've heard the coolest

679
00:25:38,649 --> 00:25:40,450
thing also but what I'm trying to do in

680
00:25:40,450 --> 00:25:42,639
this class is to explain to you that

681
00:25:42,639 --> 00:25:44,529
everything that happens here it's

682
00:25:44,529 --> 00:25:45,879
connected with everything because of

683
00:25:45,879 --> 00:25:47,559
cell phones and websites and whatnot

684
00:25:47,559 --> 00:25:50,980
right so a lot of these stories have an

685
00:25:50,980 --> 00:25:52,389
interesting meaning that I'm hoping

686
00:25:52,389 --> 00:25:53,830
you're gonna remember mostly because of

687
00:25:53,830 --> 00:25:55,720
the story okay so the thing that I've

688
00:25:55,720 --> 00:25:58,149
heard about is that they're building now

689
00:25:58,149 --> 00:26:01,019
a transoceanic

690
00:26:01,019 --> 00:26:08,350
fiber to go from London to

691
00:26:08,350 --> 00:26:12,220
Japan to Tokyo and the expense of that

692
00:26:12,220 --> 00:26:13,809
is very very large I mean why

693
00:26:13,809 --> 00:26:14,799
would you do that because you already

694
00:26:14,799 --> 00:26:16,119
have ways to get to Japan

695
00:26:16,119 --> 00:26:17,559
well they are going to do that because

696
00:26:17,559 --> 00:26:21,190
they can cut down the travel time of the

697
00:26:21,190 --> 00:26:23,379
messages to a hundred milliseconds

698
00:26:23,379 --> 00:26:26,799
versus about 300 milliseconds and the

699
00:26:26,799 --> 00:26:28,299
business model for that fiber is very

700
00:26:28,299 --> 00:26:30,999
simple we are gonna rent bandwidth on

701
00:26:30,999 --> 00:26:33,309
the fiber for people that want now to

702
00:26:33,309 --> 00:26:35,529
trade in between two stock exchanges the

703
00:26:35,529 --> 00:26:36,850
London Stock Exchange and the Tokyo

704
00:26:36,850 --> 00:26:38,409
Stock Exchange so essentially if you

705
00:26:38,409 --> 00:26:40,809
don't rent bandwidth on that fiber you

706
00:26:40,809 --> 00:26:43,629
might as well not bother to do arbitrage

707
00:26:43,629 --> 00:26:45,249
or some other kind of trading in between

708
00:26:45,249 --> 00:26:49,960
the two stock exchanges right so by the

709
00:26:49,960 --> 00:26:52,240
way enormous fortunes were made by just

710
00:26:52,240 --> 00:26:53,590
realizing that there is a little

711
00:26:53,590 --> 00:26:54,790
discrepancy in price between

712
00:26:54,790 --> 00:26:56,980
things and trading fast to fill the

713
00:26:56,980 --> 00:26:59,440
gap right so if you know a

714
00:26:59,440 --> 00:27:00,640
hundred milliseconds before everybody

715
00:27:00,640 --> 00:27:03,100
else that the price of the same

716
00:27:03,100 --> 00:27:04,570
commodity it's a little bit higher in

717
00:27:04,570 --> 00:27:06,520
one stock exchange than another right by

718
00:27:06,520 --> 00:27:08,260
doing many many transactions or large

719
00:27:08,260 --> 00:27:09,400
volume transactions or whatever you can

720
00:27:09,400 --> 00:27:10,450
make a lot of money and people made a

721
00:27:10,450 --> 00:27:13,480
lot of money right so this is probably

722
00:27:13,480 --> 00:27:16,060
the most direct way in which concepts of

723
00:27:16,060 --> 00:27:18,040
distributed systems correlate with raw

724
00:27:18,040 --> 00:27:21,600
large amounts of money right so a

725
00:27:21,600 --> 00:27:26,230
particularly lucrative application of

726
00:27:26,230 --> 00:27:27,370
what we are going to learn in this class

727
00:27:27,370 --> 00:27:29,380
potentially is to start doing those

728
00:27:29,380 --> 00:27:30,700
millisecond trades by the way don't

729
00:27:30,700 --> 00:27:32,770
play with your own money you're gonna

730
00:27:32,770 --> 00:27:35,350
lose all of it very fast okay and you

731
00:27:35,350 --> 00:27:36,730
can still lose billions of dollars

732
00:27:36,730 --> 00:27:39,790
within one second okay that happened

733
00:27:39,790 --> 00:27:41,890
right one of those algorithms so think

734
00:27:41,890 --> 00:27:43,540
about algorithms not running properly

735
00:27:43,540 --> 00:27:46,330
when one of the fast trading algorithms

736
00:27:46,330 --> 00:27:49,240
runs amok in a few seconds before anybody

737
00:27:49,240 --> 00:27:50,620
can press the red button and shut down

738
00:27:50,620 --> 00:27:52,720
the system you've already lost a billion

739
00:27:52,720 --> 00:27:54,040
dollars and that happened I think JP

740
00:27:54,040 --> 00:27:55,420
Morgan lost a couple billion dollars

741
00:27:55,420 --> 00:27:57,340
because one of the fast trading

742
00:27:57,340 --> 00:27:59,530
algorithms ran amok uh not to mention

743
00:27:59,530 --> 00:28:01,000
that destabilized the market a couple of

744
00:28:01,000 --> 00:28:02,740
times and the whole market went haywire

745
00:28:02,740 --> 00:28:04,900
and they had to stop trading on the New

746
00:28:04,900 --> 00:28:05,950
York Stock Exchange because some of

747
00:28:05,950 --> 00:28:07,570
these algorithms decided to do things

748
00:28:07,570 --> 00:28:11,950
right that's probably the way as a

749
00:28:11,950 --> 00:28:13,360
computer scientist that you can create

750
00:28:13,360 --> 00:28:14,920
the most havoc right you convince

751
00:28:14,920 --> 00:28:16,030
people to let you play with this

752
00:28:16,030 --> 00:28:17,590
dangerous things and then you do things

753
00:28:17,590 --> 00:28:21,060
okay all right

754
00:28:21,060 --> 00:28:23,050
pitfalls so these are the classic

755
00:28:23,050 --> 00:28:24,700
pitfalls associated with developing

756
00:28:24,700 --> 00:28:26,230
distributed systems what pitfalls

757
00:28:26,230 --> 00:28:28,060
means in this context is things you

758
00:28:28,060 --> 00:28:30,400
might assume are true but in fact they

759
00:28:30,400 --> 00:28:32,260
are not and if you depend in a crucial

760
00:28:32,260 --> 00:28:36,190
way on them then you're in trouble again

761
00:28:36,190 --> 00:28:38,200
maybe you are actually or maybe you're

762
00:28:38,200 --> 00:28:39,910
not right so at some point you have to

763
00:28:39,910 --> 00:28:41,500
draw a line and say enough is enough

764
00:28:41,500 --> 00:28:43,300
right so it's this common sense that's

765
00:28:43,300 --> 00:28:44,560
kind of hard to get because if you get

766
00:28:44,560 --> 00:28:45,850
overly obsessed all right about the

767
00:28:45,850 --> 00:28:47,650
pitfalls or scalability or whatever

768
00:28:47,650 --> 00:28:50,020
you're gonna design these non-existing

769
00:28:50,020 --> 00:28:51,550
solutions or you'll never design a

770
00:28:51,550 --> 00:28:52,900
solution because you convinced yourself

771
00:28:52,900 --> 00:28:54,220
that is not worth designing a solution

772
00:28:54,220 --> 00:28:55,870
because you cannot get that completely

773
00:28:55,870 --> 00:28:58,780
decentralized algorithm right but you

774
00:28:58,780 --> 00:29:00,250
can write papers about why that's not

775
00:29:00,250 --> 00:29:01,510
possible maybe we're going to see

776
00:29:01,510 --> 00:29:02,830
negative results in this class so it's

777
00:29:02,830 --> 00:29:03,730
interesting actually

778
00:29:03,730 --> 00:29:07,660
okay so let me go through them and then

779
00:29:07,660 --> 00:29:08,559
I'll come back a little bit to

780
00:29:08,559 --> 00:29:10,509
the negative results so pitfalls right

781
00:29:10,509 --> 00:29:16,559
Network is reliable well it's not unless

782
00:29:16,559 --> 00:29:19,330
it's a local network anybody has any

783
00:29:19,330 --> 00:29:25,299
idea how reliable a wired network is you

784
00:29:25,299 --> 00:29:26,590
see so this is the problem there are

785
00:29:26,590 --> 00:29:29,289
pitfalls to pitfalls if you really have

786
00:29:29,289 --> 00:29:31,990
your data in a rack so just a nice

787
00:29:31,990 --> 00:29:33,789
support for the machines and you use

788
00:29:33,789 --> 00:29:36,490
high quality network cards the

789
00:29:36,490 --> 00:29:38,590
reliability it's incredible it's really

790
00:29:38,590 --> 00:29:40,779
stupid to assume that your network is

791
00:29:40,779 --> 00:29:43,749
not reliable literally ok but if you're

792
00:29:43,749 --> 00:29:45,369
talking about wireless network not

793
00:29:45,369 --> 00:29:46,570
particularly reliable you're talking

794
00:29:46,570 --> 00:29:47,499
about very long distance communication

795
00:29:47,499 --> 00:29:49,990
over many hops not reliable so even this

796
00:29:49,990 --> 00:29:56,649
network is reliable can be bent right by

797
00:29:56,649 --> 00:29:58,029
the way the reliability right now for

798
00:29:58,029 --> 00:30:00,879
wired networks I think is 1 in 10 to the

799
00:30:00,879 --> 00:30:02,860
18 bits gets flipped or something like

800
00:30:02,860 --> 00:30:04,210
that I mean you're not gonna see a bit

801
00:30:04,210 --> 00:30:05,619
flipped in a year basically on a

802
00:30:05,619 --> 00:30:10,149
reliable network local network but

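The bit-error rate quoted here can be turned into an expected count with one multiplication: the number of bits sent times the probability that any one bit flips. The sketch assumes a 1 Gb/s wired link running flat out for a year at a bit-error rate of 1e-18, which is the lecture's own hedged figure. For contrast, it also models a hypothetical noisy wireless link at 1e-6. Both rates are illustrative assumptions.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_bit_flips(link_bps, bit_error_rate, seconds):
    """Expected corrupted bits = bits transmitted * per-bit error probability."""
    return link_bps * seconds * bit_error_rate


wired = expected_bit_flips(1e9, 1e-18, SECONDS_PER_YEAR)  # assumed 1 Gb/s wired LAN, one year
noisy = expected_bit_flips(1e9, 1e-6, 1)                   # hypothetical noisy radio link, one second
print(f"wired link, one year at full load: {wired:.3f} expected flips")
print(f"noisy wireless link, one second:   {noisy:.0f} expected flips")
```

The wired case comes out to about 0.03 expected flips per year, which matches "you're not gonna see a bit flipped in a year". The noisy link at the assumed rate would corrupt on the order of a thousand bits every second.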
803
00:30:10,149 --> 00:30:12,879
you're gonna see a lot of bits messed up

804
00:30:12,879 --> 00:30:14,740
in wireless networks the network is

805
00:30:14,740 --> 00:30:17,769
secure we're talking we are going to

806
00:30:17,769 --> 00:30:19,090
talk a little bit about security and

807
00:30:19,090 --> 00:30:20,259
this is one of the biggest problems

808
00:30:20,259 --> 00:30:22,210
right now there's no such thing as

809
00:30:22,210 --> 00:30:24,129
secure right so you probably already

810
00:30:24,129 --> 00:30:26,169
took or have seen some security in some

811
00:30:26,169 --> 00:30:27,909
other classes all right so it's always

812
00:30:27,909 --> 00:30:30,759
insecure in particular in some ways

813
00:30:30,759 --> 00:30:32,139
cryptography helps a little bit we're gonna

814
00:30:32,139 --> 00:30:34,509
touch on this later the network is

815
00:30:34,509 --> 00:30:36,899
homogeneous that essentially means that

816
00:30:36,899 --> 00:30:38,799
everybody who participates in the

817
00:30:38,799 --> 00:30:39,730
network has roughly the same

818
00:30:39,730 --> 00:30:41,980
characteristics it's absolutely not true

819
00:30:41,980 --> 00:30:44,440
maybe even for the cellphone some people

820
00:30:44,440 --> 00:30:46,330
simply have a very old cell phone you

821
00:30:46,330 --> 00:30:47,830
have to cater to them as well or else say

822
00:30:47,830 --> 00:30:51,490
well unless you run whatever the Samsung

823
00:30:51,490 --> 00:30:53,470
Galaxy 4 I'm not gonna let you go on

824
00:30:53,470 --> 00:30:54,610
this website that's not a particularly

825
00:30:54,610 --> 00:30:56,610
good business model right

826
00:30:56,610 --> 00:30:59,110
topology does not change this is again

827
00:30:59,110 --> 00:31:00,669
not true for many many many reasons

828
00:31:00,669 --> 00:31:03,039
especially once you achieve a certain

829
00:31:03,039 --> 00:31:05,169
size so what's happening is if things

830
00:31:05,169 --> 00:31:07,600
are relatively small these pitfalls

831
00:31:07,600 --> 00:31:09,940
might not be quite as big as a as a

832
00:31:09,940 --> 00:31:12,909
pitfall but beyond a certain size right

833
00:31:12,909 --> 00:31:14,879
things are gonna start going down

834
00:31:14,879 --> 00:31:17,200
sometimes exponentially I'm gonna talk

835
00:31:17,200 --> 00:31:18,490
about this when we talk about sensor

836
00:31:18,490 --> 00:31:19,870
networks ok

837
00:31:19,870 --> 00:31:22,000
I happen to do some research on that so

838
00:31:22,000 --> 00:31:23,230
I know maybe more than the textbook

839
00:31:23,230 --> 00:31:24,850
writers on the issue but all right

840
00:31:24,850 --> 00:31:28,659
latency is zero never zero even when two

841
00:31:28,659 --> 00:31:30,480
cores talk to each other okay

842
00:31:30,480 --> 00:31:34,330
bandwidth is infinite definitely not

843
00:31:34,330 --> 00:31:36,010
true you try to squeeze too much on that

844
00:31:36,010 --> 00:31:37,779
pipe it's not gonna be good and it can

845
00:31:37,779 --> 00:31:41,080
get bad in much weirder ways than just you

846
00:31:41,080 --> 00:31:42,659
reach capacity and you cannot send more

847
00:31:42,659 --> 00:31:45,130
transport cost is zero

848
00:31:45,130 --> 00:31:48,760
there is only one administrator this has

849
00:31:48,760 --> 00:31:50,770
to do more with how you keep up a

850
00:31:50,770 --> 00:31:53,919
large number of machines right so the

851
00:31:53,919 --> 00:31:55,720
large data centers now can easily host

852
00:31:55,720 --> 00:31:58,590
hundred thousand or more machines even

853
00:31:58,590 --> 00:32:01,539
even if you're just Google and you try

854
00:32:01,539 --> 00:32:02,770
to be very organized you still need

855
00:32:02,770 --> 00:32:04,419
multiple admins to take care of so

856
00:32:04,419 --> 00:32:06,970
many machines to do things right so that

857
00:32:06,970 --> 00:32:09,630
assumption might not hold all right

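In code, refusing to assume a reliable, zero-latency network usually means that every remote call gets a deadline and a plan for failure. A common pattern is to retry with capped exponential backoff and random jitter. It is sketched here with hypothetical names: `send` stands for whatever function performs the network call and raises `TimeoutError` or `ConnectionError` when it fails.

```python
import random
import time


def call_with_retries(send, max_attempts=5, base_delay=0.05, max_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, treating timeouts and connection errors as transient."""
    for attempt in range(max_attempts):
        try:
            return send()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)  # exponential, capped
            sleep(delay * random.uniform(0.5, 1.0))            # jitter desynchronizes clients


# Demo: a hypothetical sender whose first two attempts time out.
attempts = {"n": 0}


def flaky_send():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("no reply within deadline")
    return "ack"


result = call_with_retries(flaky_send, sleep=lambda s: None)  # skip real sleeping in the demo
print(result, attempts["n"])
```

Retrying is only safe when the request is idempotent or carries an ID that the server can deduplicate. Otherwise, retrying after a lost reply can perform the same operation twice.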
858
00:32:09,630 --> 00:32:12,580
now one of the more direct ways to put

859
00:32:12,580 --> 00:32:14,110
together a distributed system and this

860
00:32:14,110 --> 00:32:16,720
was actually explored quite a lot is to

861
00:32:16,720 --> 00:32:18,610
have some sort of cluster computing

862
00:32:18,610 --> 00:32:19,840
system in which you have multiple

863
00:32:19,840 --> 00:32:21,970
independent machines you have a network

864
00:32:21,970 --> 00:32:23,980
connection in between them and some sort

865
00:32:23,980 --> 00:32:28,360
of a software middleware of some sort to

866
00:32:28,360 --> 00:32:30,610
do some coordinated tasks now we are

867
00:32:30,610 --> 00:32:32,230
used to coordinated tasks because that's

868
00:32:32,230 --> 00:32:33,460
how we interact with other human beings

869
00:32:33,460 --> 00:32:34,270
right

870
00:32:34,270 --> 00:32:36,580
we normally I mean normally I would say

871
00:32:36,580 --> 00:32:38,890
we don't but I'm gonna mention it's like

872
00:32:38,890 --> 00:32:40,870
we normally don't control each other's

873
00:32:40,870 --> 00:32:43,779
brains except that some guys from I

874
00:32:43,779 --> 00:32:45,460
don't remember not Carnegie Mellon one

875
00:32:45,460 --> 00:32:46,809
of the universities managed to get the

876
00:32:46,809 --> 00:32:48,760
brain to brain interface a guy was

877
00:32:48,760 --> 00:32:49,990
thinking about something and got another

878
00:32:49,990 --> 00:32:51,490
guy to move something on the screen it's

879
00:32:51,490 --> 00:32:53,409
somewhere on YouTube or whatever right

880
00:32:53,409 --> 00:32:55,779
so the brain to brain interfaces are

881
00:32:55,779 --> 00:32:57,940
coming but for now we are just doing

882
00:32:57,940 --> 00:33:01,179
something where we communicate one way or

883
00:33:01,179 --> 00:33:03,370
another and we do something the same thing

884
00:33:03,370 --> 00:33:04,510
has to be done with computers and this

885
00:33:04,510 --> 00:33:05,950
is really what distributed systems are

886
00:33:05,950 --> 00:33:06,520
all about

887
00:33:06,520 --> 00:33:10,539
so one classic way to set this up is to

888
00:33:10,539 --> 00:33:13,120
have some sort of a master node that

889
00:33:13,120 --> 00:33:14,950
coordinates all the activities and then

890
00:33:14,950 --> 00:33:16,330
a number of worker nodes except that

891
00:33:16,330 --> 00:33:19,090
this breaks one of those obsessions

892
00:33:19,090 --> 00:33:21,789
right which is decentralization

893
00:33:21,789 --> 00:33:23,140
the algorithm is centralized

894
00:33:23,140 --> 00:33:27,789
we have only one boss right so if the

895
00:33:27,789 --> 00:33:31,059
boss dies the whole cluster is unusable

896
00:33:31,059 --> 00:33:32,440
and

897
00:33:32,440 --> 00:33:35,130
yes and we're gonna see this in class we

898
00:33:35,130 --> 00:33:37,000
designate somebody else to be the boss

899
00:33:37,000 --> 00:33:38,830
and it takes over as the boss right

900
00:33:38,830 --> 00:33:40,300
so that's for example called leader

901
00:33:40,300 --> 00:33:42,730
election so one fancy thing that the distributed

902
00:33:42,730 --> 00:33:44,650
systems people looked at is hey how

903
00:33:44,650 --> 00:33:47,080
could you design a system in which

904
00:33:47,080 --> 00:33:49,060
we can have a boss that's

905
00:33:49,060 --> 00:33:50,350
not a problem and that actually makes

906
00:33:50,350 --> 00:33:51,430
the algorithm faster but if the boss

907
00:33:51,430 --> 00:33:52,780
dies we elect another boss and that guy

908
00:33:52,780 --> 00:33:54,460
takes over and we switch over to another

909
00:33:54,460 --> 00:33:55,870
version of the algorithm and we are fine

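The failover idea above (the boss dies, the survivors pick a new boss, and work continues) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the course's algorithm: failure is detected with heartbeat timeouts, and the new leader is simply the highest-id node still heartbeating (a bully-style rule, one common choice among several). `Cluster`, `heartbeat`, `alive` and `check_leader` are hypothetical names, and the sketch ignores the harder case where a "dead" leader is only slow.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy view of a cluster: last heartbeat time per node plus the current leader."""

    heartbeat_timeout: float  # seconds of silence before a node is presumed dead
    last_heartbeat: dict[int, float] = field(default_factory=dict)
    leader: int | None = None

    def heartbeat(self, node: int, now: float) -> None:
        # Record that `node` was heard from at time `now`.
        self.last_heartbeat[node] = now

    def alive(self, now: float) -> list[int]:
        # A node is presumed alive if it was heard from within the timeout window.
        return [n for n, t in self.last_heartbeat.items()
                if now - t <= self.heartbeat_timeout]

    def check_leader(self, now: float) -> int | None:
        live = self.alive(now)
        if self.leader not in live:
            # Assumed election rule: the highest-id live node becomes the new boss.
            self.leader = max(live) if live else None
        return self.leader
```

For example, with nodes 1, 2 and 3 heartbeating at time 0, `check_leader(now=1.0)` picks node 3; if only nodes 1 and 2 keep heartbeating, a check at time 4.0 with a 3-second timeout presumes node 3 dead and picks node 2.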
910
00:33:55,870 --> 00:33:59,770
right if you do that then you have a

911
00:33:59,770 --> 00:34:02,110
very hard-to-kill network you literally

912
00:34:02,110 --> 00:34:06,730
have to disrupt a lot more things than

913
00:34:06,730 --> 00:34:09,760
just kill the boss right what

914
00:34:09,760 --> 00:34:11,440
happens if a computer goes down for

915
00:34:11,440 --> 00:34:13,960
example well if the boss knows what a

916
00:34:13,960 --> 00:34:15,520
computer was doing it could delegate

917
00:34:15,520 --> 00:34:17,620
exactly the same work to another worker

918
00:34:17,620 --> 00:34:19,810
but that assumes that the data is then

919
00:34:19,810 --> 00:34:22,120
available on another node so you need some

920
00:34:22,120 --> 00:34:23,650
redundancy in the data to be able to do

921
00:34:23,650 --> 00:34:25,389
these things these are the kind of

922
00:34:25,389 --> 00:34:28,239
techniques used in distributed systems to

923
00:34:28,239 --> 00:34:29,710
design such systems that are going to be

924
00:34:29,710 --> 00:34:31,210
resilient one way or another to

925
00:34:31,210 --> 00:34:35,800
particular kinds of faults okay the

926
00:34:35,800 --> 00:34:38,230
trouble is when the master node is not

927
00:34:38,230 --> 00:34:40,870
dead but it's a little bit choppy right

928
00:34:40,870 --> 00:34:43,030
or a little bit more than choppy right then

929
00:34:43,030 --> 00:34:45,219
it's a problem because you might say hey

930
00:34:45,219 --> 00:34:47,860
let's assume that that guy is dead you

931
00:34:47,860 --> 00:34:49,719
bring another master and then suddenly

932
00:34:49,719 --> 00:34:51,100
that guy wakes up and you have two

933
00:34:51,100 --> 00:34:52,840
masters and you can wreak havoc in your

934
00:34:52,840 --> 00:34:54,520
algorithm when you get contradictory

935
00:34:54,520 --> 00:34:57,330
information from now two master nodes

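One standard remedy for the two-masters problem, not named in the lecture and shown here only as an illustration, is to number each master's term with an epoch (sometimes called a fencing token) and have workers refuse any command carrying an epoch older than the newest one they have already accepted. A minimal Python sketch, with the hypothetical names `Worker`, `handle` and `highest_epoch`:

```python
from __future__ import annotations


class Worker:
    """Toy worker that ignores commands from masters with a stale epoch."""

    def __init__(self) -> None:
        self.highest_epoch = 0     # largest master epoch this worker has accepted
        self.log: list[str] = []   # commands actually executed, in order

    def handle(self, epoch: int, command: str) -> bool:
        if epoch < self.highest_epoch:
            # An older master that was replaced while it was unresponsive:
            # refuse, so the two masters cannot both drive this worker.
            return False
        self.highest_epoch = epoch
        self.log.append(command)
        return True
```

If master 1 (epoch 1) stalls, master 2 is elected with epoch 2 and sends a command, and then master 1 wakes up and sends another, the worker executes the first two commands and rejects the third. This only helps if two masters can never hold the same epoch; in practice that uniqueness is what a majority-based election provides, and the sketch simply assumes it.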
936
00:34:57,330 --> 00:35:00,880
it's doable things can be done right but

937
00:35:00,880 --> 00:35:03,160
that's what distributed systems are about these

938
00:35:03,160 --> 00:35:04,420
are some of the core issues in

939
00:35:04,420 --> 00:35:06,430
distributed systems now grid computing

940
00:35:06,430 --> 00:35:07,750
systems I don't want to talk much about

941
00:35:07,750 --> 00:35:12,280
this was an extremely hot topic for a

942
00:35:12,280 --> 00:35:13,750
number of years and now it's dying down

943
00:35:13,750 --> 00:35:16,930
because other buzzwords came up by the

944
00:35:16,930 --> 00:35:20,470
way there is no area in computer science

945
00:35:20,470 --> 00:35:22,030
that has more buzzwords than

946
00:35:22,030 --> 00:35:25,510
distributed systems maybe databases okay

947
00:35:25,510 --> 00:35:27,310
but in between these two you have

948
00:35:27,310 --> 00:35:30,460
the most buzzwords so for example who

949
00:35:30,460 --> 00:35:32,760
here knows what cloud computing is

950
00:35:32,760 --> 00:35:36,010
what's cloud computing well let me tell

951
00:35:36,010 --> 00:35:39,310
you a secret big companies were asked

952
00:35:39,310 --> 00:35:42,070
specifically to define what they mean by

953
00:35:42,070 --> 00:35:44,260
cloud computing and they still did not

954
00:35:44,260 --> 00:35:45,970
quite want to answer

955
00:35:45,970 --> 00:35:49,000
right so cloud computing it's anything

956
00:35:49,000 --> 00:35:52,720
that has some sort of internet access

957
00:35:52,720 --> 00:35:55,510
one way or another to allow some sort of

958
00:35:55,510 --> 00:35:57,160
two machines to do the same thing at the

959
00:35:57,160 --> 00:35:59,319
same time maybe because they call cloud

960
00:35:59,319 --> 00:36:00,550
computing some things that don't have

961
00:36:00,550 --> 00:36:02,619
this characteristic right so cloud

962
00:36:02,619 --> 00:36:04,450
computing is one of the biggest buzz

963
00:36:04,450 --> 00:36:06,400
words literally they took everything

964
00:36:06,400 --> 00:36:08,140
that happened in distributed systems for

965
00:36:08,140 --> 00:36:10,480
the last 40 years and they said let's

966
00:36:10,480 --> 00:36:12,220
call this cloud computing because it's

967
00:36:12,220 --> 00:36:14,440
gonna sound so cool to the normal people

968
00:36:14,440 --> 00:36:17,200
oh it's in the cloud now right it's

969
00:36:17,200 --> 00:36:19,260
whatever it is it's in the cloud right

970
00:36:19,260 --> 00:36:22,390
and then Amazon Web Services goes down

971
00:36:22,390 --> 00:36:24,190
and takes down hundreds of important

972
00:36:24,190 --> 00:36:26,500
websites because everybody was in the

973
00:36:26,500 --> 00:36:30,069
cloud right so the series systems are

974
00:36:30,069 --> 00:36:30,819
very best I guess

975
00:36:30,819 --> 00:36:32,530
okay so grid computing is another

976
00:36:32,530 --> 00:36:34,630
big buzzword but more in the scientific

977
00:36:34,630 --> 00:36:36,790
world they said hey we want to

978
00:36:36,790 --> 00:36:38,319
generalize a little bit the simple model

979
00:36:38,319 --> 00:36:39,640
in which you have a master and you have

980
00:36:39,640 --> 00:36:42,160
computer nodes and so on let's put

981
00:36:42,160 --> 00:36:43,569
together millions of machines right all

982
00:36:43,569 --> 00:36:45,069
the universities have lots of computers

983
00:36:45,069 --> 00:36:46,599
I mean we have lots of computers that

984
00:36:46,599 --> 00:36:48,099
you and I have and we have lots of computers at

985
00:36:48,099 --> 00:36:50,260
other universities let's just form this

986
00:36:50,260 --> 00:36:51,550
kind of computational network

987
00:36:51,550 --> 00:36:53,380
between them so the ideal situation would be

988
00:36:53,380 --> 00:36:55,119
I need to run a large-scale simulation

989
00:36:55,119 --> 00:36:58,329
let's say right it's very easy to do

990
00:36:58,329 --> 00:37:00,490
large-scale simulations all you have to

991
00:37:00,490 --> 00:37:03,339
do is say hey let's simulate for

992
00:37:03,339 --> 00:37:06,900
example how a hundred billion stars

993
00:37:06,900 --> 00:37:08,740
gravitationally interact with each other

994
00:37:08,740 --> 00:37:11,859
to figure out if the shape we see for

995
00:37:11,859 --> 00:37:14,710
galaxies makes sense or not right some

996
00:37:14,710 --> 00:37:16,089
differential equations then just solve

997
00:37:16,089 --> 00:37:19,869
them for a hundred billion points right so it

998
00:37:19,869 --> 00:37:23,170
is easy to describe but takes

999
00:37:23,170 --> 00:37:24,849
forever to get anything done in that

1000
00:37:24,849 --> 00:37:27,579
area so then the idea was hey we need

1001
00:37:27,579 --> 00:37:29,560
some sort of a protocol some sort of

1002
00:37:29,560 --> 00:37:30,880
communication between all the parts

1003
00:37:30,880 --> 00:37:33,099
involved so then we can even schedule

1004
00:37:33,099 --> 00:37:34,869
such jobs then you have many problems

1005
00:37:34,869 --> 00:37:36,520
right machines are far away so large

1006
00:37:36,520 --> 00:37:38,230
latency so we cannot really

1007
00:37:38,230 --> 00:37:39,880
communicate very fast and cannot make

1008
00:37:39,880 --> 00:37:43,089
that assumption machines go up

1009
00:37:43,089 --> 00:37:47,200
and down what happens then and so on and

1010
00:37:47,200 --> 00:37:48,490
so forth and they started designing

1011
00:37:48,490 --> 00:37:49,930
these kind of protocols for grid

1012
00:37:49,930 --> 00:37:51,130
computing the trouble is they never

1013
00:37:51,130 --> 00:37:53,020
finished and they never really got to a

1014
00:37:53,020 --> 00:37:54,310
point where they have a standard that

1015
00:37:54,310 --> 00:37:55,780
everybody uses and these kind of

1016
00:37:55,780 --> 00:37:57,940
deployments went away because people said hey

1017
00:37:57,940 --> 00:37:59,800
this is not cool enough let's go and do cloud

1018
00:37:59,800 --> 00:38:02,410
right which essentially is what grid

1019
00:38:02,410 --> 00:38:03,580
computing was trying to do in the first

1020
00:38:03,580 --> 00:38:05,050
place but with slightly different

1021
00:38:05,050 --> 00:38:06,640
variations on top of it I'm not gonna go

1022
00:38:06,640 --> 00:38:08,260
through what it takes to do it

1023
00:38:08,260 --> 00:38:10,660
right because it's not even clear that

1024
00:38:10,660 --> 00:38:13,630
this is the right solution let me tell

1025
00:38:13,630 --> 00:38:15,490
you also a kind of a secret nobody

1026
00:38:15,490 --> 00:38:17,050
really knows what the right solution is

1027
00:38:17,050 --> 00:38:19,570
we have solutions we have things but

1028
00:38:19,570 --> 00:38:22,840
students ask me quite often I want to

1029
00:38:22,840 --> 00:38:25,120
do this what's the best solution well my

1030
00:38:25,120 --> 00:38:27,160
answer always is I do not know because

1031
00:38:27,160 --> 00:38:29,440
it depends on so many things you can

1032
00:38:29,440 --> 00:38:31,210
turn on and off some of the obsessions

1033
00:38:31,210 --> 00:38:33,490
right or some of those pitfalls that you

1034
00:38:33,490 --> 00:38:35,500
care about and don't care about and then

1035
00:38:35,500 --> 00:38:36,820
you might find a solution for those

1036
00:38:36,820 --> 00:38:38,260
circumstances but the trouble is which

1037
00:38:38,260 --> 00:38:39,790
ones should you keep on and which one

1038
00:38:39,790 --> 00:38:43,450
should you keep off to a large extent

1039
00:38:43,450 --> 00:38:45,190
until you build it and deploy it you

1040
00:38:45,190 --> 00:38:46,650
don't quite know how it is going to work

1041
00:38:46,650 --> 00:38:49,450
right and a lot of the things that look

1042
00:38:49,450 --> 00:38:51,940
good when you've done only a simulation

1043
00:38:51,940 --> 00:38:53,470
of how they might work are in fact

1044
00:38:53,470 --> 00:38:56,320
disastrous when you deploy them right

1045
00:38:56,320 --> 00:38:58,240
and things that theoretically should not

1046
00:38:58,240 --> 00:38:59,560
have happened happened nevertheless

1047
00:38:59,560 --> 00:39:01,480
right Google going down for three minutes

1048
00:39:01,480 --> 00:39:03,610
Amazon Web Services going down for

1049
00:39:03,610 --> 00:39:10,810
almost a day right now transaction

1050
00:39:10,810 --> 00:39:12,430
processing systems how many people know

1051
00:39:12,430 --> 00:39:16,030
what a transaction is so what are

1052
00:39:16,030 --> 00:39:18,670
these transactions right I mean first of

1053
00:39:18,670 --> 00:39:20,410
all why should we care about why do we

1054
00:39:20,410 --> 00:39:23,620
invent so many complicated topics and so

1055
00:39:23,620 --> 00:39:25,840
many complicated gizmos and then

1056
00:39:25,840 --> 00:39:27,850
spend a lifetime trying to do something

1057
00:39:27,850 --> 00:39:29,440
with them so why should we care about

1058
00:39:29,440 --> 00:39:31,720
transactions I briefly mentioned this

1059
00:39:31,720 --> 00:39:35,140
before right transactions are important

1060
00:39:35,140 --> 00:39:37,690
because they simplify the way the user

1061
00:39:37,690 --> 00:39:40,740
can think about what's going on I'm

1062
00:39:40,740 --> 00:39:43,740
sorry

1063
00:39:44,799 --> 00:39:47,599
no that's not what transactions are

1064
00:39:47,599 --> 00:39:49,219
all about what transactions are really

1065
00:39:49,219 --> 00:39:52,309
all about okay is to make a very strong

1066
00:39:52,309 --> 00:39:55,909
promise as a system to simplify what the

1067
00:39:55,909 --> 00:39:58,699
user thinks is happening so it allows

1068
00:39:58,699 --> 00:40:03,289
you to wipe out mistakes by just saying

1069
00:40:03,289 --> 00:40:05,959
abort transaction alright so think about

1070
00:40:05,959 --> 00:40:08,179
it even for a program you can say begin

1071
00:40:08,179 --> 00:40:10,579
transaction go and change variables and

1072
00:40:10,579 --> 00:40:12,859
do things and so on and say hey I ran into

1073
00:40:12,859 --> 00:40:15,109
trouble why don't we forget that I

1074
00:40:15,109 --> 00:40:16,609
started whatever I wanted to do abort

1075
00:40:16,609 --> 00:40:18,229
transaction and make it exactly like it

1076
00:40:18,229 --> 00:40:20,899
was before wouldn't that be cool right

1077
00:40:20,899 --> 00:40:23,659
say I ran into trouble go back to the

1078
00:40:23,659 --> 00:40:25,549
point we had five minutes ago and

1079
00:40:25,549 --> 00:40:27,380
let's start from there because I'm gonna

1080
00:40:27,380 --> 00:40:28,899
try to do things in a different way

1080
00:40:27,380 --> 00:40:28,899
as a user right it's one way or the

1082
00:40:32,359 --> 00:40:34,429
other which is you start doing

1083
00:40:34,429 --> 00:40:37,249
something you do whatever you do there

1084
00:40:37,249 --> 00:40:38,269
is always the possibility you have to

1085
00:40:38,269 --> 00:40:40,099
cancel but when you know for sure that

1086
00:40:40,099 --> 00:40:41,509
you completed the task you say hey I

1087
00:40:41,509 --> 00:40:44,149
want to now remember this state and

1088
00:40:44,149 --> 00:40:46,489
in particular I want to make sure that

1089
00:40:46,489 --> 00:40:49,009
if I make modifications to that state

1090
00:40:49,009 --> 00:40:50,899
and somebody else makes modifications to

1091
00:40:50,899 --> 00:40:52,369
the state to me it looks like a simple

1092
00:40:52,369 --> 00:40:54,529
scenario in which I was the only

1093
00:40:54,529 --> 00:40:55,759
one modifying the state

1094
00:40:55,759 --> 00:40:57,979
so these transactional systems allow you

1095
00:40:57,979 --> 00:41:00,349
to have this significantly simplified

1096
00:41:00,349 --> 00:41:02,239
view I'm the only one using the system I

1097
00:41:02,239 --> 00:41:03,949
can roll back that's kind of the abort

1098
00:41:03,949 --> 00:41:06,649
in transactions or if I say yes the

1099
00:41:06,649 --> 00:41:08,689
transaction is committed then everybody

1100
00:41:08,689 --> 00:41:09,889
knows the transaction is committed

1101
00:41:09,889 --> 00:41:11,630
alright this is important for example

1102
00:41:11,630 --> 00:41:14,449
when you do money transactions all right

1103
00:41:14,449 --> 00:41:15,319
you can get these sophisticated

1104
00:41:15,319 --> 00:41:17,239
scenarios in which you want to basically

1105
00:41:17,239 --> 00:41:21,709
go and for example move money

1106
00:41:21,709 --> 00:41:23,659
between banks you have to take money

1107
00:41:23,659 --> 00:41:25,489
from one and put it in the other so it

1108
00:41:25,489 --> 00:41:27,289
has to be the case it doesn't matter if

1109
00:41:27,289 --> 00:41:28,759
there are glitches if people unplug the

1110
00:41:28,759 --> 00:41:30,169
cord or whatever they do it has to be

1111
00:41:30,169 --> 00:41:31,429
the case that you either didn't do

1112
00:41:31,429 --> 00:41:32,809
anything in the first place and you tell

1113
00:41:32,809 --> 00:41:35,829
the user your transaction got aborted or

1114
00:41:35,829 --> 00:41:38,299
you tell the user your transaction made

1115
00:41:38,299 --> 00:41:40,009
it through and then you took the money

1116
00:41:40,009 --> 00:41:41,509
from this Bank and you put it in the

1117
00:41:41,509 --> 00:41:42,859
other one because otherwise you're in

1118
00:41:42,859 --> 00:41:44,359
trouble you have to conserve the amount

1119
00:41:44,359 --> 00:41:46,819
of money because otherwise it's not good

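The begin, abort and commit promise, together with the bank example above, can be illustrated with a single-process key-value store that keeps an undo log: every write first records the old value, abort replays the log backwards, and commit discards it. This is a minimal sketch assuming one user, no concurrency and no crash durability (a real engine also needs locking or versioning and a log on disk); `TxStore` and `transfer` are hypothetical names, and the `crash_midway` flag only simulates a glitch between taking the money out and putting it in.

```python
from __future__ import annotations


class TxStore:
    """Toy in-memory key-value store with begin/commit/abort via an undo log."""

    def __init__(self, data: dict[str, int]) -> None:
        self.data = dict(data)
        self._undo: list[tuple[str, int]] | None = None  # None means no open transaction

    def begin(self) -> None:
        self._undo = []

    def write(self, key: str, value: int) -> None:
        if self._undo is None:
            raise RuntimeError("write outside a transaction")
        self._undo.append((key, self.data[key]))  # remember the old value first
        self.data[key] = value

    def commit(self) -> None:
        self._undo = None  # keep the new values and forget how to undo them

    def abort(self) -> None:
        # Replay the undo log backwards so the state is exactly as before begin().
        for key, old in reversed(self._undo or []):
            self.data[key] = old
        self._undo = None


def transfer(store: TxStore, src: str, dst: str, amount: int,
             crash_midway: bool = False) -> bool:
    """Move `amount` from `src` to `dst`: either both writes happen or neither does."""
    store.begin()
    try:
        store.write(src, store.data[src] - amount)
        if crash_midway:
            raise RuntimeError("simulated glitch between debit and credit")
        store.write(dst, store.data[dst] + amount)
    except RuntimeError:
        store.abort()
        return False
    store.commit()
    return True
```

Starting from 100 in `bank_a` and 50 in `bank_b`, a normal transfer of 30 leaves 70 and 80; a second transfer that crashes midway is aborted and the balances stay at 70 and 80, so the total of 150 is conserved either way.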
1120
00:41:46,819 --> 00:41:50,329
right so these transactions come from

1121
00:41:50,329 --> 00:41:52,699
database systems and were first implemented

1122
00:41:52,699 --> 00:41:55,540
in the super expensive database engines

1123
00:41:55,540 --> 00:41:57,640
right but now we are interested in this

1124
00:41:57,640 --> 00:42:00,430
kind of distributed transactions across

1125
00:42:00,430 --> 00:42:02,800
databases in which I might

1126
00:42:02,800 --> 00:42:05,290
have an airline database and some bank

1127
00:42:05,290 --> 00:42:07,270
database and whatnot and I want if the

1128
00:42:07,270 --> 00:42:08,710
transaction the overall transaction goes

1129
00:42:08,710 --> 00:42:09,820
through and I think that is a teacher

1130
00:42:09,820 --> 00:42:12,610
here yes right these are nasty

1131
00:42:12,610 --> 00:42:15,880
transactions right if the transaction in

1132
00:42:15,880 --> 00:42:17,500
the airline database goes through and the

1133
00:42:17,500 --> 00:42:19,930
hotel database goes through then the

1134
00:42:19,930 --> 00:42:21,310
whole transaction goes through but if

1135
00:42:21,310 --> 00:42:22,690
there is any failure anywhere you have

1136
00:42:22,690 --> 00:42:26,830
to roll back both right now from the

1137
00:42:26,830 --> 00:42:28,570
user's point of view it's fantastic from

1138
00:42:28,570 --> 00:42:30,010
the application design point of view it's

1139
00:42:30,010 --> 00:42:31,600
fantastic you just say begin transaction

1140
00:42:31,600 --> 00:42:34,210
do whatever end transaction or abort

1141
00:42:34,210 --> 00:42:36,190
transaction no worries in the world from

1142
00:42:36,190 --> 00:42:37,840
the system point of view it's really

1143
00:42:37,840 --> 00:42:39,550
really hard to implement these

1144
00:42:39,550 --> 00:42:40,930
transactions so what we are going to be

1145
00:42:40,930 --> 00:42:42,400
interested in in this class is how do you

1146
00:42:42,400 --> 00:42:43,900
implement these distributed transactions

1147
00:42:43,900 --> 00:42:45,820
when multiple parties are involved you

1148
00:42:45,820 --> 00:42:47,950
see the trouble is it's not enough to

1149
00:42:47,950 --> 00:42:50,380
have one server say commit and another

1150
00:42:50,380 --> 00:42:53,680
one say abort that's a disaster so

1151
00:42:53,680 --> 00:42:54,880
imagine that you have a hundred servers

1152
00:42:54,880 --> 00:42:56,110
involved we're going to see these

1153
00:42:56,110 --> 00:42:57,790
examples in class a hundred servers

1154
00:42:57,790 --> 00:42:59,260
involved they all keep copies of the

1155
00:42:59,260 --> 00:43:00,970
data you have to replicate data to keep

1156
00:43:00,970 --> 00:43:02,530
it in all those different corners of the

1157
00:43:02,530 --> 00:43:05,290
world to run faster right and they all

1158
00:43:05,290 --> 00:43:07,150
now have to agree whether they commit or

1159
00:43:07,150 --> 00:43:08,590
abort oops

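The lecture does not name a protocol at this point, but the textbook way to make every server reach the same commit-or-abort decision is two-phase commit: a coordinator first asks every participant to vote (prepare), then broadcasts commit only if all of them voted yes, and abort otherwise. A minimal in-memory Python sketch, assuming no crashes, no lost messages and no timeouts; `Participant`, `prepare`, `finish` and `two_phase_commit` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Participant:
    """Toy participant, e.g. one replica or one airline/hotel database."""

    name: str
    can_commit: bool = True  # would this participant vote yes?
    state: str = "idle"      # idle -> prepared/aborted -> committed/aborted

    def prepare(self) -> bool:
        # Phase 1: vote. A yes vote is a promise to commit if the coordinator asks.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the coordinator's single global decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    votes = [p.prepare() for p in participants]  # phase 1: collect every vote
    decision = all(votes)                        # commit only if everyone said yes
    for p in participants:                       # phase 2: everyone gets the same answer
        p.finish(decision)
    return decision
```

With an airline and a hotel participant that both vote yes, both end up committed; if the hotel votes no, both end up aborted, so no server commits while another aborts. Real deployments must also handle a coordinator that crashes between the two phases, which can leave prepared participants blocked; this sketch deliberately leaves that case out.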
1160
00:43:08,590 --> 00:43:11,230
right so yeah it's very very

1161
00:43:11,230 --> 00:43:12,610
tricky and even if you have multiple

1162
00:43:12,610 --> 00:43:13,930
servers in a single rack it can get

1163
00:43:13,930 --> 00:43:16,630
tricky because your expectations go

1164
00:43:16,630 --> 00:43:18,670
higher I want to run a hundred thousand

1165
00:43:18,670 --> 00:43:20,920
transactions per second right and then a

1166
00:43:20,920 --> 00:43:24,370
message looks too slow potentially and

1167
00:43:24,370 --> 00:43:27,520
none of these things have unique

1168
00:43:27,520 --> 00:43:29,830
solutions or perfect solutions it's

1169
00:43:29,830 --> 00:43:32,140
almost always about a compromise that's

1170
00:43:32,140 --> 00:43:33,610
acceptable for those kinds of

1171
00:43:33,610 --> 00:43:35,230
circumstances and it's all about

1172
00:43:35,230 --> 00:43:36,550
compromises in computer science by the

1173
00:43:36,550 --> 00:43:39,430
way right can we find a reasonable

1174
00:43:39,430 --> 00:43:41,980
common ground between the requirements

1175
00:43:41,980 --> 00:43:44,020
and what's possible to do now we know

1176
00:43:44,020 --> 00:43:45,940
for sure that it is not possible

1177
00:43:45,940 --> 00:43:48,730
theoretically to produce ideal solutions

1178
00:43:48,730 --> 00:43:50,290
for certain problems and those are what

1179
00:43:50,290 --> 00:43:51,840
the negative results are all about a

1180
00:43:51,840 --> 00:43:53,860
negative result in distributed systems

1181
00:43:53,860 --> 00:43:55,870
essentially says you can't have your

1182
00:43:55,870 --> 00:43:58,810
cake and eat it too under even relaxed

1183
00:43:58,810 --> 00:44:00,430
circumstances I mean we're gonna make

1184
00:44:00,430 --> 00:44:02,890
that precise later what it means okay or

1185
00:44:02,890 --> 00:44:05,080
you can have your cake and eat it too but

1186
00:44:05,080 --> 00:44:06,400
you need something that you can't have

1187
00:44:06,400 --> 00:44:08,750
like a global clock so

1188
00:44:08,750 --> 00:44:10,250
something somewhere has to give you

1189
00:44:10,250 --> 00:44:11,960
can't really have everything now that

1190
00:44:11,960 --> 00:44:12,890
doesn't mean that you cannot have

1191
00:44:12,890 --> 00:44:14,510
practical solutions it just means that

1192
00:44:14,510 --> 00:44:16,760
if you have unrealistic expectations

1193
00:44:16,760 --> 00:44:20,930
you can't achieve them right any

1194
00:44:20,930 --> 00:44:22,430
negative results from math you should know

1195
00:44:22,430 --> 00:44:24,050
about right this is what Gödel did

1196
00:44:24,050 --> 00:44:26,930
right he cut down deep into the

1197
00:44:26,930 --> 00:44:28,970
heart of the mathematicians by

1198
00:44:28,970 --> 00:44:31,130
saying you cannot completely axiomatize arithmetic

1199
00:44:31,130 --> 00:44:33,230
that was devastating for mathematics

1200
00:44:33,230 --> 00:44:35,090
until they figured out that it's a

1201
00:44:35,090 --> 00:44:37,130
profitable business to prove more

1202
00:44:37,130 --> 00:44:42,650
negative results anyway so yes right

1203
00:44:42,650 --> 00:44:44,540
this is how a transaction processing

1204
00:44:44,540 --> 00:44:47,510
system might look like maybe some sort

1205
00:44:47,510 --> 00:44:48,740
of a monitor the client talks to

1206
00:44:48,740 --> 00:44:50,210
the monitor and the monitor talks to multiple

1207
00:44:50,210 --> 00:44:52,280
servers we'll have extensive discussions in class

1208
00:44:52,280 --> 00:44:55,670
about it right enterprise application

1209
00:44:55,670 --> 00:44:57,560
integration this is a very very big area

1210
00:44:57,560 --> 00:44:59,690
I mean UF is in fact an enterprise

1211
00:44:59,690 --> 00:45:01,250
we have hundreds of thousands of

1212
00:45:01,250 --> 00:45:03,740
computers very sophisticated database

1213
00:45:03,740 --> 00:45:05,840
servers that don't always do the right

1214
00:45:05,840 --> 00:45:08,240
thing right how do you make all of those

1215
00:45:08,240 --> 00:45:09,980
things come together and have some sort

1216
00:45:09,980 --> 00:45:12,920
of smooth information sharing right think of

1217
00:45:12,920 --> 00:45:15,650
this PeopleSoft system right it took

1218
00:45:15,650 --> 00:45:18,170
only ten years to make it not be really

1219
00:45:18,170 --> 00:45:18,680
bad

1220
00:45:18,680 --> 00:45:25,460
all right now pervasive systems are

1221
00:45:25,460 --> 00:45:26,990
really the cell phones that are

1222
00:45:26,990 --> 00:45:28,970
everywhere or other devices right Google

1223
00:45:28,970 --> 00:45:30,710
glasses right who doesn't want Google

1224
00:45:30,710 --> 00:45:33,320
glasses my kid already started

1225
00:45:33,320 --> 00:45:35,360
saving money for Google glasses

1226
00:45:35,360 --> 00:45:38,060
he's 11 that's the last thing I want in

1227
00:45:38,060 --> 00:45:40,730
my house right Google glasses but so

1228
00:45:40,730 --> 00:45:42,920
these systems are possible because now we can

1229
00:45:42,920 --> 00:45:44,660
put the electronics everywhere you can

1230
00:45:44,660 --> 00:45:45,020
have

1231
00:45:45,020 --> 00:45:47,510
smartwatches by the way

1232
00:45:47,510 --> 00:45:50,740
apparently Samsung is gonna beat

1233
00:47:50,740 --> 00:47:53,240
Apple at producing the first

1234
00:45:53,240 --> 00:45:56,570
SmartWatch they announced some release

1235
00:45:56,570 --> 00:45:59,090
date in September okay so start

1236
00:45:59,090 --> 00:46:01,970
saving money there is already a startup

1237
00:46:01,970 --> 00:46:04,250
that is trying to do the same thing so

1238
00:46:04,250 --> 00:46:05,720
on and so forth right so these devices

1239
00:46:05,720 --> 00:46:06,920
are basically everywhere or they are

1240
00:46:06,920 --> 00:46:08,120
talking about medical systems in which

1241
00:46:08,120 --> 00:46:09,590
you put all kinds of sensors and all

1242
00:46:09,590 --> 00:46:10,970
kinds of tags on everybody and then you

1243
00:46:10,970 --> 00:46:12,890
know how everybody is doing or not doing

1244
00:46:12,890 --> 00:46:14,690
so it's what they do with the astronauts

1245
00:46:14,690 --> 00:46:17,030
when they go at least they used to I'm not

1246
00:46:17,030 --> 00:46:18,500
really sure about that right when

1247
00:46:18,500 --> 00:46:20,090
they sent people to the moon they had

1248
00:46:20,090 --> 00:46:21,500
these sensors on them so that the

1249
00:46:21,500 --> 00:46:22,390
doctors on Earth could

1250
00:46:22,390 --> 00:46:24,490
monitor them and know what's going on

1251
00:46:24,490 --> 00:46:26,260
with them you have no reason not to do

1252
00:46:26,260 --> 00:46:27,880
that from a technological point of view

1253
00:46:27,880 --> 00:46:29,230
in the next few years but the question

1254
00:46:29,230 --> 00:46:30,190
is what are you gonna do with all that

1255
00:46:30,190 --> 00:46:31,359
information how do you put it together

1256
00:46:31,359 --> 00:46:33,190
how do you process it how do you not

1257
00:46:33,190 --> 00:46:34,869
overwhelm the systems systems are

1258
00:46:34,869 --> 00:46:36,700
already very overwhelmed even with much

1259
00:46:36,700 --> 00:46:38,799
less information right cell towers are

1260
00:46:38,799 --> 00:46:41,140
overwhelmed even by the relatively

1261
00:46:41,140 --> 00:46:42,700
simple communication we have with

1262
00:46:42,700 --> 00:46:44,680
cell phones right so you could have

1263
00:46:44,680 --> 00:46:46,809
these things everywhere you can

1264
00:46:46,809 --> 00:46:49,569
technologically pull them off but then

1265
00:46:49,569 --> 00:46:51,190
you have to do something with it they

1266
00:46:51,190 --> 00:46:52,900
are talking about putting sensors in the

1267
00:46:52,900 --> 00:46:55,720
cell phones if you had a

1268
00:46:55,720 --> 00:46:57,490
temperature sensor and a pressure sensor

1269
00:46:57,490 --> 00:46:59,319
in the cell phone you don't I mean you

1270
00:46:59,319 --> 00:47:01,900
could do a very different kind

1271
00:47:01,900 --> 00:47:03,400
of weather prediction

1272
00:47:03,400 --> 00:47:05,440
you simply have on the ground

1273
00:47:05,440 --> 00:47:08,710
measurements right in millions of points

1274
00:47:08,710 --> 00:47:10,450
with extremely high accuracy you don't

1275
00:47:10,450 --> 00:47:12,010
have to say the temperature in

1276
00:47:12,010 --> 00:47:15,069
Gainesville is whatever 78 you have a

1277
00:47:15,069 --> 00:47:16,569
detailed map of what the temperature in

1278
00:47:16,569 --> 00:47:19,480
Gainesville is now that's the good news

1279
00:47:19,480 --> 00:47:21,490
the bad news is how do you put all that

1280
00:47:21,490 --> 00:47:22,990
information together how do you get it

1281
00:47:22,990 --> 00:47:24,220
out of the system how do you push it into the

1282
00:47:24,220 --> 00:47:25,599
computational model could you really do

1283
00:47:25,599 --> 00:47:27,069
better in the computational model right

1284
00:47:27,069 --> 00:47:29,500
so you're first happy that you get more

1285
00:47:29,500 --> 00:47:33,759
data and this is the data

1286
00:47:33,759 --> 00:47:35,619
gold rush but then you get very upset

1287
00:47:35,619 --> 00:47:38,559
that you have no particularly good means

1288
00:47:38,559 --> 00:47:40,119
to mine the data this happened for

1289
00:47:40,119 --> 00:47:42,730
example with the genome project billions

1290
00:47:42,730 --> 00:47:45,819
of dollars invested in sequencing but

1291
00:47:45,819 --> 00:47:47,980
then you say okay we have all of these

1292
00:47:47,980 --> 00:47:51,819
genomes now now what right where is that

1293
00:47:51,819 --> 00:47:53,410
cool stuff we can do with that you know

1294
00:47:53,410 --> 00:47:56,519
we do a little bit but that's it right

1295
00:47:56,519 --> 00:47:58,720
lots of other gold rushes right for

1296
00:47:58,720 --> 00:48:01,059
example the particle accelerator at CERN

1297
00:48:01,059 --> 00:48:02,619
right it's measuring all of these things

1298
00:48:02,619 --> 00:48:04,599
but in the end it just says oh yes so

1299
00:48:04,599 --> 00:48:06,009
there is this particle there's a new

1300
00:48:06,009 --> 00:48:09,119
particle right so from trillions of

1301
00:48:09,119 --> 00:48:12,579
measurements you just get oh yes we

1302
00:48:12,579 --> 00:48:16,299
found the Higgs it's there it only took four

1303
00:48:16,299 --> 00:48:17,950
months to compute whatever they were

1304
00:48:17,950 --> 00:48:19,599
computing from that right to extract

1305
00:48:19,599 --> 00:48:22,000
knowledge from such vast amounts of

1306
00:48:22,000 --> 00:48:23,980
information it's still not necessarily a

1307
00:48:23,980 --> 00:48:26,470
figured out issue but of course you

1308
00:48:26,470 --> 00:48:27,880
first have to gather the data I'll talk

1309
00:48:27,880 --> 00:48:29,410
more about that okay

1310
00:48:29,410 --> 00:48:32,019
so that's essentially my overall

1311
00:48:32,019 --> 00:48:33,940
introduction there are more slides here

1312
00:48:33,940 --> 00:48:34,920
you can look

1313
00:48:34,920 --> 00:48:37,420
health systems sensor networks I'll talk

1314
00:48:37,420 --> 00:48:39,700
more about this later this is the kind

1315
00:48:39,700 --> 00:48:41,710
of stuff we want to solve in terms of

1316
00:48:41,710 --> 00:48:44,320
problems the question is how and that's

1317
00:48:44,320 --> 00:48:45,430
what the rest of the class is gonna be

1318
00:48:45,430 --> 00:48:46,450
about you're gonna have lots of

1319
00:48:46,450 --> 00:48:49,450
projects to solve them yourself in Scala

1320
00:48:49,450 --> 00:48:52,960
so start perfecting your Scala skills

1321
00:48:52,960 --> 00:48:56,110
I'm gonna throw the first project in and

1322
00:48:56,110 --> 00:48:57,730
you're already gonna accomplish some

1323
00:48:57,730 --> 00:48:59,050
interesting thing I'm gonna ask you to

1324
00:48:59,050 --> 00:49:02,380
find a specific kind of numbers and this

1325
00:49:02,380 --> 00:49:04,240
is an interesting problem that popped up

1326
00:49:04,240 --> 00:49:06,250
in a math paper right the math paper

1327
00:49:06,250 --> 00:49:08,890
proved that only certain numbers have a

1328
00:49:08,890 --> 00:49:10,360
certain property you're just gonna find

1329
00:49:10,360 --> 00:49:11,470
all of them because that's the nicest

1330
00:49:11,470 --> 00:49:13,090
way to check a theory right you just go

1331
00:49:13,090 --> 00:49:15,400
and brute-force compute okay all right

1332
00:49:15,400 --> 00:49:16,720
so I'll see you Tuesday but in the

1333
00:49:16,720 --> 00:49:18,580
meantime I'll post oh no quiz this week

1334
00:49:18,580 --> 00:49:21,460
because this is fluff stuff so once I

1335
00:49:21,460 --> 00:49:23,440
start teaching something real we'll do a

1336
00:49:23,440 --> 00:49:25,840
quiz next week or this week start

1337
00:49:25,840 --> 00:00:00,000
learning Scala