Transcript

[00:00] PROFESSOR: Okay. Good afternoon, good morning, good evening, good night, wherever you are. Let's get started again. Today we have a guest lecture, from a speaker who needs little introduction: Russ Cox, one of the co-leads on the Go project. We'll talk a lot more about that; let me say a couple of words first and try not to embarrass Russ too much.

[00:29] Russ has long experience with distributed systems. He was a developer and contributor to Plan 9 while he was a high school student and an undergrad at Harvard. He then joined the PhD program at MIT, which is where we met, and if you take any sort of PDOS class you will see Russ's touches on it. Certainly in 6.824 the switch to Go has been a wonderful thing for us. If you differ in that opinion, of course, feel free to ask Russ questions and make suggestions; he's always happy to entertain ideas. So with that, Russ, it's yours.

[01:15] RUSS: Great, thanks. Can you still see the slides? Is that working? Okay, great. We built Go to support writing the sort of distributed systems we were building at Google, and that made Go a great fit for what came next, which is now called cloud software, and also a great fit for 6.824. In this lecture I'm going to try to explain how I think about writing concurrent programs in Go, and I'm going to walk through the design and implementation of programs for four different patterns that I see come up often. Along the way I'll highlight some hints, or rules of thumb, that you can keep in mind when designing your own Go programs.

[01:55] I know the syllabus links to an older version of these slides, so you might have seen them already. I hope the lecture form is a bit more intelligible than just looking at the slides, and I hope that these patterns are common enough that they'll be helpful by themselves, but also that the hints will help you prepare for whatever it is you need to implement.

[02:18] To start, it's important to distinguish between concurrency and parallelism. Concurrency is about how you write your programs: about being able to compose independently executing control flows, whether you call them processes or threads or goroutines, so that your program can be dealing with lots of things at once without turning into a giant mess. Parallelism, on the other hand, is about how the programs get executed: about allowing multiple computations to run simultaneously, so that the program can be doing lots of things at once, not just dealing with lots of things at once. Concurrency lends itself naturally to parallel execution, but today the focus is on how to use Go's concurrency support to make your programs clearer, not to make them faster. If they do get faster, that's wonderful, but that's not the point today.
[03:07] So, as I said, I'll walk through the design and implementation of programs for four common concurrency patterns that I see often. But before we get to those, I want to start with what seems like a really trivial problem, yet illustrates one of the most important points about what it means to use concurrency to structure programs. A decision that comes up over and over when you design concurrent programs is whether to represent state as code or as data, and by "as code" I mean in the control flow of the program.

[03:38] Suppose we're reading characters from a file and we need to scan over a C-style quoted string.

[03:44] [A brief interruption while the speaker's screen sharing is paused and restored.]

[04:38] So we're reading a string. It's not a parallel program; it reads one character at a time, so there's no opportunity for parallelism, but there is a good opportunity for concurrency. If we don't actually care about the exact escape sequences in the string, what we need to do is match this regular expression; we don't have to worry about understanding it exactly, but that's basically the whole job: implement this regular expression. You probably all know you can turn a regular expression into a state machine, so we might use a tool that generates this code. In the generated code there's a single variable, state, that holds the state of the machine, and the loop reads one character at a time and, depending on the state and the character, moves to a different state, until it reaches the end. This is a completely unreadable program, but it's the kind of thing an auto-generated program might look like. The important point is that the program state is stored in data, in this variable called state, and if you can change the program to store that state in code instead, the result is often clearer.

[05:41] Here's what I mean. Suppose we duplicate the readChar call into each case of the switch. We haven't made any semantic change; we just took the readChar that was at the top and moved it into the middle. Now, instead of setting state and immediately doing the switch again, we can change those assignments into gotos. Then we can simplify a little further: there's a goto state1 right before the state1 label, so we can get rid of that.
[06:08] Next, there's only one way to get to state2, so we might as well pull the state2 code up and put it inside the if where the goto appears. Both branches of that if now end in goto state1, so we can hoist that out, and what's left is actually a pretty simple program: state0 is never jumped to, so the program just begins there, and state1 is just a regular loop, so we might as well make it look like a regular loop. Now this is looking like a program. Finally we can get rid of some variables and simplify a little further, and we can rotate the loop so that we don't do a return true in the middle of the loop; we do the return true at the end.

[06:54] Now we've got a program that is actually reasonably nice. It's worth mentioning that you can clean up much less egregious examples the same way: if you had tried to write this by hand, your first attempt might have been the version on the left, with one extra piece of state, and you can apply the same kinds of transformations to move that state into the actual control flow and end up at the same, cleaner program on the right. This is a useful transformation to keep in mind any time you have state that looks like it might just be restating what's already in the program counter. You can see the correspondence in the original version: if state == 0, the program counter is at the beginning of the function; if state == 1 (or escape == false in the other version), the program counter is just inside the for loop; and state == 2 is further down in the for loop.

[07:48] The benefit of writing it this way, instead of with the states, is that it's much easier to understand. I can actually walk through the code and explain it to you: you read an opening quote, then you start looping; until you find the closing quote, you read a character, and if it's a backslash you skip the next character. That's it. You can read that right off the page, which you couldn't do in the original. This version also happens to run faster, although that doesn't really matter for us.

[08:15] As I mentioned, I'm going to highlight what I think are important lessons as hints for designing your own Go programs, and this is the first one: convert data state into code state when it makes your programs clearer. Again, these are all hints, not rules; apply each one only if it helps. You get to decide.
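Reconstructed from the walkthrough above, the final state-as-code version looks something like this. The readChar signature and the eof sentinel are assumptions for the sketch; the actual slides may differ in detail.

    const eof rune = -1 // assumed sentinel returned by readChar at end of input

    // readString reports whether a C-style quoted string was read.
    func readString(readChar func() rune) bool {
        if readChar() != '"' {
            return false
        }
        for {
            switch c := readChar(); c {
            case '"':
                return true // closing quote: done
            case eof:
                return false // unterminated string
            case '\\':
                if readChar() == eof { // backslash: skip the escaped character
                    return false
                }
            }
        }
    }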
[08:38] One problem with this hint is that not all programs have the luxury of complete control over their control flow. Here's a different example: instead of having a readChar function it can call, this code is written with a processChar method that you hand the characters to, one at a time. Then processChar has no choice, really, but to encode its state in an explicit state variable, because after every character it has to return back out. It can't save the state in the program counter and the stack; it has to keep the state in an actual variable.

[09:14] But in Go we have another choice: if we can't save the state on this stack and in this program counter, we can make another goroutine to hold that state for us. Suppose we already have this debugged readString function that we really don't want to rewrite in the other style. It works; maybe it's big and hairy, much more complicated than the thing we just saw; we just want to reuse it. The way to do that in Go is to start a new goroutine that runs the readString part, using the same readString code as before and passing in a character reader. The Init method creates this goroutine to do the character reading, and then every time the processChar method is called, we send the goroutine a message on the char channel saying "here's the next character," and we receive a message back reporting the current status, which is always either "I need more input" or, essentially, whether the string was okay or not. This lets us move the program counter we couldn't keep on the first stack onto the goroutine's stack instead. Using additional goroutines is a great way to hold additional code state, and it gives you the ability to do these kinds of cleanups even when the original structure of the problem makes it look like you can't.

[10:41] PROFESSOR: I assume you're fine with people asking questions?

RUSS: Yeah, absolutely. Please interrupt. So the hint here is: use additional goroutines to hold additional code state.
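Here's a sketch of that idea, reconstructed from the description. The type and status names are assumptions, and readString is the function from the previous sketch.

    type status int

    const (
        needMoreInput status = iota
        success
        badInput
    )

    type quoter struct {
        char       chan rune
        statusChan chan status
    }

    // Init starts the goroutine whose stack holds the parsing state.
    func (q *quoter) Init() {
        q.char = make(chan rune)
        q.statusChan = make(chan status)
        go func() {
            // readChar asks the caller for the next character.
            readChar := func() rune {
                q.statusChan <- needMoreInput
                return <-q.char
            }
            if readString(readChar) {
                q.statusChan <- success
            } else {
                q.statusChan <- badInput
            }
        }()
        <-q.statusChan // consume the initial needMoreInput
    }

    // ProcessChar hands one character to the parsing goroutine and
    // returns needMoreInput, success, or badInput.
    func (q *quoter) ProcessChar(c rune) status {
        q.char <- c
        return <-q.statusChan
    }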
[10:58] There's one caveat: it's not free to just make goroutines. You have to make sure they actually exit, because otherwise you'll accumulate them. So you do have to think about why each goroutine exits and whether it gets cleaned up. Let's trace through this one. When we call Init, we kick off the goroutine, which is going to call readChar a bunch of times, and we read the status once. That first status arrives because the first call to readChar inside readString sends "need more input." Then every time processChar is called, we send a character and the goroutine returns a status: "need more input" each time it wants another character, and then, when readString finishes, a final "success" or "bad input" status, after which the goroutine is done.

[12:42] So as long as the caller keeps calling until it gets something other than "need more input," the goroutine will finish. But if we stop early, say the caller hits EOF and stops on its own without telling us it's done, there's a goroutine left over, and that could be a problem. You need to know when and why each goroutine will exit.

[13:12] The nice thing is that if you do make a mistake and leave goroutines stuck, they just sit there. It's the best possible kind of bug, because the stuck goroutines sit around waiting for you to look at them; all you have to do is remember to look. Here's a very simple program that leaks goroutines and runs an HTTP server. It kicks off a whole bunch of goroutines that all block trying to send on a channel, and then it starts the HTTP server. If I run this program, it just sits there, and if I type control-backslash on a Unix system, it gets a SIGQUIT, which makes it crash and dump the stacks of all the goroutines. You can see on the slide that it prints, over and over, a goroutine in h, called from g, called from f, blocked in a channel send, and from the line numbers you can see exactly where they are.

[14:02] Another option: since this is an HTTP server, and it imports the net/http/pprof package, you can just visit the server's /debug/pprof/goroutine URL, which gives you the stacks of all running goroutines. Unlike the crash dump, it does a little more work for you: it deduplicates the goroutines by stack and sorts them by how many share each stack, so if you have a goroutine leak, the leak shows up at the very top. In this case you've got 100 goroutines stuck in h, called from g, called from f, and then one each of a couple of other goroutines we don't really care about. So that's the next hint: it is really, really useful to look for stuck goroutines at this endpoint.
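A minimal runnable version of the leaky program described above; the function names f, g, h and the port are illustrative. The blank import of net/http/pprof is what registers the /debug/pprof/* handlers on the default mux.

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers
    )

    func f(c chan int) { g(c) }
    func g(c chan int) { h(c) }
    func h(c chan int) { c <- 1 } // blocks forever: no one receives

    func main() {
        for i := 0; i < 100; i++ {
            go f(make(chan int))
        }
        // While it runs, visit
        // http://localhost:8080/debug/pprof/goroutine?debug=1
        // to see the 100 stuck goroutines, deduplicated and sorted by count.
        log.Fatal(http.ListenAndServe("localhost:8080", nil))
    }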
[14:49] That was the warm-up. Now I want to look at the first real concurrency pattern, a publish-subscribe server. Publish-subscribe is a way of structuring a program so that the parts publishing interesting events are decoupled from the parts subscribing to them, with a pub-sub server in the middle connecting them, so that individual publishers and subscribers don't have to know exactly who the others are. On your Android phone, for example, an app might publish a "make a phone call" event, and the dialer might subscribe to that and actually help dial. In a real pub-sub server there are ways to filter events by kind, so that a "make a phone call" event doesn't go to your email program, but for now we'll assume the filtering is taken care of separately and worry only about the actual publish and subscribe, and the concurrency of that.

[15:42] So here's an API we want to implement. Any number of clients can call Subscribe with a channel, and afterwards, published events are sent on that channel. When a client is no longer interested, it calls Cancel with the same channel, to say "stop sending me events on this channel." The way Cancel signals that it really is done sending events on that channel is that it closes the channel, so the caller can keep receiving events until it sees the channel get closed, at which point it knows the cancel has taken effect.

[16:16] Notice that information only flows one way on the channel. The sender sends, the receiver receives, and information flows from sender to receiver, never the other way. Closing is also a signal from the sender to the receiver, meaning "all the sending is over." The receiver cannot close the channel to tell the sender "I don't want you to send anymore," because that would be information flowing in the opposite direction. It's just a lot easier to reason about if information only goes one way. Of course, if you need communication in both directions, you can use a pair of channels, and it often turns out that the two directions carry different types of data, as before, where runes flowed in one direction and status updates in the other.

[17:04] So how do we implement this API? Here's a pretty basic implementation that could be good enough. We have a server whose state is a map of registered subscriber channels, protected by a lock. We initialize the server by allocating the map. To publish an event, we send it to every registered channel; to subscribe a new channel, we add it to the map; to cancel, we take it out of the map. And because these are all methods that might be called from multiple goroutines, we lock and unlock around them to protect the map.
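Reconstructed from that description, the basic server looks something like this. Event is a stand-in type, and the mutex comes from the sync package.

    type Event struct{} // stand-in for whatever event type is published

    type Server struct {
        mu  sync.Mutex // protects sub
        sub map[chan<- Event]bool
    }

    func (s *Server) Init() {
        s.sub = make(map[chan<- Event]bool)
    }

    func (s *Server) Publish(e Event) {
        s.mu.Lock()
        defer s.mu.Unlock()

        for c := range s.sub {
            c <- e // one slow subscriber delays everyone (discussed below)
        }
    }

    func (s *Server) Subscribe(c chan<- Event) {
        s.mu.Lock()
        defer s.mu.Unlock()

        if s.sub[c] {
            panic("pubsub: already subscribed")
        }
        s.sub[c] = true
    }

    func (s *Server) Cancel(c chan<- Event) {
        s.mu.Lock()
        defer s.mu.Unlock()

        if !s.sub[c] {
            panic("pubsub: not subscribed")
        }
        close(c)
        delete(s.sub, c)
    }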
[17:38] Notice that I wrote defer Unlock right after the Lock, so I don't have to remember to unlock later. You've probably all seen this; it's a nice idiom to do the Lock and the deferred Unlock, then a blank line, so the pair forms its own little paragraph in the code. One thing I want to point out is that using defer makes sure the mutex gets unlocked even if the function has multiple returns, so you can't forget, and also that it gets unlocked on a panic, as in Subscribe and Cancel, which panic on misuse. There's a subtlety here: you might not want to unlock the mutex if the panic happened while the locked data was in an inconsistent state, but I'm going to ignore that for now. In general you try to avoid having the things that might panic happen while you're in a potentially inconsistent state.

[18:38] I should also point out that using panic at all in Subscribe and Cancel implies you really trust your clients not to misuse the interface: it says a misuse is a program error potentially worth tearing down the entire program. In a bigger program, where other clients used this API, you'd probably want to return an error instead, without the possibility of taking down the whole program. But panicking simplifies things for now, and error handling in general is not today's topic.

[19:10] A more important concern with this code than panics is what happens if a goroutine is slow to receive events. All the operations are done holding the mutex, which means all the clients proceed in lockstep. During Publish there's a loop sending the event to every channel, and if one subscriber falls behind, the next subscriber doesn't get the event until the slow one wakes up and takes the event off its channel. So one slow subscriber can slow down everyone else. Forcing them to proceed in lockstep this way is not always a problem: if you've documented the restriction, and for whatever reason you know how the clients are written and that they won't ever fall too far behind, this could be totally fine. It's a really simple implementation, and it has nice properties, like the fact that on return from Publish you know the event has actually been handed off to each of the other goroutines; you don't know that they've started processing it, but you know it's been handed off. Maybe that's good enough, and you can stop here.
[20:16] A second option, if you need to tolerate just a little bit of slowness in the subscribers, is to require that they give you a buffered channel with room for a couple of events, so that when you publish, as long as they're not too far behind, there's always room for the new event in the channel buffer and the publish won't block for long. Again, maybe that's good enough: if you're sure they'll never fall too far behind, you can stop there. But in a really big program you do want to cope gracefully with arbitrarily slow subscribers, and then the question is what to do. In general you have three options: you can slow down the event generator, which is what the previous solutions implicitly do, because Publish stops until the subscribers catch up; you can drop events; or you can queue an arbitrary number of past events. Those are pretty much your only options.

[21:14] We've talked about slowing down the event generator. There's a middle ground where you coalesce events or drop them, so the subscriber can find out, "hey, you missed some events, and I can't tell you what they were because I didn't save them, but I can at least tell you that you missed five," and then maybe it can do something else to catch up. That's the approach we take in the profiler. In the profiler there's a goroutine that fills the profile buffer with profiling events (in a signal handler, actually), and a separate goroutine whose job is to read the data back out and write it to disk, or send it in an HTTP response, or whatever you're doing with profile data. There's a buffer in the middle, and if the receiver of the profile data falls behind and the buffer fills up, we start adding counts to a final profile entry whose single frame is a function called runtime.lostProfileData. So if you look at the profile and see that the program spent five percent of its time in lostProfileData, that means the profile reader was too slow and we lost some of the profile, but you know exactly what the error rate is. You pretty much never see it, because the readers do keep up, but just in case they don't, you have a clear signal.

[22:40] An example of purely dropping events is the os/signal package. You pass in a channel that must be ready to receive a signal, like SIGHUP or SIGQUIT, and when the signal comes in, the runtime tries to send to each channel subscribed to that signal. If it can't send, it just doesn't; the notification is gone, because we're in a signal handler and can't wait. So what callers have to do is pass in a buffered channel, with a buffer of length at least one, registered for only a single signal. Then if a signal comes in, you're definitely told about it; if it comes in twice, you might only be told once. But that's the same semantics Unix gives processes for signals anyway, so that's fine. Those are both examples of dropping or coalescing events.
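A minimal runnable use of os/signal following that advice; the choice of SIGHUP is illustrative.

    package main

    import (
        "fmt"
        "os"
        "os/signal"
        "syscall"
    )

    func main() {
        // Buffer of one, and only one signal registered on this channel:
        // a delivered SIGHUP is never lost, though several arriving in
        // quick succession may coalesce into a single notification.
        c := make(chan os.Signal, 1)
        signal.Notify(c, syscall.SIGHUP)
        fmt.Println("got:", <-c)
    }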
[23:36] The third choice is that you might really not want to lose any events; it might be genuinely important that nothing is ever lost. In that case you can queue an arbitrary number of events: somehow arrange for the program to save all the events the slow subscriber hasn't seen yet and deliver them later. It's really important to think carefully before doing that, because in a distributed system there are always slow computers, computers that have fallen offline, and they might be gone for a while, so you don't want to introduce unbounded queuing casually. Think hard about how unbounded it really is and whether you can tolerate that. This is the reason channels don't offer an unbounded buffer: it's almost never the right choice, and when it is, you probably want to build it very carefully. But we're going to build one, just to see what it looks like.

[24:35] Before we do, I want to adjust the program a little. We have this mutex in the code, and the mutex is an example of keeping state, namely whether you're locked or not, in a state variable. We can move that into the program counter too, by putting it in a different goroutine. We start a new goroutine running a function called s.loop, which handles requests sent on three new channels: publish, subscribe, and cancel. In Init we make the channels and kick off s.loop, and s.loop is the amalgamation of the previous method bodies: it receives a request from any of the three channels, a publish, a subscribe, or a cancel, and does whatever was asked. Now the subscriber map can be just a local variable in s.loop. It's the same code, but the data is clearly owned by s.loop; nothing else can even reach it, because it's a local variable. Then we change the original methods to send the work over to the loop goroutine: the exported Publish sends the event on the unexported publish channel, and similarly Subscribe and Cancel create a request carrying the channel to subscribe or cancel plus a channel for the answer, send it to the loop, and the loop sends back the answer.

[26:11] I refer to transforming the program this way as converting the mutex into a goroutine: we took the data state of the mutex, the lock bit inside it, and made that bit implicit in the program counter of the loop.
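A sketch of the converted server, reconstructed from the description; the subReq type and the ok-reply convention are assumptions about how the loop reports misuse back to the calling method.

    type Server struct {
        publish   chan Event
        subscribe chan subReq
        cancel    chan subReq
    }

    type subReq struct {
        c  chan<- Event // channel to subscribe or cancel
        ok chan bool    // the loop's answer
    }

    func (s *Server) Init() {
        s.publish = make(chan Event)
        s.subscribe = make(chan subReq)
        s.cancel = make(chan subReq)
        go s.loop()
    }

    func (s *Server) loop() {
        sub := make(map[chan<- Event]bool) // owned by this goroutine
        for {
            select {
            case e := <-s.publish:
                for c := range sub {
                    c <- e
                }
            case r := <-s.subscribe:
                if sub[r.c] {
                    r.ok <- false
                    break
                }
                sub[r.c] = true
                r.ok <- true
            case r := <-s.cancel:
                if !sub[r.c] {
                    r.ok <- false
                    break
                }
                close(r.c)
                delete(sub, r.c)
                r.ok <- true
            }
        }
    }

    func (s *Server) Publish(e Event) { s.publish <- e }

    func (s *Server) Subscribe(c chan<- Event) {
        r := subReq{c, make(chan bool)}
        s.subscribe <- r
        if !<-r.ok {
            panic("pubsub: already subscribed")
        }
    }

    func (s *Server) Cancel(c chan<- Event) {
        r := subReq{c, make(chan bool)}
        s.cancel <- r
        if !<-r.ok {
            panic("pubsub: not subscribed")
        }
    }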
[26:25] In this version it's very clear that you can't ever have a publish and a subscribe happening at the same time, because it's single-threaded code executing in sequence. On the other hand, the original version had a kind of clarity of state: you could inspect it and reason about what the important state was, and in the goroutine version it's harder to see what's important state and what's incidental state that comes from just having a goroutine. In a given situation, one may matter more than the other. A couple of years ago I did all the labs for this class, when it first switched to Go, and Raft is a good example of where you probably prefer the state with the mutexes, because Raft is so different from most concurrent programs: each replica is profoundly uncertain of its state. One moment you think you're the leader, and the next you've been deposed; one moment your log has ten entries, and the next you find it actually has only two. Being able to manipulate that state directly, rather than somehow moving it in and out of the program counter, makes a lot more sense for Raft. But that's pretty unusual; in most situations it cleans things up to put the state in the program counter.

[27:42] All right. To deal with the slow subscribers, we're going to add some helper goroutines whose job is to manage a particular subscriber's backlog and keep the overall program from blocking. The main loop goroutine sends events to the helper, which we trust not to fall arbitrarily behind, because we wrote it, and the helper's job is to queue events as needed and send them on to the subscriber.

[28:10] This first version of the helper actually has two problems. The first is that if there's nothing in the queue, the select is wrong to offer q[0]: in fact, just evaluating q[0] at the start of the select will panic, because the queue is empty. We can fix that by setting up the select's operands separately. In particular, we use a channel variable for the send case that is nil when we don't want to send (a nil channel can never proceed in a select) and is the actual out channel when we do, plus a separate variable holding the event to send, which only reads q[0] when there's something in the queue.
[28:55] The second thing that's wrong is that we need to handle closing of the input channel: when the input channel closes, we need to flush the rest of the queue and then close the output channel. To check for that, we change the receive from e = <-in to e, ok = <-in; the ok reports whether the channel delivered real data or is closed. When ok is false, we set in to nil, meaning: stop trying to receive from in, there's nothing there, we'd just keep being told it's closed. Then, when the queue is finally empty, we can exit the loop. We change the for condition to keep looping as long as there still is an input channel or there's still something to write to the output channel; once both are false, we exit the loop and close the output channel. Now we've correctly propagated the closing of the input channel to the output channel.

[29:56] That was the helper. The server loop used to have a subscription map from subscriber channels to bools, basically a set, and now it's a map from subscriber channel to helper channel. Every time we get a new subscription, we make a helper channel, kick off a helper goroutine, and record the helper channel in the subscription map instead of the actual channel. The rest of the loop barely changes at all.

[30:32] I do want to point out that if you wanted a different strategy for clients that fall too far behind, it would all go in the helper goroutine; the server-loop code is completely unchanged. We've completely separated maintaining the actual set of subscribers from the what-do-you-do-when-things-get-too-slow problem, and it's really nice to get this clean separation of concerns into different goroutines; it can help keep your program simpler. That's the general hint: use goroutines to separate independent concerns.
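Here's a sketch of the finished helper with both fixes, reconstructed from the description above.

    // helper queues events so that one slow subscriber cannot block the
    // main loop; in receives from the loop, out goes to the subscriber.
    func helper(in <-chan Event, out chan<- Event) {
        var q []Event
        for in != nil || len(q) > 0 {
            // Only enable the send case when there is something to send:
            // a nil channel can never be chosen by the select.
            var sendOut chan<- Event
            var next Event
            if len(q) > 0 {
                sendOut = out
                next = q[0]
            }
            select {
            case e, ok := <-in:
                if !ok {
                    in = nil // input closed: flush whatever is left in q
                    break
                }
                q = append(q, e)
            case sendOut <- next:
                q = q[1:]
            }
        }
        close(out) // propagate the close to the subscriber
    }

And in the server loop, the subscribe case becomes something like the following, with sub now mapping subscriber channels to helper channels:

    case r := <-s.subscribe:
        h := make(chan Event)
        go helper(h, r.c)
        sub[r.c] = h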
[31:11] All right. The second pattern for today is a work scheduler. You built one of these in Lab 1 for MapReduce, and I'm going to build up to that, minus all the RPC machinery; this version just assumes channel-based interfaces to the servers. We have this function, schedule: it takes a fixed list of servers, a number of tasks to run, and an abstracted function that you call to run a given task on a specific server. You can imagine it doing the RPCs underneath.

[31:48] We need some way to keep track of which servers are available to execute new tasks. One option is our own stack or queue implementation, but another is a channel, because a channel is a good synchronized queue: send into the channel to push, receive from it to pop. In this case we'll make it a queue of idle servers, servers not doing any work for us right now, and we start off by sending all the known servers into the idle channel. Then we loop over the tasks, and for each task we kick off a goroutine whose job is to pull a server off the idle list, run the task, and put the server back. This loop body is another example of an earlier hint, using goroutines to let independent things run independently: each task runs as a separate concern, and they all run in parallel.

[32:44] Unfortunately, there are two problems with this program. The first is that the closure running as a new goroutine refers to the loop iteration variable, task, and by the time the goroutine starts executing, the loop has probably moved on and done task++, so the goroutine reads the wrong value of task. You've probably seen this by now, and of course the best way to catch it is to run the race detector. At Google we even encourage teams to set up canary servers that run the race detector and split off something like 0.1 percent of their traffic to them, just to catch races that might be in the production system. Finding a bug with the race detector is way better than debugging some corruption later.

[33:27] There are two ways to fix this race. The first is to give the closure an explicit parameter and pass task in. The go statement requires a function call specifically for this reason: so you can have arguments that get evaluated in the context of the original goroutine and then get copied into the new goroutine. We can declare a new parameter, task2, and pass task to it, and inside the goroutine task2 is a completely different copy of task. I only named it task2 to make it easier to talk about, and of course there's a bug here: I forgot to update task inside the function to refer to task2. So in practice we never do that; instead we give the parameter the same name, task, so that it's impossible for the code inside the goroutine to refer to the wrong copy.

[34:24] The second way to fix the race looks sort of cryptic the first time you see it, but it amounts to the same thing: you make a copy of the variable inside the loop body. Every time a := executes, it creates a new variable. In the outer for loop there's a := in the initialization and none in the rest of the loop, so there's one variable for the entire loop; but if we put a := inside the body, each iteration creates a distinct variable, and if the go closure captures that one, each goroutine gets its own. We could call it task2, and this time I remembered to update the body, but just as before it's too easy to forget, so typically you write task := task, which looks magical the first time you see it, but that's what it's for.
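The bug and both fixes, side by side; numTask and doTask are stand-ins. (As a historical note: in Go 1.22 and later, loop variables are per-iteration and this particular race no longer occurs, but the talk predates that change.)

    // BUG: every closure reads the one shared loop variable.
    for task := 0; task < numTask; task++ {
        go func() {
            doTask(task)
        }()
    }

    // Fix 1: make task a parameter; the argument is evaluated and copied
    // in the original goroutine, and the name shadows the loop variable.
    for task := 0; task < numTask; task++ {
        go func(task int) {
            doTask(task)
        }(task)
    }

    // Fix 2: redeclare a fresh per-iteration copy inside the loop body.
    for task := 0; task < numTask; task++ {
        task := task
        go func() {
            doTask(task)
        }()
    }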
[35:14] I said there were two bugs in the program. The first was the race on task. The second is that we don't do anything after kicking off all the tasks: we never wait for them to be done. And in particular we're kicking them off far too fast: if there are a million tasks, we create a million goroutines that all just sit waiting for one of the five servers, which is inefficient. So we pull the fetch of the next idle server up out of the goroutine: now we only kick off a goroutine when there is a server to use, and the goroutine runs the task on that server and puts the server back. Using the server and returning it run concurrently, but fetching the idle server inside the loop slows the loop down, so there are only ever as many goroutines running as there are servers, instead of one per task. That receive essentially creates back pressure to keep the loop from getting too far ahead. And as I mentioned, we have to wait for the tasks to finish; we can do that at the end by going over the list again and pulling all the servers back out of the idle channel. Once we've pulled the right number of servers out, we know they're all done, and that's the full program.

[36:33] To me the most important part of this is that you still get to write a for loop that iterates over the tasks. In lots of other languages you'd have to do this with state machines or callbacks, and you don't get the luxury of encoding it in the control flow. Here you just use a regular loop.

[36:55] But there are some improvements we could make. One is to notice that only one goroutine makes requests of a given server at any particular time, so instead of one goroutine per task, maybe we should have one goroutine per server, since there are probably fewer servers than tasks. To do that, we change the channel of idle servers into a channel of yet-to-be-done tasks, renaming the idle channel to work, and we also need a done channel so we can count and know when we're completely finished. There's a new function, runTasks, the per-server function; we kick off one per server. runTasks just loops over the work channel, running tasks, and when the server is done it sends true on done. The server's loop exits when the work channel gets closed; that's what makes the range loop stop. So, having kicked off the servers, we sit in a loop sending each task to the work channel, close the work channel to say "no more work is coming, all you servers should finish," and then wait for all the servers to report that they're done.
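A sketch of this one-goroutine-per-server version, reconstructed from the description:

    func schedule(servers []string, numTask int, call func(srv string, task int)) {
        work := make(chan int)  // tasks yet to be done
        done := make(chan bool) // one send per server when work is closed

        runTasks := func(srv string) {
            for task := range work { // exits when work is closed
                call(srv, task)
            }
            done <- true
        }
        for _, srv := range servers {
            go runTasks(srv)
        }

        for task := 0; task < numTask; task++ {
            work <- task
        }
        close(work) // no more work: let the server loops finish
        for range servers {
            <-done
        }
    }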
[38:15] In the lab there were a couple of complications. One was that you might get new servers at any time, so let's say the servers come in on a channel of strings. That actually fits pretty well into the current structure: when a new server arrives, you just kick off a new runTasks goroutine for it. The only thing we have to change is to put that loop into its own goroutine, so that while we're sending tasks to servers we can still accept new servers and kick off their goroutines.

[38:47] But now we have the problem that we don't have a good way to tell when all the servers are done, because we don't know how many servers there are. We could try to maintain that count as servers come in, but it's a little tricky. Instead, we can count the number of tasks that have finished: we move the send of true on done up a line, so that instead of doing it per server we do it per task, and at the end of the function we wait for the right number of tasks to be done.

[39:15] So now, again, we sort of know why these loops are going to finish, except there's actually still a deadlock. If the number of tasks is too big (actually, I think always), you get a deadlock, and if you run this, you get the nice behavior where the runtime tells you your goroutines are stuck. The problem is that the runTasks server loop is trying to say "hey, I'm done with this task" at the same moment the main loop is trying to say "here's some more work." If you have more tasks than servers, you hit this deadlock: you're trying to send the next task to a server, and all the servers are trying to report that they're done with their previous task, but you're not there to receive from the done channel. Again, it's really nice that the goroutines just hang around and wait for you to look at them.

[40:08] One way to fix this would be a loop around a select that either sends some work or accounts for some work being done. That's fine, but a cleaner way is to take the task-sending loop and put it in its own goroutine, so it runs independently of the counting loop, and the counting loop can unblock servers that have finished their tasks while other tasks are still being sent.
[40:41] But the simplest possible fix is to make the work channel big enough that you never run out of space. We might decide that a goroutine per task, at a couple of kilobytes per task, is too much, but an extra entry in the channel buffer is eight bytes, and you can probably spend eight bytes per task. So if you can, you just make the work channel big enough that the sends on work never block, and you always get down to the counting loop at the end quickly.

[41:13] Doing that sets us up well for the other wrinkle in the lab, which is that sometimes calls can time out. Here I've modeled that by having call return false to say it didn't work. In runTasks it's now really easy to say: if the call succeeds, the task is done; if it fails, just put the task back on the work channel. Because it's a queue, not a stack, putting the task back is very likely to hand it to some other server, which will probably succeed. This is all somewhat hypothetical, but it fits really well into the structure we've created.

[42:00] The final change is that because the server goroutines now send on work themselves (for retries), we have to wait to close the work channel until we know they're done sending; you can't close a channel while senders may still be using it. So we move the close until after we've counted that all the tasks are done.

[42:19] Sometimes at this point people ask: why can't you just kill goroutines? Why not say "kill all the server goroutines; we know they're not needed anymore"? The answer is that a goroutine has state and is interacting with the rest of the program, and if it suddenly just stops, it's as if it hung: maybe it was holding a lock, maybe it was in the middle of a communication with some other goroutine that was expecting an answer. So we need to tear them down more gracefully, by telling them explicitly "you're done, you can go away," and letting them clean up however they need to.

[42:57] Speaking of cleaning up, there's one more thing to do, which is to shut down the loop that's watching for new servers. We put a select in there so the goroutine waiting on the server channel can also be told "okay, we're done, stop watching for new servers." We could make this the caller's problem, but it's fairly easy to handle here.
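Putting all of those pieces together, the finished scheduler looks roughly like this; it's a reconstruction from the talk, not the lab solution itself.

    func schedule(servers chan string, numTask int, call func(srv string, task int) bool) {
        work := make(chan int, numTask) // buffered: sends on work never block
        done := make(chan bool)
        exit := make(chan bool)

        runTasks := func(srv string) {
            for task := range work {
                if call(srv, task) {
                    done <- true
                } else {
                    work <- task // failed (e.g. timed out): let another server retry
                }
            }
        }

        // Watch for new servers until we are told to exit.
        go func() {
            for {
                select {
                case srv := <-servers:
                    go runTasks(srv)
                case <-exit:
                    return
                }
            }
        }()

        for task := 0; task < numTask; task++ {
            work <- task
        }

        for i := 0; i < numTask; i++ { // count completed tasks, not servers
            <-done
        }
        close(work) // safe only now: no goroutine can still send a retry
        exit <- true
    }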
[43:24] All right, pattern number three: a client for a replicated service. Here's the interface we want to implement. We have some service that is replicated for reliability, and it's okay for a client to talk to any one of the servers. The replicated client's Init is given a list of servers and a function that calls one server with a particular set of arguments and gets a reply. Given that, the replicated client provides a Call method that doesn't tell you which server it will use; it finds a good server and keeps using the same one for as long as it can, until it finds out that server is no good.

[44:15] In this situation there's almost no shared state to isolate: the only state that persists from one call to the next is which server we used last time, because we're going to try it again. That's totally fine for a mutex, so I'm just going to leave it there. It's always okay to use a mutex if that's the cleanest way to write the code. Some people get the wrong impression from how much we talk about channels, but it's always okay to use a mutex if that's all you need.

[44:40] Now we need to implement this replicated Call method, whose job is to try sending to lots of different servers, but first to try the previous server. What does it mean for a try to fail? There's no clear way for it to fail: callOne always returns a reply eventually, so the only way it can fail is by taking too long. We'll assume that if it takes too long, it failed. To deal with timeouts, we have to run the call in the background in a different goroutine. So we do something like this: we set a timeout, we create a timer, we use the goroutine to do the call in the background, and then we wait for either the timeout or the actual reply. If we get the actual reply, we return it; if we get the timeout, we'll have to figure out what to do.

[45:32] It's worth pointing out that you have to call t.Stop, because otherwise the timer sits in the runtime's timer queue until it goes off, a second from now. If the call took a millisecond and you do this in a loop, you get a thousand timers sitting in that queue before they start expiring. This is kind of a wart in the API, but it's been there forever and we've never fixed it, so you just have to remember to call Stop.

[46:04] Now, what do we do on a timeout? We'll need to try a different server. So we write a loop, starting at server 0; if a reply comes in, great, and otherwise we reset the timeout and go around the loop again, trying the next server.
[46:28] Notice there's only one done channel in this program, so on the third iteration of the loop we might be waiting when the first server finally gives us a reply. That's totally fine; we take that reply, stop, and return it. If we get all the way through the loop, it means we've sent the request to every single server, in which case there are no more timeouts; we just wait for one of them to come back. That's the plain receive and return at the end.

[46:58] It's important that the done channel is now buffered, with room for every server: if you've sent the request to three different servers, you're going to take the first reply and return, but the others will want to send their responses too, and we don't want those goroutines to sit around forever trying to send on a channel we're no longer reading. We make the buffer big enough that they can drop their results into it and go away, and the channel just gets garbage collected.

[47:27] STUDENT: Why can't the timer just be garbage collected when nobody references it, instead of you having to call Stop?

RUSS: The problem is that the timer is referenced by the runtime: it's in the list of active timers, and calling Stop takes it out of that list. That's arguably a wart. In the specific case of a timer that's only ever used through its channel, we could have special-cased it: inside the timer there's this t.C channel, so we could have had a different kind of channel implementation with a bit that says "I'm a timer channel," and a select on it would know to wait, but if you dropped all references to it, it would just disappear. We've thought about doing that for a while, but we never did, so this is the state of the world. The garbage collector can't distinguish the reference inside the runtime from references in the rest of the program; they're all just references. Until we special-case that channel somehow, we can't get rid of the Stop.

STUDENT: Thank you.

RUSS: Sure.

[48:38] So the only thing left is the preference: we try to use the same server ID we used the previous time. To implement it, we have the server ID come back with the reply on the result channel, and we do the same loop, but over an offset from the preferred ID. When we get an answer, we set the preferred ID to wherever the answer came from, and then we return. You'll notice I used a goto statement. That's okay: if you need a goto, it's fine. There's no zealotry here.
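A sketch of the whole replicated client, reconstructed from the talk; Args and Reply are stand-in types, the one-second timeout is illustrative, and sync and time are the needed imports.

    type Args struct{}  // stand-in request type
    type Reply struct{} // stand-in response type

    type ReplicatedClient struct {
        servers []string
        callOne func(srv string, args Args) Reply

        mu     sync.Mutex
        prefer int // index of the last server that answered
    }

    func (c *ReplicatedClient) Init(servers []string, callOne func(string, Args) Reply) {
        c.servers = servers
        c.callOne = callOne
    }

    func (c *ReplicatedClient) Call(args Args) Reply {
        type result struct {
            serverID int
            reply    Reply
        }
        const timeout = 1 * time.Second
        t := time.NewTimer(timeout)
        defer t.Stop()

        // Buffered so that late repliers can deposit results and exit.
        done := make(chan result, len(c.servers))

        c.mu.Lock()
        prefer := c.prefer
        c.mu.Unlock()

        var r result
        for off := 0; off < len(c.servers); off++ {
            id := (prefer + off) % len(c.servers)
            go func() {
                done <- result{id, c.callOne(c.servers[id], args)}
            }()
            select {
            case r = <-done:
                goto Done
            case <-t.C:
                // Timed out: move on to the next server, leaving this
                // call running in the background.
                t.Reset(timeout)
            }
        }
        r = <-done // every server tried: wait for the first reply

    Done:
        c.mu.Lock()
        c.prefer = r.serverID
        c.mu.Unlock()
        return r.reply
    }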
49:18 All right, so the fourth one, and then we'll do some questions: a protocol multiplexer. This is kind of the logic at the core of any RPC system, and it comes up a lot; I feel like I wrote a lot of these in grad school and in the years after that.

49:35 The basic API of a protocol multiplexer is that it sits in front of some service, which we're going to pass to the Init method. Having been initialized with a service, you can call Call and give it a request message, and it'll give you back the reply message at some point. The things it needs from the service to do the multiplexing are: given a message, it has to be able to pull out the tag that uniquely identifies the message, which will also identify the reply, because the reply will come back in with a matching tag; and it needs to be able to send a message and to receive a message. But the send and receive operate on arbitrary messages that are not matched up; it's the multiplexer's job to actually match them.

50:21 To start, we'll have a goroutine that's in charge of calling send and another goroutine that's in charge of calling receive, both in a simple loop. So to initialize the service, we set up the structure and then kick off the send loop and the receive loop. We also have a map of pending requests; it maps from the tag, the id number in the messages, to a channel where the reply is supposed to go.

50:47 The send loop is fairly simple: you just range over the things that need to be sent, and you send them. This has the effect of serializing the calls to send, because we're not going to force the service implementation to deal with us sending from multiple goroutines at once. We serialize it so that it can just think about sending one packet at a time.

51:09 The receive loop is a little bit more complicated. It pulls a reply off the service, again serialized so we're only reading one at a time, then it pulls the tag out of the reply and says, ah, I need to find the channel to send this to. So it pulls the channel out of the pending map, and it takes it out of the pending map so that if we accidentally get another reply with the same tag, we won't try to send it. And then it sends the reply.

51:34 Then, to do a call, you just have to set yourself up in the map, hand the message to the send loop, and wait for the reply. We start off by getting the tag out, we make our own done channel, we insert the tag into the map after first checking for a duplicate (which would be a bug), then we send the argument message to the send loop, and then we wait for the reply to come in on done. It's very, very simple. I used to write these sorts of things in C, and it was much, much worse.
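The slide code isn't included in the transcript, so here is a minimal sketch of the multiplexer as described; the Service interface and all the names are assumptions rather than the actual slides.

```go
package mux

import "sync"

// Hypothetical message and service types matching the description.
type Msg interface{}

type Service interface {
	ReadTag(m Msg) int64 // unique id; the reply carries a matching tag
	Send(m Msg)          // send one request (not safe for concurrent use)
	Recv() Msg           // receive one reply (not safe for concurrent use)
}

type Mux struct {
	srv     Service
	send    chan Msg
	mu      sync.Mutex
	pending map[int64]chan Msg
}

func (m *Mux) Init(srv Service) {
	m.srv = srv
	m.send = make(chan Msg)
	m.pending = make(map[int64]chan Msg)
	go m.sendLoop()
	go m.recvLoop()
}

// sendLoop serializes all calls to srv.Send.
func (m *Mux) sendLoop() {
	for msg := range m.send {
		m.srv.Send(msg)
	}
}

// recvLoop routes each reply to the goroutine that registered its tag.
func (m *Mux) recvLoop() {
	for {
		reply := m.srv.Recv()
		tag := m.srv.ReadTag(reply)
		m.mu.Lock()
		done := m.pending[tag]
		delete(m.pending, tag) // at most one reply per pending call
		m.mu.Unlock()
		if done == nil {
			panic("mux: unexpected reply tag")
		}
		done <- reply
	}
}

func (m *Mux) Call(args Msg) Msg {
	tag := m.srv.ReadTag(args)
	done := make(chan Msg, 1) // buffered so recvLoop never blocks
	m.mu.Lock()
	if m.pending[tag] != nil {
		m.mu.Unlock()
		panic("mux: duplicate call tag") // the bug check mentioned above
	}
	m.pending[tag] = done
	m.mu.Unlock()
	m.send <- args
	return <-done
}
```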
52:04 So that was all the patterns I wanted to show, and I hope they end up being useful for you in whatever future programs you're writing. I hope they're good ideas even in non-Go programs, and that thinking about them in Go can help you when you go to do other things as well.

52:23 I'm going to put them all back up, and then I have some questions that Frans sent, from all of you, and we'll probably have some time for questions from the chat as well. I have no idea where the chat window is in Zoom, so when we get to that, people can just speak up. I don't use Zoom on a daily basis, unfortunately; normally I know how to use it, but with the presentation up, Zoom is in this minimized mode that doesn't have half the things I'm used to.

52:54 Someone asked how long Go took, and so far it's been about 13 and a half years. We started discussions in late September 2007. I joined full-time in August 2008, when I finished at MIT. We did the initial open source launch in November 2009. We released Go 1, the first stable version; or rather, the plan was October 2011, and Go 1 itself was March 2012. And then we've just been on a regular schedule since then. The next major change, of course, is going to be generics, and that's probably going to be Go 1.18, which is going to be next February.

53:37 Someone asked how big a team it takes to build a language like Go. For those first two years there were just five of us, and that was enough to get us to something we released that could actually run in production, but it was fairly primitive: a good, solid, working prototype, but not what it is today. Over time we've expanded a fair amount; now we're up to something like 50 people employed by Google to work directly on Go, and then there are tons of open source contributors, a literal cast of thousands who have helped us over the last 13 years. There's absolutely no way we could have done it, even with 50 people, without all the different contributions from the outside.
54:23 Someone asked about design priorities and motivations. We built it for us: the priority was to build something that was going to help Google. It just turned out that we were in a really lucky spot where Google was a couple of years ahead of the rest of the industry in having to write distributed systems. Now everyone using cloud software is writing programs that talk to other programs and send messages; there are hardly any single-machine programs anymore. So we sort of lucked into building the language that the rest of the world needed a couple of years later.

55:01 The other thing that was really a priority was making it work for large numbers of programmers, because Google had a very large number of programmers working in one code base, and now we have open source, where even if you're a small team, you're usually depending on code written by a ton of other people. A lot of the issues that come up from having many programmers still come up in that context. Those were really the things we were trying to solve.

55:28 For all of these things, we took a long time before we were willing to commit to putting something in the language; everyone in the core original group basically had to agree. That meant it took us a while to get the pieces exactly the way we wanted them, but once we got them there, they've been very stable and solid and really nice, and they work together well. The same thing is happening with generics now: I personally feel really good about generics, I feel like it feels like the rest of Go, and that just wasn't the case for the proposals we had even a couple of years ago, much less the early ones.

56:08 Someone said they really like defer, which is unique to the language, and I do too; thank you. But I want to point out that while we did absolutely create defer for Go, Swift has adopted it, and I think there's a proposal for C++ to adopt it as well, so hopefully it moves out a little bit.

56:29 There was a question about Go using capitalization for exporting, which I know is jarring when you first see it. The story behind it is that we knew we would need something, but at the beginning we just said, look, everything's exported, everything's publicly visible, we'll deal with it later. After about a year it was clear that we needed some way to let programmers hide things from other programmers. C++ has public: and private:, and in a large struct that's actually really annoying: you're looking at definitions, and you have to scroll backwards to find the most recent public: or private:, and if the struct is really big it can be hard to find one, so it's hard to tell whether a particular definition is public or private. In Java, of course, it's at the beginning of every single field, and that seemed excessive too, just too much typing. So we looked around some more, and someone pointed out that Python has the convention of putting an underscore in front to make something hidden. That seemed interesting, but you probably don't want the default to be not hidden; you want the default to be hidden. Then we thought about putting something like a plus in front of names, and then someone suggested: what about uppercase meaning exported? It seemed like a dumb, terrible idea. It really did.
57:56 I really didn't like this idea. I have a very clear memory of the room and what I was staring at as we discussed it, but I had no logical argument against it, and it turned out to be fantastic. It seemed bad, just aesthetically, but it is now one of my favorite things about Go: when you look at a use of something, you immediately get that bit of "is this something other people can access or not," at every use. If you see code calling a function to do whatever it does, you think, oh wow, can other people do that? And your brain sort of takes care of it. Now I go to C++, I see calls like that, and I get really worried: wait, is that something other classes can get at? Having that bit actually turns out to be really useful for reading code.

58:42 A couple of people asked about generics. If you don't know, we have an active proposal for generics, and we're actively working on implementing it. We hope that the release later in the year, towards the end of the year, will have a full version of generics that you can actually use; that'll be a preview release. The real release we hope it lands in is Go 1.18, which is February of next year, so maybe next year's class will actually get to use generics; we'll see. I'm certainly looking forward to having a generic min and max. The reason we don't have those is that you'd have to pick which type they were for, or have a whole suite of them, and it just seemed silly; it seemed like we should wait for generics.

59:22 Someone asked: is there any area of programming where Go may not be the best language but is still used? The answer is absolutely; that happens all the time with every language. I think Go is actually a really good all-around language, but you might use it for something it's not perfect for, just because the rest of your program is written in Go and you want to interoperate with it. There's this website called the On-Line Encyclopedia of Integer Sequences; it's a search engine where you type in 2, 3, 5, 7, 11 and it tells you those are the primes. It turns out the back end for that is all written in Go, and if you type in a sequence it doesn't know, it actually does some pretty sophisticated math on the numbers, all with big numbers and things like that. All of that is written in Go too, because it was too annoying to shell out to Maple and Mathematica and do the cross-language thing, even though you'd much rather implement it in those languages. You run into those sorts of compromises all the time, and that's fine.
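For a concrete picture of the generic min and max mentioned above, here is a minimal sketch in the accepted type-parameter syntax; the Ordered constraint is written out by hand rather than imported from any particular package.

```go
package minmax

// Ordered lists the kinds of types Min and Max accept; a constraints
// package could supply this instead.
type Ordered interface {
	~int | ~int64 | ~float64 | ~string
}

// Min works for any ordered type, so there is no need for a whole
// suite of MinInt, MinFloat64, and so on.
func Min[T Ordered](a, b T) T {
	if a < b {
		return a
	}
	return b
}

func Max[T Ordered](a, b T) T {
	if a > b {
		return a
	}
	return b
}
```

With this, Min(1, 2), Min(1.5, 2.5), and Min("a", "b") all work from one definition; the eventual standard-library or built-in version may differ in detail.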
60:20 Someone asked: Go is supposed to be simple, and that's why there are no generics and no sets, but isn't Go also for software developers, and don't software developers need all this stuff? Isn't it silly to have to reconstruct it? I think it's true that there's some tension there, but simplicity in the sense of leaving things out was never the goal. For sets, it just seemed like maps are so close to sets: you just have a map where the value is empty or a boolean, and that's a set. And for generics, you have to remember that when we started Go in 2007, Java was just finishing a true fiasco of a rollout of generics, and we were really scared of that. We knew that if we just tried to do it, we would get it wrong, and we knew that we could write a lot of useful programs without generics. So that was what we did, and we came back to it when we felt like, okay, we've spent enough time writing other programs, we know a lot more about what we need from generics for Go, and we can take the time to talk to real experts. It would have been nice to have generics five or ten years ago, but we wouldn't have had the really nice version we're going to have now, so I think it was probably the right decision.
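As a quick sketch of the map-as-set idiom from that answer (illustrative code, not a library API):

```go
package main

import "fmt"

func main() {
	// A set of strings is just a map whose values carry no information.
	seen := make(map[string]struct{})

	// Insert.
	seen["alice"] = struct{}{}
	seen["bob"] = struct{}{}

	// Membership test.
	if _, ok := seen["alice"]; ok {
		fmt.Println("alice is in the set")
	}

	// Delete.
	delete(seen, "bob")

	// A map[string]bool works too, and reads a little more naturally
	// (if seen["alice"] { ... }) at the cost of one byte per entry.
	fmt.Println(len(seen), "element(s) left")
}
```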
61:40 There was a question about goroutines and their relation to the Plan 9 thread library, which was all cooperatively scheduled: whether goroutines were ever cooperatively scheduled, and whether that caused problems. It's absolutely the case that Go and the goroutine runtime were inspired by previous experience on Plan 9. There was actually a different language called Alef on an early version of Plan 9 that was compiled; it had channels, it had select, and it had things we called tasks, which were a little bit like goroutines, but it didn't have a garbage collector, and that made things really annoying in a lot of cases. Also, tasks were tied to a specific thread: you might have three tasks in one thread and two tasks in another, and of the three tasks in the first thread, only one ever ran at a time, and they could only reschedule during a channel operation. So you would write code where those three tasks were all operating on the same data structure, and you just knew, because it was in your head when you wrote it, that it was okay for two different tasks to be scribbling over the same data structure, because they could never be running at the same time. Meanwhile, in the other thread, you've got the same situation going on with different data and different tasks. Then you come back to the same program six months later, and you've totally forgotten which tasks could write to which pieces of data, and I'm sure we had tons of races. It was a nice model for small programs and a terrible model for programming over a long period of time, or for a big program that other people had to work on.

63:07 So that was never the model for Go. The model for Go was always: it's good to have these lightweight goroutines, but they're all going to be running independently, and if they're going to share anything, they need to use locks, and they need to use channels to communicate and coordinate explicitly. That has definitely scaled a lot better than any of the Plan 9 stuff ever did.

63:27 Sometimes people hear that goroutines are cooperatively scheduled and think of something more like that. It's true that early on, goroutines were not as preemptively scheduled as you would like. In the very early days, the only preemption points were calls into the runtime; shortly after that, the preemption points were any function entry. But if you were in a tight loop for a very long time, that would never preempt, and that would cause garbage collector delays, because the garbage collector would need to stop all the goroutines, and there'd be some goroutine stuck in a tight loop that would take forever to finish. In the last couple of releases, we finally figured out how to get Unix signals delivered to threads in just the right way, with the right bookkeeping, to use that as a preemption mechanism. So now, I think, the preemption delays for garbage collection are actually bounded, finally. But from the start the model has been that goroutines run preemptively and don't get control over when they get preempted.

64:30 As a follow-on, someone asked where in the source tree to look to learn more about goroutines and the goroutine scheduler. The answer is that this is basically a little operating system: a little operating system that sits on top of the other operating system instead of on top of CPUs. So the first thing to do is take 6.828. I worked on 6.828 and xv6 literally a year or two before I went and did the Go runtime, so there's a huge amount of 6.828 in the Go runtime. In the actual Go runtime directory there's a file called proc.go; proc stands for process, because that's what it is in operating systems. I would start with that file and then pull on strings.
65:18 Someone asked about Python's negative indexing, where you can write x[-1]. That comes up a lot, especially from Python programmers, and it seems like a really great idea: you write these really nice, elegant programs where, to get the last element, you just say x[-1]. But the real problem is that you have x[i] in a loop counting down from n to zero, and you have an off-by-one somewhere, and now x[-1], that is, x[i] when i is minus one, instead of being an error that you see immediately and say, hey, there's a bug I need to find, just silently grabs the element off the other end of the array. That's where the Python sort of simplicity makes things worse, and that's why we left it out: we thought it was going to hide bugs too much. You could imagine something where you say x[$-1], or len minus one, not len of x, just len, but it seemed like too much of a special case, and it really doesn't come up enough.

66:20 Someone asked what aspect of Go was hardest to implement. Honestly, a lot of it is not very hard; we'd done most of this before. We'd written operating systems and threading libraries and channel implementations, so doing all that again was fairly straightforward. The hardest thing was probably the garbage collector. Go is unique among garbage-collected languages in that it gives programmers a lot more control over memory layout: if you have a struct with two other structs inside it, that's just one big chunk of memory; it's not a struct with pointers to two other chunks of memory. And because of that, you can take the address of, say, the second field in the struct and pass it around, which means the garbage collector has to be able to deal with a pointer that points into the middle of an allocated object. That's just something Java and Lisp and other languages don't do, and it makes the garbage collector a lot more complicated in how it maintains its data structures.

67:16 We also knew from the start that you really want low latency, because if you're handling network requests, you can't just pause for 200 milliseconds, blocking all the in-progress requests, to do a garbage collection. It needs to be low-latency and not stop things. We thought multicore would be a good opportunity there, because we could have the garbage collector using one core and the Go program using the other cores. That actually did turn out to work really well, but it required hiring a real expert in garbage collection to figure out how to do it and make it work. Now it's really great.

67:56 A student asked: you said that if a struct is declared inside another struct, it's actually all one big chunk of memory. Why did you implement it like that? What's the reasoning? Well, there are a couple of reasons. One is the garbage collector: the load on the garbage collector is proportional to the number of objects you allocate, so if you have a struct with five things in it and you can make that one allocation, that's a fifth of the load on the garbage collector, and that turns out to be really important. The other thing that's really important is cache locality: the processor pulls in memory in 64-byte chunks, or whatever it is, and it's much better at reading memory that's all together than memory that's scattered.
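A small made-up example of what that layout guarantee means in code:

```go
package layout

// Min and Max are stored directly inside Rect: one contiguous chunk of
// memory holding four ints, not a Rect holding pointers to two Points.
type Point struct{ X, Y int }

type Rect struct {
	Min, Max Point
}

func example() {
	r := new(Rect) // a single allocation for the whole rectangle
	p := &r.Max    // an interior pointer into the middle of that object
	p.X = 3
	// The garbage collector must understand that p keeps all of *r
	// alive, which is exactly the complication described above.
}
```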
68:42 We have a Git server at Google called Gerrit that's written in Java, and it was just getting started at the time Go was coming out; I think we missed Gerrit being written in Go by about a year. But we talked to the guy who wrote Gerrit, and he said one of the biggest problems in Gerrit was that you have all these SHA-1 hashes, and just having the idea of 20 bytes is basically impossible in Java. You can't just have 20 bytes in a struct; you have to have a pointer to an object, and you can't even have 20 bytes in the object; you have to declare five different ints or something like that to get your 20 bytes. There's just no good way to do it, and the overhead of a simple thing like that really adds up. So we thought giving programmers control over memory was really important.
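In Go, the 20-byte value the Gerrit author wanted is a one-line type; the surrounding Commit struct here is invented for illustration.

```go
package git

// Hash is exactly 20 bytes, stored inline wherever it appears.
type Hash [20]byte

// Commit keeps its hashes inline: no per-hash object, no pointer
// chasing, and better cache locality when scanning many commits.
type Commit struct {
	Tree    Hash
	Parents []Hash
	Message string
}
```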
69:33 Another question was about automatic parallelization, for loops and things like that. We don't do anything like that in the standard Go toolchain. There are Go front ends for GCC and LLVM, so to the extent that those do that kind of loop optimization for C, I think we get the same from the Go front ends. But it's not the kind of parallelization we typically need at Google; it's more lots of servers running different things, and the big vector-math kind of stuff doesn't come up as much, so it just hasn't been that important to us.

70:12 The last question I have written down: someone asked how you decide when to acquire and release locks, and why Go doesn't have re-entrant locks. For that I want to go back a slide. During the lecture I said things like "the lock protects the map" or "it protects the data," but what we really mean is that the lock protects some collection of invariants that apply to the data, that are true of the data. The reason we have the lock is to protect from each other the operations that depend on those invariants and that sometimes temporarily invalidate them. When you call Lock, what you're saying is: I need to make use of the invariants this lock protects. When you call Unlock, what you're saying is: I don't need them anymore, and if I temporarily invalidated them, I've put them back, so the next person who calls Lock will see correct invariants.

71:08 So in the mux, we want the invariant that each registered pending channel gets at most one reply, and to do that, when we take done out of the map, we also delete it from the map before we unlock. If there were some separate kind of cancel operation directly manipulating the map as well, it could call Lock, take the entry out, and call Unlock, and if it actually found one, it would know that no one is going to send on that channel anymore, because it took the entry out. Whereas if we had written this code with an extra Unlock and re-Lock between the done = pending[tag] and the delete, you wouldn't have that protection of the invariants anymore, because you would have unlocked and relocked while the invariants were broken. It's really important to correctness to think about locks as protecting invariants.

72:02 And if you have re-entrant locks, all of that goes out the window. Without re-entrant locks, when you call Lock, on the next line you know: okay, the lock just got acquired, all the invariants are true. With a re-entrant lock, all you know is that the invariants were true for whoever locked it the first time, who might be way up your call stack, and you really know nothing. That makes it a lot harder to reason about what you can assume, and so I think re-entrant locks are a really unfortunate part of Java's legacy.

72:35 Another big problem with re-entrant locks: suppose you have code that depends on the re-entrant lock, because you've acquired the lock up above, and at some point you say, you know what, I want a timeout on this, or I want to do it in some other goroutine while I wait for something else. Re-entrant always means "locked on the same stack"; that's the only plausible thing it can mean. So if you move the code that was doing the re-entrant lock onto a different stack, it's going to deadlock: that Lock is now actually going to acquire, and it's going to wait for you to let go of the lock, and you're not going to let go, because you think that code needs to finish running. So it's completely, fundamentally incompatible with restructurings where you take code and run it in different threads or different goroutines. Anyway, my advice is to think about locks as protecting invariants, and to avoid depending on re-entrant locks; it really just doesn't scale well to real programs.

73:35 So I'll put this list back up. Actually, we've had it up long enough; let me try to figure out how to stop presenting, and then I can take a few more questions.
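To see the invariant argument in code, here is a hypothetical Cancel added to the mux sketch from earlier; it is not part of the lecture's slides, but it shows why claiming and deleting the tag in one critical section matters.

```go
// Cancel reports whether the call registered under tag was withdrawn
// before any reply was delivered. Because both Cancel and recvLoop
// read and delete the tag inside a single critical section, exactly
// one of them can win, and the "at most one reply per pending channel"
// invariant is never visible in a broken state.
func (m *Mux) Cancel(tag int64) bool {
	m.mu.Lock()
	done := m.pending[tag]
	delete(m.pending, tag)
	m.mu.Unlock()
	return done != nil // if non-nil, no reply will ever be sent on it
}
```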
73:47 A student asked: coming from Python, it's very common to use standard functional operations, like map and filter and list comprehensions. When I switched over to Go and started programming, I looked it up, and people say you shouldn't do this, do it with a loop. I was wondering why.

74:17 Well, one answer is that you can't do it the other way, so you might as well do it the way you can. But a bigger issue is that if you do it that way, you actually end up creating a lot of garbage, and if you care about not putting too much load on the garbage collector, loops are a way to avoid that: if you've got a map and then a filter and then another map, you can make that one loop over the data instead of three loops, each of which generates a new piece of garbage.

74:53 But now that we have generics coming, you'll actually be able to write those functions. You literally couldn't write them before; you couldn't even write down their type signatures. Python gets away with this because there are no static types, but now we're actually going to have a way to do it, and I totally expect that once generics go in, there will be a package slices, and if you import slices you can do slices.Map and slices.Filter and slices.Unique or something like that. I think those will all happen, and if that's the right thing, then that's great. Thanks. Sure.
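As a sketch of what becomes expressible with generics, here are hypothetical Map and Filter functions in the spirit of the slices package mentioned above; this is not a released API.

```go
package slices

// Map applies f to each element of s, allocating one new slice.
func Map[T, U any](s []T, f func(T) U) []U {
	out := make([]U, len(s))
	for i, v := range s {
		out[i] = f(v)
	}
	return out
}

// Filter keeps the elements of s for which keep returns true.
func Filter[T any](s []T, keep func(T) bool) []T {
	var out []T
	for _, v := range s {
		if keep(v) {
			out = append(out, v)
		}
	}
	return out
}
```

Note that chaining Map and Filter still allocates an intermediate slice per stage, which is exactly the garbage-collector cost described in the answer; a single hand-written loop fuses the stages.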
75:30 Another student asked: one of the hints was about running independent goroutines concurrently, and in some of the code examples it seemed like you could just call the function in the same thread rather than a different one. Why call it in a different thread?

75:56 Usually it's because you want them to proceed independently. In one of the examples, there was a loop sending tasks to the work queue, and the servers were running in different goroutines, reading from the work queue and doing work; when they were done, they would send "hey, I'm done now" to the done channel. But a send in Go doesn't complete until the receive actually matches with it. So if the thing sending on the work queue is not going to start receiving from the done channel until it has finished sending all the tasks into the work queue, then you have a deadlock: the main goroutine is trying to send new work to the servers, the servers are not taking new work because they're trying to tell the main goroutine that they're done, and the main goroutine isn't going to start reading from the done channel until it finishes giving out all the work. They're just staring at each other, waiting for different things to happen.

76:57 Whereas if we just put the little go statement around the loop that's sending the work, then that can go somewhere else and proceed independently. While it's stuck waiting for the servers to take more work, the servers are stuck waiting for the main goroutine to acknowledge that they finished some work, and now the main goroutine actually gets down to the loop that reads from the done channel and acknowledges the finished work. It's just a way to separate out two different things that logically didn't have to happen one after the other. Because they were happening one after the other, that caused a deadlock; taking one out and letting it run independently removes the deadlock.

77:41 Thank you so much. Sure.
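Here is a compressed, invented example of that shape; with the go statement around the producer loop the program completes, and without it the main goroutine and the workers deadlock as described.

```go
package main

import "fmt"

func main() {
	const numWorkers, numTasks = 3, 10
	work := make(chan int)
	done := make(chan int)

	// Workers: read a task, "process" it, report on done.
	for w := 0; w < numWorkers; w++ {
		go func() {
			for t := range work {
				done <- t * t
			}
		}()
	}

	// Doing these sends inline in main would deadlock: after the
	// workers each take one task and block sending on done, the next
	// work <- t blocks forever, and main never reaches the done loop.
	// Wrapping the loop in a goroutine lets it proceed independently.
	go func() {
		for t := 0; t < numTasks; t++ {
			work <- t
		}
		close(work)
	}()

	// Main is free to drain done while work is still being handed out.
	for i := 0; i < numTasks; i++ {
		fmt.Println(<-done)
	}
}
```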
77:44 Could you talk a little bit about how Go's race detector is implemented?

77:48 Sure. It is the LLVM race detector; that probably doesn't help, but it is exactly the thing LLVM calls ThreadSanitizer. We have a little binary blob that we link against, because we don't want to depend on all of LLVM, but it's the LLVM race detector. The way it works is that it allocates a ton of extra virtual memory, and based on the address of the thing being read or written, it has this other spot in virtual memory where it records information about the last thread (it thinks in threads, but they're goroutines) that did a read or a write. Every time a synchronizing event happens, like a communication from one goroutine to another, that counts as establishing a happens-before edge between the two goroutines. If you ever get a read and a write that are not properly sequenced, that's a race: if you have a read, and it happens-before something in another chain, which then later does the write, that's fine, but if there's no happens-before path connecting the read and the write, that's a race. It has some pretty clever ways to figure out quickly, dynamically, whether there is a happens-before path between this read and this write as they happen.

79:09 It slows the program down by maybe 10x, but if you just divert a small amount of traffic to it, that's probably fine, and if it's for testing, that's also probably fine. It's way better than not finding out about the races, so it's totally worth it. Honestly, 10 or 20x is fantastic; the original ThreadSanitizer was more like 100 or 1000x, and that was not good enough.

79:30 What's the race detector called? LLVM? It's called ThreadSanitizer, but it's part of LLVM, the project behind the Clang C compiler, the one almost everyone uses now.
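For reference, the detector is built into the standard Go toolchain behind the -race flag; a tiny racy program like this sketch trips it.

```go
// race.go: two unsynchronized writes to x from different goroutines.
// Running `go run -race race.go` makes the detector print a
// WARNING: DATA RACE report pointing at both writes.
package main

import "fmt"

func main() {
	x := 0
	done := make(chan bool)
	go func() {
		x = 1 // write in a second goroutine
		done <- true
	}()
	x = 2 // concurrent write in main: no happens-before edge between them
	<-done
	fmt.Println(x)
}
```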
79:54 Can you talk about slices, and the design choice of having them be views on arrays? That confused me at first.

80:01 Yeah, it is a little confusing at first. The main thing is that you want it to be efficient to walk through an array, or, if you're in quicksort or merge sort, where you have an array of things and you want to say "now sort this half and sort the other half," you want to be able to say efficiently: here, this is half of the previous one, sort that. In C, the way you do that is you just pass the pointer to the first element and the number of elements, and that's basically all a slice is. The other pattern that comes up a lot when you're trying to be efficient with arrays is that you have to grow them, and you don't want to call realloc on every single new element; you want to amortize that. The way you do that, in C again, is that you have a base pointer, the length you're using right now, and the length you allocated; to add an element, you check whether the length has reached the amount you allocated, and if so you reallocate, and otherwise you just keep bumping the length forward. Slices are really just an encoding of those idioms, because those are the most efficient ways to manage the memory. In any C++ vector or that sort of thing, that's what's going on underneath, but it's a lot harder with the C++ vector, because for ownership reasons the vector is tied to the actual underlying memory, so it's much harder to get a sub-vector that's just a view onto, say, the second half for merge sort. So that's the idea: there are all these patterns for accessing memory efficiently that came from C, and we tried to make them fit into Go in an idiomatic way, in a safe way.
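A short illustrative program showing both idioms, views and amortized growth:

```go
package main

import "fmt"

func main() {
	// A slice is essentially {pointer, length, capacity}: the C idiom
	// of (base pointer, in-use length, allocated length) in one value.
	a := []int{5, 2, 8, 1, 9, 3}

	// Views: merge sort can hand each half to a recursive call without
	// copying; both halves alias the same underlying array.
	left, right := a[:3], a[3:]
	fmt.Println(left, right, len(a), cap(a))

	// Growth: append amortizes reallocation by growing capacity in
	// chunks instead of calling the allocator on every element.
	var s []int
	for i := 0; i < 10; i++ {
		s = append(s, i)
		fmt.Println(len(s), cap(s)) // capacity jumps, length creeps
	}
}
```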
81:42 Can you talk about how you decided to implement the Go remote module system, where you import directly from a URL, rather than a central registry?

81:49 I mean, I just didn't want to run a service. A lot of the things like RubyGems were not at the front of my mind at the time, just because they were newer, but I had used Perl for a while, and CPAN, and I just thought it was insane that everyone was fighting over these short names, like db; there probably shouldn't be an argument over who gets to make the db package. So putting domain names at the front seemed like a good way to decentralize it, and it was also a good way for us not to run any server: we could just say, well, we'll recognize the host name and then go grab it from source control, from someone else's server. That turned out to be a really great idea, I think, because we don't need the kind of infrastructure that other systems depend on. In the Java world it's actually really problematic: there's no standard registry, but they all use these short names, and Maven can be configured to build from multiple different registries. If you're an open source package provider, you actually have to go around and make sure you upload your package to all the different registries, because if you miss one and your package becomes popular, someone else will upload different code to that one. And Maven just takes whichever one comes back first; it sends a request to all of them and uses whatever answers first. So if someone wants to make a malicious copy of your package, all they have to do is find some registry other people use that you forgot to upload to, and they get to win the race sometimes. It's a real problem, and I think having the domain name there really helps split up the ownership in an important way.

83:28 Thank you. Sure.
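Concretely, the convention puts the origin host at the front of the import path; the two modules below are real public examples, used here only for illustration.

```go
package main

import (
	"fmt"

	// The host name at the front of each path says where to fetch the
	// code from, so no central registry has to arbitrate short names
	// like "db".
	"github.com/google/uuid"
	"rsc.io/quote"
)

func main() {
	fmt.Println(uuid.NewString(), quote.Hello())
}
```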
83:34 So maybe we should take a quick pause here. Those who have to go can go; I'm sure Russ is willing to stick around a little longer and answer questions. But I do want to thank Russ for giving this lecture; hopefully it will help you write more good Go programs with these patterns. So thank you, Russ. You're very welcome; it's nice to be here. And for more questions, feel free to ask.

84:03 Oh, just a little logistical thing: the slides on the 6.824 website are not exactly the same as Russ's slides, so before people check them out, I'll get Frans a new PDF. Yeah.

84:18 A more general question: when is writing a new language the best solution to a problem? That's a great question. It's almost never the best solution. But at the time we had an enormous number of programmers, thousands of programmers working in one code base, and compilations were just taking forever, because C++ was not meant for efficient incremental compilation. Furthermore, at the time, threading libraries were really awful; people just didn't use threads. I remember one of the first days I was at MIT, talking to Robert, and he said to me, in 2001: well, we don't use threads here, because threads are slow. And that was totally normal; that was just the way the world was at the time. At Google we were having a lot of trouble, because everything was event-based, little callbacks in C++, and there were these multi-core machines, and we actually didn't know how to get things to work on them, because Linux threads were not something you could really rely on. So if you had a four-core machine, you'd just run four completely independent processes of the web server and treat it as four machines, and that was clearly not very efficient. So there were a lot of good reasons to try something.

85:37 But it's a huge amount of work to get to the point where Go is today, and I think so much of it is not the language. There were important things we did in the language that enabled other considerations, but so much of a successful language is the ecosystem that gets built up around it, and the tooling that we built, the go command, all these not-the-language things. Programming language people who focus on the language itself sometimes get distracted and miss all the stuff around it.

86:15 Can I ask a follow-up on that? How is working on Go different now that it's more mature than it was before?

86:27 That's a great question. In the early days it was so easy to make changes, and now it's really hard to make changes; I think that's the number one thing. In the early days, everything was in one source code repository; literally all the Go code in the world was in that one repository. There were days when we changed the syntax. You used to have a star before chan every time you said a channel, because it was a pointer underneath and that was all exposed, so you'd always say *chan instead of chan, and similarly for maps. At some point we realized: this is dumb, you always have to say the star, let's just take it out. So we made the change to the compiler, I opened up literally the couple hundred Go source files in the world in my editor, the entire team stood behind me while I typed some regular expressions, and we looked at the effect on the files. Yep, that looks right; save it, compile it, we're done. Today we can't make backwards-incompatible changes at all, and even making new changes affects a lot of people: you propose something, people point out "well, this won't work for me," and you try to adjust. It's just a lot harder. We estimate there are at least a million, maybe two million Go programmers in the world, and it's very different from when there were four or five.
87:51 Not sure if this is a valid question, but what language is Go written in? Is it written in Go also?

87:57 Now it is. The original compiler and runtime were written in C, but a few years ago we actually wrote a program to translate C to Go. It only worked for our C code, but that was good enough, and it meant we wouldn't lose all the knowledge encoded in that code about why things were the way they were and how things worked; we didn't have to start from scratch. Now it's all written in Go, plus a little bit of assembly. That means people who know Go can help on the Go project, whereas before, if you wanted to work on the compiler or the runtime, you had to know C really well, and we weren't getting a lot of people who knew C really well; there aren't actually that many of them, proportionately. Furthermore, our entire user base is Go programmers, not C programmers, so moving to Go was a really big deal.

88:46 I was wondering, how do you prioritize what features to add to the language at this point? Like generics: a lot of people were asking for that. How do you choose what to work on?

88:59 We've considered the language mostly frozen for a while, so we haven't been adding much. There was a long period where we said we weren't adding anything, and then in the last couple of years we added a few small things leading up to generics, to shake the rust off and see what breaks when you change something in the language; for example, you can put underscores between digits in long numbers now. But generics has clearly been the next thing that needed to happen, and we just had to figure out how. In general, we try to only add things that don't have weird interference with other features, and things that are really important, that will help a lot of people write the kinds of programs we're trying to target with Go, which is distributed systems and that sort of thing.

89:50 Cool, thank you. Oh, I had a question. I noticed that Go doesn't have basic functions like min or max. Is that something you're considering adding with the generics stuff? Is that why?

90:09 Yeah, exactly right. You couldn't have one min: you'd have a min for int and a min for float64, and those had to have different names, and that was kind of annoying. Now we can write a single generic Min over any type that has a less-than operator. And honestly, for the specific case of min and max (I know it's not that hard to code yourself), I'm starting to feel like we should just make some built-ins, like print and things like that, so you always have them. But even if we don't, it'll be something like math.Min, and it'll at least be there. We really didn't want to make them built-ins until we could express their types, and we couldn't do that until generics happened.

90:48 A student pointed out that there is actually a Min for floating point. Yeah, I know; it's kind of weird, because the math library is basically copied from the C math.h set of things. So that's a good point: we can't actually put the generic ones in math, because they're already there. But we'll figure it out. I think we should probably just put them in the language, but we have to get generics through first.
91:11 Another thing: I noticed that you did USACO, competitive programming. I did too. So how did you, and actually I included this in one of the questions I submitted, let me pull it up: how did you go from doing competitive programming to what you're doing now at Google, working on Go? How was the transition from competitive programming to systems? And finally, what made you decide to go into systems, and how did it relate to competitive programming?

91:45 Competitive programming, at the time I did it, was not as all-consuming as I gather it is now. You could get by implementing a simple dynamic programming solution, a couple of little for loops, and that was fine; now you have all these convex hull algorithms and all that stuff that I can't do. So at some level it was different. But I was actually more interested in the systems stuff from the start, and the programming contests were just something fun to do on the side, so there wasn't a huge transition. I was never into implementing complex algorithms, max flow and all those sorts of things. On the other hand, when you start a new language, you actually do get to write a lot of core things: someone has to write the sort function, and it has to be a good general sort function, and I spent a while last month looking into diff algorithms, so that sort of thing does match that background pretty well. It does come up; it's just a different kind of programming.

92:48 So you thought of it as more of a side thing back then? Yeah, it was definitely not the main thing I did when I was writing programs. Today it's effectively the main thing for people who do it. I know; if you don't do it full-time now, there's just no way you can keep up. There just weren't that many people who cared about it back in 1995; twenty years later, it's different.

93:15 Can I ask a related question to that? How did you decide to go from academic work into, well, your work is still a little different from the usual software engineering thing, but still?

93:33 You know, I got lucky. I grew up near Bell Labs in New Jersey, and that was how I ended up working on Plan 9 a little bit in high school and college. I sort of knew I was going to go to grad school, and the plan was to go back to Bell Labs, but it kind of imploded while I was in grad school, with the dot-com boom and the dot-com crash. And Google at the time was just vacuuming up PhDs, systems PhDs, and doing really interesting things. I don't know, I haven't looked at the syllabus for this year, but there are things like Spanner and Bigtable and Chubby; they had a whole host of good distributed systems work going on. So I was sort of lucky to be able to go do that too. At the time I graduated, I was also looking at industrial research labs, like Microsoft Research and places like that, so there's definitely an opportunity for researchy things outside academia, if that's what you want. It's a little harder to find now; most of the places I knew, like Microsoft Research's lab, imploded too, a couple of years later. But it's still an option, and it's just a slightly different path. The difference I see from academia is that you end up caring a ton more about actually making things work 100% of the time, and supporting them for a decade or more, whereas you finish your paper and you kind of get to put it off to the side, and that's really nice, actually, at some level. It's definitely strange to me to be editing source files that I wrote, in some cases, 20 years ago, because I used a bunch of code I'd already written when we started Go. It's very weird to think that I've been keeping this program running for 20 years.