Transcript

[00:00] PROFESSOR: Okay. Good afternoon, good morning, good evening, good night, wherever you are. Let's get started again. Today we have a guest lecture, from a speaker who needs little introduction: Russ Cox, one of the co-leads on the Go project. We'll talk a lot more about that; let me say a couple of words first and try not to embarrass Russ too much.

[00:29] Russ has long experience with distributed systems. He was a developer and contributor to Plan 9 while he was a high school student and an undergrad at Harvard. He then joined the PhD program at MIT, which is where we met, and if you take any sort of PDOS class you will see Russ's touches on it. Certainly in 6.824 the switch to Go has been a wonderful thing for us. If you differ in that opinion, of course, feel free to ask Russ questions and make suggestions; he's always happy to entertain ideas. So with that, Russ, it's yours.

[01:15] RUSS: Great, thanks. Can you still see the slides? Is that working? Okay, great. We built Go to support writing the sort of distributed systems we were building at Google, and that made Go a great fit for what came next, which is now called cloud software, and also a great fit for 6.824. In this lecture I'm going to try to explain how I think about writing concurrent programs in Go, and I'm going to walk through the design and implementation of programs for four different patterns that I see come up often. Along the way I'll highlight some hints, or rules of thumb, that you can keep in mind when designing your own Go programs.

[01:55] I know the syllabus links to an older version of these slides, so you might have seen them already. I hope the lecture form is a bit more intelligible than just looking at the slides, and I hope that these patterns are common enough that they'll be helpful by themselves, but also that the hints will help you prepare for whatever it is you need to implement.

[02:18] To start, it's important to distinguish between concurrency and parallelism. Concurrency is about how you write your programs: about being able to compose independently executing control flows, whether you call them processes or threads or goroutines, so that your program can be dealing with lots of things at once without turning into a giant mess. Parallelism, on the other hand, is about how the programs get executed: about allowing multiple computations to run simultaneously, so that the program can be doing lots of things at once, not just dealing with lots of things at once. Concurrency lends itself naturally to parallel execution, but today the focus is on how to use Go's concurrency support to make your programs clearer, not to make them faster. If they do get faster, that's wonderful, but that's not the point today.
[03:07] So, as I said, I'll walk through the design and implementation of programs for four common concurrency patterns that I see often. But before we get to those, I want to start with what seems like a really trivial problem, yet illustrates one of the most important points about what it means to use concurrency to structure programs. A decision that comes up over and over when you design concurrent programs is whether to represent state as code or as data, and by "as code" I mean in the control flow of the program.

[03:38] Suppose we're reading characters from a file and we need to scan over a C-style quoted string.

[03:44] [A brief interruption while the speaker's screen sharing is paused and restored.]

[04:38] So we're reading a string. It's not a parallel program; it reads one character at a time, so there's no opportunity for parallelism, but there is a good opportunity for concurrency. If we don't actually care about the exact escape sequences in the string, what we need to do is match this regular expression; we don't have to worry about understanding it exactly, but that's basically the whole job: implement this regular expression. You probably all know you can turn a regular expression into a state machine, so we might use a tool that generates this code. In the generated code there's a single variable, state, that holds the state of the machine, and the loop reads one character at a time and, depending on the state and the character, moves to a different state, until it reaches the end. This is a completely unreadable program, but it's the kind of thing an auto-generated program might look like. The important point is that the program state is stored in data, in this variable called state, and if you can change the program to store that state in code instead, the result is often clearer.

[05:41] Here's what I mean. Suppose we duplicate the readChar call into each case of the switch. We haven't made any semantic change; we just took the readChar that was at the top and moved it into the middle. Now, instead of setting state and immediately doing the switch again, we can change those assignments into gotos. Then we can simplify a little further: there's a goto state1 right before the state1 label, so we can get rid of that.
[06:08] Next, there's only one way to get to state2, so we might as well pull the state2 code up and put it inside the if where the goto appears. Both branches of that if now end in goto state1, so we can hoist that out, and what's left is actually a pretty simple program: state0 is never jumped to, so the program just begins there, and state1 is just a regular loop, so we might as well make it look like a regular loop. Now this is looking like a program. Finally we can get rid of some variables and simplify a little further, and we can rotate the loop so that we don't do a return true in the middle of the loop; we do the return true at the end.

[06:54] Now we've got a program that is actually reasonably nice. It's worth mentioning that you can clean up much less egregious examples the same way: if you had tried to write this by hand, your first attempt might have been the version on the left, with one extra piece of state, and you can apply the same kinds of transformations to move that state into the actual control flow and end up at the same, cleaner program on the right. This is a useful transformation to keep in mind any time you have state that looks like it might just be restating what's already in the program counter. You can see the correspondence in the original version: if state == 0, the program counter is at the beginning of the function; if state == 1 (or escape == false in the other version), the program counter is just inside the for loop; and state == 2 is further down in the for loop.

[07:48] The benefit of writing it this way, instead of with the states, is that it's much easier to understand. I can actually walk through the code and explain it to you: you read an opening quote, then you start looping; until you find the closing quote, you read a character, and if it's a backslash you skip the next character. That's it. You can read that right off the page, which you couldn't do in the original. This version also happens to run faster, although that doesn't really matter for us.

[08:15] As I mentioned, I'm going to highlight what I think are important lessons as hints for designing your own Go programs, and this is the first one: convert data state into code state when it makes your programs clearer. Again, these are all hints, not rules; apply each one only if it helps. You get to decide.
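Reconstructed from the walkthrough above, the final state-as-code version looks something like this. The readChar signature and the eof sentinel are assumptions for the sketch; the actual slides may differ in detail.

    const eof rune = -1 // assumed sentinel returned by readChar at end of input

    // readString reports whether a C-style quoted string was read.
    func readString(readChar func() rune) bool {
        if readChar() != '"' {
            return false
        }
        for {
            switch c := readChar(); c {
            case '"':
                return true // closing quote: done
            case eof:
                return false // unterminated string
            case '\\':
                if readChar() == eof { // backslash: skip the escaped character
                    return false
                }
            }
        }
    }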
[08:38] One problem with this hint is that not all programs have the luxury of complete control over their control flow. Here's a different example: instead of having a readChar function it can call, this code is written with a processChar method that you hand the characters to, one at a time. Then processChar has no choice, really, but to encode its state in an explicit state variable, because after every character it has to return back out. It can't save the state in the program counter and the stack; it has to keep the state in an actual variable.

[09:14] But in Go we have another choice: if we can't save the state on this stack and in this program counter, we can make another goroutine to hold that state for us. Suppose we already have this debugged readString function that we really don't want to rewrite in the other style. It works; maybe it's big and hairy, much more complicated than the thing we just saw; we just want to reuse it. The way to do that in Go is to start a new goroutine that runs the readString part, using the same readString code as before and passing in a character reader. The Init method creates this goroutine to do the character reading, and then every time the processChar method is called, we send the goroutine a message on the char channel saying "here's the next character," and we receive a message back reporting the current status, which is always either "I need more input" or, essentially, whether the string was okay or not. This lets us move the program counter we couldn't keep on the first stack onto the goroutine's stack instead. Using additional goroutines is a great way to hold additional code state, and it gives you the ability to do these kinds of cleanups even when the original structure of the problem makes it look like you can't.

[10:41] PROFESSOR: I assume you're fine with people asking questions?

RUSS: Yeah, absolutely. Please interrupt. So the hint here is: use additional goroutines to hold additional code state.
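Here's a sketch of that idea, reconstructed from the description. The type and status names are assumptions, and readString is the function from the previous sketch.

    type status int

    const (
        needMoreInput status = iota
        success
        badInput
    )

    type quoter struct {
        char       chan rune
        statusChan chan status
    }

    // Init starts the goroutine whose stack holds the parsing state.
    func (q *quoter) Init() {
        q.char = make(chan rune)
        q.statusChan = make(chan status)
        go func() {
            // readChar asks the caller for the next character.
            readChar := func() rune {
                q.statusChan <- needMoreInput
                return <-q.char
            }
            if readString(readChar) {
                q.statusChan <- success
            } else {
                q.statusChan <- badInput
            }
        }()
        <-q.statusChan // consume the initial needMoreInput
    }

    // ProcessChar hands one character to the parsing goroutine and
    // returns needMoreInput, success, or badInput.
    func (q *quoter) ProcessChar(c rune) status {
        q.char <- c
        return <-q.statusChan
    }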
[10:58] There's one caveat: it's not free to just make goroutines. You have to make sure they actually exit, because otherwise you'll accumulate them. So you do have to think about why each goroutine exits and whether it gets cleaned up. Let's trace through this one. When we call Init, we kick off the goroutine, which is going to call readChar a bunch of times, and we read the status once. That first status arrives because the first call to readChar inside readString sends "need more input." Then every time processChar is called, we send a character and the goroutine returns a status: "need more input" each time it wants another character, and then, when readString finishes, a final "success" or "bad input" status, after which the goroutine is done.

[12:42] So as long as the caller keeps calling until it gets something other than "need more input," the goroutine will finish. But if we stop early, say the caller hits EOF and stops on its own without telling us it's done, there's a goroutine left over, and that could be a problem. You need to know when and why each goroutine will exit.

[13:12] The nice thing is that if you do make a mistake and leave goroutines stuck, they just sit there. It's the best possible kind of bug, because the stuck goroutines sit around waiting for you to look at them; all you have to do is remember to look. Here's a very simple program that leaks goroutines and runs an HTTP server. It kicks off a whole bunch of goroutines that all block trying to send on a channel, and then it starts the HTTP server. If I run this program, it just sits there, and if I type control-backslash on a Unix system, it gets a SIGQUIT, which makes it crash and dump the stacks of all the goroutines. You can see on the slide that it prints, over and over, a goroutine in h, called from g, called from f, blocked in a channel send, and from the line numbers you can see exactly where they are.

[14:02] Another option: since this is an HTTP server, and it imports the net/http/pprof package, you can just visit the server's /debug/pprof/goroutine URL, which gives you the stacks of all running goroutines. Unlike the crash dump, it does a little more work for you: it deduplicates the goroutines by stack and sorts them by how many share each stack, so if you have a goroutine leak, the leak shows up at the very top. In this case you've got 100 goroutines stuck in h, called from g, called from f, and then one each of a couple of other goroutines we don't really care about. So that's the next hint: it is really, really useful to look for stuck goroutines at this endpoint.
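A minimal runnable version of the leaky program described above; the function names f, g, h and the port are illustrative. The blank import of net/http/pprof is what registers the /debug/pprof/* handlers on the default mux.

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers
    )

    func f(c chan int) { g(c) }
    func g(c chan int) { h(c) }
    func h(c chan int) { c <- 1 } // blocks forever: no one receives

    func main() {
        for i := 0; i < 100; i++ {
            go f(make(chan int))
        }
        // While it runs, visit
        // http://localhost:8080/debug/pprof/goroutine?debug=1
        // to see the 100 stuck goroutines, deduplicated and sorted by count.
        log.Fatal(http.ListenAndServe("localhost:8080", nil))
    }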
[14:49] That was the warm-up. Now I want to look at the first real concurrency pattern, a publish-subscribe server. Publish-subscribe is a way of structuring a program so that the parts publishing interesting events are decoupled from the parts subscribing to them, with a pub-sub server in the middle connecting them, so that individual publishers and subscribers don't have to know exactly who the others are. On your Android phone, for example, an app might publish a "make a phone call" event, and the dialer might subscribe to that and actually help dial. In a real pub-sub server there are ways to filter events by kind, so that a "make a phone call" event doesn't go to your email program, but for now we'll assume the filtering is taken care of separately and worry only about the actual publish and subscribe, and the concurrency of that.

[15:42] So here's an API we want to implement. Any number of clients can call Subscribe with a channel, and afterwards, published events are sent on that channel. When a client is no longer interested, it calls Cancel with the same channel, to say "stop sending me events on this channel." The way Cancel signals that it really is done sending events on that channel is that it closes the channel, so the caller can keep receiving events until it sees the channel get closed, at which point it knows the cancel has taken effect.

[16:16] Notice that information only flows one way on the channel. The sender sends, the receiver receives, and information flows from sender to receiver, never the other way. Closing is also a signal from the sender to the receiver, meaning "all the sending is over." The receiver cannot close the channel to tell the sender "I don't want you to send anymore," because that would be information flowing in the opposite direction. It's just a lot easier to reason about if information only goes one way. Of course, if you need communication in both directions, you can use a pair of channels, and it often turns out that the two directions carry different types of data, as before, where runes flowed in one direction and status updates in the other.

[17:04] So how do we implement this API? Here's a pretty basic implementation that could be good enough. We have a server whose state is a map of registered subscriber channels, protected by a lock. We initialize the server by allocating the map. To publish an event, we send it to every registered channel; to subscribe a new channel, we add it to the map; to cancel, we take it out of the map. And because these are all methods that might be called from multiple goroutines, we lock and unlock around them to protect the map.
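Reconstructed from that description, the basic server looks something like this. Event is a stand-in type, and the mutex comes from the sync package.

    type Event struct{} // stand-in for whatever event type is published

    type Server struct {
        mu  sync.Mutex // protects sub
        sub map[chan<- Event]bool
    }

    func (s *Server) Init() {
        s.sub = make(map[chan<- Event]bool)
    }

    func (s *Server) Publish(e Event) {
        s.mu.Lock()
        defer s.mu.Unlock()

        for c := range s.sub {
            c <- e // one slow subscriber delays everyone (discussed below)
        }
    }

    func (s *Server) Subscribe(c chan<- Event) {
        s.mu.Lock()
        defer s.mu.Unlock()

        if s.sub[c] {
            panic("pubsub: already subscribed")
        }
        s.sub[c] = true
    }

    func (s *Server) Cancel(c chan<- Event) {
        s.mu.Lock()
        defer s.mu.Unlock()

        if !s.sub[c] {
            panic("pubsub: not subscribed")
        }
        close(c)
        delete(s.sub, c)
    }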
[17:38] Notice that I wrote defer Unlock right after the Lock, so I don't have to remember to unlock later. You've probably all seen this; it's a nice idiom to do the Lock and the deferred Unlock, then a blank line, so the pair forms its own little paragraph in the code. One thing I want to point out is that using defer makes sure the mutex gets unlocked even if the function has multiple returns, so you can't forget, and also that it gets unlocked on a panic, as in Subscribe and Cancel, which panic on misuse. There's a subtlety here: you might not want to unlock the mutex if the panic happened while the locked data was in an inconsistent state, but I'm going to ignore that for now. In general you try to avoid having the things that might panic happen while you're in a potentially inconsistent state.

[18:38] I should also point out that using panic at all in Subscribe and Cancel implies you really trust your clients not to misuse the interface: it says a misuse is a program error potentially worth tearing down the entire program. In a bigger program, where other clients used this API, you'd probably want to return an error instead, without the possibility of taking down the whole program. But panicking simplifies things for now, and error handling in general is not today's topic.

[19:10] A more important concern with this code than panics is what happens if a goroutine is slow to receive events. All the operations are done holding the mutex, which means all the clients proceed in lockstep. During Publish there's a loop sending the event to every channel, and if one subscriber falls behind, the next subscriber doesn't get the event until the slow one wakes up and takes the event off its channel. So one slow subscriber can slow down everyone else. Forcing them to proceed in lockstep this way is not always a problem: if you've documented the restriction, and for whatever reason you know how the clients are written and that they won't ever fall too far behind, this could be totally fine. It's a really simple implementation, and it has nice properties, like the fact that on return from Publish you know the event has actually been handed off to each of the other goroutines; you don't know that they've started processing it, but you know it's been handed off. Maybe that's good enough, and you can stop here.
[20:16] A second option, if you need to tolerate just a little bit of slowness in the subscribers, is to require that they give you a buffered channel with room for a couple of events, so that when you publish, as long as they're not too far behind, there's always room for the new event in the channel buffer and the publish won't block for long. Again, maybe that's good enough: if you're sure they'll never fall too far behind, you can stop there. But in a really big program you do want to cope gracefully with arbitrarily slow subscribers, and then the question is what to do. In general you have three options: you can slow down the event generator, which is what the previous solutions implicitly do, because Publish stops until the subscribers catch up; you can drop events; or you can queue an arbitrary number of past events. Those are pretty much your only options.

[21:14] We've talked about slowing down the event generator. There's a middle ground where you coalesce events or drop them, so the subscriber can find out, "hey, you missed some events, and I can't tell you what they were because I didn't save them, but I can at least tell you that you missed five," and then maybe it can do something else to catch up. That's the approach we take in the profiler. In the profiler there's a goroutine that fills the profile buffer with profiling events (in a signal handler, actually), and a separate goroutine whose job is to read the data back out and write it to disk, or send it in an HTTP response, or whatever you're doing with profile data. There's a buffer in the middle, and if the receiver of the profile data falls behind and the buffer fills up, we start adding counts to a final profile entry whose single frame is a function called runtime.lostProfileData. So if you look at the profile and see that the program spent five percent of its time in lostProfileData, that means the profile reader was too slow and we lost some of the profile, but you know exactly what the error rate is. You pretty much never see it, because the readers do keep up, but just in case they don't, you have a clear signal.

[22:40] An example of purely dropping events is the os/signal package. You pass in a channel that must be ready to receive a signal, like SIGHUP or SIGQUIT, and when the signal comes in, the runtime tries to send to each channel subscribed to that signal. If it can't send, it just doesn't; the notification is gone, because we're in a signal handler and can't wait. So what callers have to do is pass in a buffered channel, with a buffer of length at least one, registered for only a single signal. Then if a signal comes in, you're definitely told about it; if it comes in twice, you might only be told once. But that's the same semantics Unix gives processes for signals anyway, so that's fine. Those are both examples of dropping or coalescing events.
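A minimal runnable use of os/signal following that advice; the choice of SIGHUP is illustrative.

    package main

    import (
        "fmt"
        "os"
        "os/signal"
        "syscall"
    )

    func main() {
        // Buffer of one, and only one signal registered on this channel:
        // a delivered SIGHUP is never lost, though several arriving in
        // quick succession may coalesce into a single notification.
        c := make(chan os.Signal, 1)
        signal.Notify(c, syscall.SIGHUP)
        fmt.Println("got:", <-c)
    }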
[23:36] The third choice is that you might really not want to lose any events; it might be genuinely important that nothing is ever lost. In that case you can queue an arbitrary number of events: somehow arrange for the program to save all the events the slow subscriber hasn't seen yet and deliver them later. It's really important to think carefully before doing that, because in a distributed system there are always slow computers, computers that have fallen offline, and they might be gone for a while, so you don't want to introduce unbounded queuing casually. Think hard about how unbounded it really is and whether you can tolerate that. This is the reason channels don't offer an unbounded buffer: it's almost never the right choice, and when it is, you probably want to build it very carefully. But we're going to build one, just to see what it looks like.

[24:35] Before we do, I want to adjust the program a little. We have this mutex in the code, and the mutex is an example of keeping state, namely whether you're locked or not, in a state variable. We can move that into the program counter too, by putting it in a different goroutine. We start a new goroutine running a function called s.loop, which handles requests sent on three new channels: publish, subscribe, and cancel. In Init we make the channels and kick off s.loop, and s.loop is the amalgamation of the previous method bodies: it receives a request from any of the three channels, a publish, a subscribe, or a cancel, and does whatever was asked. Now the subscriber map can be just a local variable in s.loop. It's the same code, but the data is clearly owned by s.loop; nothing else can even reach it, because it's a local variable. Then we change the original methods to send the work over to the loop goroutine: the exported Publish sends the event on the unexported publish channel, and similarly Subscribe and Cancel create a request carrying the channel to subscribe or cancel plus a channel for the answer, send it to the loop, and the loop sends back the answer.

[26:11] I refer to transforming the program this way as converting the mutex into a goroutine: we took the data state of the mutex, the lock bit inside it, and made that bit implicit in the program counter of the loop.
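A sketch of the converted server, reconstructed from the description; the subReq type and the ok-reply convention are assumptions about how the loop reports misuse back to the calling method.

    type Server struct {
        publish   chan Event
        subscribe chan subReq
        cancel    chan subReq
    }

    type subReq struct {
        c  chan<- Event // channel to subscribe or cancel
        ok chan bool    // the loop's answer
    }

    func (s *Server) Init() {
        s.publish = make(chan Event)
        s.subscribe = make(chan subReq)
        s.cancel = make(chan subReq)
        go s.loop()
    }

    func (s *Server) loop() {
        sub := make(map[chan<- Event]bool) // owned by this goroutine
        for {
            select {
            case e := <-s.publish:
                for c := range sub {
                    c <- e
                }
            case r := <-s.subscribe:
                if sub[r.c] {
                    r.ok <- false
                    break
                }
                sub[r.c] = true
                r.ok <- true
            case r := <-s.cancel:
                if !sub[r.c] {
                    r.ok <- false
                    break
                }
                close(r.c)
                delete(sub, r.c)
                r.ok <- true
            }
        }
    }

    func (s *Server) Publish(e Event) { s.publish <- e }

    func (s *Server) Subscribe(c chan<- Event) {
        r := subReq{c, make(chan bool)}
        s.subscribe <- r
        if !<-r.ok {
            panic("pubsub: already subscribed")
        }
    }

    func (s *Server) Cancel(c chan<- Event) {
        r := subReq{c, make(chan bool)}
        s.cancel <- r
        if !<-r.ok {
            panic("pubsub: not subscribed")
        }
    }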
[26:25] In this version it's very clear that you can't ever have a publish and a subscribe happening at the same time, because it's single-threaded code executing in sequence. On the other hand, the original version had a kind of clarity of state: you could inspect it and reason about what the important state was, and in the goroutine version it's harder to see what's important state and what's incidental state that comes from just having a goroutine. In a given situation, one may matter more than the other. A couple of years ago I did all the labs for this class, when it first switched to Go, and Raft is a good example of where you probably prefer the state with the mutexes, because Raft is so different from most concurrent programs: each replica is profoundly uncertain of its state. One moment you think you're the leader, and the next you've been deposed; one moment your log has ten entries, and the next you find it actually has only two. Being able to manipulate that state directly, rather than somehow moving it in and out of the program counter, makes a lot more sense for Raft. But that's pretty unusual; in most situations it cleans things up to put the state in the program counter.

[27:42] All right. To deal with the slow subscribers, we're going to add some helper goroutines whose job is to manage a particular subscriber's backlog and keep the overall program from blocking. The main loop goroutine sends events to the helper, which we trust not to fall arbitrarily behind, because we wrote it, and the helper's job is to queue events as needed and send them on to the subscriber.

[28:10] This first version of the helper actually has two problems. The first is that if there's nothing in the queue, the select is wrong to offer q[0]: in fact, just evaluating q[0] at the start of the select will panic, because the queue is empty. We can fix that by setting up the select's operands separately. In particular, we use a channel variable for the send case that is nil when we don't want to send (a nil channel can never proceed in a select) and is the actual out channel when we do, plus a separate variable holding the event to send, which only reads q[0] when there's something in the queue.
[28:55] The second thing that's wrong is that we need to handle closing of the input channel: when the input channel closes, we need to flush the rest of the queue and then close the output channel. To check for that, we change the receive from e = <-in to e, ok = <-in; the ok reports whether the channel delivered real data or is closed. When ok is false, we set in to nil, meaning: stop trying to receive from in, there's nothing there, we'd just keep being told it's closed. Then, when the queue is finally empty, we can exit the loop. We change the for condition to keep looping as long as there still is an input channel or there's still something to write to the output channel; once both are false, we exit the loop and close the output channel. Now we've correctly propagated the closing of the input channel to the output channel.

[29:56] That was the helper. The server loop used to have a subscription map from subscriber channels to bools, basically a set, and now it's a map from subscriber channel to helper channel. Every time we get a new subscription, we make a helper channel, kick off a helper goroutine, and record the helper channel in the subscription map instead of the actual channel. The rest of the loop barely changes at all.

[30:32] I do want to point out that if you wanted a different strategy for clients that fall too far behind, it would all go in the helper goroutine; the server-loop code is completely unchanged. We've completely separated maintaining the actual set of subscribers from the what-do-you-do-when-things-get-too-slow problem, and it's really nice to get this clean separation of concerns into different goroutines; it can help keep your program simpler. That's the general hint: use goroutines to separate independent concerns.
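Here's a sketch of the finished helper with both fixes, reconstructed from the description above.

    // helper queues events so that one slow subscriber cannot block the
    // main loop; in receives from the loop, out goes to the subscriber.
    func helper(in <-chan Event, out chan<- Event) {
        var q []Event
        for in != nil || len(q) > 0 {
            // Only enable the send case when there is something to send:
            // a nil channel can never be chosen by the select.
            var sendOut chan<- Event
            var next Event
            if len(q) > 0 {
                sendOut = out
                next = q[0]
            }
            select {
            case e, ok := <-in:
                if !ok {
                    in = nil // input closed: flush whatever is left in q
                    break
                }
                q = append(q, e)
            case sendOut <- next:
                q = q[1:]
            }
        }
        close(out) // propagate the close to the subscriber
    }

And in the server loop, the subscribe case becomes something like the following, with sub now mapping subscriber channels to helper channels:

    case r := <-s.subscribe:
        h := make(chan Event)
        go helper(h, r.c)
        sub[r.c] = h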
[31:11] All right. The second pattern for today is a work scheduler. You built one of these in Lab 1 for MapReduce, and I'm going to build up to that, minus all the RPC machinery; this version just assumes channel-based interfaces to the servers. We have this function, schedule: it takes a fixed list of servers, a number of tasks to run, and an abstracted function that you call to run a given task on a specific server. You can imagine it doing the RPCs underneath.

[31:48] We need some way to keep track of which servers are available to execute new tasks. One option is our own stack or queue implementation, but another is a channel, because a channel is a good synchronized queue: send into the channel to push, receive from it to pop. In this case we'll make it a queue of idle servers, servers not doing any work for us right now, and we start off by sending all the known servers into the idle channel. Then we loop over the tasks, and for each task we kick off a goroutine whose job is to pull a server off the idle list, run the task, and put the server back. This loop body is another example of an earlier hint, using goroutines to let independent things run independently: each task runs as a separate concern, and they all run in parallel.

[32:44] Unfortunately, there are two problems with this program. The first is that the closure running as a new goroutine refers to the loop iteration variable, task, and by the time the goroutine starts executing, the loop has probably moved on and done task++, so the goroutine reads the wrong value of task. You've probably seen this by now, and of course the best way to catch it is to run the race detector. At Google we even encourage teams to set up canary servers that run the race detector and split off something like 0.1 percent of their traffic to them, just to catch races that might be in the production system. Finding a bug with the race detector is way better than debugging some corruption later.

[33:27] There are two ways to fix this race. The first is to give the closure an explicit parameter and pass task in. The go statement requires a function call specifically for this reason: so you can have arguments that get evaluated in the context of the original goroutine and then get copied into the new goroutine. We can declare a new parameter, task2, and pass task to it, and inside the goroutine task2 is a completely different copy of task. I only named it task2 to make it easier to talk about, and of course there's a bug here: I forgot to update task inside the function to refer to task2. So in practice we never do that; instead we give the parameter the same name, task, so that it's impossible for the code inside the goroutine to refer to the wrong copy.

[34:24] The second way to fix the race looks sort of cryptic the first time you see it, but it amounts to the same thing: you make a copy of the variable inside the loop body. Every time a := executes, it creates a new variable. In the outer for loop there's a := in the initialization and none in the rest of the loop, so there's one variable for the entire loop; but if we put a := inside the body, each iteration creates a distinct variable, and if the go closure captures that one, each goroutine gets its own. We could call it task2, and this time I remembered to update the body, but just as before it's too easy to forget, so typically you write task := task, which looks magical the first time you see it, but that's what it's for.
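The bug and both fixes, side by side; numTask and doTask are stand-ins. (As a historical note: in Go 1.22 and later, loop variables are per-iteration and this particular race no longer occurs, but the talk predates that change.)

    // BUG: every closure reads the one shared loop variable.
    for task := 0; task < numTask; task++ {
        go func() {
            doTask(task)
        }()
    }

    // Fix 1: make task a parameter; the argument is evaluated and copied
    // in the original goroutine, and the name shadows the loop variable.
    for task := 0; task < numTask; task++ {
        go func(task int) {
            doTask(task)
        }(task)
    }

    // Fix 2: redeclare a fresh per-iteration copy inside the loop body.
    for task := 0; task < numTask; task++ {
        task := task
        go func() {
            doTask(task)
        }()
    }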
[35:14] I said there were two bugs in the program. The first was the race on task. The second is that we don't do anything after kicking off all the tasks: we never wait for them to be done. And in particular we're kicking them off far too fast: if there are a million tasks, we create a million goroutines that all just sit waiting for one of the five servers, which is inefficient. So we pull the fetch of the next idle server up out of the goroutine: now we only kick off a goroutine when there is a server to use, and the goroutine runs the task on that server and puts the server back. Using the server and returning it run concurrently, but fetching the idle server inside the loop slows the loop down, so there are only ever as many goroutines running as there are servers, instead of one per task. That receive essentially creates back pressure to keep the loop from getting too far ahead. And as I mentioned, we have to wait for the tasks to finish; we can do that at the end by going over the list again and pulling all the servers back out of the idle channel. Once we've pulled the right number of servers out, we know they're all done, and that's the full program.

[36:33] To me the most important part of this is that you still get to write a for loop that iterates over the tasks. In lots of other languages you'd have to do this with state machines or callbacks, and you don't get the luxury of encoding it in the control flow. Here you just use a regular loop.

[36:55] But there are some improvements we could make. One is to notice that only one goroutine makes requests of a given server at any particular time, so instead of one goroutine per task, maybe we should have one goroutine per server, since there are probably fewer servers than tasks. To do that, we change the channel of idle servers into a channel of yet-to-be-done tasks, renaming the idle channel to work, and we also need a done channel so we can count and know when we're completely finished. There's a new function, runTasks, the per-server function; we kick off one per server. runTasks just loops over the work channel, running tasks, and when the server is done it sends true on done. The server's loop exits when the work channel gets closed; that's what makes the range loop stop. So, having kicked off the servers, we sit in a loop sending each task to the work channel, close the work channel to say "no more work is coming, all you servers should finish," and then wait for all the servers to report that they're done.
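A sketch of this one-goroutine-per-server version, reconstructed from the description:

    func schedule(servers []string, numTask int, call func(srv string, task int)) {
        work := make(chan int)  // tasks yet to be done
        done := make(chan bool) // one send per server when work is closed

        runTasks := func(srv string) {
            for task := range work { // exits when work is closed
                call(srv, task)
            }
            done <- true
        }
        for _, srv := range servers {
            go runTasks(srv)
        }

        for task := 0; task < numTask; task++ {
            work <- task
        }
        close(work) // no more work: let the server loops finish
        for range servers {
            <-done
        }
    }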
[38:15] In the lab there were a couple of complications. One was that you might get new servers at any time, so let's say the servers come in on a channel of strings. That actually fits pretty well into the current structure: when a new server arrives, you just kick off a new runTasks goroutine for it. The only thing we have to change is to put that loop into its own goroutine, so that while we're sending tasks to servers we can still accept new servers and kick off their goroutines.

[38:47] But now we have the problem that we don't have a good way to tell when all the servers are done, because we don't know how many servers there are. We could try to maintain that count as servers come in, but it's a little tricky. Instead, we can count the number of tasks that have finished: we move the send of true on done up a line, so that instead of doing it per server we do it per task, and at the end of the function we wait for the right number of tasks to be done.

[39:15] So now, again, we sort of know why these loops are going to finish, except there's actually still a deadlock. If the number of tasks is too big (actually, I think always), you get a deadlock, and if you run this, you get the nice behavior where the runtime tells you your goroutines are stuck. The problem is that the runTasks server loop is trying to say "hey, I'm done with this task" at the same moment the main loop is trying to say "here's some more work." If you have more tasks than servers, you hit this deadlock: you're trying to send the next task to a server, and all the servers are trying to report that they're done with their previous task, but you're not there to receive from the done channel. Again, it's really nice that the goroutines just hang around and wait for you to look at them.

[40:08] One way to fix this would be a loop around a select that either sends some work or accounts for some work being done. That's fine, but a cleaner way is to take the task-sending loop and put it in its own goroutine, so it runs independently of the counting loop, and the counting loop can unblock servers that have finished their tasks while other tasks are still being sent.
[40:41] But the simplest possible fix is to make the work channel big enough that you never run out of space. We might decide that a goroutine per task, at a couple of kilobytes per task, is too much, but an extra entry in the channel buffer is eight bytes, and you can probably spend eight bytes per task. So if you can, you just make the work channel big enough that the sends on work never block, and you always get down to the counting loop at the end quickly.

[41:13] Doing that sets us up well for the other wrinkle in the lab, which is that sometimes calls can time out. Here I've modeled that by having call return false to say it didn't work. In runTasks it's now really easy to say: if the call succeeds, the task is done; if it fails, just put the task back on the work channel. Because it's a queue, not a stack, putting the task back is very likely to hand it to some other server, which will probably succeed. This is all somewhat hypothetical, but it fits really well into the structure we've created.

[42:00] The final change is that because the server goroutines now send on work themselves (for retries), we have to wait to close the work channel until we know they're done sending; you can't close a channel while senders may still be using it. So we move the close until after we've counted that all the tasks are done.

[42:19] Sometimes at this point people ask: why can't you just kill goroutines? Why not say "kill all the server goroutines; we know they're not needed anymore"? The answer is that a goroutine has state and is interacting with the rest of the program, and if it suddenly just stops, it's as if it hung: maybe it was holding a lock, maybe it was in the middle of a communication with some other goroutine that was expecting an answer. So we need to tear them down more gracefully, by telling them explicitly "you're done, you can go away," and letting them clean up however they need to.

[42:57] Speaking of cleaning up, there's one more thing to do, which is to shut down the loop that's watching for new servers. We put a select in there so the goroutine waiting on the server channel can also be told "okay, we're done, stop watching for new servers." We could make this the caller's problem, but it's fairly easy to handle here.
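Putting all of those pieces together, the finished scheduler looks roughly like this; it's a reconstruction from the talk, not the lab solution itself.

    func schedule(servers chan string, numTask int, call func(srv string, task int) bool) {
        work := make(chan int, numTask) // buffered: sends on work never block
        done := make(chan bool)
        exit := make(chan bool)

        runTasks := func(srv string) {
            for task := range work {
                if call(srv, task) {
                    done <- true
                } else {
                    work <- task // failed (e.g. timed out): let another server retry
                }
            }
        }

        // Watch for new servers until we are told to exit.
        go func() {
            for {
                select {
                case srv := <-servers:
                    go runTasks(srv)
                case <-exit:
                    return
                }
            }
        }()

        for task := 0; task < numTask; task++ {
            work <- task
        }

        for i := 0; i < numTask; i++ { // count completed tasks, not servers
            <-done
        }
        close(work) // safe only now: no goroutine can still send a retry
        exit <- true
    }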
[43:24] All right, pattern number three: a client for a replicated service. Here's the interface we want to implement. We have some service that is replicated for reliability, and it's okay for a client to talk to any one of the servers. The replicated client's Init is given a list of servers and a function that calls one server with a particular set of arguments and gets a reply. Given that, the replicated client provides a Call method that doesn't tell you which server it will use; it finds a good server and keeps using the same one for as long as it can, until it finds out that server is no good.

[44:15] In this situation there's almost no shared state to isolate: the only state that persists from one call to the next is which server we used last time, because we're going to try it again. That's totally fine for a mutex, so I'm just going to leave it there. It's always okay to use a mutex if that's the cleanest way to write the code. Some people get the wrong impression from how much we talk about channels, but it's always okay to use a mutex if that's all you need.

[44:40] Now we need to implement this replicated Call method, whose job is to try sending to lots of different servers, but first to try the previous server. What does it mean for a try to fail? There's no clear way for it to fail: callOne always returns a reply eventually, so the only way it can fail is by taking too long. We'll assume that if it takes too long, it failed. To deal with timeouts, we have to run the call in the background in a different goroutine. So we do something like this: we set a timeout, we create a timer, we use the goroutine to do the call in the background, and then we wait for either the timeout or the actual reply. If we get the actual reply, we return it; if we get the timeout, we'll have to figure out what to do.

[45:32] It's worth pointing out that you have to call t.Stop, because otherwise the timer sits in the runtime's timer queue until it goes off, a second from now. If the call took a millisecond and you do this in a loop, you get a thousand timers sitting in that queue before they start expiring. This is kind of a wart in the API, but it's been there forever and we've never fixed it, so you just have to remember to call Stop.

[46:04] Now, what do we do on a timeout? We'll need to try a different server. So we write a loop, starting at server 0; if a reply comes in, great, and otherwise we reset the timeout and go around the loop again, trying the next server.
[46:28] Notice there's only one done channel in this program, so on the third iteration of the loop we might be waiting when the first server finally gives us a reply. That's totally fine; we take that reply, stop, and return it. If we get all the way through the loop, it means we've sent the request to every single server, in which case there are no more timeouts; we just wait for one of them to come back. That's the plain receive and return at the end.

[46:58] It's important that the done channel is now buffered, with room for every server: if you've sent the request to three different servers, you're going to take the first reply and return, but the others will want to send their responses too, and we don't want those goroutines to sit around forever trying to send on a channel we're no longer reading. We make the buffer big enough that they can drop their results into it and go away, and the channel just gets garbage collected.

[47:27] STUDENT: Why can't the timer just be garbage collected when nobody references it, instead of you having to call Stop?

RUSS: The problem is that the timer is referenced by the runtime: it's in the list of active timers, and calling Stop takes it out of that list. That's arguably a wart. In the specific case of a timer that's only ever used through its channel, we could have special-cased it: inside the timer there's this t.C channel, so we could have had a different kind of channel implementation with a bit that says "I'm a timer channel," and a select on it would know to wait, but if you dropped all references to it, it would just disappear. We've thought about doing that for a while, but we never did, so this is the state of the world. The garbage collector can't distinguish the reference inside the runtime from references in the rest of the program; they're all just references. Until we special-case that channel somehow, we can't get rid of the Stop.

STUDENT: Thank you.

RUSS: Sure.

[48:38] So the only thing left is the preference: we try to use the same server ID we used the previous time. To implement it, we have the server ID come back with the reply on the result channel, and we do the same loop, but over an offset from the preferred ID. When we get an answer, we set the preferred ID to wherever the answer came from, and then we return. You'll notice I used a goto statement. That's okay: if you need a goto, it's fine. There's no zealotry here.
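A sketch of the whole replicated client, reconstructed from the talk; Args and Reply are stand-in types, the one-second timeout is illustrative, and sync and time are the needed imports.

    type Args struct{}  // stand-in request type
    type Reply struct{} // stand-in response type

    type ReplicatedClient struct {
        servers []string
        callOne func(srv string, args Args) Reply

        mu     sync.Mutex
        prefer int // index of the last server that answered
    }

    func (c *ReplicatedClient) Init(servers []string, callOne func(string, Args) Reply) {
        c.servers = servers
        c.callOne = callOne
    }

    func (c *ReplicatedClient) Call(args Args) Reply {
        type result struct {
            serverID int
            reply    Reply
        }
        const timeout = 1 * time.Second
        t := time.NewTimer(timeout)
        defer t.Stop()

        // Buffered so that late repliers can deposit results and exit.
        done := make(chan result, len(c.servers))

        c.mu.Lock()
        prefer := c.prefer
        c.mu.Unlock()

        var r result
        for off := 0; off < len(c.servers); off++ {
            id := (prefer + off) % len(c.servers)
            go func() {
                done <- result{id, c.callOne(c.servers[id], args)}
            }()
            select {
            case r = <-done:
                goto Done
            case <-t.C:
                // Timed out: move on to the next server, leaving this
                // call running in the background.
                t.Reset(timeout)
            }
        }
        r = <-done // every server tried: wait for the first reply

    Done:
        c.mu.Lock()
        c.prefer = r.serverID
        c.mu.Unlock()
        return r.reply
    }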
49:18 All right, so the fourth one, and then we'll do some questions: a protocol multiplexer. This is kind of the logic at the core of any RPC system, and it comes up a lot; I feel like I wrote a lot of these in grad school and in the years after that.

49:35 The basic API of a protocol multiplexer is that it sits in front of some service, which we're going to pass to the Init method. Having been initialized with a service, you can call Call and give it a request message, and it'll give you back the reply message at some point. The things it needs from the service to do the multiplexing are: given a message, it has to be able to pull out the tag that uniquely identifies the message, which will also identify the reply, because the reply will come back in with a matching tag; and it needs to be able to send a message and to receive a message. But the send and receive operate on arbitrary messages that are not matched up; it's the multiplexer's job to actually match them.

50:21 To start, we'll have a goroutine that's in charge of calling send and another goroutine that's in charge of calling receive, both in a simple loop. So to initialize the service, we set up the structure and then kick off the send loop and the receive loop. We also have a map of pending requests; it maps from the tag, the id number in the messages, to a channel where the reply is supposed to go.

50:47 The send loop is fairly simple: you just range over the things that need to be sent, and you send them. This has the effect of serializing the calls to send, because we're not going to force the service implementation to deal with us sending from multiple goroutines at once. We serialize it so that it can just think about sending one packet at a time.

51:09 The receive loop is a little bit more complicated. It pulls a reply off the service, again serialized so we're only reading one at a time, then it pulls the tag out of the reply and says, ah, I need to find the channel to send this to. So it pulls the channel out of the pending map, and it takes it out of the pending map so that if we accidentally get another reply with the same tag, we won't try to send it. And then it sends the reply.

51:34 Then, to do a call, you just have to set yourself up in the map, hand the message to the send loop, and wait for the reply. We start off by getting the tag out, we make our own done channel, we insert the tag into the map after first checking for a duplicate (which would be a bug), then we send the argument message to the send loop, and then we wait for the reply to come in on done. It's very, very simple. I used to write these sorts of things in C, and it was much, much worse.
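The slide code isn't included in the transcript, so here is a minimal sketch of the multiplexer as described; the Service interface and all the names are assumptions rather than the actual slides.

```go
package mux

import "sync"

// Hypothetical message and service types matching the description.
type Msg interface{}

type Service interface {
	ReadTag(m Msg) int64 // unique id; the reply carries a matching tag
	Send(m Msg)          // send one request (not safe for concurrent use)
	Recv() Msg           // receive one reply (not safe for concurrent use)
}

type Mux struct {
	srv     Service
	send    chan Msg
	mu      sync.Mutex
	pending map[int64]chan Msg
}

func (m *Mux) Init(srv Service) {
	m.srv = srv
	m.send = make(chan Msg)
	m.pending = make(map[int64]chan Msg)
	go m.sendLoop()
	go m.recvLoop()
}

// sendLoop serializes all calls to srv.Send.
func (m *Mux) sendLoop() {
	for msg := range m.send {
		m.srv.Send(msg)
	}
}

// recvLoop routes each reply to the goroutine that registered its tag.
func (m *Mux) recvLoop() {
	for {
		reply := m.srv.Recv()
		tag := m.srv.ReadTag(reply)
		m.mu.Lock()
		done := m.pending[tag]
		delete(m.pending, tag) // at most one reply per pending call
		m.mu.Unlock()
		if done == nil {
			panic("mux: unexpected reply tag")
		}
		done <- reply
	}
}

func (m *Mux) Call(args Msg) Msg {
	tag := m.srv.ReadTag(args)
	done := make(chan Msg, 1) // buffered so recvLoop never blocks
	m.mu.Lock()
	if m.pending[tag] != nil {
		m.mu.Unlock()
		panic("mux: duplicate call tag") // the bug check mentioned above
	}
	m.pending[tag] = done
	m.mu.Unlock()
	m.send <- args
	return <-done
}
```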
52:04 So that was all the patterns I wanted to show, and I hope they end up being useful for you in whatever future programs you're writing. I hope they're good ideas even in non-Go programs, and that thinking about them in Go can help you when you go to do other things as well.

52:23 I'm going to put them all back up, and then I have some questions that Frans sent, from all of you, and we'll probably have some time for questions from the chat as well. I have no idea where the chat window is in Zoom, so when we get to that, people can just speak up. I don't use Zoom on a daily basis, unfortunately; normally I know how to use it, but with the presentation up, Zoom is in this minimized mode that doesn't have half the things I'm used to.

52:54 Someone asked how long Go took, and so far it's been about 13 and a half years. We started discussions in late September 2007. I joined full-time in August 2008, when I finished at MIT. We did the initial open source launch in November 2009. We released Go 1, the first stable version; or rather, the plan was October 2011, and Go 1 itself was March 2012. And then we've just been on a regular schedule since then. The next major change, of course, is going to be generics, and that's probably going to be Go 1.18, which is going to be next February.

53:37 Someone asked how big a team it takes to build a language like Go. For those first two years there were just five of us, and that was enough to get us to something we released that could actually run in production, but it was fairly primitive: a good, solid, working prototype, but not what it is today. Over time we've expanded a fair amount; now we're up to something like 50 people employed by Google to work directly on Go, and then there are tons of open source contributors, a literal cast of thousands who have helped us over the last 13 years. There's absolutely no way we could have done it, even with 50 people, without all the different contributions from the outside.
54:23 Someone asked about design priorities and motivations. We built it for us: the priority was to build something that was going to help Google. It just turned out that we were in a really lucky spot where Google was a couple of years ahead of the rest of the industry in having to write distributed systems. Now everyone using cloud software is writing programs that talk to other programs and send messages; there are hardly any single-machine programs anymore. So we sort of lucked into building the language that the rest of the world needed a couple of years later.

55:01 The other thing that was really a priority was making it work for large numbers of programmers, because Google had a very large number of programmers working in one code base, and now we have open source, where even if you're a small team, you're usually depending on code written by a ton of other people. A lot of the issues that come up from having many programmers still come up in that context. Those were really the things we were trying to solve.

55:28 For all of these things, we took a long time before we were willing to commit to putting something in the language; everyone in the core original group basically had to agree. That meant it took us a while to get the pieces exactly the way we wanted them, but once we got them there, they've been very stable and solid and really nice, and they work together well. The same thing is happening with generics now: I personally feel really good about generics, I feel like it feels like the rest of Go, and that just wasn't the case for the proposals we had even a couple of years ago, much less the early ones.

56:08 Someone said they really like defer, which is unique to the language, and I do too; thank you. But I want to point out that while we did absolutely create defer for Go, Swift has adopted it, and I think there's a proposal for C++ to adopt it as well, so hopefully it moves out a little bit.

56:29 There was a question about Go using capitalization for exporting, which I know is jarring when you first see it. The story behind it is that we knew we would need something, but at the beginning we just said, look, everything's exported, everything's publicly visible, we'll deal with it later. After about a year it was clear that we needed some way to let programmers hide things from other programmers. C++ has public: and private:, and in a large struct that's actually really annoying: you're looking at definitions, and you have to scroll backwards to find the most recent public: or private:, and if the struct is really big it can be hard to find one, so it's hard to tell whether a particular definition is public or private. In Java, of course, it's at the beginning of every single field, and that seemed excessive too, just too much typing. So we looked around some more, and someone pointed out that Python has the convention of putting an underscore in front to make something hidden. That seemed interesting, but you probably don't want the default to be not hidden; you want the default to be hidden. Then we thought about putting something like a plus in front of names, and then someone suggested: what about uppercase meaning exported? It seemed like a dumb, terrible idea. It really did.
57:56 I really didn't like this idea. I have a very clear memory of the room and what I was staring at as we discussed it, but I had no logical argument against it, and it turned out to be fantastic. It seemed bad, just aesthetically, but it is now one of my favorite things about Go: when you look at a use of something, you immediately get that bit of "is this something other people can access or not," at every use. If you see code calling a function to do whatever it does, you think, oh wow, can other people do that? And your brain sort of takes care of it. Now I go to C++, I see calls like that, and I get really worried: wait, is that something other classes can get at? Having that bit actually turns out to be really useful for reading code.

58:42 A couple of people asked about generics. If you don't know, we have an active proposal for generics, and we're actively working on implementing it. We hope that the release later in the year, towards the end of the year, will have a full version of generics that you can actually use; that'll be a preview release. The real release we hope it lands in is Go 1.18, which is February of next year, so maybe next year's class will actually get to use generics; we'll see. I'm certainly looking forward to having a generic min and max. The reason we don't have those is that you'd have to pick which type they were for, or have a whole suite of them, and it just seemed silly; it seemed like we should wait for generics.

59:22 Someone asked: is there any area of programming where Go may not be the best language but is still used? The answer is absolutely; that happens all the time with every language. I think Go is actually a really good all-around language, but you might use it for something it's not perfect for, just because the rest of your program is written in Go and you want to interoperate with it. There's this website called the On-Line Encyclopedia of Integer Sequences; it's a search engine where you type in 2, 3, 5, 7, 11 and it tells you those are the primes. It turns out the back end for that is all written in Go, and if you type in a sequence it doesn't know, it actually does some pretty sophisticated math on the numbers, all with big numbers and things like that. All of that is written in Go too, because it was too annoying to shell out to Maple and Mathematica and do the cross-language thing, even though you'd much rather implement it in those languages. You run into those sorts of compromises all the time, and that's fine.
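For a concrete picture of the generic min and max mentioned above, here is a minimal sketch in the accepted type-parameter syntax; the Ordered constraint is written out by hand rather than imported from any particular package.

```go
package minmax

// Ordered lists the kinds of types Min and Max accept; a constraints
// package could supply this instead.
type Ordered interface {
	~int | ~int64 | ~float64 | ~string
}

// Min works for any ordered type, so there is no need for a whole
// suite of MinInt, MinFloat64, and so on.
func Min[T Ordered](a, b T) T {
	if a < b {
		return a
	}
	return b
}

func Max[T Ordered](a, b T) T {
	if a > b {
		return a
	}
	return b
}
```

With this, Min(1, 2), Min(1.5, 2.5), and Min("a", "b") all work from one definition; the eventual standard-library or built-in version may differ in detail.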
60:20 Someone asked: Go is supposed to be simple, and that's why there are no generics and no sets, but isn't Go also for software developers, and don't software developers need all this stuff? Isn't it silly to have to reconstruct it? I think it's true that there's some tension there, but simplicity in the sense of leaving things out was never the goal. For sets, it just seemed like maps are so close to sets: you just have a map where the value is empty or a boolean, and that's a set. And for generics, you have to remember that when we started Go in 2007, Java was just finishing a true fiasco of a rollout of generics, and we were really scared of that. We knew that if we just tried to do it, we would get it wrong, and we knew that we could write a lot of useful programs without generics. So that was what we did, and we came back to it when we felt like, okay, we've spent enough time writing other programs, we know a lot more about what we need from generics for Go, and we can take the time to talk to real experts. It would have been nice to have generics five or ten years ago, but we wouldn't have had the really nice version we're going to have now, so I think it was probably the right decision.
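As a quick sketch of the map-as-set idiom from that answer (illustrative code, not a library API):

```go
package main

import "fmt"

func main() {
	// A set of strings is just a map whose values carry no information.
	seen := make(map[string]struct{})

	// Insert.
	seen["alice"] = struct{}{}
	seen["bob"] = struct{}{}

	// Membership test.
	if _, ok := seen["alice"]; ok {
		fmt.Println("alice is in the set")
	}

	// Delete.
	delete(seen, "bob")

	// A map[string]bool works too, and reads a little more naturally
	// (if seen["alice"] { ... }) at the cost of one byte per entry.
	fmt.Println(len(seen), "element(s) left")
}
```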
61:40 There was a question about goroutines and their relation to the Plan 9 thread library, which was all cooperatively scheduled: whether goroutines were ever cooperatively scheduled, and whether that caused problems. It's absolutely the case that Go and the goroutine runtime were inspired by previous experience on Plan 9. There was actually a different language called Alef on an early version of Plan 9 that was compiled; it had channels, it had select, and it had things we called tasks, which were a little bit like goroutines, but it didn't have a garbage collector, and that made things really annoying in a lot of cases. Also, tasks were tied to a specific thread: you might have three tasks in one thread and two tasks in another, and of the three tasks in the first thread, only one ever ran at a time, and they could only reschedule during a channel operation. So you would write code where those three tasks were all operating on the same data structure, and you just knew, because it was in your head when you wrote it, that it was okay for two different tasks to be scribbling over the same data structure, because they could never be running at the same time. Meanwhile, in the other thread, you've got the same situation going on with different data and different tasks. Then you come back to the same program six months later, and you've totally forgotten which tasks could write to which pieces of data, and I'm sure we had tons of races. It was a nice model for small programs and a terrible model for programming over a long period of time, or for a big program that other people had to work on.

63:07 So that was never the model for Go. The model for Go was always: it's good to have these lightweight goroutines, but they're all going to be running independently, and if they're going to share anything, they need to use locks, and they need to use channels to communicate and coordinate explicitly. That has definitely scaled a lot better than any of the Plan 9 stuff ever did.

63:27 Sometimes people hear that goroutines are cooperatively scheduled and think of something more like that. It's true that early on, goroutines were not as preemptively scheduled as you would like. In the very early days, the only preemption points were calls into the runtime; shortly after that, the preemption points were any function entry. But if you were in a tight loop for a very long time, that would never preempt, and that would cause garbage collector delays, because the garbage collector would need to stop all the goroutines, and there'd be some goroutine stuck in a tight loop that would take forever to finish. In the last couple of releases, we finally figured out how to get Unix signals delivered to threads in just the right way, with the right bookkeeping, to use that as a preemption mechanism. So now, I think, the preemption delays for garbage collection are actually bounded, finally. But from the start the model has been that goroutines run preemptively and don't get control over when they get preempted.

64:30 As a follow-on, someone asked where in the source tree to look to learn more about goroutines and the goroutine scheduler. The answer is that this is basically a little operating system: a little operating system that sits on top of the other operating system instead of on top of CPUs. So the first thing to do is take 6.828. I worked on 6.828 and xv6 literally a year or two before I went and did the Go runtime, so there's a huge amount of 6.828 in the Go runtime. In the actual Go runtime directory there's a file called proc.go; proc stands for process, because that's what it is in operating systems. I would start with that file and then pull on strings.
65:18 Someone asked about Python's negative indexing, where you can write x[-1]. That comes up a lot, especially from Python programmers, and it seems like a really great idea: you write these really nice, elegant programs where, to get the last element, you just say x[-1]. But the real problem is that you have x[i] in a loop counting down from n to zero, and you have an off-by-one somewhere, and now x[-1], that is, x[i] when i is minus one, instead of being an error that you see immediately and say, hey, there's a bug I need to find, just silently grabs the element off the other end of the array. That's where the Python sort of simplicity makes things worse, and that's why we left it out: we thought it was going to hide bugs too much. You could imagine something where you say x[$-1], or len minus one, not len of x, just len, but it seemed like too much of a special case, and it really doesn't come up enough.

66:20 Someone asked what aspect of Go was hardest to implement. Honestly, a lot of it is not very hard; we'd done most of this before. We'd written operating systems and threading libraries and channel implementations, so doing all that again was fairly straightforward. The hardest thing was probably the garbage collector. Go is unique among garbage-collected languages in that it gives programmers a lot more control over memory layout: if you have a struct with two other structs inside it, that's just one big chunk of memory; it's not a struct with pointers to two other chunks of memory. And because of that, you can take the address of, say, the second field in the struct and pass it around, which means the garbage collector has to be able to deal with a pointer that points into the middle of an allocated object. That's just something Java and Lisp and other languages don't do, and it makes the garbage collector a lot more complicated in how it maintains its data structures.

67:16 We also knew from the start that you really want low latency, because if you're handling network requests, you can't just pause for 200 milliseconds, blocking all the in-progress requests, to do a garbage collection. It needs to be low-latency and not stop things. We thought multicore would be a good opportunity there, because we could have the garbage collector using one core and the Go program using the other cores. That actually did turn out to work really well, but it required hiring a real expert in garbage collection to figure out how to do it and make it work. Now it's really great.

67:56 A student asked: you said that if a struct is declared inside another struct, it's actually all one big chunk of memory. Why did you implement it like that? What's the reasoning? Well, there are a couple of reasons. One is the garbage collector: the load on the garbage collector is proportional to the number of objects you allocate, so if you have a struct with five things in it and you can make that one allocation, that's a fifth of the load on the garbage collector, and that turns out to be really important. The other thing that's really important is cache locality: the processor pulls in memory in 64-byte chunks, or whatever it is, and it's much better at reading memory that's all together than memory that's scattered.
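A small made-up example of what that layout guarantee means in code:

```go
package layout

// Min and Max are stored directly inside Rect: one contiguous chunk of
// memory holding four ints, not a Rect holding pointers to two Points.
type Point struct{ X, Y int }

type Rect struct {
	Min, Max Point
}

func example() {
	r := new(Rect) // a single allocation for the whole rectangle
	p := &r.Max    // an interior pointer into the middle of that object
	p.X = 3
	// The garbage collector must understand that p keeps all of *r
	// alive, which is exactly the complication described above.
}
```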
68:42 We have a Git server at Google called Gerrit that's written in Java, and it was just getting started at the time Go was coming out; I think we missed Gerrit being written in Go by about a year. But we talked to the guy who wrote Gerrit, and he said one of the biggest problems in Gerrit was that you have all these SHA-1 hashes, and just having the idea of 20 bytes is basically impossible in Java. You can't just have 20 bytes in a struct; you have to have a pointer to an object, and you can't even have 20 bytes in the object; you have to declare five different ints or something like that to get your 20 bytes. There's just no good way to do it, and the overhead of a simple thing like that really adds up. So we thought giving programmers control over memory was really important.
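In Go, the 20-byte value the Gerrit author wanted is a one-line type; the surrounding Commit struct here is invented for illustration.

```go
package git

// Hash is exactly 20 bytes, stored inline wherever it appears.
type Hash [20]byte

// Commit keeps its hashes inline: no per-hash object, no pointer
// chasing, and better cache locality when scanning many commits.
type Commit struct {
	Tree    Hash
	Parents []Hash
	Message string
}
```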
69:33 Another question was about automatic parallelization, for loops and things like that. We don't do anything like that in the standard Go toolchain. There are Go front ends for GCC and LLVM, so to the extent that those do that kind of loop optimization for C, I think we get the same from the Go front ends. But it's not the kind of parallelization we typically need at Google; it's more lots of servers running different things, and the big vector-math kind of stuff doesn't come up as much, so it just hasn't been that important to us.

70:12 The last question I have written down: someone asked how you decide when to acquire and release locks, and why Go doesn't have re-entrant locks. For that I want to go back a slide. During the lecture I said things like "the lock protects the map" or "it protects the data," but what we really mean is that the lock protects some collection of invariants that apply to the data, that are true of the data. The reason we have the lock is to protect from each other the operations that depend on those invariants and that sometimes temporarily invalidate them. When you call Lock, what you're saying is: I need to make use of the invariants this lock protects. When you call Unlock, what you're saying is: I don't need them anymore, and if I temporarily invalidated them, I've put them back, so the next person who calls Lock will see correct invariants.

71:08 So in the mux, we want the invariant that each registered pending channel gets at most one reply, and to do that, when we take done out of the map, we also delete it from the map before we unlock. If there were some separate kind of cancel operation directly manipulating the map as well, it could call Lock, take the entry out, and call Unlock, and if it actually found one, it would know that no one is going to send on that channel anymore, because it took the entry out. Whereas if we had written this code with an extra Unlock and re-Lock between the done = pending[tag] and the delete, you wouldn't have that protection of the invariants anymore, because you would have unlocked and relocked while the invariants were broken. It's really important to correctness to think about locks as protecting invariants.

72:02 And if you have re-entrant locks, all of that goes out the window. Without re-entrant locks, when you call Lock, on the next line you know: okay, the lock just got acquired, all the invariants are true. With a re-entrant lock, all you know is that the invariants were true for whoever locked it the first time, who might be way up your call stack, and you really know nothing. That makes it a lot harder to reason about what you can assume, and so I think re-entrant locks are a really unfortunate part of Java's legacy.

72:35 Another big problem with re-entrant locks: suppose you have code that depends on the re-entrant lock, because you've acquired the lock up above, and at some point you say, you know what, I want a timeout on this, or I want to do it in some other goroutine while I wait for something else. Re-entrant always means "locked on the same stack"; that's the only plausible thing it can mean. So if you move the code that was doing the re-entrant lock onto a different stack, it's going to deadlock: that Lock is now actually going to acquire, and it's going to wait for you to let go of the lock, and you're not going to let go, because you think that code needs to finish running. So it's completely, fundamentally incompatible with restructurings where you take code and run it in different threads or different goroutines. Anyway, my advice is to think about locks as protecting invariants, and to avoid depending on re-entrant locks; it really just doesn't scale well to real programs.

73:35 So I'll put this list back up. Actually, we've had it up long enough; let me try to figure out how to stop presenting, and then I can take a few more questions.
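To see the invariant argument in code, here is a hypothetical Cancel added to the mux sketch from earlier; it is not part of the lecture's slides, but it shows why claiming and deleting the tag in one critical section matters.

```go
// Cancel reports whether the call registered under tag was withdrawn
// before any reply was delivered. Because both Cancel and recvLoop
// read and delete the tag inside a single critical section, exactly
// one of them can win, and the "at most one reply per pending channel"
// invariant is never visible in a broken state.
func (m *Mux) Cancel(tag int64) bool {
	m.mu.Lock()
	done := m.pending[tag]
	delete(m.pending, tag)
	m.mu.Unlock()
	return done != nil // if non-nil, no reply will ever be sent on it
}
```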
73:47 A student asked: coming from Python, it's very common to use standard functional operations, like map and filter and list comprehensions. When I switched over to Go and started programming, I looked it up, and people say you shouldn't do this, do it with a loop. I was wondering why.

74:17 Well, one answer is that you can't do it the other way, so you might as well do it the way you can. But a bigger issue is that if you do it that way, you actually end up creating a lot of garbage, and if you care about not putting too much load on the garbage collector, loops are a way to avoid that: if you've got a map and then a filter and then another map, you can make that one loop over the data instead of three loops, each of which generates a new piece of garbage.

74:53 But now that we have generics coming, you'll actually be able to write those functions. You literally couldn't write them before; you couldn't even write down their type signatures. Python gets away with this because there are no static types, but now we're actually going to have a way to do it, and I totally expect that once generics go in, there will be a package slices, and if you import slices you can do slices.Map and slices.Filter and slices.Unique or something like that. I think those will all happen, and if that's the right thing, then that's great. Thanks. Sure.
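As a sketch of what becomes expressible with generics, here are hypothetical Map and Filter functions in the spirit of the slices package mentioned above; this is not a released API.

```go
package slices

// Map applies f to each element of s, allocating one new slice.
func Map[T, U any](s []T, f func(T) U) []U {
	out := make([]U, len(s))
	for i, v := range s {
		out[i] = f(v)
	}
	return out
}

// Filter keeps the elements of s for which keep returns true.
func Filter[T any](s []T, keep func(T) bool) []T {
	var out []T
	for _, v := range s {
		if keep(v) {
			out = append(out, v)
		}
	}
	return out
}
```

Note that chaining Map and Filter still allocates an intermediate slice per stage, which is exactly the garbage-collector cost described in the answer; a single hand-written loop fuses the stages.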
75:30 Another student asked: one of the hints was about running independent goroutines concurrently, and in some of the code examples it seemed like you could just call the function in the same thread rather than a different one. Why call it in a different thread?

75:56 Usually it's because you want them to proceed independently. In one of the examples, there was a loop sending tasks to the work queue, and the servers were running in different goroutines, reading from the work queue and doing work; when they were done, they would send "hey, I'm done now" to the done channel. But a send in Go doesn't complete until the receive actually matches with it. So if the thing sending on the work queue is not going to start receiving from the done channel until it has finished sending all the tasks into the work queue, then you have a deadlock: the main goroutine is trying to send new work to the servers, the servers are not taking new work because they're trying to tell the main goroutine that they're done, and the main goroutine isn't going to start reading from the done channel until it finishes giving out all the work. They're just staring at each other, waiting for different things to happen.

76:57 Whereas if we just put the little go statement around the loop that's sending the work, then that can go somewhere else and proceed independently. While it's stuck waiting for the servers to take more work, the servers are stuck waiting for the main goroutine to acknowledge that they finished some work, and now the main goroutine actually gets down to the loop that reads from the done channel and acknowledges the finished work. It's just a way to separate out two different things that logically didn't have to happen one after the other. Because they were happening one after the other, that caused a deadlock; taking one out and letting it run independently removes the deadlock.

77:41 Thank you so much. Sure.
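Here is a compressed, invented example of that shape; with the go statement around the producer loop the program completes, and without it the main goroutine and the workers deadlock as described.

```go
package main

import "fmt"

func main() {
	const numWorkers, numTasks = 3, 10
	work := make(chan int)
	done := make(chan int)

	// Workers: read a task, "process" it, report on done.
	for w := 0; w < numWorkers; w++ {
		go func() {
			for t := range work {
				done <- t * t
			}
		}()
	}

	// Doing these sends inline in main would deadlock: after the
	// workers each take one task and block sending on done, the next
	// work <- t blocks forever, and main never reaches the done loop.
	// Wrapping the loop in a goroutine lets it proceed independently.
	go func() {
		for t := 0; t < numTasks; t++ {
			work <- t
		}
		close(work)
	}()

	// Main is free to drain done while work is still being handed out.
	for i := 0; i < numTasks; i++ {
		fmt.Println(<-done)
	}
}
```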
77:44 Could you talk a little bit about how Go's race detector is implemented?

77:48 Sure. It is the LLVM race detector; that probably doesn't help, but it is exactly the thing LLVM calls ThreadSanitizer. We have a little binary blob that we link against, because we don't want to depend on all of LLVM, but it's the LLVM race detector. The way it works is that it allocates a ton of extra virtual memory, and based on the address of the thing being read or written, it has this other spot in virtual memory where it records information about the last thread (it thinks in threads, but they're goroutines) that did a read or a write. Every time a synchronizing event happens, like a communication from one goroutine to another, that counts as establishing a happens-before edge between the two goroutines. If you ever get a read and a write that are not properly sequenced, that's a race: if you have a read, and it happens-before something in another chain, which then later does the write, that's fine, but if there's no happens-before path connecting the read and the write, that's a race. It has some pretty clever ways to figure out quickly, dynamically, whether there is a happens-before path between this read and this write as they happen.

79:09 It slows the program down by maybe 10x, but if you just divert a small amount of traffic to it, that's probably fine, and if it's for testing, that's also probably fine. It's way better than not finding out about the races, so it's totally worth it. Honestly, 10 or 20x is fantastic; the original ThreadSanitizer was more like 100 or 1000x, and that was not good enough.

79:30 What's the race detector called? LLVM? It's called ThreadSanitizer, but it's part of LLVM, the project behind the Clang C compiler, the one almost everyone uses now.
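For reference, the detector is built into the standard Go toolchain behind the -race flag; a tiny racy program like this sketch trips it.

```go
// race.go: two unsynchronized writes to x from different goroutines.
// Running `go run -race race.go` makes the detector print a
// WARNING: DATA RACE report pointing at both writes.
package main

import "fmt"

func main() {
	x := 0
	done := make(chan bool)
	go func() {
		x = 1 // write in a second goroutine
		done <- true
	}()
	x = 2 // concurrent write in main: no happens-before edge between them
	<-done
	fmt.Println(x)
}
```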
79:54 Can you talk about slices, and the design choice of having them be views on arrays? That confused me at first.

80:01 Yeah, it is a little confusing at first. The main thing is that you want it to be efficient to walk through an array, or, if you're in quicksort or merge sort, where you have an array of things and you want to say "now sort this half and sort the other half," you want to be able to say efficiently: here, this is half of the previous one, sort that. In C, the way you do that is you just pass the pointer to the first element and the number of elements, and that's basically all a slice is. The other pattern that comes up a lot when you're trying to be efficient with arrays is that you have to grow them, and you don't want to call realloc on every single new element; you want to amortize that. The way you do that, in C again, is that you have a base pointer, the length you're using right now, and the length you allocated; to add an element, you check whether the length has reached the amount you allocated, and if so you reallocate, and otherwise you just keep bumping the length forward. Slices are really just an encoding of those idioms, because those are the most efficient ways to manage the memory. In any C++ vector or that sort of thing, that's what's going on underneath, but it's a lot harder with the C++ vector, because for ownership reasons the vector is tied to the actual underlying memory, so it's much harder to get a sub-vector that's just a view onto, say, the second half for merge sort. So that's the idea: there are all these patterns for accessing memory efficiently that came from C, and we tried to make them fit into Go in an idiomatic way, in a safe way.
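A short illustrative program showing both idioms, views and amortized growth:

```go
package main

import "fmt"

func main() {
	// A slice is essentially {pointer, length, capacity}: the C idiom
	// of (base pointer, in-use length, allocated length) in one value.
	a := []int{5, 2, 8, 1, 9, 3}

	// Views: merge sort can hand each half to a recursive call without
	// copying; both halves alias the same underlying array.
	left, right := a[:3], a[3:]
	fmt.Println(left, right, len(a), cap(a))

	// Growth: append amortizes reallocation by growing capacity in
	// chunks instead of calling the allocator on every element.
	var s []int
	for i := 0; i < 10; i++ {
		s = append(s, i)
		fmt.Println(len(s), cap(s)) // capacity jumps, length creeps
	}
}
```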
81:42 Can you talk about how you decided to implement the Go remote module system, where you import directly from a URL, rather than a central registry?

81:49 I mean, I just didn't want to run a service. A lot of the things like RubyGems were not at the front of my mind at the time, just because they were newer, but I had used Perl for a while, and CPAN, and I just thought it was insane that everyone was fighting over these short names, like db; there probably shouldn't be an argument over who gets to make the db package. So putting domain names at the front seemed like a good way to decentralize it, and it was also a good way for us not to run any server: we could just say, well, we'll recognize the host name and then go grab it from source control, from someone else's server. That turned out to be a really great idea, I think, because we don't need the kind of infrastructure that other systems depend on. In the Java world it's actually really problematic: there's no standard registry, but they all use these short names, and Maven can be configured to build from multiple different registries. If you're an open source package provider, you actually have to go around and make sure you upload your package to all the different registries, because if you miss one and your package becomes popular, someone else will upload different code to that one. And Maven just takes whichever one comes back first; it sends a request to all of them and uses whatever answers first. So if someone wants to make a malicious copy of your package, all they have to do is find some registry other people use that you forgot to upload to, and they get to win the race sometimes. It's a real problem, and I think having the domain name there really helps split up the ownership in an important way.

83:28 Thank you. Sure.
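Concretely, the convention puts the origin host at the front of the import path; the two modules below are real public examples, used here only for illustration.

```go
package main

import (
	"fmt"

	// The host name at the front of each path says where to fetch the
	// code from, so no central registry has to arbitrate short names
	// like "db".
	"github.com/google/uuid"
	"rsc.io/quote"
)

func main() {
	fmt.Println(uuid.NewString(), quote.Hello())
}
```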
83:34 So maybe we should take a quick pause here. Those who have to go can go; I'm sure Russ is willing to stick around a little longer and answer questions. But I do want to thank Russ for giving this lecture; hopefully it will help you write more good Go programs with these patterns. So thank you, Russ. You're very welcome; it's nice to be here. And for more questions, feel free to ask.

84:03 Oh, just a little logistical thing: the slides on the 6.824 website are not exactly the same as Russ's slides, so before people check them out, I'll get Frans a new PDF. Yeah.

84:18 A more general question: when is writing a new language the best solution to a problem? That's a great question. It's almost never the best solution. But at the time we had an enormous number of programmers, thousands of programmers working in one code base, and compilations were just taking forever, because C++ was not meant for efficient incremental compilation. Furthermore, at the time, threading libraries were really awful; people just didn't use threads. I remember one of the first days I was at MIT, talking to Robert, and he said to me, in 2001: well, we don't use threads here, because threads are slow. And that was totally normal; that was just the way the world was at the time. At Google we were having a lot of trouble, because everything was event-based, little callbacks in C++, and there were these multi-core machines, and we actually didn't know how to get things to work on them, because Linux threads were not something you could really rely on. So if you had a four-core machine, you'd just run four completely independent processes of the web server and treat it as four machines, and that was clearly not very efficient. So there were a lot of good reasons to try something.

85:37 But it's a huge amount of work to get to the point where Go is today, and I think so much of it is not the language. There were important things we did in the language that enabled other considerations, but so much of a successful language is the ecosystem that gets built up around it, and the tooling that we built, the go command, all these not-the-language things. Programming language people who focus on the language itself sometimes get distracted and miss all the stuff around it.

86:15 Can I ask a follow-up on that? How is working on Go different now that it's more mature than it was before?

86:27 That's a great question. In the early days it was so easy to make changes, and now it's really hard to make changes; I think that's the number one thing. In the early days, everything was in one source code repository; literally all the Go code in the world was in that one repository. There were days when we changed the syntax. You used to have a star before chan every time you said a channel, because it was a pointer underneath and that was all exposed, so you'd always say *chan instead of chan, and similarly for maps. At some point we realized: this is dumb, you always have to say the star, let's just take it out. So we made the change to the compiler, I opened up literally the couple hundred Go source files in the world in my editor, the entire team stood behind me while I typed some regular expressions, and we looked at the effect on the files. Yep, that looks right; save it, compile it, we're done. Today we can't make backwards-incompatible changes at all, and even making new changes affects a lot of people: you propose something, people point out "well, this won't work for me," and you try to adjust. It's just a lot harder. We estimate there are at least a million, maybe two million Go programmers in the world, and it's very different from when there were four or five.
87:51 Not sure if this is a valid question, but what language is Go written in? Is it written in Go also?

87:57 Now it is. The original compiler and runtime were written in C, but a few years ago we actually wrote a program to translate C to Go. It only worked for our C code, but that was good enough, and it meant we wouldn't lose all the knowledge encoded in that code about why things were the way they were and how things worked; we didn't have to start from scratch. Now it's all written in Go, plus a little bit of assembly. That means people who know Go can help on the Go project, whereas before, if you wanted to work on the compiler or the runtime, you had to know C really well, and we weren't getting a lot of people who knew C really well; there aren't actually that many of them, proportionately. Furthermore, our entire user base is Go programmers, not C programmers, so moving to Go was a really big deal.

88:46 I was wondering, how do you prioritize what features to add to the language at this point? Like generics: a lot of people were asking for that. How do you choose what to work on?

88:59 We've considered the language mostly frozen for a while, so we haven't been adding much. There was a long period where we said we weren't adding anything, and then in the last couple of years we added a few small things leading up to generics, to shake the rust off and see what breaks when you change something in the language; for example, you can put underscores between digits in long numbers now. But generics has clearly been the next thing that needed to happen, and we just had to figure out how. In general, we try to only add things that don't have weird interference with other features, and things that are really important, that will help a lot of people write the kinds of programs we're trying to target with Go, which is distributed systems and that sort of thing.

89:50 Cool, thank you. Oh, I had a question. I noticed that Go doesn't have basic functions like min or max. Is that something you're considering adding with the generics stuff? Is that why?

90:09 Yeah, exactly right. You couldn't have one min: you'd have a min for int and a min for float64, and those had to have different names, and that was kind of annoying. Now we can write a single generic Min over any type that has a less-than operator. And honestly, for the specific case of min and max (I know it's not that hard to code yourself), I'm starting to feel like we should just make some built-ins, like print and things like that, so you always have them. But even if we don't, it'll be something like math.Min, and it'll at least be there. We really didn't want to make them built-ins until we could express their types, and we couldn't do that until generics happened.

90:48 A student pointed out that there is actually a Min for floating point. Yeah, I know; it's kind of weird, because the math library is basically copied from the C math.h set of things. So that's a good point: we can't actually put the generic ones in math, because they're already there. But we'll figure it out. I think we should probably just put them in the language, but we have to get generics through first.
91:11 Another thing: I noticed that you did USACO, competitive programming. I did too. So how did you, and actually I included this in one of the questions I submitted, let me pull it up: how did you go from doing competitive programming to what you're doing now at Google, working on Go? How was the transition from competitive programming to systems? And finally, what made you decide to go into systems, and how did it relate to competitive programming?

91:45 Competitive programming, at the time I did it, was not as all-consuming as I gather it is now. You could get by implementing a simple dynamic programming solution, a couple of little for loops, and that was fine; now you have all these convex hull algorithms and all that stuff that I can't do. So at some level it was different. But I was actually more interested in the systems stuff from the start, and the programming contests were just something fun to do on the side, so there wasn't a huge transition. I was never into implementing complex algorithms, max flow and all those sorts of things. On the other hand, when you start a new language, you actually do get to write a lot of core things: someone has to write the sort function, and it has to be a good general sort function, and I spent a while last month looking into diff algorithms, so that sort of thing does match that background pretty well. It does come up; it's just a different kind of programming.

92:48 So you thought of it as more of a side thing back then? Yeah, it was definitely not the main thing I did when I was writing programs. Today it's effectively the main thing for people who do it. I know; if you don't do it full-time now, there's just no way you can keep up. There just weren't that many people who cared about it back in 1995; twenty years later, it's different.

93:15 Can I ask a related question to that? How did you decide to go from academic work into, well, your work is still a little different from the usual software engineering thing, but still?

93:33 You know, I got lucky. I grew up near Bell Labs in New Jersey, and that was how I ended up working on Plan 9 a little bit in high school and college. I sort of knew I was going to go to grad school, and the plan was to go back to Bell Labs, but it kind of imploded while I was in grad school, with the dot-com boom and the dot-com crash. And Google at the time was just vacuuming up PhDs, systems PhDs, and doing really interesting things. I don't know, I haven't looked at the syllabus for this year, but there are things like Spanner and Bigtable and Chubby; they had a whole host of good distributed systems work going on. So I was sort of lucky to be able to go do that too. At the time I graduated, I was also looking at industrial research labs, like Microsoft Research and places like that, so there's definitely an opportunity for researchy things outside academia, if that's what you want. It's a little harder to find now; most of the places I knew, like Microsoft Research's lab, imploded too, a couple of years later. But it's still an option, and it's just a slightly different path. The difference I see from academia is that you end up caring a ton more about actually making things work 100% of the time, and supporting them for a decade or more, whereas you finish your paper and you kind of get to put it off to the side, and that's really nice, actually, at some level. It's definitely strange to me to be editing source files that I wrote, in some cases, 20 years ago, because I used a bunch of code I'd already written when we started Go. It's very weird to think that I've been keeping this program running for 20 years.