All right, good afternoon everybody, let's get started. Welcome to 6.004. Today we're going to continue, and in fact finish, our journey through operating systems by finishing our discussion of virtual memory and then talking about the two remaining pieces: how to communicate with I/O devices, and how user-level processes can communicate with the operating system kernel and request services from it using system calls. So as a reminder, the key functions that we want our operating system to perform are: first, to schedule processes onto the CPU, to virtualize the CPU, which we've seen how to do with supervisor mode and exceptions. Then virtual memory, which we started discussing last lecture: the goal of virtual memory is to provide a private address space to each process, and also to abstract the storage resources of the machine, so that we can give programs the illusion that they are using more memory than the machine has physical memory. And the final piece of the puzzle, which we will see today, is that the operating system provides several services to processes, and lets processes invoke these services through a mechanism that we call system calls. OK, so in terms of virtual memory, where we started the discussion last lecture: remember that what we wanted was, first, protection among processes, so we have this notion of a virtual, private address space, and then demand paging, so that we could use main memory as a cache for disk and offer more memory to programs than we have physical memory in our machine. We saw two different mechanisms. Segmentation, remember, was very simple: basically, each process's address space is a contiguous chunk of physical memory. This is what we call a segment of physical
memory. It was very simple to implement: we just needed to add a couple of registers to the CPU, the base and bound registers. The problem is that this leads to fragmentation, and also we couldn't do the demand-paging functionality that we want. And so to solve this problem we introduced this alternative of paging, where instead of having the address space be a contiguous chunk of physical memory, we manage physical memory in small chunks of a fixed size called pages, and then we have this page table that maps from virtual page numbers, or virtual pages, to physical pages. So for every memory access that we're going to perform, we start with a virtual address, we index into the page table, we find the physical page number for the virtual page number that we want, and we construct the physical address that way. That had two great benefits. We solved the fragmentation problem: now we can store the address space, the memory of each process, all over the place, scattered across these fixed-size pages of physical memory. And now we can do demand paging: by having this resident bit in the page table, we are able to detect when we don't have a particular page in main memory, take a page-fault exception, and bring it in from disk. But there are two problems. The first one is that these page tables are pretty large, and the second is that we need to access these page tables on every reference. So the solution that we ended up with is: we put the page tables in memory, and then, to accelerate page-table accesses, we introduced this notion of a translation lookaside buffer, or TLB for short, to cache virtual-page-number-to-physical-page-number translations. A TLB is just a cache for translations. And so what we're
going to do on every access is first go to the TLB: if we have a valid translation, then we can construct the physical address very quickly; otherwise we have to go to memory. So it's caching — the same trick that we've applied so far, but now applied to caching page-table entries. And so we ended up with this setup where we have a TLB and we have memory, and we check the TLB to construct the physical address. So that solves one of the two problems. In recitation you saw many different problems that give you a TLB and a page table and ask you to do a few accesses and see how the different structures in the machine change and how they are accessed. But we still have another problem, which is that the way we're representing page tables — the way we're storing the translations from virtual page numbers to physical page numbers — is not efficient. So let's think about this. We have 32-bit addresses in our RISC-V processor. Let's say we have 4-kilobyte pages, which is a typical page size. Each page-table entry is going to be around 32 bits, or 4 bytes: we need on the order of 20 bits to store the physical page number, and then a few bits for status and permission bits and so on, so let's just call it 4 bytes. Now, that means that every process is going to require 2^20 page-table entries in its page table — that's 2^(32-12), because each page is 4 kilobytes, or 2^12 bytes. So that's a 4-megabyte page table per process. You have gigabytes of memory on your machines, so for a single process it would be OK, but it's not that great when you start thinking about having hundreds of processes — you might have hundreds
or thousands of processes in your machine. So if I want to keep all these page tables in memory, I'm going to need gigabytes of memory. Your computers today have 4 to 16 gigabytes of memory; it would be completely impractical to devote so much of it just to page tables. So one simple thing we could try is to make the pages larger: what if we use 64-kilobyte or 1-megabyte pages? Well, this introduces other problems. Not all processes want chunks of one megabyte. That will make your page tables smaller, but it also adds other overheads: bringing in each page from disk is going to take a while, because now the pages are very large. Essentially this is the same trade-off that you have in caches with block size. Remember, when we talked about caches we said you want a block size of a few tens of bytes, because that lets you amortize the overheads of bringing in a line and of storing all the metadata associated with that cache line, but you don't want cache lines so large that you can have only a few of them in your cache. So this is the same problem. But even if we used larger pages, think about what happens when we move from 32- to 64-bit address spaces. Most of the machines you have today actually support 64-bit virtual addresses, and even with one-megabyte pages, if you do the math, a flat table would require 2^44 page-table entries — tens of terabytes of page table; not even your hard drive is that big. So the problem here is that we're using a very inefficient representation for page tables: we're using an array, and that array is going to be mostly empty, right?
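To double-check the arithmetic above, here is a tiny sketch (my own illustration, not code from the lecture) that computes the size of a flat, single-level page table from the address-space and page parameters:

```c
#include <stdint.h>

// Bytes needed by a flat page table: one PTE per virtual page.
// addr_bits: virtual address width; page_bits: log2(page size).
uint64_t flat_page_table_bytes(int addr_bits, int page_bits, int pte_bytes) {
    uint64_t num_ptes = 1ULL << (addr_bits - page_bits);  // 2^(addr - offset) pages
    return num_ptes * (uint64_t)pte_bytes;
}
```

With the lecture's numbers — 32-bit addresses, 4 KB pages, 4-byte PTEs — this gives 2^20 × 4 bytes = 4 MB per process.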
It's only going to have valid translations for the entries where the process actually has a page. So what we would like to use is a data structure that represents these pairs of virtual-to-physical pages — essentially key-value pairs — and whose size grows with the number of virtual-to-physical pairs we are actually storing in it. So can you tell me which data structures support that? A hash table — yes, that's a good one. Is there anything else? A sparse array — yes, there are many different variants, but the two main ones are a hash table and also a tree. A tree is an ordered container, which in this case is not particularly useful in itself, but it's useful for other reasons, and let's see why. In fact, most processors implement page tables as multi-level trees — this is what's called a hierarchical page table. So forget about the fact that I started talking about trees, and let's understand how this page table is organized from first principles. In this particular setup we have a 12-bit offset, and then we have a 20-bit virtual page number, and we need to translate this into a physical page number. So let's divide and conquer: if instead of 20 bits we did the translation ten bits at a time, then our problem would be very easy. So let's take the ten most significant bits. If I tell you, look, you want to translate a virtual page number to a physical page number, and you have ten bits' worth of virtual page number, that's pretty easy — that's just 2^10, about a thousand entries. So this is what the level-one page table is storing. What we're going to do is we are going to have this
structure called a level-one page table, which we index with the most significant ten bits, which we call the level-1 index. And here's the key difference: an entry here is not going to point to memory — it's going to point to another small structure, which we call a level-two page table, that we then index with the 10 remaining bits. And because we have 10 remaining bits, each of these level-two page tables also has about a thousand entries. So the key idea here is that each of these structures is very small, and translation now proceeds in steps: first you access the root of the current page table, which gives you the starting address of the level-one page table; you use the top 10 bits as the offset to find the page-table entry — which is not giving you a physical address or a physical page number yet, but is actually a pointer to the second-level page table; and then you use the remaining 10 bits to index into that level-two page table, and that holds the page-table entry you want. Then you use the 12-bit offset to find the word. So now each translation is a three-step process rather than a two-step process. OK, is this clear? Yes? — That's a great question: if we populated this tree entirely, then yes, but we don't — that's the trick. You see here, this entry is not pointing anywhere. What we're going to do is only populate the entries of the level-one page table that we're actually using, and that's the key advantage of this organization: it makes the memory that the page table takes proportional to how many pages are actually mapped by the process. Typically, a process that starts with one entry here and one entry over there will have two level-two page tables, and then as the
process wants more memory, it will allocate more level-two page tables. There's another benefit here, which is that each of these level-one and level-two page tables is actually very small: a thousand entries at four bytes per entry is four kilobytes — that fits in a page. Before, we had this problem where every time we started a process we needed to allocate some giant chunk of memory that actually had to be contiguous to store the page table, so we still had some fragmentation issues. Here we've broken up the page table into little units that we can allocate independently, just as we allocate pages — so in fact the operating system is always just allocating little pages, and everything is much simpler. Now, what's the problem here? Yes — you still have to figure out the mapping from these level-two page tables to the actual addresses. Right: just accessing a level-two page table doesn't give you the address yet — you need to do yet another lookup. That's the problem: now, instead of doing a single access to an array, we're doing two accesses to different memory locations for a single translation. So we've made translations more expensive — we're essentially trading off time for space, doing something that's more costly per walk but much more space-efficient. But hey, we introduced TLBs, and TLBs are caching our translations, so these walks are going to be rare. Because we have TLBs, we can afford a more complex structure that is more space-efficient. OK, so this is how pretty much all systems implement virtual memory today. x86, for example, has more levels because it has a 64-bit address space, so in general you'll have two, three, sometimes four steps.
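Here is a small software model of that two-level walk (a sketch of my own, using the lecture's 10/10/12 bit split — real hardware does this in the MMU, and real PTEs pack the pointer and flag bits into one word):

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BITS  12
#define LEVEL_BITS 10
#define ENTRIES    (1u << LEVEL_BITS)   // 1024 entries per table

typedef struct { uint32_t ppn; int valid; } PTE;  // leaf: physical page number
typedef struct { PTE *l2[ENTRIES]; } L1Table;     // root; NULL = unmapped region

// Walk the two-level table: L1 index -> L2 table -> PTE -> physical address.
// Returns -1 to signal a page fault (unmapped address).
int64_t translate(const L1Table *root, uint32_t vaddr) {
    uint32_t i1  = vaddr >> (PAGE_BITS + LEVEL_BITS);     // top 10 bits
    uint32_t i2  = (vaddr >> PAGE_BITS) & (ENTRIES - 1);  // middle 10 bits
    uint32_t off = vaddr & ((1u << PAGE_BITS) - 1);       // 12-bit page offset
    const PTE *l2 = root->l2[i1];
    if (l2 == NULL || !l2[i2].valid) return -1;           // would take a page fault
    return ((int64_t)l2[i2].ppn << PAGE_BITS) | off;
}
```

Note how an L1 entry that is NULL stands for a whole 4 MB of unmapped address space at the cost of a single word — that's exactly where the space savings come from.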
This is, by the way, also why we call page-table accesses "page table walks": it would sound kind of weird to call it a walk if you just had an array and were accessing one location, but here you're actually walking a data structure — you're traversing this tree. OK, so this concludes our discussion of virtual-memory techniques, but we still have to figure out one thing, which is how virtual memory interacts with caches. At a high level we have two options. We've been discussing basically a situation where we don't have caches — we just have a CPU, a TLB, and main memory — but we want to introduce caches, because they're crucial for performance, and now we have two options. The first one is to put the cache before the TLB, in which case the CPU accesses the cache with virtual addresses; when the cache misses, it issues a virtual address, the TLB translates it to a physical address, and main memory always operates on physical addresses. This is pretty good because now TLB accesses are not that frequent: any time we hit in the cache we don't have to do a TLB access, so TLB accesses are not on the critical path — they're not adding lots of cycles. That sounds good, but there's a major problem, which is that every time we context switch, this cache holds invalid data: it's caching virtual addresses, and every time we change from one address space to another — for example, when we take an exception and go from the process into the kernel — the addresses that this cache is storing are not valid anymore. So let's look at the alternative, which is to put the TLB before the cache. Here the CPU issues accesses using virtual addresses, the TLB translates the virtual address to a physical address, and then the cache caches physical addresses
— right, and this is great because we do not need to flush on a context switch: if we're invoking system calls all over the place, we're not going to discard the valuable data we have in the cache. The problem is that now we need that TLB access in front of the cache, and a TLB is small, but it's at least going to add one or two cycles. This is not great, because it adds stages to our pipeline and latency to our loads. So it looks like there's no good solution here. Fortunately, there's a way that you can have a physically-addressed cache — one that stores cache lines tagged by physical address — and still check the cache and the TLB in parallel. Can you guess how? There are actually some CPUs that have done tricks with caching both the virtual and the physical address, but it's actually something pretty simple. Let's think about what the translation is doing. You start with the virtual address, and it has two components: the page offset, and the virtual page number. You're translating this to a physical address — but the only thing that changes is that you take the virtual page number and turn it into a physical page number; the page offset stays the same. So now let's think about what the cache does. A cache address has three fields: the block offset, the index bits, and the cache tag — and this, by the way, is the same physical address. OK, so when do you need the tag? When you're doing a cache lookup, do you need the tag at the beginning of the lookup or at the end? At the end, when you're doing the comparison. So the only thing you need to start the cache lookup is the index bits. The way a cache works
is that we take the index bits, index into the particular set that we want, get all the data and the tags, and then compare each stored tag with the tag of our address. OK, so that means that, under some conditions, you can actually access both the cache and the TLB in parallel. Can you tell me what that condition is, depending on how many index bits you have? Yes — the page offset, exactly. If the index bits all fit within the page offset, then you already have them when you have the virtual address, so in that situation you can do both lookups in parallel. Now, if your cache is larger and has more index bits, part of your index bits might change under translation, so you cannot do this in parallel. What most systems do is stay within this bound, at least for the level-1 cache; for the caches that come later that's fine, because it's OK for them to have larger latency. So this is a pretty clever trick: we're not incurring extra cycles, we're still caching physical addresses, and we don't need to flush our cache. There's one limitation, though: we cannot have as many index bits as we want. If we want to make our cache larger, the only way to do it is to add ways — to increase the associativity; we cannot add more sets beyond a certain point. For example, pretty much every x86 processor in the last 15 years has a 32-kilobyte L1 cache with 8 ways. If you divide 32 kilobytes by 8, that's 4 kilobytes, and the page size of x86 is 4 kilobytes — so they're essentially keeping each way at 4 kilobytes of data, the index bits fit exactly within the page-offset bits, and they can do both accesses in parallel.
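The condition is easy to check mechanically. A sketch (my own, not from the lecture): the cache can be indexed in parallel with the TLB exactly when all index and block-offset bits fall inside the page offset — that is, when one way of the cache is no bigger than a page:

```c
#include <stdint.h>

// True if a (size, ways) cache can be virtually indexed but physically
// tagged: the bytes covered by index + block-offset bits are size/ways,
// and those bits must all lie within the page offset.
int vipt_ok(uint32_t cache_bytes, uint32_t ways, uint32_t page_bytes) {
    return cache_bytes / ways <= page_bytes;
}
```

For the x86 example: 32 KB / 8 ways = 4 KB per way, exactly one page — so it just barely qualifies, while a 64 KB 8-way cache would not.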
OK, any questions? Yes? — Absolutely, absolutely. But here's what happens: you take the index bits from the virtual address and start the lookup; then you get the tags — all the relevant tags — and now you need to compare them. But by that point you've finished the access in the TLB and you already have the physical page number, so you can compare the tags against the right value. OK, so this concludes everything we're going to see about virtual memory; let's move on to I/O. We've focused a lot on the processor and on virtual memory, but there are all these I/O devices — input/output devices — that we haven't discussed so far. The way systems communicate with I/O devices is generally pretty simple: you have some set of shared I/O registers that both the processor and the I/O devices can read and write, and they communicate by reading and writing these registers. So there are two key questions in how you do I/O: first, how do you access these I/O registers from the processor side; and second, how do the processor and the device coordinate on a data transfer — if you want to send or receive some data to or from a device, how do you actually coordinate? For accessing I/O registers, one option that some ISAs implement is to add additional instructions — essentially an abstraction of channels with ports, where you can read and write buffers that are then sent to devices. This is actually not frequently used, because it is inflexible and it adds instructions that we then have to support in the ISA, so most instruction sets do not support it. Instead, the preferred way of
communicating with, or accessing, these I/O registers is through what we call memory-mapped I/O. Here, I/O registers are mapped to particular locations in physical memory, and the processor accesses them with normal loads and stores. So here's how physical memory looks: we have some data that's backed by main memory, but some addresses are actually routed to these I/O registers, so we can issue loads and stores and essentially communicate with the particular device that we care about that way. Now, these are normal loads and stores from the perspective of the ISA. Yes — they're registers because it's some state kept somewhere in the system; it's not memory, because it's just a small amount of state that we're reading and writing. And as I was saying, one of the reasons we call them registers and not memory is that these loads and stores should not be cached. So you need to be careful when you design your system: if it's a normal memory access, you can send it through the cache and all the way to main memory, but if you get a load or a store to an address that corresponds to one of these I/O registers, it needs to be routed directly to the I/O registers. Each of these registers is used by a different device, and some devices might use multiple registers. Yes — you could do that as well, but because the cache is very tightly integrated with the CPU, it's generally not a good idea to put input/output there. OK, so memory-mapped I/O is the standard way. Now, the other aspect — the other key dimension — of communicating with I/O devices is how you coordinate a data transfer. And here we have two options. The first is polling, or what we call
synchronous I/O, where the processor periodically checks this register that's associated with the specific I/O device, to see if it can perform some operation or has some data available — so basically you're constantly checking on the status of the device. The other option is what we call interrupts, which is asynchronous: essentially you have more of a request-response interface. The processor initiates a request, basically by writing to one of these I/O registers, and then moves on to do other work; when the request is serviced, or when the device has some event that needs attention, the device interrupts the processor, which can then go look at the register and see what needs to be done. So, broadly speaking, can you tell me a couple of good things about each approach? What is good about polling? — "They can take turns and you know which one is doing what, so you don't have to do extra setup overhead beyond the shared part, but it will potentially be slower; interrupts could be faster since they're both doing things at the same time, but then you have to make sure about the shared state they're using." — It's not quite that interrupts are faster; interrupts actually tend to have higher latency, because an interrupt needs time to be processed: the common handler needs to fire, save all the registers, and get to the right interrupt handler, whereas if the CPU is already in a tight loop polling the register, it takes much less time. But the key advantage is that interrupts let the processor move on to do other things. So, for example, if we want to access a file, and it takes 10 milliseconds to read it from the right place on disk, we can
actually do something useful for those 10 milliseconds, rather than just keep checking with the disk: "do you have something? are you done yet?" OK, so let's see a very simple example. Consider a simple character-based display. Modern displays are more complicated — they actually let you draw arbitrary pixels — but early displays worked like this: you have a single I/O register with this format. It's a 32-bit register, but most bits are unused: bits 0 through 7 store a character, and bit 31 is what we call the ready bit. The ready bit is set to 1 only when the display is ready to accept a character. When the processor wants to display an 8-bit character, it writes it to this register — into these char bits — and that sets the ready bit to zero, which lets the display know that it has a character it can grab. When the display has processed the character, it sets the ready bit back to 1, and now the processor can put in more characters. Yes — ready = 1 means the display is ready to take a new character, not that it's done displaying. There might still be some latency until the character shows; ready just tells the processor, "hey, I got this character, you can give me a new one." It's a form of flow control. OK, so rather than just showing you some code, let's see a demo of how this works. I've written a little OS; it works like this. You have three processes: the operating system kernel, which we're going to call process 0, and process 1 and process 2. They each have a single segment in memory — we're going to use segmentation-based virtual memory — and so process 1 is at addresses
addresses 31:48 you know 1 0 0 0 0 2 1 FF c so this is 31:53 64 kilobytes worth of memory process 2 has the next 64 kilobytes of memory and 31:59 then at this address over here for 0 0 0 0 0 0 0 you have the display 32:07 I already stir right so the way this works is you know I have a simulator just like the simulator that you use for 32:13 a lab for it's a little bit more complicated this it's it's emulating an entire system and so you have these 32:20 these very simple processes that are just in an infinite loop doing you know not doing much 32:26 basically incrementing a counter this is process 1 this is process 2 and so you 32:32 know they're slightly different in that they they have a different way of incrementing the counter but otherwise they're the same not very interesting 32:39 and then you know there's the common handler here right which is what you saw 32:46 in your recitation which eventually calls the dispatcher 32:51 this has essentially it's a very simple 32:57 dispatcher that if you know the causes a timer interrupts is going to call this interrupt timer routine right so you 33:03 know if I build this system I can 33:10 actually you know run it but it's not gonna do much because we have no IO yet so we're not you know a system that 33:17 doesn't have any inputs or any outputs really isn't very interesting so let's try to do something so if we go bad heat 33:24 back here one thing that we can do is if you're taking an interrupt you know 33:31 let's print when we're essentially switching switching processes right so this is the interrupt time a routine 33:36 that you sell in the you know a couple of presentations ago and so we can call 33:42 this you know print string routine that 33:47 essentially you know we can say switching processes on timer interrupts 33:56 right okay so now every time the timer interrupts Pires we're essentially changing switching processes between 34:03 presses one and process two and then we want to print this message now 
print 34:09 string is this function over here in this other file which is certainly 34:15 complicated because we don't want to use half word and byte loads but conceptually that's this loop that's 34:21 that's shown in the comment over here so it's a simple while loop it scans through the different characters of the 34:28 string and until it hits a zero then the null character which is essentially what you did when you detect that the string 34:34 you know it's atherton what we call the termination character of the string it's essentially printing it character by 34:39 character right and then we have to implement this print character routine that essentially needs to write to our 34:47 display register right so I already have a pointer to this display register over here so display so what I would have to 34:54 write it something like display mm IO equals x right so X is a 32-bit value 34:59 but actually only the eight lower bits are nonzero so have sort of the dating the four month 35:06 everyone we just need to write it to our 32-bits and that's it right now if I compile this is this going to work as as you 35:14 expect is this going to work correctly 35:20 what what do you expect will happen here I'm writing to this display I'm writing 35:27 through the special address right yes 35:35 right I didn't I just wrote to it right and so here's here's what happens we're 35:42 missing some characters right because not checking so what we need to do is we 35:48 need to do polling right we need to go back and say you know first check that 35:55 you know wait or you know while you know 36:00 the contents of this memory address are 36:06 less than zero X right yep so while b31 36:15 is not set right which is detective if the value is less than this hex 36:21 character over here then you just wait right this is just an infinite loop 36:26 mommy it's a loop that will keep going going back until it reads a value a a 36:33 value that's smaller than that right and so now it 
And so now it works — now we're properly waiting and getting all the characters. OK, any questions? Yes — how are these I/O registers implemented? Basically, each one is a register, and you have circuitry on the address bus that says: if it's this address, go access this register — don't send the access through to memory. Yes — there's nothing there; physical memory doesn't even need to exist at that address, because you're never going to access it there. Why route it that way? Because it gives you more flexibility as to where you intercept the access. What happens in real implementations is that you actually send these accesses all the way through the cache, but with a special bit that says "please don't cache this", and it's only when you get to the outside of the chip that you say, "oh, actually this is an I/O request — route it this particular way." So it eases the implementation of the CPU — it makes it simpler. This is also more flexible than using a bunch of dedicated registers, because it gives you a lot of space: we already have this very large pool of memory addresses, so if instead of one register we want a megabyte's worth of buffer that the I/O device offers, we can do that too — whereas with registers tightly coupled to the CPU, we could not. OK, so the other alternative, which you will see in recitation, is what we call interrupt-based I/O. Suppose we have a keyboard with a very similar register format, where instead of ready we have a full bit; communication here goes the other way. The keyboard stores a character here and sets the full bit to one, and the processor interprets that full bit as: there's a character I need to read from this register.
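An interrupt handler for such a keyboard might look like the following sketch (my own illustration: the buffering scheme and names are made up, and as with the display, the register pointer is a parameter so it can be exercised against a simulated register):

```c
#include <stdint.h>

#define FULL (1u << 31)    // bit 31: the keyboard has a character waiting

static char     kbd_buf[256];   // small ring buffer for typed characters
static unsigned kbd_head = 0;

// Fires when the keyboard raises an interrupt: grab the character from
// bits 0-7, stash it, and clear the full bit so the keyboard can continue.
void keyboard_interrupt(volatile uint32_t *kbd_reg) {
    uint32_t v = *kbd_reg;
    if (v & FULL) {
        kbd_buf[kbd_head++ % 256] = (char)(v & 0xFF);
        *kbd_reg = 0;          // acknowledge: clear the full bit
    }
}
```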
option would be to do the same thing that we've done: every now and then, 39:10 or every time we go into the OS, we could check, we could poll the keyboard to see, is there 39:15 any key, did you type anything? But it's much more efficient in this case to use 39:21 an interrupt, right? To have the keyboard generate an interrupt whenever it writes 39:26 this register. And so then the interrupt handler is just going to fire immediately, read the character, and then 39:34 do whatever it needs to with it: maybe buffer it somewhere, maybe send it to a process that's listening 39:40 for input. Okay. Now, this particular scenario works fine 39:48 because the keyboard is extremely slow, right? Pressing one key every 10 39:54 milliseconds is very aggressive (fast typists will 40:01 press a key every few milliseconds), but in less than a microsecond you handle the interrupt, right? And so 40:08 higher-data-rate devices like hard drives and network cards have 40:15 more sophisticated mechanisms, things like direct memory access, where essentially they're writing large amounts of data, or entire packets, 40:21 directly to physical memory. We're not going to discuss this here, but just be aware that there 40:27 are more sophisticated mechanisms for I/O. Okay, so this concludes our 40:34 discussion of I/O. Now, the last piece of the puzzle that we need to take a look at is how processes can communicate with 40:41 the operating system, right? So we have seen how to implement this virtual 40:46 processor, this virtual memory, and now we have all these facilities which are 40:51 provided by the operating system through what we call system calls. And system calls are essentially invoked by 40:59 executing a special instruction that causes an exception. So the good 41:06
news is, we've already seen this mechanism. It's exactly the same mechanism that we saw a couple of lectures ago, right? So it's one 41:13 instruction, it's going to generate an exception with a particular exception cause, right, so 41:18 that the kernel knows that this is a system call, that the process 41:24 wants some service, and not that it executed some illegal instruction, right? Okay, so in RISC-V 41:31 the way this happens is with this ecall instruction. ecall stands for 41:36 environment call; it used to be called supervisor call, but they changed it. And so this causes an 41:43 exception and sets the mcause CSR to a particular value, value eight, but that's an irrelevant detail. And now, 41:51 how the process and the kernel communicate, that's up to the 41:57 ABI; that's not something that the ISA defines, right? It's this application binary interface between the process and 42:03 the kernel, so different kernels will do different things, but generally a good idea is to use a similar convention 42:09 as function calls. And so in Linux, for example, 42:16 we'll pass the system call number in register a7; because there are multiple system calls, we need to tell 42:23 the kernel, I want this one, I want this particular function, right? So 42:29 that is what the system call number denotes. And then the other arguments that the call takes go in registers a0 to a6, 42:36 and then the kernel will return results in a0 or a1, or maybe it will write some 42:43 data in memory. For example, if you ask to read some file, then the kernel might write that to the process's memory 42:50 and then return. And the key difference with normal 42:58 function calls is that all registers are preserved, right? Because we're already saving and restoring all the registers, 43:04 we don't overwrite temporary registers or anything like that. Okay, so again, let's see this in action:
rather than seeing 43:11 code, let's write some. So, going back to our setup, these are still not 43:19 very useful; let's see if we can print something from the process itself. And so 43:24 I have this slightly different file, which is sort of 43:32 the same loop, but now this loop is essentially incrementing some counter 43:37 that's stored in t0, and then when this reaches some value, 43:43 which in this case is 0x1000, so 4096, it's going to 43:53 reset this counter to 0, and then it's going to load the address of this string over here that says "hello from 44:01 process 1" into register a0, it's going to load the system call number for 44:07 print, which is 0x13, into register a7, and it's going to issue 44:14 this ecall instruction, right? So this is going to invoke the kernel. Now, if we 44:19 make this and run it, nothing is going to happen, because we still need to 44:25 implement the system call handler for this thing. So let's see how to implement that. 44:32 Essentially, if the dispatcher 44:37 finds that the cause is a system call, then it calls this syscall exception 44:43 handler routine, and the syscall exception handler essentially has different cases 44:49 for different potential system calls. This is currently only implementing the 44:54 getpid (get process ID) system call, which is not very interesting; basically it gives you the ID of the process where 45:00 you're currently running. It returns it in register a0, right? And so to do that, it's writing into register a0 45:06 of the currently running process. Let's try to implement this: if the syscall number, 45:12 which by the way is read here, right, we're just reading register a7 45:18 from the process, is this print constant, which I have 45:23 defined somewhere else and is just 0x13, then well, what we need to do is 45:29 call print string on the value of
this, right? 45:36 So now, a small C detail that you don't need to know about, 45:41 because remember we're not going to quiz you on pointers: this is an integer, and we need to 45:47 tell C to interpret this as a pointer to a string. And so, yeah, if we do 45:53 this and fix the typo, right, now, will 46:01 this work? So let's just run it 46:12 and see what happens, and then try to understand what's going on. So you see that instead of printing "hello from 46:17 process 1", there's some garbage that's periodically printed here. Not quite what we expected. Why? What's going on? 46:28 Yes? Yeah, so when I jump into the kernel, I'm 46:34 at address zero, right? And process one actually occupies 46:40 addresses starting at 0x10000, but it believes that its 46:47 own address space starts at zero, right? That's the point of segmentation. So we 46:55 need to translate this address, and that 47:01 translation, yeah, there's a routine that does this, essentially by looking at the base register, which is stored somewhere 47:07 in curproc. Each process is going to have a different one, so we need to pass curproc and the register, and now, if we 47:14 compile this again, well, it actually works, right? So now 47:20 we're reading the string from the right address. Okay, now, 47:26 there's some slight problem here. If you 47:32 ever code a system call handler like this, you're in for a surprise. This is 47:37 still horribly wrong, even though it appears to work. Can you tell me why? 47:49 So this doesn't have to do with functionality; it has to do with security. What if process one is a malicious 47:57 process, and it constructs a string right at the edge of its own 48:03 address space, and then it passes it into the kernel? My kernel is going to print 48:09 the string to the end, and then it's going to go into the address space of process two and keep going until it
48:15 finds a zero, right? So essentially we're violating the protection mechanisms that we want to enforce, and so a proper 48:23 implementation of this, which we're not going to do, would also have to check that the string is completely 48:29 within the address space of process one, right? Okay. So the kernel implements all 48:41 sorts of system calls: system calls for accessing files (in parentheses you can see 48:48 the names of the main system calls for Linux; Windows and 48:54 other operating systems implement different ones), using network connections, managing memory, getting information 49:00 about the system or the currently running process (for example, the user that's running this process, 49:06 things like that), getting the time of day, waiting for a certain event (for example, sometimes you want 49:12 to sleep for some time, or wait until there's some content available in 49:18 a socket, until you receive some sort of message), creating and interrupting other processes, and all sorts 49:24 of other things. And so this is essentially implementing all the functionality that we want our 49:30 applications to access. Now, when you write application code, you're not generally going to be 49:37 using system calls directly; instead, you're going to be calling libraries that then make these system 49:43 calls. But sometimes it's useful to know that these system calls are what's going on underneath, because sometimes, 49:51 especially when you're debugging or looking at performance issues, you want to know what 49:56 calls are actually being invoked. One last issue is that some of these system calls 50:02 may actually block the process, because the operating system cannot serve them immediately. And so what happens here is, 50:08 what we've seen so far is that you have a bunch of processes
that are created at some point, they 50:15 start out ready, they're eventually scheduled into the CPU for some time, then they're descheduled, they 50:22 become ready again, and so on until they complete. But in fact, if a process 50:30 issues a system call that cannot be served immediately, for example if you want to access a file that's not in 50:37 memory and the kernel has to fetch it from disk, that's going to take some time. 50:43 The process is put in this waiting state, right, where it cannot run, and then, when 50:49 whatever condition it is waiting on is satisfied, the process wakes up, becomes ready, and then 50:56 becomes available for scheduling again, right? So when you actually look at what 51:01 processes are running in your machine, you'll see that some of them are ready, some are executing, and some of them might be waiting. 51:09 Okay, so that's it for operating systems. We've barely scratched the 51:15 surface; they're an exciting topic. If you want to learn more, check out 6.033 or 6.828, and 51:21 that's all for today. Happy Thanksgiving!