All right, good afternoon everybody, let's get started. Welcome to 6.004. Today we're going to continue, and in fact finish, our journey through operating systems by finishing our discussion of virtual memory and then talking about the two remaining pieces: how to communicate with I/O devices, and how user-level processes can communicate with the operating system kernel and request services from it using system calls. So as a reminder, the key functions that we want our operating system to perform are: first, to schedule processes onto the CPU, to virtualize the CPU, which we've seen how to do with supervisor mode and exceptions. Then virtual memory, which we started discussing last lecture: the goal of virtual memory is to provide a private address space to each process, and also to abstract the storage resources of the machine, so that we can give programs the illusion that they are using more memory than the machine has physical memory. And the final piece of the puzzle, which we will see today, is that the operating system provides several services to processes, and lets processes invoke these services through a mechanism that we call system calls. OK, so in terms of virtual memory, where we started the discussion last lecture: remember that what we wanted was, first, protection among processes, so we have this notion of a virtual, private address space, and then demand paging, so that we could use main memory as a cache for disk and offer more memory to programs than we have physical memory in our machine. We saw two different mechanisms. Segmentation, remember, was very simple: basically, each process's address space is a contiguous chunk of physical memory. This is what we call a segment of physical
memory. It was very simple to implement: we just needed to add a couple of registers to the CPU, the base and bound registers. The problem is that this leads to fragmentation, and also we couldn't do the demand-paging functionality that we want. And so to solve this problem we introduced this alternative of paging, where instead of having the address space be a contiguous chunk of physical memory, we manage physical memory in small chunks of a fixed size called pages, and then we have this page table that maps from virtual page numbers, or virtual pages, to physical pages. So for every memory access that we're going to perform, we start with a virtual address, we index into the page table, we find the physical page number for the virtual page number that we want, and we construct the physical address that way. That had two great benefits. We solved the fragmentation problem: now we can store the address space, the memory of each process, all over the place, scattered across these fixed-size pages of physical memory. And now we can do demand paging: by having this resident bit in the page table, we are able to detect when we don't have a particular page in main memory, take a page-fault exception, and bring it in from disk. But there are two problems. The first one is that these page tables are pretty large, and the second is that we need to access these page tables on every reference. So the solution that we ended up with is: we put the page tables in memory, and then, to accelerate page-table accesses, we introduced this notion of a translation lookaside buffer, or TLB for short, to cache virtual-page-number-to-physical-page-number translations. A TLB is just a cache for translations. And so what we're
going to do on every access is first go to the TLB: if we have a valid translation, then we can construct the physical address very quickly; otherwise we have to go to memory. So it's caching — the same trick that we've applied so far, but now applied to caching page-table entries. And so we ended up with this setup where we have a TLB and we have memory, and we check the TLB to construct the physical address. So that solves one of the two problems. In recitation you saw many different problems that give you a TLB and a page table and ask you to do a few accesses and see how the different structures in the machine change and how they are accessed. But we still have another problem, which is that the way we're representing page tables — the way we're storing the translations from virtual page numbers to physical page numbers — is not efficient. So let's think about this. We have 32-bit addresses in our RISC-V processor. Let's say we have 4-kilobyte pages, which is a typical page size. Each page-table entry is going to be around 32 bits, or 4 bytes: we need on the order of 20 bits to store the physical page number, and then a few bits for status and permission bits and so on, so let's just call it 4 bytes. Now, that means that every process is going to require 2^20 page-table entries in its page table — that's 2^(32-12), because each page is 4 kilobytes, or 2^12 bytes. So that's a 4-megabyte page table per process. You have gigabytes of memory on your machines, so for a single process it would be OK, but it's not that great when you start thinking about having hundreds of processes — you might have hundreds
or thousands of processes in your machine. So if I want to keep all these page tables in memory, I'm going to need gigabytes of memory. Your computers today have 4 to 16 gigabytes of memory; it would be completely impractical to devote so much of it just to page tables. So one simple thing we could try is to make the pages larger: what if we use 64-kilobyte or 1-megabyte pages? Well, this introduces other problems. Not all processes want chunks of one megabyte. That will make your page tables smaller, but it also adds other overheads: bringing in each page from disk is going to take a while, because now the pages are very large. Essentially this is the same trade-off that you have in caches with block size. Remember, when we talked about caches we said you want a block size of a few tens of bytes, because that lets you amortize the overheads of bringing in a line and of storing all the metadata associated with that cache line, but you don't want cache lines so large that you can have only a few of them in your cache. So this is the same problem. But even if we used larger pages, think about what happens when we move from 32- to 64-bit address spaces. Most of the machines you have today actually support 64-bit virtual addresses, and even with one-megabyte pages, if you do the math, a flat table would require 2^44 page-table entries — tens of terabytes of page table; not even your hard drive is that big. So the problem here is that we're using a very inefficient representation for page tables: we're using an array, and that array is going to be mostly empty, right?
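To double-check the arithmetic above, here is a tiny sketch (my own illustration, not code from the lecture) that computes the size of a flat, single-level page table from the address-space and page parameters:

```c
#include <stdint.h>

// Bytes needed by a flat page table: one PTE per virtual page.
// addr_bits: virtual address width; page_bits: log2(page size).
uint64_t flat_page_table_bytes(int addr_bits, int page_bits, int pte_bytes) {
    uint64_t num_ptes = 1ULL << (addr_bits - page_bits);  // 2^(addr - offset) pages
    return num_ptes * (uint64_t)pte_bytes;
}
```

With the lecture's numbers — 32-bit addresses, 4 KB pages, 4-byte PTEs — this gives 2^20 × 4 bytes = 4 MB per process.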
It's only going to have valid translations for the entries where the process actually has a page. So what we would like to use is a data structure that represents these pairs of virtual-to-physical pages — essentially key-value pairs — and whose size grows with the number of virtual-to-physical pairs we are actually storing in it. So can you tell me which data structures support that? A hash table — yes, that's a good one. Is there anything else? A sparse array — yes, there are many different variants, but the two main ones are a hash table and also a tree. A tree is an ordered container, which in this case is not particularly useful in itself, but it's useful for other reasons, and let's see why. In fact, most processors implement page tables as multi-level trees — this is what's called a hierarchical page table. So forget about the fact that I started talking about trees, and let's understand how this page table is organized from first principles. In this particular setup we have a 12-bit offset, and then we have a 20-bit virtual page number, and we need to translate this into a physical page number. So let's divide and conquer: if instead of 20 bits we did the translation ten bits at a time, then our problem would be very easy. So let's take the ten most significant bits. If I tell you, look, you want to translate a virtual page number to a physical page number, and you have ten bits' worth of virtual page number, that's pretty easy — that's just 2^10, about a thousand entries. So this is what the level-one page table is storing. What we're going to do is we are going to have this
structure called a level-one page table, which we index with the most significant ten bits, which we call the level-1 index. And here's the key difference: an entry here is not going to point to memory — it's going to point to another small structure, which we call a level-two page table, that we then index with the 10 remaining bits. And because we have 10 remaining bits, each of these level-two page tables also has about a thousand entries. So the key idea here is that each of these structures is very small, and translation now proceeds in steps: first you access the root of the current page table, which gives you the starting address of the level-one page table; you use the top 10 bits as the offset to find the page-table entry — which is not giving you a physical address or a physical page number yet, but is actually a pointer to the second-level page table; and then you use the remaining 10 bits to index into that level-two page table, and that holds the page-table entry you want. Then you use the 12-bit offset to find the word. So now each translation is a three-step process rather than a two-step process. OK, is this clear? Yes? — That's a great question: if we populated this tree entirely, then yes, but we don't — that's the trick. You see here, this entry is not pointing anywhere. What we're going to do is only populate the entries of the level-one page table that we're actually using, and that's the key advantage of this organization: it makes the memory that the page table takes proportional to how many pages are actually mapped by the process. Typically, a process that starts with one entry here and one entry over there will have two level-two page tables, and then as the
process wants more memory, it will allocate more level-two page tables. There's another benefit here, which is that each of these level-one and level-two page tables is actually very small: a thousand entries at four bytes per entry is four kilobytes — that fits in a page. Before, we had this problem where every time we started a process we needed to allocate some giant chunk of memory that actually had to be contiguous to store the page table, so we still had some fragmentation issues. Here we've broken up the page table into little units that we can allocate independently, just as we allocate pages — so in fact the operating system is always just allocating little pages, and everything is much simpler. Now, what's the problem here? Yes — you still have to figure out the mapping from these level-two page tables to the actual addresses. Right: just accessing a level-two page table doesn't give you the address yet — you need to do yet another lookup. That's the problem: now, instead of doing a single access to an array, we're doing two accesses to different memory locations for a single translation. So we've made translations more expensive — we're essentially trading off time for space, doing something that's more costly per walk but much more space-efficient. But hey, we introduced TLBs, and TLBs are caching our translations, so these walks are going to be rare. Because we have TLBs, we can afford a more complex structure that is more space-efficient. OK, so this is how pretty much all systems implement virtual memory today. x86, for example, has more levels because it has a 64-bit address space, so in general you'll have two, three, sometimes four steps.
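Here is a small software model of that two-level walk (a sketch of my own, using the lecture's 10/10/12 bit split — real hardware does this in the MMU, and real PTEs pack the pointer and flag bits into one word):

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BITS  12
#define LEVEL_BITS 10
#define ENTRIES    (1u << LEVEL_BITS)   // 1024 entries per table

typedef struct { uint32_t ppn; int valid; } PTE;  // leaf: physical page number
typedef struct { PTE *l2[ENTRIES]; } L1Table;     // root; NULL = unmapped region

// Walk the two-level table: L1 index -> L2 table -> PTE -> physical address.
// Returns -1 to signal a page fault (unmapped address).
int64_t translate(const L1Table *root, uint32_t vaddr) {
    uint32_t i1  = vaddr >> (PAGE_BITS + LEVEL_BITS);     // top 10 bits
    uint32_t i2  = (vaddr >> PAGE_BITS) & (ENTRIES - 1);  // middle 10 bits
    uint32_t off = vaddr & ((1u << PAGE_BITS) - 1);       // 12-bit page offset
    const PTE *l2 = root->l2[i1];
    if (l2 == NULL || !l2[i2].valid) return -1;           // would take a page fault
    return ((int64_t)l2[i2].ppn << PAGE_BITS) | off;
}
```

Note how an L1 entry that is NULL stands for a whole 4 MB of unmapped address space at the cost of a single word — that's exactly where the space savings come from.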
This is, by the way, also why we call page-table accesses "page table walks": it would sound kind of weird to call it a walk if you just had an array and were accessing one location, but here you're actually walking a data structure — you're traversing this tree. OK, so this concludes our discussion of virtual-memory techniques, but we still have to figure out one thing, which is how virtual memory interacts with caches. At a high level we have two options. We've been discussing basically a situation where we don't have caches — we just have a CPU, a TLB, and main memory — but we want to introduce caches, because they're crucial for performance, and now we have two options. The first one is to put the cache before the TLB, in which case the CPU accesses the cache with virtual addresses; when the cache misses, it issues a virtual address, the TLB translates it to a physical address, and main memory always operates on physical addresses. This is pretty good because now TLB accesses are not that frequent: any time we hit in the cache we don't have to do a TLB access, so TLB accesses are not on the critical path — they're not adding lots of cycles. That sounds good, but there's a major problem, which is that every time we context switch, this cache holds invalid data: it's caching virtual addresses, and every time we change from one address space to another — for example, when we take an exception and go from the process into the kernel — the addresses that this cache is storing are not valid anymore. So let's look at the alternative, which is to put the TLB before the cache. Here the CPU issues accesses using virtual addresses, the TLB translates the virtual address to a physical address, and then the cache caches physical addresses
— right, and this is great because we do not need to flush on a context switch: if we're invoking system calls all over the place, we're not going to discard the valuable data we have in the cache. The problem is that now we need that TLB access in front of the cache, and a TLB is small, but it's at least going to add one or two cycles. This is not great, because it adds stages to our pipeline and latency to our loads. So it looks like there's no good solution here. Fortunately, there's a way that you can have a physically-addressed cache — one that stores cache lines tagged by physical address — and still check the cache and the TLB in parallel. Can you guess how? There are actually some CPUs that have done tricks with caching both the virtual and the physical address, but it's actually something pretty simple. Let's think about what the translation is doing. You start with the virtual address, and it has two components: the page offset, and the virtual page number. You're translating this to a physical address — but the only thing that changes is that you take the virtual page number and turn it into a physical page number; the page offset stays the same. So now let's think about what the cache does. A cache address has three fields: the block offset, the index bits, and the cache tag — and this, by the way, is the same physical address. OK, so when do you need the tag? When you're doing a cache lookup, do you need the tag at the beginning of the lookup or at the end? At the end, when you're doing the comparison. So the only thing you need to start the cache lookup is the index bits. The way a cache works
is that we take the index bits, index into the particular set that we want, get all the data and the tags, and then compare each stored tag with the tag of our address. OK, so that means that, under some conditions, you can actually access both the cache and the TLB in parallel. Can you tell me what that condition is, depending on how many index bits you have? Yes — the page offset, exactly. If the index bits all fit within the page offset, then you already have them when you have the virtual address, so in that situation you can do both lookups in parallel. Now, if your cache is larger and has more index bits, part of your index bits might change under translation, so you cannot do this in parallel. What most systems do is stay within this bound, at least for the level-1 cache; for the caches that come later that's fine, because it's OK for them to have larger latency. So this is a pretty clever trick: we're not incurring extra cycles, we're still caching physical addresses, and we don't need to flush our cache. There's one limitation, though: we cannot have as many index bits as we want. If we want to make our cache larger, the only way to do it is to add ways — to increase the associativity; we cannot add more sets beyond a certain point. For example, pretty much every x86 processor in the last 15 years has a 32-kilobyte L1 cache with 8 ways. If you divide 32 kilobytes by 8, that's 4 kilobytes, and the page size of x86 is 4 kilobytes — so they're essentially keeping each way at 4 kilobytes of data, the index bits fit exactly within the page-offset bits, and they can do both accesses in parallel.
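The condition is easy to check mechanically. A sketch (my own, not from the lecture): the cache can be indexed in parallel with the TLB exactly when all index and block-offset bits fall inside the page offset — that is, when one way of the cache is no bigger than a page:

```c
#include <stdint.h>

// True if a (size, ways) cache can be virtually indexed but physically
// tagged: the bytes covered by index + block-offset bits are size/ways,
// and those bits must all lie within the page offset.
int vipt_ok(uint32_t cache_bytes, uint32_t ways, uint32_t page_bytes) {
    return cache_bytes / ways <= page_bytes;
}
```

For the x86 example: 32 KB / 8 ways = 4 KB per way, exactly one page — so it just barely qualifies, while a 64 KB 8-way cache would not.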
OK, any questions? Yes? — Absolutely, absolutely. But here's what happens: you take the index bits from the virtual address and start the lookup; then you get the tags — all the relevant tags — and now you need to compare them. But by that point you've finished the access in the TLB and you already have the physical page number, so you can compare the tags against the right value. OK, so this concludes everything we're going to see about virtual memory; let's move on to I/O. We've focused a lot on the processor and on virtual memory, but there are all these I/O devices — input/output devices — that we haven't discussed so far. The way systems communicate with I/O devices is generally pretty simple: you have some set of shared I/O registers that both the processor and the I/O devices can read and write, and they communicate by reading and writing these registers. So there are two key questions in how you do I/O: first, how do you access these I/O registers from the processor side; and second, how do the processor and the device coordinate on a data transfer — if you want to send or receive some data to or from a device, how do you actually coordinate? For accessing I/O registers, one option that some ISAs implement is to add additional instructions — essentially an abstraction of channels with ports, where you can read and write buffers that are then sent to devices. This is actually not frequently used, because it is inflexible and it adds instructions that we then have to support in the ISA, so most instruction sets do not support it. Instead, the preferred way of
communicating with, or accessing, these I/O registers is through what we call memory-mapped I/O. Here, I/O registers are mapped to particular locations in physical memory, and the processor accesses them with normal loads and stores. So here's how physical memory looks: we have some data that's backed by main memory, but some addresses are actually routed to these I/O registers, so we can issue loads and stores and essentially communicate with the particular device that we care about that way. Now, these are normal loads and stores from the perspective of the ISA. Yes — they're registers because it's some state kept somewhere in the system; it's not memory, because it's just a small amount of state that we're reading and writing. And as I was saying, one of the reasons we call them registers and not memory is that these loads and stores should not be cached. So you need to be careful when you design your system: if it's a normal memory access, you can send it through the cache and all the way to main memory, but if you get a load or a store to an address that corresponds to one of these I/O registers, it needs to be routed directly to the I/O registers. Each of these registers is used by a different device, and some devices might use multiple registers. Yes — you could do that as well, but because the cache is very tightly integrated with the CPU, it's generally not a good idea to put input/output there. OK, so memory-mapped I/O is the standard way. Now, the other aspect — the other key dimension — of communicating with I/O devices is how you coordinate a data transfer. And here we have two options. The first is polling, or what we call
synchronous I/O, where the processor periodically checks this register that's associated with the specific I/O device, to see if it can perform some operation or has some data available — so basically you're constantly checking on the status of the device. The other option is what we call interrupts, which is asynchronous: essentially you have more of a request-response interface. The processor initiates a request, basically by writing to one of these I/O registers, and then moves on to do other work; when the request is serviced, or when the device has some event that needs attention, the device interrupts the processor, which can then go look at the register and see what needs to be done. So, broadly speaking, can you tell me a couple of good things about each approach? What is good about polling? — "They can take turns and you know which one is doing what, so you don't have to do extra setup overhead beyond the shared part, but it will potentially be slower; interrupts could be faster since they're both doing things at the same time, but then you have to make sure about the shared state they're using." — It's not quite that interrupts are faster; interrupts actually tend to have higher latency, because an interrupt needs time to be processed: the common handler needs to fire, save all the registers, and get to the right interrupt handler, whereas if the CPU is already in a tight loop polling the register, it takes much less time. But the key advantage is that interrupts let the processor move on to do other things. So, for example, if we want to access a file, and it takes 10 milliseconds to read it from the right place on disk, we can
actually do something useful for those 10 milliseconds, rather than just keep checking with the disk: "do you have something? are you done yet?" OK, so let's see a very simple example. Consider a simple character-based display. Modern displays are more complicated — they actually let you draw arbitrary pixels — but early displays worked like this: you have a single I/O register with this format. It's a 32-bit register, but most bits are unused: bits 0 through 7 store a character, and bit 31 is what we call the ready bit. The ready bit is set to 1 only when the display is ready to accept a character. When the processor wants to display an 8-bit character, it writes it to this register — into these char bits — and that sets the ready bit to zero, which lets the display know that it has a character it can grab. When the display has processed the character, it sets the ready bit back to 1, and now the processor can put in more characters. Yes — ready = 1 means the display is ready to take a new character, not that it's done displaying. There might still be some latency until the character shows; ready just tells the processor, "hey, I got this character, you can give me a new one." It's a form of flow control. OK, so rather than just showing you some code, let's see a demo of how this works. I've written a little OS; it works like this. You have three processes: the operating system kernel, which we're going to call process 0, and process 1 and process 2. They each have a single segment in memory — we're going to use segmentation-based virtual memory — and so process 1 is at addresses
addresses 31:48 you know 1 0 0 0 0 2 1 FF c so this is 31:53 64 kilobytes worth of memory process 2 has the next 64 kilobytes of memory and 31:59 then at this address over here for 0 0 0 0 0 0 0 you have the display 32:07 I already stir right so the way this works is you know I have a simulator just like the simulator that you use for 32:13 a lab for it's a little bit more complicated this it's it's emulating an entire system and so you have these 32:20 these very simple processes that are just in an infinite loop doing you know not doing much 32:26 basically incrementing a counter this is process 1 this is process 2 and so you 32:32 know they're slightly different in that they they have a different way of incrementing the counter but otherwise they're the same not very interesting 32:39 and then you know there's the common handler here right which is what you saw 32:46 in your recitation which eventually calls the dispatcher 32:51 this has essentially it's a very simple 32:57 dispatcher that if you know the causes a timer interrupts is going to call this interrupt timer routine right so you 33:03 know if I build this system I can 33:10 actually you know run it but it's not gonna do much because we have no IO yet so we're not you know a system that 33:17 doesn't have any inputs or any outputs really isn't very interesting so let's try to do something so if we go bad heat 33:24 back here one thing that we can do is if you're taking an interrupt you know 33:31 let's print when we're essentially switching switching processes right so this is the interrupt time a routine 33:36 that you sell in the you know a couple of presentations ago and so we can call 33:42 this you know print string routine that 33:47 essentially you know we can say switching processes on timer interrupts 33:56 right okay so now every time the timer interrupts Pires we're essentially changing switching processes between 34:03 presses one and process two and then we want to print this message now 
print 34:09 string is this function over here in this other file which is certainly 34:15 complicated because we don't want to use half word and byte loads but conceptually that's this loop that's 34:21 that's shown in the comment over here so it's a simple while loop it scans through the different characters of the 34:28 string and until it hits a zero then the null character which is essentially what you did when you detect that the string 34:34 you know it's atherton what we call the termination character of the string it's essentially printing it character by 34:39 character right and then we have to implement this print character routine that essentially needs to write to our 34:47 display register right so I already have a pointer to this display register over here so display so what I would have to 34:54 write it something like display mm IO equals x right so X is a 32-bit value 34:59 but actually only the eight lower bits are nonzero so have sort of the dating the four month 35:06 everyone we just need to write it to our 32-bits and that's it right now if I compile this is this going to work as as you 35:14 expect is this going to work correctly 35:20 what what do you expect will happen here I'm writing to this display I'm writing 35:27 through the special address right yes 35:35 right I didn't I just wrote to it right and so here's here's what happens we're 35:42 missing some characters right because not checking so what we need to do is we 35:48 need to do polling right we need to go back and say you know first check that 35:55 you know wait or you know while you know 36:00 the contents of this memory address are 36:06 less than zero X right yep so while b31 36:15 is not set right which is detective if the value is less than this hex 36:21 character over here then you just wait right this is just an infinite loop 36:26 mommy it's a loop that will keep going going back until it reads a value a a 36:33 value that's smaller than that right and so now it 
And so now it works — now we're properly waiting and getting all the characters. OK, any questions? Yes — how are these I/O registers implemented? Basically, each one is a register, and you have circuitry on the address bus that says: if it's this address, go access this register — don't send the access through to memory. Yes — there's nothing there; physical memory doesn't even need to exist at that address, because you're never going to access it there. Why route it that way? Because it gives you more flexibility as to where you intercept the access. What happens in real implementations is that you actually send these accesses all the way through the cache, but with a special bit that says "please don't cache this", and it's only when you get to the outside of the chip that you say, "oh, actually this is an I/O request — route it this particular way." So it eases the implementation of the CPU — it makes it simpler. This is also more flexible than using a bunch of dedicated registers, because it gives you a lot of space: we already have this very large pool of memory addresses, so if instead of one register we want a megabyte's worth of buffer that the I/O device offers, we can do that too — whereas with registers tightly coupled to the CPU, we could not. OK, so the other alternative, which you will see in recitation, is what we call interrupt-based I/O. Suppose we have a keyboard with a very similar register format, where instead of ready we have a full bit; communication here goes the other way. The keyboard stores a character here and sets the full bit to one, and the processor interprets that full bit as: there's a character I need to read from this register.
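An interrupt handler for such a keyboard might look like the following sketch (my own illustration: the buffering scheme and names are made up, and as with the display, the register pointer is a parameter so it can be exercised against a simulated register):

```c
#include <stdint.h>

#define FULL (1u << 31)    // bit 31: the keyboard has a character waiting

static char     kbd_buf[256];   // small ring buffer for typed characters
static unsigned kbd_head = 0;

// Fires when the keyboard raises an interrupt: grab the character from
// bits 0-7, stash it, and clear the full bit so the keyboard can continue.
void keyboard_interrupt(volatile uint32_t *kbd_reg) {
    uint32_t v = *kbd_reg;
    if (v & FULL) {
        kbd_buf[kbd_head++ % 256] = (char)(v & 0xFF);
        *kbd_reg = 0;          // acknowledge: clear the full bit
    }
}
```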
option would be to do the same thing that we've done: every now and then, 39:10 or every time we go into the OS, we could check, we could poll the keyboard to see, is there 39:15 any key, did you type anything? But it's much more efficient in this case to use 39:21 an interrupt, right? To have the keyboard generate an interrupt whenever it writes 39:26 this register. And so then the interrupt handler is just going to fire immediately, read the character, and then 39:34 do whatever it needs to with it: maybe buffer it somewhere, maybe send it to a process that's listening 39:40 for input. Okay. Now, this particular scenario works fine 39:48 because the keyboard is extremely slow, right? Pressing one key every 10 39:54 milliseconds is very aggressive (fast typists will 40:01 press a key every few milliseconds), but in less than a microsecond you handle the interrupt, right? And so 40:08 higher-data-rate devices like hard drives and network cards have 40:15 more sophisticated mechanisms, things like direct memory access, where essentially they're writing large amounts of data, or entire packets, 40:21 directly to physical memory. We're not going to discuss this here, but just be aware that there 40:27 are more sophisticated mechanisms for I/O. Okay, so this concludes our 40:34 discussion of I/O. Now, the last piece of the puzzle that we need to take a look at is how processes can communicate with 40:41 the operating system, right? So we have seen how to implement this virtual 40:46 processor, this virtual memory, and now we have all these facilities which are 40:51 provided by the operating system through what we call system calls. And system calls are essentially invoked by 40:59 executing a special instruction that causes an exception. So the good 41:06
news is, we've already seen this mechanism. It's exactly the same mechanism that we saw a couple of lectures ago, right? So it's one 41:13 instruction, it's going to generate an exception with a particular exception cause, right, so 41:18 that the kernel knows that this is a system call, that the process 41:24 wants some service, and not that it executed some illegal instruction, right? Okay, so in RISC-V 41:31 the way this happens is with this ecall instruction. ecall stands for 41:36 environment call; it used to be called supervisor call, but they changed it. And so this causes an 41:43 exception and sets the mcause CSR to a particular value, value eight, but that's an irrelevant detail. And now, 41:51 how the process and the kernel communicate, that's up to the 41:57 ABI; that's not something that the ISA defines, right? It's this application binary interface between the process and 42:03 the kernel, so different kernels will do different things, but generally a good idea is to use a similar convention 42:09 as function calls. And so in Linux, for example, 42:16 we'll pass the system call number in register a7; because there are multiple system calls, we need to tell 42:23 the kernel, I want this one, I want this particular function, right? So 42:29 that is what the system call number denotes. And then the other arguments that the call takes go in registers a0 to a6, 42:36 and then the kernel will return results in a0 or a1, or maybe it will write some 42:43 data in memory. For example, if you ask to read some file, then the kernel might write that to the process's memory 42:50 and then return. And the key difference with normal 42:58 function calls is that all registers are preserved, right? Because we're already saving and restoring all the registers, 43:04 we don't overwrite temporary registers or anything like that. Okay, so again, let's see this in action:
rather than seeing 43:11 code, let's write some. So, going back to our setup, these are still not 43:19 very useful; let's see if we can print something from the process itself. And so 43:24 I have this slightly different file, which is sort of 43:32 the same loop, but now this loop is essentially incrementing some counter 43:37 that's stored in t0, and then when this reaches some value, 43:43 which in this case is 0x1000, so 4096, it's going to 43:53 reset this counter to 0, and then it's going to load the address of this string over here that says "hello from 44:01 process 1" into register a0, it's going to load the system call number for 44:07 print, which is 0x13, into register a7, and it's going to issue 44:14 this ecall instruction, right? So this is going to invoke the kernel. Now, if we 44:19 make this and run it, nothing is going to happen, because we still need to 44:25 implement the system call handler for this thing. So let's see how to implement that. 44:32 Essentially, if the dispatcher 44:37 finds that the cause is a system call, then it calls this syscall exception 44:43 handler routine, and the syscall exception handler essentially has different cases 44:49 for different potential system calls. This is currently only implementing the 44:54 getpid (get process ID) system call, which is not very interesting; basically it gives you the ID of the process where 45:00 you're currently running. It returns it in register a0, right? And so to do that, it's writing into register a0 45:06 of the currently running process. Let's try to implement this: if the syscall number, 45:12 which by the way is read here, right, we're just reading register a7 45:18 from the process, is this print constant, which I have 45:23 defined somewhere else and is just 0x13, then well, what we need to do is 45:29 call print string on the value of
this, right? 45:36 So now, a small C detail that you don't need to know about, 45:41 because remember we're not going to quiz you on pointers: this is an integer, and we need to 45:47 tell C to interpret this as a pointer to a string. And so, yeah, if we do 45:53 this and fix the typo, right, now, will 46:01 this work? So let's just run it 46:12 and see what happens, and then try to understand what's going on. So you see that instead of printing "hello from 46:17 process 1", there's some garbage that's periodically printed here. Not quite what we expected. Why? What's going on? 46:28 Yes? Yeah, so when I jump into the kernel, I'm 46:34 at address zero, right? And process one actually occupies 46:40 addresses starting at 0x10000, but it believes that its 46:47 own address space starts at zero, right? That's the point of segmentation. So we 46:55 need to translate this address, and that 47:01 translation, yeah, there's a routine that does this, essentially by looking at the base register, which is stored somewhere 47:07 in curproc. Each process is going to have a different one, so we need to pass curproc and the register, and now, if we 47:14 compile this again, well, it actually works, right? So now 47:20 we're reading the string from the right address. Okay, now, 47:26 there's some slight problem here. If you 47:32 ever code a system call handler like this, you're in for a surprise. This is 47:37 still horribly wrong, even though it appears to work. Can you tell me why? 47:49 So this doesn't have to do with functionality; it has to do with security. What if process one is a malicious 47:57 process, and it constructs a string right at the edge of its own 48:03 address space, and then it passes it into the kernel? My kernel is going to print 48:09 the string to the end, and then it's going to go into the address space of process two and keep going until it
48:15 finds a zero, right? So essentially we're violating the protection mechanisms that we want to enforce, and so a proper 48:23 implementation of this, which we're not going to do, would also have to check that the string is completely 48:29 within the address space of process one, right? Okay. So the kernel implements all 48:41 sorts of system calls: system calls for accessing files (in parentheses you can see 48:48 the names of the main system calls for Linux; Windows and 48:54 other operating systems implement different ones), using network connections, managing memory, getting information 49:00 about the system or the currently running process (for example, the user that's running this process, 49:06 things like that), getting the time of day, waiting for a certain event (for example, sometimes you want 49:12 to sleep for some time, or wait until there's some content available in 49:18 a socket, until you receive some sort of message), creating and interrupting other processes, and all sorts 49:24 of other things. And so this is essentially implementing all the functionality that we want our 49:30 applications to access. Now, when you write application code, you're not generally going to be 49:37 using system calls directly; instead, you're going to be calling libraries that then make these system 49:43 calls. But sometimes it's useful to know that these system calls are what's going on underneath, because sometimes, 49:51 especially when you're debugging or looking at performance issues, you want to know what 49:56 calls are actually being invoked. One last issue is that some of these system calls 50:02 may actually block the process, because the operating system cannot serve them immediately. And so what happens here is, 50:08 what we've seen so far is that you have a bunch of processes
that are created at some point, they 50:15 start out ready, they're eventually scheduled into the CPU for some time, then they're descheduled, they 50:22 become ready again, and so on until they complete. But in fact, if a process 50:30 issues a system call that cannot be served immediately, for example if you want to access a file that's not in 50:37 memory and the kernel has to fetch it from disk, that's going to take some time. 50:43 The process is put in this waiting state, right, where it cannot run, and then, when 50:49 whatever condition it is waiting on is satisfied, the process wakes up, becomes ready, and then 50:56 becomes available for scheduling again, right? So when you actually look at what 51:01 processes are running in your machine, you'll see that some of them are ready, some are executing, and some of them might be waiting. 51:09 Okay, so that's it for operating systems. We've barely scratched the 51:15 surface; they're an exciting topic. If you want to learn more, check out 6.033 or 6.828, and 51:21 that's all for today. Happy Thanksgiving!