CS 481: Test #2 Makeup

The test is open book, open notes. There are four questions of equal weight. Keep your answers short; each problem may have several questions, but each question has a concise and (for an OS question) unambiguous answer.

Problem 1. Suppose you have a byte-addressed computer system with a 40-bit logical address and a 48-bit physical address. The page size is 64 Kbytes and each page table entry takes 4 bytes.

  • How many pages are there in the full logical address space?

    The logical address space has 2^40 bytes, so it has 2^40 / 64K = 2^(40-16) = 2^24 pages.

  • How many frames are there in the full physical address space?

    The physical address space has 2^48 bytes, so it has 2^48 / 64K = 2^(48-16) = 2^32 frames.

  • We shall use 2-level paging and want a secondary page table to fit within one page. How many bits will be used to index the master page table? A secondary page table?

    Since a page table entry is said to use 4 bytes (doubtful, as we shall see), a page can hold 64K/4 = 16K such entries, corresponding to a 14-bit page index. That leaves 24-14=10 bits for the index into the master page, which will thus have at most 2^10 entries.
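    The arithmetic so far can be checked with a short sketch. The constant names and the helper split_address are illustrative, not part of the exam; the numbers are those given in the problem statement.

```python
# Parameters from the problem statement.
LOGICAL_BITS = 40
PHYSICAL_BITS = 48
PAGE_SIZE = 64 * 1024   # bytes
PTE_SIZE = 4            # bytes per page table entry

# Derived quantities (all sizes are powers of two).
OFFSET_BITS = PAGE_SIZE.bit_length() - 1                 # log2(64K) = 16
LOGICAL_PAGES = 1 << (LOGICAL_BITS - OFFSET_BITS)        # 2^24
PHYSICAL_FRAMES = 1 << (PHYSICAL_BITS - OFFSET_BITS)     # 2^32
ENTRIES_PER_TABLE = PAGE_SIZE // PTE_SIZE                # 16K entries per page
SECONDARY_BITS = ENTRIES_PER_TABLE.bit_length() - 1      # 14
MASTER_BITS = LOGICAL_BITS - OFFSET_BITS - SECONDARY_BITS  # 10


def split_address(addr):
    """Split a 40-bit logical address into (master index, secondary index, offset)."""
    offset = addr & (PAGE_SIZE - 1)
    secondary = (addr >> OFFSET_BITS) & (ENTRIES_PER_TABLE - 1)
    master = addr >> (OFFSET_BITS + SECONDARY_BITS)
    return master, secondary, offset


if __name__ == "__main__":
    print(MASTER_BITS, SECONDARY_BITS, OFFSET_BITS)
```

    The three fields add up to the 40 address bits: 10 for the master index, 14 for the secondary index, and 16 for the offset.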

  • Show a typical master page table entry and a typical secondary page table entry.

    A secondary page table entry must store the frame number (32 bits, since there are 2^32 frames) along with some basic information such as a presence bit, a dirty bit, a few other such bits, and perhaps counters. But the frame number alone uses up the full 4-byte entry, so there is a problem: no room is left for any of these bits.

    Since we are not told how many bytes should be used for a master page table entry, we can do better there: it needs to store a frame number (where the secondary page table is) and some access information, along with a presence bit and other paging information (in case the secondary page tables are pageable, which they probably are).

  • You are running a 16-Gbyte program on this system. How many page frames would you need in order to put your entire program (including its page tables) in memory?

    16 Gbytes is 2^34 bytes, so we need 2^(34-16) = 2^18 frames to hold the program itself, plus page tables. Mapping 2^18 pages requires at least 2^(18-14) = 2^4 = 16 secondary page tables, plus the master page table, so 17 additional frames, for a total of 2^18 + 17 frames.
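    The same count as a small sketch. The function name frames_needed is hypothetical, and it assumes, as the answer does, that the master page table fits in a single frame.

```python
PAGE_SIZE = 64 * 1024   # bytes, from the problem statement
PTE_SIZE = 4            # bytes per page table entry


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def frames_needed(program_bytes):
    """Return (data frames, secondary page tables, master page tables)."""
    data_frames = ceil_div(program_bytes, PAGE_SIZE)
    entries_per_table = PAGE_SIZE // PTE_SIZE
    secondary_tables = ceil_div(data_frames, entries_per_table)
    master_tables = 1   # assumption: the master table occupies one frame
    return data_frames, secondary_tables, master_tables


if __name__ == "__main__":
    parts = frames_needed(16 * 2 ** 30)
    print(parts, sum(parts))
```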

Problem 2. You have a RISC machine with a paged virtual memory system. In a RISC machine, some instructions are memory transfer instructions (load and store to and from registers) and all others are internal to the CPU; thus an instruction causes either one or two (logical) memory references: one for the next instruction and, in the case of transfer instructions, one for data. Your system uses a TLB to cache page table entries; it also has an instruction cache and a data cache. The TLB takes 20ns to respond; the data cache and instruction cache both respond in 50ns; a memory access costs you 300ns; and a disk access costs you 10ms.

    Assume that the TLB hit ratio is 90%, the instruction cache hit ratio 95%, and the data cache hit ratio 85%; when the TLB does not contain the sought-after page table entry, the page table itself has it in 99.9% of the cases. About 40% of the instructions executed in our program are memory transfer instructions.

    Compute the time spent accessing memory, on average, per instruction. Compare that with the ideal time (if all hit ratios were 100%): is the answer satisfactory? If so, why? If not, can you point out any imbalance in the architecture?

    A detailed computation is annoying, but it is pretty clear that the system is badly imbalanced. If the address we seek to translate is not in the TLB and the page table does not have it either, we get a page fault. This happens 10% * 0.1% of the time, or once per 10,000 memory references. Even 1/10,000 is a huge factor when it is applied to the disk access time: 10ms / 10,000 = 1 microsecond, which is way out of range of everything else (the cache is 20 times faster, but useless on a page fault).

    Otherwise things look pretty good: 20ns for the TLB; 10% * 50ns to get the entry out of the data cache when needed (on a TLB miss), adding only 5ns to the 20; and a small chance of 15% to have to go to real memory, for another contribution of 10% * 15% * 300ns or 4.5ns. That keeps the whole thing down to 29.5ns for one address translation, NOT COUNTING PAGE FAULTS.
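    The figures quoted in this answer can be reproduced with a few lines. This is only a sketch of the rough estimate made here, not the full per-instruction computation the question asks for; the function names are illustrative, and per_instruction simply scales a per-reference cost by the 1 + 0.40 memory references that an average instruction makes.

```python
# Timings (ns) and ratios from the problem statement.
TLB_NS = 20
CACHE_NS = 50
MEMORY_NS = 300
DISK_NS = 10_000_000          # 10 ms

TLB_MISS = 0.10               # 1 - 90% TLB hit ratio
DATA_CACHE_MISS = 0.15        # 1 - 85% data cache hit ratio
PAGE_TABLE_MISS = 0.001       # 1 - 99.9%
REFS_PER_INSTRUCTION = 1 + 0.40   # one fetch, plus data for 40% of instructions


def translation_ns():
    """Average translation cost per reference, not counting page faults."""
    return (TLB_NS
            + TLB_MISS * CACHE_NS
            + TLB_MISS * DATA_CACHE_MISS * MEMORY_NS)


def page_fault_ns():
    """Average page-fault cost per reference."""
    return TLB_MISS * PAGE_TABLE_MISS * DISK_NS


def per_instruction(ns_per_reference):
    """Scale a per-reference cost to a per-instruction cost."""
    return REFS_PER_INSTRUCTION * ns_per_reference


if __name__ == "__main__":
    print(translation_ns(), page_fault_ns(), per_instruction(page_fault_ns()))
```

    The page-fault term is roughly 34 times the translation term, which is the imbalance the answer points out.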

Problem 3. We are running a matrix multiplication program, which uses the obvious code to compute matrix C as the product of matrices A and B (all square matrices of size n).

    for i=1 to n do
        for j=1 to n do
            tmp <- 0
            for k=1 to n do
                tmp <- tmp + a(i,k)*b(k,j)
            c(i,j) <- tmp

    Assume that n is 2^16 and that our paged system has pages of 4 Kbytes each; one entry in our matrices takes 16 bytes (long reals). All three matrices are stored row by row, from the first row to the last (row-major order).

  • Assuming unlimited memory, compute the number of page faults triggered by the program.

    Each matrix has (2^16)^2 = 2^32 entries of 16 bytes each, so the three matrices together use up 3*2^32*2^4 = 3*2^36 bytes; that's 3*2^36 / 2^12 = 3*2^24 pages for storing all three. With unlimited memory, each page is brought in exactly once, on first touch, causing 3*2^24 page faults.
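    As a cross-check against the parameters in the problem statement (n = 2^16, 16-byte entries, 4-Kbyte pages), the page count can be computed directly. The function name matrix_pages is illustrative. Note that the result depends on the entry size: 16-byte entries give 3*2^24 pages, whereas 1-byte entries would give 3*2^20.

```python
def matrix_pages(n, entry_size, page_size, matrices=3):
    """Pages occupied by `matrices` dense n-by-n matrices.

    Assumes each matrix starts on a page boundary and that its size
    is a whole number of pages, which holds for the powers of two used here.
    """
    bytes_per_matrix = n * n * entry_size
    return matrices * (bytes_per_matrix // page_size)


if __name__ == "__main__":
    # With unlimited memory every page faults exactly once, on first touch.
    print(matrix_pages(2 ** 16, 16, 4096))
```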

  • Now assume that we have been allocated 256 frames and that the page replacement algorithm is simply FIFO -- it is always the oldest page that gets replaced first. How many page faults are generated?

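    256 frames allow us to store 2^8*2^12 = 2^20 bytes, which is exactly one row (2^16 entries of 16 bytes each). The rows are not the problem, though: because the matrices are stored in row-major order, consecutive elements of a column of B lie 2^20 bytes apart, so the 2^16 references to b(k,j) made while computing one element of C touch 2^16 different pages. With only 256 frames, FIFO has long since evicted each of those pages by the time the next column comes back to it, so every reference to B faults. The current row of A adds another 2^8 faults per element (its pages are pushed out by the column traffic between uses) and the result adds one more, both negligible next to 2^16. This gets repeated for each of the 2^32 elements of C, for a total of about 2^48 page faults.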
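    The full-size problem is far too large to simulate, but a scaled-down FIFO simulation shows the pattern. This is a sketch under stated assumptions: the sizes (n = 16, 4 entries per page, 4 frames) are chosen only so that one row fills all available frames, as in the real problem where a row occupies 256 pages; the function names are illustrative. In row-major storage each b(k,j) along a column lies on a different page, so the count is dominated by the column traversal.

```python
from collections import deque


def matmul_trace(n, entries_per_page):
    """Yield the page touched by each reference of the triple loop.

    Matrices are row-major; pages are tagged with the matrix name so
    that A, B and C never share a page.
    """
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield ("A", (i * n + k) // entries_per_page)
                yield ("B", (k * n + j) // entries_per_page)
            yield ("C", (i * n + j) // entries_per_page)


def fifo_faults(trace, frames):
    """Count page faults under FIFO replacement with `frames` frames."""
    queue = deque()     # resident pages, oldest first
    resident = set()
    faults = 0
    for page in trace:
        if page in resident:
            continue            # a hit does not change FIFO order
        faults += 1
        if len(queue) == frames:
            resident.discard(queue.popleft())
        queue.append(page)
        resident.add(page)
    return faults


if __name__ == "__main__":
    n, entries_per_page, frames = 16, 4, 4
    total = fifo_faults(matmul_trace(n, entries_per_page), frames)
    print(total, total / (n * n))
```

    In this small case every element of C costs n faults on B, n/4 on A, and one on C itself. Scaling the same pattern up to n = 2^16 with 256 entries per page gives about 2^16 faults per element.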

  • Finally, repeat the computation for the same number of pages, but with an optimal page replacement policy.

    Not much help here, because we cannot alter the code and have very few frames. The current row of A alone would fill all 256 frames, and each column traversal of B touches 2^16 distinct pages; even an optimal policy can keep at most 256 of them resident from one traversal to the next, so nearly every reference to B still faults. Paging the columns is the problem, leading to about 2^16 page faults per element and thus still about 2^48 faults overall.
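    For experimenting with the optimal policy on a scaled-down version of the loop, here is a minimal sketch of Belady's algorithm (evict the resident page whose next use lies farthest in the future). The function names and the small sizes are illustrative assumptions; the trace generator mirrors the triple loop with row-major storage.

```python
def matmul_trace(n, entries_per_page):
    """Return the list of pages touched by the triple loop (row-major)."""
    trace = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                trace.append(("A", (i * n + k) // entries_per_page))
                trace.append(("B", (k * n + j) // entries_per_page))
            trace.append(("C", (i * n + j) // entries_per_page))
    return trace


def opt_faults(trace, frames):
    """Count page faults under the optimal (Belady) replacement policy."""
    never = float("inf")
    # next_use[t] = position of the next reference to trace[t] after t.
    next_use = [never] * len(trace)
    last_seen = {}
    for t in range(len(trace) - 1, -1, -1):
        next_use[t] = last_seen.get(trace[t], never)
        last_seen[trace[t]] = t

    resident = {}       # page -> position of its next use
    faults = 0
    for t, page in enumerate(trace):
        if page not in resident:
            faults += 1
            if len(resident) >= frames:
                # Evict the page that will not be needed for the longest time.
                victim = max(resident, key=resident.get)
                del resident[victim]
        resident[page] = next_use[t]
    return faults


if __name__ == "__main__":
    print(opt_faults(matmul_trace(16, 4), 4))
```

    No policy can go below the number of distinct pages, since each must be loaded at least once; the interesting comparison is how far the optimal count stays above that floor when the frames cover only a small part of a column traversal.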

Problem 4. The working-set model is recognized as a good model for page replacement and page frame allocation, but it is not a strategy. Actual paging strategies based on the working-set model use indirect approaches to maintaining a working set in memory for each process. Briefly discuss (a) why the working set concept cannot be implemented directly; and (b) a specific paging mechanism that attempts to simulate the working set concept.

    It is very hard to measure the size of a working set; page faulting is normal, and detecting abnormally high fault rates is difficult to do on the fly.
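    To make the measurement difficulty concrete, here is a sketch of the textbook definition: the working set at time t is the set of pages referenced in the most recent delta references. The function name, the sample trace, and the window size are illustrative. The point is that an exact computation needs the complete reference trace, which a running system does not record.

```python
def working_set(trace, t, delta):
    """Pages referenced in the window of `delta` references ending at index t."""
    start = max(0, t - delta + 1)
    return set(trace[start:t + 1])


if __name__ == "__main__":
    trace = [1, 2, 1, 3, 4, 4, 4]
    for t in range(len(trace)):
        ws = working_set(trace, t, 3)
        print(t, sorted(ws), len(ws))
```

    The working set size changes from one reference to the next, so tracking it exactly would mean doing this bookkeeping on every memory access.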
