Comprehensive Exam Preparation

Table of Contents

Topics

OS

threads – dinosaur 4

(see threads)

scheduling – dinosaur 5

(see scheduling, proportional share scheduling, lottery reading, lottery scheduling)

run multiple processes on a single CPU with the goals of

  • mimicking a multi-cpu environment
  • maximally utilizing the cpu e.g. while one process is I/O blocking let other processes run

The average length of time processes run on the cpu between I/O requests (the cpu burst length) can be important for the selection of a scheduler

Scheduling decisions can be required in response to the following four conditions

  1. process switches from the running state to the waiting state (e.g. due to I/O request)
  2. process switches from running to ready (e.g. due to interrupt)
  3. process switches from waiting to ready (e.g. I/O completes)
  4. process terminates

in nonpreemptive or cooperative scheduling, processes are only swapped in cases (1) and (4) above; otherwise the scheduler is preemptive

the dispatcher is responsible for

  • switching context
  • switching to user mode
  • jumping to the proper location in the user program to restart the program

scheduling criteria

cpu utilization
want to keep the cpu running all the time
throughput
number of processes completed per unit time
turnaround time
interval from time of submission to time of completion
waiting time
sum of the amount of time spent waiting in the ready queue
response time
time from submission to first response
  • scheduling algorithms
    first-come first-serve
    easily managed and programmed with a FIFO queue

    average wait time is often very long – e.g. when a long-running process is the "first in"

    not good for cpu utilization: say ∃ one cpu-bound process and many i/o-bound processes. whenever the cpu-bound process gets the cpu, all the outstanding i/o completes, and then every i/o device and i/o-bound process sits idle while the cpu-bound process churns; when it finishes, the i/o-bound processes cycle through in a quick succession of tiny bursts, and we're back to the beginning with most resources and processes idling. this is called the convoy effect.

    shortest job first
    associated with each process is the length of its next cpu burst – always run the shortest job first.

    provably optimal for minimum average waiting time

    it is impossible to know the length of a process's cpu burst beforehand – generally this is estimated as the exponential average of the process's recent cpu burst lengths
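    the usual estimator (the one in the dinosaur book) is the exponential average, with 0 ≤ α ≤ 1 weighting the most recent burst against the running estimate:

    $$\tau_{n+1} = \alpha t_{n} + (1 - \alpha) \tau_{n}$$

    where \(t_n\) is the length of the nth measured burst and \(\tau_{n+1}\) is the prediction for the next one; α = 1/2 weights recent and past history equally.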

    priority scheduling
    a priority is associated with each process, and the cpu is allocated to the process with the highest priority (SJF above is a priority scheduler where the priority is the predicted next cpu burst length).

    indefinite blocking or starvation: low-priority jobs may never get on the cpu

    rumor has it that, when they shut down an IBM 7094 at MIT in 1973, they found a low-priority process that had been submitted in 1967 and had not yet been run

    a solution to this problem is aging. aging gradually increases the priority of a process with its time since last cpu burst

    round robin
    designed specifically for time-sharing systems. similar to FCFS with preemption: a time quantum or time slice (generally 10-100 milliseconds) is defined, the queue is treated as circular, and the cpu scheduler goes around the queue doling out time slices.

    with very large slices RR ≅ FCFS

    with very small slices, 1 cpu and n processes looks like n cpus each working at 1/n speed (ignoring context-switch time)

    context-switch time / slice time ≈ fraction of time spent context switching – e.g. a 0.1 ms switch with a 10 ms slice burns about 1% of the cpu on switching

    multilevel queue scheduling
    different classes of processes, e.g. foreground (interactive) and background (batch) process

    partitions the ready queue into several distinct queues. processes are permanently assigned to one queue, each of which can have its own algorithm (e.g. RR for interactive, FCFS for batch).

    inter-queue scheduling is generally handled with something like a fixed-priority scheduler (e.g. foreground before background)

    multilevel feedback queue scheduling
    like the above, only processes may switch queues. the idea is to assign processes to queues based on their cpu burstiness.

    for example, say ∃ 3 queues of increasing time quanta. a new process is added to the top (fastest) queue; if it doesn't complete within its time quantum, it is demoted to the next queue.

    parameters

    • number of queues
    • scheduling algorithm for each queue
    • method used to determine when to upgrade a process to a higher queue
    • method used to downgrade a process
    • method used to place newly arrived processes into an initial queue
  • multi-processor scheduling
    • in asymmetric multiprocessing one cpu handles all scheduling decisions and all other cpus only run user code – this reduces the need for sharing system data among cpus.
    • in symmetric multiprocessing each processor is self-scheduling; all processes may be organized in a common ready queue, or each processor may have its own ready queue.

    generally it's preferable not to switch a process from one cpu to another, because of things like cache contents – the inclination to keep a process on its most recent cpu is called processor affinity

    load balancing is important to take advantage of all processors. a common ready queue makes load balancing unnecessary, but is generally not desirable. two approaches to load balancing are push migration and pull migration

    push migration
    a specific task periodically checks the load on each processor and redistributes the load by migrating processes
    pull migration
    idle processors pull tasks from loaded processors

    there is a tension between processor affinity and load balancing

synchronization – dinosaur 6

coordination of light weight processes

a solution to the problem of competing critical sections must satisfy the following three requirements

mutual exclusion
If process Pi is executing its critical section then no other process can be executing in its critical section
Progress
if no process is executing in its critical section and some processes wish to enter their critical sections, then only those processes not executing in their remainder sections can participate in the decision of which enters next, and the decision must be reached in finite time
Bounded waiting
∃ a bound on the number of times other processes are allowed to enter their critical sections while ∃ a process waiting to enter its own critical section

the OS is fertile ground for race conditions

non-preemptive kernels
protect against race conditions by never preempting a process in its critical section
preemptive kernels
more complicated because shared kernel data must be synchronized, but better for real-time performance and for protecting against arbitrarily long-running critical sections

hardware support comes in the form of instructions with atomic test-and-modify behavior (e.g. TestAndSet()) or atomic word-swapping behavior (e.g. Swap())

  • with TestAndSet
    while(TestAndSet(&lock))
      ; // spin until TestAndSet returns false (i.e. the lock was free)
    
  • with Swap
    key = true;
    while(key == true)
      Swap(&lock, &key); // atomically exchange: key picks up the old lock value
    
  • a semaphore is an integer variable which (apart from initialization) is only accessed through two atomic methods
    • wait() or test()
      wait(s) {
        while(s <= 0)
          ;    // busy-wait until a unit of the resource is available
        s--;   // claim one unit
      }
      
    • signal() or increment()
      signal(s) {
        s++;   // release one unit of the resource
      }
      

    the value of a counting semaphore can range over an arbitrary domain, while binary semaphores (or mutexes, see monitors) can only hold 0 or 1. the semaphore's initial value determines how many processes can use the resource concurrently

the main downside to these semaphores is that they require busy waiting (spinlocks): a process burns cpu while waiting on a resource. this can be overcome with a wait queue – a process blocks when the resource is busy and waits on the queue until it receives a "wakeup" signal.

  • semaphore with a wait queue
    typedef struct {
      int value;            // may go negative: -value = number of waiters
      struct process *list; // queue of processes blocked on this semaphore
    } semaphore;
    
  • wait()
    wait(semaphore *s){
      s->value--;
      if (s->value < 0){            // no unit available
        add_self_to_queue(s->list); // sleep instead of spinning
        block();
      }
    }
    
  • signal()
    signal(semaphore *s){
      s->value++;
      if(s->value <= 0){            // someone is waiting
        p = remove_from_queue(s->list);
        wakeup(p);
      }
    }
    

note that these wait() and signal() operations are themselves critical sections, and are generally guarded by a simple lock like a spin lock.
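a minimal sketch of such a guard, using C11's atomic_flag (the standard library's test-and-set primitive) – the function names here are illustrative, not from the text:

#include <stdatomic.h>

atomic_flag guard = ATOMIC_FLAG_INIT;

void acquire(volatile atomic_flag *f) {
  // atomic_flag_test_and_set atomically sets the flag and returns its
  // previous value -- the TestAndSet() loop from above
  while (atomic_flag_test_and_set(f))
    ; // spin
}

void release(volatile atomic_flag *f) {
  atomic_flag_clear(f);
}

acquire()/release() would bracket the bodies of wait() and signal(); spinning is acceptable here because these critical sections are only a few instructions long.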

see synchronization problems

monitor
high-level programming-language construct intended to make synchronization safe from programmer error (compare raw semaphores and mutexes)

programmers define variable declarations and critical operations inside the monitor, and then the monitor provides synchronization protection for those variables and methods

monitor{
  // shared variable declarations
  procedure p1(...){
  }
  procedure p2(...){
  }
  // ...
  initialization(){
  }
}

for custom synchronization needs not covered by the predefined methods, monitors may also expose condition variables (e.g. x) providing x.wait() and x.signal() operations, which are like their semaphore counterparts but safer.
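a minimal sketch of this wait/signal discipline using POSIX condition variables (the counter and function names are hypothetical, not from the text):

#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER; // the monitor lock
static pthread_cond_t nonzero = PTHREAD_COND_INITIALIZER;
static int count = 0; // shared state protected by the monitor

void put(void) {
  pthread_mutex_lock(&m);        // enter the monitor
  count++;
  pthread_cond_signal(&nonzero); // like x.signal()
  pthread_mutex_unlock(&m);
}

void get(void) {
  pthread_mutex_lock(&m);
  while (count == 0)                 // loop: wakeups can be spurious
    pthread_cond_wait(&nonzero, &m); // like x.wait(): releases m while blocked
  count--;
  pthread_mutex_unlock(&m);
}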

in general monitors can be misused in the same way as semaphores

atomic transactions
a series of read and write commands terminated by either a commit or an abort
commit
the effects of the previous reads and writes are applied to the system
abort
all previous actions are rolled back

one solution to providing these features is write-ahead logging, in which each action is logged before it is propagated to the system. in these systems the entire state can be recreated by replaying the log; however, each logical write requires two physical writes, and the logs take up as much space as the actual data.

When restoring data, any transaction with a <Ti starts> and no <Ti commits> need not be applied to the database.

we can use checkpoints to increase the efficiency of rebuilding our data. the system will periodically execute a checkpoint which involves

  1. output all log records currently residing in volatile storage (e.g. memory) to disk
  2. output all modified data to disk
  3. output a log record <checkpoint> onto stable storage

when multiple transactions are active simultaneously the execution of the transactions must be equivalent to the case where these transactions are executed serially in some order -- this is called serializability

if a schedule can be transformed into a serial schedule by a series of swaps of nonconflicting operations we say that it is conflict serializable

a single transaction may require locks on a number of data objects, to ensure serializability the sequence of locking and unlocking should happen in two phases

  1. a growing phase in which the transaction can acquire but not release locks
  2. a shrinking phase in which the transaction can release but not acquire locks

alternately a time-stamp based approach can be used to implement serializability, in which each item is associated with both a

  • read-time-stamp recording the time of the last successful read, and a
  • write-time-stamp recording the time of the last successful write

these can be used to organize any conflicting operations into a single conflict-free series.

memory – dinosaur 8

  • memory layout and compile times
    images/memory-tower.png
    base register
    smallest legal address
    limit register
    largest legal address

    addressing spaces images/addressing-spaces.png

    bound at

    compile time
    if you know where program is to run in memory
    load time
    compiler makes relocatable code → bound at load time
    exec. time
    lets process move during execution, requires special hardware

    images/multistep-processing.png

    MMU
    converts logical (virtual) addresses into physical addresses, this enables the exec. time binding mentioned above

    the base register acts as a relocation register: it is added to every logical address

    dynamic loading
    each routine is loaded as needed, so…
    • many parts of the program may never need to be brought into memory
    • system libraries can be shared between processes (with some OS help at link time)
    stubs
    placed in executables where system-library calls would go; a stub checks (with OS help) whether the needed routine is already in memory, loads it if not, and then replaces itself with a jump to the routine's address in memory
    swapping
    moving processes between main memory and the backing store

    if compile or load time binding, then the process must always be swapped into the same part of main memory

    for efficiency in multiprogramming environments you always want the time quantum to be greater than the swap time

    if a swap is needed for a process waiting for I/O you must either let the I/O complete, or handle all I/O to/from OS buffers, so that processes don't overwrite each other's address spaces with stale I/O.

  • memory allocation
    how to break up main memory
    • fixed size partitions
    • 1 big hole which is filled with…
      first fit
      each proc takes the first big enough space
      best fit
      each proc takes the space closest to its requirements
      worst fit
      always put it in the biggest space – because that leaves the biggest usable remainder

    non-contiguous memory

    • paging
    • segmentation
      logical memory  → pages
      physical memory → frames
    • paging
      a standard size is used for memory allocation
             high-order   low-order
                bits         bits
                  |            |
                  v            v
      address → [page #, page offset]
      
                  |                                           +----------
                  |             page table                    |  memory  
                  |                +----+                     |          
                  |                |    |                     |          
                  |    index       +----+   physical offset   |          
                  +--------------> |    | ------------------> |          
                                   +----+                     |          
                                   |    |                     |          
                                   +----+                     |          
                                   |    |                     |          
                                   +----+                     |          
                                                              +----------
      
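      to make the split concrete, a minimal sketch assuming (hypothetically) 32-bit logical addresses and 4 KiB pages:

      #include <stdint.h>

      #define OFFSET_BITS 12u  /* 4 KiB pages => 12 offset bits */

      uint32_t page_number(uint32_t addr) { return addr >> OFFSET_BITS; }
      uint32_t page_offset(uint32_t addr) { return addr & ((1u << OFFSET_BITS) - 1u); }

      /* the page table maps page number -> frame number;
         the offset passes through untouched */
      uint32_t physical_address(uint32_t frame, uint32_t addr) {
        return (frame << OFFSET_BITS) | page_offset(addr);
      }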
      • 1 page table per process
      • stored in
        • dedicated registers – fast but small
        • page table base register points to table in memory, slow
        • TLB – small, fast, associative hardware cache, when accessed compares against all keys simultaneously, can only hold part of the page table sometimes you'll get a TLB miss and have to go to memory
      • ASID – Address Space ID
        • relates processes to address spaces
        • means that the TLB need not be flushed between processes, just entry by entry as the spaces fill up
    • protection
      bits w/each page-table entry can specify read-only, read-write, or execute-only, and more importantly a
      valid/invalid bit
      protects pages that are not part of the process's accessible address space

      paging allows for shared pages – say everyone uses Emacs, then only one copy of Emacs need be in memory

  • structure of page table
    hierarchical paging

    multi-level page tables further divide the page number

     page number | offset
    +---------------------+
    |  p1 |  p2  |        |
    +---------------------+
    

    up to 4 levels of page tables on some 32-bit machines, generally not deemed appropriate for 64-bit machines

    hashed page tables
    the page number is used as a hash key; each page-table bucket is a linked list of [virtual page #, frame address, next pointer] elements. in clustered page tables each entry maps 16 pages rather than just 1
    inverted page table
    the table is indexed by physical memory addresses, and each element holds the related virtual address and owning process; this decreases the size of the page table, but increases the time required to use it, as a process must search through the table to find the entry for its virtual address
  • segmentation
    a logical address space is a collection of variable length named segments each of which has a name and a length
    <segment number, offset>
    

    these are mapped through a segment table to turn the segment number and offset into memory addresses

      segment table
       +-----+----+
       |     |    |
       |limit|base|
       |     |    |
       +-----+----+
    

    can be used with paging

              logical            linear            physical
               addr               addr              addr   
     +------+       +------------+      +-----------+      +-----------+
     | CPU  |------>|segmentation|----->|  paging   |----->| physical  |
     |      |       |    unit    |      |   unit    |      |  memory   |
     +------+       +------------+      +-----------+      +-----------+
    

virtual memory – dinosaur 9

  • executing code must be located in physical memory, although rarely is the entirety of a program needed for its execution – parts handling edge cases, variables that never reach their full potential size, etc…
  • virtual memory allows programs to address virtual memory which can be much larger than physical memory; the process references its memory through a virtual address space which the OS maps to physical memory
  • sharing and virtual memory
    • for system libraries used by many processes, the library can be mapped into each process's virtual address space while living in only one place in physical memory
    • processes can communicate through shared memory
    • allows pages of memory to be shared with new processes created by e.g. fork
  • implementation
    demand paging
    only load pages into physical memory as they are needed. when a process is started the
    pager
    guesses which pages of the process's memory may be needed, and swaps them into physical memory. a
    page table
    is used to hold addresses of pages, and to mark which pages are both valid and resident in physical memory. when the process tries to access a page which is marked invalid a
    page-fault trap
    will trap to the OS and the following procedure takes place
    1. check a table in the PCB to see whether the reference is valid (i.e. in the process's virtual address space)
    2. if invalid then terminate process, else page in the page
    3. find a free frame in physical memory
    4. schedule disk I/O to read in the page
    5. update the process's table and page table with the new page
    6. restart the instruction interrupted by the trap
    pure demand paging
    when no pages are brought into memory until they raise a trap as in the above
    locality of reference
    is a property of programs that tends to lead to good performance in demand paging schemes
    effective access time
    is the average time taken to access a memory location; in the absence of page faults it should be on the order of 100-200 nanoseconds. if p is the prob. of a page fault on a memory access and ma is the memory-access time, then

    $$\text{effective access time} = (1-p) \times ma + p \times \text{page fault time}$$

    the following steps take place on a page fault

    1. trap to OS
    2. save process state
    3. determine interrupt was a page fault
    4. check for legal reference
    5. read from disk to free frame
    6. while I/O blocks, yield to other processes
    7. disk interrupt
    8. save state of currently executing process
    9. determine interrupt was from disk
    10. update page table
    11. wait for CPU to be allocated to this process again
    12. restore process state

    so let's say roughly ~8 milliseconds for a page fault; the effective memory-access time is dominated by page faults – need to keep the page-fault rate low!
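    plugging the numbers above into the formula (ma = 200 ns, page-fault time = 8 ms):

    $$\text{effective access time} = (1-p) \times 200 + p \times 8{,}000{,}000 = 200 + 7{,}999{,}800 \times p \text{ ns}$$

    so even p = 1/1000 gives ≈8.2 microseconds per access – a 40x slowdown – and keeping degradation under 10% requires p < 0.0000025, i.e. less than one fault per 400,000 accesses.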

  • fork and exec
    many forked processes immediately call exec, which means that they shouldn't need to fault in all of their parent's memory pages.
    copy on write
    only duplicates a memory page when it is written to by one process, so the child and parent can share memory pages until someone starts writing – this makes fork calls like the above much more efficient
    zero-fill-on-demand
    many OSes maintain a pool of free pages which can be used when a page is needed, e.g. when a copy-on-write fault occurs in the above scenario. these pages are zeroed out before being allocated.
    vfork
    is a very efficient fork implementation, in which the caller is suspended and the child borrows the parent's address space – it is intended mainly for children which will immediately call exec
  • page replacement
    virtual memory allows memory over-allocation, where the combined size of the virtual memory of all processes is bigger than the size of physical memory; in these cases page replacement is required.

    new page-fault service w/page-replacement

    1. find desired page on disk
    2. find a free frame
      1. if ∃ free frame, then use it
      2. if not select a victim frame w/page-replacement
      3. if modify-bit is set, write the victim frame to disk, update page table
    3. read page into newly freed frame, update page/frame tables
    4. restart process

    the modify-bit is set when a page in memory has been modified, and needs to be written out to disk

    the performance of these systems is largely controlled by the frame-allocation algorithm and the page-replacement algorithm

  • page replacement algorithms
    FIFO page replacement
    when a page needs to be swapped out then swap out the oldest page
    Belady's Anomaly
    sometimes there can be more faults with more memory frames when using a FIFO algorithm
    optimal page replacement algorithm
    useful for comparison, must know the future
    LRU page replacement
    replace the least recently used page, two implementations
    counters
    there is a system-wide clock which increments with each memory event, and when a page is accessed the current clock time is placed in the page-table entry for that page
    stack
    whenever a page is accessed it is placed on top of the stack, the bottom of the stack will be the LRU page

    LRU is a stack algorithm, which means that the set of pages in memory with n frames is always a subset of the set of pages that would be in memory with n+1 frames

    LRU only makes sense with hardware support, which is not always supplied

    LRU replacement approximation
    can use hardware support in the form of a reference bit: periodically all reference bits are set to zero, and when a page is used its reference bit is set to 1, so we can always see which pages have been used since the last zeroing.

    this can be improved by keeping an 8-bit byte per page and shifting the reference bit in periodically, recording the usage pattern of the page over the last 8 zeroings

    second chance
    is like FIFO with reference bits: if a page is selected by FIFO and its reference bit is set (it's been used recently), the bit is cleared and the page is moved to the back of the FIFO list – so it won't be up for replacement again until all other pages have been replaced or given second chances (see the clock sketch below)
    second chance w/modify
    can be improved by considering the modify bit as well (an unreferenced, unmodified page is the cheapest victim).
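    a minimal sketch of second chance as a "clock" sweeping the frame table (NFRAMES and the ref[] layout are hypothetical):

    #include <stdbool.h>

    #define NFRAMES 64
    static bool ref[NFRAMES]; /* per-frame reference bits, set by hardware */
    static int hand = 0;      /* clock hand: next frame to examine */

    int pick_victim(void) {
      for (;;) {
        if (!ref[hand]) {     /* not used since the last sweep: evict it */
          int victim = hand;
          hand = (hand + 1) % NFRAMES;
          return victim;
        }
        ref[hand] = false;    /* used recently: clear bit, give second chance */
        hand = (hand + 1) % NFRAMES;
      }
    }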

    counting based algorithms

    least frequently used
    increment a use counter on every page reference and evict the least-used page; this can be inefficient when a page gets heavy initial use and then is never read again
    MFU
    based on the argument that the LFU page was probably just brought in and needs to be kept around – stupid
  • frame allocation
    • minimum number of frames, it doesn't make sense to give a process too few frames

    algorithms

    equal allocation
    each process gets num-frames/num-procs frames
    proportional allocation
    allocate frames to processes based on their size
    global allocation
    when a process needs a new frame it is selected from all processes frames
    local allocation
    a process's new frames must come from its own pool of frames – can protect against thrashing applications
    thrashing
    if the number of frames allocated to a process falls below its minimum number, then it will thrash
    working set model
    pages used in the last Δ time are in the working set of a process; the OS tracks each process's working set and tries to maintain it in memory
    page fault frequency
    decide upper and lower page-fault rates, and try to assign each process enough memory to keep it in that range
  • memory mapped files
    maps a disk block to (a) page(s) in memory, the virtual memory system and paging then handle all reads and writes to the file – although some file writes may live for a while in physical memory before propagating to disk.
  • memory mapped I/O
    certain ranges of memory addresses are reserved for I/O devices; reads/writes to these addresses cause reads/writes through the I/O devices. this is appropriate for fast devices (like video)
  • kernel memory
    generally has its own pool of memory because it
    • is often allocated in sub-page sizes
    • is often required to be physically contiguous to satisfy some hardware devices

    too-large allocation sizes can result in internal fragmentation, which can be fought with

    buddy system
    memory is repeatedly divided in half until a chunk no more than twice the size needed is found, which is then allocated contiguously; if needs grow, two adjacent "buddy" halves can be coalesced into a bigger chunk. e.g. a 21 KB request against a 256 KB pool splits 256→128→64→32 and allocates a 32 KB chunk.
    slab allocation
    a slab is one or more contiguous pages, and a cache is one or more slabs; each cache holds a number of objects of a single type (resource descriptors, semaphores, etc…), and objects, rather than pages, are allocated. in linux slabs are full, empty, or partial. this fights fragmentation and ensures quick object allocation and de-allocation
  • misc
    prepaging
    can make process startup more efficient
    page size
    dependent on hardware; affects virtual memory performance
    TLB reach
    is equal to the page size times the number of pages that fit in the TLB, various page sizes can increase TLB reach w/o increasing internal fragmentation
    inverted page table
    reduces the memory needed for virtual-to-physical address translations: ∀ physical memory pages we track the owning process id and vm page. however, when a page fault occurs we need the full per-process mapping, which is no longer stored in physical memory – but page faults are slow anyway, so this information can be kept outside of physical memory
    I/O lock
    sometimes we need to lock a page into memory – say we just read a bunch of data from disk into it, and it must be read before it can be paged out

file systems – dinosaur 11

secondary storage (mainly considering on disk)

  • transfer data in blocks, which contain 1 or more sectors, each holding anywhere from 32 to 4096 bytes (usually 512)
  • file system provides logical interface to user, and physical implementation on disk, many levels
      application programs 
               |
               v
      logical file system: meta-data and structure info
               |         
               v         
    file-organization module: logical->physical translation
               |            
               v            
        basic file system: generic commands to device drivers
               |
               v
           I/O control: device drivers
               |      
               v      
            devices  
    
  • file system control structures
    boot control block
    contains information used to boot an OS from the disk; can be empty if no OS is present
    volume control block
    volume (partition) details, number of blocks, free blocks (counts and pointers), free FCBs (counts and pointers) also called a superblock or master file table
    directory structure
    can be stored in a master file table or in inodes
    file control block
    contains meta-data about a file like ownership, permissions, and the location and size of its contents (on unix it may also contain a field indicating whether it is a directory or a device); this may be called an inode, or it can be stored in the master file table

    Some file system data is loaded into memory at mount time

    mount table
    information about the mounted volumes
    directory-structure cache
    holds information about recently accessed directories
    open-file table (system-wide)
    FCB of every open file
    open-file table (process-wide)
    pointers to the system-wide open file table

    Actions on file systems

    • to create a file, an FCB is allocated, the containing directory is updated, and the FCB and directory are written to disk
    • an open() OS call opens a file for reading; it will
      • search the system-wide open-file table for the file, if not then the directory structure must be searched (time consuming) and the file is placed onto the system-wide file table
      • a pointer is created from the process specific file table to the system wide file table – this pointer may also include a read/write offset into the file for the process
      • when the process closes the file, the process-specific pointer is removed, and the open-file count in the system-wide table is decremented
  • partitions
    disk partitions can be raw (no FS, useful for DB) or cooked with a file system.

    the root partition is mounted at boot time, it contains the OS

    windows mounts volumes behind drive-letter namespaces (e.g. C:); unix allows mount points to be located arbitrarily in the directory structure – a special flag in the inode specifies that a directory is a mount point.

  • virtual file systems
    how does a system handle multiple file system types?
                 +------------------------+
                 |  File-system interface |
                 +----------+-------------+
                            |
                            V
                    +-----------------+
             +------|  VFS interface  |--------+
             |      +-----------------+        |
             |              |                  |
             v              v                  v
        +-----------+  +-----------+   +------------+
        | local FS1 |  | local FS2 |   | remote FS3 |
        +-----------+  +-----------+   +------------+
    
    file system interface
    consists of open, read, write, close
    virtual file system
    associates files with their FS, and uses backend specific calls for each file
    • separates generic operations from FS-specific implementation
    • uniquely represents a file, each file has a network-wide unique vnode, this uniqueness is required for network file systems

    four objects of VFS

    • inode object, individual file
    • file object, open file
    • superblock, entire file system
    • dentry, individual directory entry
  • directory
    on disk, directories consist of lists of files, which can be stored in a linear list (can be slow to search, insert, delete), a sorted list (requires sorting on every change), or a hash table (can be fast, but is of fixed size and requires a new hash function for each new size – so adding a 65th entry to a 64-slot table could be a pain)

    one option is a chained-overflow hash, in which a fixed-size hash table is used and overflowing entries grow into linked lists

  • allocation on disk

Theory

  • cs550 : all old/current homeworks
  • cs530: all homeworks

math basics – Cormen, Chpt. 3

logarithm review

  • \(a = b^{\log_b a}\)
  • \(\log_c(ab) = \log_c a + \log_c b\)
  • \(\log_b(a^n) = n \log_b a\)
  • \(\log_b(1/a) = -\log_b a\)
  • \(\log_b a = 1/\log_a b\)
  • \(a^{\log_b n} = n^{\log_b a}\)

some series expansions

  • \(\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots\) when |x| < 1
  • \(\frac{x}{1+x} \leq \ln(1+x) \leq x\) if x > -1
  • \(n! < n^n\)
  • and with Stirling's Approximation \(n! = \sqrt{2 \pi n} \left(\frac{n}{e}\right)^{n}\left(1 + \Theta(\left(\frac{1}{n}\right))\right)\)
  • lg(n!) = Θ(n lg n)

more series

arithmetic series
\(\sum^{n}_{k=1} k= 1 + 2 + ... + n = \frac{1}{2}n(n+1) = \Theta(n^{2})\)
geometric series
\(\sum^{n}_{k=0} x^k = 1 + x + x^{2} ... + x^{n} = \frac{x^{n+1}-1}{x-1}\) when the summation is infinite and |x|<1 then we have an infinite decreasing geometric series and \(\sum^{\infty}_{k=0} x^{k} = \frac{1}{1-x}\)
harmonic series
\(H_{n} = 1 + 1/2 + 1/3 + ... + 1/n = \sum^{n}_{k=1}\frac{1}{k} = ln(n) + O(1)\)
telescoping series
∀ sequences a0, a1, … an, \(\sum^{n}_{k=1}(a_{k} - a_{k-1}) = a_{n} - a_{0}\) and similarly \(\sum^{n-1}_{k=0}(a_{k} - a_{k+1}) = a_{0}-a_{n}\)
products
we can convert a product to a summation using \(\lg\left(\prod^{n}_{k=1}{a_{k}}\right) = \sum^{n}_{k=1}{\lg(a_{k})}\)

Regular Languages

Finite Automata

Asymptotic Notation – Cormen, Chpt 2

Θ
Θ(g(n)) = {f(n) : ∃ positive constants c₁, c₂, and n₀ s.t. c₁g(n) ≤ f(n) ≤ c₂g(n) ∀ n ≥ n₀}
O
O(g(n)) = {f(n) : ∃ positive constants c and n₀ s.t. 0 ≤ f(n) ≤ cg(n) ∀ n ≥ n₀}
o
o(g(n)) = {f(n) : ∀ positive constants c, ∃ n₀ s.t. 0 ≤ f(n) < cg(n) ∀ n ≥ n₀}
Ω
Ω(g(n)) = {f(n) : ∃ positive constants c and n₀ s.t. 0 ≤ cg(n) ≤ f(n) ∀ n ≥ n₀}
ω
ω(g(n)) = {f(n) : ∀ positive constants c, ∃ n₀ s.t. 0 ≤ cg(n) < f(n) ∀ n ≥ n₀}

asymptotic notations obey the following relational properties

transitivity
f(n) = Θ(g(n)) ∧ g(n) = Θ(h(n)) → f(n) = Θ(h(n)); also true for O, o, Ω, and ω
reflexivity
f(n) = Θ(f(n)), also true for O and Ω
symmetry
f(n) = Θ(g(n)) ↔ g(n) = Θ(f(n))
transpose symmetry
f(n) = O(g(n)) ↔ g(n) = Ω(f(n)) and f(n) = o(g(n)) ↔ g(n) = ω(f(n))

note that not all functions are asymptotically comparable, e.g. it is possible that for functions f(n) and g(n) neither f(n) = Ω(g(n)) nor f(n) = O(g(n)) holds

Recurrence Relations – Cormen, Chpt 4

recurrence
equation or inequality which describes a function in terms of its value on smaller inputs

methods of solving recurrences

substitution method
guess a bound and then use mathematical induction to prove correctness
iteration method
converts the recurrence into a summation
master method
provides bounds for recurrences of the form T(n) = aT(n/b) + f(n), where a ≥ 1, b > 1, and f(n) is given
  • substitution method
    guess and check, e.g.
    start with
    T(n) = 2T(⌊n/2⌋) + n
    guess
    T(n) ≤ c n lg n
    substitute
    T(n) ≤ 2(c⌊n/2⌋ lg⌊n/2⌋) + n
    solve

    \begin{eqnarray} T(n) &\leq& 2(c\lfloor \frac{n}{2} \rfloor \lg \lfloor \frac{n}{2} \rfloor) + n\\ &\leq& c n \lg \frac{n}{2} + n\\ &=& c n \lg n - c n \lg 2 + n\\ &=& c n \lg n - c n + n\\ &\leq& c n \lg n \quad \mbox{for } c \geq 1 \end{eqnarray}

    note
    be careful to choose an n₀ large enough that we can pick a c which works – e.g. if we picked n₀=1 above, then ¬∃ c s.t. T(1) ≤ c·1·lg 1 = 0

    subtract a low order term

    • if an inductive hypothesis is almost there, then you can try subtracting a low order term from the guess, e.g.
    • T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + 1
    • we guess T(n) = O(n), i.e. T(n) ≤ cn, which makes sense, but after reducing we're left with T(n) ≤ cn + 1, which doesn't prove the hypothesis, so we change our guess to
    • T(n) ≤ cn - b where b ≥ 0
    • sub that in the above and it turns out cleanly

    be sure it reduces exactly

    • guessing T(n) ≤ cn for the above
    • then \begin{eqnarray} T(n) &\leq& 2(c\lfloor n/2 \rfloor) + n\\ &\leq& cn + n\\ &=& O(n) \end{eqnarray}

      which is wrong: we needed to show T(n) ≤ cn exactly, and cn + n does not reduce to that

    sometimes renaming variables helps

    • \(T(n) = 2T(\lfloor\sqrt{n}\rfloor) + lg n\)
    • if we rename m = lg n then we get
    • \(T(2^m) = 2T(\lfloor2^{m/2}\rfloor) + m\) which looks easier
    • S(m) = T(2m)
    • then we have S(m) = 2S(m/2)+m
    • which works out to $$S(m) = O(m \lg m) \Rightarrow T(n) = O(\lg n \, \lg \lg n)$$
  • iteration method
    just keep plugging it into itself, and use the identities from places like math-basics
    recursion tree
    break out as a tree, and then sum the cost across all nodes at each level of the tree, the result will be a series which is the total run-time. images/recursion-tree.png

    or for a fancier example (authored by Manuel Kirsch, taken from the web) images/merge-sort-recursion.png

  • master method
    cookbook method for solving recurrences of the form T(n) = aT(n/b) + f(n) where a ≥ 1 and b > 1

    T(n) can be bounded asymptotically as follows

    1. if \(f(n) = O(n^{\log_b a - \epsilon})\) for some ε>0, then \(T(n) = \Theta(n^{\log_b a})\)
    2. if \(f(n) = \Theta(n^{\log_b a})\), then \(T(n) = \Theta(n^{\log_b a} \lg n)\)
    3. if \(f(n) = \Omega(n^{\log_b a + \epsilon})\) for some ε>0, and if \(a f(n/b) \leq c f(n)\) for some c<1 and all sufficiently large n, then \(T(n) = \Theta(f(n))\)

    basically this comes down to which is bigger: the recursive cost \(n^{\log_b a}\) or the cost of splitting and recombining captured by f(n).

    Note: there are recurrence relations which slip between these three cases, e.g. when f(n) is larger than \(n^{\log_b a}\) but not polynomially larger (e.g. T(n) = 2T(n/2) + n lg n), etc…
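    for example, merge sort's recurrence lands in case 2: T(n) = 2T(n/2) + Θ(n) has a=2, b=2, so \(n^{\log_b a} = n\), which matches f(n) = Θ(n), giving T(n) = Θ(n lg n).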

Sorting Algorithms – Cormen, Chpt 7-9

  • Heapsort
    heaps
    array which can be viewed as a binary tree. ∀ index i into the array
    • parent(i) = ⌊i/2⌋
    • left(i) = 2i
    • right(i) = 2i+1

    which satisfies the heap property, namely A[parent(i)] ≥ A[i]

    the height of the heap is Θ(lg(n)), and basic operations on heaps run in O(lg(n)) time

    heapify
    O(lg(n)), maintains the heap property

    takes an array A and index i as inputs; when called, the subheaps rooted at left(i) and right(i) are assumed to be valid heaps, but A[i] may be smaller than either child. heapify keeps dropping element A[i] down to the larger of its children, calling itself recursively. each recursive call has size at most 2n/3 (the worst case, when the bottom row is exactly half full)

    T(n) ≤ T(2n/3) + Θ(1)

    which by case 2 of master theorem is O(lg(n))
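    a minimal C sketch of heapify on a 1-indexed array (slot 0 unused), assuming heap_size is tracked alongside the array:

    void max_heapify(int a[], int heap_size, int i) {
      int l = 2 * i, r = 2 * i + 1, largest = i;
      if (l <= heap_size && a[l] > a[i])       largest = l;
      if (r <= heap_size && a[r] > a[largest]) largest = r;
      if (largest != i) {                      /* A[i] smaller than a child */
        int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
        max_heapify(a, heap_size, largest);    /* keep dropping it down */
      }
    }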

    build-heap
    O(n), builds a heap by calling heapify in a bottom-up manner

    BUILD-HEAP(A)
      heap-size[A] ← length[A]
      for i ← ⌊length[A]/2⌋ downto 1
        do Heapify(A,i)

    for a heap there are at most \(\lceil n/2^{h+1} \rceil\) nodes of height h, so the running time of build-heap works out to O(n)

    heapsort
    O(n lg(n))
    extract-max & insert
    O(lg(n))

    one popular application of heaps is to implement priority queues

  • Quicksort
    worst-case running time is Θ(n²) but expected running time is Θ(n lg n)

    Quicksort(A,p,r)
    if p<r
         then q ← Partition(A,p,r)
             Quicksort(A,p,q)
             Quicksort(A,q+1,r)

    where Partition(A,p,r) selects a pivot x = A[p] and partitions A[p..r] into [a, x, b] s.t. ∀ y ∈ a, y ≤ x and ∀ y ∈ b, y ≥ x

    the worst case happens when the partitions are of size n-1 and 1 (e.g. an already-sorted list); the best is when the partitions are balanced
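    a minimal C sketch of the above, using the Hoare-style partition (pivot x = A[p]) that this pseudocode assumes:

    void quicksort(int a[], int p, int r) { /* sorts a[p..r], inclusive */
      if (p < r) {
        int x = a[p], i = p - 1, j = r + 1;
        for (;;) {                          /* Partition(A,p,r) */
          do { j--; } while (a[j] > x);
          do { i++; } while (a[i] < x);
          if (i >= j) break;
          int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        quicksort(a, p, j);                 /* q = j: note q, not q-1 */
        quicksort(a, j + 1, r);
      }
    }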

  • Lower bound for comparison sort
    any comparison-based sort requires Ω(n lg n) comparisons, because that is the minimum number needed to determine the sorted order.

    an argument based on decision trees: each comparison is between 2 elements, so we get a binary decision tree; ∃ n! possible orderings of an n-element list, so the tree has n! leaves. a lower bound on the height of the tree is a lower bound on the running time.

    • given n! leaf nodes, we know 2^h ≥ n! → h ≥ lg(n!)
    • using Stirling's Approximation, we can say n! ≥ (n/e)^n
    \begin{eqnarray} h &\geq& \lg{\left(\frac{n}{e}\right)^n}\\ &=& n \lg n - n \lg e\\ &=& \Omega(n \lg n) \end{eqnarray}
  • non-comparison sorts
    • Counting sort
      if each element in a list is ∈[1..k], then the list can be sorted in O(n+k) time.

      uses an intermediate array C[1..k] set up to hold the number of elements of each value, so ∀ i, C[i] equals the number of times i occurs as a value in A; a second pass sets every C[i] equal to the number of elements ≤ i, and finally the elements of A can be dropped directly into their appropriate places in B

      this runs in O(n+k) time – 3 passes
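      a minimal sketch, assuming keys in 1..k and a C99 variable-length count array (array names follow the text):

      void counting_sort(const int a[], int b[], int n, int k) {
        int c[k + 1];
        for (int i = 0; i <= k; i++) c[i] = 0;
        for (int j = 0; j < n; j++) c[a[j]]++;         /* c[i] = # elements equal to i */
        for (int i = 1; i <= k; i++) c[i] += c[i - 1]; /* c[i] = # elements <= i */
        for (int j = n - 1; j >= 0; j--)               /* reverse pass keeps it stable */
          b[--c[a[j]]] = a[j];                         /* drop a[j] into its slot in b */
      }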

    • Radix Sort
      sort on the decimal digits (or on separate keys), least-significant first, using a stable sort on each pass

      RADIX-SORT(A,d)
           for i ← 1 to d
               do use a stable sort to sort array A on digit i

    • Bucket Sort
      runs in linear time on average for keys evenly distributed across a range

      items are put into n buckets, then each bucket is sorted and they are combined

Programming Languages

logic programming

(see LP:2010-01-28 Thu) In logic programming everything is a relation; relations always have one more argument than the related function (the extra argument carries the result)

factorial in logic programming

  • base case !P (1, 1).
  • further cases !P (s(x),y) :- !P (x,z), *P (s(x), z, y) – i.e. y = s(x) × z where z = x!

now with append

  • append-P(nil, y, y).
  • append-P([a|x], y, [a|z]) :- append-P(x, y, z)

the ":-" operator can be though of as reverse implication

terminology and conventions

  • query: like a function call – a predicate symbol applied to a set of terms, asking whether (and with what variable bindings) it can be satisfied
  • relations are named by predicate symbols
  • terms: argument to predicate symbols
    • variable
    • constant
    • functor applied to a term
  • functors are the primitive functions in the language (e.g. cons)
  • atom: is a predicate symbol applied on terms
  • ground term: a term w/o variables
  • clause: disjunction of a finite set of literals (or'd together)
  • literal: is an atom or its negation
  • Horn clauses: most logic programming only deals with Horn clauses, these are clauses in which at most one literal is positive – most of the time we will have exactly one positive literal

    "HEAD :- BODY" ≡ HEAD ∨ ¬ BODY1 ∨ ¬ BODY2 ∨ ¬ BODY3 ∨ ¬ …

  • logic program: the conjunction of a finite set of Horn clauses

λ-calculus

(see quick lambda calculus review and SFW:2009-10-29 Thu)

  • syntax
    • expressions
      • variable / identifier
      • abstraction: if e is an expression and v is a variable then (λ (v) e) is an expression
      • application: if e1 and e2 are expressions then (e1 e2) is an expression, the application of e1 to e2
  • computation
    sequence of application of the following rules
    • β-rule: evaluates an application of an abstraction to an expression
      • ((λ (x) e) e2) -> replace all free occurrences of x in e with e2
    • α-rule: expressions are equal up to universal uniform variable renaming, and the α-rule allows you to rename bound formal parameters
    • η-rule: (λ (x) (e x)) -> e
  • church constants
    booleans
    • true – (λ (x) (λ (y) x))
    • false – (λ (x) (λ (y) y))

    so (T e1 e2) = e1, and (F e1 e2) = e2

    so not is (λ (x) (x F T))

    and and is (λ (b) (λ (c) ((b c) F))) – if b is true the result is c, otherwise false

    work like conditional statements, so

    • true A B = A
    • false A B = B
  • natural numbers
    functions:

    successor

    (λ (n)
       (λ (f)
         (λ (x)
           (f ((n f) x)))))

    sum

    (λ (n)
       (λ (m)
         (λ (f)
           (λ (x)
             ((n f) ((m f) x))))))

    and numbers

    0
    (λ (f) (λ (x) x))
    1
    (λ (f) (λ (x) (f x)))
    2
    (λ (f) (λ (x) (f (f x))))
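    as a quick check, applying the successor above to 1 β-reduces to 2:

    ((λ (n) (λ (f) (λ (x) (f ((n f) x))))) (λ (f) (λ (x) (f x))))
    → (λ (f) (λ (x) (f (((λ (f) (λ (x) (f x))) f) x))))
    → (λ (f) (λ (x) (f (f x))))   ; which is the encoding of 2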
  • y-combinator
    Ω = ((λ (x) (x x)) (λ (x) (x x))) β-reduces to itself forever

    the y-combinator uses the same self-application trick to provide recursion, e.g.

    ((λ (f)
         ((λ (x) (f (x x)))
           (λ (x) (f (x x)))))
         g)

    will create an infinite nesting of applications of g

  • theorems
    Church-Rosser Theorem: any expression e has at most one normal form (unique modulo the α-rule); however it is possible that some paths of computation from e terminate in that normal form while others don't terminate

    Church-Turing Thesis: e is computable by a Turing machine iff e is computable in the λ-calculus

garbage collection

mark-and-sweep
when memory is low, program execution is frozen and the GC walks memory, first marking the root set – those objects directly reachable from the stack, registers, and globals. in a second pass, all objects transitively reachable from the root set are also marked as in use. finally, all unmarked objects are freed (a code sketch of the mark phase follows below)
copying garbage collector
memory is divided into a from-space and a to-space. initially everything is in the from-space, then when it starts to fill, objects in use are copied into the to-space, leaving unused objects in the from-space. This implicitly divides the objects into those in use which have been copied, and those not in use. As the to-space starts to fill up, the from-space is cleared, the semantics of the two spaces are swapped, and the process repeats.
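a minimal sketch of the mark phase as a depth-first walk from the roots (the object layout here is hypothetical):

#include <stdbool.h>
#include <stddef.h>

typedef struct obj {
  bool marked;
  struct obj *children[2]; /* hypothetical fixed fan-out */
} obj;

void mark(obj *o) {
  if (o == NULL || o->marked)
    return;                /* nothing there, or already visited */
  o->marked = true;        /* reachable: survives the sweep */
  for (int i = 0; i < 2; i++)
    mark(o->children[i]);  /* mark everything transitively reachable */
}

/* sweep then walks the entire heap and frees unmarked objects */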

tradeoffs

  • mark-and-sweep
    • bloats objects with a mark bit
    • freezes program execution
    • requires multiple passes of most of memory
  • copying
    • requires lots of extra memory space
    • lots of movement/copying of data

Terms

OS terms

LWP
light weight process – a kernel thread is exposed to the user through a light weight process which appears to the user as a virtual cpu.
PCB
process control block – used to (re)activate a process when it is given access to the cpu
FCFS
first come first serve scheduling policy
convoy effect
when 1 cpu bound process wrecks the efficiency of a FCFS scheduler for many i/o bound processes
SMP
symmetric multiprocessor system – a system with multiple homogeneous processors, each of which handles its own scheduling
symmetric multithreading
like SMP, but with multiple logical rather than physical processors – these logical processors are a feature provided by the hardware
hyperthreading
(see symmetric multithreading)
PCS
process contention scope, when two threads belonging to the same process are competing for cpu resources
SCS
system contention scope, scheduling kernel threads to system resources – normal scheduling
race condition
interleaved read/writes of multiple processes to an unlocked shared resource – this can lead to inconsistent resource state
deadlock
when two or more processes are waiting indefinitely for an event that can be caused only by one of the waiting processes
transaction
a collection of instructions which form a logical unit and in which either all are executed or none are

Theory terms

Programming Languages terms

Old Tests and Questions

Tests

Spring 2005

  • OS
    • section 2
      1. In a write-through cache, every write to the cache causes a write to main memory. This ensures that main memory always contains the latest version of any cached data, but slows cache writes to the speed of memory writes

        In a write-back cache, writes to cache locations are not immediately propagated to memory but are rather marked "dirty". When a dirty section is evicted from the cache it is written to memory. This allows faster cache writes, but can require two accesses to memory on a cache miss (one to write out the dirty section, another to read in the new contents from memory), and it provides weaker consistency guarantees.

      2. log-structured file systems optimize writes. This is done by buffering many small writes to the file system and writing them out to one continuous section of disk. This turns small random-access writes into fewer large contiguous writes allowing the write speed of disks to approach the maximums supported by hardware.

        in log-structured file systems, data written at the same time is collocated, while in the Berkeley FFS data belonging to the same file is collocated. So reading will be faster for the Berkeley FFS, as a whole file can be read in fewer, larger contiguous reads.

      3. skip TCP
      4. consistency models – typically there is no global clock
        strong consistency
        (called sequential consistency in the Munin paper) any write is immediately visible to subsequent reads
        causal ordering
        uses communication between processes to determine a global partial ordering – think Lamport clock
        weak consistency
        this is not really ever used. makes no guarantees that writes will be visible to future reads
        eventual consistency
        write will eventually be seen
        release consistency
        uses two synchronization operators release and acquire, either
        • all writes are globally available after the writer releases the memory (i.e. eager)
        • all writes are available to anyone who subsequently requires the memory (i.e. lazy also called entry consistency)
        PRAM consistency
        Pipelined Random Access Memory, also known as FIFO consistency: writes are seen in the order they issue from a given writer, which means that different processes could see different interleaved orders of writes from different writers (the only consistent order is from a particular writer)

        The main difference between weakened models like release consistency and entry consistency is that they allow for less coupling between processes. This reduces the communication overhead between processes. While strong and PRAM consistency provide more stringent guarantees simplifying the job of the developer they require much more work on the part of the OS.

    • section 3
      1. using base-bound address space transformation instead of page tables

        this is simpler to implement, takes less hardware (e.g. no TLB), and requires fewer transformations per memory lookup (no page-table dereference). Both properties are appropriate for lower-power, smaller embedded systems.

        it loses some of the advantages of page tables, e.g. the ability to address large address spaces, to load large programs into memory one page at a time, and to fight fragmentation of main memory through regular page sizes. However, with the smaller, simpler programs running on these systems, and the lower degree of multiprogramming, these tradeoffs generally favor base-and-bound addressing.

      2. spin-locks "busy-wait", i.e. sit on the CPU continually checking for the lock to become available – this is ideal for locks which are quickly acquirable because it avoids the need for context switching, but bad for long-held locks as it wastes CPU time.

        blocking locks yield the CPU while waiting for a lock, placing the yielding process on a queue for reactivation when the lock becomes free; this has the opposite tradeoffs of spin locking.

        with a highly bimodal distribution of hold times, a max-spin count could be introduced: spin for about the duration of the typical short hold, then yield the CPU on the assumption that the lock is of the longer variety.

      3. kernel designs
        monolithic
        fast, in that all of the working components of the OS are already in kernel space and can share data and communicate efficiently. the downside is that the kernel is large and complex, and it can be hard to develop new components.
        monolithic with loadable modules
        part way between monolithic and exokernel/library OS systems
        microkernels
        the kernel is only responsible for process separation and protection, in the case of L4
        • address-spaces
        • threads
        • scheduling
        • and synchronous inter-process communication

        the microkernel can do these simple things quickly and efficiently while still being easy to maintain; with much of the OS's functionality implemented outside of the kernel, the system is also easier to extend.

        the downside is having to push many system calls through the kernel-space/user-space boundary multiple times (e.g. a syscall for disk I/O traps into the kernel, is sent to the disk driver in userspace, goes back through the kernel, and finally back out to the original process)

        exokernels / library-systems
        the kernel focuses on securely multiplexing hardware resources between untrusted software, and it makes these resources fully accessible to user-space processes, so your device drivers are implemented entirely as linked libraries in userspace, avoiding the need for many user/kernel-space crossings

        very easy to extend new functionality, in some ways faster than µ-kernels

        not possible for separate drivers held in separate processes to cooperate.

    • section 4
      1. multiprocessor scheduling
        single global scheduler
        easy to implement things like barriers, easy to apply a global ordering on the scheduled events in the system, easy to implement global scheduling decisions and to avoid starvation, simple to program and to reason about. All the information is in one place.

        brittle, in that if this one proc breaks, then everything breaks until a new leader is elected

        slow, this one proc could become a bottleneck

        inefficient, this one proc may need to keep track of a very large number of jobs, leading to inefficient sorting of jobs by priority, etc…

        needs large job quanta, parallelism needs to be at a high level (can't efficiently exploit low-level parallelism and/or cooperation)

        can't scale past a certain number of processes simply because of length of proc. queue

        cooperating per-proc schedulers
        would take load off of the global scheduler, addressing the problems of the global queue's size and the global scheduler's potential to serve as a bottleneck (e.g. when a context switch requires network I/O)

        lose ability to enforce global ordering on tasks.

        increased complexity

        scheduler hierarchy
        more difficult to program/reason, must divide task and resources into hierarchy

        less privileged points (but still exist e.g. at the top of the hierarchy)

        difficult to program and reason about

        no single communication bottlenecks

        smaller job quanta is possible

      2. how to implement a network FS with a simple processing switch at its core
  • Theory
    • Exam Seating
      Given a graph, can you order the nodes of the graph s.t. no two adjacent nodes in the ordering share an edge.

      Hamiltonian Path ⊆ Exam Seating

      for any Hamiltonian Path instance, take the complement of the graph, and try to find an Exam Seating.

    • flying between cities
      not finished on the third part
      1. Dijkstra's shortest path.

        keep a sorted list of possible outgoing paths, at each iteration follow the lightest path in the list, and add the aggregate weights of the outgoing paths to the list.

        This runs in time polynomial in the size of the graph – no edge will be traversed more than once.

      2. We can no longer track the shortest aggregate path (because length matters as well). So we can do this one through multiplication of the transition matrix. k multiplications runs in time polynomial in n but not in k.
      3. We can't do this with multiplications of the transition matrix. There are (n-1)^k possible paths of length k. We can't sort these in time polynomial in either n or k.
    • minimum weight spanning tree
      minimum weight spanning tree.
      • check all cycles containing e.
      • if e is the heaviest edge in any of those cycles, then it is not in the minimum weight spanning tree.

      the first part can be done by incrementally stepping out along edges leaving e. as each edge is traversed check if it completes a cycle, and check if e is the max-weight edge in that cycle, this will take O(|E|) time.

    • hash table
      chance of log(n) items in one place

      keep in mind that

      • \(\binom{n}{k} \simeq \frac{n^k}{k!}\)

      for any one bin \begin{eqnarray*} P(k) &=& \binom{n}{k}\left(\frac{1}{n}\right)^{k}\left(1-\frac{1}{n}\right)^{n-k}\\ &\leq&\frac{n^k}{k!}\left(\frac{1}{n}\right)^{k}\left(1-\frac{1}{n}\right)^{n-k}\\ &<& \frac{1}{k!} \end{eqnarray*} so for all bins

      \(P(k) < \frac{n}{k!}\)

      with Stirling's approximation we can change this to

      \(P(k) < \frac{n}{(\frac{k}{e})^{k}}\)

      so we'll want \(\left(\frac{k}{e}\right)^{k} > n\), which becomes true when we get k up to about ln(n)

    • perfect matching
      easy: every vertex whose matched edge differs between the two matchings lies on a common cycle, and one matching can be transformed into the other by rotating that cycle.

      if the two matchings differ in many edges across many cycles, just rotate the cycles one by one.

    • is NP-completeness relevant to the real world
      not needed now
      • cryptography P=NP → cracking is as easy as decryption

Fall 2005

  • OS
    • short answer
      1. deadlock is when no progress can be made in a system because a set of processes each hold resources needed by other processes in the set while waiting for the release of resources held by other processes in the set.
        • mutual exclusion
        • no preemption
        • hold and wait
        • circular wait
      2. the optimal page replacement strategy is to evict the page whose next use lies furthest in the future. This strategy is frequently approximated by LRU, which assumes that the immediate future will resemble the immediate past
      3. a journaling file system maintains a journal of all changes to disk. In the event of a failure which would otherwise leave the disk in an inconsistent state, rather than having to check all inodes on disk the OS can step back to the last confirmed valid point in the journal and replay all subsequent actions from there.
        • how is this different from the journal OS
      4. ????
    • medium answer
      1. Caching
      2. Virtual Memory
      3. ???
      4. Policy and Mechanism in Modern OS
    • design
  • PL
  • Theory

Spring 2006

  • Theory
    • 7
      transition matrix
      1/3  1/3  1/3   0    0    0
       0   1/3  1/3  1/3   0    0
       0    0   1/3  1/3  1/3   0
       0    0    0   1/3  1/3  1/3
      1/3   0    0    0   1/3  1/3
      1/3  1/3   0    0    0   1/3
      eigenvector/eigenvalue pair
for a matrix A, an eigenvector is a vector x s.t. A x = λ x, where x is the eigenvector and λ is the eigenvalue
      • A x = λ x
      • [A x - λ x] = 0
      • [A - λ I] x = 0 – where I is the identity matrix
• [A - λ I] x = 0 has a nonzero solution x ⇒ [A - λ I] is singular ⇒ det([A - λ I]) = 0
      determinant
      1 2
      3 4
      sum of products along down-sloping diagonals minus sum of products along up-sloping diagonals, so 1·4 − 2·3 = −2

so we need det([A - λ I]) = 0, where [A - λ I] =

      1/3-λ  1/3   1/3    0     0     0
       0    1/3-λ  1/3   1/3    0     0
       0     0    1/3-λ  1/3   1/3    0
       0     0     0    1/3-λ  1/3   1/3
      1/3    0     0     0    1/3-λ  1/3
      1/3   1/3    0     0     0    1/3-λ

      and…

      • (1/3 − λ)^6 + (1/3)^6 + (1/3)^6 = 0
      • (1/3 − λ)^6 = 2(1/3)^6
      • (1/3 − λ) = (2(1/3)^6)^{1/6}
      • 1/3 − λ = 2^{1/6}(1/3)
      • λ = (1 − 2^{1/6})/3
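As a sanity check (a standard fact about circulant matrices, not from the original notes): the eigenvalues of a circulant matrix are the DFT of its first row, here (1/3, 1/3, 1/3, 0, 0, 0), giving

\begin{eqnarray*}
\lambda_j &=& \frac{1}{3}\left(1 + \omega^{j} + \omega^{2j}\right), \quad \omega = e^{2\pi i/6}, \quad j = 0,\ldots,5
\end{eqnarray*}

in particular \(\lambda_0 = 1\), as expected for a stochastic matrix.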

Fall 2006

  • PL
    • 3.2 Axiomatic and Denotational Semantics – comparison
      denotational semantics
Formal basis is in set theory. State is explicitly represented. A program is a mathematical function (the composition of many functions) from state to state.

      Expressions return the value of some aspect of the state.

      Statements are functions from state to state.

      axiomatic semantics
More abstract than denotational semantics. Rather than explicitly representing state, we only track predicate statements about the state. A program transforms these predicates, and the semantics of a program are expressed in terms of weakest preconditions and strongest postconditions, written as Hoare triples.

      These semantics are similar in that they can both be used to track the effect of a program on state (with varying degrees of abstraction). Also both of these semantics break programs into side-effect-free expressions, and side-effect-full statements. The Axiomatic semantics of a program can be derived from the program's Denotational semantics.

There are cases in which each semantics could be preferable to the other.

      • When working with an imperative program, in which some properties (expressed in predicate logic) of the state are of interest (e.g. invariants) and where program termination is of interest, then Axiomatic semantics will be preferable.

        For example the following program which returns the product of two positive inputs, will terminate and when it does a will be equal to the product of the inputs.

        a = 0
        b = INPUT1
        c = INPUT2
        while c > 0 do
          a = a + b
          c = c - 1
        endwhile
        a
        

Termination can be proved with the well-founded space \(\mathbb{N}\), ordering $<$, and measure c, together with the observation that c decreases under this measure with every loop iteration, shown with the weakest precondition across the loop body.

        \begin{eqnarray*} \{Q\} c := c-1 \{P\}\\ Q &=& wp(c := c-1, P)\\ Q &=& P|^{c+1}_{c} \end{eqnarray*} so ∀ loop iterations the value of c decreases under the strict ordering described above.

        Correctness can be shown with loop invariant $$b \times c + a = INPUT1 \times INPUT2$$

        This can be seen trivially at the start of the loop because \(a=0\). This property is maintained through each loop iteration. \begin{eqnarray*} \{Q\} c := c-1 \{P\}\\ Q &=& wp(a=a+b; c:=c-1, P)\\ Q &=& P|^{c+1}_{c}|^{a-b}_{a}\\ \end{eqnarray*} our loop invariant remains true across this substitution \begin{eqnarray*} INPUT1 \times INPUT2 &=& b \times c + a|^{c+1}_{c}|^{a-b}_{a}\\ &=& b \times c + a - b|^{c+1}_{c}\\ &=& b \times (c+1) + a - b\\ &=& (b \times c) + b + a - b\\ &=& b \times c + a \end{eqnarray*} so when \(c=0\) then \(INPUT1 \times INPUT2 = a\).

• When working with a program which can be naturally decomposed into a series of function compositions, denotational semantics may be preferable; this is especially true if it is possible to reason mathematically about the actions of the program's statements on its state. (A small executable sketch follows.)
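A minimal sketch of this view in Haskell (my own illustration, using the multiplication program above; the State representation is an assumption):

        import qualified Data.Map as Map

        type State = Map.Map String Int

        -- an expression denotes a (side-effect free) function from state to value
        var :: String -> State -> Int
        var x st = Map.findWithDefault 0 x st

        -- a statement denotes a function from state to state
        assign :: String -> (State -> Int) -> State -> State
        assign x e st = Map.insert x (e st) st

        while :: (State -> Bool) -> (State -> State) -> State -> State
        while b s st = if b st then while b s (s st) else st

        -- the whole program is just a composition of state-to-state functions
        prog :: Int -> Int -> State
        prog input1 input2 =
          ( while (\st -> var "c" st > 0)
                  ( assign "c" (\st -> var "c" st - 1)
                  . assign "a" (\st -> var "a" st + var "b" st) )
          . assign "c" (const input2)
          . assign "b" (const input1)
          . assign "a" (const 0) ) Map.empty

        -- var "a" (prog 6 7) == 42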

Fall 2007

  • OS
    • section 2
      • 1
        • before 25
        • before 34
      • 2
        7. Fall 2006 3.2
      • 3
        don't care
      • 4
        sequential and causal consistency in a group RPC system
sequential consistency
every node sees all operations in one and the same total order – everything happens in the same order as if it was all done by a single processor
causal consistency
events which are potentially causally related are seen by every node of the system in the same order
    • section 3
      • 1
        • H
        • F
        • E
        • D
        • A
        • B
        • G
        • C
      • 2
        12. Fall 2007 3.2
      • 3
        distributed hash table
        • servers organized in a logical ring
• every node maintains a finger table of the nodes at power-of-two distances further along the ring of servers
• send a request to a server and it forwards the request to the server in its lookup table which is closest to, but not past, the destination (see the sketch below)
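A toy sketch of the lookup in Haskell (my own illustration; the ring size, node ids, and helper names are all assumptions, not from the exam):

      -- identifier ring of size 2^m
      m, ringSize :: Int
      m = 6
      ringSize = 2 ^ m

      -- clockwise distance from a to b around the ring
      dist :: Int -> Int -> Int
      dist a b = (b - a) `mod` ringSize

      -- the live node owning id i (the first node at or after i, clockwise)
      successor :: [Int] -> Int -> Int
      successor nodes i = snd (minimum [(dist i n, n) | n <- nodes])

      -- finger table of node n: the owners of n + 2^k for k = 0 .. m-1
      fingers :: [Int] -> Int -> [Int]
      fingers nodes n =
        [successor nodes ((n + 2 ^ k) `mod` ringSize) | k <- [0 .. m - 1]]

      -- route a lookup for key from node n, returning the hops taken: stop
      -- when the key falls between us and our immediate successor, else
      -- forward to the finger closest to, but not past, the key
      route :: [Int] -> Int -> Int -> [Int]
      route nodes n key
        | n == key                   = [n]
        | dist n key <= dist n succN = [n, succN]
        | otherwise                  = n : route nodes next key
        where
          succN = successor nodes ((n + 1) `mod` ringSize)
          next  = snd (maximum [ (dist n f, f)
                               | f <- fingers nodes n
                               , dist n f <= dist n key ])

      -- e.g. route [1,8,14,21,32,38,42,48,51,56] 8 54 == [8,42,51,56]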
    • section 4
      • 1
      • 2

Spring 2008

  • OS
    • 4.2 VM
      how to make a VM look convincingly like a real OS
      • fake everything and have virtual drivers living underneath all of the hardware drivers

      How to detect if you are running inside of a VM

      • peg the CPU, and compare cpu-time to wall clock time, with knowledge of the CPU and the OS it should be possible to calculate
        • the instructions that should be executed on the CPU
        • the wall-clock time it should take to execute the code
        • the actual time it takes to execute the code
• differences in the time required for hardware access (e.g. memory) should reveal the presence of the VM supervisor
      • environmental factors, fan speed, cpu temperature monitors
    • 4.3 Network Protocol Performance and Security
      1. send receipt packets even for packets you're not receiving
      2. random IDs in the sent packets make it impossible to fake a receipt

Fall 2008

  • PL
    • 3.3 Denotational and Axiomatic Semantics – application
      A simple language with the following operations
+, ×, -, >, if, while
is extended to include ++ and -- (both prefix and postfix), so expressions now have side effects.

Since expressions will now have side effects, we must change our denotational semantics. Both expressions and statements will now accept and return tuples of a value and a state <Val,St>; more formally, they will be functions from tuples to tuples. In this way the state is explicitly threaded through every expression and statement in a state-passing style.

So, for example the denotational semantics of the id operator will become (using the convention where Val refers to the first element of the argument tuple, and St refers to the second element of the argument tuple) $$\llbracket id \rrbracket \langle Val, St \rangle = \langle St(id), St \rangle$$ Following this convention we can define the denotational semantics of the composition and assignment operators.
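The definitions themselves did not survive in these notes; a plausible reconstruction under the convention above (my sketch):

\begin{eqnarray*}
\llbracket s_1 ; s_2 \rrbracket &=& \llbracket s_2 \rrbracket \circ \llbracket s_1 \rrbracket\\
\llbracket x := e \rrbracket \langle Val, St \rangle &=& \langle Val', St'[x \mapsto Val'] \rangle \quad \textrm{where } \langle Val', St' \rangle = \llbracket e \rrbracket \langle Val, St \rangle
\end{eqnarray*}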

      All of the primitive operators can be encoded in the following manner.
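The encoding is also missing here; presumably each binary primitive threads the state left to right, something like

\begin{eqnarray*}
\llbracket e_1 + e_2 \rrbracket \langle Val, St \rangle &=& \langle Val_1 + Val_2, St_2 \rangle\\
\textrm{where } \langle Val_1, St_1 \rangle = \llbracket e_1 \rrbracket \langle Val, St \rangle &\textrm{and}& \langle Val_2, St_2 \rangle = \llbracket e_2 \rrbracket \langle Val_1, St_1 \rangle
\end{eqnarray*}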

      Note that the commutativity of our arithmetic operators is now broken, since either expression could change the state and thus change the value of the subsequent expression.

      The new operators can be encoded as.
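The encodings are missing as well; a sketch for ++ applied to a variable x (prefix returns the new value, postfix the old):

\begin{eqnarray*}
\llbracket ++x \rrbracket \langle Val, St \rangle &=& \langle St(x)+1, St[x \mapsto St(x)+1] \rangle\\
\llbracket x++ \rrbracket \langle Val, St \rangle &=& \langle St(x), St[x \mapsto St(x)+1] \rangle
\end{eqnarray*}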

      Finally the if and while operators can be encoded as.
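Also missing; a sketch, with the guard threading state and while taken as a least fixed point (writing p for the incoming \(\langle Val, St \rangle\) pair):

\begin{eqnarray*}
\llbracket \textrm{if } b \textrm{ then } s_1 \textrm{ else } s_2 \rrbracket \langle Val, St \rangle &=& \textrm{if } Val_b \textrm{ then } \llbracket s_1 \rrbracket \langle Val_b, St_b \rangle \textrm{ else } \llbracket s_2 \rrbracket \langle Val_b, St_b \rangle\\
\llbracket \textrm{while } b \textrm{ do } s \rrbracket &=& \textrm{fix} \left( \lambda w. \lambda p.\ \textrm{if } Val_b \textrm{ then } w(\llbracket s \rrbracket \langle Val_b, St_b \rangle) \textrm{ else } \langle Val_b, St_b \rangle \right)\\
\textrm{where } \langle Val_b, St_b \rangle &=& \llbracket b \rrbracket \langle Val, St \rangle
\end{eqnarray*}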

      The Axiomatic Semantics can be derived from the Denotational Semantics. Assignment \(x := e\) must now take into account the effect of the expression on the state. \begin{eqnarray*} \{Q\} x := e \{P\}\\ wp(x := e, P) &=& wp(e, P)|_{x}^{e_{val}} \end{eqnarray*}

    • 3.5 fixpoint semantics
      • Pa
        1. add(0,x,x) :- nat(x).
        2. add(s(x),y,s(z)) :- add(x,y,z).
        3. inc(x,y) :- add(x,s(0),y).
        4. nat(0).
        5. nat(s(x)) :- nat(x).

        Calculation of fixed point

        • TP0(∅) = nat(0) (0.0) from 4
        • TP1(∅) = TP(TP0)
          • nat(s(0)) from 5 applied to 0.0 (1.0)
          • add(0,0,0) from 1 applied to 0.0 (1.1)
• TP2(∅) = TP(TP1) (with x = s(0))
          • nat(s(x)) from 5 applied to 1.0 (2.0)
          • add(0,x,x) from 1 applied to 1.0 (2.1)
          • add(x,0,x) from 2 applied to 1.1 (2.2)
• TP3(∅) = TP(TP2) (with x = s(0))
          • add(s(0), x, s(x)) from 2 applied to 2.1 (3.0)
          • inc(0,x) from 3 applied to 2.1 (3.1)
• TP4(∅) = TP(TP3) (with y = s(x))
          • inc(x,y) from 3 applied to 3.2 (4.0)
        • fixed point
      • Pb
        1. add(0,0,0).
        2. add(s(A),0,s(A)) :- nat(A).
        3. add(A,s(B),s(C)) :- add(A,B,C).
        4. inc(0,s(0)).
        5. inc(s(A),s(B)) :- inc(A,B).
6. nat(0).
7. nat(s(x)) :- nat(x).

        Calculation of fixed point

        • TP0(∅) =
          • add(0,0,0) from 1 (0.0)
          • inc(0,s(0)) from 4 (0.1)
          • nat(0) from 6 (0.2)
        • TP1(∅) = TP(TP0)
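the notes stop here – the next application is mechanical (my continuation):
  • add(s(0),0,s(0)) from 2 applied to 0.2 (1.0)
  • add(0,s(0),s(0)) from 3 applied to 0.0 (1.1)
  • inc(s(0),s(s(0))) from 5 applied to 0.1 (1.2)
  • nat(s(0)) from 7 applied to 0.2 (1.3)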

Spring 2009

  • Theory
    theory09.pdf
  • OS
    • short answer
1. in Chord, what is the purpose of the finger table – to index O(log n) other nodes at power-of-two distances, giving O(log n)-hop lookups
2. a pid in a TLB entry is useful so you don't have to flush the whole TLB when you context switch; entries are tagged per process, so another process's entries can be ignored or replaced entry by entry
      3. RISC can run at a higher clock speed because the instructions are closer to the hardware, CISC can do more with one instruction but the clock could be slower

        runtime = Instructions × Cycles/Instruction × Sec/Cycle

        also, due to increased complexity, CISC instructions could be harder to parallelize on the instruction level

4. bridges connect networks, and learning bridges remember which of their ports leads to individual machines

        bridges connect networks, at first they just flood broadcasts, but once they learn where something is they target their output

    • medium answer
      1. If I/O bound processes have a higher priority they will be on and off of the CPU quickly, increasing the total resource usage by keeping the I/O devices busy.

On a user system this will also increase the "responsiveness" of the machine, by pushing through processes which interact with the user (screen and keyboard are both I/O devices).

      2. Multimedia tends to have a lot of large streaming writes and reads which means that a contiguous series of blocks will be written at once. With these large streaming I/Os there is less need for parallel access, and you'd rather have your parity bits out of the way of your reads/writes.
3. shared memory through message passing. each page lives on a single machine, and its writes are broadcast to the other machines reading that page.
    • long answer
      1. TCP traffic between cities
        traffic engineering
        conscious adjustment of traffic to avoid busy times of day
        route flutter
        slight variations in packet headers can change the intermediate routers used
        diurnal patterns
        measure at different times of day, (email in morning, and then at night when you get home etc…)
        asymmetries
        route there could be different from route back, also maybe producers and consumers of information are not evenly distributed between countries
      2. with >32 node systems what are the advantages of running separate kernels on separate processors
        disadvantages
        increased communication overhead for implementing distributed consistency

        a lock could be more difficult, disabling interrupts doesn't help when other kernels still have active threads which could try to use your resource

        sharing resources between kernels (e.g. one large external disk) would be difficult (you would need a kernel of kernels)

        (see Quicksilver)

        if you have to pass a big message, you just write the message out to a place in memory and pass around the address. Kernels could similarly respond to a message by writing to a place in shared memory

        leader election is the process of designating a single process as the organizer of some task between separate computers

again we could use our "each kernel has a designated place in memory" trick to allow kernels to communicate through memory writes.

  • PL
    build up with TP

Questions by Topic

file system

  • Jan 2009 3.2
  • Jan 2008 3.2
  • Aug 2006 2.1 4.2
  • Jan 2006 2.3
  • Aug 2005 2.3
  • Jan 2005 2.2
  • Spring 2004 3.3
  • Aug 2003 3.3
  • Aug 2001 1.2,1.3 1.4
  • Jan 2001 1.1

NP-Completeness

  • Spring 2009
    • 3
      NP-complete

      can reduce Hamiltonian path to k-leaf spanning tree

      2-leaf spanning tree ≡ Hamiltonian path

    • TODO 6
      dynamic programming

      not all graphs are trees so this doesn't mean that P=NP

    • 7
      This is ∈ P because there are only two choices for each vertex, so it's like 2-SAT.
  • Fall 2007
    • 1 (widget salesman)
      • like traveling salesman, but not
      • vertices weighted with different values for different days

      think of this as a large directed graph with weighted vertices, each city/day combo gets a vertex, directed edges allow traversal through the graph moving forward in time.

      dynamic programming

      like the beach-front property problem in the HW

      time and space

      • this would take polynomial time thanks to dynamic programming
      • this would take exponential space – because you would need to store the answer for each subset of the graph
(defn travel [s G W]
  (apply max 0 ; apply, so an empty edge list bottoms out at 0
         (map
          ;; where t is the node landed on by that edge
          (fn [t] (+ (:value t) (travel t G W)))
          (edges-leaving-to s G))))
      
    • 4
      DNF is in P

      converting from CNF to DNF, you get an exponential number of clauses

    • DONE 5
      I think this would be a regular language. This is true because the state is always finite.

We say that u ∼ v if ∀ w : uw ∈ L ⇔ vw ∈ L; if a language has a finite number of equivalence classes under ∼, then it is regular (Myhill–Nerode)

Let's say that u ∼ v iff the sets of possible last two digits of the predecessor strings of u and v are equal. It is clear that if this is the case, then u and v satisfy the ∼ property. The number of such sets is finite, because there are 2^2 = 4 possible two-bit strings and hence 2^4 = 16 possible sets. \(\square\)

      how about for a two dimensional tiling?

      • intuition says it should work
      • apparently wikipedia says otherwise
  • Spring 2007
    • 1 (shore-front property)
    • TODO 2
      take 1/edge-weight for each edge weight and then run minimum spanning tree
    • DONE 3
      dynamic programming, working from right to left along the sequence

this would take n·lg(n) space: n subsequence values need to be tracked, and each is a number of size at most n (taking up lg(n) bits)

6 2 17 4 9 8 10 5 12
(defn rests [lst] (if (empty? lst) '() (cons lst (rests (rest lst)))))
(defn mymax [lst] (if (empty? lst) 0 (apply max lst)))

;; memoize the function value itself; (memoize (defn ...)) would not
;; rebind the var, so recursive calls would miss the cache
(def inc-subseq
  (memoize
   (fn [lst]
     (if (empty? lst)
       0
       (let [me (first lst)]
         (inc (mymax (map inc-subseq
                          (filter #(< (first %) me) (rests (rest lst)))))))))))

;; working from right to left along the sequence
(inc-subseq (reverse '(6 2 17 4 9 8 10 5 12)))
      

      and in emacs-lisp

(require 'cl-lib) ;; maplist lives in cl-lib as cl-maplist
(defvar my-cache '())
      
      (defun inc-subseq (lst)
        (or (cadr (assoc lst my-cache))
            (if (= (length lst) 0)
                0
              ((lambda (answer) (setq my-cache (cons (list lst answer) my-cache)) answer)
               (+ 1 (apply #'max
(cons 0 (cl-maplist
                                    (lambda (el)
                                      (inc-subseq (when (< (car el) (car lst)) el)))
                                    (cdr lst)))))))))
      
      (inc-subseq (reverse '(6 2 17 4 9 8 10 5 12)))
      
    • 4
      • optimal solution
        • place three connected nodes off to the side
        • connect two of the three to one node – thereby forcing it to have a color
• ∀ remaining nodes in the graph, connect the node to two of the three nodes in our floating triangle and ask whether the graph is still 3-colorable, trying pairs until it is – thereby forcing a coloring for that node

this will take at most 3n queries

        color.png

      • suboptimal solution
        • pick a node at random and call it red (without loss of generality)
• ∀ other nodes, connect that node to the selected node and ask if the resulting graph is still 3-colorable; if not, then the newly connected node is also red
        • after n questions we know all of the red nodes
        • we then do the same thing for green and blue

        after at most 3n questions we have found all of the red, green, and blue nodes

        there could be some nodes left over

    • DONE 6
first, a witness can be checked in polynomial time: ∀ corners we check that the rectangle at that corner doesn't overlap its neighbors – O(n^2) or so

      look at p.153 in the book, also maybe p.178

      maybe this is equal to integer partition or subset sum

if integer partition is NP-complete, then we can do the following

let every little rectangle have unit height and the big rectangle have a height of 2 units; tiling is then the same as the partition problem

      if we're scared of rotation, then we can just size our rectangles such that the smallest length is greater than 2

      http://en.wikipedia.org/wiki/List_of_NP-complete_problems

  • Fall 2006:
    • 1
      Just take nodes out one at a time and then ask "does there still ∃ an independent set of size k".

      notes

      • independent set converts to vertex cover
      • independent set converts to clique in the complement of the graph
    • 3
A counter which increments for each "(" and decrements for each ")" will need Ω(log(n)) bits of state.

An example of a string which can't be handled in o(log(n)) space would be the string of n "("s: with fewer bits, the counter would hit its max and wrap back to 0 before the last "(".

    • 4
      • turn every vertex into a triangle, converts from 3-coloring → equal 3-coloring

      for these types of problems we need to

      • show that the conversion can be done in polynomial time
• show that we can map yes instances to yes instances and no instances to no instances
    • 6
If P=NP then we can collapse the quantifiers ∃ and ∀, the entire polynomial hierarchy collapses, and the encryption and decryption problems become equally easy
  • Spring 2005, 1, 2, 3, 6.

typing

  • Jan 2009
    • 3.1
      did (1) and (2) w/pencil and paper.

      for (3) it's not as clear how to handle variant types, also, what is meant by "sums"

    • TODO 3.2
      need to talk to Darko
  • Aug 2008 3.1, 3.2 (Same as 2009)
  • Jan 2008
    • TODO 1.3 (Simple type lambda calculus)
      look up erasure – once I have the book…
    • spring 08 1.4
      1. division by 0
        f 3
            where
              f x = x `div` 0
        

let's say that div has type Int → Int → Maybe Int; then f has type Int → Maybe Int, and because f always divides by 0 every application of f evaluates to Nothing, so the entire expression evaluates to Nothing (a value of type Maybe Int – there is no "Nothing type")

      2. f 3 where f x = x has type Int, because f has type a → a, and x is an Int
3. (f 3, f "a") – again f has type a → a, so f 3 has type Int and f "a" has type String, and their tuple then has type (Int, String)
      4. in the following
        f g [[1],[2,3]]
            where
              f h = map (map h)
              g n = n * n
        

        we can assign the following types

        map :: (a -> b) -> [a] -> [b]
        f :: (a -> b) -> [[a]] -> [[b]]
        g :: Num a => a -> a
        

        so the composition of f and g has type

        Num a => [[a]] -> [[a]]
        

        which when applied to an item of type [[Int]] results in an item of the same type

5. the following (I believe) should fail with an infinite recursive type error
        f g [[1],[2,3]]
            where
              g :: Num a => a -> a
              g n = n * n
              twice :: (a -> a) -> a -> a
              twice f a = f (f a)
              f :: (a -> a) -> -- fails to have a type, infinite recursive type
              f h = twice map h
        

discuss what choices the language designers made, and what would result from stricter and/or less strict polymorphism.

      (see http://en.wikipedia.org/wiki/Type_polymorphism)

    • 3.3
      lookup let polymorphism, and polymorphic functions

using let it is possible for a polymorphic function to be used at multiple different types; for example id (λ x . x) is fully polymorphic, and the following works, with id applied to objects of two different types (a function and a natural number)

      let id=(λ x . x) in
      (id (λ k . "this")) (id 5)

the above works because the type of id is generalized while type-checking the head of the let, before the body; the following however will not work, because the λ-bound id is monomorphic – its type is first fixed to a function on functions by (id (λ k . k)), and then it can't be re-applied to 5 (a number)

      (λ id . ((id(λ k . k))(id 5)))(λ n . n)

This could be resolved through explicit declaration of polymorphic types. (The same contrast in Haskell is sketched below.)
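The Haskell version of the contrast (my own example, not from the exam):

      -- let-bound: myId is generalized, so both uses typecheck
      okay :: (Bool, Int)
      okay = let myId x = x
             in (myId True, myId (5 :: Int))

      -- lambda-bound: f stays monomorphic, so this is rejected
      -- bad = (\f -> (f True, f (5 :: Int))) (\x -> x)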

  • Aug 2007 1.6 (ML)
    (see spring 08 1.4)
  • Jan 2007 1.1 (SML)
  • Aug 2006
    • 1.3 (ML)
      (see spring 08 1.4)
    • TODO 3.1 (lambda calculus)
      1. (if x (succ 0) (iszero y))
      2. if e is typable with type Τ and f is a normal form of e, then f has the type of Τ.

        the evaluation rules of λ-calculus preserve type, an expression which does not have a normal form does not have a type

  • Jan 2005
    • 1.4 (ML type-correctness)
    • 1.5 (ML Type reconstruction)
    • 3.3 (type-correct)
  • Spr 2004 3.3
  • Jan 2003
    • 1 (ML Type reconstruction)
    • 7 (static semantic of an explicitly-typed core)
  • Aug 2003 1.2.2 (SML) 1.3.1 (SML)

axiomatic and denotational semantics

(for axiomatic semantics see cs550-notes)

wikipedia is actually a pretty good resource for this

  • Spring 2009: 3.3, 3.4
    • 3.3
      in a simple programming language with +, *, -, <, ≥, assignment, conditionals, sequencing and while loops, and side effect free expressions

we add ++ and -- (both prefix and postfix) to the language; what changes in the semantic rules of the language (consider both axiomatic and denotational).

With the addition of these rules the language no longer has side effect free expressions. Now essentially all language constructs which rely on the side-effect free nature of expressions (i.e. a conditional assuming that its guard doesn't change state) will need to be updated to explicitly allow expressions to affect state.

      give an example of a guarded if statement with both a side-effect free expression in the guard, and with a potentially active expression in the guard.

      • denotational semantics
        • [if b then s1 else s2] = [b] → [s1] else [s2]
        • becomes
        • [if b then s1 else s2] = ([b] → [s1] else [s2]) o [b]
        • we can then break all other language constructs down into instances of the if and loop, where loop isn't affected by this change
      • axiomatic semantics
        • wp(if b then s1 else s2, α) = β, so
          • β ∧ b → wp(s1,α)
          • β ∧ ¬ b → wp(s2,α)
        • becomes
        • wp(if b then s1 else s2, α) = β, so
          • β ∧ b → wp(s1,wp(b,α))
• β ∧ ¬ b → wp(s2, wp(b,α))
    • 3.4
      structural operational semantics
Analysis of the meaning of a program as the series of operations which constitute the program's evaluation. These operations can be given as if → then rules of evaluation. (see operational semantics)
      natural semantics
big-step operational semantics – rules relate a construct and an initial state directly to its final value or state, rather than to one small evaluation step at a time
      denotational semantics
explicit state; the meaning of a program is a mathematical function from state to state, with its foundational basis in set theory.
      expressions
      don't have side effects
      statements
      have side effects
      axiomatic semantics
programs operate on formulas specifying properties of the state; the semantics of a program is the effect of the program on these properties (its weakest preconditions and strongest postconditions)
  • Fall 2008: 3.3 (same as S09 3.3), 3.4 (very similar to S09 3.4)
  • Spring 2008: 3.1, 3.2 (same as F08 3.4)
  • Fall 2007: 3.1, 3.2 (indirectly)
  • Spring 2007: 3.3, 3.4.1 (same as S09 3.4)
  • Fall 2006: 3.2 (same as S09 3.4)
  • Spring 2006: 3.2 (same as S09 3.4)
  • Fall 2005: 3.2
  • Spring 2005: 3.1 (same as S07 3.3), 3.2 (same as S09 3.3)

probability

  • TODO Spring 2009: 5
how many items n can we fit into an m-place hash before we start to see collisions

we know that when \(n = \Theta(\sqrt{m})\) we start to see collisions

how big can n be in relation to m before we start to see a place holding three items

    \begin{math} \binom{n}{3} \times m \times \frac{1}{m^{3}} = \binom{n}{3} \times \frac{1}{m^{2}} \geq 1 \end{math} (the expected number of triples landing in a single place; the factor m counts the places) and \(\binom{n}{k}\) is always roughly equal to \(\frac{n^k}{k!}\), so this kicks in when \(n^{3} \gtrsim m^{2}\)

    answer is \(n = m^{\frac{2}{3}}\)

  • Fall 2007: 3, 7.
    \(\frac{1}{3}\left(\left(\frac{-1}{2}\right)^n \frac{2}{3} \right)\)
  • Spring 2007: 7
    • p(k-1|k) will be
give a kumquat (k/3)
receive not a kumquat (k/3)
    • p(k+1|k) will be
      give not a kumquat
      receive a kumquat
    • p(k|k)
      give and get kumquat
      give and get not-kumquat
  • Spring 2005: 4
    n elements in an n-space hash, with high probability no space has log(n) elements

    keep in mind that

    • \(\binom{n}{k} \simeq \frac{n^k}{k!}\)
    \begin{eqnarray*} P(k) &=& \binom{n}{k}\left(\frac{1}{n}\right)^{k}\left(1-\frac{1}{n}\right)^{n-k}\\ &\leq&\frac{n^k}{k!}\left(\frac{1}{n}\right)^{k}\left(1-\frac{1}{n}\right)^{n-k}\\ &<& \frac{1}{k!} \end{eqnarray*}

fixed points

  • Spring 2009: 3.5
it looks like, while rev1 and rev2 are true for the same input sets, the two programs are not fully equivalent under fixed point analysis.
    1. lets pick the naturals for our atoms
    2. we start applying TP to this domain for a couple of steps,
      1. first just everything which doesn't have a body
      2. then build up on those, body → head, etc…
    3. eventually maybe prove inductively that ∀ sets which satisfy rev1, those same sets will also satisfy rev2, start with empty trivial sets and grow from there
  • Fall 2008 : 3.5
This says compute the fixed point semantics, while 3.6 says compute the logical semantics; what's the difference, given that the meaning of a logic program is that program's minimal fixed point? Aren't these equal?

    fixed point semantics is

    • start with empty (just the ground terms)
    • begin applying TP building up successive sets
  • Spring 2008: 3.4
  • Fall 2007: 3.3
  • Jan 2007: 3.1
  • Fall 2006 : 3.3
  • Jan 2006 : 3.3
  • fall 2005 : 3.3

prolog questions

  • Jan 2009: 1.5
    1. the second is a "continuation passing" version of the first, in which the state of the computation is explicitly passed between functions.
    2. possible stack trace?
      natlist([0, s(0)], write('success!'))
      nat(0, natlist([s(0)], C))
      nat(0, nat(s(0)), natlist([], write('success!')))
      nat(0, nat(s(0)), call(write('success!')))
      nat(0, nat(s(0)), TRUE)
      nat(0, TRUE)
      TRUE
      

      output of

      natlist([], C) :- call(C).
      natlist([X|T], C) :- nat(X, natlist(T, C)).
      
      nat(0, C) :- call(C).
      nat(s(X), C) :- nat(X, C).
      

      on

      natlist([0,s(0)],write('success')).
      

      yields

      ?- natlist([0,s(0)],write('success')).
      success
      true.
      

      and

      natlist([]).
      natlist([X|T]) :- nat(X), natlist(T).
      
      nat(0).
      nat(s(X)) :- nat(X).
      

      on

      natlist([0,s(0)]), write('success').
      

      yields

      ?- natlist([0,s(0)]), write('success').
      success
      true.
      
    3. implement a parser to do this
      %% matches prolog clauses, and write out their continuation versions
      cont([]).
      cont(A,B) :- write(A), bodycont(B).
      
      %% matches the body of a clause, and writes out a continuation version
      bodycont(B) :- %% only one clause, then print it w/continuation arg.
              not 'C'(B,b_first,',',b_rest,')'),
              not 'C'(B,b_inside,')'),
write(b_inside), write(',C)').
      bodycont(B) :- %% multiple arguments, then recurse
              'C'(B,b_first,',',b_rest),
              write(b_first), write(','), bodycont(b_rest).
      
  • Aug 2008: 1.3
    same as above (binarized)
  • Jan 2008: 1.1
    1. write a program to translate roman numerals
      %% lookup values of the numbers
      value('I',1).
      value('V',5).
      value('X',10).
      value('L',50).
      value('C',100).
      value('D',500).
      value('M',1000).
      value(A,B,N) :- value(A,M), value(B,O), >=(M,O), plus(M,O,N).
      value(A,B,N) :- value(A,M), value(B,O),  <(M,O), plus(N,M,O).
      
      %% values for strings of characters
      roman([],0).
      roman([A],U) :- value(A,V), =(U,V).
      roman([X,Y|Rest],U) :-
              value(X,Xval), value(Y,Yval), >=(Xval,Yval),
              roman([Y|Rest],V), plus(Xval,V,U).
      roman([X,Y|Rest],U) :-
              value(X,Xval), value(Y,Yval), <(Xval,Yval),
              roman([Y|Rest],V), plus(U,Xval,V).
      

      some notes on the above

      • there is no minus in prolog, you only need plus
      • keep in mind the difference between "," and "|" when dealing with lists.
• two functions are needed to differentiate between the literal lookup value of a letter, and its actual contribution to the total of the whole numeral.
    2. can this program be "run backwards" to subtract roman numerals?

If it means subtract as in "-", the arithmetic operation on numbers, then no: this translates from numerals (i.e. strings of characters) to actual quantities. This program could not be used to manipulate quantities, only to translate between character strings and quantities.

The program can, however, be used to generate numerals from quantities, though these would not necessarily be constructed as the Romans would have done, e.g.

      roman(Numeral,1972).
      Numeral = ['I','I','I','I'...]
      

      however with some length restriction there are interesting results.

      ?- roman(['I', 'I', 'I', 'I', 'L', 'M', 'M', 'X', 'X'] , M).
      M = 1972 
      
  • Jan 2008 1.2
    same as above (binarized)

axiomatic/denotational semantics questions

  • 8) Fall 2006 3.2 (Combined with 7)

λ-calculus questions

  • 1) Spring 2009 2.1
β-reduction is the primary means of evaluation in the λ-calculus. Evaluation is the process of recursively applying β-reductions to a λ-expression until it either reaches a normal form, or diverges into an infinite loop indicating that ¬∃ a normal form.

    a meaningful example – summation of church numerals 1 + 2

    ;; +
λ m n. λ f x. (n f (m f x))
    ;; 1
    λ f x. (f x)
    ;; 2
    λ f x. (f (f x))

    β-reduction

λ n. λ f x. (n f ((λ f x. (f x)) f x))
    ;; 2
    λ f x. (f (f x))

    β-reduction

    λ n. λ f x. (n f (f x))
    ;; 2
    λ f x. (f (f x))

    β-reduction

λ f x. ((λ f x. (f (f x))) f (f x))

    β-reduction

    ;; 3
    λ f x. (f (f (f x)))

  • 2) Fall 2008 2.2 (Same Question as 1)
  • 3) Fall 2008 1.3
  • 4) Spring 2008 2.2 (Same as 1 and 2)
  • 5) Spring 2008 3.3
    write down the evaluation and typing rules for the pure call-by-value λ-calculus

first let's fully define our simply-typed call-by-value λ-calculus with the following abstract syntax

t ::=
    c, an atom
    v, a variable
    (t t), an application of two terms
    (λ x. y), a λ-expression where x is a variable and y is a term
    let x = t in y, where x is a variable and t and y are terms
    x → y; z, a conditional, where x, y and z are terms
    primop t1 t2 ..., where the ti are all terms and primop is a primitive
    

    evaluation

    application
(a b) – evaluate a, and (being call-by-value) b, down to values; if a is then a λ expression (e.g. (λ x. e)) pass b to a, i.e. step to [x → b]e
    if
    the following is strict in t and is lazy in t1 and t2, meaning if t is \(\bot\) then the whole statement is also, but not necessarily so for t1 and t2
    • if true then t1 else t2 → t1
    • if false then t1 else t2 → t2
    • \(\frac{t \rightarrow t'}{if t then t_1 else t_2 \rightarrow if t' then t_1 else t_2}\)
    ???
    β-reduction

    typing

    • true : bool
    • false : bool
    • 0 : nat
    • \(\frac{t_1 : nat}{ succ(t_1) : nat}\)
    • \(\frac{t_1 : nat}{ pred(t_1) : nat}\)
• \(\frac{t_1 : bool,\ t_2 : T,\ t_3 : T}{if\ t_1\ then\ t_2\ else\ t_3\ :\ T}\)
  • 6) Spring 2006 3.1
    1. normal forms
        1. ((λ (x) (λ (y) (x (λ (x) x)))) (λ (y) (x y)))
        2. (λ (y) ((λ (y) (x y)) (λ (x) x)))
        3. (λ (y) ((x (λ (x) x))))
        4. ((λ (x) (y (λ (v) (x v)))) (λ (y) (v y)))
        5. (y (λ (v) ((λ (y) (v1 y)) v)))
        6. (y (λ (v) ((v1 v))))
    2. type checking
      ;; g :: [a -> b] -> [[a] -> [b]]
      (define g (lambda (f)
                  (lambda (l)
                    (if (null? l)
                        '()
                        (cons (f (f (car l)))
                              ((g f) (cdr l)))))))
      ;;        cdr     :: [a] -> [a]
      ;;     (g cdr)    :: [[a]] -> [[a]]
      ;;  (g (g cdr))   :: [[[a]]] -> [[[a]]]
      ;; ((g (g cdr) l) :: [[[a]]]
      ((g (g cdr)) l)
      
      • l is a list of any type
      • g and (g (g cdr)) are functions with the types shown above where a can be any type
    3. fixed points
(define F (lambda (f)
            (lambda (x) ; the inner binder for x was missing
              (if (= x 0)
                  1
                  (if (= x 1)
                      2
                      (+ (f x)
                         (f x)
                         (neg (f (- x 1)))))))))
      

      the least fixed point would be the function f defined by the following table.

      input      output
      \(\bot\)   \(\bot\)
      0          1
      1          2
      n > 1      \(\bot\)
      this is the only, and also the least fixed point of this function.
      ¬∃ any lesser fixed points
assume ∃ g s.t. g is less than f and g is a fixed point of F. Then either g(0) < f(0) or g(1) < f(1), because everywhere else f is \(\bot\). This is a contradiction: if either of those held, then F(g) ≠ g and g would not be a fixed point.
¬∃ any greater fixed point
assume ∃ g s.t. g is greater than f and g is a fixed point of F. Since g is a fixed point we know its values for inputs \(\bot\), 0 and 1, so ∃ n>1 s.t. g(n) ≠ \(\bot\); however this is a contradiction, because ∀ n>1 and ∀ functions h, F(h)(n) = \(\bot\), since computing F(h)(n) requires the value of h at n itself.

      so f is both the minimal and unique fixed point of F

  • 7) Fall 2006 3.1
    1. give an example of a λ-expression without a type

      if t1 then 0 else FALSE

this can't be typed because its type could vary based on the value of t1; however, this could still be part of a λ-expression which evaluates cleanly

2. if e has type Τ and f is a normal form of e, then f is also well typed and has type Τ. This is true because all evaluation rules in the λ calculus preserve the type of well-typed expressions – I guess I could write them out; for this see pl-spring08-3.3.
3. while every expression e has at most one normal form (by Church–Rosser), it is possible that some paths of evaluation of e terminate while others do not → it is possible that ∃ e s.t. e is not typable (because some paths of evaluation of e don't terminate), but the paths of evaluation from e that do terminate reach a normal form f which could be typeable
    4. (see above pl-spring06-3.1)
  • 8) Fall 2006 3.2 (Combined with 7)
    this isn't λ-calculus, but it looks tough (see fall-06-3.2)
  • 9) Spring 2005 3.1
  • 10) Fall 2005 1.6
  • 11) Fall 2005 3.3
  • 12) Spring 2004 3.3

haskell questions

  • permute
    permute :: Eq a => [a] -> [[a]]
    permute [] = [[]]
    permute a = foldr (\ x rest -> (map (\ tail -> x:tail) (permute (less x a)))++rest) [] a
        where
          less x ys = [y | y<-ys, not (y == x)]
    

    in a different way

    rem_dups :: (Eq a) => [a] -> [a]
    rem_dups = foldr (\ x a -> if elem x a then a else x:a) []
    
    permute :: (Eq a) => [a] -> [[a]]
    permute [] = [[]]
    permute a = concat $ map (\ x -> map (x:) $ permute $ less x a) $ rem_dups a
        where
          less x ys = [y | y<-ys, not (y == x)]
    
  • just looking at primes – not from a test
    sieve of Eratosthenes – taken from en.literateprograms.org
    module Main( main ) where
import System.Environment( getArgs )
    
    primes :: [Integer]
    primes = sieve [2..]
        where
          sieve (p:xs) = p : sieve [x|x <- xs, x `mod` p > 0]
    
    main = do
      args <- getArgs
  print (take (read (head args)) primes)
    

results in a script which takes a number and prints out that many initial primes

    now a faster version

    module Main( main ) where
import System.Environment( getArgs )
    
    merge :: (Ord a) => [a] -> [a] -> [a]
    merge xs@(x:xt) ys@(y:yt) = 
      case compare x y of
        LT -> x : (merge xt ys)
        EQ -> x : (merge xt yt)
        GT -> y : (merge xs yt)
    
    diff :: (Ord a) => [a] -> [a] -> [a]
    diff xs@(x:xt) ys@(y:yt) = 
      case compare x y of
        LT -> x : (diff xt ys)
        EQ -> diff xt yt
        GT -> diff xs yt
    
    primes, nonprimes :: [Integer]
    primes    = [2, 3, 5] ++ (diff [7, 9 ..] nonprimes) 
    nonprimes = foldr1 f . map g . tail $ primes
      where 
        f (x:xt) ys = x : (merge xt ys)
        g p         = [ n * p | n <- [p, p + 2 ..]]
    
    main = do
      args <- getArgs
  print (last (take (read (head args)) primes))
    
  • spring 09 1.1
    hungry hungry haskell
data Hungry a = T a               -- finished: holds the final value
              | H (a -> Hungry a) -- still hungry for one more argument

hhh :: Int -> Hungry a
hhh 0 = H T                       -- eat one last argument and return it
hhh n = H (\_ -> hhh (n - 1))     -- swallow an argument and stay hungry
    

    crazy data type

newtype A a = A (a -> A a)
    
  • spring 09 1.2
    matrix multiplication – can be done in language of choice
    C11 = M1 + M4 − M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 − M2 + M3 + M6
    

    s.t.

    M1 = (A11 + A22 ) · (B11 + B22 )
    M2 = (A21 + A22 ) · B11
    M3 = A11 · (B12 − B22 )
    M4 = A22 · (B21 − B11 )
    M5 = (A11 + A12 ) · B22
    M6 = (A21 − A11 ) · (B11 + B12 )
    M7 = (A12 − A22 ) · (B21 + B22 )
    

    working in clojure

    (defn mm [a b]
      (if (= (count a) 1)
        (list (list (* (first (first a))
                       (first (first b)))))
        (let [size (/ (count a) 2)
              fslice (fn [m] (map #(take size %) m))
              bslice (fn [m] (map #(drop size %) m))
              m+  (fn [a b] (map #(map + %1 %2) a b))
              m-  (fn [a b] (map #(map - %1 %2) a b))
              a11 (fslice (take size a))
              a12 (bslice (take size a))
              a21 (fslice (drop size a))
              a22 (bslice (drop size a))
              b11 (fslice (take size b))
              b12 (bslice (take size b))
              b21 (fslice (drop size b))
              b22 (bslice (drop size b))
              m1  (mm (m+ a11 a22) (m+ b11 b22))
              m2  (mm (m+ a21 a22) b11)
              m3  (mm a11 (m- b12 b22))
              m4  (mm a22 (m- b21 b11))
              m5  (mm (m+ a11 a12) b22)
              m6  (mm (m- a21 a11) (m+ b11 b12))
              m7  (mm (m- a12 a22) (m+ b21 b22))]
          (concat (map concat
                       ;; c11
                       (m+ (m- (m+ m1 m4) m5) m7)
                       ;; c12
                       (m+ m3 m5))
                  (map concat
                       ;; c21
                       (m+ m2 m4)
                       ;; c22
                        (m+ (m+ (m- m1 m2) m3) m6))))))
    
    (def a
         '((1  2  3  4)
           (5  6  7  8)
           (9  10 11 12)
           (13 14 15 16)))
    
    (def b
         '((1  2  3  4)
           (5  6  7  8)
           (9  10 11 12)
           (13 14 15 16)))
    
    (mm a b)
    

    working in Haskell

    mdo :: (Num a) => (a -> a -> b) -> [[a]] -> [[a]] -> [[b]]
    mdo f a b = map (\ (x,y) -> (map (\(l,r)->l`f`r) (zip x y))) (zip a b)
    
    mm :: (Num a) => [[a]] -> [[a]] -> [[a]]
    mm [[a]] [[b]] = [[a*b]]
    mm a b = (map (\ (x,y) -> x++y) (zip c11 c12))++
             (map (\ (x,y) -> x++y) (zip c21 c22))
        where
          c11 = mdo (+) (mdo (-) (mdo (+) m1 m4)  m5) m7
          c12 = mdo (+) m3 m5
          c21 = mdo (+) m2 m4
      c22 = mdo (+) (mdo (+) (mdo (-) m1 m2) m3) m6
          m1 = mm (mdo (+) a11 a22) (mdo (+) b11 b22)
          m2 = mm (mdo (+) a21 a22) b11
          m3 = mm a11 (mdo (-) b12 b22)
          m4 = mm a22 (mdo (-) b21 b11)
          m5 = mm (mdo (+) a11 a12) b22
          m6 = mm (mdo (-) a21 a11) (mdo (+) b11 b12)
          m7 = mm (mdo (-) a12 a22) (mdo (+) b21 b22)
          a11 = map (take size) (take size a)
          a12 = map (drop size) (take size a)
          a21 = map (take size) (drop size a)
          a22 = map (drop size) (drop size a)
          b11 = map (take size) (take size b)
          b12 = map (drop size) (take size b)
          b21 = map (take size) (drop size b)
          b22 = map (drop size) (drop size b)
          size = (round ((fromIntegral (length a))/2))
    
    -- [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    
  • DONE spring 09 1.3 (Haskell BFS)
    (see annotated Haskell BFS)
    data Tree a = E
                | T a (Tree a) (Tree a)
                  deriving Show
    
    -- so much easier than bfs
    dfnum :: Tree a -> Tree Int
    dfnum a = fst(dfn(a,1))
        where
          dfn :: ((Tree a), Int) -> (Tree Int, Int)
          dfn (E,n) = (E,n)
          dfn (T _ left right, n) = (T (snd l) (fst l) (fst r), (snd r))
              where
                l = dfn(left, n)
                r = dfn(right, ((snd l) + 1))
    
    data Tree a = E
                | T a (Tree a) (Tree a)
                  deriving Show
    
    bfnum :: Tree a -> Tree Int
    bfnum t = (head (bfn([t],1)))
        where
          bfn :: ([Tree a], Int) -> [Tree Int]
          bfn ([],_) = []
          bfn (trees,n) =
              bfcomb n trees (bfn (next_trees, ((round((fromIntegral(length next_trees))/2))+n)))
                  where
                    next_trees = (foldr (\ t a -> case t of E->[]++a; T _ l r->[l,r]++a;) [] trees)
          bfcomb :: Int -> [Tree a] -> [Tree Int] -> [Tree Int]
          bfcomb n (E:ts) rs               = [E] ++ (bfcomb n ts rs)
          bfcomb n (T _ _ _:ts) (r1:r2:rs) = [T n r1 r2] ++ (bfcomb (n+1) ts rs)
    

    George's BFS

    bfnum :: Tree a -> Tree Int
    bfnum t = head $ q 1 [t]
    q :: Int -> [Tree a] -> [Tree Int]
    q _ [] = []
    q i ts = dq (num i ts) (q n $ concat [[l, r] | T a l r <- ts])
      where dq [] _ = []
        dq (T i _ _:hs) (l:r:lrs) = T i l r : dq hs lrs
        dq (E:hs) (lrs) = E : dq hs lrs
            num k (T _ _ _:tl) = T k E E:num (k+1) tl
            num k (E:tl) = E:num k tl
            n = length [a | T a _ _ <- ts] + i   
    

    George's even shorter BFS

    data Tree a = E
                | T a (Tree a) (Tree a)
                  deriving Show
    
    bfnum :: Tree a -> Tree Int
    bfnum t = head $ q 1 [t]
        where q i ts = dq i ts (q (i + length [a | T a _ _ <- ts])
                                 (concat [[l, r] | T a l r <- ts]))
              dq i (T _ _ _:hs) (l:r:lrs) = T i l r:dq (i+1) hs lrs
              dq i (E:hs) (lrs) = E:dq i hs lrs
    
  • TODO spring 09 1.4
    prove that scanl as defined here
    scanl :: (b -> a -> b) -> b -> [a] -> [b]
    scanl f e = map (foldl f e) . inits
    
    inits :: [a] -> [[a]]
    inits [] = [[]]
    inits (x:xs) = [] : map (x:) (inits xs)
    

runs in O(n^2) time. Then give the definition of another function (scanl') which is more efficient.

There are many redundant computations: we foldl over each prefix in inits, but these prefixes are nested, so for example foldl f e [a1] will be recomputed n times when a1 is the first element of the input, the fold over [a1,a2] n−1 times when a2 is the second element, etc…

The following faster version ensures that f is only called once per element of the input, or Θ(n) times.

scanl' :: (b -> a -> b) -> b -> [a] -> [b]
-- build up incrementally: each output extends the previous accumulator
scanl' f e []     = [e]
scanl' f e (x:xs) = e : scanl' f (f e x) xs
    

    Prove that the two are equal using the calculational proof style

  • spring 08 1.5
    appendLists :: [a] -> [a] -> [a]
    appendLists [] ys = ys
    appendLists (x:xs) ys = x : appendLists xs ys
    
  • fall 07 1.1
    (see equational reasoning and reasoning about functions & data structures)
    element :: Eq a => a -> [a] -> Bool
    element _ [] = False
    element x (y:ys) = (x == y) || (element x ys)
    
    less :: Eq a => a -> [a] -> [a]
    less _ [] = []
    less x ys = [y | y<-ys, not (y == x)]
    
    equal :: Eq a => [a] -> [a] -> Bool
    equal [] [] = True
    equal _  [] = False
    equal [] _  = False
    equal (x:xs) ys = (element x ys) && equal xs (less x ys)
    
    union :: Eq a => [a] -> [a] -> [a]
    union [] a = a
    union a [] = a
    union (x:xs) ys = x : union xs (less x ys)
    
    intn :: Eq a => [a] -> [a] -> [a]
    intn [] _ = []
    intn _ [] = []
    intn (x:xs) ys = (if elem x ys then [x] else []) ++ (intn xs (less x ys))
    
    ps :: [a] -> [[a]]
    ps = foldr (\ x a -> (map (x:) a)++a) [[]]
    

    proofs of validity

    equal
    proof by induction on the size of the sets, first off if the sets are empty we win by the first three pattern matches.

Assume equal works for sets of size n; then for sets of size n+1, take the first element of one set (e.g. x from x:xs). If the other set (ys) includes x, then x:xs ≡ ys iff (equal xs (less x ys)), which is what our algorithm returns; else, if ys does not include x, then x:xs ≠ ys, as returned by our algorithm.

    union
    union xs ys = {z | z ∈ xs ∨ z ∈ ys}, we must retain the invariant that ∀ z ∈ zs (z ∈ xs ∨ z ∈ ys) and also ∀ z1,z2 ∈ zs z1 ≠ z2

    induction on the size of xs, we win if xs is empty, because we return ys, which we know is a set.

assume union works for length n xs; then let x:xs have size n+1. We simply cons the first element x onto union xs (less x ys). We know that x ∈ zs only once, because it's not in xs (x:xs was a set) and it's not in (less x ys) by definition of less; also we know that (x ∈ xs ∨ x ∈ ys) because x ∈ x:xs

    intersection
    like the above only with ∧ instead of ∨
    ps
∀ subsets x ⊆ xs, x ∈ ps(xs); let's do induction on the size of x, |x|

    if |x| = 0, then we win, because we start with [[]]

    let |x| = n, and let a ∈ x, then ps(less a xs) will by the inductive assumption include (less a x) which means that ps(xs) will include x, because ∀ a ∈ xs, ps(xs) will at some point make a copy of every subset in the result, and add a to it. could probably use more verbiage, but gets the point across.

  • fall 07 1.2
    Write a foldt function for folding over the leafs of a tree from the bottom up.
    data Tree a = E
                | T a (Tree a) (Tree a)
                  deriving Show
    foldt :: (a -> [b] -> b) -> [b] -> Tree a -> b
    foldt _ [b] E = b
    foldt f b (T a l r) = f a [(foldt f b l), (foldt f b r)]
    
    -- foldt (\ x a -> x + sum a) [0] (T 1 (T 2 E E) (T 3 E E))
    

scheme questions

  • spring 09 1.1
    define a scheme function foo with the following behavior
> ((foo 0) 'i)
i
> (((foo 1) 'i) 'ii)
ii
> ((((foo 2) 'i) 'ii) 'iii)
iii
> (((((foo 3) 'i) 'ii) 'iii) 'iv)
iv
    
    (define foo
      (lambda (i)
        (if (= i 0)
            (lambda (x) x)
            (lambda (x) (foo (- i 1))))))
    

Because of its static typing, Haskell doesn't directly accept a function which could return a variable-arity function – the sketch below doesn't typecheck. It is possible with a wrapper data type (see the Hungry type above) or with type classes.

-- ill-typed: foo 0 needs type a -> a while foo n needs b -> (a -> a), etc.
foo :: a -> a
foo 0 = id
foo n = const (foo (n-1))
    

synchronization

  • 1. Spring 2005 3.2
  • 2. Fall 2005 2.1
  • 3. Fall 2005 3.2
  • 4. Spring 2006 2.1
  • 5. Spring 2006 3.2
  • 6. Spring 2006 4.2
  • 7. Fall 2006 3.2
    (see Lamport timestamps and time-clocks.pdf)

In Lamport's logical clock setup, each process maintains its own counter or logical clock which it updates according to the following simple rules

    1. a process increments its counter before each event
    2. when a process sends a message it includes its counter value with the message
    3. upon receiving a message the receiver process sets its counter to be greater than the maximum of its own counter and the received value in the message

This generates a partial ordering on all events in the system, and processes are "synchronized" based on how much they communicate. (A small sketch of the rules as code follows.)
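A tiny sketch of the three rules as code (my own illustration in Haskell; the representation is an assumption):

      type Clock = Int

      tick :: Clock -> Clock             -- rule 1: increment before each event
      tick c = c + 1

      send :: Clock -> (Clock, Clock)    -- rule 2: (new local clock, stamp sent)
      send c = let c' = tick c in (c', c')

      receive :: Clock -> Clock -> Clock -- rule 3: jump past max(own, stamp)
      receive c stamp = max c stamp + 1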

  • 8. Spring 2007 4.1 (largely same as #7)
Lamport's clock assigns each process an incrementer (logical clock) which increments with every action and in response to communications. Comparison of the logical clock values of processes provides a partial ordering on the system, which has the nice property that any two causally related events are properly ordered by the clocks.

this is accomplished through the following rules

    1. each process has a logical clock
    2. each process increments its clock with each action or communication
    3. each communication is sent with the sender's logical clock value, and with each communication received a process will reset its clock to the max of its own clock and that attached to the communication
  • 9. Fall 2007 2.1 (exactly same as #8)
  • 10. Fall 2007 2.2 (same as #7)
  • DONE 11. Fall 2007 3.1
    This looks good, it updates the swap cache first, and never has too many pages in physical memory
    1. E put page in swap cache
    2. D for all processes change page table entries to swap cache entries
    3. B restore page table entries for any faulted pages
    4. H get physical page
    5. F write physical page to swap drive
    6. C free physical page to system
    7. A read from swap drive
    8. G remove from cache to clean up
  • WAITING 12. Fall 2007 3.2
    • State "WAITING" from "STARTED" 2010-06-18 Fri 17:31
almost done, but would like to talk about it with other people
    • a – modify to use locks
      typedef struct ringbuffer {
        /* Buffer is empty if rb->head == rb->tail. Buffer is full if
         * next(rb->tail) == rb->head */
  void **head, /* Points to first occupied element in buffer */
    **tail,    /* Points to first free element in buffer */
    **buf,     /* Points to the start of the allocated ring */
    **end;     /* Points one element *past* the allocated ring */
      } ringbuffer_t;
      
      void **RingbufferNext(ringbuffer_t *rb, void **item)
      {
        void **next = item + 1;
        if (next >= rb->end)
          return rb->buf;
        else
          return next;
      }
      
      int RingbufferAppendSingle(ringbuffer_t *rb, void *value)
      {
        void **next = RingbufferNext(rb, rb->tail);
        if (next == rb->head)
          return -1;
        *rb->tail = value;
        rb->tail = next;
        return 0;
      }
      
      int RingbufferRemoveSingle(ringbuffer_t *rb, void **value)
      {
        if (rb->head == rb->tail)
          return -1;
        *value = *rb->head;
        rb->head = RingbufferNext(rb, rb->head);
        return 0;
      }
      

      add a mutex to the ringbuffer struct so that it becomes

      typedef struct ringbuffer {
        /* Buffer is empty if rb->head == rb->tail. Buffer is full if
         * next(rb->tail) == rb->head */
        int lock; /* 0 means available 1 means locked */
  void **head, /* Points to first occupied element in buffer */
    **tail,    /* Points to first free element in buffer */
    **buf,     /* Points to the start of the allocated ring */
    **end;     /* Points one element *past* the allocated ring */
      } ringbuffer_t;
      
      void **RingbufferNext(ringbuffer_t *rb, void **item)
      {
  while(rb->lock); // spin until unlocked
  rb->lock = 1;    // NB: this check-then-set must really be one atomic
                   // test-and-set instruction to be a correct lock
        void **next = item + 1;
        if (next >= rb->end) {
          rb->lock = 0; // unlock when done
          return rb->buf;
        } else {
          rb->lock = 0; // unlock when done
          return next;
        }
      }
      

      and

      int RingbufferRemoveSingle(ringbuffer_t *rb, void **value)
      {
        while(rb->lock); // spin until unlocked
        rb->lock = 1; // lock
        if (rb->head == rb->tail) {
          rb->lock = 0; // unlock when done
          return -1;
        }
        *value = *rb->head;
        rb->head = RingbufferNext(rb, rb->head);
        rb->lock = 0; // unlock when done
        return 0;
      }
      

don't really need to lock just to check, so those parts could be moved outside of the lock.

    • DONE b – with compare_and_swap and w/o locks
      How could this code be change to avoid the need for locks using an atomic compare_and_swap?

Ensures that the buffer isn't full; then, if the tail hasn't changed in the interim, it advances the tail (claiming the slot) and assigns the new value.

int RingbufferAppendSingle(ringbuffer_t *rb, void *value)
{
  for(;;){ // loop until success
    void **tail_start = rb->tail;
    // check if full and if so then quit
    void **next = RingbufferNext(rb, tail_start);
    if (next == rb->head)
      return -1;
    /* if tail_start is still valid, claim the slot by atomically
     * advancing the tail; compare_and_swap(addr, old, new) is assumed
     * to store new at addr iff *addr == old, returning nonzero on success */
    if (compare_and_swap(&rb->tail, tail_start, next)){
      *tail_start = value; /* we now own the slot at tail_start */
      return 0;
    }
  }
}
      

      Simpler than the above, because we can read the value without the need to do a first check.

int RingbufferRemoveSingle(ringbuffer_t *rb, void **value)
{
  for(;;){
    void **saved_head = rb->head;
    if (saved_head == rb->tail)
      return -1;
    *value = *saved_head;
    /* only advance head if no one else has in the meantime */
    if (compare_and_swap(&rb->head, saved_head, RingbufferNext(rb, saved_head)))
      return 0;
  }
}
      
    • DONE c advantages of two above approaches
              pro                            con
      lock    no superfluous writes,         requires physical locks
              simpler semantics,
              clear locking sections
      swap    no need for locks on objects   many writes will be lost if the
                                             swap fails (writing happens in
                                             the busy-wait loop)
  • DONE 13. Spring 2008 2.1
    nice.
• each thread only updates its own place in the array

    with 2 threads

    • looks like it works for two threads, because whenever t1 passes into the critical section, it sets a1=1 before hand, and t0 will never pass into the critical section while a1=1
• alternately whenever t0 passes into the critical section, it sets a2=1 which will stall t1

    does not work for more than 2 threads

    • at beginning any number of threads that aren't t0 will pass right through to the critical section
  • 14. Spring 2008 2.2
  • 15. Spring 2008 2.3 (same as #7)
  • 16. Spring 2008 3.1

virtual memory

  • 1) spring 09 2.2
    so you don't have to flush everything on context switch
  • 2) spring 08 3.3 big
    read dinosaur chpt 6
  • 3) fall 07 3.1
    1. H
    2. E
    3. D
    4. F
    5. C
    6. A
    7. B
    8. G
  • TODO 4) fall 07 4.2
  • 5) spring 07 3.2
    In a demand paging VM system, pages are not brought into memory until they are needed (through a page fault). After a fault the following happens
    1. the process requests an address
    2. the address is not in the page table → page fault
    3. traps to the Kernel
    4. kernel checks that the logical address is valid, i.e. it is within the range of addresses assigned to the process
5. if valid, a disk I/O is scheduled to read in the page from disk (the disk address is the logical address plus the base disk address for the process – this information is kept in the PCB, the process control block)
6. when the disk I/O completes, we trap to the kernel, which then:
    7. the kernel checks for a free page, if ∃ a free page, then it is used, if ¬∃ a free page, then page-replacement
    8. the kernel places the contents of the disk into a frame in physical memory
    9. the kernel then updates the processes page table with an entry indexed by the logical address and keyed by the address of the frame in physical memory
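
    A minimal sketch of the fault path in C; every type and helper here
    (pcb_t, alloc_frame, evict_page, disk_read) is a hypothetical stand-in
    for illustration, not a real kernel API:

      typedef unsigned long vaddr_t;
      typedef struct { char bytes[4096]; } frame_t;
      typedef struct {
        unsigned long disk_base;   /* base disk address (from the PCB) */
        frame_t **ptable;          /* page table: logical page -> frame */
      } pcb_t;

      #define PAGE_OF(va) ((va) >> 12)

      extern int address_valid(pcb_t *p, vaddr_t va); /* in the process's range? */
      extern frame_t *alloc_frame(void);              /* NULL if no free frame */
      extern frame_t *evict_page(void);               /* page replacement */
      extern void disk_read(unsigned long daddr, frame_t *f); /* blocks */

      int handle_page_fault(pcb_t *p, vaddr_t va)
      {
        if (!address_valid(p, va))
          return -1;                        /* invalid reference */
        frame_t *f = alloc_frame();         /* if ∃ a free frame, use it */
        if (f == NULL)
          f = evict_page();                 /* if ¬∃, page replacement */
        disk_read(p->disk_base + PAGE_OF(va), f);
        p->ptable[PAGE_OF(va)] = f;         /* map logical page to frame */
        return 0;                           /* restart faulting instruction */
      }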
  • 6) fall 06 2.4
    copy-on-write is used, for example, when a process forks and the processes would prefer to share the address space until it is modified.

    Assume a and b are the child procs sharing the space (their page tables point at the same frames, all marked read-only). If a writes to the address space, the following manipulation of the page tables will occur (a sketch follows the list).

    1. a's write to the read-only shared page traps to the kernel
    2. the kernel copies the written page into a free frame
    3. a's page table is updated to point to the copy, now marked writable (a did the writing, so a gets the private copy)
    4. b's page table still points to the original page
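
    A sketch of that write-fault path; the types and alloc_frame are
    hypothetical stand-ins:

      #include <string.h>

      typedef struct { char bytes[4096]; } frame_t;
      typedef struct { frame_t *frame; int writable; } pte_t;
      typedef struct { pte_t *ptable; } pcb_t;

      extern frame_t *alloc_frame(void);

      void cow_write_fault(pcb_t *writer, unsigned long page)
      {
        pte_t *pte = &writer->ptable[page];     /* shared, read-only */
        frame_t *copy = alloc_frame();
        memcpy(copy, pte->frame, sizeof *copy); /* steps 1-2: copy the page */
        pte->frame = copy;                      /* step 3: writer gets the copy */
        pte->writable = 1;
        /* step 4: the other process's entry still maps the original frame */
      }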
  • 7) spring 06 2.1
    The clock algorithm is a more efficient implementation of second-chance page replacement. An iterator (the clock hand) steps through a circular list of pages; at each page it checks the reference bit (set whenever the page is used): if the bit is set, the hand clears it and goes on to the next page, else it evicts that page.

    In this way the essentials of second-chance replacement are retained (i.e. a recently used page won't be evicted unless every other page has also been used since the hand last passed it).
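
    A minimal sketch of the hand (the struct is a hypothetical stand-in):

      struct cframe { int page; int referenced; };

      /* returns the index of the victim frame; *hand persists across calls */
      int clock_evict(struct cframe frames[], int n, int *hand)
      {
        for (;;) {
          struct cframe *f = &frames[*hand];
          *hand = (*hand + 1) % n;     /* advance the clock hand */
          if (f->referenced)
            f->referenced = 0;         /* used recently: clear, second chance */
          else
            return (int)(f - frames);  /* unused since the last sweep: evict */
        }
      }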

  • 8) fall 05 2.2
    optimal page replacement strategy
    swap out the page whose next use is furthest in the future (∞ if it will never be used again)

    this is great but impossible – need to know the future

    One frequently used approximation would be…

    LRU
    least recently used, under the unprovable assumption (which Hume considers the foundation of all human understanding) that the future will be like the past. In this case we assume that the pages used most recently are critical to what's happening in the system and should be retained; similarly, the page which has gone unused for the longest time should be swapped out, because it probably won't be used in the near future
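
    True LRU as a sketch: stamp every frame on use and evict the oldest
    stamp (real systems approximate this, e.g. with reference bits); the
    struct is a hypothetical stand-in:

      struct lframe { int page; unsigned long last_used; };

      int lru_victim(struct lframe frames[], int n)
      {
        int victim = 0;
        for (int i = 1; i < n; i++)
          if (frames[i].last_used < frames[victim].last_used)
            victim = i;        /* older stamp = less recently used */
        return victim;
      }
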
  • 9) fall 05 3.2
    inverted page tables are page tables indexed by physical frame rather than by logical address: there is one entry per frame of physical memory, recording which process and logical page occupy it. They have the advantage that the number of entries cannot exceed the number of frames in physical memory; the unfortunate drawback is that address translation (on a TLB miss) must search the table for the matching (process, logical page) pair rather than simply index into it.
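
    A lookup sketch: one entry per frame, so translation must scan for the
    (pid, page) pair (real designs hash instead of scanning); the types
    are hypothetical stand-ins:

      struct ipt_entry { int pid; unsigned long page; };

      int ipt_lookup(struct ipt_entry ipt[], int nframes,
                     int pid, unsigned long page)
      {
        for (int f = 0; f < nframes; f++)
          if (ipt[f].pid == pid && ipt[f].page == page)
            return f;          /* the index is the frame number */
        return -1;             /* not resident: page fault */
      }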

    hierarchical page tables are another way of dealing with the issue of large page tables. A series of nested page tables is used, each indexed by some portion of the logical address; an entry points to the next, deeper page table, or ultimately to a physical frame. The logical address may be broken out as follows.

      +-----------------------------------------------------------+
      |  index into  |   next index   |            |    offset in |
      |  top level   |   next level   |  etc...    |    physical  |
      |  page table  |                |            |    memory    |
      +-----------------------------------------------------------+
    

    not generally considered practical for 64-bit machines (covering the address space would take too many levels, making every translation walk too slow).
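
    A sketch of the walk for one assumed split of a 32-bit address (10-bit
    top index, 10-bit second index, 12-bit offset), mirroring the diagram
    above:

      #define TOP(va)    (((va) >> 22) & 0x3ffUL)
      #define SECOND(va) (((va) >> 12) & 0x3ffUL)
      #define OFFSET(va) ((va) & 0xfffUL)

      unsigned long translate2(unsigned long **top, unsigned long va)
      {
        unsigned long *second = top[TOP(va)];       /* entry: next-level table */
        unsigned long frame   = second[SECOND(va)]; /* entry: frame number */
        return (frame << 12) | OFFSET(va);          /* frame base + offset */
      }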

  • 10) spring 05 3.1
    using base and bound addressing versus full page tables

    One would choose a base and bound approach for its lighter hardware requirements: no need for a TLB, or for hardware supporting fast access to a page table close to the processor. These choices make processors both simpler and less expensive. The downside is that the benefits provided by the TLB and page tables are lost, most notably the large addressable physical memory available through a page table (or a hierarchy of page tables); a simple scheme relying on base and bound registers supports a much smaller range of addressable physical memory.

    a base and bound logical → physical translation scheme also requires that each process be allocated a contiguous block of physical memory, rather than the scattered frames a page table allows; this loses the benefits of pages (a translation sketch follows the list)

    • paging processes into memory in discrete chunks
    • fighting fragmentation
    • if the process needs to grow, there may not be contiguous space available, and it could have to be swapped out to disk – with pages it could just get a new page
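
    As sketched here, base-and-bound translation is just a compare and an
    add, which is why the hardware can be so light:

      int translate_bb(unsigned long base, unsigned long bound,
                       unsigned long la, unsigned long *pa)
      {
        if (la >= bound)
          return -1;         /* outside the contiguous block: fault */
        *pa = base + la;     /* physical = base + logical */
        return 0;
      }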

polynomial time

  • spring 09 6
  • fall 07 1
  • fall 07 5
  • spring 07 1
  • spring 07 2
  • spring 07 3
  • fall 06 (it says spring 2005) 1
  • fall 06 (it says spring 2005) 2
  • fall 06 (it says spring 2005) 3
  • spring 05 2 a and b, 3, and 5

misc

  • DONE annotated Haskell BFS
    Quick aside: this site has a nice Haskell review http://learnyouahaskell.com.

    Breadth First Tree Numbering by George Stelle

    data Tree a = E
                | T a (Tree a) (Tree a)
                  deriving Show
    
    bfnum :: Tree a -> Tree Int
    bfnum t = head $ que 1 [t]
        where que i ts = dque i ts (que (i + length [a | T a _ _ <- ts])
                                        (concat [[l, r] | T a l r <- ts]))
              dque i (T _ _ _:hs) (l:r:lrs) = T i l r:dque (i+1) hs lrs
              dque i (E:hs)       (lrs)     = E:dque i hs lrs
    

    This function breaks down into interleaved recursive calls to two functions (que and dque), which operate on successive levels of the input tree.

                Tree Levels           Function Calls
                -----------           --------------
    
    level 1         T1           que [T1]           
                  /    \           dque [T1] que [T2, T3]
                 /      \                 dque [T2, T3] que [E, E, E, E]
    level 2     T2       T3
               /  \     /  \
              /    \   /    \
    level 3  E      E E      E
    
    que
    Each recursive call to que descends another level into the tree.

    The

    length [a | T a _ _ <- ts]
    

    portion returns the number of non-empty (T) trees in the current level, ensuring that the numbering of the next level starts at the right number.

    The

    concat [[l, r] | T a l r <- ts]
    

    then packs the children of the current level's non-empty trees into a single list, and passes that list to the next call of que

    dque
    This takes an integer as well as two lists of trees: the first list is one level of the tree, and the second is the next deepest level (produced by the recursive call to que). It runs through the two lists in step, numbering each non-empty node with its integer (incremented per node) and attaching the corresponding pair of already renumbered subtrees from the second list as that node's children; an E consumes no number and no subtrees
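
    For the example tree above this yields 1 at T1's position and 2 and 3
    at T2's and T3's (breadth-first order), with the E leaves left as E.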
  • subset sum ⊆ Exact spanning tree
  • DONE Tiling ⊆ CA Predecessor
  • RPC semantics for OS

Resources

comps sites

OS

data/comps/topics-for-operating-systems.pdf

  • Dinosaur book : read all and do all exercises
  • Networking book: read but don't spend too much time
  • Architecture: get people's notes (taken from the large 4-volume book)

http://pages.cs.wisc.edu/~remzi/Classes/537/Spring2010/notes.html

Vu's Site

https://sites.google.com/a/cs.unm.edu/comps/

if you'd like to edit it – give me your gmail username so I can add you to the owner list

VN -
