CS 451 Programming Paradigms
ML Assignment Two: Multiprocessing a Virtual Machine

In this assignment you will be given some code that provides certain functions and you will be asked to develop the rest of the code. The time you take to understand what is here will make the difference between getting the assignment done (a B or C) and genuinely learning the material that the assignment covers (an A or B).

You will implement a simple virtual machine with two integer registers and four memory locations. The operations of addition (ADD), subtraction (SUB), and multiplication (MUL) can be performed on the registers. Values can also be copied from one register to another (PUT) and placed directly into a register (VPUT). Values in memory locations can be loaded into registers (LPUT), and values can be written to memory locations either directly (WRI with an integer argument) or from a register (WRI with a register argument). Every command has two arguments, and the destination is always the first argument.

You will be simulating an operating system process in which the above virtual machine executes a particular program. We provide test functions that create several processes and direct each of them, in round-robin fashion, to execute a certain number of instructions. A process maintains a "program counter" by keeping the list of instructions that remain to be executed, and it maintains "state" by keeping copies of the register and memory values at the end of each run of instructions.

The following grammar describes the language accepted by the virtual machine. It is written in BNF: a nonterminal (an item in angle brackets) is replaced by any one of the |-separated alternatives to the right of its arrow, recursively. Terminals (the items in quotes or in all capitals) are actual strings of the language and are not replaced further. The empty string "" is what allows <stmts> to end.

<stmts>   ->  <stmt> <stmts>
          |   ""

<stmt> -> "vput" <reg> <val> | "lput" <reg> <loc> | "put" <reg> <reg> | "wri" <loc> <val> | "wri" <loc> <reg> | "add" <reg> <reg> | "mul" <reg> <reg> | "sub" <reg> <reg>

<reg> -> "r1" | "r2"

<loc> -> "l1" | "l2" | "l3" | "l4"

<val> -> ML_INTEGER

A program consists of any number of assembly-like statements. Each statement gives an operation followed by a destination and a source, in that order. For add, sub, and mul, the operation is performed on the values in the two registers named by the arguments, and the result is placed in the register named by the first argument; the first register is therefore both the destination and the first operand (for sub, the second register's value is subtracted from the first's). For put, the value in the register named by the second argument is copied into the register named by the first argument. For vput, the second argument is an integer that is placed in the register named by the first argument (like "load immediate" for those familiar with assembly language). For lput, the second argument is a memory location whose value is placed in the register named by the first argument. The wri command has two forms: if the second argument is an integer, that value is placed in the memory location named by the first argument; if the second argument is a register, the value in that register is placed in the memory location named by the first argument. In every case the operation changes only the location (register or memory) named by the first argument and leaves the second argument untouched. The two registers are named r1 and r2, and the memory locations are named l1, l2, l3, and l4.

Here are three example programs for this machine and their output. They should make the grammar clearer to those who have not seen BNF before. The machine's registers and memory are always initialized to zero at the start of a program. The output below is in a different format from the one your program will produce; the comments in the template code, shown below, demonstrate what your output should look like.

Program1              Program2             Program3
   vput r2 5             wri  l1 5            vput r1 5
   vput r1 ~4            wri  l2 6            vput r2 6
   add  r1 r2            lput r1 l1           add  r1 r2
   vput r2 0             vput r2 3            wri  l2 r1
                         sub  r2 r1           vput r1 7
                                              lput r2 l2
                                              mul  r2 r1

Output1               Output2              Output3
  Register1: 1          Register1: 5         Register1: 7
  Register2: 0          Register2: ~2        Register2: 77
  Memory1: 0            Memory1: 5           Memory1: 0
  Memory2: 0            Memory2: 6           Memory2: 11
  Memory3: 0            Memory3: 0           Memory3: 0
  Memory4: 0            Memory4: 0           Memory4: 0

You will build a parser that follows this grammar as it processes a program for the virtual machine. The functions that correspond to the nonterminals of the grammar also carry out the data manipulations that implement the machine and its operations. The state of the machine is passed as a parameter among these functions. You may want to reread the parsing example in chapter five of the ML text before you start on the assignment.

This virtual machine will be created by a functor. You will also use a functor to create a process from a program structure. The process creates its own copy of a virtual machine and maintains the state for its program. The structures produced by the process functor have an execute function that takes an integer argument. Execute calls the interpret function in the structure produced by the virtual machine functor, which executes that many instructions before returning, updating the state along the way.

You will be given some of the code for this assignment and asked to write several more functions to complete the code for a working process and virtual machine.


The code you are given is in ~jalewis/451/ml/vmprocTemplate.sml on the cs machines. Copy it to your own directory, renaming it vmproc.sml. The same directory also holds three input files and a test program that you can use for testing your code: progone.vm, progtwo.vm, progthree.vm, and vmtester.sml. The code in vmprocTemplate.sml clearly indicates what code you need to write. It will not compile until you write that code (or at least stub it out with parameters and do-nothing function bodies). Each function is accompanied by its type signature and by comments describing that signature and the basic behavior the function must exhibit. We present the code below, with discussion, in sections.

(********************************************************************)
(*                 TYPE DECLARATIONS                                *)
(********************************************************************)
datatype token = INT of int | REG of int | LOC of int |
                 VPUT | LPUT | WRI | PUT | ADD | MUL | SUB

type state = ( (int * int) * (int * int * int * int) )
(********************************************************************)



(********************************************************************)
(*               SIGNATURE  DECLARATIONS                            *)
(********************************************************************)
signature VMPROCESS = 
  sig
    val execute : int -> unit
  end

signature VIRTMACH = 
  sig 
    val interpret : token list * state * int -> token list * state
  end

signature PROGRAM =
  sig
     val sourceFile : TextIO.instream
     val resultFile : TextIO.outstream
  end
(********************************************************************)

First the datatype token is defined. Notice that the tokens for language elements that carry a value as well as a syntactic category (specific instances of general types) have an associated integer: INT, REG, and LOC, for literal values, registers, and memory locations. Second, a type abbreviation, state, is declared for holding the machine state. A state is a pair of tuples: the first is a 2-tuple of the two register values, and the second is a 4-tuple of the four memory values.
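To connect these types with the examples above, here is Program3 written as the token list the lexer should produce, together with Output3 written as a state tuple. The names program3, initialState, and output3 are illustrative only; they do not appear in the template.

```sml
(* Token and state types, copied from the template. *)
datatype token = INT of int | REG of int | LOC of int |
                 VPUT | LPUT | WRI | PUT | ADD | MUL | SUB

type state = ( (int * int) * (int * int * int * int) )

(* Program3 as a token list: each statement is exactly three tokens,
   in the order operation, destination, source. *)
val program3 : token list =
  [ VPUT, REG 1, INT 5,
    VPUT, REG 2, INT 6,
    ADD,  REG 1, REG 2,
    WRI,  LOC 2, REG 1,
    VPUT, REG 1, INT 7,
    LPUT, REG 2, LOC 2,
    MUL,  REG 2, REG 1 ]

(* Every program starts from an all-zero state. *)
val initialState : state = ((0, 0), (0, 0, 0, 0))

(* Output3 as a state tuple: registers first, then the four memory locations. *)
val output3 : state = ((7, 77), (0, 11, 0, 0))
```

Because every statement is exactly three tokens, the length of a well-formed token list is always a multiple of three.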

Next, the signatures for the three kinds of structures in the program are declared. The VMPROCESS signature describes a structure with one visible component, the function execute, which instructs the process's copy of the virtual machine to interpret a given number of instructions of its program. The VIRTMACH signature also describes a structure with one visible component, the function interpret, which actually matches and executes statements from the program's token list. The PROGRAM signature describes a structure that holds an input stream for the program's source and an output stream for its results.

(********************************************************************)
(*                   PROGRAM INSTANCE CREATION                      *)
(********************************************************************)
structure ProgOne : PROGRAM = 
struct
  val sourceFile = TextIO.openIn("progone.vm")
  val resultFile = TextIO.openOut("progone.out")
end

structure ProgTwo : PROGRAM = 
struct
  val sourceFile = TextIO.openIn("progtwo.vm")
  val resultFile = TextIO.openOut("progtwo.out")
end

structure ProgThree : PROGRAM = 
struct
  val sourceFile = TextIO.openIn("progthree.vm")
  val resultFile = TextIO.openOut("progthree.out")
end
(********************************************************************)

These lines create structures that hold the source and result files for each program: the program itself and the place for its output. The .vm files must exist for the program to run. The .out files are created if they do not exist; if they do exist, TextIO.openOut truncates them.

(*****************************************************************************)
(*                          VIRTUAL MACHINE FUNCTOR                          *)
(*****************************************************************************)
functor makeVirtualMachine(val outfile : TextIO.outstream) : VIRTMACH = 
struct

val outf : TextIO.outstream = outfile;
exception Syntax of string
fun syntaxError s = raise (Syntax s)

The functor makeVirtualMachine takes one argument, an output stream (TextIO.outstream) that is already open for writing; as will be seen, it comes from the program structure. The functor defines the Syntax exception and a local value, outf, that is a copy of the received stream. The syntaxError function is shorthand for raising the Syntax exception.
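The parameter of makeVirtualMachine is written as a value specification rather than as a whole structure. Here is a minimal sketch of the same pattern, using a hypothetical LOGGER signature and makeLogger functor that are not part of the template: the functor keeps a local copy of the stream, and the application passes its argument as a declaration, the same way makeVMProcess later writes makeVirtualMachine(val outfile = resultFile).

```sml
(* Hypothetical signature and functor, for illustration only. *)
signature LOGGER =
  sig
    val log : string -> unit
  end

(* The argument is a value specification, like makeVirtualMachine's. *)
functor makeLogger(val outfile : TextIO.outstream) : LOGGER =
struct
  val outf : TextIO.outstream = outfile   (* local copy of the argument *)
  fun log s = (TextIO.output (outf, s ^ "\n"); TextIO.flushOut outf)
end

(* The application supplies the argument as a declaration. *)
structure ConsoleLog : LOGGER = makeLogger(val outfile = TextIO.stdOut)

val () = ConsoleLog.log "Registers: 0 0"
```

Since outf is not mentioned in LOGGER, the signature ascription hides it; only log is visible from outside, just as only interpret is visible from a VIRTMACH structure.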

(*======================FUNCTIONS YOU MUST WRITE=====================*)
fun printState : state -> unit 
     (* state -> output to file *)
     (* Prints output to the outfile that looks like
           Registers: 1 2
           Memory: 1 0 0 4
     *)

fun regStore : int * int * state -> state
     (* reg#, value, oldstate -> newstate *)
     (* Puts value in reg# and returns the newstate tuple *)
     (* Raises exception Syntax with appropriate message if # is not 1 or 2 *)

fun memStore : int * int * state -> state
     (* mem#, value, oldstate -> newstate *)
     (* Puts value in mem# and returns the newstate tuple *)
     (* Raises exception Syntax with appropriate message if # is not 1 - 4 *)

fun regGet : int * state -> int 
     (* reg#, state -> value *)
     (* Returns the value in reg# obtained from the state *)
     (* Raises exception Syntax with appropriate message if # is not 1 or 2 *)

fun memGet : int * state -> int
     (* mem#, state -> value *)
     (* Returns the value in mem# obtained from the state *)
     (* Raises exception Syntax with appropriate message if # is not 1 - 4 *)

fun stmt : token * token * token * state -> state 
     (* three tokens from the token list, oldstate -> newstate *)
     (* This is the heart of the parsing.  Given three tokens
        that have been successfully removed from the token list by stmts,
        match on the particulars and do the processing.  Also match
        the many different incorrect combinations/range violations and
        raise exception Syntax with appropriate messages *)

fun stmts : token list * state * int -> token list * state 
     (* old token list, oldstate, instruction count -> (new token list, newstate) *) 
     (* If the token list is empty, close the outfile and return the tuple (nil,s),
        where s is the state.  Otherwise try to match three tokens from the list,
        using them to call stmt.  That result is the newstate.  Print it,
        and then, if the instruction count is not zero, recurse, decrementing the count.
        If it is zero, print to outfile the line "======> CONTEXT SWITCH" and
        return the tuple (remaining token list, newstate).
        Also raise exception Syntax if there is no successful match. *)

(*===================================================================*)

You must write each of the functions above. The type signatures show how each function fits into the whole program, and the comments describe the details of its behavior.
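One possible sketch of the helpers and stmt follows, assuming the template's token and state types. It is not the reference solution: the error messages, and the choice of which bad combinations get their own clause, are illustrative. The helpers take the state apart with tuple patterns and build a new tuple, and stmt has one clause per legal statement form followed by error clauses.

```sml
datatype token = INT of int | REG of int | LOC of int |
                 VPUT | LPUT | WRI | PUT | ADD | MUL | SUB
type state = ( (int * int) * (int * int * int * int) )

exception Syntax of string
fun syntaxError s = raise (Syntax s)

(* Tuple patterns take the old state apart; the result is a new tuple. *)
fun regStore (1, v, ((_, r2), m)) = ((v, r2), m)
  | regStore (2, v, ((r1, _), m)) = ((r1, v), m)
  | regStore (n, _, _) = syntaxError ("no register r" ^ Int.toString n)

fun memStore (1, v, (r, (_, b, c, d))) = (r, (v, b, c, d))
  | memStore (2, v, (r, (a, _, c, d))) = (r, (a, v, c, d))
  | memStore (3, v, (r, (a, b, _, d))) = (r, (a, b, v, d))
  | memStore (4, v, (r, (a, b, c, _))) = (r, (a, b, c, v))
  | memStore (n, _, _) = syntaxError ("no memory location l" ^ Int.toString n)

fun regGet (1, ((r1, _), _)) = r1
  | regGet (2, ((_, r2), _)) = r2
  | regGet (n, _) = syntaxError ("no register r" ^ Int.toString n)

fun memGet (1, (_, (a, _, _, _))) = a
  | memGet (2, (_, (_, b, _, _))) = b
  | memGet (3, (_, (_, _, c, _))) = c
  | memGet (4, (_, (_, _, _, d))) = d
  | memGet (n, _) = syntaxError ("no memory location l" ^ Int.toString n)

(* One clause per legal statement form (destination first, then source). *)
fun stmt (VPUT, REG d, INT v, s) = regStore (d, v, s)
  | stmt (PUT,  REG d, REG r, s) = regStore (d, regGet (r, s), s)
  | stmt (LPUT, REG d, LOC l, s) = regStore (d, memGet (l, s), s)
  | stmt (WRI,  LOC d, INT v, s) = memStore (d, v, s)
  | stmt (WRI,  LOC d, REG r, s) = memStore (d, regGet (r, s), s)
  | stmt (ADD,  REG d, REG r, s) = regStore (d, regGet (d, s) + regGet (r, s), s)
  | stmt (SUB,  REG d, REG r, s) = regStore (d, regGet (d, s) - regGet (r, s), s)
  | stmt (MUL,  REG d, REG r, s) = regStore (d, regGet (d, s) * regGet (r, s), s)
  (* A few of the many possible error clauses. *)
  | stmt (WRI, _, _, _) = syntaxError "wri needs a location, then an integer or register"
  | stmt (VPUT, _, _, _) = syntaxError "vput needs a register, then an integer"
  | stmt (LPUT, _, _, _) = syntaxError "lput needs a register, then a location"
  | stmt (_, _, _, _) = syntaxError "malformed statement"
```

Running the seven statements of Program3 through stmt, starting from the all-zero state, ends in ((7, 77), (0, 11, 0, 0)), which matches Output3.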

fun interpret(tokenStream : token list, s : state, numInstr : int ) : token list * state  =
  stmts(tokenStream,s,numInstr)
  handle 
   Syntax s => (TextIO.output(outf,"Syntax error: "^s^"\n"); (nil,((0,0),(0,0,0,0))) ) 

end 
(*********************************************************************************)

The interpret function is given to you. It takes the current token list, obtained from the process running this particular virtual machine (see below); the current state, obtained the same way; and the number of instructions to execute, which comes from the argument to the process's execute function. Notice that the handler expression returns a value of the same type as interpret, using an empty token list and an all-zero state as dummy values. The stmts function is responsible for decrementing numInstr with each recursive call to itself and for stopping when numInstr reaches zero.
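Here is a sketch of how stmts can consume three tokens per statement, count instructions, and stop for a context switch. To stay self-contained it uses a cut-down stmt that only understands vput and "add r1 r2", writes to a hypothetical file stmts_sketch.out, and assumes that a call with count n should execute exactly n statements per time slice. The template comment can also be read as "recurse while the count is not zero", so check the exact counting rule against vmtester.sml before relying on this choice.

```sml
datatype token = INT of int | REG of int | LOC of int |
                 VPUT | LPUT | WRI | PUT | ADD | MUL | SUB
type state = ( (int * int) * (int * int * int * int) )
exception Syntax of string

(* Hypothetical output file; in the template, outf comes from the functor argument. *)
val outf : TextIO.outstream = TextIO.openOut "stmts_sketch.out"

(* Cut-down stand-in for stmt that knows only vput and "add r1 r2". *)
fun stmt (VPUT, REG 1, INT v, ((_, r2), m)) = ((v, r2), m)
  | stmt (VPUT, REG 2, INT v, ((r1, _), m)) = ((r1, v), m)
  | stmt (ADD, REG 1, REG 2, ((r1, r2), m)) = ((r1 + r2, r2), m)
  | stmt _ = raise Syntax "statement not supported by this sketch"

(* Writes the format shown in the template's printState comment. *)
fun printState (((r1, r2), (m1, m2, m3, m4)) : state) : unit =
  TextIO.output (outf,
    "Registers: " ^ Int.toString r1 ^ " " ^ Int.toString r2 ^ "\n" ^
    "Memory: " ^ String.concatWith " " (map Int.toString [m1, m2, m3, m4]) ^ "\n")

(* Assumes the count n means "run exactly n statements in this slice". *)
fun stmts (nil, s, _) = (TextIO.closeOut outf; (nil, s))
  | stmts (oper :: dst :: src :: rest, s, n) =
      let
        val s' = stmt (oper, dst, src, s)
      in
        printState s';
        if n > 1
        then stmts (rest, s', n - 1)
        else (TextIO.output (outf, "======> CONTEXT SWITCH\n"); (rest, s'))
      end
  | stmts (_, _, _) = raise Syntax "incomplete statement at end of program"

(* interpret, as given in the template. *)
fun interpret (tokenStream : token list, s : state, numInstr : int) : token list * state =
  stmts (tokenStream, s, numInstr)
  handle Syntax msg =>
    (TextIO.output (outf, "Syntax error: " ^ msg ^ "\n"); (nil, ((0, 0), (0, 0, 0, 0))))
```

Running Program1 with a count of 3 stops after the add with one statement (three tokens) left over; the next call finishes that statement, reaches the empty list, and closes the output file.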

(*********************************************************************)
(*                          VMPROCESS FUNCTOR                          *)
(*********************************************************************)
functor makeVMProcess(prog : PROGRAM ) : VMPROCESS
=
struct

(******************************************)
(* Some utilities for the parsing process *)
(******************************************)
exception Invalid_Register of int
exception Invalid_Memory of int 
exception Lexical of string

fun regError k = raise (Invalid_Register k)
fun memError k = raise (Invalid_Memory k)
fun lexicalError s = raise (Lexical s)

The functor takes an instance of a PROGRAM structure, uses the makeVirtualMachine functor to create an instance of a virtual machine, and combines these with reference variables for the current token list and the machine state. The three exceptions are used during lexical analysis, which maps the token function over the list of whitespace-separated strings from the input. Invalid_Register and Invalid_Memory are raised only for a register or location token whose digit is out of range, such as r3 or l7. All other lexical errors, such as muq instead of mul, cause Lexical to be raised. Again, the three functions are simply shorthand for raising the exceptions.

fun token(s : string) : token =
  if Char.isDigit(String.sub(s,0)) orelse (String.sub(s,0) = #"~")
  then case Int.fromString(s)
       of SOME n => if (String.size(Int.toString(n))=String.size(s))
                    then INT n
                    else lexicalError ("Bad numeric token "^s)
       |  NONE   => lexicalError ("Bad numeric token "^s)
  else
    if String.size(s)=2
    then
      if (String.sub(s,0) = #"r")
      then
        if Char.isDigit(String.sub(s,1))
        then case Int.fromString(Char.toString(String.sub(s,1)))
             of SOME n => if (n<1) orelse (n>2)
                          then regError(n)
                          else REG n
             |  NONE   => lexicalError ("Bad register token "^s)
        else lexicalError("Bad register token "^s)
      else
        if (String.sub(s,0) = #"l")
        then
          if Char.isDigit(String.sub(s,1))
          then case (Int.fromString(Char.toString(String.sub(s,1))))
               of SOME n => if (n<1) orelse (n>4)
                            then memError(n)
                            else LOC n
               |  NONE   => lexicalError ("Bad token "^s)
          else lexicalError("Bad memory location token "^s)
        else lexicalError("Bad memory location or register token "^s)
    else
      case s
         of "vput"  => VPUT
         |  "put"   => PUT
         |  "add"   => ADD
         |  "mul"   => MUL
         |  "sub"   => SUB
         |  "lput"  => LPUT
         |  "wri"   => WRI
         |    _     => lexicalError ("Bad token "^s)

fun getStrings(infile: TextIO.instream) : string list =
  (* inputAll reads the whole stream; TextIO.input may return only part of it *)
  String.tokens Char.isSpace (TextIO.inputAll infile)

The token function is the lexical scanner. It takes a single string, determines which value of datatype token to return, and extracts any lexeme that must be bound to the token, such as the 5 in INT 5. The getStrings function reads the source file and splits it into the whitespace-separated strings that token is applied to.
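The two library calls that the scanner leans on can be tried on their own. In the sketch below, words shows the splitting that getStrings performs, and the hypothetical helper isCleanInt isolates token's numeric check: Int.fromString accepts any numeric prefix of a string, so token converts the number back with Int.toString and compares lengths to reject strings such as "5x" or "05".

```sml
(* getStrings splits raw text on whitespace; String.tokens drops empty fields. *)
val words : string list =
  String.tokens Char.isSpace "vput r1 5\n  vput r2 ~4\nadd r1 r2\n"

(* Hypothetical helper isolating the numeric check in token. *)
fun isCleanInt (s : string) : bool =
  case Int.fromString s of
      SOME n => String.size (Int.toString n) = String.size s
    | NONE => false

val checks = map isCleanInt ["5", "~4", "5x", "05", "~"]
```

One gap worth noticing: in the Basis Library, Int.fromString raises Overflow for a literal too large to fit in an int, and token does not handle that exception.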

(***********************************************)
(* Local state for tracking execution progress *)
(***********************************************)

val sourceFile : TextIO.instream = prog.sourceFile;
val resultFile : TextIO.outstream = prog.resultFile;
structure virtmach : VIRTMACH = makeVirtualMachine(val outfile  = resultFile);

The input and output streams for the process are just copies of the streams in the program structure the process is built from. The virtual machine writes the results, so it receives the output stream when it is created.

(*======================CODE YOU MUST WRITE=====================*)
val tokencoding : token list ref
       (* set this equal to a reference to the result of mapping  
					token over the result of getStrings applied to sourceFile 
          and handle exceptions here *)

Map the token function over the result of applying getStrings to sourceFile, and bind tokencoding to a reference containing the resulting list. This happens just once, when the functor application creates the structure. At the end of each execute call, replace that token list with the one returned by the virtual machine, from which the executed instructions have been removed.
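A sketch of this binding, including the exception handling the template asks for. To keep it self-contained it uses a cut-down stand-in for token that recognizes only a few strings, and TextIO.openString in place of prog.sourceFile. Reporting the error and falling back to an empty program is an assumption; the template only says to handle the exceptions here.

```sml
datatype token = INT of int | REG of int | LOC of int |
                 VPUT | LPUT | WRI | PUT | ADD | MUL | SUB

exception Invalid_Register of int
exception Invalid_Memory of int
exception Lexical of string

(* Cut-down stand-in for the template's token function. *)
fun token "vput" = VPUT
  | token "add" = ADD
  | token "r1" = REG 1
  | token "r2" = REG 2
  | token "r3" = raise Invalid_Register 3
  | token s =
      (case Int.fromString s of
           SOME n => INT n
         | NONE => raise Lexical ("Bad token " ^ s))

fun getStrings (infile : TextIO.instream) : string list =
  String.tokens Char.isSpace (TextIO.inputAll infile)

(* Stand-in for prog.sourceFile: TextIO.openString reads from a string. *)
val sourceFile : TextIO.instream =
  TextIO.openString "vput r1 5\nvput r2 6\nadd r1 r2\n"

(* Lex the whole program once; on a lexical error, report it and fall back
   to an empty program (the fallback is an assumption, not a requirement). *)
val tokencoding : token list ref =
  ref (map token (getStrings sourceFile))
  handle Lexical s => (print ("Lexical error: " ^ s ^ "\n"); ref nil)
       | Invalid_Register k =>
           (print ("Invalid register: r" ^ Int.toString k ^ "\n"); ref nil)
       | Invalid_Memory k =>
           (print ("Invalid memory location: l" ^ Int.toString k ^ "\n"); ref nil)
```

Inside makeVMProcess you would write sourceFile and token exactly as the template provides them; only the handler clauses are yours to design.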

(*****************************************************)
(*   Provide int ref's for                           *)
(*                                                   *)
(*   reg1, reg2, mem1, mem2, mem3, and mem4          *)
(*                                                   *)
(*   and initialize them to reference values of zero *)
(*****************************************************)

You need six references to integers, each initialized to zero. They hold the register and memory values saved when a partial execution concludes. They are references so that they can be updated for the next execution.
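Because each application of a functor evaluates its body again, every process structure gets its own fresh set of references, so processes never share saved state. A minimal sketch of that behavior, using a hypothetical makeSaved functor that declares just two of the six references (reg1 and mem1) for brevity:

```sml
(* Hypothetical signature exposing two of the six saved values. *)
signature SAVED =
  sig
    val reg1 : int ref
    val mem1 : int ref
  end

(* Each application of the functor creates fresh references set to zero. *)
functor makeSaved () : SAVED =
struct
  val reg1 : int ref = ref 0
  val mem1 : int ref = ref 0
end

structure ProcA = makeSaved ()
structure ProcB = makeSaved ()

(* Read with ! and update with := ; ProcB's register is left untouched. *)
val () = ProcA.reg1 := !(ProcA.reg1) + 42
```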

fun setState : state -> unit 
      (* newstate -> reference variables for state updated *)

fun getState : unit -> state
      (* no arguments -> current state tuple *)

fun execute : int -> unit 
      (* number of instructions to process -> output of state at each instruction to file *)
			(*                                      and update of state and remaining token list *)
(*==============================================================*)

end
(******************************************************************)

The function setState takes a state tuple and writes each of its components to the corresponding reference variable inside the process structure. The function getState reads these reference variables and builds a state tuple from them to return to the caller. The function execute calls the virtual machine's interpret function with the current token list, the current state, and the number of instructions to execute. When interpret returns, execute updates the saved state and the token list.
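A sketch of setState, getState, and execute under stated assumptions: the interpret below is a hypothetical stand-in for virtmach.interpret that only discards n statements (three tokens each) and returns the state unchanged, so the reference plumbing can be shown without the rest of the machine.

```sml
datatype token = INT of int | REG of int | LOC of int |
                 VPUT | LPUT | WRI | PUT | ADD | MUL | SUB
type state = ( (int * int) * (int * int * int * int) )

(* The six saved values, all starting at zero. *)
val reg1 = ref 0
val reg2 = ref 0
val mem1 = ref 0
val mem2 = ref 0
val mem3 = ref 0
val mem4 = ref 0

(* Copy each component of the tuple into its reference. *)
fun setState (((r1, r2), (m1, m2, m3, m4)) : state) : unit =
  (reg1 := r1; reg2 := r2; mem1 := m1; mem2 := m2; mem3 := m3; mem4 := m4)

(* Rebuild a state tuple from the references. *)
fun getState () : state =
  ((!reg1, !reg2), (!mem1, !mem2, !mem3, !mem4))

(* Hypothetical stand-in for virtmach.interpret: drops n statements only. *)
fun interpret (toks : token list, s : state, n : int) : token list * state =
  if n <= 0 orelse null toks then (toks, s)
  else interpret (List.drop (toks, Int.min (3, length toks)), s, n - 1)

val tokencoding : token list ref =
  ref [VPUT, REG 1, INT 5, VPUT, REG 2, INT 6, ADD, REG 1, REG 2]

(* Run one time slice, then save the remaining tokens and the new state. *)
fun execute (n : int) : unit =
  let
    val (rest, newState) = interpret (!tokencoding, getState (), n)
  in
    tokencoding := rest;
    setState newState
  end
```

In the real functor, interpret would be virtmach.interpret and tokencoding would be built from the source file; the shape of execute stays the same.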


Two features of ML are crucial to your functions in this assignment and are among the primary reasons for the assignment. If you use them carefully the work will be much easier and your understanding of ML and of the essentials of this course will be better. Before you choose to implement the assignment in one of the numerous other ways it can be done, speak to one of us!

The first feature is the use of tuples as first-class values. You've seen them passed as parameters and seen how names are bound to their elements there. The same element-wise binding works in value declarations, such as the let-binding let val (x,y) = <some 2-tuple> in x + y end. Functions can also return tuples as values.
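A small illustration of both uses; divAndMod is a hypothetical helper that returns a tuple, and the nested pattern at the end has the same shape as the state type.

```sml
(* A function that returns a 2-tuple. *)
fun divAndMod (a : int, b : int) : int * int = (a div b, a mod b)

(* let-binding pulls the returned tuple apart element by element. *)
val total =
  let val (q, r) = divAndMod (17, 5)
  in q + r end

(* Nested tuples bind the same way; _ ignores a component. *)
val ((r1, r2), (m1, _, _, m4)) = ((1, 2), (3, 4, 5, 6))
```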

The second feature is list-element matching in parameter binding. For instance, if the first argument passed to a function is a list and the formal parameter matches that argument using the cons operator, as in el::els, then el will be bound to the first element of the list and els will be bound to the rest or tail (or cdr) of the list. This is very useful in recursive functions that take action based on the head of the list and then recurse on the tail. It provides a local name-binding for these pieces.
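A short sketch of list patterns. sumList uses exactly the el::els form described above, and the hypothetical countTriples shows that a pattern can take several elements at once, which is how a function like stmts can pull one statement's three tokens off the front of the list.

```sml
(* el is bound to the head of the list and els to its tail. *)
fun sumList nil = 0
  | sumList (el :: els) = el + sumList els

(* Patterns can take several elements at once; leftovers match the last clause. *)
fun countTriples (_ :: _ :: _ :: rest) = 1 + countTriples rest
  | countTriples _ = 0

val sum = sumList [1, 2, 3, 4]
val groups = countTriples ["vput", "r1", "5", "add", "r1", "r2", "r2"]
```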

Matching and exception handling are also both very important, as they are ubiquitous in ML (and elsewhere for that matter).
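A small sketch combining the two: the hypothetical regIndex matches on string constants and raises an exception that carries a message, and describe uses handle to match the exception constructor and bind that message, much as interpret does with Syntax.

```sml
exception Syntax of string

(* case matches string constants; the wildcard catches everything else. *)
fun regIndex (s : string) : int =
  case s of
      "r1" => 1
    | "r2" => 2
    | _ => raise Syntax ("not a register: " ^ s)

(* handle matches on the exception constructor and binds its message. *)
fun describe (s : string) : string =
  "register " ^ Int.toString (regIndex s)
  handle Syntax msg => "error: " ^ msg
```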

You should include as much error-checking as you can. You should find at least four syntactic violations (there are many more than that) for which you can raise an exception and give an appropriate message.


The following example shows how to test your code once it compiles.

-use "vmproc.sml";
<compiler responses>

-use "vmtester.sml";
<compiler responses>

Then check the results in the .out files.


You can use the .vm files in the directory given above to test your work as you develop it. These programs do not exercise any lexical or syntactic errors in the input; you should test those yourself. Just before the assignment is due you will be given a new set of test files, which will test errors. Turn in your code and the results of those tests in class, along with the results of any additional tests you choose to add that exercise functionality beyond what is required.
