MASSACHUSETTS INSTITUTE OF TECHNOLOGY
ARTIFICIAL INTELLIGENCE LABORATORY

AI Memo 379                                              November 1976

                  LAMBDA: THE ULTIMATE DECLARATIVE

                      by Guy Lewis Steele Jr. *

Abstract:

In this paper, a sequel to LAMBDA: The Ultimate Imperative, a new view of LAMBDA as a renaming operator is presented and contrasted with the usual functional view taken by LISP. This view, combined with the view of function invocation as a kind of generalized GOTO, leads to several new insights into the nature of the LISP evaluation mechanism and the symmetry between form and function, evaluation and application, and control and environment. It also complements Hewitt's actors theory nicely, explaining the intent of environment manipulation as cleanly, generally, and intuitively as the actors theory explains control structures. The relationship between functional and continuation-passing styles of programming is also clarified.

This view of LAMBDA leads directly to a number of specific techniques for use by an optimizing compiler:
(1) Temporary locations and user-declared variables may be allocated in a uniform manner.
(2) Procedurally defined data structures may compile into code as good as would be expected for data defined by the more usual declarative means.
(3) Lambda-calculus-theoretic models of such constructs as GOTO, DO loops, call-by-name, etc. may be used directly as macros, the expansion of which may then compile into code as good as that produced by compilers which are designed especially to handle GOTO, DO, etc.

The necessary characteristics of such a compiler designed according to this philosophy are discussed. Such a compiler is to be built in the near future as a testing ground for these ideas.

Keywords: environments, lambda-calculus, procedurally defined data, data types, optimizing compilers, control structures, function invocation, temporary variables, continuation passing, actors, lexical scoping, dynamic binding

This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N00014-75-C-0643.

* NSF Fellow

Contents

1. A Different View of LAMBDA
   1.1. Primitive Operations in Programming Languages
   1.2. Function Invocation: The Ultimate Imperative
   1.3. LAMBDA as a Renaming Operator
   1.4. An Example: Compiling a Simple Function
   1.5. Who Pops the Return Address?
2. Lexical and Dynamic Binding
3. LAMBDA, Actors, and Continuations
   3.1. Actors = Closures (mod Syntax)
   3.2. The Procedural View of Data Types
4. Some Proposed Organization for a Compiler
   4.1. Basic Issues
   4.2. Some Side Issues
5. Conclusions
Appendix A. Conversion to Continuation-Passing Style
Appendix B. Continuation-Passing with Multiple Value Return
Notes
References

Acknowledgements

Thanks are due to Gerald Sussman, Carl Hewitt, Allen Brown, Jon Doyle, Richard Stallman, and Richard Zippel for discussing the issues presented here and for proofreading various drafts of the document.

An earlier version of this document was submitted in April 1976 to the Department of Electrical Engineering and Computer Science at MIT in the form of a proposal for research towards a Master's Thesis.
1. A Different View of LAMBDA

Historically, LAMBDA expressions in LISP have been viewed as functions: objects which, when applied to ordered sets of arguments, yield single values. These single values typically then become arguments for yet other functions. The consistent use of functions in LISP leads to what is called the applicative programming style. Here we discuss a more general view, of which the functional view will turn out to be a special case. We will consider a new interpretation of LAMBDA as an environment operator which performs the primitive declarative operation of renaming a quantity, and we will consider a function call to be a primitive unconditional imperative operator which includes GOTO as a special case. (In an earlier paper [Steele 76] we described LAMBDA as "the ultimate imperative". Here we assert that this was unfortunately misleading, for it is function invocation which is imperative.)

1.1. Primitive Operations in Programming Languages

What are the primitive operations common to all high-level programming languages? It is the data manipulation primitives which most clearly differentiate high-level languages: FORTRAN has numbers, characters, and arrays; PL/I has strings and structures as well; LISP has list cells and atomic symbols. All have, however, similar notions of control structures and of variables. If we ignore the various data types and data manipulation primitives, we find that only a few primitive ideas are left. Some of these are:

    Transfer of control
    Environment operations
    Side effects
    Process synchronization

Transfer of control may be subdivided into conditional and unconditional transfers. Environment operations include binding of variables on function entry, declaration of local variables, and so on. Side effects include not only modifications to data structures, but altering of global variables and input/output. Process synchronization includes such issues as resource allocation and passing of information between processes in a consistent manner.

Large numbers of primitive constructs are provided in contemporary programming languages for these purposes. The following short catalog is by no means complete, but only representative:

    Transfer of control
        Sequential blocks
        GOTO
        IF-THEN-ELSE
        WHILE-DO, REPEAT-UNTIL, and other loops
        CASE
        SELECT
        EXIT (also known as ESCAPE or CATCH/THROW)
        Decision tables
    Environment operations
        Formal procedure parameters
        Declarations within blocks
        Assignments to local variables
        Pattern matching
    Side effects
        Assignments to global (or COMMON) variables
        Input/output
        Assignments to array elements
        Assignments to other data structures
    Process synchronization
        Semaphores
        Critical regions
        Monitors
        Path expressions

Often attempts are made to reduce the number of operations of each type to some minimal set. Thus, for example, there have been proofs that sequential blocks, IF-THEN-ELSE, and WHILE-DO form a complete set of control operations. One can even do without IF-THEN-ELSE, though the technique for eliminating it seems to produce more rather than less complexity. {Note No IF-THEN-ELSE} A minimal set should contain primitives which are not only universal but also easy to describe, simple to implement, and capable of describing more complex constructs in a straightforward manner. This is why the semaphore is still commonly used: its simplicity makes it easy to describe as well as implement, and it can be used to describe more complex synchronization operators.
The expositors of monitors and path expressions, for example, go to great lengths to describe them in terms of semaphores [Hoare 74] [Campbell 74]; but it would be difficult to describe either of these "high-level" synchronization constructs in terms of the other. With the criteria of simplicity, universality, and expressive power in mind, let us consider some choices for sets of control and environment operators. Side effects and process synchronization will not be treated further in this paper.

1.2. Function Invocation: The Ultimate Imperative

The essential characteristic of a control operator is that it transfers control. It may do this in a more or less disciplined way, but this discipline is generally more conceptual than actual; to put it another way, "down underneath, DO, CASE, and SELECT all compile into IFs and GOTOs". This is why many people resist the elimination of GOTO from high-level languages; just as the semaphore seems to be a fundamental synchronization primitive, so the GOTO seems to be a fundamental control primitive from which, together with IF, any more complex one can be constructed if necessary. (There has been a recent controversy over the nested IF-THEN-ELSE as well. Alternatives such as repetitions of tests or decision tables have been examined. However, there is no denying that IF-THEN-ELSE seems to be the simplest conditional control operator easily capable of expressing all others.)

One of the difficulties of using GOTO, however, is that to communicate information from the code gone from to the code gone to it is necessary to use global variables. This was a fundamental difficulty with the CONNIVER language [McDermott 74], for example; while CONNIVER allowed great flexibility in its control structures, the passing around of data was so undisciplined as to be completely unmanageable. It would be nice if we had some primitive which passed some data along while performing a GOTO. It turns out that almost every high-level programming language already has such a primitive: the function call! This construct is almost always completely ignored by those who catalog control constructs; whether it is because function calling is taken for granted, or because it is not considered a true control construct, I do not know. One might suspect that there is a bias against function calling because it is typically implemented as a complex, slow operation, often involving much saving of registers, allocation of temporary storage, etc. {Note Expensive Procedures}

Let us consider the claim that a function invocation is equivalent to a GOTO which passes some data. But what about the traditional view of a function call which expects a returned value? The standard scenario for a function call runs something like this:

    (1) Calculate the arguments and put them where the function expects to find them.
    (2) Call the function, saving a return address (on the PDP-10, for example, a PUSHJ instruction is used, which transfers control to the function after saving a return address on a pushdown stack).
    (3) The function calculates a value and puts it where its caller can get it.
    (4) The function returns to the saved address, throwing the saved address away (on the PDP-10, this is done with a POPJ instruction, which pops an address off the stack and jumps to that address).

It would appear that the saved return address is necessary to the scenario. If we always compile a function invocation as a pure GOTO instead, how can the function know where to return?
To answer this we must consider carefully the steps logically required in order to compute the value of a function applied to a set of arguments. Suppose we have a function BAR defined as:

    (DEFINE BAR
            (LAMBDA (X Y)
                    (F (G X) (H Y))))

In a typical LISP implementation, when we arrive at the code for BAR we expect to have two computed quantities, the arguments, plus a return address, probably on the control stack. Once we have entered BAR and given the names X and Y to the arguments, we must invoke the three functions denoted by F, G, and H. When we invoke G or H, it is necessary to supply a return address, because we must eventually return to the code in BAR to complete the computation by invoking F. But we do not have to supply a return address to F; we can merely perform a GOTO, and F will inherit the return address originally supplied to BAR.

Let us simulate the behavior of a PDP-10 pushdown stack to see why this is true. If we consistently used PUSHJ for calling a function and POPJ for returning from one, then the code for BAR, F, G, and H would look something like this:

    BAR:   ...
           PUSHJ G
    BAR1:  ...
           PUSHJ H
    BAR2:  ...
           PUSHJ F
    BAR3:  POPJ

    F:     ...
           POPJ

    G:     ...
           POPJ

    H:     ...
           POPJ

We have labeled not only the entry points to the functions, but also a few key points within BAR, for expository purposes. We are justified in putting no ellipsis between the PUSHJ F and the POPJ in BAR, because we assume that no cleanup other than the POPJ is necessary, and because the value returned by F (in the assumed RESULT register) will be returned from BAR also.

Let us depict a pushdown stack as a list growing towards the right. On arrival at BAR, the caller of BAR has left a return address on the stack. On executing the PUSHJ G, we enter the function G after leaving a return address BAR1 on the stack:

    ..., <return address for BAR>, BAR1

The function G may call other functions in turn, adding other return addresses to the stack, but these other functions will pop them again on exit, and so on arrival at the POPJ in G the stack is the same. The POPJ pops the address BAR1 and jumps there, leaving the stack like this:

    ..., <return address for BAR>

In a similar manner, the address BAR2 is pushed when H is called, and H pops this address on exit. The same is true of F and BAR3. On return from F, the POPJ in BAR is executed, and the return address supplied by BAR's caller is popped and jumped to. Notice that during the execution of F the stack looks like this:

    ..., <return address for BAR>, BAR3, ...

Suppose that at the end of BAR we replaced the PUSHJ F, POPJ by GOTO F. Then on arrival at the GOTO the stack would look like this:

    ..., <return address for BAR>

The stack would look this way on arrival at the POPJ in F, and so F would pop this return address and return to BAR's caller. The net effect is as before. The value returned by F has been returned to BAR's caller, and the stack was left the same. The only difference was that one fewer stack slot was consumed during the execution of F, because we did not push the address BAR3.

Thus we see that F may be invoked in a manner different from the way in which G and H are invoked. This fact is somewhat disturbing. We would like our function invocation mechanism to be uniform, not only for aesthetic reasons, but so that functions may be compiled separately and linked up at run time with a minimum of special-case interfacing. Uniformity is achieved in some LISPs by always using PUSHJ and never GOTO, but this is at the expense of using more stack space than logically necessary: at the end of every function X the sequence "PUSHJ Y; POPJ" will occur, where Y is the last function invoked by X, requiring a logically unnecessary return address pointing to a POPJ. {Note Debugging}
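To see the same distinction in source form, here is BAR transcribed into present-day Scheme (a sketch of my own, not from the memo; the bodies given for F, G, and H are arbitrary stand-ins chosen only to make the example runnable). The calls to G and H occur in argument position and so each needs a return address, while the outer call to F is the last act of BAR and can simply inherit BAR's return address:

    (define (f p q) (list p q))   ; stand-in definitions so the example runs
    (define (g x) (* x x))
    (define (h y) (+ y 1))

    (define (bar x y)
      (f (g x)        ; non-tail call: control must come back here
         (h y)))      ; non-tail call: control must come back here
                      ; the enclosing call to F is a tail call: GOTO F

    (bar 3 4)         ; => (9 5); F receives BAR's return address

A properly tail-recursive implementation compiles the call to F as just such a GOTO.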
An alternate approach is suggested by the implementation of the SCHEME interpreter. [Sussman 75] We note that the textual difference between the calls on F and G is that the call on G is nested as an argument to another function call, whereas the call to F is not. This suggests that we save a return address on the stack when we begin to evaluate a form (function call) which is to provide an argument for another function, rather than when we invoke the function. (The SCHEME interpreter works in exactly this way.) This discipline produces a rather elegant symmetry: evaluation of forms (function invocation) pushes additional control stack, and application of functions (function entry and the consequent binding of variables) pushes additional environment stack.

Thus for BAR we would compile approximately the following code:

    BAR:   PUSH [BAR1]            ;save return address for (G X)
           <set up arguments for G>
           GOTO G                 ;call function G
    BAR1:  PUSH [BAR2]            ;save return address for (H Y)
           <set up arguments for H>
           GOTO H                 ;call function H
    BAR2:  <set up arguments for F>
           GOTO F                 ;call function F

The instruction PUSH [X] pushes the address X on the stack. Note that no code appears in BAR which ever pops a return address off the stack; it pushes return addresses for G and H, but G and H are responsible for popping them, and BAR passes its own return address implicitly to F without popping it. This point is extremely important, and we shall return to it later.

Those familiar with the MacLISP compiler will recognize the code of the previous example as being similar to the "LSUBR" calling convention. Under this convention, more than just return addresses are kept on the control stack; a function receives its arguments on the stack, above the return address. Thus, when BAR is entered, there are (at least) three items on the stack: the last argument, Y, is on top; below that, the previous (and in fact first) one, X; and below that, the return address. The complete code for BAR might look like this:

    BAR:   PUSH [BAR1]            ;save return address for (G X)
           PUSH -2(P)             ;push a copy of X
           GOTO G                 ;call function G
    BAR1:  PUSH RESULT            ;result of G is in RESULT register
           PUSH [BAR2]            ;save return address for (H Y)
           PUSH -2(P)             ;push a copy of Y
           GOTO H                 ;call function H
    BAR2:  POP -2(P)              ;clobber X with result of G
           MOVEM RESULT,(P)       ;clobber Y with result of H
           GOTO F                 ;call function F

(There is some tricky code at point BAR2: on return from H the stack looks like:

    ..., <return address>, X, Y, <result of G>

After the POP instruction, the stack looks like:

    ..., <return address>, <result of G>, Y

That is, the top item of the stack has replaced the one two below it. After the MOVEM (move to memory) instruction:

    ..., <return address>, <result of G>, <result of H>

which is exactly the correct setup for calling F. Let us not here go into the issue of how such clever code might be generated, but merely recognize the fact that it gets the stack into the necessary condition for calling F.)

Suppose that the saving of a return address and the setting up of arguments were commutative operations. (This is not true of the LSUBR calling convention, because both operations use the stack; but it is true of the SUBR convention, where the arguments are "spread" [McCarthy 62] [Moon 74] in registers, and the return address is on the stack.)
Then we may permute the code as follows (from the original example):

    BAR:   <set up arguments for G in registers>
           PUSH [BAR1]            ;save return address for (G X)
           GOTO G                 ;call function G
    BAR1:  <set up arguments for H in registers>
           PUSH [BAR2]            ;save return address for (H Y)
           GOTO H                 ;call function H
    BAR2:  <set up arguments for F in registers>
           GOTO F                 ;call function F

As it happens, the PDP-10 provides an instruction, PUSHJ, defined as follows:

           PUSH [L1]                            PUSHJ G
           GOTO G        is the same as     L1:
    L1:

except that the PUSHJ takes less code. Thus we may write the code as:

    BAR:   <set up arguments for G in registers>
           PUSHJ G                ;save return address, call G
           <save result of G>
           <set up arguments for H in registers>
           PUSHJ H                ;save return address, call H
           <set up arguments for F in registers>
           GOTO F                 ;call function F

This is why PUSHJ (and similar instructions on other machines, whether they save the return address on a stack, in a register, or in a memory location) works as a subroutine call, and, by extension, why up to now many people have thought of pushing the return address at function call time rather than at form evaluation time. The use of GOTO to call a function "tail-recursively" (known around MIT as the "JRST hack", from the PDP-10 instruction for GOTO, though the hack itself dates back to the PDP-1) is in fact not just a hack, but rather the most uniform method for invoking functions. PUSHJ is not a function calling primitive per se, therefore, but rather an optimization of this general approach.

1.3. LAMBDA as a Renaming Operator

Environment operators also take various forms. The most common are assignment to local variables and binding of arguments to functions, but there are others, such as pattern-matching operators (as in COMIT [MITRLE 62] [Yngve 72], SNOBOL [Forte 67], MICRO-PLANNER [Sussman 71], CONNIVER [McDermott 74], and PLASMA [Smith 75]). It is usual to think of these operators as altering the contents of a named location, or of causing the value associated with a name to be changed. In understanding the action of an environment operator it may be more fruitful to take a different point of view, which is that the value involved is given a new (additional) name. If the name had previously been used to denote another quantity, then that former use is shadowed; but this is not necessarily an essential property of an environment operator, for we can often use alpha-conversion ("uniquization" of variable names) to avoid such shadowing. It is not the names which are important to the computation, but rather the quantities; hence it is appropriate to focus on the quantities and think of them as having one or more names over time, rather than thinking of a name as having one or more values over time.

Consider our previous example involving BAR. On entry to BAR two quantities are passed, either in registers or on the stack. Within BAR these quantities are known as X and Y, and may be referred to by those names. In other environments these quantities may be known by other names; if the code in BAR's caller were (BAR W (+ X 3)), then the first quantity is known as W and the second has no explicit name. {Note Return Address} On entry to BAR, however, the LAMBDA assigns the names X and Y to these two quantities. The fact that X means something else to BAR's caller is of no significance, since these names are for BAR's use only. Thus the LAMBDA not only assigns names, but determines the extent of their significance (their scope). Note an interesting symmetry here: control constructs determine constraints in time (sequencing) in a program, while environment operators determine constraints in space (textual extent, or scope).
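The point about shadowing and alpha-conversion can be made concrete with a small sketch in present-day Scheme (my own illustration, not an example from the memo; the function BAZ and its arithmetic body are invented stand-ins). The outer X and BAZ's parameter X name different quantities, and renaming the parameters to fresh names changes nothing, because only the quantities matter:

    (define x 10)                  ; some outer quantity named X

    (define baz
      (lambda (x y)                ; inside, the two argument quantities are named X and Y
        (+ (* x x) y)))

    (baz 14 (+ x 3))               ; => 209; at this call site X still denotes 10

    ;; Alpha-conversion ("uniquization" of names) gives the parameters fresh names:
    (define baz2
      (lambda (x0 y0)
        (+ (* x0 x0) y0)))

    (baz2 14 (+ x 3))              ; => 209; no name is shadowed, and the result is unchanged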
One way in which the renaming view of LAMBDA may be useful is in allocation of temporaries in a compiler. Suppose that we use a targeting and preferencing scheme similar to that described in [Wulf 75] and [Johnsson 75]. Under such a scheme, the names used in a program are partitioned by the compiler into sets called "preference classes". The grouping of several names into the same set indicates that it is preferable, other things being equal, to have the quantities referred to by those names reside in the same memory location at run time; this may occur because the names refer to the same quantity or to related quantities (such as X and X+1). A set may also have a specified target, a particular memory location which is preferable to any other for holding quantities named by members of the set.

As an example, consider the following code skeleton:

    ((LAMBDA (A B) <body>) (+ X Y) (* Z W))

Suppose that within the compiler the names T1 and T2 have been assigned to the temporary quantities resulting from the addition and multiplication. Then to process the "binding" of A and B we need only add A to the preference class of T1, and B to the preference class of T2. This will have the effect of causing A and T1 to refer to the same location, wherever that may be; similarly B and T2 will refer to the same location. If T1 is saved on a stack and T2 winds up in a register, fine; references to A and B within the <body> will automatically have this information.

On the other hand, suppose that <body> is (FOO 1 A B), where FOO is a built-in function which takes its arguments in registers 1, 2, and 3. Then A's preference class will be targeted on register 2, and B's on register 3 (since these are the only uses of A and B within <body>); this will cause T1 and T2 to have the same respective targets, and at the outer level an attempt will be made to perform the addition in register 2 and the multiplication in register 3. This general scheme will produce much better code than a scheme which says that all LAMBDA expressions must, like the function FOO, take their arguments in certain registers.
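A drastically simplified sketch of the bookkeeping involved (my own illustration in Scheme, not the algorithm of [Wulf 75] or [Johnsson 75]): a preference class is just a list of names, and processing the binding of a LAMBDA variable to an argument temporary merges their classes.

    (define (find-class name classes)
      (cond ((null? classes) (list name))                 ; unseen name: fresh singleton class
            ((memq name (car classes)) (car classes))
            (else (find-class name (cdr classes)))))

    (define (merge-classes name1 name2 classes)
      (let ((c1 (find-class name1 classes))
            (c2 (find-class name2 classes)))
        (if (eq? c1 c2)
            classes                                       ; already in the same class
            (cons (append c1 c2)
                  (drop-classes c1 c2 classes)))))

    (define (drop-classes c1 c2 classes)
      (cond ((null? classes) '())
            ((or (eq? (car classes) c1) (eq? (car classes) c2))
             (drop-classes c1 c2 (cdr classes)))
            (else (cons (car classes) (drop-classes c1 c2 (cdr classes))))))

    ;; ((LAMBDA (A B) <body>) (+ X Y) (* Z W)) with temporaries T1 and T2:
    (merge-classes 'b 't2 (merge-classes 'a 't1 '()))
    ;; => ((b t2) (a t1))   ; A shares a location with T1, and B with T2

A real implementation would also attach targets (preferred registers) to classes and split apart classes whose members' lifetimes conflict, as happens to T5 in the FACT example of the next section.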
Note too that no code whatsoever is generated for the variable bindings as such; the fact that we assign names to the results of the expressions (+ X Y) and (* Z W) rather than writing (FOO 1 (+ X Y) (* Z W)) makes no difference at all, which is as it should be. Thus, compiler temporaries and simple user variables are treated on a completely equal basis. This idea was used in [Johnsson 75], but without any explanation of why such equal treatment is justified. Here we have some indication that there is conceptually no difference between a user variable and a compiler-generated temporary. This claim will be made more explicit later in the discussion of continuation-passing.

Names are merely a convenient textual device for indicating the various places in a program where a computed quantity is referred to. If we could, say, draw arrows instead, as in a data flow diagram, we would not need to write names. In any case, names are eliminated at compile time, and so by run time the distinction between user names and the compiler's generated names has been lost. Thus, at the low level, we may view LAMBDA as a renaming operation which has more to do with the internal workings of the compiler (or the interpreter), and with a notation for indicating where quantities are referred to, than with the semantics as such of the computation to be performed by the program.

1.4. An Example: Compiling a Simple Function

One of the important consequences of the view of LAMBDA and function calls presented above is that programs written in a style based on the lambda-calculus-theoretic models of higher-level constructs such as DO loops (see [Stoy 74] [Steele 76]) will be correctly compiled. As an example, consider this iterative factorial function:

    (DEFINE FACT
            (LAMBDA (N)
                    (LABELS ((FACT1 (LAMBDA (M A)
                                            (IF (= M 0) A
                                                (FACT1 (- M 1) (* M A))))))
                            (FACT1 N 1))))

Let us step through a complete compilation process for this function, based on the ideas we have seen. (This scenario is intended only to exemplify certain ideas, and does not reflect entirely accurately the targeting and preferencing techniques described in [Wulf 75] and [Johnsson 75].)

First, let us assign names to all the intermediate quantities (temporaries) which will arise:

    (DEFINE FACT
            (LAMBDA (N)
              T1=(LABELS ((FACT1 (LAMBDA (M A)
                             T2=(IF T3=(= M 0) A
                                    T4=(FACT1 T5=(- M 1) T6=(* M A))))))
                     T7=(FACT1 N 1))))

We have attached a name T1-T7 to each of the function calls in the definition; these names refer to the quantities which will result from these function calls.

Now let us place the names in preference classes. Since N is used only once, as an argument to FACT1, which will call that argument M, N and M belong in the same class; T5 also belongs to this class for the same reason. T1, T2, T4, and T7 belong in the same class because they are all names, in effect, for the result of FACT1 or FACT. T6 and A belong in the same class, because T6 is an argument to FACT1; T2 and A belong in the same class, because A is one possible result of the IF. T3 is in a class by itself.

    {N, M, T5}
    {A, T1, T2, T4, T6, T7}
    {T3}

A fairly complicated analysis of the "lifetimes" of these quantities shows that M and T5 must coexist simultaneously (while calculating T6), and so they cannot really be assigned the same memory location. Hence we must split T5 off into a class of its own after all. Let us suppose that we prefer to target the result of a global function into register RESULT, and the single argument to a global function into register ARG. (FACT1, which is not a global function, is not subject to these preferences.) Then we have:

    {N, M}                   target ARG (by virtue of N)
    {T5}
    {A, T1, T2, T4, T6, T7}  target RESULT (by virtue of T1)
    {T3}

T3, on the other hand, will need no memory location (a property of the PDP-10 instruction set). Thus we might get this assignment of locations:

    {N, M}                   ARG
    {T5}                     R1
    {A, T1, T2, T4, T6, T7}  RESULT

where R1 is an arbitrarily chosen register.

We now really have two functions to compile, FACT and FACT1. Up to now we have used the renaming properties of LAMBDA to assign registers; now we use the GOTO property of function calls to construct this code skeleton:

    FACT:   GOTO FACT1          ;call FACT1

    FACT1:  POPJ
    FACT1A: GOTO FACT1          ;FACT1 calling itself

Filling in the arithmetic operations and register assignments gives:

    ;;; On arrival here, quantity named N is in register ARG.

    FACT:   MOVEI RESULT,1      ;N already in ARG; set up 1
            GOTO FACT1          ;call FACT1

    ;;; On arrival here, quantity named M is in ARG,
    ;;; and quantity named A is in RESULT.

    FACT1:  JUMPN ARG,FACT1A    ;jump to FACT1A if M is non-zero
            POPJ                ;A is already in RESULT!
    FACT1A: MOVE R1,ARG         ;must do subtraction in R1
            SUBI R1,1
            IMUL RESULT,ARG     ;do multiplication
            MOVE ARG,R1         ;now put result of subtraction in ARG
            GOTO FACT1          ;FACT1 calling itself

This code, while not perfect, is not bad.
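For comparison, here is FACT transcribed into present-day Scheme (a sketch; letrec stands in for LABELS, and lowercase for the 1976 syntax). Any properly tail-recursive implementation executes the self-call of FACT1 as a jump, so this runs in constant control stack, just like the compiled loop above:

    (define fact
      (lambda (n)
        (letrec ((fact1 (lambda (m a)
                          (if (= m 0)
                              a
                              (fact1 (- m 1) (* m a))))))   ; tail call: compiled as GOTO FACT1
          (fact1 n 1))))

    (fact 5)      ; => 120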
Returning to the compiled code: the major deficiency, which is the use of R1, is easily cured if the compiler can know at some level that the subtraction and multiplication can be interchanged (for neither has side effects which would affect the other), producing:

    FACT1A: IMUL RESULT,ARG
            SUBI ARG,1
            GOTO FACT1

Similarly, the sequence:

            GOTO FACT1
    FACT1:

could be optimized by removing the GOTO. These tricks, however, are known by any current reasonably sophisticated optimizing compiler. What is more important is the philosophy taken in interpreting the meaning of the program during the compilation process. The structure of this compiled code is a loop, not a nested sequence of stack-pushing function calls. Like the SCHEME interpreter or the various PLASMA implementations, a compiler based on these ideas would correctly reflect the semantics of lambda-calculus-based models of high-level constructs.

1.5. Who Pops the Return Address?

Earlier we showed a translation of BAR into "machine language", and noted that there was no code which explicitly popped a return address; the buck was always passed to another function (F, G, or H). This may seem surprising at first, but it is in fact a necessary consequence of our view of function calls as "GOTOs with a message". We will show by induction that only primitive functions not expressible in our language (SCHEME) perform POPJs; indeed, only this nature of the primitives determines the fact that our language is functionally oriented!

What is the last thing performed by a function? Consider the definition of one:

    (DEFINE FUN (LAMBDA (X1 X2 ... XN) <body>))

Now <body> must be a form in our language. There are several cases:

[1] Constant, variable, or closure. In this case we actually compiled a POPJ in the case of FACT above, but we could view constants, variables, and closures (in general, things which "evaluate trivially" in the sense described in [Steele 76]) as functions of zero arguments if we wished, and so GOTO a place which would get the value of the constant, variable, or closure into RESULT. This place would inherit the return address, and so our function need not pop it. Alternatively, we may view constants, etc. as primitives, the same way we regard integer addition as a primitive (note that CTA2 in the CURRIED-TRIPLE-ADD example of Section 2 requires a POPJ, since the addition primitive is open-coded there).

[2] (IF <pred> <exp1> <exp2>). In this case the last thing our function does is the last thing <exp1> or <exp2> does, and so we appeal to this analysis inductively.

[3] (LABELS <defns> <body>). In this case the last thing our function does is the last thing <body> does. This may involve invoking a function defined in the LABELS, but we can consider them to be separate functions for our purposes here.

[4] A function call. In this case the function called will inherit the return address.

Since these are all the cases, we must conclude that our function never pops its return address! But it must get popped at some point so that the final value may be returned. Or must it? If we examine the four cases again and analyze the recursive argument, it becomes clear that the last thing a function that we define in SCHEME eventually does is invoke another function. The functions we define therefore cannot cause a return address to be popped. It is, rather, the primitive, built-in operators of the language which pop return addresses. These primitives cannot be directly expressed in the language itself (or, more accurately, there is some basis set of them which cannot be expressed). It is the constants (which we may temporarily regard as zero-argument functions), the arithmetic operators, and so forth which pop the return address. (One might note that in the compilation of CURRIED-TRIPLE-ADD in Section 2, a POPJ appears only at the point where the primitive "+" function is open-coded as ADD instructions.)
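The case analysis above can be made concrete with a toy sketch (my own illustration, not part of the memo): given the body of a SCHEME function, represented here as a quoted S-expression, determine its "last act". The analysis always bottoms out either in something trivial (case [1]) or in a call that inherits the return address (case [4]):

    (define (last-act body)
      (cond ((not (pair? body)) 'trivial-or-primitive)              ; case [1]: constant or variable
            ((eq? (car body) 'lambda) 'trivial-or-primitive)        ; case [1]: closures also evaluate trivially
            ((eq? (car body) 'if)                                   ; case [2]: analyze both arms
             (list (last-act (caddr body)) (last-act (cadddr body))))
            ((eq? (car body) 'labels)                               ; case [3]: analyze the LABELS body
             (last-act (caddr body)))
            (else (list 'call (car body)))))                        ; case [4]: an ordinary call

    (last-act '(if (= m 0) a (fact1 (- m 1) (* m a))))
    ;; => (trivial-or-primitive (call fact1))   ; FACT1's body either returns A or jumps to FACT1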
2. Lexical and Dynamic Binding

The examples of the previous section, by using only local variables, avoided the question of whether variables are lexically or dynamically scoped. In this section we will see that lexical scoping is necessary in order to reflect the semantics of lambda-calculus-based models.

We might well ask, then, if LISP was originally based on the lambda calculus, why do most current LISP systems employ dynamic binding rather than lexical? The primary reason seems to be the introduction of stack hardware at about the time of early LISP development. (This was not pure cause and effect; rather, each phenomenon influenced the other.) The point is that a dynamic bindings stack parallels the control stack in structure. If one has an escape operator [Reynolds 72] (also known as CATCH [Moon 74] or EXIT [Wulf 71] [Wulf 72]) then the "control stack" may be, in general, a tree structure, just as the introduction of FUNARGs requires that the environment be tree-structured. [Moses 70] If these operators are forbidden, or implemented only in the "downward" sense (in the same way that ALGOL provides "downward funarg" (procedure arguments to functions) but not "upward funarg" (procedure-valued functions)), as they historically have been in most non-toy LISP systems, then hardware stack instructions can always be used for function calling and environment binding.

Since the introduction of stack hardware (e.g. in the PDP-6), most improvements to LISP's variable binding methods have therefore started with dynamic binding and then tried to patch it up. MacLISP [Moon 74] uses the so-called shallow access scheme, in which the current value of a variable is in a fixed location, and old values are on a stack. The advantage of this technique is that variables can be accessed using only a single memory reference. When code is compiled, variables are divided into two classes: special variables are kept in their usual fixed locations, while local variables are kept wherever convenient, at the compiler's discretion, saving time over the relatively expensive special binding mechanism. InterLISP [Teitelman 74] (before spaghetti stacks) used a deep access scheme, in which it was necessary to look up on the bindings stack to find variable bindings; if a variable was not bound on the stack, then its global value cell was checked. The cost of the stack search was ameliorated by looking up, on entry to a function, the locations of variables needed by that function. The advantage of this scheme is that the "top level" value of a variable is easily accessed, since it is always in the variable's value cell. (InterLISP also divides variables into two classes for purposes of compilation; only special variables need be looked up on the bindings stack.)

Two other notable techniques are the use of value cells as a cache for a deep dynamic access scheme, and "spaghetti stacks" [Bobrow 73], which attempt to allow the user to choose between static and dynamic binding. The problem with the latter is that they are so general that it is difficult for the compiler to optimize anything; also, they do not completely solve the problem of choosing between static and dynamic binding.
For example, the GEN-SQRT-OF-GIVEN-EXTRA-TOLERANCE function given in [Steele 76] cannot be handled properly with spaghetti stacks in the straightforward way. The difficulty is that there is only one access link for each frame, while there are conceptually two distinct access methods, namely lexical and dynamic.

Unfortunately, dynamic binding creates two difficulties. One is the well-known "FUNARG" problem [Moses 70]; the essence of this problem is that lexical scoping is desired for functional arguments. The other is more subtle. Consider the FACT example above. If we were to use dynamic binding, then every time around the FACT1 loop it would be necessary to bind M and A on a stack. Thus the binding stack would grow arbitrarily deep as we went around the loop many times. It might be argued that a compiler might notice that the old values of M and A can never be referenced, and so might avoid pushing M and A onto a stack. This is true of this special case, but is undecidable in general, given that the compiler may not be in a position to examine all the functions called by the function being compiled. Let us consider our BAR example above:

    (DEFINE BAR
            (LAMBDA (X Y)
                    (F (G X) (H Y))))

Under dynamic binding, F might refer to the variables X and Y bound by BAR. Hence we must push X and Y onto the bindings stack before calling F, and we must also pop them back off when F returns. It is the latter operation that causes difficulties. We cannot merely GOTO F any more; we must provide to F the return address of a routine which will pop X and Y and then return from BAR. F cannot inherit BAR's return address, because the unbinding operation must occur between the return from F and the return from BAR.

Thus, if we are to adhere to the view proposed earlier of LAMBDA and function calls, we are compelled to accept lexical scoping of variables. This will solve our two objections to dynamic binding, but there are two objections to lexical scoping to be answered. The first is whether it will be inherently less efficient than dynamic binding (particularly given that we know so much about how to implement the latter!); the second is whether we should abandon dynamic binding, inasmuch as it has certain useful applications.

ALGOL implementors have used lexical scoping for many years, and have evolved techniques for handling it efficiently, in particular the device known as the display. [Dijkstra 67] Some machines have even had special hardware for this purpose [Hauck 68], just as PDP-6's and PDP-10's have special hardware which aids dynamic binding. The important point is that even if deep access is used, it is not necessary to search for a variable's binding as it is for dynamic binding, since the binding must occur at a fixed place relative to the current environment. The display is in fact simply a double-indexing scheme for accessing a binding in constant time. It is not difficult to see that search is unnecessary if we consider that the binding appears lexically in a fixed place relative to the reference to the variable; a compiler can determine the appropriate offset at compile time. Furthermore, the "access depth" of a lexical variable is equal to the number of closures which contain it, and in typical programs this depth is small (less than 5).
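The fact that no search is needed can be seen in a small sketch (mine, in Scheme): given the chain of LAMBDA parameter lists lexically enclosing a reference, the compiler can compute once, at compile time, how many contours out and how far over the binding lies, which is exactly the pair of indices a display access uses.

    (define (lexical-address var env)          ; env: list of frames, innermost first, each a list of names
      (let frame-loop ((frames env) (depth 0))
        (if (null? frames)
            'free                              ; not lexically bound: must be global/special
            (let pos-loop ((names (car frames)) (offset 0))
              (cond ((null? names) (frame-loop (cdr frames) (+ depth 1)))
                    ((eq? (car names) var) (list depth offset))
                    (else (pos-loop (cdr names) (+ offset 1))))))))

    ;; Inside (LAMBDA (X Y) ...) nested within (LAMBDA (A B C) ...):
    (lexical-address 'b '((x y) (a b c)))   ; => (1 1): one contour out, second slot
    (lexical-address 'x '((x y) (a b c)))   ; => (0 0)
    (lexical-address 'h '((x y) (a b c)))   ; => free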
In an optimizing compiler for lexically scoped LISP it would not be necessary to create environment structures in a standard form. Local variables could be kept in any available registers if desired. Nor would it be necessary to interface these environment structures to the interpreter. Because the scoping would be strictly lexical, a reference to a variable in a compiled environment structure must occur in compiled code appearing within the LAMBDA that bound the variable, and so no interpreted reference could refer to such a variable. Similarly, no compiled variable reference could refer to an environment structure created by the interpreter. (An exception to this argument is the case of writing an interactive debugging package, but that will be discussed later. This problem can be fixed in any case if the compiler outputs an appropriate map of variable locations for use by the debugger.)

Consider this extension of a classic example of the use of closures:

    (DEFINE CURRIED-TRIPLE-ADD
            (LAMBDA (X)
                    (LAMBDA (Y)
                            (LAMBDA (Z) (+ X Y Z)))))

Using a very simple-minded approach, let us represent a closure as a vector whose first element is a pointer to the code and whose succeeding elements are all the quantities needed by that closure. We will write a vector as [x0, x1, ..., xn]. Let us also assume that when a closed function is called the closure itself is in register CLOSURE. (This is convenient anyway on a PDP-10, since one can call the closure by doing an indexed GOTO, such as GOTO @(CLOSURE), where @ means indirection through the first element of the vector.) Let us use the LSUBR calling convention described earlier for passing arguments. Finally, let there be a series of functions nCLOSE which create closure vectors of n elements, each taking its arguments in reverse order for convenience (the argument on top of the stack becomes element 0 of the vector). Then the code might look like this:

    CTA:   PUSH [CTA1]            ;X is on stack; add address of code
           GOTO 2CLOSE            ;create closure [CTA1, X]

    CTA1:  PUSH CLOSURE           ;now address of [CTA1, X] is in CLOSURE
           PUSH [CTA2]            ;Y was on stack on entry
           GOTO 3CLOSE            ;return closure [CTA2, [CTA1, X], Y]

    CTA2:  POP RESULT             ;pop Z into RESULT
           ADD RESULT,2(CLOSURE)  ;add in Y (using commutativity, etc.)
           MOVE TEMP,1(CLOSURE)   ;fetch pointer to outer closure
           ADD RESULT,1(TEMP)     ;add in X
           POPJ                   ;return sum in RESULT

Admittedly this does not compare favorably with uncurried addition, but the point is to illustrate how easily closures can be produced and accessed. If several variables had been closed in the outer closure rather than just X, then one might endeavor in CTA2 to fetch the outer closure pointer only once, just as in ALGOL one loads a display slot only once and then uses it many times to access the variables in that contour.
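The same closure-as-vector representation can be sketched in Scheme itself (my own rendering, not the memo's; MAKE-CLOSURE and INVOKE are hypothetical helpers standing in for nCLOSE and the indexed GOTO). Element 0 of the vector is the code, and the code receives the closure itself, playing the role of register CLOSURE:

    (define (make-closure code . captured)            ; like nCLOSE: builds [code, captured...]
      (list->vector (cons code captured)))

    (define (invoke closure . args)                   ; like GOTO @(CLOSURE)
      (apply (vector-ref closure 0) closure args))

    (define curried-triple-add
      (lambda (x)
        (make-closure
         (lambda (self y)                             ; the code at CTA1; SELF is [CTA1, X]
           (make-closure
            (lambda (self2 z)                         ; the code at CTA2; SELF2 is [CTA2, [CTA1, X], Y]
              (+ z
                 (vector-ref self2 2)                 ; add in Y
                 (vector-ref (vector-ref self2 1) 1)))  ; fetch outer closure, then add in X
            self y))
         x)))

    (invoke (invoke (curried-triple-add 1) 2) 3)      ; => 6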
A point to note is that it is not necessary to divide lexically scoped variables into two classes for compilation purposes; the compiler can always determine whether a variable is referred to globally or not. Furthermore, when creating a closure (i.e. a FUNARG), the compiler can determine precisely what variables are needed by the closure and include only those variables in the data structure for the closure, if it thinks that would be more efficient. For example, consider the following code skeleton:

    (LAMBDA (A B C D E)
      ...
      (LAMBDA (F G)
        ... B ... E ... H ...)
      ...)

It is quite clear that H is a global variable and so must be "special", whereas B and E are local (though global to the inner LAMBDA). When the compiler creates code to close the inner LAMBDA expression, the closure need only include the variables B and E, and not A, C, or D. The variables A, C, and D can in fact be kept in registers; only B and E need be kept in a semi-permanent data structure, and even then only if the inner closure is actually created.

Hewitt [Hewitt 76] has mentioned this idea repeatedly, saying that actors are distinguished from LISP closures in that actor closures contain precisely those "acquaintances" which are necessary for the actor closure to run, whereas LISP closures may contain arbitrary numbers of unnecessary variable bindings. This indeed is an extremely important point to us here, but he failed to discuss two aspects of this idea:

(1) Hewitt spoke in the context of interpreters and other "incremental" implementations rather than of full-blown compilers. In an interpreter it is much more convenient to use a uniform closure method than to run around determining which variables are actually needed for the closure. In fact, to do this efficiently in PLASMA, it is necessary to perform a "reduction" pre-pass on the expression, which is essentially a semi-compilation of the code; it is perhaps unfair to compare a compiler to an interpreter. {Note PLASMA Reduction} In any case, the semantics of the language are unaffected; it doesn't matter that extra variable bindings are present if they are not referred to. Thus this is an efficiency question only, a question of what a compiler can do to save storage, and not a question of semantics.

(2) It is not always more efficient to create minimal closures! Consider the following case:

    (LAMBDA (A B C D)
      ...
      (LAMBDA () ... A ... B ...)
      (LAMBDA () ... A ... C ...)
      (LAMBDA () ... A ... D ...)
      (LAMBDA () ... B ... C ...)
      (LAMBDA () ... B ... D ...)
      (LAMBDA () ... C ... D ...)
      ...)

The six closures, if each created minimally, will together contain twelve variable bindings; but if they shared the single environment containing A, B, C, and D as in a LISP interpreter, there would be only four bindings. Thus PLASMA may in certain cases take more storage with its closure strategy rather than less. On the other hand, suppose five of the closures are used immediately and then discarded, and only the sixth survives indefinitely. Then in the long run, PLASMA's strategy would do better! The moral is that neither strategy is guaranteed to be the more efficient in any absolute sense, since the efficiency can be made a function of the behavior of the user's program, not just of the textual form of the program. The compiler should be prepared to make a decision as to which is more efficient (and in some simple and common cases such a choice can be made correctly), and perhaps to accept advice from the user in the form of declarations.

It seems, then, that if these ideas are brought to bear, lexical binding need not be expensive. This leaves the question of whether to abandon dynamic binding completely. Steele and Sussman [Steele 76] demonstrate clearly the technique for simulating dynamic binding in a lexically scoped language; they also make a case for separating the two kinds of variables and having two completely distinct binding mechanisms, exhibiting a programming example which cannot be coded easily using only dynamic binding or only lexical scoping. The two mechanisms naturally require different compilation techniques (one difference is that fluid variables, unlike static ones, are somewhat tied down to particular locations or search mechanisms, because it cannot generally be determined at compile time who will reference a variable when), but they are each so valuable in certain contexts that in a general-purpose programming language it would be foolish to abandon either.
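One common way to simulate a dynamically bound ("fluid") variable in a lexically scoped Scheme can be sketched as follows (my illustration, not the code of [Steele 76]; the variable *print-base* and the helper are invented for the example). The current value lives in one cell, and the helper saves and restores it around a call, which is just the shallow-binding discipline described earlier:

    (define *print-base* 10)                                 ; the fluid variable's value cell

    (define (call-with-print-base new-value thunk)
      (let ((old *print-base*))
        (dynamic-wind
          (lambda () (set! *print-base* new-value))
          thunk
          (lambda () (set! *print-base* old)))))             ; restore on exit, even a non-local one

    (define (show n) (number->string n *print-base*))

    (show 255)                                               ; => "255"
    (call-with-print-base 16 (lambda () (show 255)))         ; => "ff"
    (show 255)                                               ; => "255" again; the old value is restored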
3. LAMBDA, Actors, and Continuations

Suppose that we choose a set of primitive operators which are not functions. This will surely produce a radical change in our style of programming, but, by the argument of the previous section, it will not change our interpretation of LAMBDA and function calling. A comparison between our view of LAMBDA and the notion of actors as presented by Hewitt will motivate the choice of a certain set of non-functional primitives which lead to the so-called "continuation-passing" style.

3.1. Actors = Closures (mod Syntax)

In [Sussman 75] Sussman and Steele note that actors (other than those which embody side effects and synchronization) and closures of LAMBDA expressions are isomorphic in their behavior. Smith and Hewitt [Smith 75] describe an actor as a combination of a script (code to be executed) and a set of acquaintances (computational quantities available to the code). A LISP closure in like manner is a combination of a body of code and a set of variable bindings (or, using our idea of renaming, a set of computational quantities with (possibly implicitly) associated names). Hewitt [Hewitt 76] has challenged this isomorphism, saying that closures may contain unnecessary quantities, but I have already dealt with this issue above. Let us therefore examine this isomorphism more closely.

We have noted above that it is more accurate to think of the caller of a LAMBDA as performing a GOTO rather than the LAMBDA itself. It is the operation of invocation that is the transfer of control. This transfer of control is similar to the transfer of control from one actor to another. In the actors model, when control is passed from one actor to another, more than a GOTO is performed. A computed quantity, the message, is passed to the invoked actor. This corresponds to the set of arguments passed to a LAMBDA expression.

Now if we wish to regard the actor/LAMBDA expression as a black box, then we need not be concerned with the renaming operation; all we care about is that an answer eventually comes out. We do not care that the LAMBDA expression will "spread" the set of arguments out and assign names to various quantities. In fact, there are times when the caller may not wish to think of the argument set as a set of distinct values; this attitude is reflected in the APPLY primitive of LISP, and in the FEXPR calling convention. The actors model points out that, at the interface of caller and callee, we may usefully think of the argument set as a single entity.

In the actors model, one important element of the standard message is the continuation. This is equivalent to the notion of return address in a LISP system (more accurately, the continuation is equivalent to the return address plus all the quantities which will be needed by the code at that address). We do not normally think of the return address as an argument to a LAMBDA expression, because standard LISP notation suppresses that fact. On the one hand, though, Steele and Sussman [Steele 76] point out that it is possible to write LISP code in such a manner that return addresses are passed explicitly. (This style corresponds to the use in PLASMA of ==> and <== to the exclusion of =>,