SECD machine
编程语言有语法,各种数学逻辑、结构化数据都有语法。
乔姆斯基的语言体系及巴科斯范式是语法分析的基础,语法分析将字符串转换成有结构的抽象语法数据。对于语法的结构化表示,在命令式语言中使用数据结构,在函数式语言中使用列表或者自定义的数据类型。
函数式语言的抽象性使人常常忘记了语法分析。
归纳和递归是集合论、逻辑、计算理论的基础概念,同样也是程序语言理论的核心概念。
语言的实现方式包括编译和解释,对其理解的关键是环境和作用域,函数式语言天生适合这些任务。
解释型的实现包括高级的(抽象机、求值器)和低级的(编译器+虚拟机),它们之间可以相互转换。对于lambda演算,Bruijn编码,CPS转换,闭包转换是实现的基本工具。虚拟机有CAM,VEC,Zinc等,抽象机有Krivine,CEK,CLS,SECD等。Krivine,CEK是基于所谓的标准语义,分别使用传名调用和传值调用。CLS,SECD基于栈语义。
函数式语言的简单、高阶等特点使得函数式编程变化多样,出现许多不寻常的编程方法和模式,如monad。
类型论的研究包括语言中的应用和数学逻辑中的应用。这两个方面是相互联系的,相互借鉴的。语言中的类型系统设计要在灵活和表达性之间权衡,类型在程序优化、安全性、可扩展性、并行化、软件维护等方面都有应用。对于数学中的应用,类型论主要研究构造逻辑在类型论中的表示和推理,同时通过证明的构造可以获得满足特定规格的程序。
范畴是由一组对象和满足一定条件的态射组成的。
两个范畴之间的映射称为函子,而自然变换则是描述两个函子之间的关系。
基于以上的几个概念,范畴论能够作为数学基础语言代替集合论。大部分人已经习惯了集合论的直觉方式,所以理解范畴论就显得抽象多了。
将typed lambda 演算的类型视为对象,函数视为态射,则lambda 演算构成一个范畴。
笛卡尔闭范畴是指笛卡尔积(×),态射(→)封闭,它与含多参数函数、高阶函数的lambda演算形成的范畴等价。
monad利用类型系统给类型打上标记,对纯空间和非纯空间进行隔离,使得使用monad类型的函数永远返回monad类型,提供do语法模拟命令式风格。纯世界可进入非纯世界,在非纯世界中可以使用纯世界的数据,却最终无法逃脱非纯世界。
基于栈与基于寄存器的区别
基于寄存器的虚拟机:
1、使用堆栈来分配激活记录器
2、基于寄存器代码免去了使用push和pop命令的麻烦,减少了每个函数的指令总数。
3、代码尺寸和解码效率不如基于栈虚拟机,因为它包含操作数,所以指令大于基于堆栈的指令。但是基于寄存器产生更少的代码,所以总的代码数不会增加。
4、寄存器虚拟机必须从操作指令中解码操作数,需要额外的解码操作。
基于栈的虚拟机:
1、代码必须使用这些指令来移动变量(即push和pop)
2、代码尺寸小和解码效率会更高些
3、堆栈虚拟机指令有隐含的操作数。
来源: <https://blog.csdn.net/strliu/article/details/7906017 >
SECD machine
The SECD machine is a highly influential virtual machine and abstract machine intended as a target for functional programming language compilers. The letters stand for Stack, Environment,Control, Dump, the internal registers of the machine. These registers point to linked lists in memory.
The machine was the first to be specifically designed to evaluate lambda calculus expressions. It was originally described by Peter J. Landin as part of his ISWIM programming language definition in 1963. The description published by Landin was fairly abstract, and left many implementation choices open (like an operational semantics). Hence the SECD machine is often presented in a more detailed form, such as Peter Henderson's Lispkit Lisp compiler, which has been distributed since 1980. Since then it has been used as the target for several other experimental compilers.
In 1989 researchers at the University of Calgary worked on a hardware implementation of the machine.[1]
Contents
[hide]
Informal description[edit]
When evaluation of an expression begins, the expression is loaded as the only element of C. The environment E, stack S and dump D begin empty.
During evaluation of C it is converted to reverse Polish notation with ap (for apply) being the only operator. For example, the expression F (G X) (a single list element) is changed to the list X:G:ap:F:ap.
Evaluation of C proceeds similarly to other RPN expressions. If the first item in C is a value, it is pushed onto the stack S. More exactly, if the item is an identifier, the value pushed onto the stack will be the binding for that identifier in the current environment E. If the item is an abstraction, a closure is constructed to preserve the bindings of its free variables (which are in E), and it is this closure which is pushed onto the stack.
If the item is ap, two values are popped off the stack and the application done (first applied to second). If the result of the application is a value, it is pushed onto the stack.
If the application is of an abstraction to a value, however, it will result in a lambda calculus expression which may itself be an application (rather than a value), and so cannot be pushed onto the stack. In this case, the current contents of S, E, and C are pushed onto D (which is a stack of these triples), S is reinitialised to empty, and C is reinitialised to the application result with E containing the environment for the free variables of this expression, augmented with the binding that resulted from the application. Evaluation then proceeds as above.
Completed evaluation is indicated by C being empty, in which case the result will be on the stack S. The last saved evaluation state on D is then popped, and the result of the completed evaluation is pushed onto the stack contents restored from D. Evaluation of the restored state then continues as above.
If C and D are both empty, overall evaluation has completed with the result on the stack S.
Registers and memory[edit]
The SECD machine is stack-based. Functions take their arguments from the stack. The arguments to built-in instructions are encoded immediately after them in the instruction stream.
Like all internal data-structures, the stack is a list, with the S register pointing at the list's head or beginning. Due to the list structure, the stack need not be a continuous block of memory, so stack space is available as long as there is a single free memory cell. Even when all cells have been used, garbage collection may yield additional free memory. Obviously, specific implementations of the SECD structure can implement the stack as a canonical stack structure, so improving the overall efficiency of the virtual machine, provided that a strict bound be put on the dimension of the stack.
The C register points at the head of the code or instruction list that will be evaluated. Once the instruction there has been executed, the C is pointed at the next instruction in the list—it is similar to an instruction pointer (or program counter) in conventional machines, except that subsequent instructions are always specified during execution and are not by default contained in subsequent memory locations, as it is the case with the conventional machines.
The current variable environment is managed by the E register, which points at a list of lists. Each individual list represents one environment level: the parameters of the current function are in the head of the list, variables that are free in the current function, but bound by a surrounding function, are in other elements of E.
The dump, at whose head the D register points, is used as temporary storage for values of the other registers, for example during function calls. It can be likened to the return stack of other machines.
The memory organization of the SECD machine is similar to the model used by most functional language interpreters: a number of memory cells, each of which can hold either an atom (a simple value, for example 13), or represent an empty or non-empty list. In the latter case, the cell holds two pointers to other cells, one representing the first element, the other representing the list except for the first element. The two pointers are traditionally named car and cdr respectively—but the more modern terms head and tail are often used instead. The different types of values that a cell can hold are distinguished by a tag. Often different types of atoms (integers, strings, etc.) are distinguished as well.
So a list holding the numbers 1, 2, and 3, usually written as "(1 2 3)", could be represented as follows:
Address Tag Content (value for integers, car & cdr for lists) 9 [ integer | 2 ]
8 [ integer | 3 ]
7 [ list | 8 | 0 ]
6 [ list | 9 | 7 ]
...
2 [ list | 1 | 6 ]
1 [ integer | 1 ]
0 [ nil ]
The memory cells 3 to 5 do not belong to our list, the cells of which can be distributed randomly over the memory. Cell 2 is the head of the list, it points to cell 1 which holds the first element's value, and the list containing only 2 and 3 (beginning at cell 6). Cell 6 points at a cell holding 2 and at cell 7, which represents the list containing only 3. It does so by pointing at cell 8 containing the value 3, and pointing at an empty list (nil) as cdr. In the SECD machine, cell 0 always implicitly represents the empty list, so no special tag value is needed to signal an empty list (everything needing that can simply point to cell 0).
The principle that the cdr in a list cell must point at another list is just a convention. If both car and cdr point at atoms, that will yield a pair, usually written like "(1 . 2)"
Instructions[edit]
- nil pushes a nil pointer onto the stack
- ldc pushes a constant argument onto the stack
- ld pushes the value of a variable onto the stack. The variable is indicated by the argument, a pair. The pair's car specifies the level, the cdr the position. So "(1 . 3)" gives the current function's (level 1) third parameter.
- sel expects two list arguments, and pops a value from the stack. The first list is executed if the popped value was non-nil, the second list otherwise. Before one of these list pointers is made the new C, a pointer to the instruction following sel is saved on the dump.
- join pops a list reference from the dump and makes this the new value of C. This instruction occurs at the end of both alternatives of a sel.
- ldf takes one list argument representing a function. It constructs a closure (a pair containing the function and the current environment) and pushes that onto the stack.
- ap pops a closure and a list of parameter values from the stack. The closure is applied to the parameters by installing its environment as the current one, pushing the parameter list in front of that, clearing the stack, and setting C to the closure's function pointer. The previous values of S, E, and the next value of C are saved on the dump.
- ret pops one return value from the stack, restores S, E, and C from the dump, and pushes the return value onto the now-current stack.
- dum pushes a "dummy", an empty list, in front of the environment list.
- rap works like ap, only that it replaces an occurrence of a dummy environment with the current one, thus making recursive functions possible
A number of additional instructions for basic functions like car, cdr, list construction, integer addition, I/O, etc. exist. They all take any necessary parameters from the stack.
References[edit]
- Jump up^ A paper on the design, SECD: DESIGN ISSUES is available.
Further reading[edit]
- Danvy, Olivier. A Rational Deconstruction of Landin's SECD Machine. BRICS research report RS-04-30, 2004. ISSN 0909-0878
- Field, Anthony J. Field and Peter G. Harrison. 1988 Functional Programming. Addison-Wesley. ISBN 0-201-19249-7
- Graham, Brian T. 1992 "The SECD Microprocessor: A Verification Case Study". Springer. ISBN 0-7923-9245-0
- Henderson, Peter. 1980 Functional Programming: Application and Implementation. Prentice Hall. ISBN 0-13-331579-7
- Kogge, Peter M. The Architecture of Symbolic Computers. ISBN 0-07-035596-7
- Landin, P. J. (January 1964). "The Mechanical Evaluation of Expressions". Comput. J. 6 (4): 308–320. doi:10.1093/comjnl/6.4.308. edit
- Landin, P. J. (March 1966). "The next 700 programming languages" (PDF). Comm. ACM 9 (3): 157–166. doi:10.1145/365230.365257. edit
External links[edit]
SECD抽象机[编辑]
SECD 机是非常有影响的意图作为函数式编程语言编译器目标的虚拟机。SECD 分别是这个机器的内部寄存器的名字的首字母:Stack, Environment, Code, Dump。这些寄存器指向在内存中链表。
这种机器最初明确设计用来计算 lambda 演算表达式。最初 Peter J. Landin 在1963年把它作为他的 ISWIM编程语言定义的一部分描述。Landin 出版的描述非常抽象,(象一种操作语义那样)留下很多实现选择开放着。所以 SECD 机经常以更具体的形式出现,比如 Peter Henderson 的 Lispkit Lisp 编译器,它自1980年开始发行。此后它已经被用做多个其他实验编译器的目标。
在1989年在卡尔加里大学的研究者制作了这种机器的一个硬件实现。
寄存器和内存[编辑]
SECD 机是基于栈的,函数从栈中取得它们的形式参数(parameter)。与之相对,指令的实际参数(argument)跟在指令之后。
栈同所有内部数据结构一样是个列表,S 寄存器指向这个列表的头部或开始端。由于列表结构,栈不需要连续成块的内存,所以只要有一个单一空闲内存单元栈空间就是可获得的。即使在所有单元都已经使用了时候,垃圾收集仍可以产生额外的空闲内存。
C 寄存器指向要计算的代码或指令列表的头部。一旦指令已经被执行,C 就指向在列表中的下一个指令—它类似于常规机器中的“指令指针”(或程序计数器),除了后续指令不需要在后续内存位置上之外。
E 寄存器管理当前变量环境,它指向一个列表的列表。每个单独列表都代表一个环境级别:当前函数的形式参数位于列表的头部,在当前函数中*但受外围函数约束(bound)的变量在 E 的其他元素中。
D 寄存器指向转储(dump)的头部,它被用做其他寄存器的值的临时存储,例如在函数调用期间。它联系于其他机器的返回栈。
SECD 机的内存组织类似于大多数函数式编程语言解释器所用的模型:一些内存单元,每个都持有要么一个“原子”(一个简单值例如“13”),或表示一个空或非空列表。在后者情况,单元持有到其他单元的两个指针,一个表示第一个元素,另一个表示除去第一个元素之外的列表。这两个指针传统上分别叫做 car 和 cdr — 现在更常使用现代术语“头”和“尾”。单元持有值的不同类型由一个“标志”来区分。原子的不同类型(整数、字符串等等)经常是同样可区分的。
所以持有数“1”,“2”和“3”的一个列表,经常写为“(1 2 3)”,可以表示为如下:
地址 标志 内容(对整数是值,对列表是 car 与 cdr)
9 [ integer | 2 ]
8 [ integer | 3 ]
7 [ list | 8 | 0 ]
6 [ list | 9 | 7 ]
...
2 [ list | 1 | 6 ]
1 [ integer | 1 ]
0 [ nil ]
内存单元 3 到 5 不属于这个列表,它的单元可以在内存中随机分布。单元 2 是这个列表的头部,它指向持有第一个元素的值的单元 1,和只包含“2”和“3”的(开始于单元 6)列表。单元 6 指向持有“2”的单元,和表示只包含“3”的列表的单元 7。它接着指向持有值“3”的单元 8,和指向空列表(nil)作为 cdr。在 SECD 机中,单元 0 总是暗含表示空列表,所以不需要特殊的标志值来指示空列表(只需要简单的指向单元 0)。
在列表的单元中 cdr 必须指向另一个列表的原则只是个约定。如果 car 和 cdr 二者都指向原子,则生成一个点对,通常写为如“(1 . 2)”这样。
指令[编辑]
- nil 把一个 nil(空)指针压入栈顶
- ldc 把一个常量实际参数压入栈顶
- ld 把一个变量的值压入栈顶。这个变量是由实际参数指示的一个点对。这个点对的 car 指定级别,cdr 指定位置。所以“(1 . 3)”给出当前函数(级别 1)的第三个形式参数。
- sel 期望两个列表实际参数,并从栈顶弹出一个值。如果弹出的值非 nil 在执行第一个列表,否则执行第二个列表。在这些列表指针之一被作为新的 C 寄存器之前,保存到随后 sel 的指令的指针到转储上。
- join 从转储中弹出一个列表引用并把它作为 C 寄存器的新值。这个指令出现在 sel 的两个选择二者的结束处。
- ldf 接受表示函数的一个列表实际参数。它构造一个闭包(包含函数和当前环境的一个点对)并把它压入栈顶。
- ap 从栈顶弹出一个闭包和形式参数值的一个列表。通过安装这个闭包的环境为当前环境,把形式参数列表压在它的上面,清空栈,并设置 C 寄存器为这个闭包的函数指针,这样就把闭包应用于形式参数之上。S, E寄存器以前的值和 C 寄存器的下一个值被保存到转储上。
- ret 从栈顶弹出返回一个值,从转储中恢复 S, E 和 C 寄存器,并把这个返回值压入现在当前的栈顶。
- dum 把一个“哑元”也就是空列表压入环境列表的顶部。
- rap 如 ap 那样工作,唯一不同的是它把哑环境的出现替代为当前环境,因此使递归成为可能。
存在一些基本函数的补充指令如 car, cdr,列表构造,整数加法,I/O,等等。它们都必须从栈上取得形式参数。
进一步阅读[编辑]
- Peter M. Kogge: The Architecture of Symbolic Computers. ISBN 0-07-035596-7
- Peter Henderson, Functional Programming: Application and Implementation. Prentice Hall, 1980. ISBN 0-13-331579-7
- Anthony J. Field and Peter G. Harrison: Functional Programming. Addison-Wesley, 1988. ISBN 0-201-19249-7
- Olivier Danvy:A Rational Deconstruction of Landin's SECD Machine. BRICS research report RS-04-30, 2004. ISSN 0909-0878
外部链接[编辑]
- SECD collection
- Original SECD paper,"The Mechanical evaluation of Expressions" P.J. Landin The Computer Journal Vol. 6 pp308-320 1964
=================== End