Iterators: Signs of Weakness in Object-Oriented Languages

Henry G. Baker
Nimble Computer Corporation, 16231 Meadow Ridge Way, Encino, CA 91436
(818) 986-1436  (818) 986-1360 (FAX)
Copyright (c) 1992 by Nimble Computer Corporation

______________________________________________________________________________________

The appearance of iterators in an object-oriented language appears to be inversely related to the power of the language's intrinsic control structures. Iterator classes are used for the sole purpose of enumerating the elements of an abstract collection class without revealing its implementation. We find that the availability of higher-order functions and function closures eliminates the need for these ad hoc iterator classes, in addition to providing the other benefits of "mostly functional programming". We examine a purely functional iteration protocol for the difficult task of comparing the collections of leaves on two non-isomorphic trees--the so-called "samefringe" problem--and find that its type signature requires recursive (cyclic) functional types. Although higher-order "member functions" and recursive (cyclic) functional types are unusual in object-oriented languages, they arise quite naturally and provide a satisfying programming style.

______________________________________________________________________________________

A. INTRODUCTION

As a long-time Lisp programmer experienced in the use of mapcar and friends, I was perplexed by the appearance of "iterators" in various object-oriented programming languages--e.g., CLU [Liskov77] and C++ [Stroustrup86]. Since all of the usual forms of iteration can be trivially simulated using mapping functions, I couldn't understand the need for the apparently ad hoc notion of an iterator, whose sole function was to provide the machinery to enumerate the objects in a collection without revealing the implementation of the collection. However, recent experience with C and C++ has finally revealed the source of my cognitive dissonance--these languages lack the ability to write and use simple mapping functions like Lisp's mapcar! In particular, while C and C++ offer the ability to pass a function as an argument to another function (i.e., a "funarg"), the functions being passed cannot contain references to variables of intermediate scope (i.e., the functions are not function closures), and therefore they are essentially useless for most mapping processes. On the other hand, most languages of the Pascal family--e.g., Ada and Modula--provide for the creation and passing of function closures and therefore do not require iterators for simple iteration processes.[1]

The primary purpose of this paper is a discussion of the expressive power of various languages, but efficiency is always a concern, especially with constructs that must examine large collections. It should therefore be pointed out that the expression of a loop in terms of mapping functions, function closures and recursive function-calling does not imply inefficiency. It is well known how to compile "tail-recursive" functions into highly efficient iteration [Steele78], and high-quality Lisp systems have been "open-coding" mapping functions for at least 20 years.

B. MAPPING FUNCTIONS

Lisp was the first language to make extensive use of recursion and mapping functions in place of Fortran- and Algol-style DO/FOR iteration constructs. For example, the Lisp function mapc "maps" a function "down" a list--i.e., it applies the function successively to every element in the list.
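The crucial point is that the function passed to mapc may be a closure over the local variables of its caller--exactly the kind of funarg that C and C++ cannot express. A minimal sketch (the function sum-list is purely illustrative): it sums the elements of a list by letting the mapped closure update a lexical variable of its enclosing function.

(defun sum-list (l)               ; sum the elements of list l.
  (let ((total 0))                ; lexical variable of the enclosing function,
    (mapc #'(lambda (x)           ; captured by the closure passed to mapc.
              (incf total x))
          l)
    total))

(sum-list '(1 2 3 4))             ; => 10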
Here is one definition for mapc:

(defun mapc (function list)                   ; [2] tail-recursive loop.
  (if (null list) nil
    (progn (funcall function (car list))
           (mapc function (cdr list)))))

For those people who find it more productive to think of an iterative process instead of a mapping process, Common Lisp provides the macro dolist. The following fragment shows two different ways to print all of the elements of a list.

(dolist (x l) (print x))                      ; prints elements of list l.

(mapc #'(lambda (x) (print x)) l)             ; or more simply (after eta conversion),
(mapc #'print l)                              ; prints elements of list l.

Regardless of the syntax, the two forms are equivalent, as the following definition of the dolist macro shows.

(defmacro dolist ((var list) &body forms)     ; approximation to CL 'dolist'.
  `(mapc #'(lambda (,var) ,@forms) ,list))

Mapping functions are not limited to Lisp lists, as our definition of the Lisp dotimes iteration macro shows.

(defmacro dotimes ((var limit) &body forms)   ; [3]
  `(maptimes #'(lambda (,var) ,@forms) ,limit))

(defun maptimes (function n)
  (labels ((myloop (i)                        ; local tail-recursive function to perform loop.
             (if (= i n) nil
               (progn (funcall function i) (myloop (1+ i))))))
    (myloop 0)))

(maptimes #'print 10)                         ; prints numbers from 0 through 9.

(dotimes (i 10) (print i))                    ; prints numbers from 0 through 9.

It should be clear that the more complicated iteration forms of Pascal, C, C++, etc., can all be emulated in the same fashion using mapping functions.

C. GENERATORS AND CO-ROUTINES

Iterators are a 1980's version of the 1950's concept of generators. A generator is a subroutine which is called like a function, but returns a "new" value every time it is called (i.e., it is emphatically not a mathematical function). The "random number generator" is prototypic of a generator. Generators were heavily investigated in the late 1960's and early 1970's in the context of Artificial Intelligence programs which had to "hypothesize and test". By hiding the mechanism for hypothesizing a solution inside a generator, the program could be made more modular and understandable.

Generators have much in common with co-routines, and many generators are programmed as co-routines. Co-routines are most generally thought of as two independent programs that communicate by means of an intermediate buffer. Whenever the consumer of the generated objects requires another object, it takes one from the buffer, and if the buffer is currently empty, it waits for the generator to generate new objects and place them into the buffer. The generator works to fill up the buffer, and when it is full, it goes to sleep until room is once again available for new objects. The buffer size is usually finite, and may actually be zero, in which case the generator is activated directly by the consumer upon every request for a new object.

One is tempted to think that every co-routine generator can be emulated by a function which has access to some persistent local state (e.g., an Algol-60 "own" variable), but this is not the case. The standard counter-example is the "samefringe" problem, which can be solved by two co-routined "fringe" generators, but cannot be solved by a function with only a bounded amount of persistent local state. The "fringe" of a finite tree is the simple enumeration of its leaves in left-to-right order. The "samefringe" problem is the problem of deciding whether two finite trees have the same fringe.
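As a small illustration (the particular trees are ours, chosen for concreteness), the first two Lisp trees below are not isomorphic, yet both have the fringe (1 2 3 4); a samefringe predicate must report them equal, while distinguishing them from the third tree.

'((1 2) (3 4))       ; fringe: (1 2 3 4)
'(1 (2 (3 (4))))     ; fringe: (1 2 3 4)
'((1 2) (4 3))       ; fringe: (1 2 4 3) -- not the same fringe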
The "samefringe" problem is not purely academic, but arises in the implementation of equality in sequence classes which have "lazy" concatenation semantics. In such lazy sequences, the concatenation operation is very fast, because it simply binds together two subsequences without examining the subsequences at all. The tradeoff comes when the sequences must be enumerated; the abstract sequence is obtained by examining the "leaves" of the concatenation tree in order. In other words, the abstract sequence is simply the "fringe" of the tree of concatenations. If one constructs at least one of the fringes as an explicit data structure, the samefringe problem is easily solved. (defun fringe (x &optional (c nil)) ; fringe of tree x. (if (atom x) (cons x c) (fringel x c))) (defun fringel (xl c) ; fringe of list of trees xl. (if (null xl) c (fringe (car xl) (fringel (cdr xl) c)))) (defun samefringe1 (x y) ; construct both fringe(x) and fringe(y). (equal (fringe x) (fringe y))) (defun samefringe2 (x y) ; construct only fringe(x). (cfringe (fringe x) y #'(lambda (lx) (null lx)))) (defun cfringe (lx y c) (if (atom y) (and (consp lx) (eql (car lx) y) (funcall c (cdr lx))) (cfringel lx y c))) (defun cfringel (lx yl c) (if (null yl) (funcall c lx) (cfringe lx (car yl) #'(lambda (lx) (cfringel lx (cdr yl) c))))) The problem with explicitly constructing the fringes is that in the case that one (or both) of the fringes is very large and the fringes don't match, we will have performed a great deal of construction work for nothing. We would like a "lazier" method for examining the fringes which does not perform any construction unless and until it is required. The samefringe problem can be solved without constructing either of the fringes as explicit data structures by utilizing a "continuation-passing" style of programming. The reason for using this style lies in the unbounded amount of storage required to hold the stacks used to explore each tree; any generator for a fringe must have access to an unbounded amount of storage. Co-routines can also solve the samefringe problem because each of the co-routines has its own tree-tracing stack. Clearly, any "iterator" solution to the samefringe problem must give the iterator enough storage to emulate these stacks. (defun eof (c) (funcall c nil t nil)) ; empty generator, send eof flag. (defun samefringe (x y) ; "lazy" samefringe. (samefringec #'(lambda (c) (genfringe x c #'eof)) #'(lambda (c) (genfringe y c #'eof)))) (defun samefringec (xg yg) ; check equality of leaves generated by genfringe. (funcall xg ; We don't need Scheme 1st class continuations. #'(lambda (x eofx xg) ; receive 1st elt., eof flag, & generator for rest. (funcall yg #'(lambda (y eofy yg) (or (and eofx eofy) ; equal if both reach eof simultaneously. (and (eql x y) (samefringec xg yg)))))))) (defun genfringe (x consumer genrest) ; call consumer with leaves found in x. (if (atom x) (funcall consumer x nil genrest) ; send 1st elt. & ~eof flag. (genfringel x consumer genrest))) (defun genfringel (xl consumer genrest) (if (null xl) (funcall genrest consumer) (genfringe (car xl) consumer #'(lambda (consumer) (genfringel (cdr xl) consumer genrest))))) We don't claim that this style of programming is particularly perspicuous, although it can be understood with practice. It does show how a relatively complex task can be solved in a completely functional way--i.e., there are no assignments in this code, nor are there any explicit conses which must later be garbage-collected. 
If all of the function closures which are created are stack-allocated (as if by C's alloca, see [Baker92CONS]), then this solution doesn't do any heap allocation at all. It must, of course, utilize an amount of storage which is proportional to the sizes of the tree portions being compared, but all of this storage can be stack-allocated.

A more perspicuous functional solution can be obtained using the idea of "lazy evaluation" from functional languages. The idea is to compute the fringe lazily by producing a lazy list of the tree leaves, which can be elegantly created by means of "lazy consing" [Friedman76]. These lazy lists are gradually coerced by an equality function until either a mismatch is found, or both fringes have been completely explored. We can emulate laziness in Lisp by surrounding any lazy argument with a lambda having no parameters, which is stripped off when necessary by calling this lambda with no arguments. Furthermore, we can utilize Common Lisp's global macro facilities to insert the lambda's for us, and use its local macro facilities to insert the funcall's necessary to force the evaluation of the lazy arguments. For lsamefringe, we only need fringe's second argument to be lazy.

(defmacro lcons (x y)                     ; a cons which is lazy in its second argument.
  `(cons ,x #'(lambda () ,y)))

(defmacro lcdr (x) `(funcall (cdr ,x)))

(defmacro fringe (x &optional (c nil))    ; make fringe lazy in its 2nd arg.
  `(lfringe ,x #'(lambda () ,c)))         ; this macro must appear before fringel.

(defun lfringe (x lc)
  (symbol-macrolet ((c (funcall lc)))     ; accessing 'c' forces arg. evaluation.
    (if (atom x) (lcons x c)              ; cdr of new cons is unevaluated argument.
      (fringel x c))))                    ; fringel is defined above.

(defun lequal (x y)                       ; works only for 1-level list with lazy cdr.
  (or (and (null x) (null y))
      (and (consp x) (consp y) (eql (car x) (car y))
           (lequal (lcdr x) (lcdr y)))))

(defun lsamefringe (x y)
  (lequal (fringe x) (fringe y)))

Unfortunately, our lazy evaluation implementation uses "upward funargs", which are typically heap-allocated, and therefore may produce a lot of transient garbage. Nevertheless, in the most common case where the fringes are unlikely to match, the amount of consing for a lazy samefringe is far less than that for a "strict" (non-lazy) samefringe.

D. ITERATORS

Iterators provide a means to "generate" (enumerate) all of the elements in a collection, but without revealing the structure and implementation of this collection. Below is an example of the use of iterators to print the elements of a collection of C int's.

int i, first(void), next(void);
boolean last(void);

for (i=first(); !last(); i=next()) {printf("%d\n",i);}

Iterator functions for a collection can be "member functions" of the C++ collection class itself, but this policy creates a problem if one wishes to enumerate the same collection for different purposes at the same time--e.g., if one wishes to enumerate the Cartesian product of a collection with itself. The problem is that the "state" of the iterator must be stored somewhere, and the only two places are the instance of the collection and the class of the collection. If the state is stored in the class, then only one iterator can be active for the entire class, while if the state is stored in the instance, then only one iterator can be active for the collection. Neither of these policies provides the flexibility required for our Cartesian product problem.
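For contrast, with function closures the Cartesian product problem simply evaporates: each nested mapping call keeps its own iteration state in its own activation, so any number of traversals of the same collection can be active simultaneously. A minimal sketch using the mapc of section B (the name map-pairs is purely illustrative):

(defun map-pairs (fn l)                   ; apply fn to every ordered pair of elements of l.
  (mapc #'(lambda (x)
            (mapc #'(lambda (y) (funcall fn x y)) l))   ; a second traversal of l,
        l))                                             ; active inside the first one.

(map-pairs #'(lambda (x y) (format t "~A ~A~%" x y)) '(a b))
; prints the four pairs A A, A B, B A, B B, one per line.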
The solution usually proposed for C++ is to provide every collection class with a "friend" class called its "iterator class". Because the iterator class is a friend of the collection class, it is allowed to "see" its implementation, yet any state required for an iteration is stored in an iterator instance which is distinct from the collection instance itself. Since multiple iterators can be active on the same collection instance, iterators can solve the Cartesian product problem.

E. EMULATING MAPPING IN C AND C++

Before we can show how mapping can replace iterators, we first explore what goes wrong when we attempt to program a mapping function--e.g., maptimes--in C. We first show maptimes itself.

void maptimes(fn,limit)
void (*fn)(int); int limit;
{int i;
 for (i=0; i<limit; i++) {(*fn)(i);}}

Consider now the type of the functional generator protocol of section C. A first approximation to the type of genfringe is A x B x C -> D, where A, B, C, D are ML-style "type variables". The first argument to genfringe (x) is the collection type itself--in this case Lisp's list type. The second argument to genfringe (consumer) is obviously a function of 3 arguments itself, where the first argument is Lisp's list type, the second argument is a boolean, and the third argument is the same as genfringe's third argument; the result of calling this functional argument is the same as the result of calling genfringe itself. Therefore, a more precise typing for genfringe is list x (list x boolean x C -> D) x C -> D.

In order to obtain the best typing for genfringe, we must type genfringel. A first approximation to the type of genfringel is list x (list x boolean x C -> D) x C -> D--i.e., the same as the type of genfringe itself! But we can see from the recursive calling structure of genfringel that C = (list x boolean x C -> D) -> D. In other words, C is no longer an ML-style type variable, but a recursive/cyclic type.[6] Interestingly, however, D remains an ML-style type variable which can be replaced by any type whatsoever, depending upon the nature of the functions which call genfringe; in the case of samefringec, D becomes the actual type boolean.

The fact that typing our generator function produces a recursive/cyclic type should not be surprising, since collection types themselves are recursive/cyclic types (i.e., they can hold an indefinite number of elements of the base type). If one wants to be pedantic, one can say that the cyclic functional type of our generator is the functional analog to C++-style iterator classes, so we haven't gotten rid of iterator classes after all. However, our cyclic generator types do have some pleasant properties: 1) they involve only the collection type itself, the boolean type, and the cyclic generator type itself, so they do hide the implementation of the collection class; 2) the result type is a type variable, so these generators can be used in any mapping context; and 3) they are derived in a "purely functional" way, with no data structures defined for holding a local iterator state.

Recursive types have long been staples of strongly-typed languages like Pascal, Ada, C, C++, etc. However, the recursive types in these languages are always recursive data types [Aho86,s.6.3], not recursive functional types like those of our functional iteration protocols. Recursive functional types are no more difficult to understand or implement than recursive data types; the apparent reason for their absence from most languages seems to be lack of imagination regarding their benefits.
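To make the analogy concrete, the two kinds of recursive type can be written side by side in the notation used above (a sketch; the list-of-integers data type is merely illustrative):

    intlist = nil + (integer x intlist)          (a familiar recursive data type)
    C       = (list x boolean x C -> D) -> D     (the recursive functional type of our generator)

with the full typing genfringe : list x (list x boolean x C -> D) x C -> D, where D remains a free type variable.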
In fact, one cannot execute iterative or recursive programs without the use of recursive functional types, because the lambda calculus Y operator [Barendregt84] [Gabriel88] used to implement recursion has the ultimate recursive functional type. Unfortunately, Ada83 (along with almost every other strongly typed language) does not support recursive types other than recursive data types, and therefore cannot support the cyclic generator types of our functional generators, which are more powerful than the simple Ada mappers of section F. One can still program powerful functional iteration protocols in Ada83, but these now require the publication of a second "iteration state" abstract type for every collection type--i.e., Ada83 still requires a form of iterator types. Each iterator function must accept such an iterator type as a "current state" argument, and produce the "next state" iterator type as one of its results; since the state is accepted and manipulated explicitly, the iterator subprogram has no hidden state and hence it is functional. Unfortunately, in the case of the samefringe problem, the requirement to make the state of the iteration explicit means the generation and manipulation of explicit stacks instead of the far more perspicuous recursive programming used in our examples above. In other words, the lack of recursive functional types in Ada can turn simple "pop quiz" programming problems into full-blown group projects. This is not the way to obtain higher software productivity.[7]

Below we show how functional generator protocols can be used to program approximations to standard mapping functions such as Common Lisp's every and mapcar, which will work on arbitrary sequences, including trees.

(defun eof (c) (funcall c nil t nil))   ; same as in section C.

(defmethod gen ((x tree) consumer genrest)
  (labels ((genfringel (xl consumer)
             (if (null xl) (funcall genrest consumer)
               (gen (the tree (car xl)) consumer
                    #'(lambda (consumer) (genfringel (cdr xl) consumer))))))
    (if (atom x) (funcall consumer x nil genrest)
      (genfringel x consumer))))

(defmethod gen ((x list) consumer genrest)
  (if (null x) (funcall genrest consumer)
    (funcall consumer (car x) nil
             #'(lambda (consumer) (gen (the list (cdr x)) consumer genrest)))))

(defmethod gen ((x vector) consumer genrest)
  (labels ((myloop (i consumer)
             (if (= i (length x)) (funcall genrest consumer)
               (funcall consumer (aref x i) nil
                        #'(lambda (consumer) (myloop (1+ i) consumer))))))
    (myloop 0 consumer)))

(defun every (fn x y)                   ; [8] true only if all fn applications are true.
  (labels ((myloop (xg yg)
             (funcall xg
               #'(lambda (x eofx xg)
                   (funcall yg
                     #'(lambda (y eofy yg)
                         (or (and eofx eofy)
                             (and (funcall fn x y) (myloop xg yg)))))))))
    (myloop #'(lambda (c) (gen x c #'eof))
            #'(lambda (c) (gen y c #'eof)))))

(defun samefringe (x y) (every #'eql x y))   ; example of every's use.

(defun mapcar (fn x y)                  ; accepts 2 generic sequences, returns list of values.
  (labels ((myloop (xg yg)
             (funcall xg
               #'(lambda (x eofx xg)
                   (funcall yg
                     #'(lambda (y eofy yg)
                         (if (or eofx eofy) nil
                           (cons (funcall fn x y) (myloop xg yg)))))))))
    (myloop #'(lambda (c) (gen x c #'eof))
            #'(lambda (c) (gen y c #'eof)))))

I. CONCLUSIONS

We have shown how the ability to define higher-order functions (functions which take other functions as arguments) and the ability to define local functions which are closed over their free lexical variables can be used to provide iteration capabilities for abstract collections without the need to define an iterator class for each such collection class.
These higher-order functions can then be provided as member functions of the collection class itself to provide the needed iteration capability. We have shown an example of such a higher-order iterator definition in Ada83, which is not normally considered to be an object-oriented programming language. The popular object-oriented programming language C++ has neither functional closures nor a first-class macro capability (which might be used to emulate our Ada83 scheme), which makes C++ too weak to provide proper iteration facilities within the collection class itself. Thus, the ad hoc scheme of pairing every collection class with its own iterator friend class is forced by the weakness of C++ control structures. Iterators are far less powerful a facility than higher-order functions, so the ad hoc solution of iterator classes does nothing to solve more complex problems like samefringe which are easily solved using higher-order functions.

We have shown that functional mapping can be performed by means of functional generators whose type signatures are recursive/cyclic types, but recursive functional types rather than the more common recursive data types. While these generator types are analogous to C++-style iterator classes, our generator types are not ad hoc, and have just the properties one would expect from an abstract generator type--they hide the implementation of the collection class, and they have a generic type signature which can be used in a wide variety of contexts.

In conclusion, the attempt by object-oriented languages to avoid the "complexities" of higher-order functions and recursive functional types appears to have backfired. "Iterators" may look like an easy way out, but they do nothing to solve hard problems like samefringe. Higher-order functions can not only solve samefringe, but solve it in a way that is far easier to understand than any solution involving explicit state.

J. REFERENCES

Ada83. Reference Manual for the Ada® Programming Language. ANSI/MIL-STD-1815A-1983, U.S. Gov't Printing Off., Wash., DC, 1983.

Aho, A.V., et al. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, 1986.

[Baker91SP] Baker, H.G. "Structured Programming with Limited Private Types in Ada: Nesting is for the Soaring Eagles". ACM Ada Letters XI,5 (July/Aug. 1991), 79-90.

[Baker92CONS] Baker, H.G. "CONS Should Not CONS its Arguments, or A Lazy Alloc is a Smart Alloc". ACM Sigplan Not. 27,3 (March 1992), 24-34.

Barendregt, H.P. The Lambda Calculus: Its Syntax and Semantics. North-Holland, New York, 1984.

Cohen, Ellis S. "Updating Elements of a Collection in Place". ACM Ada Letters 6,1 (1986), 55-62.

Coyle, C., and Grogono, P. "Building Abstract Iterators Using Continuations". ACM Sigplan Not. 26,2 (Feb. 1991), 17-24.

Eckel, Bruce. "C++ Into the Future". Unix Review 10,5 (May 1992), 26-35.

Friedman, D.P., and Wise, D.S. "CONS Should not Evaluate its Arguments". In S. Michaelson and R. Milner (eds.), Automata, Languages and Programming, Edinburgh U. Press, Edinburgh, Scotland, 1976, 257-284.

Gabriel, R.P. "The Why of Y". Lisp Pointers 2,2 (Oct./Dec. 1988), 15-25.

Harper, R., et al. "Standard ML". TR ECS-LFCS-86-2, Comp. Sci. Dept., Edinburgh, UK, March 1986.

Liskov, B., et al. "Abstraction mechanisms in CLU". CACM 20,8 (Aug. 1977), 564-576.

Milner, R. "A theory of type polymorphism in programming". JCSS 17,3 (1978), 348-375.

Shaw, M., et al. "Abstraction and verification in Alphard: Defining and specifying iteration and generators". CACM 20,8 (Aug. 1977), 553-564.
Steele, Guy L. Rabbit: A Compiler for Scheme. AI Memo 474, MIT, May 1978.

[Steele90] Steele, Guy L. Common Lisp, the Language; 2nd Ed. Digital Press, Bedford, MA, 1990, 1029p.

Stroustrup, Bjarne. The C++ Programming Language. Addison-Wesley, Reading, MA, 1986.

[1] Ada83 [Ada83] does not allow the passing of subprograms as arguments to other subprograms, but does allow the passing of functions as arguments to generic subprograms; this capability will suffice for the emulation of iterators.

[2] This definition is not quite the same as Common Lisp mapc.

[3] This definition is not quite the same as Common Lisp dotimes.

[4] Well, it mostly works. Watch out for commas in the "loop" body that aren't surrounded by parentheses (ugh!).

[5] In short, C macros are déclassé.

[6] ML can infer a cyclic type if the "occur check" is removed from the unification algorithm, but don't try to print it!

[7] I defy CASE tool advocates to show how their CASE tools can help in the solution of this problem. In other words, no CASE tool can make up for fundamental weaknesses in a programming language.

[8] Our every handles exactly 2 sequences; do n sequences as an exercise (Hint: lazily iterate on every itself!).