Decoupling Lisp Syntax From Semantics

Summary: Common Lisp defines a simple, regular syntax that is used by default for reading programs. It also provides a means to change that syntax, and an internal data representation for programs that is not text-based.

One may learn and use Lisp in the same manner as other traditional programming languages: by creating text-based programs with an editor.

Lisp also allows programs to be represented as data. There are functions to evaluate (e.g., interpret) programs and to compile programs. Unlike in most other languages, the representation of a program that is given to these functions is not text, but a list of program expressions.
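As a minimal sketch of this idea, the following builds a small program as a list and hands it to the standard functions eval and compile. The variable names *form*, *result* and *double* are illustrative only.

```lisp
;; A program built as ordinary list data; no program text is involved.
(defparameter *form* (list '+ 1 (list '* 2 3)))   ; the list (+ 1 (* 2 3))

;; EVAL takes the list and evaluates it.
(defparameter *result* (eval *form*))             ; 7

;; COMPILE accepts a lambda expression, which is also just a list,
;; and returns a compiled function object.
(defparameter *double*
  (compile nil (list 'lambda (list 'x) (list '* 2 'x))))

(funcall *double* 21)                             ; 42
```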

The Lisp reader is a separate function that converts text into this expression data. The distinction is invisible most of the time. However, by changing the reader, one can change the syntax of the program text without changing its semantics, i.e. without changing how it is evaluated or compiled.
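One way to see this is to give the reader a small extra piece of syntax. The sketch below assumes a made-up notation in which [a b c] reads as the expression (list a b c). The names *bracket-readtable*, read-bracket-list and read-with-brackets are hypothetical. set-macro-character, read-delimited-list and copy-readtable are standard Common Lisp functions.

```lisp
;; Work on a copy of the standard readtable so the global reader is untouched.
(defparameter *bracket-readtable* (copy-readtable nil))

(defun read-bracket-list (stream char)
  "Reader macro function: read forms up to ] and wrap them in a LIST call."
  (declare (ignore char))
  (cons 'list (read-delimited-list #\] stream t)))

;; [ starts the new syntax.
(set-macro-character #\[ #'read-bracket-list nil *bracket-readtable*)
;; ] must end tokens the same way ) does, so reuse the reader function of ).
(set-macro-character #\] (get-macro-character #\) nil) nil *bracket-readtable*)

(defun read-with-brackets (text)
  "Read one expression from TEXT using the modified syntax."
  (let ((*readtable* *bracket-readtable*))
    (read-from-string text)))

;; The text changes, the resulting expression is ordinary Lisp data.
(read-with-brackets "[1 2 (+ 1 2)]")          ; the list (LIST 1 2 (+ 1 2))
(eval (read-with-brackets "[1 2 (+ 1 2)]"))   ; the list (1 2 3)
```

Only the reader was changed here. The evaluator receives the same kind of list it would have received from (list 1 2 (+ 1 2)) typed in the standard syntax.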

The default printer automatically prints Lisp data using the standard syntax.

The default configuration of the reader and printer is actually the only part of Lisp that depends on the textual syntax. Everything else uses the expression representation of programs.
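The relationship between the printer and the reader can be sketched as a round trip: data is printed to text, and the text is read back into equivalent data. The variable names *expr*, *text* and *round-trip* are illustrative only.

```lisp
;; An expression built directly as list data.
(defparameter *expr* (list 'defun 'square (list 'x) (list '* 'x 'x)))

;; PRIN1-TO-STRING is the printer: data in, text in the standard syntax out.
(defparameter *text* (prin1-to-string *expr*))

;; READ-FROM-STRING is the reader: text in, data out.
(defparameter *round-trip* (read-from-string *text*))

;; The data that comes back is structurally equal to the original list.
(equal *expr* *round-trip*)   ; true
```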

Some people like Lisp syntax. Others don't, and redefine the LISP acronym to stand for "Lots of Irritating, Silly Parentheses." If you tend to agree, it is useful to recognize that you can change the reader to accept a syntax that you like better. Here is how John McCarthy, the inventor of Lisp, explained it in "History of Lisp" (1978):

This internal representation of symbolic information gives up the familiar infix notations in favor of a notation that simplifies the task of programming the substantive computations, e.g., logical deduction or algebraic simplification, differentiation or integration. If customary notations are to be used externally, translation programs must be written. Thus LISP programs use a prefix notation for algebraic expressions, because they usually must determine the main connective before deciding what to do next. In this, LISP differs from almost every other symbolic computation system. ... This feature probably accounts for LISP's success in competition with these languages, especially when large programs have to be written. The advantage is like that of binary computers over decimal--but larger.

... Another reason for the initial acceptance of awkwardnesses in the internal form of LISP is that we still expected to switch to writing programs as M-expressions [infix format]. The project of defining M-expressions precisely and compiling them or at least translating them into S-expressions was neither finalized nor explicitly abandoned. It just receded into the indefinite future, and a new generation of programmers appeared who preferred internal notation to any FORTRAN-like or ALGOL-like notation that could be devised.

... One can even conjecture that LISP owes its survival specifically to the fact that its programs are lists, which everyone, including me, has regarded as a disadvantage. Proposed replacements for LISP ... abandoned this feature in favor of an Algol-like syntax, leaving no target language for higher level systems.

The language Dylan, developed by Apple, Carnegie Mellon University, and Harlequin, can be viewed as a Lisp with a conventional infix, ALGOL-like syntax.

The standard Common Lisp syntax is, in outline, defined as follows:

  1. Whitespace separates expressions but is otherwise ignored, except within strings or character objects.
  2. In general, case is ignored: symbol names are automatically translated to upper case. Case is preserved within strings and when specifying individual characters as data. This behavior can be changed by the user.
  3. The first one or two characters of each expression define how the expression is to be read:
    1. The character " reads the text up to the next unescaped " as a string object.
    2. The character ( reads the text up to the matching ) as a list object.
    3. The character # is used to introduce a two-character dispatch:
      1. #\ reads the remaining expression as a character object. Characters can have names. Examples: #\a #\A #\Space #\Return. Names beyond the standard ones, such as #\Control-G, are implementation-dependent.
      2. #* reads the remaining expression as a bit vector. Example: #*1001
      3. #( reads the remaining expression (terminated by a closing )) as a vector. Example: #(1 2 3)
      4. #nA( reads the remaining expression (terminated by a closing )) as an n-dimensional array. Examples: #1A(1 2 3 4) #2A((1 2) (3 4))
      5. #S(name reads the remaining expression (terminated by a closing )) as a structure object of type name. Example: #S(point :x 1 :y 2)
      6. There are a few others as well, and more can be defined by the user.
    4. There are a few other single-character macro characters, such as ' for quoting and ; for comments, and more can be defined by the user.
    5. Everything else is either a number or a symbol. The reader recognizes the usual representations for integers, ratios, and floating-point numbers. The base used to read integers and ratios can be specified by the user.
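
The rules above can be tried directly by handing small pieces of text to the reader. This is only a sketch: read-standard is a hypothetical helper that wraps the standard read-from-string in with-standard-io-syntax, so that the default syntax is in effect regardless of any local customization.

```lisp
(defun read-standard (text)
  "Read one expression from TEXT using the standard reader settings."
  (with-standard-io-syntax
    (read-from-string text)))

(read-standard "\"hello\"")           ; a string
(read-standard "(a b c)")             ; a list of three symbols
(read-standard "#\\a")                ; the character a
(read-standard "#*1001")              ; a bit vector of length 4
(read-standard "#(1 2 3)")            ; a vector of length 3
(read-standard "#2A((1 2) (3 4))")    ; a 2x2 array
(read-standard "foo")                 ; a symbol whose name is "FOO"
(read-standard "#xFF")                ; the integer 255, read in base 16
(read-standard "2/4")                 ; the ratio 1/2

;; Case translation is a property of the readtable and can be changed.
(defun read-preserving-case (text)
  "Read one expression from TEXT without folding symbol names to upper case."
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) :preserve)
    (read-from-string text)))

(symbol-name (read-preserving-case "Foo"))   ; "Foo"
```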

The internal expression representation for programs is simply this:
  1. Symbols are names of variables.
  2. Everything else that is not a list is simply treated as literal data.
  3. A list is either a special operation or a function call:
    1. If the first item in the list is a symbol naming a special operator or a macro (macros may be system-defined or user-defined), the rules for that operator are followed to evaluate the list. Such operators exist for defining functions, classes, variables, etc., as well as for controlling evaluation within a function (if, assignment, etc.).
    2. Otherwise the first item names a function, and the rest of the items in the list are arguments to the function.
That's it. No precedence ordering. No statements.
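
To make these evaluation rules concrete, here is a deliberately tiny evaluator that follows them. It is a sketch, not how any Common Lisp implementation actually works. The name mini-eval and the use of an association list for variable bindings are assumptions made for the example, and only two special operators (quote and if) are handled.

```lisp
(defun mini-eval (form env)
  "Evaluate FORM. ENV is an association list mapping symbols to values."
  (cond
    ;; NIL, T and keywords are symbols that evaluate to themselves.
    ((or (null form) (eq form t) (keywordp form)) form)
    ;; Rule 1: any other symbol is the name of a variable.
    ((symbolp form)
     (let ((binding (assoc form env)))
       (if binding
           (cdr binding)
           (error "Unbound variable: ~S" form))))
    ;; Rule 2: everything else that is not a list is literal data.
    ((atom form) form)
    ;; Rule 3.1: special operators follow their own evaluation rules.
    ((eq (first form) 'quote) (second form))
    ((eq (first form) 'if)
     (if (mini-eval (second form) env)
         (mini-eval (third form) env)
         (mini-eval (fourth form) env)))
    ;; Rule 3.2: otherwise the first item names a function, which is
    ;; applied to the evaluated arguments.
    (t (apply (symbol-function (first form))
              (mapcar (lambda (arg) (mini-eval arg env))
                      (rest form))))))

(mini-eval '(+ 1 (* 2 3)) '())                       ; 7
(mini-eval '(if (< x 10) (* x 2) 0) '((x . 4)))      ; 8
```

Nesting alone determines the order of evaluation, so (+ 1 (* 2 3)) needs no precedence table, and if is an expression that returns a value rather than a statement.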