• Character Syntax
• Reader Algorithm
• Interpretation of Tokens
• Standard Macro Characters
The Lisp reader takes characters from a stream, interprets them as a printed representation of an object, constructs that object, and returns it.
The syntax described by this chapter is called the standard syntax. Operations are provided by Common Lisp so that various aspects of the syntax information represented by a readtable can be modified under program control; see Chapter 23 (Reader). Except as explicitly stated otherwise, the syntax used throughout this document is standard syntax.
• Readtables
• Variables that affect the Lisp Reader
• Standard Characters
• Character Syntax Types
Syntax information for use by the Lisp reader is embodied in an object called a readtable. Among other things, the readtable contains the association between characters and syntax types.
The next figure lists some defined names that are applicable to readtables.
Several readtables describing different syntaxes can exist, but at any given time only one, called the current readtable, affects the way in which expressions[2] are parsed into objects by the Lisp reader. The current readtable in a given dynamic environment is the value of *readtable* in that environment. To make a different readtable become the current readtable, *readtable* can be assigned or bound.
The standard readtable conforms to standard syntax. The consequences are undefined if an attempt is made to modify the standard readtable. To achieve the effect of altering or extending standard syntax, a copy of the standard readtable can be created; see the function copy-readtable.
The readtable case of the standard readtable is :upcase.
The initial readtable is the readtable that is the current readtable at the time when the Lisp image starts. At that time, it conforms to standard syntax. The initial readtable is distinct from the standard readtable. It is permissible for a conforming program to modify the initial readtable.
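The following is a minimal sketch of extending standard syntax through a copy of the standard readtable rather than by modifying the standard readtable itself. The names *bang-readtable*, read-negation, and read-with-bang-syntax are hypothetical and exist only for illustration; the last form assumes the current readtable still has standard syntax.

```lisp
;; Passing NIL to COPY-READTABLE copies the standard readtable,
;; so the standard readtable itself is never modified.
(defvar *bang-readtable* (copy-readtable nil))

(defun read-negation (stream char)
  "Hypothetical reader macro function: read the next object and wrap it in NOT."
  (declare (ignore char))
  (list 'not (read stream t nil t)))

;; Make #\! a terminating macro character in the copy only.
(set-macro-character #\! #'read-negation nil *bang-readtable*)

(defun read-with-bang-syntax (string)
  "Read one object from STRING with *BANG-READTABLE* bound as the current readtable."
  (let ((*readtable* *bang-readtable*))
    (values (read-from-string string))))

(read-with-bang-syntax "!done")  ; reads as the list (NOT DONE)
(read-from-string "!done")       ; in standard syntax, reads as the symbol !DONE
```

Binding *readtable* with let keeps the change local to one dynamic extent, which is usually easier to reason about than assigning it globally.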
The Lisp reader is influenced not only by the current readtable, but also by various dynamic variables. The next figure lists the variables that influence the behavior of the Lisp reader.
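As an illustration, the sketch below binds two of the dynamic variables that influence the reader, *read-base* and *read-default-float-format*, around a call to read-from-string. The helper names read-in-base and read-with-float-format are hypothetical.

```lisp
(defun read-in-base (string base)
  "Read one object from STRING with *READ-BASE* bound to BASE."
  (let ((*read-base* base))
    (values (read-from-string string))))

(defun read-with-float-format (string format)
  "Read one object from STRING with *READ-DEFAULT-FLOAT-FORMAT* bound to FORMAT."
  (let ((*read-default-float-format* format))
    (values (read-from-string string))))

(read-in-base "ff" 16)                        ; the integer 255
(read-in-base "ff" 10)                        ; the symbol FF
(read-with-float-format "1.5" 'double-float)  ; a double-float
```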
All implementations must support a character repertoire called standard-char; characters that are members of that repertoire are called standard characters.
The standard-char repertoire consists of the non-graphic character newline, the graphic character space, and the following additional ninety-four graphic characters or their equivalents:
The graphic IDs are not used within Common Lisp, but are provided for cross reference purposes with ISO 6937/2. Note that the first letter of the graphic ID categorizes the character as follows: L—Latin, N—Numeric, S—Special.
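Membership in the standard-char repertoire can be tested with the function standard-char-p. The sketch below is illustrative only; the helper all-standard-p is hypothetical, and the character name Tab is semi-standard, so the last form assumes an implementation that supports that name.

```lisp
(defun all-standard-p (string)
  "True if every character of STRING is a standard character."
  (every #'standard-char-p string))

(standard-char-p #\a)              ; true
(standard-char-p #\Space)          ; true
(standard-char-p #\Newline)        ; true
(all-standard-p "Hello, World!")   ; true
(standard-char-p #\Tab)            ; false: Tab is only semi-standard
```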
The Lisp reader constructs an object from the input text by interpreting each character according to its syntax type. The Lisp reader cannot accept as input everything that the Lisp printer produces, and the Lisp reader has features that are not used by the Lisp printer. The Lisp reader can be used as a lexical analyzer for a more general user-written parser.
When the Lisp reader is invoked, it reads a single character from the input stream and dispatches according to the syntax type of that character. Every character that can appear in the input stream is of one of the syntax types shown in Figure 2.6.
The syntax type of a character in a readtable determines how that character is interpreted by the Lisp reader while that readtable is the current readtable. At any given time, every character has exactly one syntax type.
Figure 2.7 lists the syntax type of each character in standard syntax.
The characters marked with an asterisk (*) are initially constituents, but they are not used in any standard Common Lisp notations. These characters are explicitly reserved to the programmer. ~ is not used in Common Lisp, and reserved to implementors. $ and % are alphabetic[2] characters, but are not used in the names of any standard Common Lisp defined names.
Whitespace[2] characters serve as separators but are otherwise ignored. Constituent and escape characters are accumulated to make a token, which is then interpreted as a number or symbol. Macro characters trigger the invocation of functions (possibly user-supplied) that can perform arbitrary parsing actions. Macro characters are divided into two kinds, terminating and non-terminating, depending on whether or not they terminate a token. The following are descriptions of each kind of syntax type.
Constituent characters are used in tokens. A token is a representation of a number or a symbol. Examples of constituent characters are letters and digits.
Letters in symbol names are sometimes converted to letters in the opposite case when the name is read; see Section 23.1.2 (Effect of Readtable Case on the Lisp Reader). Case conversion can be suppressed by the use of single escape or multiple escape characters.
Every character has one or more constituent traits that define how the character is to be interpreted by the Lisp reader when the character is a constituent character. These constituent traits are alphabetic[2], digit, package marker, plus sign, minus sign, dot, decimal point, ratio marker, exponent marker, and invalid. Figure 2.8 shows the constituent traits of the standard characters and of certain semi-standard characters; no mechanism is provided for changing the constituent trait of a character. Any character with the alphadigit constituent trait in that figure is a digit if the current input base is greater than that character’s digit value; otherwise the character is alphabetic[2]. Any character quoted by a single escape is treated as an alphabetic[2] constituent, regardless of its normal syntax.
The interpretations in this table apply only to characters whose syntax type is constituent. Entries marked with an asterisk (*) are normally shadowed[2] because the indicated characters are of syntax type whitespace[2], macro character, single escape, or multiple escape; these constituent traits apply to them only if their syntax types are changed to constituent.
Characters with the constituent trait invalid cannot ever appear in a token except under the control of a single escape character. If an invalid character is encountered while an object is being read, an error of type reader-error is signaled. If an invalid character is preceded by a single escape character, it is treated as an alphabetic[2] constituent instead.
When the Lisp reader encounters a macro character on an input stream, special parsing of subsequent characters on the input stream is performed.
A macro character has an associated function called a reader macro function that implements its specialized parsing behavior. An association of this kind can be established or modified under control of a conforming program by using the functions set-macro-character and set-dispatch-macro-character.
Upon encountering a macro character, the Lisp reader calls its reader macro function, which parses one specially formatted object from the input stream. The function either returns the parsed object, or else it returns no values to indicate that the characters scanned by the function are being ignored (e.g., in the case of a comment). Examples of macro characters are backquote, single-quote, left-parenthesis, and right-parenthesis.
A macro character is either terminating or non-terminating. The difference between terminating and non-terminating macro characters lies in what happens when such characters occur in the middle of a token. If a non-terminating macro character occurs in the middle of a token, the function associated with the non-terminating macro character is not called, and the non-terminating macro character does not terminate the token’s name; it becomes part of the name as if the macro character were really a constituent character. A terminating macro character terminates any token, and its associated reader macro function is called no matter where the character appears. The only non-terminating macro character in standard syntax is sharpsign.
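The difference between the two kinds of macro character can be observed with a sketch like the following, which installs the same hypothetical reader macro function, tag-reader, once as a non-terminating macro character ($) and once as a terminating one (!) in a private copy of the standard readtable. The names *macro-demo-readtable*, tag-reader, and read-all are assumptions made for this example.

```lisp
(defvar *macro-demo-readtable* (copy-readtable nil))

(defun tag-reader (stream char)
  "Hypothetical reader macro function: tag the next object with the macro character."
  (list :tagged char (read stream t nil t)))

;; Third argument true: non-terminating.  Third argument false: terminating.
(set-macro-character #\$ #'tag-reader t   *macro-demo-readtable*)
(set-macro-character #\! #'tag-reader nil *macro-demo-readtable*)

(defun read-all (string)
  "Read every object in STRING using *MACRO-DEMO-READTABLE*."
  (let ((*readtable* *macro-demo-readtable*))
    (with-input-from-string (in string)
      ;; The stream itself serves as a unique end-of-file marker.
      (loop for object = (read in nil in)
            until (eq object in)
            collect object))))

(read-all "$b")   ; ((:TAGGED #\$ B))      macro function called at token start
(read-all "a$b")  ; (A$B)                  $ inside a token is part of the name
(read-all "a!b")  ; (A (:TAGGED #\! B))    ! ends the token A, then is dispatched
```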
If a character is a dispatching macro character C1, its reader macro function is a function supplied by the implementation. This function reads decimal digit characters until a non-digit C2 is read. If any digits were read, they are converted into a corresponding integer infix parameter P; otherwise, the infix parameter P is nil. The terminating non-digit C2 is a character (sometimes called a “sub-character” to emphasize its subordinate role in the dispatching) that is looked up in the dispatch table associated with the dispatching macro character C1. The reader macro function associated with the sub-character C2 is invoked with three arguments: the stream, the sub-character C2, and the infix parameter P. For more information about dispatch characters, see the function set-dispatch-macro-character.
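The three arguments passed to a dispatch function can be made visible with a small sketch. It installs a hypothetical function under the sub-character ! of the sharpsign dispatching macro character in a copy of the standard readtable; the names *dispatch-demo-readtable* and read-with-dispatch are assumptions for this example, and the function simply returns the sub-character and the infix parameter it received.

```lisp
(defvar *dispatch-demo-readtable* (copy-readtable nil))

(set-dispatch-macro-character
 #\# #\!
 (lambda (stream sub-char infix-parameter)
   (declare (ignore stream))
   ;; Return what the dispatcher handed over, for inspection.
   (list sub-char infix-parameter))
 *dispatch-demo-readtable*)

(defun read-with-dispatch (string)
  "Read one object from STRING using *DISPATCH-DEMO-READTABLE*."
  (let ((*readtable* *dispatch-demo-readtable*))
    (values (read-from-string string))))

(read-with-dispatch "#!")    ; (#\! NIL)  no digits, so the infix parameter is NIL
(read-with-dispatch "#12!")  ; (#\! 12)   the digits 12 become the infix parameter
```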
For information about the macro characters that are available in standard syntax, see Section 2.4 (Standard Macro Characters).
A pair of multiple escape characters is used to indicate that an enclosed sequence of characters, including possible macro characters and whitespace[2] characters, are to be treated as alphabetic[2] characters with case preserved. Any single escape and multiple escape characters that are to appear in the sequence must be preceded by a single escape character.
Vertical-bar is a multiple escape character in standard syntax.
 ;; The following examples assume the readtable case of *readtable*
 ;; and *print-case* are both :upcase.
 (eq 'abc 'ABC) → true
 (eq 'abc '|ABC|) → true
 (eq 'abc 'a|B|c) → true
 (eq 'abc '|abc|) → false
A single escape is used to indicate that the next character is to be treated as an alphabetic[2] character with its case preserved, no matter what the character is or which constituent traits it has.
Backslash is a single escape character in standard syntax.
 ;; The following examples assume the readtable case of *readtable*
 ;; and *print-case* are both :upcase.
 (eq 'abc '\A\B\C) → true
 (eq 'abc 'a\Bc) → true
 (eq 'abc '\ABC) → true
 (eq 'abc '\abc) → false
Whitespace[2] characters are used to separate tokens.
Space and newline are whitespace[2] characters in standard syntax.
 (length '(this-that)) → 1
 (length '(this - that)) → 3
 (length '(a
           b)) → 2
 (+ 34) → 34
 (+ 3 4) → 7
This section describes the algorithm used by the Lisp reader to parse objects from an input character stream, including how the Lisp reader processes macro characters.
When dealing with tokens, the reader’s basic function is to distinguish representations of symbols from those of numbers. When a token is accumulated, it is assumed to represent a number if it satisfies the syntax for numbers listed in Figure 2.9. If it does not represent a number, it is then assumed to be a potential number if it satisfies the rules governing the syntax for a potential number. If a valid token is neither a representation of a number nor a potential number, it represents a symbol.
The algorithm performed by the Lisp reader is as follows:
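1. If at end of file, end-of-file processing is performed as specified in read. Otherwise, one character, x, is read from the input stream, and dispatched according to the syntax type of x to one of steps 2 to 7.
2. If x is an invalid character, an error of type reader-error is signaled.
3. If x is a whitespace[2] character, then it is discarded and step 1 is re-entered.
4. If x is a terminating or non-terminating macro character, then its associated reader macro function is called with two arguments, the input stream and x.
   The reader macro function may read characters from the input stream; if it does, it will see those characters following the macro character. The Lisp reader may be invoked recursively from the reader macro function.
   The reader macro function must not have any side effects other than on the input stream; because of backtracking and restarting of the read operation, front ends to the Lisp reader (e.g., “editors” and “rubout handlers”) may cause the reader macro function to be called repeatedly during the reading of a single expression in which x only appears once.
   The reader macro function may return zero values or one value. If one value is returned, then that value is returned as the result of the read operation; the algorithm is done. If zero values are returned, then step 1 is re-entered.
5. If x is a single escape character, then the next character, y, is read, or an error of type end-of-file is signaled if at the end of file. y is treated as if it is a constituent whose only constituent trait is alphabetic[2]. y is used to begin a token, and step 8 is entered.
6. If x is a multiple escape character, then a token (initially containing no characters) is begun and step 9 is entered.
7. If x is a constituent character, then it begins a token. After the token is read in, it will be interpreted either as a Lisp object or as being of invalid syntax. If the token represents an object, that object is returned as the result of the read operation. If the token is of invalid syntax, an error is signaled. If x is a character with case, it might be replaced with the corresponding character of the opposite case, depending on the readtable case of the current readtable, as outlined in Section 23.1.2 (Effect of Readtable Case on the Lisp Reader). x is used to begin a token, and step 8 is entered.
8. At this point a token is being accumulated, and an even number of multiple escape characters have been encountered. If at end of file, step 10 is entered. Otherwise, a character, y, is read, and one of the following actions is performed according to its syntax type:
   • If y is a constituent or non-terminating macro character: if y is a character with case, it might be replaced with the corresponding character of the opposite case, depending on the readtable case of the current readtable; y is appended to the token being built; and step 8 is repeated.
   • If y is a single escape character, then the next character, z, is read, or an error of type end-of-file is signaled if at end of file. z is treated as if it is a constituent whose only constituent trait is alphabetic[2]. z is appended to the token being built, and step 8 is repeated.
   • If y is a multiple escape character, then step 9 is entered.
   • If y is an invalid character, an error of type reader-error is signaled.
   • If y is a terminating macro character, then it terminates the token. First the character y is unread (see unread-char), and then step 10 is entered.
   • If y is a whitespace[2] character, then it terminates the token. First the character y is unread if appropriate (see read-preserving-whitespace), and then step 10 is entered.
9. At this point a token is being accumulated, and an odd number of multiple escape characters have been encountered. If at end of file, an error of type end-of-file is signaled. Otherwise, a character, y, is read, and one of the following actions is performed according to its syntax type:
   • If y is a constituent, macro, or whitespace[2] character, y is treated as a constituent whose only constituent trait is alphabetic[2]. y is appended to the token being built, and step 9 is repeated.
   • If y is a single escape character, then the next character, z, is read, or an error of type end-of-file is signaled if at end of file. z is treated as a constituent whose only constituent trait is alphabetic[2]. z is appended to the token being built, and step 9 is repeated.
   • If y is a multiple escape character, then step 8 is entered.
   • If y is an invalid character, an error of type reader-error is signaled.
10. An entire token has been accumulated. The object represented by the token is returned as the result of the read operation, or an error of type reader-error is signaled if the token is not of valid syntax.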
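The token-accumulation part of the algorithm (steps 8 and 9) can be sketched in a few lines. This is a simplified illustration, not how any particular implementation works: the functions standard-syntax-type and accumulate-token are hypothetical, the syntax types are hard-coded for standard syntax, readtable case is assumed to be :upcase, sharpsign is folded into the constituent case (which matches its behavior inside a token), and invalid characters are not modeled.

```lisp
(defun standard-syntax-type (char)
  "Simplified syntax type of CHAR in standard syntax."
  (cond ((member char '(#\Space #\Newline)) :whitespace)
        ((char= char #\\) :single-escape)
        ((char= char #\|) :multiple-escape)
        ((find char "\"'(),;`") :terminating-macro)
        (t :constituent)))            ; includes # when inside a token

(defun accumulate-token (stream)
  "Return the characters of the token that begins at the current position of STREAM."
  (let ((token (make-string-output-stream))
        (inside-multiple-escape nil)) ; true in step 9, false in step 8
    (loop
      (let ((y (read-char stream nil nil)))
        (cond ((null y)
               ;; End of file: an error in step 9, the end of the token in step 8.
               (when inside-multiple-escape
                 (error 'end-of-file :stream stream))
               (return))
              ((eq (standard-syntax-type y) :single-escape)
               ;; READ-CHAR signals END-OF-FILE if nothing follows the escape.
               (write-char (read-char stream) token))
              ((eq (standard-syntax-type y) :multiple-escape)
               (setf inside-multiple-escape (not inside-multiple-escape)))
              (inside-multiple-escape
               (write-char y token))  ; case preserved
              ((member (standard-syntax-type y) '(:whitespace :terminating-macro))
               (unread-char y stream) ; the terminator is not part of the token
               (return))
              (t
               (write-char (char-upcase y) token)))))
    (get-output-stream-string token)))

(with-input-from-string (in "a|b c|\\d e")
  (accumulate-token in))              ; "Ab cd"
```

Under the stated assumptions, the string returned for a symbol token should match the name of the symbol the real reader produces for the same text.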
• Numbers as Tokens
• Constructing Numbers from Tokens
• The Consing Dot
• Symbols as Tokens
• Valid Patterns for Tokens
• Package System Consistency Rules
When a token is read, it is interpreted as a number or symbol. The token is interpreted as a number if it satisfies the syntax for numbers specified in the next figure.
sign: a sign.
slash: a slash.
decimal-point: a dot.
exponent-marker: an exponent marker.
decimal-digit: a digit in radix 10.
digit: a digit in the current input radix.
To allow implementors and future Common Lisp standards to extend the syntax of numbers, a syntax for potential numbers is defined that is more general than the syntax for numbers. A token is a potential number if it satisfies all of the following requirements:
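1. The token consists entirely of digits, signs, ratio markers, decimal points (.), extension characters (^ or _), and number markers. A number marker is a letter. Whether a letter may be treated as a number marker depends on context, but no letter that is adjacent to another letter may ever be treated as a number marker. Exponent markers are number markers.
2. The token contains at least one digit. Letters may be considered to be digits, depending on the current input base, but only in tokens containing no decimal points.
3. The token begins with a digit, sign, decimal point, or extension character, but not a package marker. The syntax involving a leading package marker followed by a potential number is not well-defined. The consequences of the use of notation such as :1, :1/2, and :2^3 in a position where an expression appropriate for read is expected are unspecified.
4. The token does not end with a sign.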
If a potential number has number syntax, a number of the appropriate type is constructed and returned, if the number is representable in an implementation. A number will not be representable in an implementation if it is outside the boundaries set by the implementation-dependent constants for numbers. For example, specifying too large or too small an exponent for a float may make the number impossible to represent in the implementation. A ratio with denominator zero (such as -35/000) is not represented in any implementation. When a token with the syntax of a number cannot be converted to an internal number, an error of type reader-error is signaled. An error must not be signaled for specifying too many significant digits for a float; a truncated or rounded value should be produced.
If there is an ambiguity as to whether a letter should be treated as a digit or as a number marker, the letter is treated as a digit.
A potential number cannot contain any escape characters. An escape character robs the following character of all syntactic qualities, forcing it to be strictly alphabetic[2] and therefore unsuitable for use in a potential number. For example, all of the following representations are interpreted as symbols, not numbers:
\256 25\64 1.0\E6 |100| 3\.14159 |3/4| 3\/4 5||
In each case, removing the escape character (or characters) would cause the token to be a potential number.
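The effect of escape characters on potential numbers can be checked directly. The helper token-kind below is hypothetical; it reads one token in input base 10 and reports whether the result is a number or a symbol. Note that reading the escaped tokens interns symbols in the current package.

```lisp
(defun token-kind (text)
  "Read one object from TEXT in base 10 and classify it."
  (let ((object (let ((*read-base* 10))
                  (values (read-from-string text)))))
    (cond ((numberp object) :number)
          ((symbolp object) :symbol)
          (t :other))))

(token-kind "256")       ; :NUMBER
(token-kind "\\256")     ; :SYMBOL  (the reader sees \256)
(token-kind "1.0E6")     ; :NUMBER
(token-kind "1.0\\E6")   ; :SYMBOL  (the reader sees 1.0\E6)
(token-kind "|100|")     ; :SYMBOL
(token-kind "5||")       ; :SYMBOL
```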
As examples, the tokens in the next figure are potential numbers, but they are not actually numbers, and so are reserved tokens; a conforming implementation is permitted, but not required, to define their meaning.
The tokens in the next figure are not potential numbers; they are always treated as symbols:
The tokens in the next figure are potential numbers if the current input base is 16, but they are always treated as symbols if the current input base is 10.
A real is constructed directly from a corresponding numeric token; see Figure 2.9.
A complex is notated as a #C (or #c) followed by a list of two reals; see Section 2.4.8.11 (Sharpsign C).
The reader macros #B, #O, #X, and #R may also be useful in controlling the input radix in which rationals are parsed; see Section 2.4.8.7 (Sharpsign B), Section 2.4.8.8 (Sharpsign O), Section 2.4.8.9 (Sharpsign X), and Section 2.4.8.10 (Sharpsign R).
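A brief illustration of the radix-controlling reader macros follows. The variable name *radix-examples* is hypothetical; each literal is read in the radix its prefix names, independent of the current input base.

```lisp
(defparameter *radix-examples*
  (list #b1101    ; binary          -> 13
        #o17      ; octal           -> 15
        #xFF      ; hexadecimal     -> 255
        #3r102    ; radix 3         -> 11
        #b1/10))  ; binary ratio    -> 1/2

*radix-examples*  ; (13 15 255 11 1/2)
```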
This section summarizes the full syntax for numbers.
Integers can be written as a sequence of digits, optionally preceded by a sign and optionally followed by a decimal point; see Figure 2.9. When a decimal point is used, the digits are taken to be in radix 10; when no decimal point is used, the digits are taken to be in the radix given by the current input base.
For information on how integers are printed, see Section 22.1.3.1.1 (Printing Integers).
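The role of the trailing decimal point can be seen by reading the same digits with and without it under a non-decimal input base. The helper read-integer-forms is hypothetical.

```lisp
(defun read-integer-forms (base)
  "Read the tokens 10 and 10. with *READ-BASE* bound to BASE."
  (let ((*read-base* base))
    (list (values (read-from-string "10"))     ; interpreted in BASE
          (values (read-from-string "10."))))) ; always decimal

(read-integer-forms 8)   ; (8 10)
(read-integer-forms 16)  ; (16 10)
```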
Ratios can be written as an optional sign followed by two non-empty sequences of digits separated by a slash; see Figure 2.9. The second sequence may not consist entirely of zeros. Examples of ratios are in the next figure.
For information on how ratios are printed, see Section 22.1.3.1.2 (Printing Ratios).
Floats can be written in either decimal fraction or computerized scientific notation: an optional sign, then a non-empty sequence of digits with an embedded decimal point, then an optional decimal exponent specification. If there is no exponent specifier, then the decimal point is required, and there must be digits after it. The exponent specifier consists of an exponent marker, an optional sign, and a non-empty sequence of digits. If no exponent specifier is present, or if the exponent marker e (or E) is used, then the format specified by *read-default-float-format* is used. See Figure 2.9.
An implementation may provide one or more kinds of float that collectively make up the type float. The letters s, f, d, and l (or their respective uppercase equivalents) explicitly specify the use of the types short-float, single-float, double-float, and long-float, respectively.
The internal format used for an external representation depends only on the exponent marker, and not on the number of decimal digits in the external representation.
The next figure contains examples of notations for floats:
For information on how floats are printed, see Section 22.1.3.1.3 (Printing Floats).
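The rules for exponent markers can be exercised with a small helper. The function reads-as-p is hypothetical; it binds *read-default-float-format* to single-float so that the results do not depend on the ambient setting.

```lisp
(defun reads-as-p (text type)
  "True if the object read from TEXT is of type TYPE."
  (let ((*read-default-float-format* 'single-float))
    (typep (values (read-from-string text)) type)))

(reads-as-p "1.0d0" 'double-float)  ; true: marker d selects double-float
(reads-as-p "1.0f0" 'single-float)  ; true: marker f selects single-float
(reads-as-p "1.0e0" 'single-float)  ; true: marker e uses the default format
(reads-as-p "1e3"   'float)         ; true: an exponent makes the point optional
(reads-as-p ".5"    'float)         ; true: digits after the point suffice
(reads-as-p "1."    'integer)       ; true: a bare trailing point means a decimal integer
```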
A complex has a Cartesian structure, with a real part and an imaginary part each of which is a real. The parts of a complex are not necessarily floats but both parts must be of the same type: either both are rationals, or both are of the same float subtype. When constructing a complex, if the specified parts are not the same type, the parts are converted to be the same type internally (i.e., the rational part is converted to a float). An object of type (complex rational) is converted internally and represented thereafter as a rational if its imaginary part is an integer whose value is 0.
For further information, see Section 2.4.8.11 (Sharpsign C) and Section 22.1.3.1.4 (Printing Complexes).
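The conversion rules for complex parts can be observed by reading a few #C notations. The helper read-complex is hypothetical and binds *read-default-float-format* to single-float so that 2.0 is read as a single-float.

```lisp
(defun read-complex (text)
  "Read one object from TEXT with single-float as the default float format."
  (let ((*read-default-float-format* 'single-float))
    (values (read-from-string text))))

(read-complex "#C(1 2)")    ; a (complex rational); both parts stay integers
(read-complex "#C(1 2.0)")  ; the rational part 1 is converted to the float 1.0
(read-complex "#C(5 0)")    ; the integer 5: rational with zero imaginary part
(read-complex "#C(5.0 0)")  ; remains a complex, because its parts are floats
```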
If a token consists solely of dots (with no escape characters), then an error of type reader-error is signaled, except in one circumstance: if the token is a single dot and appears in a situation where dotted pair notation permits a dot, then it is accepted as part of such syntax and no error is signaled. See Section 2.4.1 (Left-Parenthesis).
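A sketch of the three cases follows: a dot inside dotted pair notation, a token made only of dots, and an escaped dot. The helper try-read is hypothetical; it returns :reader-error in place of signaling.

```lisp
(defun try-read (text)
  "Read one object from TEXT, or return :READER-ERROR if a reader-error is signaled."
  (handler-case (values (read-from-string text))
    (reader-error () :reader-error)))

(try-read "(a . b)")  ; the cons (A . B): the dot is part of dotted pair notation
(try-read "...")      ; :READER-ERROR: a token consisting solely of dots
(try-read "\\.")      ; a symbol whose name is ".": the dot is escaped
```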
Any token that is not a potential number, does not contain a package marker, and does not consist entirely of dots will always be interpreted as a symbol. Any token that is a potential number but does not fit the number syntax is a reserved token and has an implementation-dependent interpretation. In all other cases, the token is construed to be the name of a symbol.
Examples of the printed representation of symbols are in the next figure. For presentational simplicity, these examples assume that the readtable case of the current readtable is :upcase.
In the process of parsing a symbol, it is implementation-dependent which implementation-defined attributes are removed from the characters forming a token that represents a symbol.
When parsing the syntax for a symbol, the Lisp reader looks up the name of that symbol in the current package. This lookup may involve looking in other packages whose external symbols are inherited by the current package. If the name is found, the corresponding symbol is returned. If the name is not found (that is, there is no symbol of that name accessible in the current package), a new symbol is created and is placed in the current package as an internal symbol. The current package becomes the owner (home package) of the symbol, and the symbol becomes interned in the current package. If the name is later read again while this same package is current, the same symbol will be found and returned.
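The lookup-then-intern behavior can be sketched with a scratch package so that the demonstration does not disturb the current package. The package name READER-INTERN-DEMO and the function demo-intern are hypothetical, and the example assumes the readtable case is :upcase.

```lisp
(defpackage :reader-intern-demo (:use))  ; a package that inherits nothing

(defun demo-intern ()
  "Read the same name twice in the scratch package and report what happened."
  (let ((*package* (find-package :reader-intern-demo)))
    (let* ((status-before (nth-value 1 (find-symbol "WIDGET")))
           (first-read    (values (read-from-string "widget")))
           (second-read   (values (read-from-string "WIDGET")))
           (status-after  (nth-value 1 (find-symbol "WIDGET"))))
      (list status-before
            (eq first-read second-read)              ; same symbol both times
            status-after                             ; :INTERNAL once interned
            (eq (symbol-package first-read) *package*)))))

(demo-intern)  ; (NIL T :INTERNAL T) on the first call
```

On later calls the first element is :INTERNAL rather than NIL, because the symbol created by the first call is still present.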
The valid patterns for tokens are summarized in the next figure.
Note that nnnnn has number syntax, neither xxxxx nor ppppp has number syntax, and aaaaa has any syntax.
A summary of rules concerning package markers follows. In each case, examples are offered to illustrate the case; for presentational simplicity, the examples assume that the readtable case of the current readtable is :upcase.
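1. If there is a single package marker, and it occurs at the beginning of the token, then the token is interpreted as a symbol in the KEYWORD package. It also sets the symbol-value of the newly-created symbol to that same symbol so that the symbol will self-evaluate. For example, :bar, when read, interns BAR as an external symbol in the KEYWORD package.
2. If there is a single package marker not at the beginning or end of the token, then it divides the token into two parts. The first part specifies a package; the second part is the name of an external symbol available in that package. For example, foo:bar, when read, looks up BAR among the external symbols of the package named FOO.
3. If there are two adjacent package markers not at the beginning or end of the token, then they divide the token into two parts. The first part specifies a package; the second part is the name of a symbol within that package (possibly an internal symbol). For example, foo::bar, when read, interns BAR in the package named FOO.
4. If the token contains no package markers, and does not have potential number syntax, then the entire token is the name of the symbol. The symbol is looked up in the current package. For example, bar, when read, interns BAR in the current package.
5. The consequences are unspecified if any other pattern of package markers in a token is used. All other uses of package markers within names of symbols are not defined by this standard but are reserved for implementation-dependent use.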
For example, assuming the readtable case of the current readtable is :upcase, editor:buffer refers to the external symbol named BUFFER present in the package named editor, regardless of whether there is a symbol named BUFFER in the current package. If there is no package named editor, or if no symbol named BUFFER is present in editor, or if BUFFER is not exported by editor, the reader signals a correctable error. If editor::buffer is seen, the effect is exactly the same as reading buffer with the EDITOR package being the current package.
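The package-marker rules can be tried out against a scratch package. The package name EDITOR-DEMO, its exported symbol BUFFER, and the helper read-symbol are hypothetical names chosen for this sketch; the readtable case is assumed to be :upcase.

```lisp
(defpackage :editor-demo
  (:use)
  (:export #:buffer))

(defun read-symbol (text)
  "Read one object from TEXT and return it."
  (values (read-from-string text)))

;; A single package marker inside the token: an external symbol is looked up.
(eq (read-symbol "editor-demo:buffer")
    (find-symbol "BUFFER" :editor-demo))            ; true

;; A leading package marker: a self-evaluating keyword.
(keywordp (read-symbol ":bar"))                     ; true
(eq (read-symbol ":bar") (symbol-value :bar))       ; true

;; Two package markers: the symbol is interned, possibly as an internal symbol.
(read-symbol "editor-demo::scratch")
(nth-value 1 (find-symbol "SCRATCH" :editor-demo))  ; :INTERNAL
```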
The following rules apply to the package system as long as the value of *package* is not changed:
Reading the same symbol name always results in the same symbol.
An interned symbol always prints as a sequence of characters that, when read back in, yields the same symbol.
For information about how the Lisp printer treats symbols, see Section 22.1.3.3 (Printing Symbols).
If two interned symbols are not the same, then their printed representations will be different sequences of characters.
These rules are true regardless of any implicit interning.
As long as the current package is not changed, results are reproducible regardless of the order of loading files or the exact history of what symbols were typed in when. If the value of *package* is changed and then changed back to the previous value, consistency is maintained. The rules can be violated by changing the value of *package*, forcing a change to symbols or to packages or to both by continuing from an error, or calling one of the following functions: unintern, unexport, shadow, shadowing-import, or unuse-package.
An inconsistency only applies if one of the restrictions is violated between two of the named symbols. shadow, unexport, unintern, and shadowing-import can only affect the consistency of symbols with the same names (under string=) as the ones supplied as arguments.
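The print-read consistency rule can be checked with a short sketch, assuming *package* is not changed between printing and reading. The helper round-trips-p is hypothetical.

```lisp
(defun round-trips-p (symbol)
  "True if printing SYMBOL with escapes and reading it back yields the same symbol."
  (eq symbol (values (read-from-string (prin1-to-string symbol)))))

(round-trips-p 'car)                  ; true
(round-trips-p :key)                  ; true
(round-trips-p (intern "Mixed Case")) ; true: printed with escape characters
```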
If the reader encounters a macro character, then its associated reader macro function is invoked and may produce an object to be returned. This function may read the characters following the macro character in the stream in any syntax and return the object represented by that syntax.
Any character can be made to be a macro character. The macro characters defined initially in a conforming implementation include the following:
• Left-Parenthesis
• Right-Parenthesis
• Single-Quote
• Semicolon
• Double-Quote
• Backquote
• Comma
• Sharpsign
• Re-Reading Abbreviated Expressions
The left-parenthesis initiates reading of a list. read is called recursively to read successive objects until a right parenthesis is found in the input stream. A list of the objects read is returned. Thus (a b c) is read as a list of three objects (the symbols a, b, and c). The right parenthesis need not immediately follow the printed representation of the last object; whitespace[2] characters and comments may precede it.
If no objects precede the right parenthesis, it reads as a list of zero objects (the empty list).
If a token that is just a dot not immediately preceded by an escape character is read after some object, then exactly one more object must follow the dot, possibly preceded or followed by whitespace[2] or a comment, followed by the right parenthesis:
(a b c . d)
This means that the cdr of the last cons in the list is not nil, but rather the object whose representation followed the dot.
The above example might have been the result of evaluating
(cons 'a (cons 'b (cons 'c 'd)))
Similarly,
(cons 'this-one 'that-one) → (this-one . that-one)
It is permissible for the object following the dot to be a list:
(a b c d . (e f . (g))) ≡ (a b c d e f g)
For information on how the Lisp printer prints lists and conses, see Section 22.1.3.5 (Printing Lists and Conses).
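The list notations described above can be compared with explicitly constructed lists. The helper read-list is hypothetical.

```lisp
(defun read-list (text)
  "Read one object from TEXT and return it."
  (values (read-from-string text)))

(read-list "( )")                      ; NIL, the empty list
(equal (read-list "(a b c . d)")
       (cons 'a (cons 'b (cons 'c 'd))))          ; true
(equal (read-list "(a b c d . (e f . (g)))")
       '(a b c d e f g))                          ; true
```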
The right-parenthesis is invalid except when used in conjunction with the left parenthesis character. For more information, see Section 2.2 (Reader Algorithm).
Syntax: '«exp»
A single-quote introduces an expression to be “quoted.”
Single-quote followed by an expression exp is treated by the Lisp reader as an abbreviation for and is parsed identically to the expression (quote exp).
See the special operator quote.
 'foo → FOO
 ''foo → (QUOTE FOO)
 (car ''foo) → QUOTE
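That the abbreviation is resolved by the reader, before evaluation, can be seen by reading the text without evaluating it. The helper read-quoted is hypothetical.

```lisp
(defun read-quoted (text)
  "Read one object from TEXT without evaluating it."
  (values (read-from-string text)))

(read-quoted "'foo")                             ; the list (QUOTE FOO)
(equal (read-quoted "'foo") '(quote foo))        ; true
(equal (read-quoted "''foo") '(quote (quote foo)))  ; true
```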
Syntax: ;«text»
A semicolon introduces characters that are to be ignored, such as comments. The semicolon and all characters up to and including the next newline or end of file are ignored.
(+ 3 ; three
4)
→ 7
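The same behavior can be observed at the level of the reader by reading text that contains a comment. The helper read-with-comment is hypothetical; format with the ~% directive supplies the newline that ends the comment.

```lisp
(defun read-with-comment ()
  "Read a form whose text contains a semicolon comment."
  (values (read-from-string (format nil "(+ 3 ; three~%   4)"))))

(read-with-comment)                ; the list (+ 3 4); the comment is gone
(read-from-string "42 ; trailing") ; 42; a comment may also run to end of file
```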
Some text editors make assumptions about desired indentation based on the number of semicolons that begin a comment. The following style conventions are common, although not by any means universal.
Comments that begin with a single semicolon are all aligned to the same column at the right (sometimes called the “comment column”). The text of such a comment generally applies only to the line on which it appears. Occasionally two or three contain a single sentence together; this is sometimes indicated by indenting all but the first with an additional space (after the semicolon).
Comments that begin with a double semicolon are all aligned to the same level of indentation as a form would be at that same position in the code. The text of such a comment usually describes the state of the program at the point where the comment occurs, the code which follows the comment, or both.
Comments that begin with a triple semicolon are all aligned to the left margin. Usually they are used prior to a definition or set of definitions, rather than within a definition.
Comments that begin with a quadruple semicolon are all aligned to the left margin, and generally contain only a short piece of text that serves as a title for the code which follows, and might be used in the header or footer of a program that prepares code for presentation as a hardcopy document.
 ;;;; Math Utilities

 ;;; FIB computes the Fibonacci function in the traditional
 ;;; recursive way.

 (defun fib (n)
   (check-type n integer)
   ;; At this point we're sure we have an integer argument.
   ;; Now we can get down to some serious computation.
   (cond ((< n 0)
          ;; Hey, this is just supposed to be a simple example.
          ;; Did you really expect me to handle the general case?
          (error "FIB got ~D as an argument." n))
         ((< n 2) n)             ;fib[0]=0 and fib[1]=1
         ;; The cheap cases didn't work.
         ;; Nothing more to do but recurse.
         (t (+ (fib (- n 1))     ;The traditional formula
               (fib (- n 2)))))) ; is fib[n-1]+fib[n-2].
Syntax: "«text»"
The double-quote is used to begin and end a string. When a double-quote is encountered, characters are read from the input stream and accumulated until another double-quote is encountered. If a single escape character is seen, the single escape character is discarded, the next character is accumulated, and accumulation continues. The accumulated characters up to but not including the matching double-quote are made into a simple string and returned. It is implementation-dependent which attributes of the accumulated characters are removed in this process.
Examples of the use of the double-quote character are in the next figure.
Note that to place a single escape character or a double-quote into a string, such a character must be preceded by a single escape character. Note, too, that a multiple escape character need not be quoted by a single escape character within a string.
For information on how the Lisp printer prints strings, see Section 22.1.3.4 (Printing Strings).
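The escaping rules for strings can be checked by building the source text character by character, which avoids a second layer of escaping in the example itself. The names *string-source* and read-string-literal are hypothetical.

```lisp
;; The eight characters the reader will see:  "a\"b|c"
(defparameter *string-source*
  (coerce (list #\" #\a #\\ #\" #\b #\| #\c #\") 'string))

(defun read-string-literal ()
  "Read the string notation held in *STRING-SOURCE*."
  (values (read-from-string *string-source*)))

;; The result has five characters: a, double-quote, b, vertical-bar, c.
;; The single escape was discarded; the vertical-bar needed no escape.
(length (read-string-literal))  ; 5
```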
The backquote introduces a template of a data structure to be built. For example, writing
`(cond ((numberp ,x) ,@y) (t (print ,x) ,@y))
is roughly equivalent to writing
(list 'cond (cons (list 'numberp x) y) (list* 't (list 'print x) y))
Where a comma occurs in the template, the expression following the comma is to be evaluated to produce an object to be inserted at that point. Assume b has the value 3, for example, then evaluating the form denoted by `(a b ,b ,(+ b 1) b) produces the result (a b 3 4 b).
If a comma is immediately followed by an at-sign, then the form following the at-sign is evaluated to produce a list of objects. These objects are then “spliced” into place in the template. For example, if x has the value (a b c), then
`(x ,x ,@x foo ,(cadr x) bar ,(cdr x) baz ,@(cdr x))
→ (x (a b c) a b c foo b bar (b c) baz b c)
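Backquote templates are most often used in macro definitions. The macro when-not below is a hypothetical example written for this sketch: the comma inserts the test form, and comma followed by at-sign splices the body forms into a progn.

```lisp
(defmacro when-not (test &body body)
  "Hypothetical macro: evaluate BODY only when TEST is false."
  `(if ,test
       nil
       (progn ,@body)))

(macroexpand-1 '(when-not (> x 1) (print x) x))
;; first value: (IF (> X 1) NIL (PROGN (PRINT X) X))

(when-not nil :ran)  ; :RAN
(when-not t :ran)    ; NIL
```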
The backquote syntax can be summarized formally as follows.
`basic is the same as 'basic, that is, (quote basic), for any expression basic that is not a list or a general vector.
`,form is the same as form, for any form, provided that the representation of form does not begin with at-sign or dot. (A similar caveat holds for all occurrences of a form after a comma.)
`,@form has undefined consequences.
`(x1 x2 x3 ... xn . atom) may be interpreted to mean
(append [ x1] [ x2] [ x3] ... [ xn] (quote atom))
where the brackets are used to indicate a transformation of an xj as follows:
[form] is interpreted as (list `form), which contains a backquoted form that must then be further interpreted.
[,form] is interpreted as (list form).
[,@form] is interpreted as form.
`(x1 x2 x3 ... xn) may be interpreted to mean the same as the backquoted form `(x1 x2 x3 ... xn . nil), thereby reducing it to the previous case.
`(x1 x2 x3 ... xn . ,form) may be interpreted to mean
(append [ x1] [ x2] [ x3] ... [ xn] form)
where the brackets indicate a transformation of an xj as described above.
`(x1 x2 x3 ... xn . ,@form) has undefined consequences.
`#(x1 x2 x3 ... xn) may be interpreted to mean (apply #'vector `(x1 x2 x3 ... xn)).
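The rules above can be checked against a hand expansion. The following is only a sketch with hypothetical function names: it writes out the append form that the dotted-list rule implies for one template, and also exercises the vector rule.

```lisp
(defun dotted-by-backquote (b c)
  ;; A template of the shape `(x1 x2 x3 . atom).
  `(a ,b ,@c . d))

(defun dotted-by-hand (b c)
  ;; [a] => (list 'a), [,b] => (list b), [,@c] => c, tail => (quote d)
  (append (list 'a) (list b) c (quote d)))

(defun vector-by-backquote (x)
  ;; A template of the shape `#(x1 x2 x3).
  `#(1 ,x 3))
```

Both list functions should return results that are the same under equal; for the arguments 1 and (2 3) that result is (a 1 2 3 . d).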
Anywhere “,@” may be used, the syntax “,.” may be used instead to indicate that it is permissible to operate destructively on the list structure produced by the form following the “,.” (in effect, to use nconc instead of append).
If the backquote syntax is nested, the innermost backquoted form should be expanded first. This means that if several commas occur in a row, the leftmost one belongs to the innermost backquote.
An implementation is free to interpret a backquoted form F1 as any form F2 that, when evaluated, will produce a result that is the same under equal as the result implied by the above definition, provided that the side-effect behavior of the substitute form F2 is also consistent with the description given above.
The constructed copy of the template might or might not share list structure with the template itself. As an example, the above definition implies that
`((,a b) ,c ,@d)
will be interpreted as if it were
(append (list (append (list a) (list 'b) 'nil)) (list c) d 'nil)
but it could also be legitimately interpreted to mean any of the following:
(append (list (append (list a) (list 'b))) (list c) d)
(append (list (append (list a) '(b))) (list c) d)
(list* (cons a '(b)) c d)
(list* (cons a (list 'b)) c d)
(append (list (cons a '(b))) (list c) d)
(list* (cons a '(b)) c (copy-list d))
Since the exact manner in which the Lisp reader will parse an expression involving the backquote reader macro is not specified, an implementation is free to choose any representation that preserves the semantics described.
Often an implementation will choose a representation that facilitates pretty printing of the expression, so that (pprint `(a ,b)) will display `(a ,b) and not, for example, (list 'a b). However, this is not a requirement.
Implementors who have no particular reason to make one choice or another might wish to refer to IEEE Standard for the Scheme Programming Language, which identifies a popular choice of representation for such expressions that might provide useful compatibility for some user communities. There is no requirement, however, that any conforming implementation use this particular representation. This information is provided merely for cross-reference purposes.
The comma is part of the backquote syntax; see Section 2.4.6 (Backquote). Comma is invalid if used other than inside the body of a backquote expression as described above.
Sharpsign is a non-terminating dispatching macro character. It reads an optional sequence of digits and then one more character, and uses that character to select a function to run as a reader macro function.
The standard syntax includes constructs introduced by the # character. The syntax of these constructs is as follows: a character that identifies the type of construct is followed by arguments in some form. If the character is a letter, its case is not important; #O and #o are considered to be equivalent, for example. Certain # constructs allow an unsigned decimal number to appear between the # and the character.
The reader macros associated with the dispatching macro character # are described later in this section and summarized in the next figure.
[Figure: standard # dispatching macro character syntax]
The combinations marked by an asterisk (*) are explicitly reserved to the user. No conforming implementation defines them.
Note also that digits do not appear in the preceding table. This is because the notations #0, #1, ..., #9 are reserved for another purpose which occupies the same syntactic space. When a digit follows a sharpsign, it is not treated as a dispatch character. Instead, an unsigned integer argument is accumulated and passed as an argument to the reader macro for the character that follows the digits. For example, #2A((1 2) (3 4)) is a use of #A with an argument of 2.
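To see the numeric argument being passed along, one can install a reader macro on a user-reserved character in a copy of the standard readtable. This is a sketch only: the function names are hypothetical, and the choice of #? is an assumption (it is one of the combinations reserved to the user).

```lisp
(defun read-with-argument (stream subchar arg)
  "Reader macro function: return the numeric argument and the next object."
  (declare (ignore subchar))
  ;; ARG is the accumulated unsigned integer, or NIL if no digits appeared.
  (list :arg arg :object (read stream t nil t)))

(defun demo-dispatch-argument ()
  ;; Work on a copy; the standard readtable itself must not be modified.
  (let ((*readtable* (copy-readtable nil)))
    (set-dispatch-macro-character #\# #\? #'read-with-argument)
    (values (read-from-string "#12?:item")
            (read-from-string "#?:item"))))
```

Under these assumptions, (demo-dispatch-argument) returns (:ARG 12 :OBJECT :ITEM) and (:ARG NIL :OBJECT :ITEM).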
Syntax: #\«x»
When the token x is a single character long, this parses as the literal character x. Uppercase and lowercase letters are distinguished after #\; #\A and #\a denote different character objects. Any single character works after #\, even those that are normally special to read, such as left-parenthesis and right-parenthesis.
In the single character case, the x must be followed by a non-constituent character. After #\ is read, the reader backs up over the backslash and then reads a token, treating the initial backslash as a single escape character (whether it really is or not in the current readtable).
When the token x is more than one character long, the x must have the syntax of a symbol with no embedded package markers. In this case, the sharpsign backslash notation parses as the character whose name is (string-upcase x); see Section 13.1.7 (Character Names).
For information about how the Lisp printer prints character objects, see Section 22.1.3.2 (Printing Characters).
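A few quick checks of the #\ rules, as a sketch (the function name is hypothetical):

```lisp
(defun character-syntax-checks ()
  (list (char= #\A #\a)            ; case matters for a single character
        (char= #\Space #\SPACE)    ; character names ignore case
        (char= #\( (char "(" 0))   ; a macro character is fine after #\
        (char= #\a (char "a" 0))))
```

Calling (character-syntax-checks) should return (NIL T T T).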
Any expression preceded by #' (sharpsign followed by single-quote), as in #'expression, is treated by the Lisp reader as an abbreviation for and parsed identically to the expression (function expression). See function. For example,
(apply #'+ l) ≡ (apply (function +) l)
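The equivalence can be observed by reading the abbreviation as data. A minimal sketch (the function name is hypothetical):

```lisp
(defun sharp-quote-checks ()
  (list
   ;; The reader turns #'cl:car into the list (FUNCTION CAR).
   (equal (read-from-string "#'cl:car") '(function car))
   ;; In evaluated code, both spellings name the same function.
   (= (apply #'+ '(1 2 3)) (apply (function +) '(1 2 3)))))
```

Both elements of the returned list should be true.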
#( and ) are used to notate a simple vector.
If an unsigned decimal integer appears between the # and (, it specifies explicitly the length of the vector. The consequences are undefined if the number of objects specified before the closing ) exceeds the unsigned decimal integer. If the number of objects supplied before the closing ) is less than the unsigned decimal integer but greater than zero, the last object is used to fill all remaining elements of the vector. The consequences are undefined if the unsigned decimal integer is non-zero and the number of objects supplied before the closing ) is zero. For example,
#(a b c c c c)
#6(a b c c c c)
#6(a b c)
#6(a b c c)
all mean the same thing: a vector of length 6 with elements a, b, and four occurrences of c.
Other examples follow:
#(a b c) ;A vector of length 3
#(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47) ;A vector containing the primes below 50
#() ;An empty vector
The notation #() denotes an empty vector, as does #0().
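The fill rule for a length argument can be checked by reading the notations as strings. This is a sketch with a hypothetical function name; keywords are used as elements so the result does not depend on the current package.

```lisp
(defun vector-syntax-checks ()
  (let ((padded (read-from-string "#6(:a :b :c)"))
        (full   (read-from-string "#(:a :b :c :c :c :c)")))
    (list (length padded)                        ; the explicit length
          (equalp padded full)                   ; last object fills the rest
          (simple-vector-p padded)
          (length (read-from-string "#()"))
          (length (read-from-string "#0()")))))
```

Calling (vector-syntax-checks) should return (6 T T 0 0).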
For information on how the Lisp printer prints vectors, see Section 22.1.3.4 (Printing Strings), Section 22.1.3.6 (Printing Bit Vectors), or Section 22.1.3.7 (Printing Other Vectors).
Syntax: #*«bits»
A simple bit vector is constructed containing the indicated bits (0’s and 1’s), where the leftmost bit has index zero and the subsequent bits have increasing indices.
Syntax: #«n»*«bits»
With an argument n, the vector to be created is of length n. If the number of bits is less than n but greater than zero, the last bit is used to fill all remaining bits of the bit vector.
The notations #* and #0* each denote an empty bit vector.
Regardless of whether the optional numeric argument n is provided, the token that follows the asterisk is delimited by a normal token delimiter. However, (unless the value of *read-suppress* is true) an error of type reader-error is signaled if that token is not composed entirely of 0’s and 1’s, or if n was supplied and the token is composed of more than n bits, or if n is greater than one, but no bits were specified. Neither a single escape nor a multiple escape is permitted in this token.
For information on how the Lisp printer prints bit vectors, see Section 22.1.3.6 (Printing Bit Vectors).
For example,
#*101111
#6*101111
#6*101
#6*1011
all mean the same thing: a vector of length 6 with elements 1, 0, 1, 1, 1, and 1.
For example:
#* ;An empty bit-vector
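The same fill rule applies to bit vectors. A minimal sketch (the function name is hypothetical):

```lisp
(defun bit-vector-syntax-checks ()
  (let ((padded (read-from-string "#6*101"))
        (full   (read-from-string "#*101111")))
    (list (equal padded full)            ; EQUAL compares bit vectors by element
          (bit full 0)                   ; the leftmost bit has index zero
          (bit full 1)
          (simple-bit-vector-p padded)
          (length (read-from-string "#*")))))
```

Calling (bit-vector-syntax-checks) should return (T 1 0 T 0).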
Syntax: #:«symbol-name»
#: introduces an uninterned symbol whose name is symbol-name. Every time this syntax is encountered, a distinct uninterned symbol is created. The symbol-name must have the syntax of a symbol with no package prefix.
For information on how the Lisp printer prints uninterned symbols, see Section 22.1.3.3 (Printing Symbols).
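That each occurrence yields a distinct symbol can be seen by reading the same text twice. A sketch, assuming the standard readtable case of :upcase (the function name is hypothetical):

```lisp
(defun uninterned-symbol-checks ()
  (let ((first-read  (read-from-string "#:temp"))
        (second-read (read-from-string "#:temp")))
    (list (symbol-name first-read)             ; "TEMP" under :upcase
          (symbol-package first-read)          ; NIL: no home package
          (eq first-read second-read)          ; NIL: two distinct symbols
          (string= first-read second-read))))  ; T: the names are the same
```

Calling (uninterned-symbol-checks) should return ("TEMP" NIL NIL T).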
#.foo is read as the object resulting from the evaluation of the object represented by foo. The evaluation is done during the read process, when the #. notation is encountered. The #. syntax therefore performs a read-time evaluation of foo.
The normal effect of #. is inhibited when the value of *read-eval* is false. In that situation, an error of type reader-error is signaled.
For an object that does not have a convenient printed representation, a form that computes the object can be given using the #. notation.
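The effect of *read-eval* can be sketched with two small helpers (both names are hypothetical). The second one turns the reader-error into a keyword so the outcome is easy to inspect:

```lisp
(defun read-with-eval (string)
  "Read STRING with read-time evaluation enabled."
  (let ((*read-eval* t))
    (read-from-string string)))

(defun read-without-eval (string)
  "Read STRING with #. inhibited; report a reader-error as a keyword."
  (let ((*read-eval* nil))
    (handler-case (read-from-string string)
      (reader-error () :read-eval-disabled))))
```

Here (read-with-eval "(1 #.(* 2 3))") returns (1 6), while (read-without-eval "#.(+ 1 2)") returns :READ-EVAL-DISABLED. Binding *read-eval* to false is a common precaution when reading data from an untrusted source.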
#Brational reads rational in binary (radix 2). For example,
#B1101 ≡ 13 ;1101₂
#b101/11 ≡ 5/3
The consequences are undefined if the token immediately following the #B does not have the syntax of a binary (i.e., radix 2) rational.
#Orational reads rational in octal (radix 8). For example,
#o37/15 ≡ 31/13
#o777 ≡ 511
#o105 ≡ 69 ;105₈
The consequences are undefined if the token immediately following the #O does not have the syntax of an octal (i.e., radix 8) rational.
#Xrational reads rational in hexadecimal (radix 16). The digits above 9 are the letters A through F (the lowercase letters a through f are also acceptable). For example,
#xF00 ≡ 3840
#x105 ≡ 261 ;105₁₆
The consequences are undefined if the token immediately following the #X does not have the syntax of a hexadecimal (i.e., radix 16) rational.
#radixRrational reads rational in radix radix. radix must consist of only digits that are interpreted as an integer in decimal radix; its value must be between 2 and 36 (inclusive). Only valid digits for the specified radix may be used.
For example, #3r102 is another way of writing 11 (decimal), and #11R32 is another way of writing 35 (decimal). For radices larger than 10, letters of the alphabet are used in order for the digits after 9. No alternate # notation exists for the decimal radix since a decimal point suffices.
The next figure contains examples of the use of #B, #O, #X, and #R.
[Figure: radix indicator examples]
The consequences are undefined if the token immediately following the #nR does not have the syntax of a rational in radix n.
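The radix notations can be compared against decimal values directly. A sketch (the function name is hypothetical); every element of the result should be true:

```lisp
(defun radix-syntax-checks ()
  (list (= #b1101 13)
        (= #b101/11 5/3)                     ; numerator and denominator in radix 2
        (= #o37/15 31/13)
        (= #o777 511)
        (= #xF00 #xf00 3840)                 ; letter case is not significant
        (= #3r102 11)
        (= #11R32 35)
        (= #2r11010101 #o325 #xD5 213)       ; one value, three radices
        (= #o-300 #3r-21010 #25R-7H -192)    ; a sign is part of the rational
        (= #x105 (parse-integer "105" :radix 16))))
```

The last line uses parse-integer with :radix as a run-time counterpart to the read-time notation.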
#C reads a following object, which must be a list of length two whose elements are both reals. These reals denote, respectively, the real and imaginary parts of a complex number. If the two parts as notated are not of the same data type, then they are converted according to the rules of floating-point contagion described in Section 12.1.1.2 (Contagion in Numeric Operations).
#C(real imag) is equivalent to #.(complex (quote real) (quote imag)), except that #C is not affected by *read-eval*. See the function complex.
The next figure contains examples of the use of #C.
[Figure: complex number examples]
For further information, see Section 22.1.3.1.4 (Printing Complexes) and Section 2.3.2.3 (Syntax of a Complex).
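A few checks of #C, including the contagion rule and the independence from *read-eval*, as a sketch (the function name is hypothetical):

```lisp
(defun complex-syntax-checks ()
  (list (= (* #C(0 1) #C(0 1)) -1)          ; #C(0 1) is the imaginary unit
        (= (realpart #C(5 -3)) 5)           ; a complex with integer parts
        (floatp (realpart #C(5/3 7.0)))     ; 5/3 is converted to a float
        ;; #C still works when read-time evaluation is disabled.
        (let ((*read-eval* nil))
          (complexp (read-from-string "#C(1 2)")))))
```

Every element of the returned list should be true.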
#nAobject constructs an n-dimensional array, using object as the value of the :initial-contents argument to make-array.
For example, #2A((0 1 5) (foo 2 (hot dog))) represents a 2-by-3 matrix:
0 1 5
foo 2 (hot dog)
In contrast, #1A((0 1 5) (foo 2 (hot dog))) represents a vector of length 2 whose elements are lists:
(0 1 5)
(foo 2 (hot dog))
#0A((0 1 5) (foo 2 (hot dog))) represents a zero-dimensional array whose sole element is a list:
((0 1 5) (foo 2 (hot dog)))
#0A foo represents a zero-dimensional array whose sole element is the symbol foo. The notation #1A foo is not valid because foo is not a sequence.
If some dimension of the array whose representation is being parsed is found to be 0, all dimensions to the right (i.e., the higher numbered dimensions) are also considered to be 0.
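How the rank argument changes the interpretation of the same contents can be seen from the resulting dimensions. A sketch (the function name is hypothetical):

```lisp
(defun array-syntax-dimensions ()
  "Return the dimensions of arrays read from several #nA notations."
  (mapcar (lambda (text) (array-dimensions (read-from-string text)))
          '("#2A((0 1 5) (foo 2 (hot dog)))"   ; 2-by-3 matrix
            "#1A((0 1 5) (foo 2 (hot dog)))"   ; vector of two lists
            "#0A((0 1 5) (foo 2 (hot dog)))"   ; zero-dimensional array
            "#0A foo"                          ; zero-dimensional array
            "#2A()")))                         ; first dimension 0, so both are 0
```

Calling (array-syntax-dimensions) should return ((2 3) (2) NIL NIL (0 0)).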
For information on how the Lisp printer prints arrays, see Section 22.1.3.4 (Printing Strings), Section 22.1.3.6 (Printing Bit Vectors), Section 22.1.3.7 (Printing Other Vectors), or Section 22.1.3.8 (Printing Other Arrays).
#s(name slot1 value1 slot2 value2 ...) denotes a structure. This is valid only if name is the name of a structure type already defined by defstruct and if the structure type has a standard constructor function. Let cm stand for the name of this constructor function; then this syntax is equivalent to
#.(cm keyword1 'value1 keyword2 'value2 ...)
where each keywordj is the result of computing
(intern (string slotj) (find-package 'keyword))
The net effect is that the constructor function is called with the specified slots having the specified values. (This coercion feature is deprecated; in the future, keyword names will be taken in the package they are read in, so symbols that are actually in the KEYWORD package should be used if that is what is desired.)
Whatever object the constructor function returns is returned by the #S syntax.
For information on how the Lisp printer prints structures, see Section 22.1.3.12 (Printing Structures).
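As a sketch of #S in use, the following defines a structure type and reads an instance from text. The type demo-point and the helper read-demo-point are hypothetical names introduced only for illustration.

```lisp
;; DEFSTRUCT supplies the standard constructor MAKE-DEMO-POINT.
(defstruct demo-point x y)

(defun read-demo-point (text)
  "Read TEXT, which should contain #S(demo-point ...) syntax."
  ;; Bind *PACKAGE* so the structure name in TEXT resolves to DEMO-POINT.
  (let ((*package* (symbol-package 'demo-point)))
    (read-from-string text)))
```

For example, (read-demo-point "#S(demo-point :x 1 :y 2)") should return a structure that is equalp to (make-demo-point :x 1 :y 2).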
#P reads a following object, which must be a string.
#P«expression» is equivalent to #.(parse-namestring '«expression»), except that #P is not affected by *read-eval*.
For information on how the Lisp printer prints pathnames, see Section 22.1.3.11 (Printing Pathnames).
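The stated equivalence with parse-namestring can be checked as follows. This is a sketch; the function name and the file name notes.txt are hypothetical, and how the namestring is split into components is implementation-dependent.

```lisp
(defun pathname-syntax-checks ()
  ;; #P is read with *READ-EVAL* bound to false to show it is unaffected.
  (let ((path (let ((*read-eval* nil))
                (read-from-string "#P\"notes.txt\""))))
    (list (pathnamep path)
          (equal path (parse-namestring "notes.txt")))))
```

Both elements of the returned list should be true.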
#n=object reads as whatever object has object as its printed representation. However, that object is labeled by n, a required unsigned decimal integer, for possible reference by the syntax #n#. The scope of the label is the expression being read by the outermost call to read; within this expression, the same label may not appear twice.
#n#, where n is a required unsigned decimal integer, provides a reference to some object labeled by #n=; that is, #n# represents a pointer to the same (eq) object labeled by #n=.
For example, a structure created in the variable y by this code:
(setq x (list 'p 'q))
(setq y (list (list 'a 'b) x 'foo x))
(rplacd (last y) (cdr y))
could be represented in this way:
((a b) . #1=(#2=(p q) foo #2# . #1#))
Without this notation, but with *print-length* set to 10 and *print-circle* set to nil, the structure would print in this way:
((a b) (p q) foo (p q) (p q) foo (p q) (p q) foo (p q) ...)
A reference #n# may only occur after a label #n=; forward references are not permitted. The reference may not appear as the labeled object itself (that is, #n=#n#) may not be written because the object labeled by #n= is not well defined in this case.
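Reading the labeled notation back reproduces both the sharing and the circularity. A sketch (the function name is hypothetical); note that the result is circular, so it should not be printed unless *print-circle* is true:

```lisp
(defun shared-structure-checks ()
  (let ((y (read-from-string "((a b) . #1=(#2=(p q) foo #2# . #1#))")))
    (list (eq (second y) (fourth y))    ; both are the object labeled #2=
          (eq (cdr y) (cddddr y)))))    ; the tail loops back to #1=
```

Calling (shared-structure-checks) should return (T T).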
#+ provides a read-time conditionalization facility; the syntax is #+test expression. If the feature expression test succeeds, then this textual notation represents an object whose printed representation is expression. If the feature expression test fails, then this textual notation is treated as whitespace₂; that is, it is as if the “#+ test expression” did not appear and only a space appeared in its place.
For a detailed description of success and failure in feature expressions, see Section 24.1.2.1 (Feature Expressions).
#+ operates by first reading the feature expression and then skipping over the form if the feature expression fails. While reading the test, the current package is the KEYWORD package. Skipping over the form is accomplished by binding *read-suppress* to true and then calling read.
For examples, see Section 24.1.2.1.1 (Examples of Feature Expressions).
#- is like #+ except that it skips the expression if the test succeeds; that is,
#-test expression ≡ #+(not test) expression
For examples, see Section 24.1.2.1.1 (Examples of Feature Expressions).
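A self-contained sketch of read-time conditionalization follows. The helper name and the feature :hypothetical-demo-feature are assumptions made for illustration; the feature is presumed absent from *features* unless the helper adds it. The feature expression (or) always fails, since no subexpression succeeds.

```lisp
(defun read-with-feature (feature text)
  "Read TEXT as if FEATURE were present in *FEATURES*."
  (let ((*features* (cons feature *features*)))
    (read-from-string text)))

(defun conditional-read-demo ()
  (let ((text "(#+hypothetical-demo-feature :yes #-hypothetical-demo-feature :no)"))
    (list (read-from-string "(:a #+(or) :b :c)")               ; :b is skipped
          (read-from-string "(:a #-(or) :b :c)")               ; :b is kept
          (read-with-feature :hypothetical-demo-feature text)  ; feature present
          (read-from-string text))))                           ; feature absent
```

Under these assumptions, (conditional-read-demo) returns ((:A :C) (:A :B :C) (:YES) (:NO)).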
#|...|# is treated as a comment by the reader. It must be balanced with respect to other occurrences of #| and |#, but otherwise may contain any characters whatsoever.
The following are some examples that exploit the #|...|# notation:
;;; In this example, some debugging code is commented out with #|...|#
;;; Note that this kind of comment can occur in the middle of a line
;;; (because a delimiter marks where the end of the comment occurs)
;;; where a semicolon comment can only occur at the end of a line
;;; (because it comments out the rest of the line).
(defun add3 (n) #|(format t "~&Adding 3 to ~D." n)|# (+ n 3))

;;; The examples that follow show issues related to #| ... |# nesting.

;;; In this first example, #| and |# always occur properly paired,
;;; so nesting works naturally.
(defun mention-fun-fact-1a ()
  (format t "CL uses ; and #|...|# in comments."))
→ MENTION-FUN-FACT-1A
(mention-fun-fact-1a)
▷ CL uses ; and #|...|# in comments.
→ NIL
#| (defun mention-fun-fact-1b ()
     (format t "CL uses ; and #|...|# in comments.")) |#
(fboundp 'mention-fun-fact-1b) → NIL

;;; In this example, vertical-bar followed by sharpsign needed to appear
;;; in a string without any matching sharpsign followed by vertical-bar
;;; having preceded this. To compensate, the programmer has included a
;;; backslash separating the two characters. In case 2a, the backslash is
;;; unnecessary but harmless, but in case 2b, the backslash is critical to
;;; allowing the outer #| ... |# pair to match. If the backslash were not
;;; present, the outer comment would terminate prematurely.
(defun mention-fun-fact-2a ()
  (format t "Don't use |\# unmatched or you'll get in trouble!"))
→ MENTION-FUN-FACT-2A
(mention-fun-fact-2a)
▷ Don't use |# unmatched or you'll get in trouble!
→ NIL
#| (defun mention-fun-fact-2b ()
     (format t "Don't use |\# unmatched or you'll get in trouble!")) |#
(fboundp 'mention-fun-fact-2b) → NIL

;;; In this example, the programmer attacks the mismatch problem in a
;;; different way. The sharpsign vertical bar in the comment is not needed
;;; for the correct parsing of the program normally (as in case 3a), but
;;; becomes important to avoid premature termination of a comment when such
;;; a program is commented out (as in case 3b).
(defun mention-fun-fact-3a () ; #|
  (format t "Don't use |# unmatched or you'll get in trouble!"))
→ MENTION-FUN-FACT-3A
(mention-fun-fact-3a)
▷ Don't use |# unmatched or you'll get in trouble!
→ NIL
#|
(defun mention-fun-fact-3b () ; #|
  (format t "Don't use |# unmatched or you'll get in trouble!"))
|#
(fboundp 'mention-fun-fact-3b) → NIL
Some text editors that purport to understand Lisp syntax treat any |...| as balanced pairs that cannot nest (as if they were just balanced pairs of the multiple escapes used in notating certain symbols). To compensate for this deficiency, some programmers use the notation #||...#||...||#...||# instead of #|...#|...|#...|#. Note that this alternate usage is not a different reader macro; it merely exploits the fact that the additional vertical-bars occur within the comment in a way that tricks certain text editors into better supporting nested comments. As such, one might sometimes see code like:
#|| (+ #|| 3 ||# 4 5) ||#
Such code is equivalent to:
#| (+ #| 3 |# 4 5) |#
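That both spellings are skipped in the same way can be confirmed by reading text that contains them. A sketch (the function name is hypothetical):

```lisp
(defun read-past-comments ()
  (list
   ;; Nested #| ... |# comments are skipped as a unit.
   (read-from-string "(1 #| outer #| inner |# still outer |# 2)")
   ;; The doubled-bar style is the same reader macro; the extra
   ;; vertical-bars are simply part of the comment text.
   (read-from-string "(1 #|| outer #|| inner ||# still outer ||# 2)")))
```

Calling (read-past-comments) should return ((1 2) (1 2)).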
#< is not valid reader syntax. The Lisp reader will signal an error of type reader-error on encountering #<. This syntax is typically used in the printed representation of objects that cannot be read back in.
# followed immediately by whitespace₁ is not valid reader syntax. The Lisp reader will signal an error of type reader-error if it encounters the reader macro notation #<Newline> or #<Space>.
#) is not valid reader syntax. The Lisp reader will signal an error of type reader-error upon encountering #).
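The three invalid notations can be probed with a small predicate. This is a sketch; the helper name and the sample strings are hypothetical.

```lisp
(defun signals-reader-error-p (text)
  "Return true if reading TEXT signals an error of type READER-ERROR."
  (handler-case (progn (read-from-string text) nil)
    (reader-error () t)))
```

For example, (signals-reader-error-p "#<unreadable-object>"), (signals-reader-error-p "# x"), and (signals-reader-error-p "#)") should each return true, while (signals-reader-error-p "(1 2)") returns NIL.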
Note that the Lisp reader will generally signal an error of type reader-error when reading an expression₂ abbreviated because of length or level limits (see *print-level*, *print-length*, and *print-lines*) due to restrictions on “..”, “...”, “#” followed by whitespace₁, and “#)”.
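The round trip can be sketched as follows: print a list under a length or level limit, then try to read the abbreviated text back. Both helper names are hypothetical.

```lisp
(defun abbreviate (object &key length level)
  "Print OBJECT to a string under the given length and level limits."
  (let ((*print-length* length)
        (*print-level* level)
        (*print-readably* nil)   ; the limits are ignored when printing readably
        (*print-pretty* nil))
    (prin1-to-string object)))

(defun rereadable-p (text)
  "Return true unless reading TEXT signals an error of type READER-ERROR."
  (handler-case (progn (read-from-string text) t)
    (reader-error () nil)))
```

Here (abbreviate '(1 2 3 4) :length 2) returns "(1 2 ...)" and (abbreviate '(1 (2 3)) :level 1) returns "(1 #)". Passing either string to rereadable-p should return NIL, because of the restrictions on “...” and “#)”.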