7.1.1 Lexical structure

This section describes how individual tokens (identifiers, numbers, etc.) are formed from sequences of characters. The following sections describe how expressions and programs are formed from sequences of tokens.

⟨Intertoken space⟩ can occur on either side of any token, but not within a token.

Identifiers that do not begin with a vertical line are terminated by a ⟨delimiter⟩ or by the end of the input. So are dot, numbers, characters, and booleans. Identifiers that begin with a vertical line are terminated by another vertical line.
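The termination rules above can be sketched as two small scanners. This is a minimal illustration, not a full lexer: the names `DELIMITERS`, `scan_bare_token`, and `scan_bar_identifier` are hypothetical, and the backslash handling inside vertical lines is an assumption (this excerpt does not define the escapes allowed there beyond the hex and mnemonic forms).

```python
# Delimiter characters taken from the delimiter production:
# whitespace (space, tab, newline, return), vertical line, ( ) " ;
DELIMITERS = set(' \t\n\r|()";')


def scan_bare_token(text, start=0):
    """Scan a token that does not begin with a vertical line.

    Returns (token, end): the token runs up to, but does not include,
    the next delimiter or the end of the input.
    """
    end = start
    while end < len(text) and text[end] not in DELIMITERS:
        end += 1
    return text[start:end], end


def scan_bar_identifier(text, start=0):
    """Scan an identifier that begins with a vertical line.

    Returns (token, end) with both vertical lines included in the token.
    Assumption: a backslash protects the character that follows it, so an
    escaped vertical line does not terminate the identifier.
    """
    if text[start] != "|":
        raise ValueError("expected a vertical line")
    end = start + 1
    while end < len(text):
        if text[end] == "\\":
            end += 2  # skip the backslash and the character it protects
        elif text[end] == "|":
            return text[start:end + 1], end + 1
        else:
            end += 1
    raise ValueError("unterminated |...| identifier")
```

For example, scanning `foo(bar` from position 0 stops at the opening parenthesis, whereas `|two words|` is read as a single identifier even though it contains a space.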

The following four characters from the ASCII repertoire are reserved for future extensions to the language: [ ] { }

In addition to the identifier characters of the ASCII repertoire specified below, Scheme implementations may permit any additional repertoire of Unicode characters to be employed in identifiers, provided that each such character has a Unicode general category of Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co, or is U+200C or U+200D (the zero-width non-joiner and joiner, respectively, which are needed for correct spelling in Persian, Hindi, and other languages). However, it is an error for the first character to have a general category of Nd, Mc, or Me. It is also an error to use a non-Unicode character in symbols or identifiers.
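As an illustration, the general-category test above could be checked with Python's standard `unicodedata` module. The function names are hypothetical, and the sketch deliberately leaves ASCII characters to the grammar: it only judges characters outside the ASCII repertoire.

```python
import unicodedata

# General categories that an implementation may permit in identifiers.
ALLOWED_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl", "No",
    "Pd", "Pc", "Po", "Sc", "Sm", "Sk", "So", "Co",
}
# Categories that are an error for the first character.
FORBIDDEN_FIRST = {"Nd", "Mc", "Me"}
# U+200C (zero-width non-joiner) and U+200D (zero-width joiner).
JOINERS = {"\u200c", "\u200d"}


def non_ascii_char_permitted(ch):
    """True if a non-ASCII character may be permitted in an identifier."""
    return ch in JOINERS or unicodedata.category(ch) in ALLOWED_CATEGORIES


def extended_identifier_ok(name):
    """Check only the Unicode rules; ASCII characters are not judged here."""
    if not name:
        return False
    first = name[0]
    if ord(first) >= 128 and unicodedata.category(first) in FORBIDDEN_FIRST:
        return False
    return all(ord(ch) < 128 or non_ascii_char_permitted(ch) for ch in name)
```

Under these rules `λx` passes, an identifier beginning with an Arabic-Indic digit (category Nd) is rejected, and a character such as `«` (category Pi) is not in the permitted set at all.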

All Scheme implementations must permit the escape sequence ‘\x⟨hex digits⟩;’ to appear in Scheme identifiers that are enclosed in vertical lines. If the character with the given Unicode scalar value is supported by the implementation, identifiers containing such a sequence are equivalent to identifiers containing the corresponding character.
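A rough sketch of how such escapes might be expanded, so that two spellings of the same identifier compare equal. The name `expand_hex_escapes` is hypothetical. Accepting uppercase hex digits is an assumption; the hex digit production in this section lists only lowercase letters.

```python
import re

# \x, one or more hex digits, then a terminating semicolon.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]+);")


def expand_hex_escapes(body):
    """Replace each \\x<hex digits>; in an identifier body by its character."""
    def replace(match):
        value = int(match.group(1), 16)
        # A Unicode scalar value excludes surrogates and stops at U+10FFFF.
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            raise ValueError("not a Unicode scalar value: %x" % value)
        return chr(value)
    return _HEX_ESCAPE.sub(replace, body)
```

With this helper, the body of `|H\x65;llo|` expands to `Hello`, so the two identifiers would denote the same symbol on an implementation that supports the character.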

⟨token⟩ ⟶ ⟨identifier⟩ | ⟨boolean⟩ | ⟨number⟩
    | ⟨character⟩ | ⟨string⟩
    | ( | ) | #( | #u8( | ' | ` | , | ,@ | .
⟨delimiter⟩ ⟶ ⟨whitespace⟩ | ⟨vertical line⟩
    | ( | ) | " | ;
⟨intraline whitespace⟩ ⟶ ⟨space or tab⟩
⟨whitespace⟩ ⟶ ⟨intraline whitespace⟩ | ⟨line ending⟩
⟨vertical line⟩ ⟶ |
⟨line ending⟩ ⟶ ⟨newline⟩ | ⟨return⟩ ⟨newline⟩
    | ⟨return⟩
⟨comment⟩ ⟶ ; ⟨all subsequent characters up to a line ending⟩
    | ⟨nested comment⟩
    | #; ⟨intertoken space⟩ ⟨datum⟩
⟨nested comment⟩ ⟶ #| ⟨comment text⟩
    ⟨comment cont⟩* |#
⟨comment text⟩ ⟶ ⟨character sequence not containing #| or |#⟩
⟨comment cont⟩ ⟶ ⟨nested comment⟩ ⟨comment text⟩
⟨directive⟩ ⟶ #!fold-case | #!no-fold-case

Note that it is ungrammatical to follow a ⟨directive⟩ with anything but a ⟨delimiter⟩ or the end of file.

⟨atmosphere⟩ ⟶ ⟨whitespace⟩ | ⟨comment⟩ | ⟨directive⟩
⟨intertoken space⟩ ⟶ ⟨atmosphere⟩*
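The comment and atmosphere productions above can be illustrated with a small skipper. This is a partial sketch with hypothetical function names: it handles whitespace, `;` line comments, and nested `#| ... |#` comments, but not `#;` datum comments (which would need a datum parser) or directives.

```python
def skip_nested_comment(text, start):
    """Return the index just past the |# that closes the #| at start.

    Nesting is tracked with a depth counter: every #| opens a level and
    every |# closes one.
    """
    if not text.startswith("#|", start):
        raise ValueError("expected #| at start")
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("#|", i):
            depth += 1
            i += 2
        elif text.startswith("|#", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated nested comment")


def skip_intertoken_space(text, i=0):
    """Skip whitespace, line comments, and nested comments from index i."""
    while i < len(text):
        if text[i] in " \t\n\r":
            i += 1
        elif text[i] == ";":
            # A line comment runs up to, but not past, the line ending.
            while i < len(text) and text[i] not in "\n\r":
                i += 1
        elif text.startswith("#|", i):
            i = skip_nested_comment(text, i)
        else:
            break
    return i
```

A depth counter is used instead of recursion so that deeply nested comments do not exhaust the call stack; a recursive version that mirrors the grammar one production at a time would work equally well.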

Note that ‘+i’, ‘-i’, and ⟨infnan⟩ below are exceptions to the ⟨peculiar identifier⟩ rule; they are parsed as numbers, not identifiers.

⟨identifier⟩ ⟶ ⟨initial⟩ ⟨subsequent⟩*
    | ⟨vertical line⟩ ⟨symbol element⟩* ⟨vertical line⟩
    | ⟨peculiar identifier⟩
⟨initial⟩ ⟶ ⟨letter⟩ | ⟨special initial⟩
⟨letter⟩ ⟶ a | b | c | ... | z
    | A | B | C | ... | Z
⟨special initial⟩ ⟶ ! | $ | % | & | * | / | : | < | =
    | > | ? | @ | ^ | _ | ~
⟨subsequent⟩ ⟶ ⟨initial⟩ | ⟨digit⟩ | ⟨special subsequent⟩
⟨digit⟩ ⟶ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
⟨hex digit⟩ ⟶ ⟨digit⟩ | a | b | c | d | e | f
⟨explicit sign⟩ ⟶ + | -
⟨special subsequent⟩ ⟶ ⟨explicit sign⟩ | . | @
⟨inline hex escape⟩ ⟶ \x⟨hex scalar value⟩;
⟨hex scalar value⟩ ⟶ ⟨hex digit⟩+
⟨mnemonic escape⟩ ⟶ \a | \b | \t | \n | \r
⟨peculiar identifier⟩ ⟶ ⟨explicit sign⟩
    | ⟨explicit sign⟩ ⟨sign subsequent⟩ ⟨subsequent⟩*
    | ⟨explicit sign⟩ . ⟨dot subsequent⟩ ⟨subsequent⟩*
    | . ⟨dot subsequent⟩ ⟨subsequent⟩*
⟨dot subsequent⟩ ⟶ ⟨sign subsequent⟩ | .
⟨exponent marker⟩ ⟶ e
⟨sign⟩ ⟶ ⟨empty⟩ | + | -
⟨exactness⟩ ⟶ ⟨empty⟩ | #i | #e
⟨radix 2⟩ ⟶ #b
⟨radix 8⟩ ⟶ #o
⟨radix 10⟩ ⟶ ⟨empty⟩ | #d
⟨radix 16⟩ ⟶ #x
⟨digit 2⟩ ⟶ 0 | 1
⟨digit 8⟩ ⟶ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
⟨digit 10⟩ ⟶ ⟨digit⟩
⟨digit 16⟩ ⟶ ⟨digit 10⟩ | a | b | c | d | e | f
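Two of the rules above translate almost directly into code, shown here as a hedged sketch with hypothetical names. The regular expression covers only the first identifier alternative (an initial followed by any number of subsequents), not vertical-line or peculiar identifiers, and the digit table checks digit strings against a radix prefix without parsing complete numbers.

```python
import re

# special initial: ! $ % & * / : < = > ? @ ^ _ ~
SPECIAL_INITIAL = "!$%&*/:<=>?@^_~"
# initial: letter | special initial, written as the inside of a character class
INITIAL_CLASS = "A-Za-z" + re.escape(SPECIAL_INITIAL)
# subsequent: initial | digit | special subsequent (+ - . @)
ORDINARY_IDENTIFIER = re.compile(rf"[{INITIAL_CLASS}][{INITIAL_CLASS}0-9+\-.@]*")


def is_ordinary_identifier(text):
    """True if text matches initial subsequent* in its entirety."""
    return ORDINARY_IDENTIFIER.fullmatch(text) is not None


# Digits allowed for each radix prefix; the empty prefix means radix 10.
RADIX_DIGITS = {
    "#b": "01",
    "#o": "01234567",
    "": "0123456789",
    "#d": "0123456789",
    "#x": "0123456789abcdef",
}


def digits_valid(prefix, digits):
    """True if every character of a non-empty digit string fits the radix."""
    allowed = RADIX_DIGITS[prefix]
    return digits != "" and all(ch in allowed for ch in digits)
```

For instance, `list->vector` and `<=?` are ordinary identifiers, while `1+` is not (a digit cannot be an initial) and a lone `+` falls under the peculiar identifier rule instead.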