Lexical structure (Revised7.0 Report on the Algorithmic Language Scheme (Unofficial))

7.1.1 Lexical structure

This section describes how individual tokens (identifiers, numbers, etc.) are formed from sequences of characters. The following sections describe how expressions and programs are formed from sequences of tokens.

⟨Intertoken space⟩ can occur on either side of any token, but not within a token.

Identifiers that do not begin with a vertical line are terminated by a ⟨delimiter⟩ or by the end of the input. The same applies to the dot token, numbers, characters, and booleans. Identifiers that begin with a vertical line are terminated by another vertical line.
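
The termination rule can be sketched in Python as follows. This is only an illustration, not part of the report: the function names are hypothetical, the delimiter set is copied from the ⟨delimiter⟩ production in this section, and the handling of a backslash inside vertical lines (so that an escaped vertical line does not end the identifier) is an assumption based on the R7RS ⟨symbol element⟩ production.

```python
# Characters that terminate a bare token, per the <delimiter> production:
# whitespace (space, tab, newline, return), |, (, ), " and ;
DELIMITERS = set(' \t\n\r|()";')


def scan_bare_token(text, start=0):
    """Return (token, end) for a token that does not begin with '|'.

    The token runs from `start` up to the next delimiter or the end of input.
    """
    end = start
    while end < len(text) and text[end] not in DELIMITERS:
        end += 1
    return text[start:end], end


def scan_bar_identifier(text, start=0):
    """Return (token, end) for an identifier enclosed in vertical lines."""
    if text[start] != "|":
        raise ValueError("expected a vertical line")
    i = start + 1
    while i < len(text):
        if text[i] == "\\":
            # Assumption: a backslash escapes the next character (e.g. \|),
            # so that character cannot terminate the identifier.
            i += 2
            continue
        if text[i] == "|":
            return text[start:i + 1], i + 1
        i += 1
    raise ValueError("unterminated |...| identifier")
```

For example, `scan_bare_token("foo(bar")` stops at the opening parenthesis, while `scan_bar_identifier("|hello world| x")` runs through the embedded space up to the closing vertical line.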

The following four characters from the ASCII repertoire are reserved for future extensions to the language: ‘[’ ‘]’ ‘{’ ‘}’

In addition to the identifier characters of the ASCII repertoire specified below, Scheme implementations may permit any additional repertoire of Unicode characters to be employed in identifiers, provided that each such character has a Unicode general category of Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co, or is U+200C or U+200D (the zero-width non-joiner and joiner, respectively, which are needed for correct spelling in Persian, Hindi, and other languages). However, it is an error for the first character to have a general category of Nd, Mc, or Me. It is also an error to use a non-Unicode character in symbols or identifiers.
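
A minimal Python sketch of this category test, using the standard `unicodedata` module. The function names are hypothetical, and the sketch deliberately treats only non-ASCII characters, since ASCII identifier characters are governed by the grammar in this section rather than by general category. An implementation remains free to permit a smaller repertoire than the one tested here.

```python
import unicodedata

# General categories that the paragraph above allows in identifiers.
ALLOWED_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl", "No",
    "Pd", "Pc", "Po", "Sc", "Sm", "Sk", "So", "Co",
}
# Categories that it is an error to use for the first character.
FORBIDDEN_FIRST = {"Nd", "Mc", "Me"}
# U+200C (zero-width non-joiner) and U+200D (zero-width joiner).
JOINERS = {"\u200c", "\u200d"}


def is_permitted_extra_char(ch):
    """True if a non-ASCII character may appear in an identifier."""
    if ord(ch) < 128:
        return False  # ASCII is covered by the grammar, not by this rule
    return ch in JOINERS or unicodedata.category(ch) in ALLOWED_CATEGORIES


def may_start_identifier(ch):
    """True if a non-ASCII character may be the first identifier character."""
    return (is_permitted_extra_char(ch)
            and unicodedata.category(ch) not in FORBIDDEN_FIRST)
```

Under this sketch a Greek lambda (category Ll) is accepted anywhere, whereas an Arabic-Indic digit (category Nd) is accepted only after the first character.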

All Scheme implementations must permit the escape sequence ‘\x⟨hex digits⟩;’ to appear in Scheme identifiers that are enclosed in vertical lines. If the character with the given Unicode scalar value is supported by the implementation, identifiers containing such a sequence are equivalent to identifiers containing the corresponding character.
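
The equivalence can be illustrated with a short Python sketch that decodes the escape inside the body of a vertical-line identifier. The function name is hypothetical, and the sketch accepts only the lowercase hex digits listed in the ⟨hex digit⟩ production of this section. Values that are not Unicode scalar values (surrogates, or anything above 10FFFF) are rejected.

```python
import re

# \x<hex digits>; with the lowercase digits listed in <hex digit>.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-f]+);")


def _decode_one(match):
    value = int(match.group(1), 16)
    # A Unicode scalar value excludes the surrogate range and stops at 10FFFF.
    if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %x" % value)
    return chr(value)


def decode_hex_escapes(body):
    """Replace each inline hex escape in `body` with the character it denotes."""
    return _HEX_ESCAPE.sub(_decode_one, body)
```

With this helper, the body of `|H\x65;llo|` decodes to `Hello`, so the two spellings name the same identifier.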

⟨token⟩ ⟶ ⟨identifier⟩ | ⟨boolean⟩ | ⟨number⟩
    | ⟨character⟩ | ⟨string⟩
    | ( | ) | #( | #u8( | ' | ` | , | ,@ | .
⟨delimiter⟩ ⟶ ⟨whitespace⟩ | ⟨vertical line⟩
    | ( | ) | " | ;
⟨intraline whitespace⟩ ⟶ ⟨space or tab⟩
⟨whitespace⟩ ⟶ ⟨intraline whitespace⟩ | ⟨line ending⟩
⟨vertical line⟩ ⟶ |
⟨line ending⟩ ⟶ ⟨newline⟩ | ⟨return⟩ ⟨newline⟩
    | ⟨return⟩
⟨comment⟩ ⟶ ; ⟨all subsequent characters up to a line ending⟩
    | ⟨nested comment⟩
    | #; ⟨intertoken space⟩ ⟨datum⟩
⟨nested comment⟩ ⟶ #| ⟨comment text⟩
    ⟨comment cont⟩* |#
⟨comment text⟩ ⟶ ⟨character sequence not containing #| or |#⟩
⟨comment cont⟩ ⟶ ⟨nested comment⟩ ⟨comment text⟩
⟨directive⟩ ⟶ #!fold-case | #!no-fold-case

Note that it is ungrammatical to follow a ⟨directive⟩ with anything but a ⟨delimiter⟩ or the end of file.

⟨atmosphere⟩ ⟶ ⟨whitespace⟩ | ⟨comment⟩ | ⟨directive⟩
⟨intertoken space⟩ ⟶ ⟨atmosphere⟩*
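
The ⟨atmosphere⟩ productions can be approximated by the Python sketch below, which advances past whitespace, line comments, nested comments, and directives. The function names are hypothetical. Datum comments (`#;`) are intentionally left unhandled, because skipping one requires a full ⟨datum⟩ reader, which is outside the scope of this section; the sketch simply stops in front of them.

```python
WHITESPACE = " \t\n\r"
DELIMITERS = set(' \t\n\r|()";')
DIRECTIVES = ("#!fold-case", "#!no-fold-case")


def skip_nested_comment(text, i):
    """Given text[i:i+2] == '#|', return the index just past the matching '|#'."""
    depth = 0
    while i < len(text):
        if text.startswith("#|", i):
            depth += 1
            i += 2
        elif text.startswith("|#", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated nested comment")


def skip_atmosphere(text, i=0):
    """Return the index of the first character that is not intertoken space."""
    while i < len(text):
        if text[i] in WHITESPACE:
            i += 1
        elif text[i] == ";":
            # Line comment: runs up to (not including) the line ending.
            while i < len(text) and text[i] not in "\n\r":
                i += 1
        elif text.startswith("#|", i):
            i = skip_nested_comment(text, i)
        elif text.startswith("#;", i):
            break  # datum comment: needs a datum reader, not handled here
        else:
            for directive in DIRECTIVES:
                if text.startswith(directive, i):
                    end = i + len(directive)
                    # A directive must be followed by a delimiter or end of file.
                    if end < len(text) and text[end] not in DELIMITERS:
                        raise ValueError("directive not followed by a delimiter")
                    i = end
                    break
            else:
                return i  # start of a real token
    return i
```

A depth counter is used for nested comments because ⟨comment cont⟩ allows a nested comment to contain further nested comments to any depth.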

Note that ‘+i’, ‘-i’ and ⟨infnan⟩ below are exceptions to the ⟨peculiar identifier⟩ rule; they are parsed as numbers, not identifiers.
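
As a rough illustration of this note, the Python sketch below transcribes the ASCII ⟨identifier⟩ productions of this section (without the vertical-line form) into regular expressions and checks the numeric exceptions first. Several things here are assumptions rather than part of this section: the names are hypothetical, ⟨sign subsequent⟩ is taken from R7RS as ⟨initial⟩ | ⟨explicit sign⟩ | @, and ⟨infnan⟩ is taken from R7RS as `+inf.0`, `-inf.0`, `+nan.0`, `-nan.0`. Only those bare tokens are treated as numbers; the full ⟨number⟩ grammar is not modeled.

```python
import re

# Character-class bodies built from the productions in this section.
INITIAL = "a-zA-Z" + re.escape("!$%&*/:<=>?@^_~")
SUBSEQUENT = INITIAL + "0-9" + re.escape("+-.@")
# Assumption (from R7RS): <sign subsequent> is <initial> | <explicit sign> | @
SIGN_SUBSEQUENT = INITIAL + re.escape("+-@")
DOT_SUBSEQUENT = SIGN_SUBSEQUENT + re.escape(".")

ORDINARY = re.compile(f"[{INITIAL}][{SUBSEQUENT}]*")
PECULIAR = re.compile(
    f"[+-]"
    f"|[+-][{SIGN_SUBSEQUENT}][{SUBSEQUENT}]*"
    f"|[+-]\\.[{DOT_SUBSEQUENT}][{SUBSEQUENT}]*"
    f"|\\.[{DOT_SUBSEQUENT}][{SUBSEQUENT}]*"
)

# Tokens that fit the <peculiar identifier> shape but are parsed as numbers.
NUMERIC_EXCEPTIONS = {"+i", "-i", "+inf.0", "-inf.0", "+nan.0", "-nan.0"}


def classify(token):
    """Return 'number', 'identifier', or 'other' for a delimited token."""
    if token in NUMERIC_EXCEPTIONS:
        return "number"
    if ORDINARY.fullmatch(token) or PECULIAR.fullmatch(token):
        return "identifier"
    return "other"  # e.g. the dot token, ordinary numbers, booleans
```

Checking the exceptions before the identifier patterns matters: `+i` matches the second ⟨peculiar identifier⟩ alternative syntactically, so a recognizer that tried identifiers first would misclassify it.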

⟨identifier⟩ ⟶ ⟨initial⟩ ⟨subsequent⟩*
    | ⟨vertical line⟩ ⟨symbol element⟩* ⟨vertical line⟩
    | ⟨peculiar identifier⟩
⟨initial⟩ ⟶ ⟨letter⟩ | ⟨special initial⟩
⟨letter⟩ ⟶ a | b | c | … | z
    | A | B | C | … | Z
⟨special initial⟩ ⟶ ! | $ | % | & | * | / | : | < | =
    | > | ? | @ | ^ | _ | ~
⟨subsequent⟩ ⟶ ⟨initial⟩ | ⟨digit⟩ | ⟨special subsequent⟩
⟨digit⟩ ⟶ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
⟨hex digit⟩ ⟶ ⟨digit⟩ | a | b | c | d | e | f
⟨explicit sign⟩ ⟶ + | -
⟨special subsequent⟩ ⟶ ⟨explicit sign⟩ | . | @
⟨inline hex escape⟩ ⟶ \x⟨hex scalar value⟩;
⟨hex scalar value⟩ ⟶ ⟨hex digit⟩+
⟨mnemonic escape⟩ ⟶ \a | \b | \t | \n | \r
⟨peculiar identifier⟩ ⟶ ⟨explicit sign⟩
    | ⟨explicit sign⟩ ⟨sign subsequent⟩ ⟨subsequent⟩*
    | ⟨explicit sign⟩ . ⟨dot subsequent⟩ ⟨subsequent⟩*
    | . ⟨dot subsequent⟩ ⟨subsequent⟩*
⟨dot subsequent⟩ ⟶ ⟨sign subsequent⟩ | .
⟨sign subsequent⟩ ⟶ ⟨initial⟩ | ⟨explicit sign⟩ | @
⟨symbol element⟩ ⟶ ⟨any character other than ⟨vertical line⟩ or \⟩
    | ⟨inline hex escape⟩ | ⟨mnemonic escape⟩ | \|
⟨exponent marker⟩ ⟶ e
⟨sign⟩ ⟶ ⟨empty⟩ | + | -
⟨exactness⟩ ⟶ ⟨empty⟩ | #i | #e
⟨radix 2⟩ ⟶ #b
⟨radix 8⟩ ⟶ #o
⟨radix 10⟩ ⟶ ⟨empty⟩ | #d
⟨radix 16⟩ ⟶ #x
⟨digit 2⟩ ⟶ 0 | 1
⟨digit 8⟩ ⟶ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
⟨digit 10⟩ ⟶ ⟨digit⟩
⟨digit 16⟩ ⟶ ⟨digit 10⟩ | a | b | c | d | e | f
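
The ⟨radix R⟩ and ⟨digit R⟩ productions pair each prefix with a digit set. The Python sketch below shows that pairing for the simplest case only: an optional radix prefix followed by one or more digits of that radix. The function name is hypothetical, and signs, exactness prefixes, decimals, and the rest of the ⟨number⟩ grammar are deliberately left out.

```python
# Radix prefixes and the digits each radix admits, per the productions above.
RADIX_PREFIX = {"#b": 2, "#o": 8, "#d": 10, "#x": 16}
DIGITS = {
    2: "01",
    8: "01234567",
    10: "0123456789",
    16: "0123456789abcdef",
}


def parse_unsigned_integer(text):
    """Parse an optional radix prefix followed by digits of that radix."""
    radix = 10  # <radix 10> may be empty, so decimal is the default
    if text[:2] in RADIX_PREFIX:
        radix = RADIX_PREFIX[text[:2]]
        text = text[2:]
    if not text or any(ch not in DIGITS[radix] for ch in text):
        raise ValueError("not a sequence of radix-%d digits" % radix)
    return int(text, radix)
```

For instance, `#b101` and `#x5` both denote five under this sketch, while `#b102` is rejected because `2` is not a ⟨digit 2⟩.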