[lex.literal] - C++20 → C++23

Files changed (1)

tmp/tmpben6t3zn/{from.md → to.md} +298 -282

@@ -1,10 +1,10 @@
 ## Literals <a id="lex.literal">[[lex.literal]]</a>
 ### Kinds of literals <a id="lex.literal.kinds">[[lex.literal.kinds]]</a>
-There are several kinds of literals.[^11]
 ``` bnf
 literal:
     integer-literal
     character-literal
@@ -13,10 +13,13 @@ literal:
     boolean-literal
     pointer-literal
     user-defined-literal
 ```
 ### Integer literals <a id="lex.icon">[[lex.icon]]</a>
 ``` bnf
 integer-literal:
     binary-literal integer-suffixₒₚₜ
@@ -84,12 +87,14 @@ hexadecimal-digit: one of
 ``` bnf
 integer-suffix:
     unsigned-suffix long-suffixₒₚₜ
     unsigned-suffix long-long-suffixₒₚₜ
     long-suffix unsigned-suffixₒₚₜ
     long-long-suffix unsigned-suffixₒₚₜ
 ```
 ``` bnf
 unsigned-suffix: one of
     'u U'
@@ -103,10 +108,15 @@ long-suffix: one of
 ``` bnf
 long-long-suffix: one of
     'll LL'
 ```
 In an *integer-literal*, the sequence of *binary-digit*s,
 *octal-digit*s, *digit*s, or *hexadecimal-digit*s is interpreted as a
 base N integer as shown in table [[lex.icon.base]]; the lexically first
 digit of the sequence of digits is the most significant.
@@ -131,16 +141,16 @@ decimal values ten through fifteen.
 `0x10'0000`, and `0'004'000'000` all have the same
 value. — *end example*]
 The type of an *integer-literal* is the first type in the list in
 [[lex.icon.type]] corresponding to its optional *integer-suffix* in
-which its value can be represented. An *integer-literal* is a prvalue.
 **Table: Types of *integer-literal*s** <a id="lex.icon.type">[lex.icon.type]</a>
 | *integer-suffix* | *decimal-literal*                         | *integer-literal* other than *decimal-literal* |
-| ---------------- | ------------------------ | ---------------------------------------------- |
 | none             | `int`                                     | `int`                                          |
 |                  | `long int`                                | `unsigned int`                                 |
 |                  | `long long int`                           | `long int`                                     |
 |                  |                                           | `unsigned long int`                            |
 |                  |                                           | `long long int`                                |
@@ -156,10 +166,15 @@ which its value can be represented. An *integer-literal* is a prvalue.
 | and `l` or `L`   | `unsigned long long int`                  | `unsigned long long int`                       |
 | `ll` or `LL`     | `long long int`                           | `long long int`                                |
 |                  |                                           | `unsigned long long int`                       |
 | Both `u` or `U`  | `unsigned long long int`                  | `unsigned long long int`                       |
 | and `ll` or `LL` |                                           |                                                |
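As a quick check of the selection rule, the following compile-time sketch (assuming a C++17 compiler for `std::is_same_v` and message-less `static_assert`) asks `decltype` which type was chosen for a few literals. The winning type can depend on the implementation's integer widths, so the `0xFFFFFFFF` check is written to apply only when `int` is 32 bits wide.

``` cpp
#include <climits>
#include <type_traits>

// A suffix-free decimal literal only ever gets a signed type
// (int, long int, long long int), even when it does not fit in int.
static_assert(std::is_signed_v<decltype(4294967295)>);

// Hexadecimal, octal and binary literals may fall back to an unsigned type.
// With a 32-bit int, 0xFFFFFFFF does not fit in int but fits in unsigned int.
static_assert(INT_MAX != 2147483647 ||
              std::is_same_v<decltype(0xFFFFFFFF), unsigned int>);

// Suffixes restrict the list the type is chosen from.
static_assert(std::is_same_v<decltype(42u), unsigned int>);
static_assert(std::is_same_v<decltype(42LL), long long int>);
static_assert(std::is_same_v<decltype(42ull), unsigned long long int>);
```

A successful compile means every check held; nothing happens at run time.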
 If an *integer-literal* cannot be represented by any type in its list
 and an extended integer type [[basic.fundamental]] can represent its
 value, it may have that extended integer type. If all of the types in
@@ -189,157 +204,165 @@ c-char-sequence:
     c-char-sequence c-char
 ```
 ``` bnf
 c-char:
- any member of the basic source character set except the single-quote ''', backslash '\', or new-line character
     escape-sequence
     universal-character-name
 ```
 ``` bnf
 escape-sequence:
     simple-escape-sequence
     octal-escape-sequence
     hexadecimal-escape-sequence
 ```
 ``` bnf
-simple-escape-sequence: one of
- '\'' '\"' '\?' '\\'
- '\a' '\b' '\f' '\n' '\r' '\t' '\v'
 ```
 ``` bnf
 octal-escape-sequence:
     '\' octal-digit
     '\' octal-digit octal-digit
     '\' octal-digit octal-digit octal-digit
 ```
 ``` bnf
 hexadecimal-escape-sequence:
-    '\x' hexadecimal-digit
-    hexadecimal-escape-sequence hexadecimal-digit
 ```
-A *character-literal* that does not begin with `u8`, `u`, `U`, or `L` is
-an *ordinary character literal*. An ordinary character literal that
-contains a single *c-char* representable in the execution character set
-has type `char`, with value equal to the numerical value of the encoding
-of the *c-char* in the execution character set. An ordinary character
-literal that contains more than one *c-char* is a
-*multicharacter literal*. A multicharacter literal, or an ordinary
-character literal containing a single *c-char* not representable in the
-execution character set, is conditionally-supported, has type `int`, and
-has an *implementation-defined* value.
-A *character-literal* that begins with `u8`, such as `u8'w'`, is a
-*character-literal* of type `char8_t`, known as a *UTF-8 character
-literal*. The value of a UTF-8 character literal is equal to its ISO/IEC
-10646 code point value, provided that the code point value can be
-encoded as a single UTF-8 code unit.
-[*Note 1*: That is, provided the code point value is in the range
-[0, 7F] (hexadecimal). — *end note*]
-If the value is not representable with a single UTF-8 code unit, the
-program is ill-formed. A UTF-8 character literal containing multiple
-*c-char*s is ill-formed.
-A *character-literal* that begins with the letter `u`, such as `u'x'`,
-is a *character-literal* of type `char16_t`, known as a *UTF-16
-character literal*. The value of a UTF-16 character literal is equal to
-its ISO/IEC 10646 code point value, provided that the code point value
-is representable with a single 16-bit code unit.
-[*Note 2*: That is, provided the code point value is in the range
-[0, FFFF] (hexadecimal). — *end note*]
-If the value is not representable with a single 16-bit code unit, the
-program is ill-formed. A UTF-16 character literal containing multiple
-*c-char*s is ill-formed.
-A *character-literal* that begins with the letter `U`, such as `U'y'`,
-is a *character-literal* of type `char32_t`, known as a *UTF-32
-character literal*. The value of a UTF-32 character literal containing a
-single *c-char* is equal to its ISO/IEC 10646 code point value. A UTF-32
-character literal containing multiple *c-char*s is ill-formed.
-A *character-literal* that begins with the letter `L`, such as `L'z'`,
-is a *wide-character literal*. A wide-character literal has type
-`wchar_t`.[^12] The value of a wide-character literal containing a
-single *c-char* has value equal to the numerical value of the encoding
-of the *c-char* in the execution wide-character set, unless the *c-char*
-has no representation in the execution wide-character set, in which case
-the value is *implementation-defined*.
-[*Note 3*: The type `wchar_t` is able to represent all members of the
-execution wide-character set (see
-[[basic.fundamental]]). — *end note*]
-The value of a wide-character literal containing multiple *c-char*s is
-*implementation-defined*.
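The prefix-to-type mapping described above can be confirmed with `decltype`. This is a sketch assuming C++17 or later; the `char8_t` check is guarded by the `__cpp_char8_t` feature-test macro because `u8` character literals had type `char` in C++17. Ill-formed cases, such as a `u8` literal whose character needs more than one UTF-8 code unit, cannot appear in compiling code and are left out.

``` cpp
#include <type_traits>

// Ordinary, UTF-16, UTF-32 and wide character literals, one c-char each.
static_assert(std::is_same_v<decltype('w'), char>);
static_assert(std::is_same_v<decltype(u'x'), char16_t>);
static_assert(std::is_same_v<decltype(U'y'), char32_t>);
static_assert(std::is_same_v<decltype(L'z'), wchar_t>);

#if defined(__cpp_char8_t)
// UTF-8 character literal: type char8_t once char8_t is enabled (C++20).
static_assert(std::is_same_v<decltype(u8'w'), char8_t>);
#endif

// UTF-16 and UTF-32 literals hold the ISO/IEC 10646 code point value.
static_assert(u'\u00E9' == 0xE9);
static_assert(U'\U0001F600' == 0x1F600);
```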
-Certain non-graphic characters, the single quote `'`, the double quote
-`"`, the question mark `?`,[^13] and the backslash `\`, can be
-represented according to [[lex.ccon.esc]]. The double quote `"` and the
-question mark `?`, can be represented as themselves or by the escape
-sequences `\"` and `\?` respectively, but the single quote `'` and the
-backslash `\` shall be represented by the escape sequences `\'` and `\\`
-respectively. Escape sequences in which the character following the
-backslash is not listed in [[lex.ccon.esc]] are conditionally-supported,
-with *implementation-defined* semantics. An escape sequence specifies a
-single character.
-**Table: Escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
-|                 |                |                    |
-| --------------- | -------------- | ------------------ |
-| new-line        | NL(LF)         | `\n`               |
-| horizontal tab  | HT             | `\t`               |
-| vertical tab    | VT             | `\v`               |
-| backspace       | BS             | `\b`               |
-| carriage return | CR             | `\r`               |
-| form feed       | FF             | `\f`               |
-| alert           | BEL            | `\a`               |
-| backslash       | \              | `\\`               |
-| question mark   | ?              | `\?`               |
-| single quote    | `'`            | `\'`               |
-| double quote    | `"`            | `\"`               |
-| octal number    | ooo            | `\ooo`             |
-| hex number      | hhh            | `\xhhh`            |
-The escape `\ooo` consists of the backslash followed by one,
-two, or three octal digits that are taken to specify the value of the
-desired character. The escape `\xhhh` consists of the
-backslash followed by `x` followed by one or more hexadecimal digits
-that are taken to specify the value of the desired character. There is
-no limit to the number of digits in a hexadecimal sequence. A sequence
-of octal or hexadecimal digits is terminated by the first character that
-is not an octal digit or a hexadecimal digit, respectively. The value of
-a *character-literal* is *implementation-defined* if it falls outside of
-the *implementation-defined* range defined for `char` (for
-*character-literal*s with no prefix) or `wchar_t` (for
-*character-literal*s prefixed by `L`).
-[*Note 4*: If the value of a *character-literal* prefixed by `u`, `u8`,
-or `U` is outside the range defined for its type, the program is
-ill-formed. — *end note*]
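A short sketch of where octal and hexadecimal escapes end, assuming an ASCII-compatible ordinary literal encoding such as UTF-8 (under other encodings the character values differ):

``` cpp
// Assumes an ASCII-compatible ordinary literal encoding (e.g. UTF-8).
static_assert('\101' == 'A');  // octal escape
static_assert('\x41' == 'A');  // hexadecimal escape

// An octal escape takes at most three digits: "\1234" is '\123' then '4'.
static_assert(sizeof("\1234") == 3);
static_assert("\1234"[0] == '\123' && "\1234"[1] == '4');

// A hexadecimal escape stops at the first non-hex character: 'G' ends it.
static_assert(sizeof("\x41G") == 3);
static_assert("\x41G"[0] == 'A' && "\x41G"[1] == 'G');
```

Because a hexadecimal escape has no length limit, `"\x41B"` would instead be a single (likely out-of-range) character; splitting the literal, as in `"\x41" "B"`, is the usual way to end the escape early.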
-A *universal-character-name* is translated to the encoding, in the
-appropriate execution character set, of the character named. If there is
-no such encoding, the *universal-character-name* is translated to an
-*implementation-defined* encoding.
-[*Note 5*: In translation phase 1, a *universal-character-name* is
-introduced whenever an actual extended character is encountered in the
-source text. Therefore, all extended characters are described in terms
-of *universal-character-name*s. However, the actual compiler
-implementation may use its own native character set, so long as the same
-results are obtained. — *end note*]
 ### Floating-point literals <a id="lex.fcon">[[lex.fcon]]</a>
 ``` bnf
 floating-point-literal:
@@ -394,23 +417,33 @@ digit-sequence:
     digit-sequence '''ₒₚₜ digit
 ```
 ``` bnf
 floating-point-suffix: one of
-    'f l F L'
 ```
-The type of a *floating-point-literal* is determined by its
 *floating-point-suffix* as specified in [[lex.fcon.type]].
 **Table: Types of *floating-point-literal*s** <a id="lex.fcon.type">[lex.fcon.type]</a>
 | *floating-point-suffix* | type              |
-| ----------------------- | --------------- |
 | none                    | `double`          |
 | `f` or `F`              | `float`           |
 | `l` or `L`              | `long double`     |
 The *significand* of a *floating-point-literal* is the
 *fractional-constant* or *digit-sequence* of a
 *decimal-floating-point-literal* or the
@@ -419,11 +452,11 @@ The *significand* of a *floating-point-literal* is the
 of *digit*s or *hexadecimal-digit*s and optional period are interpreted
 as a base N real number s, where N is 10 for a
 *decimal-floating-point-literal* and 16 for a
 *hexadecimal-floating-point-literal*.
-[*Note 1*: Any optional separating single quotes are ignored when
 determining the value. — *end note*]
 If an *exponent-part* or *binary-exponent-part* is present, the exponent
 e of the *floating-point-literal* is the result of interpreting the
 sequence of an optional *sign* and the *digit*s as a base 10 integer.
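The suffix-to-type table and the value computation can both be checked at compile time. This sketch assumes C++17 or later (needed for hexadecimal floating literals). The scaling step, s × 10^e for a decimal literal and s × 2^e for a hexadecimal one, is stated in the part of [[lex.fcon]] that this hunk does not show; the values are chosen so each is exactly representable in `double`.

``` cpp
#include <type_traits>

// The suffix alone selects the type.
static_assert(std::is_same_v<decltype(1.0), double>);
static_assert(std::is_same_v<decltype(1.0f), float>);
static_assert(std::is_same_v<decltype(1.0L), long double>);

// Decimal: significand 1.25, exponent 2, value 1.25 x 10^2.
static_assert(1.25e2 == 125.0);

// Hexadecimal: significand 0x1.8 (that is, 1.5), binary exponent 1,
// value 1.5 x 2^1.
static_assert(0x1.8p1 == 3.0);
static_assert(0x1p-2 == 0.25);

// Separating single quotes do not affect the value.
static_assert(1'000.5 == 1000.5);
```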
@@ -455,15 +488,21 @@ s-char-sequence:
     s-char-sequence s-char
 ```
 ``` bnf
 s-char:
- any member of the basic source character set except the double-quote '"', backslash '\', or new-line character
     escape-sequence
     universal-character-name
 ```
 ``` bnf
 raw-string:
     '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
 ```
@@ -473,27 +512,43 @@ r-char-sequence:
     r-char-sequence r-char
 ```
 ``` bnf
 r-char:
-    any member of the source character set, except a right parenthesis ')' followed by
-       the initial *d-char-sequence* (which may be empty) followed by a double quote '"'.
 ```
 ``` bnf
 d-char-sequence:
     d-char
     d-char-sequence d-char
 ```
 ``` bnf
 d-char:
-    any member of the basic source character set except:
- space, the left parenthesis '(', the right parenthesis ')', the backslash '\', and the control characters
-       representing horizontal tab, vertical tab, form feed, and newline.
 ```
 A *string-literal* that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 at most 16 characters.
@@ -536,149 +591,130 @@ R"(x = "\"y\"")"
 is equivalent to `"x = \"\\\"y\\\"\""`.
 — *end example*]
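The equivalence in the example can be verified at compile time. This sketch assumes C++17 for `std::string_view` and its `sv` literal suffix; the second check uses an arbitrary delimiter `xyz` to show how a *d-char-sequence* lets the body contain `)"`.

``` cpp
#include <string_view>

using namespace std::string_view_literals;

// The raw literal and the escaped ordinary literal spell the same characters.
static_assert(R"(x = "\"y\"")"sv == "x = \"\\\"y\\\"\""sv);

// With the delimiter xyz, the body may contain )" without ending the literal.
static_assert(R"xyz(a)"b)xyz"sv == "a)\"b"sv);
```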
-After translation phase 6, a *string-literal* that does not begin with
-an *encoding-prefix* is an *ordinary string literal*. An ordinary string
-literal has type “array of *n* `const char`” where *n* is the size of
-the string as defined below, has static storage duration [[basic.stc]],
-and is initialized with the given characters.
-A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
-*UTF-8 string literal*. A UTF-8 string literal has type “array of *n*
-`const char8_t`”, where *n* is the size of the string as defined below;
-each successive element of the object representation [[basic.types]] has
-the value of the corresponding code unit of the UTF-8 encoding of the
-string.
 Ordinary string literals and UTF-8 string literals are also referred to
 as narrow string literals.
-A *string-literal* that begins with `u`, such as `u"asdf"`, is a *UTF-16
-string literal*. A UTF-16 string literal has type “array of *n*
-`const char16_t`”, where *n* is the size of the string as defined below;
-each successive element of the array has the value of the corresponding
-code unit of the UTF-16 encoding of the string.
-[*Note 3*: A single *c-char* may produce more than one `char16_t`
-character in the form of surrogate pairs. A surrogate pair is a
-representation for a single code point as a sequence of two 16-bit code
-units. — *end note*]
-A *string-literal* that begins with `U`, such as `U"asdf"`, is a *UTF-32
-string literal*. A UTF-32 string literal has type “array of *n*
-`const char32_t`”, where *n* is the size of the string as defined below;
-each successive element of the array has the value of the corresponding
-code unit of the UTF-32 encoding of the string.
-A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
-string literal*. A wide string literal has type “array of *n* `const
-wchar_t`”, where *n* is the size of the string as defined below; it is
-initialized with the given characters.
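Because a string literal is an lvalue, `decltype` applied to one yields a reference to the array type described above. A C++17 sketch; the `char8_t` case is guarded by `__cpp_char8_t`, since `u8` string literals were arrays of `const char` before C++20.

``` cpp
#include <type_traits>

// "asdf" has four characters plus the terminating null, so n is 5.
static_assert(std::is_same_v<decltype("asdf"), const char (&)[5]>);
static_assert(std::is_same_v<decltype(u"asdf"), const char16_t (&)[5]>);
static_assert(std::is_same_v<decltype(U"asdf"), const char32_t (&)[5]>);
static_assert(std::is_same_v<decltype(L"asdf"), const wchar_t (&)[5]>);

#if defined(__cpp_char8_t)
static_assert(std::is_same_v<decltype(u8"asdf"), const char8_t (&)[5]>);
#endif
```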
 In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
-concatenated. If both *string-literal*s have the same *encoding-prefix*,
-the resulting concatenated *string-literal* has that *encoding-prefix*.
-If one *string-literal* has no *encoding-prefix*, it is treated as a
-*string-literal* of the same *encoding-prefix* as the other operand. If
-a UTF-8 string literal token is adjacent to a wide string literal token,
-the program is ill-formed. Any other concatenations are
-conditionally-supported with *implementation-defined* behavior.
-[*Note 4*: This concatenation is an interpretation, not a conversion.
-Because the interpretation happens in translation phase 6 (after each
-character from a *string-literal* has been translated into a value from
-the appropriate character set), a *string-literal*’s initial rawness has
-no effect on the interpretation or well-formedness of the
-concatenation. — *end note*]
 [[lex.string.concat]] has some examples of valid concatenations.
 **Table: String literal concatenations** <a id="lex.string.concat">[lex.string.concat]</a>
 | Source                     | Source | Means   | Source | Source | Means   | Source | Source | Means   |
 | -------------------------- | ------ | ------- | ------ | ------ | ------- | ------ | ------ | ------- |
 | `u"a"`                     | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
 | `u"a"`                     | `"b"`  | `u"ab"` | `U"a"` | `"b"`  | `U"ab"` | `L"a"` | `"b"`  | `L"ab"` |
 | `"a"`                      | `u"b"` | `u"ab"` | `"a"`  | `U"b"` | `U"ab"` | `"a"`  | `L"b"` | `L"ab"` |
-Characters in concatenated strings are kept distinct.
-[*Example 2*:
-``` cpp
-"\xA" "B"
-```
-contains the two characters `'\xA'` and `'B'` after concatenation (and
-not the single hexadecimal character `'\xAB'`).
-— *end example*]
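The rule that concatenated characters stay distinct, together with the prefix-adoption rule from the concatenation table, in compile-time form (a C++17 sketch):

``` cpp
#include <type_traits>

// "\xA" "B" is two characters plus '\0', not the single character '\xAB'.
static_assert(sizeof("\xA" "B") == 3);
static_assert(("\xA" "B")[0] == '\xA' && ("\xA" "B")[1] == 'B');

// An operand without an encoding-prefix takes the other operand's prefix.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" L"b"), const wchar_t (&)[3]>);
```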
-After any necessary concatenation, in translation phase 7
-[[lex.phases]], `'\0'` is appended to every *string-literal* so that
-programs that scan a string can find its end.
-Escape sequences and *universal-character-name*s in non-raw string
-literals have the same meaning as in *character-literal*s [[lex.ccon]],
-except that the single quote `'` is representable either by itself or by
-the escape sequence `\'`, and the double quote `"` shall be preceded by
-a `\`, and except that a *universal-character-name* in a UTF-16 string
-literal may yield a surrogate pair. In a narrow string literal, a
-*universal-character-name* may map to more than one `char` or `char8_t`
-element due to *multibyte encoding*. The size of a `char32_t` or wide
-string literal is the total number of escape sequences,
-*universal-character-name*s, and other characters, plus one for the
-terminating `U'\0'` or `L'\0'`. The size of a UTF-16 string literal is
-the total number of escape sequences, *universal-character-name*s, and
-other characters, plus one for each character requiring a surrogate
-pair, plus one for the terminating `u'\0'`.
-[*Note 5*: The size of a `char16_t` string literal is the number of
-code units, not the number of characters. — *end note*]
-[*Note 6*: Any *universal-character-name*s are required to correspond
-to a code point in the range [0, D800) or [E000, 10FFFF] (hexadecimal)
-[[lex.charset]]. — *end note*]
-The size of a narrow string literal is the total number of escape
-sequences and other characters, plus at least one for the multibyte
-encoding of each *universal-character-name*, plus one for the
-terminating `'\0'`.
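The size rules can be checked with a small helper that returns an array's element count; `element_count` is a hypothetical name used only for this sketch (C++17 or later). U+1F600 lies outside the Basic Multilingual Plane, so it needs a surrogate pair in UTF-16 and four code units in UTF-8.

``` cpp
#include <cstddef>

// Hypothetical helper: number of elements in an array, '\0' included.
template <typename CharT, std::size_t N>
constexpr std::size_t element_count(const CharT (&)[N]) { return N; }

// One code point outside the BMP.
static_assert(element_count(U"\U0001F600") == 2);   // 1 code unit + U'\0'
static_assert(element_count(u"\U0001F600") == 3);   // surrogate pair + u'\0'
static_assert(element_count(u8"\U0001F600") == 5);  // 4 UTF-8 code units + '\0'

// U+00E9 fits in one UTF-16 code unit but needs two UTF-8 code units.
static_assert(element_count(u"\u00E9") == 2);
static_assert(element_count(u8"\u00E9") == 3);
```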
 Evaluating a *string-literal* results in a string literal object with
-static storage duration, initialized from the given characters as
-specified above. Whether all *string-literal*s are distinct (that is,
-are stored in nonoverlapping objects) and whether successive evaluations
-of a *string-literal* yield the same or a different object is
-unspecified.
-[*Note 7*: The effect of attempting to modify a *string-literal* is
-undefined. — *end note*]
 ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
 ``` bnf
 boolean-literal:
     'false'
     'true'
 ```
 The Boolean literals are the keywords `false` and `true`. Such literals
-are prvalues and have type `bool`.
 ### Pointer literals <a id="lex.nullptr">[[lex.nullptr]]</a>
 ``` bnf
 pointer-literal:
     'nullptr'
 ```
-The pointer literal is the keyword `nullptr`. It is a prvalue of type
 `std::nullptr_t`.
 [*Note 1*: `std::nullptr_t` is a distinct type that is neither a
 pointer type nor a pointer-to-member type; rather, a prvalue of this
 type is a null pointer constant and can be converted to a null pointer
@@ -742,14 +778,13 @@ The syntactic non-terminal preceding the *ud-suffix* in a
 that could match that non-terminal.
 A *user-defined-literal* is treated as a call to a literal operator or
 literal operator template [[over.literal]]. To determine the form of
 this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
-the *literal-operator-id* whose literal suffix identifier is *X* is
-looked up in the context of *L* using the rules for unqualified name
-lookup [[basic.lookup.unqual]]. Let *S* be the set of declarations found
-by this lookup. *S* shall not be empty.
 If *L* is a *user-defined-integer-literal*, let *n* be the literal
 without its *ud-suffix*. If *S* contains a literal operator with
 parameter type `unsigned long long`, the literal *L* is treated as a
 call of the form
@@ -761,11 +796,11 @@ operator "" X(nULL)
 Otherwise, *S* shall contain a raw literal operator or a numeric literal
 operator template [[over.literal]] but not both. If *S* contains a raw
 literal operator, the literal *L* is treated as a call of the form
 ``` cpp
-operator "" X("n")
 ```
 Otherwise (*S* contains a numeric literal operator template), *L* is
 treated as a call of the form
@@ -774,11 +809,11 @@ operator "" X<'c₁', 'c₂', ... 'cₖ'>()
 ```
 where *n* is the source character sequence c₁c₂...cₖ.
 [*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
-basic source character set. — *end note*]
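The three forms of call can be compared side by side. In this sketch the suffixes `_cooked`, `_rawlen`, and `_packlen` are hypothetical, and each is deliberately paired with only one kind of literal operator so that the lookup set *S* selects exactly one form (C++17 or later).

``` cpp
#include <cstddef>

// Cooked form: the literal's value arrives as unsigned long long.
constexpr unsigned long long operator""_cooked(unsigned long long v) { return v; }

// Raw form: the literal's spelling (without the ud-suffix) arrives as a
// null-terminated string; this returns its length.
constexpr std::size_t operator""_rawlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Numeric literal operator template: the spelling arrives as a char pack.
template <char... Cs>
constexpr std::size_t operator""_packlen() { return sizeof...(Cs); }

static_assert(0x1F_cooked == 31);  // value of 0x1F
static_assert(0x1F_rawlen == 4);   // spelling "0x1F"
static_assert(0x1F_packlen == 4);  // pack '0', 'x', '1', 'F'
```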
 If *L* is a *user-defined-floating-point-literal*, let *f* be the
 literal without its *ud-suffix*. If *S* contains a literal operator with
 parameter type `long double`, the literal *L* is treated as a call of
 the form
@@ -790,11 +825,11 @@ operator "" X(fL)
 Otherwise, *S* shall contain a raw literal operator or a numeric literal
 operator template [[over.literal]] but not both. If *S* contains a raw
 literal operator, the *literal* *L* is treated as a call of the form
 ``` cpp
-operator "" X("f")
 ```
 Otherwise (*S* contains a numeric literal operator template), *L* is
 treated as a call of the form
@@ -803,11 +838,11 @@ operator "" X<'c₁', 'c₂', ... 'cₖ'>()
 ```
 where *f* is the source character sequence c₁c₂...cₖ.
 [*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
-basic source character set. — *end note*]
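The floating-point case follows the same pattern. A sketch with two hypothetical suffixes, `_half` (cooked, `long double` parameter) and `_flen` (raw):

``` cpp
#include <cstddef>

// Cooked form: the literal's value arrives as long double.
constexpr long double operator""_half(long double v) { return v / 2; }

// Raw form: the literal's spelling arrives as a null-terminated string.
constexpr std::size_t operator""_flen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

static_assert(3.0_half == 1.5L);
static_assert(1.25e3_flen == 6);  // spelling "1.25e3"
```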
 If *L* is a *user-defined-string-literal*, let *str* be the literal
 without its *ud-suffix* and let *len* be the number of code units in
 *str* (i.e., its length excluding the terminating null character). If
 *S* contains a literal operator template with a non-type template
@@ -861,39 +896,43 @@ suffix is applied to the result of the concatenation.
 [*Example 3*:
 ``` cpp
 int main() {
-  L"A" "B" "C"_x;   // OK: same as L"ABC"_x
   "P"_x "Q" "R"_y;  // error: two different ud-suffixes
 }
 ```
 — *end example*]
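A sketch of the suffix being applied after concatenation, using a hypothetical string literal operator `_len` that simply returns its *len* argument (the length in code units, excluding the terminating null). The wide example above, `L"A" "B" "C"_x`, would need an overload taking `const wchar_t*` instead.

``` cpp
#include <cstddef>

// Hypothetical string literal operator: returns len, the code-unit count
// of the literal without its terminating null.
constexpr std::size_t operator""_len(const char*, std::size_t len) { return len; }

// The ud-suffix applies to the concatenated literal "abcd", whichever
// operand carries it.
static_assert("ab" "cd"_len == 4);
static_assert("ab"_len "cd" == 4);
```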
 <!-- Link reference definitions -->
 [basic.fundamental]: basic.md#basic.fundamental
 [basic.link]: basic.md#basic.link
 [basic.lookup.unqual]: basic.md#basic.lookup.unqual
 [basic.stc]: basic.md#basic.stc
-[basic.types]: basic.md#basic.types
 [conv.mem]: expr.md#conv.mem
 [conv.ptr]: expr.md#conv.ptr
 [cpp]: cpp.md#cpp
-[cpp.concat]: cpp.md#cpp.concat
 [cpp.cond]: cpp.md#cpp.cond
 [cpp.import]: cpp.md#cpp.import
 [cpp.include]: cpp.md#cpp.include
 [cpp.module]: cpp.md#cpp.module
 [cpp.stringize]: cpp.md#cpp.stringize
 [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
 [headers]: library.md#headers
 [lex]: #lex
 [lex.bool]: #lex.bool
 [lex.ccon]: #lex.ccon
 [lex.ccon.esc]: #lex.ccon.esc
 [lex.charset]: #lex.charset
 [lex.comment]: #lex.comment
 [lex.digraph]: #lex.digraph
 [lex.ext]: #lex.ext
 [lex.fcon]: #lex.fcon
 [lex.fcon.type]: #lex.fcon.type
@@ -904,83 +943,60 @@ int main() {
 [lex.key]: #lex.key
 [lex.key.digraph]: #lex.key.digraph
 [lex.literal]: #lex.literal
 [lex.literal.kinds]: #lex.literal.kinds
 [lex.name]: #lex.name
-[lex.name.allowed]: #lex.name.allowed
-[lex.name.disallowed]: #lex.name.disallowed
 [lex.name.special]: #lex.name.special
 [lex.nullptr]: #lex.nullptr
 [lex.operators]: #lex.operators
 [lex.phases]: #lex.phases
 [lex.ppnumber]: #lex.ppnumber
 [lex.pptoken]: #lex.pptoken
 [lex.separate]: #lex.separate
 [lex.string]: #lex.string
 [lex.string.concat]: #lex.string.concat
 [lex.token]: #lex.token
 [module.import]: module.md#module.import
 [module.unit]: module.md#module.unit
 [over.literal]: over.md#over.literal
 [temp.explicit]: temp.md#temp.explicit
 [temp.names]: temp.md#temp.names
-[^1]: Implementations must behave as if these separate phases occur,
-    although in practice different phases might be folded together.
 [^2]: A partial preprocessing token would arise from a source file
     ending in the first portion of a multi-character token that requires
     a terminating sequence of characters, such as a *header-name* that
     is missing the closing `"` or `>`. A partial comment would arise
     from a source file ending with an unclosed `/*` comment.
-[^3]: An implementation need not convert all non-corresponding source
-    characters to the same execution character.
-[^4]: The glyphs for the members of the basic source character set are
-    intended to identify characters from the subset of ISO/IEC 10646
-    which corresponds to the ASCII character set. However, because the
-    mapping from source file characters to the source character set
-    (described in translation phase 1) is specified as
-    *implementation-defined*, an implementation is required to document
-    how the basic source characters are represented in source files.
-[^5]: A sequence of characters resembling a *universal-character-name*
-    in an *r-char-sequence* [[lex.string]] does not form a
-    *universal-character-name*.
-[^6]: These include “digraphs” and additional reserved words. The term
     “digraph” (token consisting of two characters) is not perfectly
     descriptive, since one of the alternative *preprocessing-token*s is
     `%:%:` and of course several primary tokens contain two characters.
     Nonetheless, those alternative tokens that aren’t lexical keywords
     are colloquially known as “digraphs”.
-[^7]: Thus the “stringized” values [[cpp.stringize]] of `[` and `<:`
     will be different, maintaining the source spelling, but the tokens
     can otherwise be freely interchanged.
-[^8]: Literals include strings and character and numeric literals.
-[^9]: Thus, a sequence of characters that resembles an escape sequence
-    might result in an error, be interpreted as the character
     corresponding to the escape sequence, or have a completely different
     meaning, depending on the implementation.
-[^10]: On systems in which linkers cannot accept extended characters, an
-    encoding of the *universal-character-name* may be used in forming
     valid external identifiers. For example, some otherwise unused
-    character or sequence of characters may be used to encode the `\u`
-    in a *universal-character-name*. Extended characters may produce a
     long external identifier, but C++ does not place a translation limit
-    on significant characters for external identifiers. In C++, upper-
-    and lower-case letters are considered different for all identifiers,
-    including external identifiers.
-[^11]: The term “literal” generally designates, in this document, those
     tokens that are called “constants” in ISO C.
-[^12]: They are intended for character sets where a character does not
-    fit into a single byte.
-[^13]: Using an escape sequence for a question mark is supported for
-    compatibility with ISO C++14 and ISO C.

 ## Literals <a id="lex.literal">[[lex.literal]]</a>
 ### Kinds of literals <a id="lex.literal.kinds">[[lex.literal.kinds]]</a>
+There are several kinds of literals.[^8]
 ``` bnf
 literal:
     integer-literal
     character-literal
     boolean-literal
     pointer-literal
     user-defined-literal
 ```
+[*Note 1*: When appearing as an *expression*, a literal has a type and
+a value category [[expr.prim.literal]]. — *end note*]
 ### Integer literals <a id="lex.icon">[[lex.icon]]</a>
 ``` bnf
 integer-literal:
     binary-literal integer-suffixₒₚₜ
 ``` bnf
 integer-suffix:
     unsigned-suffix long-suffixₒₚₜ
     unsigned-suffix long-long-suffixₒₚₜ
+    unsigned-suffix size-suffixₒₚₜ
     long-suffix unsigned-suffixₒₚₜ
     long-long-suffix unsigned-suffixₒₚₜ
+    size-suffix unsigned-suffixₒₚₜ
 ```
 ``` bnf
 unsigned-suffix: one of
     'u U'
 ``` bnf
 long-long-suffix: one of
     'll LL'
 ```
+``` bnf
+size-suffix: one of
+    'z Z'
+```
 In an *integer-literal*, the sequence of *binary-digit*s,
 *octal-digit*s, *digit*s, or *hexadecimal-digit*s is interpreted as a
 base N integer as shown in table [[lex.icon.base]]; the lexically first
 digit of the sequence of digits is the most significant.
 `0x10'0000`, and `0'004'000'000` all have the same
 value. — *end example*]
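The equivalences in the example can be spot-checked at compile time (C++14 or later for binary literals and digit separators, C++17 for message-less `static_assert`). The common value is 1048576, that is, 2^20:

``` cpp
// Every spelling below denotes 1048576 (2^20).
static_assert(1048576 == 1'048'576);
static_assert(1048576 == 0X100000);
static_assert(1048576 == 0x10'0000);
static_assert(1048576 == 0'004'000'000);  // octal: 4 x 8^6

// The lexically first digit is the most significant in every base.
static_assert(12 == 014 && 12 == 0XC && 12 == 0b1100);
```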
 The type of an *integer-literal* is the first type in the list in
 [[lex.icon.type]] corresponding to its optional *integer-suffix* in
+which its value can be represented.
 **Table: Types of *integer-literal*s** <a id="lex.icon.type">[lex.icon.type]</a>
 | *integer-suffix* | *decimal-literal*                         | *integer-literal* other than *decimal-literal* |
+| ---------------- | ----------------------------------------- | ---------------------------------------------- |
 | none             | `int`                                     | `int`                                          |
 |                  | `long int`                                | `unsigned int`                                 |
 |                  | `long long int`                           | `long int`                                     |
 |                  |                                           | `unsigned long int`                            |
 |                  |                                           | `long long int`                                |
 | and `l` or `L`   | `unsigned long long int`                  | `unsigned long long int`                       |
 | `ll` or `LL`     | `long long int`                           | `long long int`                                |
 |                  |                                           | `unsigned long long int`                       |
 | Both `u` or `U`  | `unsigned long long int`                  | `unsigned long long int`                       |
 | and `ll` or `LL` |                                           |                                                |
+| `z` or `Z`       | the signed integer type corresponding to `std::size_t` [[support.types.layout]] | the signed integer type corresponding to `std::size_t` |
+|                  |                                           | `std::size_t`                                  |
+| Both `u` or `U`  | `std::size_t`                             | `std::size_t`                                  |
+| and `z` or `Z`   |                                           |                                                |
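The new `z` rows can be exercised the same way. Because pre-C++23 compilers reject `42z`, this sketch guards the checks with the `__cpp_size_t_suffix` feature-test macro; with an older compiler the file still compiles but checks nothing.

``` cpp
#include <cstddef>
#include <type_traits>

#if defined(__cpp_size_t_suffix)
// u/U combined with z/Z, in either order, gives std::size_t.
static_assert(std::is_same_v<decltype(42uz), std::size_t>);
static_assert(std::is_same_v<decltype(42zu), std::size_t>);

// z/Z alone gives the signed integer type corresponding to std::size_t.
static_assert(std::is_same_v<decltype(42z), std::make_signed_t<std::size_t>>);
#endif
```

A typical motivation is a loop such as `for (auto i = 0uz; i < v.size(); ++i)`, where `i` gets the same type as `v.size()` for standard containers without a cast.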
 If an *integer-literal* cannot be represented by any type in its list
 and an extended integer type [[basic.fundamental]] can represent its
 value, it may have that extended integer type. If all of the types in
     c-char-sequence c-char
 ```
 ``` bnf
 c-char:
+    basic-c-char
     escape-sequence
     universal-character-name
 ```
+``` bnf
+basic-c-char:
+    any member of the translation character set except the U+0027 (apostrophe),
+      U+005c (reverse solidus), or new-line character
+```
 ``` bnf
 escape-sequence:
     simple-escape-sequence
+    numeric-escape-sequence
+    conditional-escape-sequence
+```
+``` bnf
+simple-escape-sequence:
+    '\' simple-escape-sequence-char
+```
+``` bnf
+simple-escape-sequence-char: one of
+    '' " ? \ a b f n r t v'
+```
+``` bnf
+numeric-escape-sequence:
     octal-escape-sequence
     hexadecimal-escape-sequence
 ```
 ``` bnf
+simple-octal-digit-sequence:
+    octal-digit
+    simple-octal-digit-sequence octal-digit
 ```
 ``` bnf
 octal-escape-sequence:
     '\' octal-digit
     '\' octal-digit octal-digit
     '\' octal-digit octal-digit octal-digit
+    '\o{' simple-octal-digit-sequence '}'
 ```
 ``` bnf
 hexadecimal-escape-sequence:
+    '\x' simple-hexadecimal-digit-sequence
+    '\x{' simple-hexadecimal-digit-sequence '}'
 ```
+``` bnf
+conditional-escape-sequence:
+    '\' conditional-escape-sequence-char
+```
+``` bnf
+conditional-escape-sequence-char:
+    any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters 'N', 'o', 'u', 'U', or 'x'
+```
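The escape-sequence alternatives above can be exercised with a few `static_assert`s. This sketch uses `U` literals, so each value is the Unicode code point regardless of the ordinary literal encoding. The delimited `\o{}` and `\x{}` forms are guarded on `__cplusplus >= 202302L`, which assumes the compiler reports C++23 mode that way.

``` cpp
// One char32_t literal per escape-sequence alternative; UTF-32 code unit
// values equal Unicode code points, so these checks are encoding-independent.
static_assert(U'\'' == 0x27);      // simple-escape-sequence
static_assert(U'\101' == 0x41);    // octal-escape-sequence: at most three octal digits
static_assert(U'\x41' == 0x41);    // hexadecimal-escape-sequence
static_assert(U'\u0041' == U'A');  // universal-character-name

// A fourth octal digit starts a new c-char: "\1014" is 'A' followed by '4'.
static_assert(sizeof(U"\1014") == 3 * sizeof(char32_t));

#if __cplusplus >= 202302L
// C++23 delimited forms; the braces lift the three-digit limit for octal.
static_assert(U'\o{101}' == 0x41);
static_assert(U'\x{41}' == 0x41);
static_assert(U'\o{1014}' == 01014);
#endif
```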
+A *non-encodable character literal* is a *character-literal* whose
+*c-char-sequence* consists of a single *c-char* that is not a
+*numeric-escape-sequence* and that specifies a character that either
+lacks representation in the literal’s associated character encoding or
+that cannot be encoded as a single code unit. A *multicharacter literal*
+is a *character-literal* whose *c-char-sequence* consists of more than
+one *c-char*. The *encoding-prefix* of a non-encodable character literal
+or a multicharacter literal shall be absent. Such *character-literal*s
+are conditionally-supported.
+The kind of a *character-literal*, its type, and its associated
+character encoding [[lex.charset]] are determined by its
+*encoding-prefix* and its *c-char-sequence* as defined by
+[[lex.ccon.literal]]. The special cases for non-encodable character
+literals and multicharacter literals take precedence over the base kind.
+[*Note 1*: The associated character encoding for ordinary character
+literals determines encodability, but does not determine the value of
+non-encodable ordinary character literals or ordinary multicharacter
+literals. The examples in [[lex.ccon.literal]] for non-encodable
+ordinary character literals assume that the specified character lacks
+representation in the ordinary literal encoding or that encoding the
+character would require more than one code unit. — *end note*]
+**Table: Character literals** <a id="lex.ccon.literal">[lex.ccon.literal]</a>
+| Encoding prefix | Kind                       | Type       | Associated character encoding | Example |
+| --------------- | -------------------------- | ---------- | ----------------------------- | ------- |
+| none            | ordinary character literal | `char`     | ordinary literal encoding     | `'v'`   |
+| `L`             | wide character literal     | `wchar_t`  | wide literal encoding         | `L'w'`  |
+| `u8`            | UTF-8 character literal    | `char8_t`  | UTF-8                         | `u8'x'` |
+| `u`             | UTF-16 character literal   | `char16_t` | UTF-16                        | `u'y'`  |
+| `U`             | UTF-32 character literal   | `char32_t` | UTF-32                        | `U'z'`  |
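A sketch of the type column of the table, one assertion per encoding prefix. The `u8` row is guarded on `__cpp_char8_t` because `char8_t` exists only from C++20 onward.

``` cpp
#include <type_traits>

// The "Type" column of the character-literal table, one row per prefix.
static_assert(std::is_same_v<decltype('v'), char>);
static_assert(std::is_same_v<decltype(L'w'), wchar_t>);
static_assert(std::is_same_v<decltype(u'y'), char16_t>);
static_assert(std::is_same_v<decltype(U'z'), char32_t>);
#if defined(__cpp_char8_t)
static_assert(std::is_same_v<decltype(u8'x'), char8_t>);  // char8_t since C++20
#endif
```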
+In translation phase 4, the value of a *character-literal* is determined
+using the range of representable values of the *character-literal*’s
+type in translation phase 7. A non-encodable character literal or a
+multicharacter literal has an *implementation-defined* value. The value
+of any other kind of *character-literal* is determined as follows:
+- A *character-literal* with a *c-char-sequence* consisting of a single
+  *basic-c-char*, *simple-escape-sequence*, or
+  *universal-character-name* is the code unit value of the specified
+  character as encoded in the literal’s associated character encoding.
+  \[*Note 2*: If the specified character lacks representation in the
+  literal’s associated character encoding or if it cannot be encoded as
+  a single code unit, then the literal is a non-encodable character
+  literal. — *end note*]
+- A *character-literal* with a *c-char-sequence* consisting of a single
+  *numeric-escape-sequence* has a value as follows:
+  - Let v be the integer value represented by the octal number
+    comprising the sequence of *octal-digit*s in an
+    *octal-escape-sequence* or by the hexadecimal number comprising the
+    sequence of *hexadecimal-digit*s in a *hexadecimal-escape-sequence*.
+  - If v does not exceed the range of representable values of the
+    *character-literal*’s type, then the value is v.
+  - Otherwise, if the *character-literal*’s *encoding-prefix* is absent
+    or `L`, and v does not exceed the range of representable values of
+    the corresponding unsigned type for the underlying type of the
+    *character-literal*’s type, then the value is the unique value of
+    the *character-literal*’s type `T` that is congruent to v modulo 2ᴺ,
+    where N is the width of `T`.
+  - Otherwise, the *character-literal* is ill-formed.
+- A *character-literal* with a *c-char-sequence* consisting of a single
+  *conditional-escape-sequence* is conditionally-supported and has an
+  *implementation-defined* value.
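The numeric-escape rules above can be illustrated as follows. This sketch assumes an 8-bit `char` (guarded by `CHAR_BIT`) and does not assume whether plain `char` is signed. The second assertion constrains only the signed case, where v = 255 is out of range and the modulo 2ᴺ rule applies.

``` cpp
#include <climits>

#if CHAR_BIT == 8
// v = 255 always fits the corresponding unsigned type, so the bit pattern is 0xFF.
static_assert(static_cast<unsigned char>('\xFF') == 0xFF);
// If plain char is signed, 255 does not fit and wraps modulo 2^8 to -1.
static_assert(CHAR_MIN == 0 || '\xFF' == -1);
#endif

// char16_t is unsigned, so 0xFFFF fits. With the u prefix there is no wrap-around
// rule: on the usual 16-bit char16_t, u'\x10000' would be ill-formed.
static_assert(u'\xFFFF' == 0xFFFF);
```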
+The character specified by a *simple-escape-sequence* is specified in
+[[lex.ccon.esc]].
+[*Note 3*: Using an escape sequence for a question mark is supported
+for compatibility with ISO C++14 and ISO C. — *end note*]
+**Table: Simple escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
+| character | name                 | *simple-escape-sequence* |
+| --------- | -------------------- | ------------------------ |
+| `U+000a`  | line feed            | `\n`                     |
+| `U+0009`  | character tabulation | `\t`                     |
+| `U+000b`  | line tabulation      | `\v`                     |
+| `U+0008`  | backspace            | `\b`                     |
+| `U+000d`  | carriage return      | `\r`                     |
+| `U+000c`  | form feed            | `\f`                     |
+| `U+0007`  | alert                | `\a`                     |
+| `U+005c`  | reverse solidus      | `\\`                     |
+| `U+003f`  | question mark        | `\?`                     |
+| `U+0027`  | apostrophe           | `\'`                     |
+| `U+0022`  | quotation mark       | `\"`                     |
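Each row of the table can be checked at compile time. This sketch uses `U` literals because a UTF-32 code unit equals the Unicode scalar value listed in the first column.

``` cpp
// Every simple-escape-sequence from the table, checked against its code point.
static_assert(U'\n' == 0x0A && U'\t' == 0x09 && U'\v' == 0x0B && U'\b' == 0x08);
static_assert(U'\r' == 0x0D && U'\f' == 0x0C && U'\a' == 0x07);
static_assert(U'\\' == 0x5C && U'\?' == 0x3F && U'\'' == 0x27 && U'\"' == 0x22);

// `\?` is kept for compatibility; an unescaped `?` denotes the same character.
static_assert(U'?' == U'\?');
```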
 ### Floating-point literals <a id="lex.fcon">[[lex.fcon]]</a>
 ``` bnf
 floating-point-literal:
     digit-sequence '''ₒₚₜ digit
 ```
 ``` bnf
 floating-point-suffix: one of
+    'f l f16 f32 f64 f128 bf16 F L F16 F32 F64 F128 BF16'
 ```
+The type of a *floating-point-literal*
+[[basic.fundamental]], [[basic.extended.fp]] is determined by its
 *floating-point-suffix* as specified in [[lex.fcon.type]].
+[*Note 1*: The floating-point suffixes `f16`, `f32`, `f64`, `f128`,
+`bf16`, `F16`, `F32`, `F64`, `F128`, and `BF16` are
+conditionally-supported. See [[basic.extended.fp]]. — *end note*]
 **Table: Types of *floating-point-literal*s** <a id="lex.fcon.type">[lex.fcon.type]</a>
 | *floating-point-suffix* | type              |
+| ----------------------- | ----------------- |
 | none                    | `double`          |
 | `f` or `F`              | `float`           |
 | `l` or `L`              | `long double`     |
+| `f16` or `F16`          | `std::float16_t`  |
+| `f32` or `F32`          | `std::float32_t`  |
+| `f64` or `F64`          | `std::float64_t`  |
+| `f128` or `F128`        | `std::float128_t` |
+| `bf16` or `BF16`        | `std::bfloat16_t` |
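A sketch of the suffix table. The first three rows hold on every implementation. The extended rows are conditionally-supported, so they are guarded on C++23 mode, on the availability of `<stdfloat>`, and on the implementation's `__STDCPP_FLOAT32_T__` and `__STDCPP_BFLOAT16_T__` macros. Compilers that do not define these macros simply skip those checks.

``` cpp
#include <type_traits>

// Rows that every implementation supports.
static_assert(std::is_same_v<decltype(1.0), double>);
static_assert(std::is_same_v<decltype(1.0f), float>);
static_assert(std::is_same_v<decltype(1.0L), long double>);

// Conditionally-supported rows: only checked where the implementation
// advertises the corresponding extended floating-point type.
#if __cplusplus > 202002L && __has_include(<stdfloat>)
#include <stdfloat>
#if defined(__STDCPP_FLOAT32_T__)
static_assert(std::is_same_v<decltype(1.0f32), std::float32_t>);
#endif
#if defined(__STDCPP_BFLOAT16_T__)
static_assert(std::is_same_v<decltype(1.0bf16), std::bfloat16_t>);
#endif
#endif
```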
 The *significand* of a *floating-point-literal* is the
 *fractional-constant* or *digit-sequence* of a
 *decimal-floating-point-literal* or the
 of *digit*s or *hexadecimal-digit*s and optional period are interpreted
 as a base N real number s, where N is 10 for a
 *decimal-floating-point-literal* and 16 for a
 *hexadecimal-floating-point-literal*.
+[*Note 2*: Any optional separating single quotes are ignored when
 determining the value. — *end note*]
 If an *exponent-part* or *binary-exponent-part* is present, the exponent
 e of the *floating-point-literal* is the result of interpreting the
 sequence of an optional *sign* and the *digit*s as a base 10 integer.
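A few exactly representable values show how the significand s and the exponent e combine. The value is s × 10^e for a decimal literal and s × 2^e for a hexadecimal one, and digit separators are ignored. This is an illustrative sketch, not an exhaustive test.

``` cpp
// All values below are exact in binary, so == comparisons are safe here.
static_assert(1.5e2 == 150.0);     // s = 1.5, base 10, e = 2
static_assert(0x1.8p1 == 3.0);     // s = 0x1.8 = 1.5, scaled by 2^1
static_assert(0x1p-2 == 0.25);     // the binary exponent may be negative
static_assert(1'000.5 == 1000.5);  // separating single quotes are ignored
```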
     s-char-sequence s-char
 ```
 ``` bnf
 s-char:
+    basic-s-char
     escape-sequence
     universal-character-name
 ```
+``` bnf
+basic-s-char:
+    any member of the translation character set except the U+0022 (quotation mark),
+      U+005c (reverse solidus), or new-line character
+```
 ``` bnf
 raw-string:
     '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
 ```
     r-char-sequence r-char
 ```
 ``` bnf
 r-char:
+    any member of the translation character set, except a U+0029 (right parenthesis) followed by
+       the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
 ```
 ``` bnf
 d-char-sequence:
     d-char
     d-char-sequence d-char
 ```
 ``` bnf
 d-char:
+    any member of the basic character set except:
+      U+0020 (space), U+0028 (left parenthesis), U+0029 (right parenthesis), U+005c (reverse solidus),
+      U+0009 (character tabulation), U+000b (line tabulation), U+000c (form feed), and new-line
 ```
+The kind of a *string-literal*, its type, and its associated character
+encoding [[lex.charset]] are determined by its encoding prefix and
+sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
+where n is the number of encoded code units as described below.
+**Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
+| Encoding prefix | Kind                | Type                          | Associated character encoding | Examples                                       |
+| ---- | ----------------------- | ----------------------------- | ------------------------- | ---------------------------------------------- |
+| none | ordinary string literal | array of $n$ `const char`     | ordinary literal encoding | `"ordinary string"` `R"(ordinary raw string)"` |
+| `L`  | wide string literal     | array of $n$ `const wchar_t`  | wide literal encoding     | `L"wide string"` `LR"w(wide raw string)w"`     |
+| `u8` | UTF-8 string literal    | array of $n$ `const char8_t`  | UTF-8                     | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
+| `u`  | UTF-16 string literal   | array of $n$ `const char16_t` | UTF-16                    | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
+| `U`  | UTF-32 string literal   | array of $n$ `const char32_t` | UTF-32                    | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
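A sketch of how n counts encoded code units plus the terminating null, rather than characters. The literals use universal-character-names so that the results do not depend on the source file encoding.

``` cpp
#include <type_traits>

// A string-literal is an lvalue of array type, so decltype yields a reference.
static_assert(std::is_same_v<decltype("abc"), const char(&)[4]>);
// U+00E9 needs 2 UTF-8 code units, plus the null.
static_assert(sizeof(u8"\u00E9") == 3);
// U+1F600 needs a UTF-16 surrogate pair, plus the null.
static_assert(sizeof(u"\U0001F600") == 3 * sizeof(char16_t));
// In UTF-32 the same character is a single code unit, plus the null.
static_assert(sizeof(U"\U0001F600") == 2 * sizeof(char32_t));
// Raw string with delimiter x: 'a', '\\', 'n', 'b', then the null.
static_assert(sizeof(R"x(a\nb)x") == 5);
```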
 A *string-literal* that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 at most 16 characters.
 is equivalent to `"x = \"\\\"y\\\"\""`.
 — *end example*]
 Ordinary string literals and UTF-8 string literals are also referred to
 as narrow string literals.
+The common *encoding-prefix* for a sequence of adjacent
+*string-literal*s is determined pairwise as follows: If two
+*string-literal*s have the same *encoding-prefix*, the common
+*encoding-prefix* is that *encoding-prefix*. If one *string-literal* has
+no *encoding-prefix*, the common *encoding-prefix* is that of the other
+*string-literal*. Any other combinations are ill-formed.
+[*Note 3*: A *string-literal*’s rawness has no effect on the
+determination of the common *encoding-prefix*. — *end note*]
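A sketch of the pairwise rule, checked through the array types deduced for the concatenated literals. `u"a" U"b"` appears only in a comment because it is ill-formed.

``` cpp
#include <type_traits>

// An unprefixed piece adopts the other piece's encoding-prefix.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t(&)[3]>);
static_assert(std::is_same_v<decltype("a" U"b"), const char32_t(&)[3]>);
// Rawness does not affect the common encoding-prefix.
static_assert(std::is_same_v<decltype(L"a" LR"(b)"), const wchar_t(&)[3]>);
// u"a" U"b" would be ill-formed: two different encoding-prefixes.
```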
 In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
+concatenated. The lexical structure and grouping of the contents of the
+individual *string-literal*s is retained.
+[*Example 2*:
+``` cpp
+"\xA" "B"
+```
+represents the code unit `'\xA'` and the character `'B'` after
+concatenation (and not the single code unit `'\xAB'`). Similarly,
+``` cpp
+R"(\u00)" "41"
+```
+represents six characters, starting with a backslash and ending with the
+digit `1` (and not the single character `'A'` specified by a
+*universal-character-name*).
 [[lex.string.concat]] has some examples of valid concatenations.
+— *end example*]
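Example 2 can be verified at compile time. `hex_pieces` and `raw_pieces` are hypothetical names used only for this sketch.

``` cpp
// Each piece is interpreted on its own before concatenation.
constexpr char hex_pieces[] = "\xA" "B";       // '\xA' then 'B', not '\xAB'
static_assert(sizeof(hex_pieces) == 3 && hex_pieces[0] == 0x0A && hex_pieces[1] == 'B');

constexpr char raw_pieces[] = R"(\u00)" "41";  // six characters; no UCN is formed
static_assert(sizeof(raw_pieces) == 7 && raw_pieces[0] == '\\' && raw_pieces[5] == '1');
```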
 **Table: String literal concatenations** <a id="lex.string.concat">[lex.string.concat]</a>
 | Source | Source | Means   | Source | Source | Means   | Source | Source | Means   |
 | ------ | ------ | ------- | ------ | ------ | ------- | ------ | ------ | ------- |
 | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
 | `u"a"` | `"b"`  | `u"ab"` | `U"a"` | `"b"`  | `U"ab"` | `L"a"` | `"b"`  | `L"ab"` |
 | `"a"`  | `u"b"` | `u"ab"` | `"a"`  | `U"b"` | `U"ab"` | `"a"`  | `L"b"` | `L"ab"` |
 Evaluating a *string-literal* results in a string literal object with
+static storage duration [[basic.stc]]. Whether all *string-literal*s are
+distinct (that is, are stored in nonoverlapping objects) and whether
+successive evaluations of a *string-literal* yield the same or a
+different object is unspecified.
+[*Note 4*: The effect of attempting to modify a string literal object
+is undefined. — *end note*]
+String literal objects are initialized with the sequence of code unit
+values corresponding to the *string-literal*’s sequence of *s-char*s
+(originally from non-raw string literals) and *r-char*s (originally from
+raw string literals), plus a terminating U+0000 (null) character, in
+order as follows:
+- The sequence of characters denoted by each contiguous sequence of
+  *basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
+  and *universal-character-name*s [[lex.charset]] is encoded to a code
+  unit sequence using the *string-literal*’s associated character
+  encoding. If a character lacks representation in the associated
+  character encoding, then the *string-literal* is
+  conditionally-supported and an *implementation-defined* code unit
+  sequence is encoded. \[*Note 5*: No character lacks representation in
+  any Unicode encoding form. — *end note*] When encoding a stateful
+  character encoding, implementations should encode the first such
+  sequence beginning with the initial encoding state and encode
+  subsequent sequences beginning with the final encoding state of the
+  prior sequence. \[*Note 6*: The encoded code unit sequence can differ
+  from the sequence of code units that would be obtained by encoding
+  each character independently. — *end note*]
+- Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
+  unit with a value as follows:
+  - Let v be the integer value represented by the octal number
+    comprising the sequence of *octal-digit*s in an
+    *octal-escape-sequence* or by the hexadecimal number comprising the
+    sequence of *hexadecimal-digit*s in a *hexadecimal-escape-sequence*.
+  - If v does not exceed the range of representable values of the
+    *string-literal*’s array element type, then the value is v.
+  - Otherwise, if the *string-literal*’s *encoding-prefix* is absent or
+    `L`, and v does not exceed the range of representable values of the
+    corresponding unsigned type for the underlying type of the
+    *string-literal*’s array element type, then the value is the unique
+    value of the *string-literal*’s array element type `T` that is
+    congruent to v modulo 2ᴺ, where N is the width of `T`.
+  - Otherwise, the *string-literal* is ill-formed.
+  When encoding a stateful character encoding, these sequences should
+  have no effect on encoding state.
+- Each *conditional-escape-sequence* [[lex.ccon]] contributes an
+  *implementation-defined* code unit sequence. When encoding a stateful
+  character encoding, it is *implementation-defined* what effect these
+  sequences have on encoding state.
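A sketch contrasting the first two bullets. A universal-character-name is encoded into as many code units as the encoding needs, while each *numeric-escape-sequence* contributes exactly one code unit. The variable names are hypothetical.

``` cpp
// UTF-8: U+00E9 encodes as 0xC3 0xA9; the escapes spell those code units directly.
constexpr auto& ucn8 = u8"\u00E9";
constexpr auto& esc8 = u8"\xC3\xA9";
static_assert(sizeof(ucn8) == sizeof(esc8));
static_assert(ucn8[0] == esc8[0] && ucn8[1] == esc8[1]);

// UTF-16: U+1F600 encodes as the surrogate pair 0xD83D 0xDE00.
constexpr auto& ucn16 = u"\U0001F600";
constexpr auto& esc16 = u"\xD83D\xDE00";
static_assert(ucn16[0] == esc16[0] && ucn16[1] == esc16[1]);
```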
 ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
 ``` bnf
 boolean-literal:
     'false'
     'true'
 ```
 The Boolean literals are the keywords `false` and `true`. Such literals
+have type `bool`.
 ### Pointer literals <a id="lex.nullptr">[[lex.nullptr]]</a>
 ``` bnf
 pointer-literal:
     'nullptr'
 ```
+The pointer literal is the keyword `nullptr`. It has type
 `std::nullptr_t`.
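A sketch of the properties stated here and in the note that follows: `nullptr` has its own type, which is neither a pointer type nor a pointer-to-member type, yet converts to both. `Holder` is a hypothetical class used only to form the member pointer.

``` cpp
#include <cstddef>
#include <type_traits>

static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);
static_assert(!std::is_pointer_v<std::nullptr_t>);
static_assert(!std::is_member_pointer_v<std::nullptr_t>);

struct Holder { int m; };         // hypothetical class for the member pointer
constexpr int* p = nullptr;       // converts to a null object pointer
constexpr int Holder::* pm = nullptr;  // and to a null member pointer
static_assert(p == nullptr && pm == nullptr);
```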
 [*Note 1*: `std::nullptr_t` is a distinct type that is neither a
 pointer type nor a pointer-to-member type; rather, a prvalue of this
 type is a null pointer constant and can be converted to a null pointer
 that could match that non-terminal.
 A *user-defined-literal* is treated as a call to a literal operator or
 literal operator template [[over.literal]]. To determine the form of
 this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
+first let *S* be the set of declarations found by unqualified lookup for
+the *literal-operator-id* whose literal suffix identifier is *X*
+[[basic.lookup.unqual]]. *S* shall not be empty.
 If *L* is a *user-defined-integer-literal*, let *n* be the literal
 without its *ud-suffix*. If *S* contains a literal operator with
 parameter type `unsigned long long`, the literal *L* is treated as a
 call of the form
 Otherwise, *S* shall contain a raw literal operator or a numeric literal
 operator template [[over.literal]] but not both. If *S* contains a raw
 literal operator, the literal *L* is treated as a call of the form
 ``` cpp
+operator ""X("n")
 ```
 Otherwise (*S* contains a numeric literal operator template), *L* is
 treated as a call of the form
 ```
 where *n* is the source character sequence c₁c₂...cₖ.
 [*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
+basic character set. — *end note*]
 If *L* is a *user-defined-floating-point-literal*, let *f* be the
 literal without its *ud-suffix*. If *S* contains a literal operator with
 parameter type `long double`, the literal *L* is treated as a call of
 the form
 Otherwise, *S* shall contain a raw literal operator or a numeric literal
 operator template [[over.literal]] but not both. If *S* contains a raw
 literal operator, the *literal* *L* is treated as a call of the form
 ``` cpp
+operator ""X("f")
 ```
 Otherwise (*S* contains a numeric literal operator template), *L* is
 treated as a call of the form
 ```
 where *f* is the source character sequence c₁c₂...cₖ.
 [*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
+basic character set. — *end note*]
 If *L* is a *user-defined-string-literal*, let *str* be the literal
 without its *ud-suffix* and let *len* be the number of code units in
 *str* (i.e., its length excluding the terminating null character). If
 *S* contains a literal operator template with a non-type template
 [*Example 3*:
 ``` cpp
 int main() {
+  L"A" "B" "C"_x;   // OK, same as L"ABC"_x
   "P"_x "Q" "R"_y;  // error: two different ud-suffixes
 }
 ```
 — *end example*]
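The rules for choosing among literal operators can be sketched with four hypothetical suffixes: `_cooked`, `_raw`, `_tmpl`, and `_len`. The comment on each assertion shows the call form that the rules above produce. This illustrates those rules under these assumptions and is not a complete treatment of literal operators.

``` cpp
#include <cstddef>

// Cooked integer operator: receives the value.
constexpr unsigned long long operator""_cooked(unsigned long long n) { return n; }

// Raw literal operator: receives the source characters as a null-terminated string.
constexpr std::size_t operator""_raw(const char* s) {
  std::size_t n = 0;
  while (s[n] != '\0') ++n;
  return n;  // number of source characters
}

// Numeric literal operator template: source characters as template arguments.
template <char... Cs>
constexpr std::size_t operator""_tmpl() { return sizeof...(Cs); }

// String literal operator: receives the concatenated string and its length.
constexpr std::size_t operator""_len(const char*, std::size_t len) { return len; }

static_assert(0x10_cooked == 16);  // operator ""_cooked(16ULL)
static_assert(0x1F_raw == 4);      // operator ""_raw("0x1F")
static_assert(1.5e3_raw == 5);     // operator ""_raw("1.5e3")
static_assert(0x1F_tmpl == 4);     // operator ""_tmpl<'0', 'x', '1', 'F'>()
static_assert("ab" "c"_len == 3);  // operator ""_len("abc", 3)
```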
 <!-- Link reference definitions -->
+[basic.extended.fp]: basic.md#basic.extended.fp
 [basic.fundamental]: basic.md#basic.fundamental
 [basic.link]: basic.md#basic.link
 [basic.lookup.unqual]: basic.md#basic.lookup.unqual
 [basic.stc]: basic.md#basic.stc
+[character.seq]: library.md#character.seq
 [conv.mem]: expr.md#conv.mem
 [conv.ptr]: expr.md#conv.ptr
 [cpp]: cpp.md#cpp
 [cpp.cond]: cpp.md#cpp.cond
 [cpp.import]: cpp.md#cpp.import
 [cpp.include]: cpp.md#cpp.include
 [cpp.module]: cpp.md#cpp.module
 [cpp.stringize]: cpp.md#cpp.stringize
 [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
+[expr.prim.literal]: expr.md#expr.prim.literal
 [headers]: library.md#headers
 [lex]: #lex
 [lex.bool]: #lex.bool
 [lex.ccon]: #lex.ccon
 [lex.ccon.esc]: #lex.ccon.esc
+[lex.ccon.literal]: #lex.ccon.literal
 [lex.charset]: #lex.charset
+[lex.charset.basic]: #lex.charset.basic
+[lex.charset.literal]: #lex.charset.literal
 [lex.comment]: #lex.comment
 [lex.digraph]: #lex.digraph
 [lex.ext]: #lex.ext
 [lex.fcon]: #lex.fcon
 [lex.fcon.type]: #lex.fcon.type
 [lex.key]: #lex.key
 [lex.key.digraph]: #lex.key.digraph
 [lex.literal]: #lex.literal
 [lex.literal.kinds]: #lex.literal.kinds
 [lex.name]: #lex.name
 [lex.name.special]: #lex.name.special
 [lex.nullptr]: #lex.nullptr
 [lex.operators]: #lex.operators
 [lex.phases]: #lex.phases
 [lex.ppnumber]: #lex.ppnumber
 [lex.pptoken]: #lex.pptoken
 [lex.separate]: #lex.separate
 [lex.string]: #lex.string
 [lex.string.concat]: #lex.string.concat
+[lex.string.literal]: #lex.string.literal
 [lex.token]: #lex.token
 [module.import]: module.md#module.import
 [module.unit]: module.md#module.unit
 [over.literal]: over.md#over.literal
+[support.types.layout]: support.md#support.types.layout
 [temp.explicit]: temp.md#temp.explicit
 [temp.names]: temp.md#temp.names
+[^1]: Implementations behave as if these separate phases occur, although
+    in practice different phases can be folded together.
 [^2]: A partial preprocessing token would arise from a source file
     ending in the first portion of a multi-character token that requires
     a terminating sequence of characters, such as a *header-name* that
     is missing the closing `"` or `>`. A partial comment would arise
     from a source file ending with an unclosed `/*` comment.
+[^3]: These include “digraphs” and additional reserved words. The term
     “digraph” (token consisting of two characters) is not perfectly
     descriptive, since one of the alternative *preprocessing-token*s is
     `%:%:` and of course several primary tokens contain two characters.
     Nonetheless, those alternative tokens that aren’t lexical keywords
     are colloquially known as “digraphs”.
+[^4]: Thus the “stringized” values [[cpp.stringize]] of `[` and `<:`
     will be different, maintaining the source spelling, but the tokens
     can otherwise be freely interchanged.
+[^5]: Literals include strings and character and numeric literals.
+[^6]: Thus, a sequence of characters that resembles an escape sequence
+    can result in an error, be interpreted as the character
     corresponding to the escape sequence, or have a completely different
     meaning, depending on the implementation.
+[^7]: On systems in which linkers cannot accept extended characters, an
+    encoding of the *universal-character-name* can be used in forming
     valid external identifiers. For example, some otherwise unused
+    character or sequence of characters can be used to encode the `\u` in
+    a *universal-character-name*. Extended characters can produce a
     long external identifier, but C++ does not place a translation limit
+    on significant characters for external identifiers.
+[^8]: The term “literal” generally designates, in this document, those
     tokens that are called “constants” in ISO C.
