[lex.literal] - C++17 → C++20

tmp/tmpud1anggv/{from.md → to.md} +288 -244

@@ -6,11 +6,11 @@ There are several kinds of literals.[^11]
 ``` bnf
 literal:
     integer-literal
     character-literal
-    floating-literal
     string-literal
     boolean-literal
     pointer-literal
     user-defined-literal
 ```
@@ -48,13 +48,12 @@ decimal-literal:
 hexadecimal-literal:
     hexadecimal-prefix hexadecimal-digit-sequence
 ```
 ``` bnf
-binary-digit:
-    '0'
-    '1'
 ```
 ``` bnf
 octal-digit: one of
     '0 1 2 3 4 5 6 7'
@@ -104,38 +103,44 @@ long-suffix: one of
 ``` bnf
 long-long-suffix: one of
     'll LL'
 ```
-An *integer literal* is a sequence of digits that has no period or
-exponent part, with optional separating single quotes that are ignored
-when determining its value. An integer literal may have a prefix that
-specifies its base and a suffix that specifies its type. The lexically
-first digit of the sequence of digits is the most significant. A *binary
-integer literal* (base two) begins with `0b` or `0B` and consists of a
-sequence of binary digits. An *octal integer literal* (base eight)
-begins with the digit `0` and consists of a sequence of octal
-digits.[^12] A *decimal integer literal* (base ten) begins with a digit
-other than `0` and consists of a sequence of decimal digits. A
-*hexadecimal integer literal* (base sixteen) begins with `0x` or `0X`
-and consists of a sequence of hexadecimal digits, which include the
-decimal digits and the letters `a` through `f` and `A` through `F` with
 decimal values ten through fifteen.
 [*Example 1*: The number twelve can be written `12`, `014`, `0XC`, or
-`0b1100`. The integer literals `1048576`, `1'048'576`, `0X100000`,
 `0x10'0000`, and `0'004'000'000` all have the same
 value. — *end example*]
-The type of an integer literal is the first of the corresponding list in
-Table  [[tab:lex.type.integer.literal]] in which its value can be
-represented.
-**Table: Types of integer literals** <a id="tab:lex.type.integer.literal">[tab:lex.type.integer.literal]</a>
-| | | |
-| ---------------- | ------------------------ | ------------------------ |
 | none             | `int`                    | `int`                                          |
 |                  | `long int`               | `unsigned int`                                 |
 |                  | `long long int`          | `long int`                                     |
 |                  |                          | `unsigned long int`                            |
 |                  |                          | `long long int`                                |
@@ -153,20 +158,20 @@ represented.
 |                  |                          | `unsigned long long int`                       |
 | Both `u` or `U`  | `unsigned long long int` | `unsigned long long int`                       |
 | and `ll` or `LL` |                          |                                                |
-If an integer literal cannot be represented by any type in its list and
-an extended integer type ([[basic.fundamental]]) can represent its
 value, it may have that extended integer type. If all of the types in
-the list for the integer literal are signed, the extended integer type
-shall be signed. If all of the types in the list for the integer literal
-are unsigned, the extended integer type shall be unsigned. If the list
-contains both signed and unsigned types, the extended integer type may
-be signed or unsigned. A program is ill-formed if one of its translation
-units contains an integer literal that cannot be represented by any of
-the allowed types.
 ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
 ``` bnf
 character-literal:
@@ -182,10 +187,17 @@ encoding-prefix: one of
 c-char-sequence:
     c-char
     c-char-sequence c-char
 ```
 ``` bnf
 escape-sequence:
     simple-escape-sequence
     octal-escape-sequence
     hexadecimal-escape-sequence
@@ -208,76 +220,80 @@ octal-escape-sequence:
 hexadecimal-escape-sequence:
     '\x' hexadecimal-digit
     hexadecimal-escape-sequence hexadecimal-digit
 ```
-A character literal is one or more characters enclosed in single quotes,
-as in `'x'`, optionally preceded by `u8`, `u`, `U`, or `L`, as in
-`u8'w'`, `u'x'`, `U'y'`, or `L'z'`, respectively.
-A character literal that does not begin with `u8`, `u`, `U`, or `L` is
 an *ordinary character literal*. An ordinary character literal that
 contains a single *c-char* representable in the execution character set
 has type `char`, with value equal to the numerical value of the encoding
 of the *c-char* in the execution character set. An ordinary character
-literal that contains more than one *c-char* is a *multicharacter
-literal*. A multicharacter literal, or an ordinary character literal
-containing a single *c-char* not representable in the execution
-character set, is conditionally-supported, has type `int`, and has an
-*implementation-defined* value.
-A character literal that begins with `u8`, such as `u8'w'`, is a
-character literal of type `char`, known as a *UTF-8 character literal*.
-The value of a UTF-8 character literal is equal to its ISO 10646 code
-point value, provided that the code point value is representable with a
-single UTF-8 code unit (that is, provided it is in the C0 Controls and
-Basic Latin Unicode block). If the value is not representable with a
-single UTF-8 code unit, the program is ill-formed. A UTF-8 character
-literal containing multiple *c-char*s is ill-formed.
-A character literal that begins with the letter `u`, such as `u'x'`, is
-a character literal of type `char16_t`. The value of a `char16_t`
-character literal containing a single *c-char* is equal to its ISO 10646
-code point value, provided that the code point is representable with a
-single 16-bit code unit. (That is, provided it is a basic multi-lingual
-plane code point.) If the value is not representable within 16 bits, the
-program is ill-formed. A `char16_t` character literal containing
-multiple *c-char*s is ill-formed.
-A character literal that begins with the letter `U`, such as `U'y'`, is
-a character literal of type `char32_t`. The value of a `char32_t`
-character literal containing a single *c-char* is equal to its ISO 10646
-code point value. A `char32_t` character literal containing multiple
 *c-char*s is ill-formed.
-A character literal that begins with the letter `L`, such as `L'z'`, is
-a *wide-character literal*. A wide-character literal has type
-`wchar_t`.[^13] The value of a wide-character literal containing a
 single *c-char* has value equal to the numerical value of the encoding
 of the *c-char* in the execution wide-character set, unless the *c-char*
 has no representation in the execution wide-character set, in which case
 the value is *implementation-defined*.
-[*Note 1*: The type `wchar_t` is able to represent all members of the
 execution wide-character set (see
 [[basic.fundamental]]). — *end note*]
 The value of a wide-character literal containing multiple *c-char*s is
 *implementation-defined*.
 Certain non-graphic characters, the single quote `'`, the double quote
-`"`, the question mark `?`,[^14] and the backslash `\`, can be
-represented according to Table  [[tab:escape.sequences]]. The double
-quote `"` and the question mark `?`, can be represented as themselves or
-by the escape sequences `\"` and `\?` respectively, but the single quote
-`'` and the backslash `\` shall be represented by the escape sequences
-`\'` and `\\` respectively. Escape sequences in which the character
-following the backslash is not listed in Table  [[tab:escape.sequences]]
-are conditionally-supported, with *implementation-defined* semantics. An
-escape sequence specifies a single character.
-**Table: Escape sequences** <a id="tab:escape.sequences">[tab:escape.sequences]</a>
 |                 |                |                    |
 | --------------- | -------------- | ------------------ |
 | new-line        | NL(LF)         | `\n`               |
 | horizontal tab  | HT             | `\t`               |
@@ -300,49 +316,49 @@ desired character. The escape `\x\numconst{hhh}` consists of the
 backslash followed by `x` followed by one or more hexadecimal digits
 that are taken to specify the value of the desired character. There is
 no limit to the number of digits in a hexadecimal sequence. A sequence
 of octal or hexadecimal digits is terminated by the first character that
 is not an octal digit or a hexadecimal digit, respectively. The value of
-a character literal is *implementation-defined* if it falls outside of
-the *implementation-defined* range defined for `char` (for character
-literals with no prefix) or `wchar_t` (for character literals prefixed
-by `L`).
-[*Note 2*: If the value of a character literal prefixed by `u`, `u8`,
 or `U` is outside the range defined for its type, the program is
 ill-formed. — *end note*]
 A *universal-character-name* is translated to the encoding, in the
 appropriate execution character set, of the character named. If there is
 no such encoding, the *universal-character-name* is translated to an
 *implementation-defined* encoding.
-[*Note 3*: In translation phase 1, a *universal-character-name* is
 introduced whenever an actual extended character is encountered in the
 source text. Therefore, all extended characters are described in terms
 of *universal-character-name*s. However, the actual compiler
 implementation may use its own native character set, so long as the same
 results are obtained. — *end note*]
-### Floating literals <a id="lex.fcon">[[lex.fcon]]</a>
 ``` bnf
-floating-literal:
-    decimal-floating-literal
-    hexadecimal-floating-literal
 ```
 ``` bnf
-decimal-floating-literal:
-    fractional-constant exponent-partₒₚₜ floating-suffixₒₚₜ
-    digit-sequence exponent-part floating-suffixₒₚₜ
 ```
 ``` bnf
-hexadecimal-floating-literal:
-    hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-suffixₒₚₜ
-    hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-suffixₒₚₜ
 ```
 ``` bnf
 fractional-constant:
     digit-sequenceₒₚₜ '.' digit-sequence
@@ -377,50 +393,55 @@ digit-sequence:
     digit
     digit-sequence '''ₒₚₜ digit
 ```
 ``` bnf
-floating-suffix: one of
     'f l F L'
 ```
-A floating literal consists of an optional prefix specifying a base, an
-integer part, a radix point, a fraction part, an `e`, `E`, `p` or `P`,
-an optionally signed integer exponent, and an optional type suffix. The
-integer and fraction parts both consist of a sequence of decimal (base
-ten) digits if there is no prefix, or hexadecimal (base sixteen) digits
-if the prefix is `0x` or `0X`. The floating literal is a *decimal
-floating literal* in the former case and a *hexadecimal floating
-literal* in the latter case. Optional separating single quotes in a
-*digit-sequence* or *hexadecimal-digit-sequence* are ignored when
-determining its value.
-[*Example 1*: The floating literals `1.602'176'565e-19` and
-`1.602176565e-19` have the same value. — *end example*]
-Either the integer part or the fraction part (not both) can be omitted.
-Either the radix point or the letter `e` or `E` and the exponent (not
-both) can be omitted from a decimal floating literal. The radix point
-(but not the exponent) can be omitted from a hexadecimal floating
-literal. The integer part, the optional radix point, and the optional
-fraction part, form the *significand* of the floating literal. In a
-decimal floating literal, the exponent, if present, indicates the power
-of 10 by which the significand is to be scaled. In a hexadecimal
-floating literal, the exponent indicates the power of 2 by which the
-significand is to be scaled.
-[*Example 2*: The floating literals `49.625` and `0xC.68p+2` have the
-same value. — *end example*]
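As an illustration only (not part of the standard text or of this diff), the two floating literal examples above can be checked at compile time. This sketch assumes a C++17 or later compiler, since hexadecimal floating literals were added in C++17; the namespace name `fcon_demo` is hypothetical.

```cpp
#include <type_traits>

namespace fcon_demo {  // hypothetical name, for illustration only

// Digit separators are ignored when determining the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19,
              "separators do not change the value");

// 0xC.68p+2 = (12 + 6/16 + 8/256) * 2^2 = 12.40625 * 4 = 49.625
static_assert(49.625 == 0xC.68p+2, "decimal and hexadecimal spellings agree");

// The suffix selects the type; no suffix means double.
static_assert(std::is_same<decltype(1.0), double>::value, "no suffix");
static_assert(std::is_same<decltype(1.0f), float>::value, "f suffix");
static_assert(std::is_same<decltype(1.0L), long double>::value, "L suffix");

constexpr double hex_value = 0xC.68p+2;

}  // namespace fcon_demo
```

The value 49.625 is exactly representable in binary floating point, which is why an exact `==` comparison is safe here; that would not hold for an arbitrary decimal literal.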
-If the scaled value is in the range of representable values for its
-type, the result is the scaled value if representable, else the larger
-or smaller representable value nearest the scaled value, chosen in an
-*implementation-defined* manner. The type of a floating literal is
-`double` unless explicitly specified by a suffix. The suffixes `f` and
-`F` specify `float`, the suffixes `l` and `L` specify `long` `double`.
 If the scaled value is not in the range of representable values for its
-type, the program is ill-formed.
 ### String literals <a id="lex.string">[[lex.string]]</a>
 ``` bnf
 string-literal:
@@ -432,10 +453,17 @@ string-literal:
 s-char-sequence:
     s-char
     s-char-sequence s-char
 ```
 ``` bnf
 raw-string:
     '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
 ```
@@ -443,21 +471,28 @@ raw-string:
 r-char-sequence:
     r-char
     r-char-sequence r-char
 ```
 ``` bnf
 d-char-sequence:
     d-char
     d-char-sequence d-char
 ```
-A *string-literal* is a sequence of characters (as defined in
-[[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
-`u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
-`R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
-`U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
 A *string-literal* that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
@@ -494,78 +529,74 @@ a"
 ```
 is equivalent to `"\n)\\\na\"\n"`. The raw string
 ``` cpp
-R"(??)"
 ```
-is equivalent to `"\?\?"`. The raw string
-``` cpp
-R"#(
-)??="
-)#"
-```
-is equivalent to `"\n)\?\?=\"\n"`.
 — *end example*]
 After translation phase 6, a *string-literal* that does not begin with
-an *encoding-prefix* is an *ordinary string literal*, and is initialized
-with the given characters.
 A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
-*UTF-8 string literal*.
 Ordinary string literals and UTF-8 string literals are also referred to
-as narrow string literals. A narrow string literal has type “array of
-*n* `const char`”, where *n* is the size of the string as defined below,
-and has static storage duration ([[basic.stc]]).
-For a UTF-8 string literal, each successive element of the object
-representation ([[basic.types]]) has the value of the corresponding
-code unit of the UTF-8 encoding of the string.
-A *string-literal* that begins with `u`, such as `u"asdf"`, is a
-`char16_t` string literal. A `char16_t` string literal has type “array
-of *n* `const char16_t`”, where *n* is the size of the string as defined
-below; it is initialized with the given characters. A single *c-char*
-may produce more than one `char16_t` character in the form of surrogate
-pairs.
-A *string-literal* that begins with `U`, such as `U"asdf"`, is a
-`char32_t` string literal. A `char32_t` string literal has type “array
-of *n* `const char32_t`”, where *n* is the size of the string as defined
-below; it is initialized with the given characters.
 A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
 string literal*. A wide string literal has type “array of *n* `const
 wchar_t`”, where *n* is the size of the string as defined below; it is
 initialized with the given characters.
-In translation phase 6 ([[lex.phases]]), adjacent *string-literal*s are
 concatenated. If both *string-literal*s have the same *encoding-prefix*,
-the resulting concatenated string literal has that *encoding-prefix*. If
-one *string-literal* has no *encoding-prefix*, it is treated as a
 *string-literal* of the same *encoding-prefix* as the other operand. If
 a UTF-8 string literal token is adjacent to a wide string literal token,
 the program is ill-formed. Any other concatenations are
 conditionally-supported with *implementation-defined* behavior.
-[*Note 3*: This concatenation is an interpretation, not a conversion.
 Because the interpretation happens in translation phase 6 (after each
-character from a string literal has been translated into a value from
 the appropriate character set), a *string-literal*’s initial rawness has
 no effect on the interpretation or well-formedness of the
 concatenation. — *end note*]
-Table  [[tab:lex.string.concat]] has some examples of valid
-concatenations.
-**Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
 | Source |        | Means   | Source |        | Means   | Source |        | Means   |
 | ------ | ------ | ------- | ------ | ------ | ------- | ------ | ------ | ------- |
 | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
@@ -584,46 +615,49 @@ Characters in concatenated strings are kept distinct.
 contains the two characters `'\xA'` and `'B'` after concatenation (and
 not the single hexadecimal character `'\xAB'`).
 — *end example*]
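The point that concatenated characters stay distinct can be sketched as a compile-time check. This is an illustration, not part of the standard text; the names `concat_demo` and `s` are hypothetical.

```cpp
namespace concat_demo {  // hypothetical name, for illustration only

// Each literal is translated before concatenation, so "\xA" "B" yields
// the two characters '\xA' and 'B', not the single character '\xAB'.
constexpr char s[] = "\xA" "B";

static_assert(sizeof(s) == 3, "two characters plus the terminating null");
static_assert(s[0] == '\xA', "first element is the hexadecimal escape");
static_assert(s[1] == 'B', "second element is the letter B");
static_assert(s[2] == '\0', "null terminator appended after concatenation");

}  // namespace concat_demo
```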
-After any necessary concatenation, in translation phase 7 (
-[[lex.phases]]), `'\0'` is appended to every string literal so that
 programs that scan a string can find its end.
 Escape sequences and *universal-character-name*s in non-raw string
-literals have the same meaning as in character literals ([[lex.ccon]]),
 except that the single quote `'` is representable either by itself or by
 the escape sequence `\'`, and the double quote `"` shall be preceded by
-a `\`, and except that a *universal-character-name* in a `char16_t`
-string literal may yield a surrogate pair. In a narrow string literal, a
-*universal-character-name* may map to more than one `char` element due
-to *multibyte encoding*. The size of a `char32_t` or wide string literal
-is the total number of escape sequences, *universal-character-name*s,
-and other characters, plus one for the terminating `U'\0'` or `L'\0'`.
-The size of a `char16_t` string literal is the total number of escape
-sequences, *universal-character-name*s, and other characters, plus one
-for each character requiring a surrogate pair, plus one for the
-terminating `u'\0'`.
-[*Note 4*: The size of a `char16_t` string literal is the number of
 code units, not the number of characters. — *end note*]
-Within `char32_t` and `char16_t` string literals, any
-*universal-character-name*s shall be within the range `0x0` to
-`0x10FFFF`. The size of a narrow string literal is the total number of
-escape sequences and other characters, plus at least one for the
-multibyte encoding of each *universal-character-name*, plus one for the
 terminating `'\0'`.
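The size rules above can be observed with `sizeof`. The following is a sketch for illustration, not part of the standard text; it assumes a C++11 or later compiler, and the names in `size_demo` are hypothetical. U+1F600 is chosen because it lies outside the basic multilingual plane, and U+00E9 because its UTF-8 encoding needs two code units.

```cpp
#include <cstddef>

namespace size_demo {  // hypothetical name, for illustration only

// char32_t: one element per character, plus the terminating U'\0'.
constexpr std::size_t n32 = sizeof(U"\U0001F600") / sizeof(char32_t);

// char16_t: U+1F600 needs a surrogate pair, so two code units plus u'\0'.
constexpr std::size_t n16 = sizeof(u"\U0001F600") / sizeof(char16_t);

// UTF-8: U+00E9 encodes as two code units, plus the terminating null.
// The element type is char in C++17 and char8_t in C++20; both have size 1.
constexpr std::size_t n8 = sizeof(u8"\u00E9");

static_assert(n32 == 2, "one code point plus terminator");
static_assert(n16 == 3, "surrogate pair plus terminator");
static_assert(n8 == 3, "two UTF-8 code units plus terminator");

}  // namespace size_demo
```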
 Evaluating a *string-literal* results in a string literal object with
 static storage duration, initialized from the given characters as
-specified above. Whether all string literals are distinct (that is, are
-stored in nonoverlapping objects) and whether successive evaluations of
-a *string-literal* yield the same or a different object is unspecified.
-[*Note 5*:  The effect of attempting to modify a string literal is
 undefined. — *end note*]
 ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
 ``` bnf
@@ -644,21 +678,21 @@ pointer-literal:
 The pointer literal is the keyword `nullptr`. It is a prvalue of type
 `std::nullptr_t`.
 [*Note 1*: `std::nullptr_t` is a distinct type that is neither a
-pointer type nor a pointer to member type; rather, a prvalue of this
 type is a null pointer constant and can be converted to a null pointer
 value or null member pointer value. See  [[conv.ptr]] and
 [[conv.mem]]. — *end note*]
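The properties stated in the note can be checked with standard type traits. This is a minimal sketch for illustration, not part of the standard text; the struct `S` and the namespace `nullptr_demo` are hypothetical.

```cpp
#include <cstddef>
#include <type_traits>

namespace nullptr_demo {  // hypothetical name, for illustration only

struct S { int m; };  // hypothetical class, used only for the member pointer

// nullptr has type std::nullptr_t ...
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "type of the pointer literal");

// ... which is neither a pointer type nor a pointer-to-member type ...
static_assert(!std::is_pointer<std::nullptr_t>::value, "not a pointer type");
static_assert(!std::is_member_pointer<std::nullptr_t>::value,
              "not a pointer-to-member type");

// ... but converts to a null pointer value or null member pointer value.
static_assert(std::is_convertible<std::nullptr_t, int*>::value, "to int*");
static_assert(std::is_convertible<std::nullptr_t, int S::*>::value,
              "to int S::*");

constexpr int* p = nullptr;
constexpr int S::* pm = nullptr;

}  // namespace nullptr_demo
```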
 ### User-defined literals <a id="lex.ext">[[lex.ext]]</a>
 ``` bnf
 user-defined-literal:
     user-defined-integer-literal
-    user-defined-floating-literal
     user-defined-string-literal
     user-defined-character-literal
 ```
 ``` bnf
@@ -668,11 +702,11 @@ user-defined-integer-literal:
     hexadecimal-literal ud-suffix
     binary-literal ud-suffix
 ```
 ``` bnf
-user-defined-floating-literal:
     fractional-constant exponent-partₒₚₜ ud-suffix
     digit-sequence exponent-part ud-suffix
     hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
     hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
 ```
@@ -706,65 +740,65 @@ is a *user-defined-literal*, but `12LL` is an *integer-literal*.
 The syntactic non-terminal preceding the *ud-suffix* in a
 *user-defined-literal* is taken to be the longest sequence of characters
 that could match that non-terminal.
 A *user-defined-literal* is treated as a call to a literal operator or
-literal operator template ([[over.literal]]). To determine the form of
 this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
 the *literal-operator-id* whose literal suffix identifier is *X* is
 looked up in the context of *L* using the rules for unqualified name
-lookup ([[basic.lookup.unqual]]). Let *S* be the set of declarations
-found by this lookup. *S* shall not be empty.
 If *L* is a *user-defined-integer-literal*, let *n* be the literal
 without its *ud-suffix*. If *S* contains a literal operator with
 parameter type `unsigned long long`, the literal *L* is treated as a
 call of the form
 ``` cpp
 operator "" X(nULL)
 ```
-Otherwise, *S* shall contain a raw literal operator or a literal
-operator template ([[over.literal]]) but not both. If *S* contains a
-raw literal operator, the literal *L* is treated as a call of the form
 ``` cpp
 operator "" X("n")
 ```
-Otherwise (*S* contains a literal operator template), *L* is treated as
-a call of the form
 ``` cpp
 operator "" X<'c₁', 'c₂', ... 'cₖ'>()
 ```
 where *n* is the source character sequence c₁c₂...cₖ.
 [*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
 basic source character set. — *end note*]
-If *L* is a *user-defined-floating-literal*, let *f* be the literal
-without its *ud-suffix*. If *S* contains a literal operator with
 parameter type `long double`, the literal *L* is treated as a call of
 the form
 ``` cpp
 operator "" X(fL)
 ```
-Otherwise, *S* shall contain a raw literal operator or a literal
-operator template ([[over.literal]]) but not both. If *S* contains a
-raw literal operator, the *literal* *L* is treated as a call of the form
 ``` cpp
 operator "" X("f")
 ```
-Otherwise (*S* contains a literal operator template), *L* is treated as
-a call of the form
 ``` cpp
 operator "" X<'c₁', 'c₂', ... 'cₖ'>()
 ```
@@ -773,20 +807,28 @@ where *f* is the source character sequence c₁c₂...cₖ.
 [*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
 basic source character set. — *end note*]
 If *L* is a *user-defined-string-literal*, let *str* be the literal
 without its *ud-suffix* and let *len* be the number of code units in
-*str* (i.e., its length excluding the terminating null character). The
 literal *L* is treated as a call of the form
 ``` cpp
 operator "" X(str, len)
 ```
 If *L* is a *user-defined-character-literal*, let *ch* be the literal
-without its *ud-suffix*. *S* shall contain a literal operator (
-[[over.literal]]) whose only parameter has the type of *ch* and the
 literal *L* is treated as a call of the form
 ``` cpp
 operator "" X(ch)
 ```
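The call forms described above can be sketched with three small literal operators. This is an illustration, not part of the standard text; it assumes C++11 or later, and the suffixes `_twice`, `_len`, and `_code` as well as the namespace `udl_demo` are hypothetical.

```cpp
#include <cstddef>

namespace udl_demo {  // hypothetical name, for illustration only

// Integer form: 21_twice is treated as operator""_twice(21ULL).
constexpr unsigned long long operator""_twice(unsigned long long n) {
    return 2 * n;
}

// String form: "abc"_len is treated as operator""_len("abc", 3),
// where the length excludes the terminating null character.
constexpr std::size_t operator""_len(const char*, std::size_t len) {
    return len;
}

// Character form: 'A'_code is treated as operator""_code('A'); the
// parameter type matches the type of the character literal (char).
constexpr int operator""_code(char ch) {
    return ch;
}

constexpr unsigned long long forty_two = 21_twice;
constexpr std::size_t abc_len = "abc"_len;
constexpr int a_code = 'A'_code;

}  // namespace udl_demo
```

The operators are found by unqualified lookup from inside `udl_demo`; code outside the namespace would need a using-declaration or using-directive before it could write `21_twice`.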
@@ -805,16 +847,16 @@ int main() {
 }
 ```
 — *end example*]
-In translation phase 6 ([[lex.phases]]), adjacent string literals are
-concatenated and *user-defined-string-literal*s are considered string
-literals for that purpose. During concatenation, *ud-suffix*es are
-removed and ignored and the concatenation process occurs as described
-in  [[lex.string]]. At the end of phase 6, if a string literal is the
-result of a concatenation involving at least one
 *user-defined-string-literal*, all the participating
 *user-defined-string-literal*s shall have the same *ud-suffix* and that
 suffix is applied to the result of the concatenation.
 [*Example 3*:
@@ -832,51 +874,55 @@ int main() {
 [basic.fundamental]: basic.md#basic.fundamental
 [basic.link]: basic.md#basic.link
 [basic.lookup.unqual]: basic.md#basic.lookup.unqual
 [basic.stc]: basic.md#basic.stc
 [basic.types]: basic.md#basic.types
-[conv.mem]: conv.md#conv.mem
-[conv.ptr]: conv.md#conv.ptr
 [cpp]: cpp.md#cpp
 [cpp.concat]: cpp.md#cpp.concat
 [cpp.cond]: cpp.md#cpp.cond
 [cpp.include]: cpp.md#cpp.include
 [cpp.stringize]: cpp.md#cpp.stringize
 [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
 [headers]: library.md#headers
 [lex]: #lex
 [lex.bool]: #lex.bool
 [lex.ccon]: #lex.ccon
 [lex.charset]: #lex.charset
 [lex.comment]: #lex.comment
 [lex.digraph]: #lex.digraph
 [lex.ext]: #lex.ext
 [lex.fcon]: #lex.fcon
 [lex.header]: #lex.header
 [lex.icon]: #lex.icon
 [lex.key]: #lex.key
 [lex.literal]: #lex.literal
 [lex.literal.kinds]: #lex.literal.kinds
 [lex.name]: #lex.name
 [lex.nullptr]: #lex.nullptr
 [lex.operators]: #lex.operators
 [lex.phases]: #lex.phases
 [lex.ppnumber]: #lex.ppnumber
 [lex.pptoken]: #lex.pptoken
 [lex.separate]: #lex.separate
 [lex.string]: #lex.string
 [lex.token]: #lex.token
 [over.literal]: over.md#over.literal
-[tab:alternative.representations]: #tab:alternative.representations
-[tab:alternative.tokens]: #tab:alternative.tokens
-[tab:charname.allowed]: #tab:charname.allowed
-[tab:charname.disallowed]: #tab:charname.disallowed
-[tab:escape.sequences]: #tab:escape.sequences
-[tab:identifiers.special]: #tab:identifiers.special
-[tab:keywords]: #tab:keywords
-[tab:lex.string.concat]: #tab:lex.string.concat
-[tab:lex.type.integer.literal]: #tab:lex.type.integer.literal
 [temp.explicit]: temp.md#temp.explicit
 [temp.names]: temp.md#temp.names
 [^1]: Implementations must behave as if these separate phases occur,
     although in practice different phases might be folded together.
@@ -897,21 +943,21 @@ int main() {
     (described in translation phase 1) is specified as
     *implementation-defined*, an implementation is required to document
     how the basic source characters are represented in source files.
 [^5]: A sequence of characters resembling a *universal-character-name*
-    in an *r-char-sequence* ([[lex.string]]) does not form a
     *universal-character-name*.
 [^6]:  These include “digraphs” and additional reserved words. The term
     “digraph” (token consisting of two characters) is not perfectly
-    descriptive, since one of the alternative preprocessing-tokens is
     `%:%:` and of course several primary tokens contain two characters.
     Nonetheless, those alternative tokens that aren’t lexical keywords
     are colloquially known as “digraphs”.
-[^7]: Thus the “stringized” values ([[cpp.stringize]]) of `[` and `<:`
     will be different, maintaining the source spelling, but the tokens
     can otherwise be freely interchanged.
 [^8]: Literals include strings and character and numeric literals.
@@ -928,15 +974,13 @@ int main() {
     long external identifier, but C++ does not place a translation limit
     on significant characters for external identifiers. In C++, upper-
     and lower-case letters are considered different for all identifiers,
     including external identifiers.
-[^11]: The term “literal” generally designates, in this International
- Standard, those tokens that are called “constants” in ISO C.
-[^12]: The digits `8` and `9` are not octal digits.
-[^13]: They are intended for character sets where a character does not
     fit into a single byte.
-[^14]: Using an escape sequence for a question mark is supported for
     compatibility with ISO C++14 and ISO C.

 ``` bnf
 literal:
     integer-literal
     character-literal
+    floating-point-literal
     string-literal
     boolean-literal
     pointer-literal
     user-defined-literal
 ```
 hexadecimal-literal:
     hexadecimal-prefix hexadecimal-digit-sequence
 ```
 ``` bnf
+binary-digit: one of
+    '0 1'
 ```
 ``` bnf
 octal-digit: one of
     '0 1 2 3 4 5 6 7'
 ``` bnf
 long-long-suffix: one of
     'll LL'
 ```
+In an *integer-literal*, the sequence of *binary-digit*s,
+*octal-digit*s, *digit*s, or *hexadecimal-digit*s is interpreted as a
+base N integer as shown in table [[lex.icon.base]]; the lexically first
+digit of the sequence of digits is the most significant.
+[*Note 1*: The prefix and any optional separating single quotes are
+ignored when determining the value. — *end note*]
+**Table: Base of *integer-literal*s** <a id="lex.icon.base">[lex.icon.base]</a>
+| Kind of *integer-literal* | base $N$ |
+| ------------------------- | -------- |
+| *binary-literal*          | 2        |
+| *octal-literal*           | 8        |
+| *decimal-literal*         | 10       |
+| *hexadecimal-literal*     | 16       |
+The *hexadecimal-digit*s `a` through `f` and `A` through `F` have
 decimal values ten through fifteen.
 [*Example 1*: The number twelve can be written `12`, `014`, `0XC`, or
+`0b1100`. The *integer-literal*s `1048576`, `1'048'576`, `0X100000`,
 `0x10'0000`, and `0'004'000'000` all have the same
 value. — *end example*]
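The equivalences in the example above can be checked at compile time. The following is a minimal sketch for illustration, not part of the standard text; it relies only on binary literals and digit separators (C++14 and later), and the namespace name `icon_demo` is hypothetical.

```cpp
namespace icon_demo {  // hypothetical name, for illustration only

// Twelve written in decimal, octal, hexadecimal, and binary.
static_assert(12 == 014 && 12 == 0XC && 12 == 0b1100,
              "all four spellings denote twelve");

// Prefixes and digit separators are ignored when determining the value.
// 0'004'000'000 is octal because of the leading 0: 4 * 8^6 = 1048576.
static_assert(1048576 == 1'048'576 && 1048576 == 0X100000 &&
              1048576 == 0x10'0000 && 1048576 == 0'004'000'000,
              "all five spellings denote 2^20");

constexpr int twelve = 0b1100;
constexpr long two_pow_20 = 0'004'000'000;

}  // namespace icon_demo
```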
+The type of an *integer-literal* is the first type in the list in
+[[lex.icon.type]] corresponding to its optional *integer-suffix* in
+which its value can be represented. An *integer-literal* is a prvalue.
+**Table: Types of *integer-literal*s** <a id="lex.icon.type">[lex.icon.type]</a>
+| *integer-suffix* | *decimal-literal*        | *integer-literal* other than *decimal-literal* |
+| ---------------- | ------------------------ | ---------------------------------------------- |
 | none             | `int`                    | `int`                                          |
 |                  | `long int`               | `unsigned int`                                 |
 |                  | `long long int`          | `long int`                                     |
 |                  |                          | `unsigned long int`                            |
 |                  |                          | `long long int`                                |
 |                  |                          | `unsigned long long int`                       |
 | Both `u` or `U`  | `unsigned long long int` | `unsigned long long int`                       |
 | and `ll` or `LL` |                          |                                                |
+If an *integer-literal* cannot be represented by any type in its list
+and an extended integer type [[basic.fundamental]] can represent its
 value, it may have that extended integer type. If all of the types in
+the list for the *integer-literal* are signed, the extended integer type
+shall be signed. If all of the types in the list for the
+*integer-literal* are unsigned, the extended integer type shall be
+unsigned. If the list contains both signed and unsigned types, the
+extended integer type may be signed or unsigned. A program is ill-formed
+if one of its translation units contains an *integer-literal* that
+cannot be represented by any of the allowed types.
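How the table selects a type can be sketched with `decltype`. This is an illustration, not part of the standard text; the namespace `icon_type_demo` and its constants are hypothetical. The check on `0xFFFFFFFF` is guarded because its type depends on the width of `int`, which is implementation-defined.

```cpp
#include <limits>
#include <type_traits>

namespace icon_type_demo {  // hypothetical name, for illustration only

// The suffix picks the list; the value picks the first type that fits.
static_assert(std::is_same<decltype(1), int>::value, "no suffix");
static_assert(std::is_same<decltype(1u), unsigned int>::value, "u");
static_assert(std::is_same<decltype(1L), long>::value, "L");
static_assert(std::is_same<decltype(1LL), long long>::value, "LL");
static_assert(std::is_same<decltype(1uLL), unsigned long long>::value, "uLL");

// An unsuffixed decimal literal only has signed types in its list.
// 4294967295 fits in long long, so its type is a signed standard type.
constexpr bool decimal_is_signed = std::is_signed<decltype(4294967295)>::value;
static_assert(decimal_is_signed, "decimal list has no unsigned types");

// An unsuffixed hexadecimal literal also tries the unsigned types.
// Assumption: the check only applies where int has 31 value bits.
constexpr bool int_is_32_bits = std::numeric_limits<int>::digits == 31;
constexpr bool hex_ok =
    !int_is_32_bits ||
    std::is_same<decltype(0xFFFFFFFF), unsigned int>::value;
static_assert(hex_ok, "0xFFFFFFFF is unsigned int when int is 32 bits");

}  // namespace icon_type_demo
```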
 ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
 ``` bnf
 character-literal:
 c-char-sequence:
     c-char
     c-char-sequence c-char
 ```
+``` bnf
+c-char:
+    any member of the basic source character set except the single-quote ''', backslash '\', or new-line character
+    escape-sequence
+    universal-character-name
+```
 ``` bnf
 escape-sequence:
     simple-escape-sequence
     octal-escape-sequence
     hexadecimal-escape-sequence
 hexadecimal-escape-sequence:
     '\x' hexadecimal-digit
     hexadecimal-escape-sequence hexadecimal-digit
 ```
+A *character-literal* that does not begin with `u8`, `u`, `U`, or `L` is
 an *ordinary character literal*. An ordinary character literal that
 contains a single *c-char* representable in the execution character set
 has type `char`, with value equal to the numerical value of the encoding
 of the *c-char* in the execution character set. An ordinary character
+literal that contains more than one *c-char* is a
+*multicharacter literal*. A multicharacter literal, or an ordinary
+character literal containing a single *c-char* not representable in the
+execution character set, is conditionally-supported, has type `int`, and
+has an *implementation-defined* value.
+A *character-literal* that begins with `u8`, such as `u8'w'`, is a
+*character-literal* of type `char8_t`, known as a *UTF-8 character
+literal*. The value of a UTF-8 character literal is equal to its ISO/IEC
+10646 code point value, provided that the code point value can be
+encoded as a single UTF-8 code unit.
+[*Note 1*: That is, provided the code point value is in the range
+[0, 7F] (hexadecimal). — *end note*]
+If the value is not representable with a single UTF-8 code unit, the
+program is ill-formed. A UTF-8 character literal containing multiple
+*c-char*s is ill-formed.
+A *character-literal* that begins with the letter `u`, such as `u'x'`,
+is a *character-literal* of type `char16_t`, known as a *UTF-16
+character literal*. The value of a UTF-16 character literal is equal to
+its ISO/IEC 10646 code point value, provided that the code point value
+is representable with a single 16-bit code unit.
+[*Note 2*: That is, provided the code point value is in the range
+[0, FFFF] (hexadecimal). — *end note*]
+If the value is not representable with a single 16-bit code unit, the
+program is ill-formed. A UTF-16 character literal containing multiple
 *c-char*s is ill-formed.
+A *character-literal* that begins with the letter `U`, such as `U'y'`,
+is a *character-literal* of type `char32_t`, known as a *UTF-32
+character literal*. The value of a UTF-32 character literal containing a
+single *c-char* is equal to its ISO/IEC 10646 code point value. A UTF-32
+character literal containing multiple *c-char*s is ill-formed.
+A *character-literal* that begins with the letter `L`, such as `L'z'`,
+is a *wide-character literal*. A wide-character literal has type
+`wchar_t`.[^12] A wide-character literal containing a
 single *c-char* has value equal to the numerical value of the encoding
 of the *c-char* in the execution wide-character set, unless the *c-char*
 has no representation in the execution wide-character set, in which case
 the value is *implementation-defined*.
+[*Note 3*: The type `wchar_t` is able to represent all members of the
 execution wide-character set (see
 [[basic.fundamental]]). — *end note*]
 The value of a wide-character literal containing multiple *c-char*s is
 *implementation-defined*.
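
The prefix rules above can be sketched as compile-time checks. This is only an illustration: the `char8_t` part is guarded by the `__cpp_char8_t` feature-test macro because it assumes a compiler running in C++20 mode, and multicharacter literals are left out because they are conditionally-supported.

``` cpp
#include <type_traits>

// No prefix: ordinary character literal of type char.
static_assert(std::is_same_v<decltype('a'), char>);
// u, U and L prefixes select char16_t, char32_t and wchar_t.
static_assert(std::is_same_v<decltype(u'x'), char16_t>);
static_assert(std::is_same_v<decltype(U'y'), char32_t>);
static_assert(std::is_same_v<decltype(L'z'), wchar_t>);

// UTF-16 and UTF-32 character literals carry the ISO/IEC 10646 code point.
static_assert(u'x' == 0x78);
static_assert(U'y' == 0x79);

#ifdef __cpp_char8_t
// Only meaningful when char8_t is available (C++20).
static_assert(std::is_same_v<decltype(u8'w'), char8_t>);
static_assert(u8'w' == 0x77);
#endif
```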
 Certain non-graphic characters, the single quote `'`, the double quote
+`"`, the question mark `?`,[^13] and the backslash `\`, can be
+represented according to [[lex.ccon.esc]]. The double quote `"` and the
+question mark `?` can be represented as themselves or by the escape
+sequences `\"` and `\?` respectively, but the single quote `'` and the
+backslash `\` shall be represented by the escape sequences `\'` and `\\`
+respectively. Escape sequences in which the character following the
+backslash is not listed in [[lex.ccon.esc]] are conditionally-supported,
+with *implementation-defined* semantics. An escape sequence specifies a
+single character.
+**Table: Escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
 |                 |                |                    |
 | --------------- | -------------- | ------------------ |
 | new-line        | NL(LF)         | `\n`               |
 | horizontal tab  | HT             | `\t`               |
 backslash followed by `x` followed by one or more hexadecimal digits
 that are taken to specify the value of the desired character. There is
 no limit to the number of digits in a hexadecimal sequence. A sequence
 of octal or hexadecimal digits is terminated by the first character that
 is not an octal digit or a hexadecimal digit, respectively. The value of
+a *character-literal* is *implementation-defined* if it falls outside of
+the *implementation-defined* range defined for `char` (for
+*character-literal*s with no prefix) or `wchar_t` (for
+*character-literal*s prefixed by `L`).
+[*Note 4*: If the value of a *character-literal* prefixed by `u`, `u8`,
 or `U` is outside the range defined for its type, the program is
 ill-formed. — *end note*]
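
A small sketch of the escape rules described above, written as compile-time checks. The numeric values chosen here all fit in `char`, so none of the implementation-defined cases is exercised.

``` cpp
// Octal 101 and hexadecimal 41 both specify the value 65.
static_assert('\101' == '\x41');
static_assert('\x41' == 65);
static_assert('\0' == 0);

// Inside a character literal the single quote and the backslash
// must be written as escape sequences.
static_assert('\'' != '\\');

// The double quote and the question mark may be written either way.
static_assert('"' == '\"');
static_assert('?' == '\?');
```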
 A *universal-character-name* is translated to the encoding, in the
 appropriate execution character set, of the character named. If there is
 no such encoding, the *universal-character-name* is translated to an
 *implementation-defined* encoding.
+[*Note 5*: In translation phase 1, a *universal-character-name* is
 introduced whenever an actual extended character is encountered in the
 source text. Therefore, all extended characters are described in terms
 of *universal-character-name*s. However, the actual compiler
 implementation may use its own native character set, so long as the same
 results are obtained. — *end note*]
+### Floating-point literals <a id="lex.fcon">[[lex.fcon]]</a>
 ``` bnf
+floating-point-literal:
+    decimal-floating-point-literal
+    hexadecimal-floating-point-literal
 ```
 ``` bnf
+decimal-floating-point-literal:
+    fractional-constant exponent-partₒₚₜ floating-point-suffixₒₚₜ
+    digit-sequence exponent-part floating-point-suffixₒₚₜ
 ```
 ``` bnf
+hexadecimal-floating-point-literal:
+    hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-point-suffixₒₚₜ
+    hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-point-suffixₒₚₜ
 ```
 ``` bnf
 fractional-constant:
     digit-sequenceₒₚₜ '.' digit-sequence
     digit
     digit-sequence '''ₒₚₜ digit
 ```
 ``` bnf
+floating-point-suffix: one of
     'f l F L'
 ```
+The type of a *floating-point-literal* is determined by its
+*floating-point-suffix* as specified in [[lex.fcon.type]].
+**Table: Types of *floating-point-literal*s** <a id="lex.fcon.type">[lex.fcon.type]</a>
+| *floating-point-suffix* | type            |
+| ----------------------- | --------------- |
+| none                    | `double`        |
+| `f` or `F`              | `float`         |
+| `l` or `L`              | `long double`   |
+The *significand* of a *floating-point-literal* is the
+*fractional-constant* or *digit-sequence* of a
+*decimal-floating-point-literal* or the
+*hexadecimal-fractional-constant* or *hexadecimal-digit-sequence* of a
+*hexadecimal-floating-point-literal*. In the significand, the sequence
+of *digit*s or *hexadecimal-digit*s and optional period are interpreted
+as a base N real number s, where N is 10 for a
+*decimal-floating-point-literal* and 16 for a
+*hexadecimal-floating-point-literal*.
+[*Note 1*: Any optional separating single quotes are ignored when
+determining the value. — *end note*]
+If an *exponent-part* or *binary-exponent-part* is present, the exponent
+e of the *floating-point-literal* is the result of interpreting the
+sequence of an optional *sign* and the *digit*s as a base 10 integer.
+Otherwise, the exponent e is 0. The scaled value of the literal is
+s × 10ᵉ for a *decimal-floating-point-literal* and s × 2ᵉ for a
+*hexadecimal-floating-point-literal*.
+[*Example 1*: The *floating-point-literal*s `49.625` and `0xC.68p+2`
+have the same value. The *floating-point-literal*s `1.602'176'565e-19`
+and `1.602176565e-19` have the same value. — *end example*]
 If the scaled value is not in the range of representable values for its
+type, the program is ill-formed. Otherwise, the value of a
+*floating-point-literal* is the scaled value if representable, else the
+larger or smaller representable value nearest the scaled value, chosen
+in an *implementation-defined* manner.
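
The suffix table and the scaling rule can be checked with a few compile-time assertions. This sketch assumes a C++17 or later compiler (hexadecimal floating-point literals) and uses only values that are exactly representable in `double`, so the implementation-defined rounding choice does not matter.

``` cpp
#include <type_traits>

// The suffix alone determines the type.
static_assert(std::is_same_v<decltype(1.5), double>);
static_assert(std::is_same_v<decltype(1.5f), float>);
static_assert(std::is_same_v<decltype(1.5L), long double>);

// 0xC.68p+2: significand C.68 (base 16) is 12.40625, scaled by 2^2.
static_assert(49.625 == 0xC.68p+2);
// Digit separators are ignored when determining the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19);
// Decimal exponent: 1 scaled by 10^3.
static_assert(1e3 == 1000.0);
```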
 ### String literals <a id="lex.string">[[lex.string]]</a>
 ``` bnf
 string-literal:
 s-char-sequence:
     s-char
     s-char-sequence s-char
 ```
+``` bnf
+s-char:
+    any member of the basic source character set except the double-quote '"', backslash '\', or new-line character
+    escape-sequence
+    universal-character-name
+```
 ``` bnf
 raw-string:
     '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
 ```
 r-char-sequence:
     r-char
     r-char-sequence r-char
 ```
+``` bnf
+r-char:
+    any member of the source character set, except a right parenthesis ')' followed by
+       the initial *d-char-sequence* (which may be empty) followed by a double quote '"'.
+```
 ``` bnf
 d-char-sequence:
     d-char
     d-char-sequence d-char
 ```
+``` bnf
+d-char:
+    any member of the basic source character set except:
+       space, the left parenthesis '(', the right parenthesis ')', the backslash '\', and the control characters
+       representing horizontal tab, vertical tab, form feed, and newline.
+```
 A *string-literal* that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 ```
 is equivalent to `"\n)\\\na\"\n"`. The raw string
 ``` cpp
+R"(x = "\"y\"")"
 ```
+is equivalent to `"x = \"\\\"y\\\"\""`.
 — *end example*]
 After translation phase 6, a *string-literal* that does not begin with
+an *encoding-prefix* is an *ordinary string literal*. An ordinary string
+literal has type “array of *n* `const char`” where *n* is the size of
+the string as defined below, has static storage duration [[basic.stc]],
+and is initialized with the given characters.
 A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
+*UTF-8 string literal*. A UTF-8 string literal has type “array of *n*
+`const char8_t`”, where *n* is the size of the string as defined below;
+each successive element of the object representation [[basic.types]] has
+the value of the corresponding code unit of the UTF-8 encoding of the
+string.
 Ordinary string literals and UTF-8 string literals are also referred to
+as narrow string literals.
+A *string-literal* that begins with `u`, such as `u"asdf"`, is a *UTF-16
+string literal*. A UTF-16 string literal has type “array of *n*
+`const char16_t`”, where *n* is the size of the string as defined below;
+each successive element of the array has the value of the corresponding
+code unit of the UTF-16 encoding of the string.
+[*Note 3*: A single *c-char* may produce more than one `char16_t`
+character in the form of surrogate pairs. A surrogate pair is a
+representation for a single code point as a sequence of two 16-bit code
+units. — *end note*]
+A *string-literal* that begins with `U`, such as `U"asdf"`, is a *UTF-32
+string literal*. A UTF-32 string literal has type “array of *n*
+`const char32_t`”, where *n* is the size of the string as defined below;
+each successive element of the array has the value of the corresponding
+code unit of the UTF-32 encoding of the string.
 A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
 string literal*. A wide string literal has type “array of *n* `const
 wchar_t`”, where *n* is the size of the string as defined below; it is
 initialized with the given characters.
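
As a sketch of the array types listed above: a string literal is an lvalue, so `decltype` yields a reference to the array type, and *n* counts the terminating null. The `char8_t` check is guarded because it assumes C++20 mode.

``` cpp
#include <type_traits>

// "asdf" has four characters plus the terminating null, so n is 5.
static_assert(std::is_same_v<decltype("asdf"), const char (&)[5]>);
static_assert(std::is_same_v<decltype(u"asdf"), const char16_t (&)[5]>);
static_assert(std::is_same_v<decltype(U"asdf"), const char32_t (&)[5]>);
static_assert(std::is_same_v<decltype(L"asdf"), const wchar_t (&)[5]>);

#ifdef __cpp_char8_t
// Only meaningful when char8_t is available (C++20).
static_assert(std::is_same_v<decltype(u8"asdf"), const char8_t (&)[5]>);
#endif
```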
+In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
 concatenated. If both *string-literal*s have the same *encoding-prefix*,
+the resulting concatenated *string-literal* has that *encoding-prefix*.
+If one *string-literal* has no *encoding-prefix*, it is treated as a
 *string-literal* of the same *encoding-prefix* as the other operand. If
 a UTF-8 string literal token is adjacent to a wide string literal token,
 the program is ill-formed. Any other concatenations are
 conditionally-supported with *implementation-defined* behavior.
+[*Note 4*: This concatenation is an interpretation, not a conversion.
 Because the interpretation happens in translation phase 6 (after each
+character from a *string-literal* has been translated into a value from
 the appropriate character set), a *string-literal*’s initial rawness has
 no effect on the interpretation or well-formedness of the
 concatenation. — *end note*]
+[[lex.string.concat]] has some examples of valid concatenations.
+**Table: String literal concatenations** <a id="lex.string.concat">[lex.string.concat]</a>
 | Source |        | Means   | Source |        | Means   | Source |        | Means   |
 | ------ | ------ | ------- | ------ | ------ | ------- | ------ | ------ | ------- |
 | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
 contains the two characters `'\xA'` and `'B'` after concatenation (and
 not the single hexadecimal character `'\xAB'`).
 — *end example*]
+After any necessary concatenation, in translation phase 7
+[[lex.phases]], `'\0'` is appended to every *string-literal* so that
 programs that scan a string can find its end.
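
The concatenation rules and the appended null character can be observed through `sizeof` and indexing. The following is a sketch only; it sticks to combinations that the text above describes as well-formed.

``` cpp
// An unprefixed operand takes the encoding-prefix of the other operand.
static_assert(sizeof(u"a" "b") == 3 * sizeof(char16_t));  // 'a', 'b', '\0'
static_assert(sizeof("a" U"b") == 3 * sizeof(char32_t));
static_assert(sizeof(L"a" L"b") == 3 * sizeof(wchar_t));

// Escape sequences are translated before concatenation, so "\xA" "B"
// holds two characters plus the null, not the single character '\xAB'.
static_assert(sizeof("\xA" "B") == 3);
static_assert(("\xA" "B")[0] == '\xA');
static_assert(("\xA" "B")[1] == 'B');
static_assert(("\xA" "B")[2] == '\0');
```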
 Escape sequences and *universal-character-name*s in non-raw string
+literals have the same meaning as in *character-literal*s [[lex.ccon]],
 except that the single quote `'` is representable either by itself or by
 the escape sequence `\'`, and the double quote `"` shall be preceded by
+a `\`, and except that a *universal-character-name* in a UTF-16 string
+literal may yield a surrogate pair. In a narrow string literal, a
+*universal-character-name* may map to more than one `char` or `char8_t`
+element due to *multibyte encoding*. The size of a `char32_t` or wide
+string literal is the total number of escape sequences,
+*universal-character-name*s, and other characters, plus one for the
+terminating `U'\0'` or `L'\0'`. The size of a UTF-16 string literal is
+the total number of escape sequences, *universal-character-name*s, and
+other characters, plus one for each character requiring a surrogate
+pair, plus one for the terminating `u'\0'`.
+[*Note 5*: The size of a `char16_t` string literal is the number of
 code units, not the number of characters. — *end note*]
+[*Note 6*: Any *universal-character-name*s are required to correspond
+to a code point in the range [0, D800) or [E000, 10FFFF] (hexadecimal)
+[[lex.charset]]. — *end note*]
+The size of a narrow string literal is the total number of escape
+sequences and other characters, plus at least one for the multibyte
+encoding of each *universal-character-name*, plus one for the
 terminating `'\0'`.
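
To make the size rules concrete, the sketch below uses two arbitrarily chosen code points: U+1F600, which lies outside the 16-bit range and therefore needs a surrogate pair in UTF-16, and U+00E9, which takes two code units in UTF-8. Ordinary string literals are omitted because their multibyte encoding depends on the execution character set.

``` cpp
// UTF-16: one universal-character-name, plus one for the surrogate pair,
// plus one for the terminating u'\0'.
static_assert(sizeof(u"\U0001F600") / sizeof(char16_t) == 3);
// UTF-32: one element per character, plus the terminating U'\0'.
static_assert(sizeof(U"\U0001F600") / sizeof(char32_t) == 2);
// UTF-8: U+00E9 occupies two code units, plus the terminating null.
static_assert(sizeof(u8"\u00E9") == 3);
static_assert(sizeof(u8"a\u00E9") == 4);
// An empty string literal still contains the terminating null.
static_assert(sizeof("") == 1);
```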
 Evaluating a *string-literal* results in a string literal object with
 static storage duration, initialized from the given characters as
+specified above. Whether all *string-literal*s are distinct (that is,
+are stored in nonoverlapping objects) and whether successive evaluations
+of a *string-literal* yield the same or a different object is
+unspecified.
+[*Note 7*: The effect of attempting to modify a *string-literal* is
 undefined. — *end note*]
 ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
 ``` bnf
 The pointer literal is the keyword `nullptr`. It is a prvalue of type
 `std::nullptr_t`.
 [*Note 1*: `std::nullptr_t` is a distinct type that is neither a
+pointer type nor a pointer-to-member type; rather, a prvalue of this
 type is a null pointer constant and can be converted to a null pointer
 value or null member pointer value. See [[conv.ptr]] and
 [[conv.mem]]. — *end note*]
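
A brief sketch of the properties stated in the note. The struct `S` is a hypothetical type introduced only so that a pointer-to-member can be formed.

``` cpp
#include <cstddef>
#include <type_traits>

struct S { int m; };  // hypothetical type for the pointer-to-member case

// nullptr has type std::nullptr_t, which is neither kind of pointer type.
static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);
static_assert(!std::is_pointer_v<std::nullptr_t>);
static_assert(!std::is_member_pointer_v<std::nullptr_t>);

// It converts to a null pointer value and to a null member pointer value.
constexpr int* p = nullptr;
constexpr int S::* pm = nullptr;
static_assert(p == nullptr);
static_assert(pm == nullptr);
```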
 ### User-defined literals <a id="lex.ext">[[lex.ext]]</a>
 ``` bnf
 user-defined-literal:
     user-defined-integer-literal
+    user-defined-floating-point-literal
     user-defined-string-literal
     user-defined-character-literal
 ```
 ``` bnf
     hexadecimal-literal ud-suffix
     binary-literal ud-suffix
 ```
 ``` bnf
+user-defined-floating-point-literal:
     fractional-constant exponent-partₒₚₜ ud-suffix
     digit-sequence exponent-part ud-suffix
     hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
     hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
 ```
 The syntactic non-terminal preceding the *ud-suffix* in a
 *user-defined-literal* is taken to be the longest sequence of characters
 that could match that non-terminal.
 A *user-defined-literal* is treated as a call to a literal operator or
+literal operator template [[over.literal]]. To determine the form of
 this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
 the *literal-operator-id* whose literal suffix identifier is *X* is
 looked up in the context of *L* using the rules for unqualified name
+lookup [[basic.lookup.unqual]]. Let *S* be the set of declarations found
+by this lookup. *S* shall not be empty.
 If *L* is a *user-defined-integer-literal*, let *n* be the literal
 without its *ud-suffix*. If *S* contains a literal operator with
 parameter type `unsigned long long`, the literal *L* is treated as a
 call of the form
 ``` cpp
 operator "" X(nULL)
 ```
+Otherwise, *S* shall contain a raw literal operator or a numeric literal
+operator template [[over.literal]] but not both. If *S* contains a raw
+literal operator, the literal *L* is treated as a call of the form
 ``` cpp
 operator "" X("n")
 ```
+Otherwise (*S* contains a numeric literal operator template), *L* is
+treated as a call of the form
 ``` cpp
 operator "" X<'c₁', 'c₂', ... 'cₖ'>()
 ```
 where *n* is the source character sequence c₁c₂...cₖ.
 [*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
 basic source character set. — *end note*]
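
The three call forms for a *user-defined-integer-literal* might be sketched as follows. The suffixes `_cooked`, `_rawlen`, and `_count` are hypothetical names chosen for this illustration; each suffix declares only one form, so the "but not both" restriction is not triggered.

``` cpp
#include <cstddef>

// Literal operator taking unsigned long long: receives the value n.
constexpr unsigned long long operator""_cooked(unsigned long long n) {
    return n;
}

// Raw literal operator: receives the spelling of n as a
// null-terminated string. Here it returns the spelling's length.
constexpr std::size_t operator""_rawlen(const char* s) {
    std::size_t len = 0;
    while (s[len] != '\0') ++len;
    return len;
}

// Numeric literal operator template: receives the characters of n
// as template arguments. Here it returns how many there are.
template <char... Cs>
constexpr std::size_t operator""_count() {
    return sizeof...(Cs);
}

static_assert(0x1F_cooked == 31);  // called as operator""_cooked(0x1FULL)
static_assert(0x1F_rawlen == 4);   // called as operator""_rawlen("0x1F")
static_assert(1000_count == 4);    // operator""_count<'1','0','0','0'>()
```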
+If *L* is a *user-defined-floating-point-literal*, let *f* be the
+literal without its *ud-suffix*. If *S* contains a literal operator with
 parameter type `long double`, the literal *L* is treated as a call of
 the form
 ``` cpp
 operator "" X(fL)
 ```
+Otherwise, *S* shall contain a raw literal operator or a numeric literal
+operator template [[over.literal]] but not both. If *S* contains a raw
+literal operator, the literal *L* is treated as a call of the form
 ``` cpp
 operator "" X("f")
 ```
+Otherwise (*S* contains a numeric literal operator template), *L* is
+treated as a call of the form
 ``` cpp
 operator "" X<'c₁', 'c₂', ... 'cₖ'>()
 ```
 [*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
 basic source character set. — *end note*]
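
For the floating-point case, a comparable sketch with two hypothetical suffixes, `_half` (a literal operator taking `long double`) and `_has_exp` (a raw literal operator that inspects the spelling):

``` cpp
// Literal operator taking long double: receives the value f.
constexpr long double operator""_half(long double x) {
    return x / 2;
}

// Raw literal operator: receives the spelling of f and reports
// whether it contains a decimal exponent marker.
constexpr bool operator""_has_exp(const char* s) {
    for (; *s != '\0'; ++s) {
        if (*s == 'e' || *s == 'E') return true;
    }
    return false;
}

static_assert(3.0_half == 1.5L);  // called as operator""_half(3.0L)
static_assert(1.5e3_has_exp);     // called as operator""_has_exp("1.5e3")
static_assert(!2.5_has_exp);      // called as operator""_has_exp("2.5")
```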
 If *L* is a *user-defined-string-literal*, let *str* be the literal
 without its *ud-suffix* and let *len* be the number of code units in
+*str* (i.e., its length excluding the terminating null character). If
+*S* contains a literal operator template with a non-type template
+parameter for which *str* is a well-formed *template-argument*, the
 literal *L* is treated as a call of the form
+``` cpp
+operator "" X<str>()
+```
+Otherwise, the literal *L* is treated as a call of the form
 ``` cpp
 operator "" X(str, len)
 ```
 If *L* is a *user-defined-character-literal*, let *ch* be the literal
+without its *ud-suffix*. *S* shall contain a literal operator
+[[over.literal]] whose only parameter has the type of *ch* and the
 literal *L* is treated as a call of the form
 ``` cpp
 operator "" X(ch)
 ```
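
The string and character forms can be sketched in the same way. The suffixes `_len` and `_code` are hypothetical; only the `(str, len)` form is shown for strings, since the template form assumes C++20 support for class-type template parameters.

``` cpp
#include <cstddef>

// String form: receives a pointer to str and the number of code units,
// excluding the terminating null character.
constexpr std::size_t operator""_len(const char*, std::size_t len) {
    return len;
}

// Character form: the single parameter has the type of the literal ch.
constexpr int operator""_code(char ch) {
    return ch;
}

static_assert("hello"_len == 5);   // called as operator""_len("hello", 5)
static_assert(""_len == 0);
static_assert('\x41'_code == 65);  // called as operator""_code('\x41')
```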
 }
 ```
 — *end example*]
+In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
+concatenated and *user-defined-string-literal*s are considered
+*string-literal*s for that purpose. During concatenation, *ud-suffix*es
+are removed and ignored and the concatenation process occurs as
+described in [[lex.string]]. At the end of phase 6, if a
+*string-literal* is the result of a concatenation involving at least one
 *user-defined-string-literal*, all the participating
 *user-defined-string-literal*s shall have the same *ud-suffix* and that
 suffix is applied to the result of the concatenation.
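
A minimal sketch of this rule, reusing a hypothetical `_len` suffix that returns the length it is given. The suffix is applied once, to the concatenated result.

``` cpp
#include <cstddef>

// Hypothetical literal operator that returns the length passed to it.
constexpr std::size_t operator""_len(const char*, std::size_t len) {
    return len;
}

// Both operands carry the same ud-suffix: treated as "abcd"_len.
static_assert("ab"_len "cd"_len == 4);
// An ordinary string literal may take part as well: also "abcd"_len.
static_assert("ab" "cd"_len == 4);
// Mixing two different ud-suffixes in one concatenation is ill-formed.
```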
 [*Example 3*:
 [basic.fundamental]: basic.md#basic.fundamental
 [basic.link]: basic.md#basic.link
 [basic.lookup.unqual]: basic.md#basic.lookup.unqual
 [basic.stc]: basic.md#basic.stc
 [basic.types]: basic.md#basic.types
+[conv.mem]: expr.md#conv.mem
+[conv.ptr]: expr.md#conv.ptr
 [cpp]: cpp.md#cpp
 [cpp.concat]: cpp.md#cpp.concat
 [cpp.cond]: cpp.md#cpp.cond
+[cpp.import]: cpp.md#cpp.import
 [cpp.include]: cpp.md#cpp.include
+[cpp.module]: cpp.md#cpp.module
 [cpp.stringize]: cpp.md#cpp.stringize
 [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
 [headers]: library.md#headers
 [lex]: #lex
 [lex.bool]: #lex.bool
 [lex.ccon]: #lex.ccon
+[lex.ccon.esc]: #lex.ccon.esc
 [lex.charset]: #lex.charset
 [lex.comment]: #lex.comment
 [lex.digraph]: #lex.digraph
 [lex.ext]: #lex.ext
 [lex.fcon]: #lex.fcon
+[lex.fcon.type]: #lex.fcon.type
 [lex.header]: #lex.header
 [lex.icon]: #lex.icon
+[lex.icon.base]: #lex.icon.base
+[lex.icon.type]: #lex.icon.type
 [lex.key]: #lex.key
+[lex.key.digraph]: #lex.key.digraph
 [lex.literal]: #lex.literal
 [lex.literal.kinds]: #lex.literal.kinds
 [lex.name]: #lex.name
+[lex.name.allowed]: #lex.name.allowed
+[lex.name.disallowed]: #lex.name.disallowed
+[lex.name.special]: #lex.name.special
 [lex.nullptr]: #lex.nullptr
 [lex.operators]: #lex.operators
 [lex.phases]: #lex.phases
 [lex.ppnumber]: #lex.ppnumber
 [lex.pptoken]: #lex.pptoken
 [lex.separate]: #lex.separate
 [lex.string]: #lex.string
+[lex.string.concat]: #lex.string.concat
 [lex.token]: #lex.token
+[module.import]: module.md#module.import
+[module.unit]: module.md#module.unit
 [over.literal]: over.md#over.literal
 [temp.explicit]: temp.md#temp.explicit
 [temp.names]: temp.md#temp.names
 [^1]: Implementations must behave as if these separate phases occur,
     although in practice different phases might be folded together.
     (described in translation phase 1) is specified as
     *implementation-defined*, an implementation is required to document
     how the basic source characters are represented in source files.
 [^5]: A sequence of characters resembling a *universal-character-name*
+    in an *r-char-sequence* [[lex.string]] does not form a
     *universal-character-name*.
 [^6]: These include “digraphs” and additional reserved words. The term
     “digraph” (token consisting of two characters) is not perfectly
+    descriptive, since one of the alternative *preprocessing-token*s is
     `%:%:` and of course several primary tokens contain two characters.
     Nonetheless, those alternative tokens that aren’t lexical keywords
     are colloquially known as “digraphs”.
+[^7]: Thus the “stringized” values [[cpp.stringize]] of `[` and `<:`
     will be different, maintaining the source spelling, but the tokens
     can otherwise be freely interchanged.
 [^8]: Literals include strings and character and numeric literals.
     long external identifier, but C++ does not place a translation limit
     on significant characters for external identifiers. In C++, upper-
     and lower-case letters are considered different for all identifiers,
     including external identifiers.
+[^11]: The term “literal” generally designates, in this document, those
+    tokens that are called “constants” in ISO C.
+[^12]: They are intended for character sets where a character does not
     fit into a single byte.
+[^13]: Using an escape sequence for a question mark is supported for
     compatibility with ISO C++14 and ISO C.
