[lex.literal] - C++23 → Trunk

tmp/tmpwm5zlpj2/{from.md → to.md} +111 -93

@@ -115,12 +115,12 @@ size-suffix: one of
    'z Z'
 ```
 In an *integer-literal*, the sequence of *binary-digit*s,
 *octal-digit*s, *digit*s, or *hexadecimal-digit*s is interpreted as a
-base N integer as shown in table [[lex.icon.base]]; the lexically first
-digit of the sequence of digits is the most significant.
 [*Note 1*: The prefix and any optional separating single quotes are
 ignored when determining the value. — *end note*]
 **Table: Base of *integer-literal*s** <a id="lex.icon.base">[lex.icon.base]</a>
@@ -173,20 +173,23 @@ which its value can be represented.
 |                  |                                           | `std::size_t`                                  |
 | Both `u` or `U`  | `std::size_t`                             | `std::size_t`                                  |
 | and `z` or `Z`   |                                           |                                                |
-If an *integer-literal* cannot be represented by any type in its list
 and an extended integer type [[basic.fundamental]] can represent its
 value, it may have that extended integer type. If all of the types in
 the list for the *integer-literal* are signed, the extended integer type
-shall be signed. If all of the types in the list for the
-*integer-literal* are unsigned, the extended integer type shall be
-unsigned. If the list contains both signed and unsigned types, the
-extended integer type may be signed or unsigned. A program is ill-formed
-if one of its translation units contains an *integer-literal* that
-cannot be represented by any of the allowed types.
 ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
 ``` bnf
 character-literal:
@@ -198,12 +201,11 @@ encoding-prefix: one of
     'u8' 'u' 'U' 'L'
 ```
 ``` bnf
 c-char-sequence:
-    c-char
-    c-char-sequence c-char
 ```
 ``` bnf
 c-char:
     basic-c-char
@@ -240,12 +242,11 @@ numeric-escape-sequence:
     hexadecimal-escape-sequence
 ```
 ``` bnf
 simple-octal-digit-sequence:
-    octal-digit
-    simple-octal-digit-sequence octal-digit
 ```
 ``` bnf
 octal-escape-sequence:
     '\' octal-digit
@@ -268,60 +269,47 @@ conditional-escape-sequence:
 ``` bnf
 conditional-escape-sequence-char:
     any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters 'N', 'o', 'u', 'U', or 'x'
 ```
-A *non-encodable character literal* is a *character-literal* whose
-*c-char-sequence* consists of a single *c-char* that is not a
-*numeric-escape-sequence* and that specifies a character that either
-lacks representation in the literal’s associated character encoding or
-that cannot be encoded as a single code unit. A *multicharacter literal*
-is a *character-literal* whose *c-char-sequence* consists of more than
-one *c-char*. The *encoding-prefix* of a non-encodable character literal
-or a multicharacter literal shall be absent. Such *character-literal*s
-are conditionally-supported.
 The kind of a *character-literal*, its type, and its associated
 character encoding [[lex.charset]] are determined by its
 *encoding-prefix* and its *c-char-sequence* as defined by
-[[lex.ccon.literal]]. The special cases for non-encodable character
-literals and multicharacter literals take precedence over the base kind.
-[*Note 1*: The associated character encoding for ordinary character
-literals determines encodability, but does not determine the value of
-non-encodable ordinary character literals or ordinary multicharacter
-literals. The examples in [[lex.ccon.literal]] for non-encodable
-ordinary character literals assume that the specified character lacks
-representation in the ordinary literal encoding or that encoding the
-character would require more than one code unit. — *end note*]
 **Table: Character literals** <a id="lex.ccon.literal">[lex.ccon.literal]</a>
-| | | | | |
-| ---- | -------------------------- | ---------- | ------------ | ------- |
-| none | ordinary character literal | `char`     | ordinary | `'v'`   |
 | `L`             | wide character literal     | `wchar_t`  | wide literal                    | `L'w'`  |
 |                 |                            |            | encoding                        |         |
 | `u8`            | UTF-8 character literal    | `char8_t`  | UTF-8                           | `u8'x'` |
 | `u`             | UTF-16 character literal   | `char16_t` | UTF-16                          | `u'y'`  |
 | `U`             | UTF-32 character literal   | `char32_t` | UTF-32                          | `U'z'`  |
 In translation phase 4, the value of a *character-literal* is determined
 using the range of representable values of the *character-literal*’s
-type in translation phase 7. A non-encodable character literal or a
-multicharacter literal has an *implementation-defined* value. The value
-of any other kind of *character-literal* is determined as follows:
 - A *character-literal* with a *c-char-sequence* consisting of a single
   *basic-c-char*, *simple-escape-sequence*, or
   *universal-character-name* is the code unit value of the specified
   character as encoded in the literal’s associated character encoding.
- \[*Note 2*: If the specified character lacks representation in the
- literal’s associated character encoding or if it cannot be encoded as
- a single code unit, then the literal is a non-encodable character
-  literal. — *end note*]
 - A *character-literal* with a *c-char-sequence* consisting of a single
   *numeric-escape-sequence* has a value as follows:
   - Let v be the integer value represented by the octal number
     comprising the sequence of *octal-digit*s in an
     *octal-escape-sequence* or by the hexadecimal number comprising the
@@ -332,20 +320,20 @@ of any other kind of *character-literal* is determined as follows:
     or `L`, and v does not exceed the range of representable values of
     the corresponding unsigned type for the underlying type of the
     *character-literal*’s type, then the value is the unique value of
     the *character-literal*’s type `T` that is congruent to v modulo 2ᴺ,
     where N is the width of `T`.
-  - Otherwise, the *character-literal* is ill-formed.
 - A *character-literal* with a *c-char-sequence* consisting of a single
   *conditional-escape-sequence* is conditionally-supported and has an
   *implementation-defined* value.
 The character specified by a *simple-escape-sequence* is specified in
 [[lex.ccon.esc]].
-[*Note 3*: Using an escape sequence for a question mark is supported
-for compatibility with ISO C++14 and ISO C. — *end note*]
 **Table: Simple escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
 | character |                      | *simple-escape-sequence* |
 | --------- | -------------------- | ------------------------ |
@@ -482,12 +470,11 @@ string-literal:
     encoding-prefixₒₚₜ 'R' raw-string
 ```
 ``` bnf
 s-char-sequence:
-    s-char
-    s-char-sequence s-char
 ```
 ``` bnf
 s-char:
     basic-s-char
@@ -506,24 +493,22 @@ raw-string:
     '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
 ```
 ``` bnf
 r-char-sequence:
-    r-char
-    r-char-sequence r-char
 ```
 ``` bnf
 r-char:
     any member of the translation character set, except a U+0029 (right parenthesis) followed by
        the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
 ```
 ``` bnf
 d-char-sequence:
-    d-char
-    d-char-sequence d-char
 ```
 ``` bnf
 d-char:
     any member of the basic character set except:
@@ -532,16 +517,17 @@ d-char:
 ```
 The kind of a *string-literal*, its type, and its associated character
 encoding [[lex.charset]] are determined by its encoding prefix and
 sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
-where n is the number of encoded code units as described below.
 **Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
-| | | |                           |                                                |
-| ---- | ----------------------- | ----------------------------- | ------------------------- | ---------------------------------------------- |
 | none              | ordinary string literal | array of $n$ `const char`     | ordinary literal encoding     | `"ordinary string"` `R"(ordinary raw string)"` |
 | `L`               | wide string literal     | array of $n$ `const wchar_t`  | wide literal encoding         | `L"wide string"` `LR"w(wide raw string)w"`     |
 | `u8`              | UTF-8 string literal    | array of $n$ `const char8_t`  | UTF-8                         | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
 | `u`               | UTF-16 string literal   | array of $n$ `const char16_t` | UTF-16                        | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
 | `U`               | UTF-32 string literal   | array of $n$ `const char32_t` | UTF-32                        | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
@@ -551,12 +537,12 @@ A *string-literal* that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 at most 16 characters.
-[*Note 1*: The characters `'('` and `')'` are permitted in a
-*raw-string*. Thus, `R"delimiter((a|b))delimiter"` is equivalent to
 `"(a|b)"`. — *end note*]
 [*Note 2*:
 A source-file new-line in a raw string literal results in a new-line in
@@ -592,18 +578,15 @@ R"(x = "\"y\"")"
 is equivalent to `"x = \"\\\"y\\\"\""`.
 — *end example*]
 Ordinary string literals and UTF-8 string literals are also referred to
-as narrow string literals.
-The common *encoding-prefix* for a sequence of adjacent
-*string-literal*s is determined pairwise as follows: If two
-*string-literal*s have the same *encoding-prefix*, the common
-*encoding-prefix* is that *encoding-prefix*. If one *string-literal* has
-no *encoding-prefix*, the common *encoding-prefix* is that of the other
-*string-literal*. Any other combinations are ill-formed.
 [*Note 3*: A *string-literal*’s rawness has no effect on the
 determination of the common *encoding-prefix*. — *end note*]
 In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
@@ -640,16 +623,17 @@ digit `1` (and not the single character `'A'` specified by a
 | `u"a"`                     | `"b"` | `u"ab"`                    | `U"a"` | `"b"`                      | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
 | `"a"`                      | `u"b"` | `u"ab"`                    | `"a"` | `U"b"`                     | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
 Evaluating a *string-literal* results in a string literal object with
-static storage duration [[basic.stc]]. Whether all *string-literal*s are
-distinct (that is, are stored in nonoverlapping objects) and whether
-successive evaluations of a *string-literal* yield the same or a
-different object is unspecified.
-[*Note 4*:  The effect of attempting to modify a string literal object
 is undefined. — *end note*]
 String literal objects are initialized with the sequence of code unit
 values corresponding to the *string-literal*’s sequence of *s-char*s
 (originally from non-raw string literals) and *r-char*s (originally from
@@ -659,20 +643,19 @@ order as follows:
 - The sequence of characters denoted by each contiguous sequence of
   *basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
   and *universal-character-name*s [[lex.charset]] is encoded to a code
   unit sequence using the *string-literal*’s associated character
   encoding. If a character lacks representation in the associated
-  character encoding, then the *string-literal* is
- conditionally-supported and an *implementation-defined* code unit
- sequence is encoded. \[*Note 5*: No character lacks representation in
- any Unicode encoding form. — *end note*] When encoding a stateful
- character encoding, implementations should encode the first such
- sequence beginning with the initial encoding state and encode
- subsequent sequences beginning with the final encoding state of the
- prior sequence. \[*Note 6*: The encoded code unit sequence can differ
- from the sequence of code units that would be obtained by encoding
-  each character independently. — *end note*]
 - Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
   unit with a value as follows:
   - Let v be the integer value represented by the octal number
     comprising the sequence of *octal-digit*s in an
     *octal-escape-sequence* or by the hexadecimal number comprising the
@@ -683,35 +666,53 @@ order as follows:
     `L`, and v does not exceed the range of representable values of the
     corresponding unsigned type for the underlying type of the
     *string-literal*’s array element type, then the value is the unique
     value of the *string-literal*’s array element type `T` that is
     congruent to v modulo 2ᴺ, where N is the width of `T`.
-  - Otherwise, the *string-literal* is ill-formed.
   When encoding a stateful character encoding, these sequences should
   have no effect on encoding state.
 - Each *conditional-escape-sequence* [[lex.ccon]] contributes an
   *implementation-defined* code unit sequence. When encoding a stateful
   character encoding, it is *implementation-defined* what effect these
   sequences have on encoding state.
 ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
 ``` bnf
 boolean-literal:
- 'false'
- 'true'
 ```
 The Boolean literals are the keywords `false` and `true`. Such literals
 have type `bool`.
 ### Pointer literals <a id="lex.nullptr">[[lex.nullptr]]</a>
 ``` bnf
 pointer-literal:
- 'nullptr'
 ```
 The pointer literal is the keyword `nullptr`. It has type
 `std::nullptr_t`.
@@ -843,11 +844,11 @@ where *f* is the source character sequence c₁c₂...cₖ.
 basic character set. — *end note*]
 If *L* is a *user-defined-string-literal*, let *str* be the literal
 without its *ud-suffix* and let *len* be the number of code units in
 *str* (i.e., its length excluding the terminating null character). If
-*S* contains a literal operator template with a non-type template
 parameter for which *str* is a well-formed *template-argument*, the
 literal *L* is treated as a call of the form
 ``` cpp
 operator ""X<str>()
@@ -910,26 +911,37 @@ int main() {
 [basic.fundamental]: basic.md#basic.fundamental
 [basic.link]: basic.md#basic.link
 [basic.lookup.unqual]: basic.md#basic.lookup.unqual
 [basic.stc]: basic.md#basic.stc
 [character.seq]: library.md#character.seq
 [conv.mem]: expr.md#conv.mem
 [conv.ptr]: expr.md#conv.ptr
 [cpp]: cpp.md#cpp
 [cpp.cond]: cpp.md#cpp.cond
 [cpp.import]: cpp.md#cpp.import
 [cpp.include]: cpp.md#cpp.include
 [cpp.module]: cpp.md#cpp.module
 [cpp.stringize]: cpp.md#cpp.stringize
 [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
 [expr.prim.literal]: expr.md#expr.prim.literal
 [headers]: library.md#headers
 [lex]: #lex
 [lex.bool]: #lex.bool
 [lex.ccon]: #lex.ccon
 [lex.ccon.esc]: #lex.ccon.esc
 [lex.ccon.literal]: #lex.ccon.literal
 [lex.charset]: #lex.charset
 [lex.charset.basic]: #lex.charset.basic
 [lex.charset.literal]: #lex.charset.literal
 [lex.comment]: #lex.comment
 [lex.digraph]: #lex.digraph
@@ -953,50 +965,56 @@ int main() {
 [lex.pptoken]: #lex.pptoken
 [lex.separate]: #lex.separate
 [lex.string]: #lex.string
 [lex.string.concat]: #lex.string.concat
 [lex.string.literal]: #lex.string.literal
 [lex.token]: #lex.token
 [module.import]: module.md#module.import
 [module.unit]: module.md#module.unit
 [over.literal]: over.md#over.literal
 [support.types.layout]: support.md#support.types.layout
 [temp.explicit]: temp.md#temp.explicit
 [temp.names]: temp.md#temp.names
 [^1]: Implementations behave as if these separate phases occur, although
     in practice different phases can be folded together.
-[^2]: A partial preprocessing token would arise from a source file
     ending in the first portion of a multi-character token that requires
     a terminating sequence of characters, such as a *header-name* that
     is missing the closing `"` or `>`. A partial comment would arise
     from a source file ending with an unclosed `/*` comment.
-[^3]:  These include “digraphs” and additional reserved words. The term
     “digraph” (token consisting of two characters) is not perfectly
     descriptive, since one of the alternative *preprocessing-token*s is
     `%:%:` and of course several primary tokens contain two characters.
     Nonetheless, those alternative tokens that aren’t lexical keywords
     are colloquially known as “digraphs”.
-[^4]: Thus the “stringized” values [[cpp.stringize]] of `[` and `<:`
     will be different, maintaining the source spelling, but the tokens
     can otherwise be freely interchanged.
-[^5]: Literals include strings and character and numeric literals.
-[^6]: Thus, a sequence of characters that resembles an escape sequence
-    can result in an error, be interpreted as the character
-    corresponding to the escape sequence, or have a completely different
-    meaning, depending on the implementation.
 [^7]: On systems in which linkers cannot accept extended characters, an
     encoding of the *universal-character-name* can be used in forming
     valid external identifiers. For example, some otherwise unused
     character or sequence of characters can be used to encode the `\u` in
     a *universal-character-name*. Extended characters can produce a
     long external identifier, but C++ does not place a translation limit
     on significant characters for external identifiers.
 [^8]: The term “literal” generally designates, in this document, those
-    tokens that are called “constants” in ISO C.

    'z Z'
 ```
 In an *integer-literal*, the sequence of *binary-digit*s,
 *octal-digit*s, *digit*s, or *hexadecimal-digit*s is interpreted as a
+base N integer as shown in [[lex.icon.base]]; the lexically first digit
+of the sequence of digits is the most significant.
 [*Note 1*: The prefix and any optional separating single quotes are
 ignored when determining the value. — *end note*]
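
The base rule and the note above can be illustrated with a minimal sketch. The constant names are hypothetical and chosen only for this example; each literal spells the value 255 in a different base, and the separating single quotes do not change the value.

```cpp
#include <cassert>

// Hypothetical names; every literal below denotes the value 255.
constexpr int kBinary  = 0b1111'1111;  // base 2, prefix 0b
constexpr int kOctal   = 0377;         // base 8, prefix 0
constexpr int kDecimal = 255;          // base 10, no prefix
constexpr int kHex     = 0xFF;         // base 16, prefix 0x

// The prefix and the separators are ignored when determining the value.
static_assert(kBinary == kOctal && kOctal == kDecimal && kDecimal == kHex,
              "all four spellings denote the same value");
static_assert(1'000'000 == 1000000, "separators do not affect the value");
```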
 **Table: Base of *integer-literal*s** <a id="lex.icon.base">[lex.icon.base]</a>
 |                  |                                           | `std::size_t`                                  |
 | Both `u` or `U`  | `std::size_t`                             | `std::size_t`                                  |
 | and `z` or `Z`   |                                           |                                                |
+Except for *integer-literal*s containing a *size-suffix*, if the value
+of an *integer-literal* cannot be represented by any type in its list
 and an extended integer type [[basic.fundamental]] can represent its
 value, it may have that extended integer type. If all of the types in
 the list for the *integer-literal* are signed, the extended integer type
+is signed. If all of the types in the list for the *integer-literal* are
+unsigned, the extended integer type is unsigned. If the list contains
+both signed and unsigned types, the extended integer type may be signed
+or unsigned. If an *integer-literal* cannot be represented by any of the
+allowed types, the program is ill-formed.
+[*Note 2*: An *integer-literal* with a `z` or `Z` suffix is ill-formed
+if it cannot be represented by `std::size_t`. — *end note*]
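
As a hedged illustration of how a suffix narrows the list of candidate types, the sketch below checks a few literals whose values fit in every listed type. The `uz` check is guarded by the `__cpp_size_t_suffix` feature-test macro, since it assumes a C++23 compiler; the name `kUllIsUnsignedLongLong` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// With a small value, the first type in each list is chosen.
static_assert(std::is_same<decltype(1), int>::value, "no suffix: int");
static_assert(std::is_same<decltype(1u), unsigned int>::value, "u: unsigned int");
static_assert(std::is_same<decltype(1L), long>::value, "l: long");
static_assert(std::is_same<decltype(1ull), unsigned long long>::value,
              "ull: unsigned long long");

#if defined(__cpp_size_t_suffix)
// Both u and z: the only candidate type is std::size_t.
static_assert(std::is_same<decltype(1uz), std::size_t>::value, "uz: std::size_t");
#endif

// Hypothetical constant so that the result can also be inspected at run time.
constexpr bool kUllIsUnsignedLongLong =
    std::is_same<decltype(1ull), unsigned long long>::value;
```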
 ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
 ``` bnf
 character-literal:
     'u8' 'u' 'U' 'L'
 ```
 ``` bnf
 c-char-sequence:
+    c-char c-char-sequenceₒₚₜ
 ```
 ``` bnf
 c-char:
     basic-c-char
     hexadecimal-escape-sequence
 ```
 ``` bnf
 simple-octal-digit-sequence:
+    octal-digit simple-octal-digit-sequenceₒₚₜ
 ```
 ``` bnf
 octal-escape-sequence:
     '\' octal-digit
 ``` bnf
 conditional-escape-sequence-char:
     any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters 'N', 'o', 'u', 'U', or 'x'
 ```
+A *multicharacter literal* is a *character-literal* whose
+*c-char-sequence* consists of more than one *c-char*. A multicharacter
+literal shall not have an *encoding-prefix*. If a multicharacter literal
+contains a *c-char* that is not encodable as a single code unit in the
+ordinary literal encoding, the program is ill-formed. Multicharacter
+literals are conditionally-supported.
 The kind of a *character-literal*, its type, and its associated
 character encoding [[lex.charset]] are determined by its
 *encoding-prefix* and its *c-char-sequence* as defined by
+[[lex.ccon.literal]].
 **Table: Character literals** <a id="lex.ccon.literal">[lex.ccon.literal]</a>
+| Encoding prefix | Kind                       | Type       | Associated character encoding   | Example |
+| --------------- | -------------------------- | ---------- | ------------------------------- | ------- |
+| none | ordinary character literal | `char`     | ordinary literal                | `'v'`   |
 | `L`             | wide character literal     | `wchar_t`  | wide literal                    | `L'w'`  |
 |                 |                            |            | encoding                        |         |
 | `u8`            | UTF-8 character literal    | `char8_t`  | UTF-8                           | `u8'x'` |
 | `u`             | UTF-16 character literal   | `char16_t` | UTF-16                          | `u'y'`  |
 | `U`             | UTF-32 character literal   | `char32_t` | UTF-32                          | `U'z'`  |
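
The type column of the table can be checked directly with `decltype`. This is a minimal sketch; the `char8_t` line is guarded by `__cpp_char8_t` because it assumes C++20 or later, and `kZ` is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same<decltype('v'), char>::value, "ordinary character literal");
static_assert(std::is_same<decltype(L'w'), wchar_t>::value, "wide character literal");
static_assert(std::is_same<decltype(u'y'), char16_t>::value, "UTF-16 character literal");
static_assert(std::is_same<decltype(U'z'), char32_t>::value, "UTF-32 character literal");

#if defined(__cpp_char8_t)
static_assert(std::is_same<decltype(u8'x'), char8_t>::value, "UTF-8 character literal");
#endif

// The value is the code unit in the associated encoding; for UTF-32, 'z' is U+007A.
constexpr char32_t kZ = U'z';
```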
 In translation phase 4, the value of a *character-literal* is determined
 using the range of representable values of the *character-literal*’s
+type in translation phase 7. A multicharacter literal has an
+*implementation-defined* value. The value of any other kind of
+*character-literal* is determined as follows:
 - A *character-literal* with a *c-char-sequence* consisting of a single
   *basic-c-char*, *simple-escape-sequence*, or
   *universal-character-name* is the code unit value of the specified
   character as encoded in the literal’s associated character encoding.
+  If the specified character lacks representation in the literal’s
+  associated character encoding or if it cannot be encoded as a single
+  code unit, then the program is ill-formed.
 - A *character-literal* with a *c-char-sequence* consisting of a single
   *numeric-escape-sequence* has a value as follows:
   - Let v be the integer value represented by the octal number
     comprising the sequence of *octal-digit*s in an
     *octal-escape-sequence* or by the hexadecimal number comprising the
     or `L`, and v does not exceed the range of representable values of
     the corresponding unsigned type for the underlying type of the
     *character-literal*’s type, then the value is the unique value of
     the *character-literal*’s type `T` that is congruent to v modulo 2ᴺ,
     where N is the width of `T`.
+  - Otherwise, the program is ill-formed.
 - A *character-literal* with a *c-char-sequence* consisting of a single
   *conditional-escape-sequence* is conditionally-supported and has an
   *implementation-defined* value.
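
The numeric-escape rule in the list above can be sketched as follows. The example assumes nothing about the ordinary literal encoding, because a *numeric-escape-sequence* specifies the code unit value directly; `kAllOnes` is a hypothetical name.

```cpp
#include <cassert>

// A numeric-escape-sequence gives the code unit value v directly.
static_assert('\x41' == 0x41, "hexadecimal escape");
static_assert('\101' == 0101, "octal escape with the same value, 65");
static_assert('\0' == 0, "null character");

// 0xFF fits in unsigned char, so '\xFF' is well-formed for an unprefixed literal.
// Its char value is congruent to 255 modulo 2^N, which is -1 where char is a
// signed 8-bit type. Converting to unsigned char recovers 255 either way.
constexpr unsigned char kAllOnes = static_cast<unsigned char>('\xFF');
```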
 The character specified by a *simple-escape-sequence* is specified in
 [[lex.ccon.esc]].
+[*Note 1*: Using an escape sequence for a question mark is supported
+for compatibility with C++14 and C. — *end note*]
 **Table: Simple escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
 | character |                      | *simple-escape-sequence* |
 | --------- | -------------------- | ------------------------ |
     encoding-prefixₒₚₜ 'R' raw-string
 ```
 ``` bnf
 s-char-sequence:
+    s-char s-char-sequenceₒₚₜ
 ```
 ``` bnf
 s-char:
     basic-s-char
     '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
 ```
 ``` bnf
 r-char-sequence:
+    r-char r-char-sequenceₒₚₜ
 ```
 ``` bnf
 r-char:
     any member of the translation character set, except a U+0029 (right parenthesis) followed by
        the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
 ```
 ``` bnf
 d-char-sequence:
+    d-char d-char-sequenceₒₚₜ
 ```
 ``` bnf
 d-char:
     any member of the basic character set except:
 ```
 The kind of a *string-literal*, its type, and its associated character
 encoding [[lex.charset]] are determined by its encoding prefix and
 sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
+where n is the number of encoded code units that would result from an
+evaluation of the *string-literal* (see below).
 **Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
+| Encoding prefix   | Kind                    | Type                          | Associated character encoding | Examples                                       |
+| ----------------- | ----------------------- | ----------------------------- | ----------------------------- | ---------------------------------------------- |
 | none              | ordinary string literal | array of $n$ `const char`     | ordinary literal encoding     | `"ordinary string"` `R"(ordinary raw string)"` |
 | `L`               | wide string literal     | array of $n$ `const wchar_t`  | wide literal encoding         | `L"wide string"` `LR"w(wide raw string)w"`     |
 | `u8`              | UTF-8 string literal    | array of $n$ `const char8_t`  | UTF-8                         | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
 | `u`               | UTF-16 string literal   | array of $n$ `const char16_t` | UTF-16                        | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
 | `U`               | UTF-32 string literal   | array of $n$ `const char32_t` | UTF-32                        | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
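
As a small sketch of the "array of n" types in the table: a *string-literal* is an lvalue, so `decltype` yields a reference to the array type, and n counts the terminating null code unit. The name `kEmptySize` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// "abc" encodes to three code units plus the terminating null, so n == 4.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "array of 4 const char");
static_assert(std::is_same<decltype(L"abc"), const wchar_t (&)[4]>::value,
              "array of 4 const wchar_t");
static_assert(std::is_same<decltype(u"abc"), const char16_t (&)[4]>::value,
              "array of 4 const char16_t");
static_assert(std::is_same<decltype(U"abc"), const char32_t (&)[4]>::value,
              "array of 4 const char32_t");

// Even the empty string literal holds one code unit, the terminating null.
constexpr std::size_t kEmptySize = sizeof("");
```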
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 at most 16 characters.
+[*Note 1*: The characters `'('` and `')'` can appear in a *raw-string*.
+Thus, `R"delimiter((a|b))delimiter"` is equivalent to
 `"(a|b)"`. — *end note*]
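
The equivalence stated in the note, together with the escaped-quote example given in this subclause, can be checked at run time. This is only a sketch, and the variable names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// The delimiter lets parentheses appear unescaped inside the raw string.
const char* const kRaw   = R"delimiter((a|b))delimiter";
const char* const kPlain = "(a|b)";

// In a raw string, backslashes and quotation marks are taken literally.
const char* const kRawQuoted = R"(x = "\"y\"")";
const char* const kEscaped   = "x = \"\\\"y\\\"\"";
```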
 [*Note 2*:
 A source-file new-line in a raw string literal results in a new-line in
 is equivalent to `"x = \"\\\"y\\\"\""`.
 — *end example*]
 Ordinary string literals and UTF-8 string literals are also referred to
+as *narrow string literals*.
+The *string-literal*s in any sequence of adjacent *string-literal*s
+shall have at most one unique *encoding-prefix* among them. The common
+*encoding-prefix* of the sequence is that *encoding-prefix*, if any.
 [*Note 3*: A *string-literal*’s rawness has no effect on the
 determination of the common *encoding-prefix*. — *end note*]
 In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
 | `u"a"`                     | `"b"` | `u"ab"`                    | `U"a"` | `"b"`                      | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
 | `"a"`                      | `u"b"` | `u"ab"`                    | `"a"` | `U"b"`                     | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
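
The rows above can be reproduced with a short sketch. A single prefixed literal determines the common *encoding-prefix* of the whole sequence, and rawness plays no part; `kSecond` is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

// One prefixed literal gives the whole concatenation its encoding-prefix.
static_assert(std::is_same<decltype(u"a" "b"), const char16_t (&)[3]>::value,
              "same as u\"ab\"");
static_assert(std::is_same<decltype("a" U"b"), const char32_t (&)[3]>::value,
              "same as U\"ab\"");
static_assert(std::is_same<decltype(L"a" "b"), const wchar_t (&)[3]>::value,
              "same as L\"ab\"");

// Rawness has no effect on the result of concatenation.
static_assert(sizeof(R"(a)" "b") == sizeof("ab"), "raw and non-raw literals mix freely");

// Two different prefixes in one sequence, such as u"a" followed by U"b",
// are ill-formed, so that case is described here only in a comment.
constexpr char16_t kSecond = (u"a" "b")[1];
```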
 Evaluating a *string-literal* results in a string literal object with
+static storage duration [[basic.stc]].
+[*Note 4*: String literal objects are potentially non-unique
+[[intro.object]]. Whether successive evaluations of a *string-literal*
+yield the same or a different object is unspecified. — *end note*]
+[*Note 5*:  The effect of attempting to modify a string literal object
 is undefined. — *end note*]
 String literal objects are initialized with the sequence of code unit
 values corresponding to the *string-literal*’s sequence of *s-char*s
 (originally from non-raw string literals) and *r-char*s (originally from
 - The sequence of characters denoted by each contiguous sequence of
   *basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
   and *universal-character-name*s [[lex.charset]] is encoded to a code
   unit sequence using the *string-literal*’s associated character
   encoding. If a character lacks representation in the associated
+  character encoding, then the program is ill-formed. \[*Note 6*: No
+ character lacks representation in any Unicode encoding
+ form. — *end note*] When encoding a stateful character encoding,
+ implementations should encode the first such sequence beginning with
+ the initial encoding state and encode subsequent sequences beginning
+  with the final encoding state of the prior sequence. \[*Note 7*: The
+ encoded code unit sequence can differ from the sequence of code units
+ that would be obtained by encoding each character
+ independently. — *end note*]
 - Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
   unit with a value as follows:
   - Let v be the integer value represented by the octal number
     comprising the sequence of *octal-digit*s in an
     *octal-escape-sequence* or by the hexadecimal number comprising the
     `L`, and v does not exceed the range of representable values of the
     corresponding unsigned type for the underlying type of the
     *string-literal*’s array element type, then the value is the unique
     value of the *string-literal*’s array element type `T` that is
     congruent to v modulo 2ᴺ, where N is the width of `T`.
+  - Otherwise, the program is ill-formed.
   When encoding a stateful character encoding, these sequences should
   have no effect on encoding state.
 - Each *conditional-escape-sequence* [[lex.ccon]] contributes an
   *implementation-defined* code unit sequence. When encoding a stateful
   character encoding, it is *implementation-defined* what effect these
   sequences have on encoding state.
+### Unevaluated strings <a id="lex.string.uneval">[[lex.string.uneval]]</a>
+``` bnf
+unevaluated-string:
+    string-literal
+```
+An *unevaluated-string* shall have no *encoding-prefix*.
+Each *universal-character-name* and each *simple-escape-sequence* in an
+*unevaluated-string* is replaced by the member of the translation
+character set it denotes. An *unevaluated-string* that contains a
+*numeric-escape-sequence* or a *conditional-escape-sequence* is
+ill-formed.
+An *unevaluated-string* is never evaluated and its interpretation
+depends on the context in which it appears.
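
For orientation, the sketch below shows three places where, to my understanding, the working draft uses an *unevaluated-string*: an attribute argument, a `static_assert` message, and a linkage specification. All function names are hypothetical, and each string has no *encoding-prefix* and no numeric escapes, so it satisfies the restrictions above.

```cpp
#include <cassert>

// Attribute argument: the text is only used for diagnostics.
[[deprecated("use new_api() instead")]] int old_api();

// static_assert message: never evaluated as a string literal object.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// Linkage specification: the string names a language linkage.
extern "C" int c_linkage_function(int);

// Hypothetical replacement function, defined so the example has something to call.
int new_api() { return 42; }
```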
 ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
 ``` bnf
 boolean-literal:
+    false
+    true
 ```
 The Boolean literals are the keywords `false` and `true`. Such literals
 have type `bool`.
 ### Pointer literals <a id="lex.nullptr">[[lex.nullptr]]</a>
 ``` bnf
 pointer-literal:
+    nullptr
 ```
 The pointer literal is the keyword `nullptr`. It has type
 `std::nullptr_t`.
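
The types of these keyword literals can be confirmed with a minimal sketch (`kNoObject` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

static_assert(std::is_same<decltype(true), bool>::value, "true has type bool");
static_assert(std::is_same<decltype(false), bool>::value, "false has type bool");
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr has type std::nullptr_t");

// nullptr converts to a null pointer value of any pointer type.
constexpr int* kNoObject = nullptr;
```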
 basic character set. — *end note*]
 If *L* is a *user-defined-string-literal*, let *str* be the literal
 without its *ud-suffix* and let *len* be the number of code units in
 *str* (i.e., its length excluding the terminating null character). If
+*S* contains a literal operator template with a constant template
 parameter for which *str* is a well-formed *template-argument*, the
 literal *L* is treated as a call of the form
 ``` cpp
 operator ""X<str>()
 [basic.fundamental]: basic.md#basic.fundamental
 [basic.link]: basic.md#basic.link
 [basic.lookup.unqual]: basic.md#basic.lookup.unqual
 [basic.stc]: basic.md#basic.stc
 [character.seq]: library.md#character.seq
+[class.mem.general]: class.md#class.mem.general
 [conv.mem]: expr.md#conv.mem
 [conv.ptr]: expr.md#conv.ptr
 [cpp]: cpp.md#cpp
 [cpp.cond]: cpp.md#cpp.cond
+[cpp.embed]: cpp.md#cpp.embed
 [cpp.import]: cpp.md#cpp.import
 [cpp.include]: cpp.md#cpp.include
 [cpp.module]: cpp.md#cpp.module
+[cpp.pragma]: cpp.md#cpp.pragma
+[cpp.pragma.op]: cpp.md#cpp.pragma.op
+[cpp.pre]: cpp.md#cpp.pre
+[cpp.predefined]: cpp.md#cpp.predefined
+[cpp.replace]: cpp.md#cpp.replace
 [cpp.stringize]: cpp.md#cpp.stringize
 [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
+[dcl.pre]: dcl.md#dcl.pre
+[expr.const]: expr.md#expr.const
 [expr.prim.literal]: expr.md#expr.prim.literal
 [headers]: library.md#headers
+[intro.object]: basic.md#intro.object
 [lex]: #lex
 [lex.bool]: #lex.bool
 [lex.ccon]: #lex.ccon
 [lex.ccon.esc]: #lex.ccon.esc
 [lex.ccon.literal]: #lex.ccon.literal
+[lex.char]: #lex.char
 [lex.charset]: #lex.charset
 [lex.charset.basic]: #lex.charset.basic
 [lex.charset.literal]: #lex.charset.literal
 [lex.comment]: #lex.comment
 [lex.digraph]: #lex.digraph
 [lex.pptoken]: #lex.pptoken
 [lex.separate]: #lex.separate
 [lex.string]: #lex.string
 [lex.string.concat]: #lex.string.concat
 [lex.string.literal]: #lex.string.literal
+[lex.string.uneval]: #lex.string.uneval
 [lex.token]: #lex.token
+[lex.universal.char]: #lex.universal.char
 [module.import]: module.md#module.import
+[module.reach]: module.md#module.reach
 [module.unit]: module.md#module.unit
 [over.literal]: over.md#over.literal
 [support.types.layout]: support.md#support.types.layout
 [temp.explicit]: temp.md#temp.explicit
+[temp.inst]: temp.md#temp.inst
 [temp.names]: temp.md#temp.names
+[temp.point]: temp.md#temp.point
+[uaxid]: uax31.md#uaxid
 [^1]: Implementations behave as if these separate phases occur, although
     in practice different phases can be folded together.
+[^2]: Unicode® is a registered trademark of Unicode, Inc. This
+    information is given for the convenience of users of this document
+    and does not constitute an endorsement by ISO or IEC of this
+    product.
+[^3]: A partial preprocessing token would arise from a source file
     ending in the first portion of a multi-character token that requires
     a terminating sequence of characters, such as a *header-name* that
     is missing the closing `"` or `>`. A partial comment would arise
     from a source file ending with an unclosed `/*` comment.
+[^4]:  These include “digraphs” and additional reserved words. The term
     “digraph” (token consisting of two characters) is not perfectly
     descriptive, since one of the alternative *preprocessing-token*s is
     `%:%:` and of course several primary tokens contain two characters.
     Nonetheless, those alternative tokens that aren’t lexical keywords
     are colloquially known as “digraphs”.
+[^5]: Thus the “stringized” values [[cpp.stringize]] of `[` and `<:`
     will be different, maintaining the source spelling, but the tokens
     can otherwise be freely interchanged.
+[^6]: Literals include strings and character and numeric literals.
 [^7]: On systems in which linkers cannot accept extended characters, an
     encoding of the *universal-character-name* can be used in forming
     valid external identifiers. For example, some otherwise unused
     character or sequence of characters can be used to encode the `\u` in
     a *universal-character-name*. Extended characters can produce a
     long external identifier, but C++ does not place a translation limit
     on significant characters for external identifiers.
 [^8]: The term “literal” generally designates, in this document, those
+    tokens that are called “constants” in C.
