[lex.literal] - C++14 → C++17

tmp/tmpyqm3pry3/{from.md → to.md} +293 -181

@@ -44,13 +44,11 @@ decimal-literal:
     decimal-literal '''ₒₚₜ digit
 ```
 ``` bnf
 hexadecimal-literal:
-    '0x' hexadecimal-digit
-    '0X' hexadecimal-digit
-    hexadecimal-literal '''ₒₚₜ hexadecimal-digit
 ```
 ``` bnf
 binary-digit:
     '0'
@@ -65,10 +63,21 @@ octal-digit: one of
 ``` bnf
 nonzero-digit: one of
     '1 2 3 4 5 6 7 8 9'
 ```
 ``` bnf
 hexadecimal-digit: one of
     '0 1 2 3 4 5 6 7 8 9'
     'a b c d e f'
     'A B C D E F'
@@ -99,22 +108,25 @@ long-long-suffix: one of
 An *integer literal* is a sequence of digits that has no period or
 exponent part, with optional separating single quotes that are ignored
 when determining its value. An integer literal may have a prefix that
 specifies its base and a suffix that specifies its type. The lexically
-first digit of the sequence of digits is the most significant. A
-*binary* integer literal (base two) begins with `0b` or `0B` and
-consists of a sequence of binary digits. An *octal* integer literal
-(base eight) begins with the digit `0` and consists of a sequence of
-octal digits.[^12] A *decimal* integer literal (base ten) begins with a
-digit other than `0` and consists of a sequence of decimal digits. A
-*hexadecimal* integer literal (base sixteen) begins with `0x` or `0X`
 and consists of a sequence of hexadecimal digits, which include the
 decimal digits and the letters `a` through `f` and `A` through `F` with
-decimal values ten through fifteen. The number twelve can be written
-`12`, `014`, `0XC`, or `0b1100`. The literals `1048576`, `1'048'576`,
-`0X100000`, `0x10'0000`, and `0'004'000'000` all have the same value.
 The type of an integer literal is the first of the corresponding list in
 Table  [[tab:lex.type.integer.literal]] in which its value can be
 represented.
@@ -144,26 +156,28 @@ represented.
 If an integer literal cannot be represented by any type in its list and
 an extended integer type ([[basic.fundamental]]) can represent its
 value, it may have that extended integer type. If all of the types in
-the list for the literal are signed, the extended integer type shall be
-signed. If all of the types in the list for the literal are unsigned,
-the extended integer type shall be unsigned. If the list contains both
-signed and unsigned types, the extended integer type may be signed or
-unsigned. A program is ill-formed if one of its translation units
-contains an integer literal that cannot be represented by any of the
-allowed types.
 ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
 ``` bnf
 character-literal:
-    ''' c-char-sequence '''
-    u''' c-char-sequence '''
-    U''' c-char-sequence '''
-    L''' c-char-sequence '''
 ```
 ``` bnf
 c-char-sequence:
     c-char
@@ -195,46 +209,64 @@ hexadecimal-escape-sequence:
     '\x' hexadecimal-digit
     hexadecimal-escape-sequence hexadecimal-digit
 ```
 A character literal is one or more characters enclosed in single quotes,
-as in `'x'`, optionally preceded by one of the letters `u`, `U`, or `L`,
-as in `u'y'`, `U'z'`, or `L'x'`, respectively. A character literal that
-does not begin with `u`, `U`, or `L` is an ordinary character literal,
-also referred to as a narrow-character literal. An ordinary character
-literal that contains a single *c-char* representable in the execution
-character set has type `char`, with value equal to the numerical value
-of the encoding of the *c-char* in the execution character set. An
-ordinary character literal that contains more than one *c-char* is a
-*multicharacter literal*. A multicharacter literal, or an ordinary
-character literal containing a single *c-char* not representable in the
-execution character set, is conditionally-supported, has type `int`, and
-has an *implementation-defined* value.
-A character literal that begins with the letter `u`, such as `u'y'`, is
 a character literal of type `char16_t`. The value of a `char16_t`
-literal containing a single *c-char* is equal to its ISO 10646 code
-point value, provided that the code point is representable with a single
-16-bit code unit. (That is, provided it is a basic multi-lingual plane
-code point.) If the value is not representable within 16 bits, the
-program is ill-formed. A `char16_t` literal containing multiple
-*c-char*s is ill-formed. A character literal that begins with the letter
-`U`, such as `U'z'`, is a character literal of type `char32_t`. The
-value of a `char32_t` literal containing a single *c-char* is equal to
-its ISO 10646 code point value. A `char32_t` literal containing multiple
-*c-char*s is ill-formed. A character literal that begins with the letter
-`L`, such as `L'x'`, is a wide-character literal. A wide-character
-literal has type `wchar_t`.[^13] The value of a wide-character literal
-containing a single *c-char* has value equal to the numerical value of
-the encoding of the *c-char* in the execution wide-character set, unless
-the *c-char* has no representation in the execution wide-character set,
-in which case the value is *implementation-defined*. The type `wchar_t`
-is able to represent all members of the execution wide-character set
-(see  [[basic.fundamental]]). The value of a wide-character literal
-containing multiple *c-char*s is *implementation-defined*.
-Certain nongraphic characters, the single quote `'`, the double quote
 `"`, the question mark `?`,[^14] and the backslash `\`, can be
 represented according to Table  [[tab:escape.sequences]]. The double
 quote `"` and the question mark `?`, can be represented as themselves or
 by the escape sequences `\"` and `\?` respectively, but the single quote
 `'` and the backslash `\` shall be represented by the escape sequences
@@ -269,45 +301,74 @@ backslash followed by `x` followed by one or more hexadecimal digits
 that are taken to specify the value of the desired character. There is
 no limit to the number of digits in a hexadecimal sequence. A sequence
 of octal or hexadecimal digits is terminated by the first character that
 is not an octal digit or a hexadecimal digit, respectively. The value of
 a character literal is *implementation-defined* if it falls outside of
-the implementation-defined range defined for `char` (for literals with
-no prefix), `char16_t` (for literals prefixed by `'u'`), `char32_t` (for
-literals prefixed by `'U'`), or `wchar_t` (for literals prefixed by
-`'L'`).
-A universal-character-name is translated to the encoding, in the
 appropriate execution character set, of the character named. If there is
-no such encoding, the universal-character-name is translated to an
-*implementation-defined* encoding. In translation phase 1, a
-universal-character-name is introduced whenever an actual extended
-character is encountered in the source text. Therefore, all extended
-characters are described in terms of universal-character-names. However,
-the actual compiler implementation may use its own native character set,
-so long as the same results are obtained.
 ### Floating literals <a id="lex.fcon">[[lex.fcon]]</a>
 ``` bnf
 floating-literal:
     fractional-constant exponent-partₒₚₜ floating-suffixₒₚₜ
     digit-sequence exponent-part floating-suffixₒₚₜ
 ```
 ``` bnf
 fractional-constant:
     digit-sequenceₒₚₜ '.' digit-sequence
     digit-sequence '.'
 ```
 ``` bnf
 exponent-part:
     'e' signₒₚₜ digit-sequence
     'E' signₒₚₜ digit-sequence
 ```
 ``` bnf
 sign: one of
     '+ -'
 ```
@@ -320,46 +381,55 @@ digit-sequence:
 ``` bnf
 floating-suffix: one of
     'f l F L'
 ```
-A floating literal consists of an integer part, a decimal point, a
-fraction part, an `e` or `E`, an optionally signed integer exponent, and
-an optional type suffix. The integer and fraction parts both consist of
-a sequence of decimal (base ten) digits. Optional separating single
-quotes in a *digit-sequence* are ignored when determining its value. The
-literals `1.602'176'565e-19` and `1.602176565e-19` have the same value.
-Either the integer part or the fraction part (not both) can be omitted;
-either the decimal point or the letter `e` (or `E` ) and the exponent
-(not both) can be omitted. The integer part, the optional decimal point
-and the optional fraction part form the *significant part* of the
-floating literal. The exponent, if present, indicates the power of 10 by
-which the significant part is to be scaled. If the scaled value is in
-the range of representable values for its type, the result is the scaled
-value if representable, else the larger or smaller representable value
-nearest the scaled value, chosen in an *implementation-defined* manner.
-The type of a floating literal is `double` unless explicitly specified
-by a suffix. The suffixes `f` and `F` specify `float`, the suffixes `l`
-and `L` specify `long` `double`. If the scaled value is not in the range
-of representable values for its type, the program is ill-formed.
 ### String literals <a id="lex.string">[[lex.string]]</a>
 ``` bnf
 string-literal:
     encoding-prefixₒₚₜ '"' s-char-sequenceₒₚₜ '"'
     encoding-prefixₒₚₜ 'R' raw-string
 ```
-``` bnf
-encoding-prefix:
-  'u8'
-  'u'
-  'U'
-  'L'
-```
 ``` bnf
 s-char-sequence:
     s-char
     s-char-sequence s-char
 ```
@@ -379,36 +449,43 @@ r-char-sequence:
 d-char-sequence:
     d-char
     d-char-sequence d-char
 ```
-A string literal is a sequence of characters (as defined in
 [[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
 `u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
 `R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
 `U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
-A string literal that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 at most 16 characters.
-The characters `'('` and `')'` are permitted in a *raw-string*. Thus,
-`R"delimiter((a|b))delimiter"` is equivalent to `"(a|b)"`.
 A source-file new-line in a raw string literal results in a new-line in
-the resulting execution *string-literal*. Assuming no whitespace at the
 beginning of lines in the following example, the assert will succeed:
 ``` cpp
 const char* p = R"(a\
 b
 c)";
 assert(std::strcmp(p, "a\\\nb\nc") == 0);
 ```
 The raw string
 ``` cpp
 R"a(
 )\
@@ -430,62 +507,63 @@ R"#(
 )#"
 ```
 is equivalent to `"\n)\?\?=\"\n"`.
-After translation phase 6, a string literal that does not begin with an
-*encoding-prefix* is an ordinary string literal, and is initialized with
-the given characters.
-A string literal that begins with `u8`, such as `u8"asdf"`, is a UTF-8
-string literal.
 Ordinary string literals and UTF-8 string literals are also referred to
 as narrow string literals. A narrow string literal has type “array of
 *n* `const char`”, where *n* is the size of the string as defined below,
 and has static storage duration ([[basic.stc]]).
 For a UTF-8 string literal, each successive element of the object
 representation ([[basic.types]]) has the value of the corresponding
 code unit of the UTF-8 encoding of the string.
-A string literal that begins with `u`, such as `u"asdf"`, is a
 `char16_t` string literal. A `char16_t` string literal has type “array
 of *n* `const char16_t`”, where *n* is the size of the string as defined
-below; it has static storage duration and is initialized with the given
-characters. A single *c-char* may produce more than one `char16_t`
-character in the form of surrogate pairs.
-A string literal that begins with `U`, such as `U"asdf"`, is a
 `char32_t` string literal. A `char32_t` string literal has type “array
 of *n* `const char32_t`”, where *n* is the size of the string as defined
-below; it has static storage duration and is initialized with the given
-characters.
-A string literal that begins with `L`, such as `L"asdf"`, is a wide
-string literal. A wide string literal has type “array of *n* `const
-wchar_t`”, where *n* is the size of the string as defined below; it has
-static storage duration and is initialized with the given characters.
-Whether all string literals are distinct (that is, are stored in
-nonoverlapping objects) is *implementation-defined*. The effect of
-attempting to modify a string literal is undefined.
-In translation phase 6 ([[lex.phases]]), adjacent string literals are
-concatenated. If both string literals have the same *encoding-prefix*,
 the resulting concatenated string literal has that *encoding-prefix*. If
-one string literal has no *encoding-prefix*, it is treated as a string
-literal of the same *encoding-prefix* as the other operand. If a UTF-8
-string literal token is adjacent to a wide string literal token, the
-program is ill-formed. Any other concatenations are
-conditionally-supported with *implementation-defined* behavior. This
-concatenation is an interpretation, not a conversion. Because the
-interpretation happens in translation phase 6 (after each character from
-a literal has been translated into a value from the appropriate
-character set), a string literal’s initial rawness has no effect on the
-interpretation or well-formedness of the concatenation. Table
-[[tab:lex.string.concat]] has some examples of valid concatenations.
 **Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
 |                            |       |                            |       |                            |       |
 | -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
@@ -495,41 +573,59 @@ interpretation or well-formedness of the concatenation. Table
 | `"a"`                      | `u"b"` | `u"ab"`                    | `"a"` | `U"b"`                     | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
 Characters in concatenated strings are kept distinct.
 ``` cpp
 "\xA" "B"
 ```
 contains the two characters `'\xA'` and `'B'` after concatenation (and
 not the single hexadecimal character `'\xAB'`).
 After any necessary concatenation, in translation phase 7 (
 [[lex.phases]]), `'\0'` is appended to every string literal so that
 programs that scan a string can find its end.
-Escape sequences and universal-character-names in non-raw string
 literals have the same meaning as in character literals ([[lex.ccon]]),
 except that the single quote `'` is representable either by itself or by
 the escape sequence `\'`, and the double quote `"` shall be preceded by
-a `\`. In a narrow string literal, a universal-character-name may map to
-more than one `char` element due to *multibyte encoding*. The size of a
-`char32_t` or wide string literal is the total number of escape
-sequences, universal-character-names, and other characters, plus one for
-the terminating `U'\0'` or `L'\0'`. The size of a `char16_t` string
-literal is the total number of escape sequences,
-universal-character-names, and other characters, plus one for each
-character requiring a surrogate pair, plus one for the terminating
-`u'\0'`. The size of a `char16_t` string literal is the number of code
-units, not the number of characters. Within `char32_t` and `char16_t`
-literals, any universal-character-names shall be within the range `0x0`
-to `0x10FFFF`. The size of a narrow string literal is the total number
-of escape sequences and other characters, plus at least one for the
-multibyte encoding of each universal-character-name, plus one for the
 terminating `'\0'`.
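The size rules above can be checked at compile time. The following is a minimal sketch, not part of the standard text; the helper name `raw_matches_escaped` is hypothetical, and the code point U+1F600 is chosen only because it lies outside the basic multilingual plane and therefore needs a surrogate pair in a `char16_t` string.

```cpp
#include <cassert>
#include <cstring>

// The size of a string literal counts the terminating null character.
static_assert(sizeof("abc") == 4, "three characters plus '\\0'");
static_assert(sizeof(u"abc") / sizeof(char16_t) == 4, "three code units plus u'\\0'");

// A char16_t string counts code units: U+1F600 needs a surrogate pair.
static_assert(sizeof(u"\U0001F600") / sizeof(char16_t) == 3, "two code units plus u'\\0'");
static_assert(sizeof(U"\U0001F600") / sizeof(char32_t) == 2, "one code unit plus U'\\0'");

// Characters in concatenated strings are kept distinct: '\xA', 'B', '\0'.
static_assert(sizeof("\xA" "B") == 3, "not the single character '\\xAB'");

// Hypothetical helper: parentheses are permitted inside a raw string.
inline bool raw_matches_escaped() {
    return std::strcmp(R"delimiter((a|b))delimiter", "(a|b)") == 0;
}
```

Using `sizeof` on the literal itself works because a string literal is an array, so no decay to a pointer takes place.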
 ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
 ``` bnf
 boolean-literal:
     'false'
@@ -545,14 +641,17 @@ are prvalues and have type `bool`.
 pointer-literal:
     'nullptr'
 ```
 The pointer literal is the keyword `nullptr`. It is a prvalue of type
-`std::nullptr_t`. `std::nullptr_t` is a distinct type that is neither a
 pointer type nor a pointer to member type; rather, a prvalue of this
 type is a null pointer constant and can be converted to a null pointer
-value or null member pointer value. See  [[conv.ptr]] and  [[conv.mem]].
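As an illustration of the paragraph above, the sketch below checks that `std::nullptr_t` is neither a pointer type nor a pointer to member type, yet converts to both kinds of null value. The struct `Widget` and the function `converts_to_null_values` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr has type std::nullptr_t");
static_assert(!std::is_pointer<std::nullptr_t>::value,
              "not a pointer type");
static_assert(!std::is_member_pointer<std::nullptr_t>::value,
              "not a pointer to member type");

struct Widget { int field; };  // hypothetical type, used only for the member pointer

inline bool converts_to_null_values() {
    int* p = nullptr;            // null pointer value
    int Widget::* pm = nullptr;  // null member pointer value
    return p == nullptr && pm == nullptr;
}
```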
 ### User-defined literals <a id="lex.ext">[[lex.ext]]</a>
 ``` bnf
 user-defined-literal:
@@ -572,10 +671,12 @@ user-defined-integer-literal:
 ``` bnf
 user-defined-floating-literal:
     fractional-constant exponent-partₒₚₜ ud-suffix
     digit-sequence exponent-part ud-suffix
 ```
 ``` bnf
 user-defined-string-literal:
     string-literal ud-suffix
@@ -589,15 +690,24 @@ user-defined-character-literal:
 ``` bnf
 ud-suffix:
     identifier
 ```
-If a token matches both *user-defined-literal* and another literal kind,
-it is treated as the latter. `123_km` is a *user-defined-literal*, but
-`12LL` is an *integer-literal*. The syntactic non-terminal preceding the
-*ud-suffix* in a *user-defined-literal* is taken to be the longest
-sequence of characters that could match that non-terminal.
 A *user-defined-literal* is treated as a call to a literal operator or
 literal operator template ([[over.literal]]). To determine the form of
 this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
 the *literal-operator-id* whose literal suffix identifier is *X* is
@@ -627,13 +737,14 @@ a call of the form
 ``` cpp
 operator "" X<'c₁', 'c₂', ... 'cₖ'>()
 ```
-where *n* is the source character sequence c₁c₂...cₖ. The sequence
-c₁c₂...cₖ can only contain characters from the basic source character
-set.
 If *L* is a *user-defined-floating-literal*, let *f* be the literal
 without its *ud-suffix*. If *S* contains a literal operator with
 parameter type `long double`, the literal *L* is treated as a call of
 the form
@@ -655,32 +766,35 @@ a call of the form
 ``` cpp
 operator "" X<'c₁', 'c₂', ... 'cₖ'>()
 ```
-where *f* is the source character sequence c₁c₂...cₖ. The sequence
-c₁c₂...cₖ can only contain characters from the basic source character
-set.
 If *L* is a *user-defined-string-literal*, let *str* be the literal
 without its *ud-suffix* and let *len* be the number of code units in
 *str* (i.e., its length excluding the terminating null character). The
 literal *L* is treated as a call of the form
 ``` cpp
-operator "" X(str, len)
 ```
 If *L* is a *user-defined-character-literal*, let *ch* be the literal
 without its *ud-suffix*. *S* shall contain a literal operator (
 [[over.literal]]) whose only parameter has the type of *ch* and the
 literal *L* is treated as a call of the form
 ``` cpp
-operator "" X(ch)
 ```
 ``` cpp
 long double operator "" _w(long double);
 std::string operator "" _w(const char16_t*, std::size_t);
 unsigned operator "" _w(const char*);
 int main() {
@@ -689,48 +803,47 @@ int main() {
   12_w;       // calls operator "" _w("12")
   "two"_w;    // error: no applicable literal operator
 }
 ```
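The declarations in the example above have no bodies, so the selection rules cannot be observed by running it. The sketch below is a runnable variant under stated assumptions: it reuses the suffix `_w`, but every operator returns an `int` tag (an invention of this sketch) that identifies which form of call was chosen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Cooked floating form: receives the value of the literal.
constexpr int operator""_w(long double) { return 1; }

// String form: receives the string and its length in code units.
constexpr int operator""_w(const char16_t*, std::size_t len) {
    return 100 + static_cast<int>(len);
}

// Raw form: receives the spelling of the literal as a null-terminated string.
inline int operator""_w(const char* spelling) {
    return 200 + static_cast<int>(std::strlen(spelling));
}

// Hypothetical helper that sums the three tags.
inline int demo_sum() {
    return 1.2_w        // long double overload: 1
         + u"one"_w     // char16_t string overload, len == 3: 103
         + 12_w;        // no unsigned long long overload, so raw form with "12": 202
}
```

As in the original example, `"two"_w` would be rejected here, because no literal operator takes `(const char*, std::size_t)`.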
 In translation phase 6 ([[lex.phases]]), adjacent string literals are
 concatenated and *user-defined-string-literal*s are considered string
 literals for that purpose. During concatenation, *ud-suffix*es are
 removed and ignored and the concatenation process occurs as described
 in  [[lex.string]]. At the end of phase 6, if a string literal is the
 result of a concatenation involving at least one
 *user-defined-string-literal*, all the participating
 *user-defined-string-literal*s shall have the same *ud-suffix* and that
 suffix is applied to the result of the concatenation.
 ``` cpp
 int main() {
   L"A" "B" "C"_x; // OK: same as L"ABC"_x
   "P"_x "Q" "R"_y; // error: two different ud-suffixes
 }
 ```
-Some *identifier*s appearing as *ud-suffix*es are reserved for future
-standardization ([[usrlit.suffix]]). A program containing such a
-*ud-suffix* is ill-formed, no diagnostic required.
 <!-- Link reference definitions -->
 [basic.fundamental]: basic.md#basic.fundamental
 [basic.link]: basic.md#basic.link
 [basic.lookup.unqual]: basic.md#basic.lookup.unqual
 [basic.stc]: basic.md#basic.stc
 [basic.types]: basic.md#basic.types
-[charname.allowed]: charname.md#charname.allowed
-[charname.disallowed]: charname.md#charname.disallowed
 [conv.mem]: conv.md#conv.mem
 [conv.ptr]: conv.md#conv.ptr
 [cpp]: cpp.md#cpp
 [cpp.concat]: cpp.md#cpp.concat
 [cpp.cond]: cpp.md#cpp.cond
 [cpp.include]: cpp.md#cpp.include
 [cpp.stringize]: cpp.md#cpp.stringize
 [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
-[global.names]: library.md#global.names
 [headers]: library.md#headers
 [lex]: #lex
 [lex.bool]: #lex.bool
 [lex.ccon]: #lex.ccon
 [lex.charset]: #lex.charset
@@ -750,23 +863,22 @@ standardization ([[usrlit.suffix]]). A program containing such a
 [lex.ppnumber]: #lex.ppnumber
 [lex.pptoken]: #lex.pptoken
 [lex.separate]: #lex.separate
 [lex.string]: #lex.string
 [lex.token]: #lex.token
-[lex.trigraph]: #lex.trigraph
 [over.literal]: over.md#over.literal
 [tab:alternative.representations]: #tab:alternative.representations
 [tab:alternative.tokens]: #tab:alternative.tokens
 [tab:escape.sequences]: #tab:escape.sequences
 [tab:identifiers.special]: #tab:identifiers.special
 [tab:keywords]: #tab:keywords
 [tab:lex.string.concat]: #tab:lex.string.concat
 [tab:lex.type.integer.literal]: #tab:lex.type.integer.literal
-[tab:trigraph.sequences]: #tab:trigraph.sequences
 [temp.explicit]: temp.md#temp.explicit
 [temp.names]: temp.md#temp.names
-[usrlit.suffix]: library.md#usrlit.suffix
 [^1]: Implementations must behave as if these separate phases occur,
     although in practice different phases might be folded together.
 [^2]: A partial preprocessing token would arise from a source file
@@ -781,16 +893,16 @@ standardization ([[usrlit.suffix]]). A program containing such a
 [^4]: The glyphs for the members of the basic source character set are
     intended to identify characters from the subset of ISO/IEC 10646
     which corresponds to the ASCII character set. However, because the
     mapping from source file characters to the source character set
     (described in translation phase 1) is specified as
-    implementation-defined, an implementation is required to document
     how the basic source characters are represented in source files.
-[^5]: A sequence of characters resembling a universal-character-name in
-    an *r-char-sequence* ([[lex.string]]) does not form a
-    universal-character-name.
 [^6]:  These include “digraphs” and additional reserved words. The term
     “digraph” (token consisting of two characters) is not perfectly
     descriptive, since one of the alternative preprocessing-tokens is
     `%:%:` and of course several primary tokens contain two characters.
@@ -807,14 +919,14 @@ standardization ([[usrlit.suffix]]). A program containing such a
     might result in an error, be interpreted as the character
     corresponding to the escape sequence, or have a completely different
     meaning, depending on the implementation.
 [^10]: On systems in which linkers cannot accept extended characters, an
-    encoding of the universal-character-name may be used in forming
     valid external identifiers. For example, some otherwise unused
     character or sequence of characters may be used to encode the `\u`
-    in a universal-character-name. Extended characters may produce a
     long external identifier, but C++ does not place a translation limit
     on significant characters for external identifiers. In C++, upper-
     and lower-case letters are considered different for all identifiers,
     including external identifiers.
@@ -824,7 +936,7 @@ standardization ([[usrlit.suffix]]). A program containing such a
 [^12]: The digits `8` and `9` are not octal digits.
 [^13]: They are intended for character sets where a character does not
     fit into a single byte.
-[^14]: Using an escape sequence for a question mark can avoid
-    accidentally creating a trigraph.

     decimal-literal '''ₒₚₜ digit
 ```
 ``` bnf
 hexadecimal-literal:
+    hexadecimal-prefix hexadecimal-digit-sequence
 ```
 ``` bnf
 binary-digit:
     '0'
 ``` bnf
 nonzero-digit: one of
     '1 2 3 4 5 6 7 8 9'
 ```
+``` bnf
+hexadecimal-prefix: one of
+    '0x 0X'
+```
+``` bnf
+hexadecimal-digit-sequence:
+    hexadecimal-digit
+    hexadecimal-digit-sequence '''ₒₚₜ hexadecimal-digit
+```
 ``` bnf
 hexadecimal-digit: one of
     '0 1 2 3 4 5 6 7 8 9'
     'a b c d e f'
     'A B C D E F'
 An *integer literal* is a sequence of digits that has no period or
 exponent part, with optional separating single quotes that are ignored
 when determining its value. An integer literal may have a prefix that
 specifies its base and a suffix that specifies its type. The lexically
+first digit of the sequence of digits is the most significant. A *binary
+integer literal* (base two) begins with `0b` or `0B` and consists of a
+sequence of binary digits. An *octal integer literal* (base eight)
+begins with the digit `0` and consists of a sequence of octal
+digits.[^12] A *decimal integer literal* (base ten) begins with a digit
+other than `0` and consists of a sequence of decimal digits. A
+*hexadecimal integer literal* (base sixteen) begins with `0x` or `0X`
 and consists of a sequence of hexadecimal digits, which include the
 decimal digits and the letters `a` through `f` and `A` through `F` with
+decimal values ten through fifteen.
+[*Example 1*: The number twelve can be written `12`, `014`, `0XC`, or
+`0b1100`. The integer literals `1048576`, `1'048'576`, `0X100000`,
+`0x10'0000`, and `0'004'000'000` all have the same
+value. — *end example*]
 The type of an integer literal is the first of the corresponding list in
 Table  [[tab:lex.type.integer.literal]] in which its value can be
 represented.
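The claims in the example can be verified by the compiler. This is a minimal sketch, assuming a C++14 or later compiler (binary literals and digit separators); the function name `twelve_sum` is hypothetical. Only the three type checks that hold on every implementation are shown, since the type of a larger literal depends on the widths of the integer types.

```cpp
#include <cassert>
#include <type_traits>

// Twelve, written in decimal, octal, hexadecimal, and binary.
static_assert(12 == 014 && 014 == 0XC && 0XC == 0b1100,
              "same value in every base");

// Separating single quotes are ignored when the value is determined.
static_assert(1048576 == 1'048'576 && 0X100000 == 0x10'0000 &&
              0'004'000'000 == 1048576,
              "separators do not change the value");

// The suffix selects the list of candidate types.
static_assert(std::is_same<decltype(1), int>::value, "no suffix, fits in int");
static_assert(std::is_same<decltype(1u), unsigned int>::value, "u suffix");
static_assert(std::is_same<decltype(1LL), long long>::value, "LL suffix");

// Hypothetical helper: four spellings of twelve added together.
constexpr int twelve_sum() { return 12 + 014 + 0XC + 0b1100; }
```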
 If an integer literal cannot be represented by any type in its list and
 an extended integer type ([[basic.fundamental]]) can represent its
 value, it may have that extended integer type. If all of the types in
+the list for the integer literal are signed, the extended integer type
+shall be signed. If all of the types in the list for the integer literal
+are unsigned, the extended integer type shall be unsigned. If the list
+contains both signed and unsigned types, the extended integer type may
+be signed or unsigned. A program is ill-formed if one of its translation
+units contains an integer literal that cannot be represented by any of
+the allowed types.
 ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
 ``` bnf
 character-literal:
+    encoding-prefixₒₚₜ ''' c-char-sequence '''
+```
+``` bnf
+encoding-prefix: one of
+    'u8' 'u' 'U' 'L'
 ```
 ``` bnf
 c-char-sequence:
     c-char
     '\x' hexadecimal-digit
     hexadecimal-escape-sequence hexadecimal-digit
 ```
 A character literal is one or more characters enclosed in single quotes,
+as in `'x'`, optionally preceded by `u8`, `u`, `U`, or `L`, as in
+`u8'w'`, `u'x'`, `U'y'`, or `L'z'`, respectively.
+A character literal that does not begin with `u8`, `u`, `U`, or `L` is
+an *ordinary character literal*. An ordinary character literal that
+contains a single *c-char* representable in the execution character set
+has type `char`, with value equal to the numerical value of the encoding
+of the *c-char* in the execution character set. An ordinary character
+literal that contains more than one *c-char* is a *multicharacter
+literal*. A multicharacter literal, or an ordinary character literal
+containing a single *c-char* not representable in the execution
+character set, is conditionally-supported, has type `int`, and has an
+*implementation-defined* value.
+A character literal that begins with `u8`, such as `u8'w'`, is a
+character literal of type `char`, known as a *UTF-8 character literal*.
+The value of a UTF-8 character literal is equal to its ISO 10646 code
+point value, provided that the code point value is representable with a
+single UTF-8 code unit (that is, provided it is in the C0 Controls and
+Basic Latin Unicode block). If the value is not representable with a
+single UTF-8 code unit, the program is ill-formed. A UTF-8 character
+literal containing multiple *c-char*s is ill-formed.
+A character literal that begins with the letter `u`, such as `u'x'`, is
 a character literal of type `char16_t`. The value of a `char16_t`
+character literal containing a single *c-char* is equal to its ISO 10646
+code point value, provided that the code point is representable with a
+single 16-bit code unit. (That is, provided it is a basic multi-lingual
+plane code point.) If the value is not representable within 16 bits, the
+program is ill-formed. A `char16_t` character literal containing
+multiple *c-char*s is ill-formed.
+A character literal that begins with the letter `U`, such as `U'y'`, is
+a character literal of type `char32_t`. The value of a `char32_t`
+character literal containing a single *c-char* is equal to its ISO 10646
+code point value. A `char32_t` character literal containing multiple
+*c-char*s is ill-formed.
+A character literal that begins with the letter `L`, such as `L'z'`, is
+a *wide-character literal*. A wide-character literal has type
+`wchar_t`.[^13] The value of a wide-character literal containing a
+single *c-char* has value equal to the numerical value of the encoding
+of the *c-char* in the execution wide-character set, unless the *c-char*
+has no representation in the execution wide-character set, in which case
+the value is *implementation-defined*.
+[*Note 1*: The type `wchar_t` is able to represent all members of the
+execution wide-character set (see
+[[basic.fundamental]]). — *end note*]
+The value of a wide-character literal containing multiple *c-char*s is
+*implementation-defined*.
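The prefix-to-type mapping described above can be sketched as a set of compile-time checks. The names `smile` and `smile_code_point` are hypothetical. The `u8` check is guarded because the prefix is only valid on character literals from C++17 onward, and it tests size and value rather than the exact type.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same<decltype('x'), char>::value, "ordinary character literal");
static_assert(std::is_same<decltype(u'x'), char16_t>::value, "u prefix");
static_assert(std::is_same<decltype(U'y'), char32_t>::value, "U prefix");
static_assert(std::is_same<decltype(L'z'), wchar_t>::value, "L prefix");

// char16_t and char32_t literals take their ISO 10646 code point as value.
static_assert(u'x' == 0x78 && U'y' == 0x79, "code point values");

#if __cplusplus >= 201703L
// A UTF-8 character literal must fit in a single UTF-8 code unit.
static_assert(sizeof(u8'w') == 1 && u8'w' == 0x77, "single code unit");
#endif

// A code point outside the basic multilingual plane fits in char32_t;
// the same character written as u'\U0001F600' would be ill-formed.
constexpr char32_t smile = U'\U0001F600';
inline unsigned long smile_code_point() { return smile; }
```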
+Certain non-graphic characters, the single quote `'`, the double quote
 `"`, the question mark `?`,[^14] and the backslash `\`, can be
 represented according to Table  [[tab:escape.sequences]]. The double
 quote `"` and the question mark `?`, can be represented as themselves or
 by the escape sequences `\"` and `\?` respectively, but the single quote
 `'` and the backslash `\` shall be represented by the escape sequences
 that are taken to specify the value of the desired character. There is
 no limit to the number of digits in a hexadecimal sequence. A sequence
 of octal or hexadecimal digits is terminated by the first character that
 is not an octal digit or a hexadecimal digit, respectively. The value of
 a character literal is *implementation-defined* if it falls outside of
+the *implementation-defined* range defined for `char` (for character
+literals with no prefix) or `wchar_t` (for character literals prefixed
+by `L`).
+[*Note 2*: If the value of a character literal prefixed by `u`, `u8`,
+or `U` is outside the range defined for its type, the program is
+ill-formed. — *end note*]
+A *universal-character-name* is translated to the encoding, in the
 appropriate execution character set, of the character named. If there is
+no such encoding, the *universal-character-name* is translated to an
+*implementation-defined* encoding.
+[*Note 3*: In translation phase 1, a *universal-character-name* is
+introduced whenever an actual extended character is encountered in the
+source text. Therefore, all extended characters are described in terms
+of *universal-character-name*s. However, the actual compiler
+implementation may use its own native character set, so long as the same
+results are obtained. — *end note*]
 ### Floating literals <a id="lex.fcon">[[lex.fcon]]</a>
 ``` bnf
 floating-literal:
+    decimal-floating-literal
+    hexadecimal-floating-literal
+```
+``` bnf
+decimal-floating-literal:
     fractional-constant exponent-partₒₚₜ floating-suffixₒₚₜ
     digit-sequence exponent-part floating-suffixₒₚₜ
 ```
+``` bnf
+hexadecimal-floating-literal:
+    hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-suffixₒₚₜ
+    hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-suffixₒₚₜ
+```
 ``` bnf
 fractional-constant:
     digit-sequenceₒₚₜ '.' digit-sequence
     digit-sequence '.'
 ```
+``` bnf
+hexadecimal-fractional-constant:
+    hexadecimal-digit-sequenceₒₚₜ '.' hexadecimal-digit-sequence
+    hexadecimal-digit-sequence '.'
+```
 ``` bnf
 exponent-part:
     'e' signₒₚₜ digit-sequence
     'E' signₒₚₜ digit-sequence
 ```
+``` bnf
+binary-exponent-part:
+    'p' signₒₚₜ digit-sequence
+    'P' signₒₚₜ digit-sequence
+```
 ``` bnf
 sign: one of
     '+ -'
 ```
 ``` bnf
 floating-suffix: one of
     'f l F L'
 ```
+A floating literal consists of an optional prefix specifying a base, an
+integer part, a radix point, a fraction part, an `e`, `E`, `p` or `P`,
+an optionally signed integer exponent, and an optional type suffix. The
+integer and fraction parts both consist of a sequence of decimal (base
+ten) digits if there is no prefix, or hexadecimal (base sixteen) digits
+if the prefix is `0x` or `0X`. The floating literal is a *decimal
+floating literal* in the former case and a *hexadecimal floating
+literal* in the latter case. Optional separating single quotes in a
+*digit-sequence* or *hexadecimal-digit-sequence* are ignored when
+determining its value.
+[*Example 1*: The floating literals `1.602'176'565e-19` and
+`1.602176565e-19` have the same value. — *end example*]
+Either the integer part or the fraction part (not both) can be omitted.
+Either the radix point or the letter `e` or `E` and the exponent (not
+both) can be omitted from a decimal floating literal. The radix point
+(but not the exponent) can be omitted from a hexadecimal floating
+literal. The integer part, the optional radix point, and the optional
+fraction part, form the *significand* of the floating literal. In a
+decimal floating literal, the exponent, if present, indicates the power
+of 10 by which the significand is to be scaled. In a hexadecimal
+floating literal, the exponent indicates the power of 2 by which the
+significand is to be scaled.
+[*Example 2*: The floating literals `49.625` and `0xC.68p+2` have the
+same value. — *end example*]
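The two examples above can be checked at compile time. The following is a minimal sketch, assuming a C++17 compiler (hexadecimal floating literals are new in this revision) and a binary floating-point `double`; the names `hex_value` and `dec_value` are illustrative only.

```cpp
// Sketch: compile-time check of the equalities stated in Examples 1 and 2.
// Assumes C++17 and a binary floating-point representation for double.

// 0xC.68 is 12 + 6/16 + 8/256 = 12.40625; the exponent p+2 scales it by 2^2.
constexpr double hex_value = 0xC.68p+2;
constexpr double dec_value = 49.625;
static_assert(hex_value == dec_value, "same value, different base");

// Digit separators are ignored when the value is determined.
static_assert(1.602'176'565e-19 == 1.602176565e-19,
              "separators do not change the value");

// The radix point may be omitted from a hexadecimal floating literal,
// but the binary exponent may not: 0x1p4 is 1 * 2^4.
static_assert(0x1p4 == 16.0, "the exponent is a power of 2");
```

Note that the exponent of a hexadecimal floating literal is written in decimal even though the significand is hexadecimal.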
+If the scaled value is in the range of representable values for its
+type, the result is the scaled value if representable, else the larger
+or smaller representable value nearest the scaled value, chosen in an
+*implementation-defined* manner. The type of a floating literal is
+`double` unless explicitly specified by a suffix. The suffixes `f` and
+`F` specify `float`, the suffixes `l` and `L` specify `long double`.
+If the scaled value is not in the range of representable values for its
+type, the program is ill-formed.
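The suffix rules can be observed with `decltype`. This is a small sketch, assuming C++17; the variable names are hypothetical and exist only so the types can be inspected.

```cpp
#include <type_traits>

// Sketch: the type of a floating literal is selected by its suffix.
constexpr auto unsuffixed  = 1.5;       // no suffix: double
constexpr auto with_f      = 1.5f;      // f or F: float
constexpr auto with_l      = 1.5L;      // l or L: long double
constexpr auto hex_with_f  = 0x1.8p0f;  // suffixes also apply to hexadecimal floating literals

static_assert(std::is_same<decltype(1.5), double>::value, "");
static_assert(std::is_same<decltype(1.5F), float>::value, "");
static_assert(std::is_same<decltype(1.5l), long double>::value, "");
static_assert(std::is_same<decltype(0x1.8p0L), long double>::value, "");
```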
 ### String literals <a id="lex.string">[[lex.string]]</a>
 ``` bnf
 string-literal:
     encoding-prefixₒₚₜ '"' s-char-sequenceₒₚₜ '"'
     encoding-prefixₒₚₜ 'R' raw-string
 ```
 ``` bnf
 s-char-sequence:
     s-char
     s-char-sequence s-char
 ```
 ``` bnf
 d-char-sequence:
     d-char
     d-char-sequence d-char
 ```
+A *string-literal* is a sequence of characters (as defined in
 [[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
 `u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
 `R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
 `U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
+A *string-literal* that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 at most 16 characters.
+[*Note 1*: The characters `'('` and `')'` are permitted in a
+*raw-string*. Thus, `R"delimiter((a|b))delimiter"` is equivalent to
+`"(a|b)"`. — *end note*]
+[*Note 2*:
 A source-file new-line in a raw string literal results in a new-line in
+the resulting execution string literal. Assuming no whitespace at the
 beginning of lines in the following example, the assert will succeed:
 ``` cpp
 const char* p = R"(a\
 b
 c)";
 assert(std::strcmp(p, "a\\\nb\nc") == 0);
 ```
+— *end note*]
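The behaviour described in Notes 1 and 2 can be exercised directly. The sketch below is illustrative only; the helper function names are hypothetical.

```cpp
#include <cstring>

// Sketch: parentheses are ordinary characters inside a raw string, as long as
// the closing sequence )delimiter" does not appear in the content.
inline bool raw_matches_escaped() {
    const char* raw   = R"delimiter((a|b))delimiter";
    const char* plain = "(a|b)";
    return std::strcmp(raw, plain) == 0;
}

// Backslashes are not escape sequences inside a raw string.
inline bool raw_keeps_backslashes() {
    return std::strcmp(R"(C:\temp\n)", "C:\\temp\\n") == 0;
}

// A source-file new-line inside a raw string becomes a new-line character.
inline bool raw_keeps_newlines() {
    const char* two_lines = R"(a
b)";
    return std::strcmp(two_lines, "a\nb") == 0;
}
```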
+[*Example 1*:
 The raw string
 ``` cpp
 R"a(
 )\
 )#"
 ```
 is equivalent to `"\n)\?\?=\"\n"`.
+— *end example*]
+After translation phase 6, a *string-literal* that does not begin with
+an *encoding-prefix* is an *ordinary string literal*, and is initialized
+with the given characters.
+A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
+*UTF-8 string literal*.
 Ordinary string literals and UTF-8 string literals are also referred to
 as narrow string literals. A narrow string literal has type “array of
 *n* `const char`”, where *n* is the size of the string as defined below,
 and has static storage duration ([[basic.stc]]).
 For a UTF-8 string literal, each successive element of the object
 representation ([[basic.types]]) has the value of the corresponding
 code unit of the UTF-8 encoding of the string.
+A *string-literal* that begins with `u`, such as `u"asdf"`, is a
 `char16_t` string literal. A `char16_t` string literal has type “array
 of *n* `const char16_t`”, where *n* is the size of the string as defined
+below; it is initialized with the given characters. A single *c-char*
+may produce more than one `char16_t` character in the form of surrogate
+pairs.
+A *string-literal* that begins with `U`, such as `U"asdf"`, is a
 `char32_t` string literal. A `char32_t` string literal has type “array
 of *n* `const char32_t`”, where *n* is the size of the string as defined
+below; it is initialized with the given characters.
+A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
+string literal*. A wide string literal has type “array of *n* `const
+wchar_t`”, where *n* is the size of the string as defined below; it is
+initialized with the given characters.
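The array types listed above can be checked with `decltype`. This is a sketch with a hypothetical helper, `extent_of`; the UTF-8 case is left out on purpose because later revisions of the language change the element type of `u8` literals, so an assertion on it would not be portable across standard versions.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: the array bound n of a string literal.
template <class T, std::size_t N>
constexpr std::size_t extent_of(const T (&)[N]) { return N; }

// A string literal is an lvalue, so decltype yields a reference to the array type.
// Four characters plus the terminating null give n == 5.
static_assert(std::is_same<decltype("asdf"),  const char (&)[5]>::value,     "ordinary");
static_assert(std::is_same<decltype(u"asdf"), const char16_t (&)[5]>::value, "char16_t");
static_assert(std::is_same<decltype(U"asdf"), const char32_t (&)[5]>::value, "char32_t");
static_assert(std::is_same<decltype(L"asdf"), const wchar_t (&)[5]>::value,  "wide");
```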
+In translation phase 6 ([[lex.phases]]), adjacent *string-literal*s are
+concatenated. If both *string-literal*s have the same *encoding-prefix*,
 the resulting concatenated string literal has that *encoding-prefix*. If
+one *string-literal* has no *encoding-prefix*, it is treated as a
+*string-literal* of the same *encoding-prefix* as the other operand. If
+a UTF-8 string literal token is adjacent to a wide string literal token,
+the program is ill-formed. Any other concatenations are
+conditionally-supported with *implementation-defined* behavior.
+[*Note 3*: This concatenation is an interpretation, not a conversion.
+Because the interpretation happens in translation phase 6 (after each
+character from a string literal has been translated into a value from
+the appropriate character set), a *string-literal*’s initial rawness has
+no effect on the interpretation or well-formedness of the
+concatenation. — *end note*]
+Table  [[tab:lex.string.concat]] has some examples of valid
+concatenations.
 **Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
 | Source                     | Means   | Source                     | Means   | Source                     | Means   |
 | -------------------------- | ------- | -------------------------- | ------- | -------------------------- | ------- |
 | `"a" u"b"`                 | `u"ab"` | `"a" U"b"`                 | `U"ab"` | `"a" L"b"`                 | `L"ab"` |
 Characters in concatenated strings are kept distinct.
+[*Example 2*:
 ``` cpp
 "\xA" "B"
 ```
 contains the two characters `'\xA'` and `'B'` after concatenation (and
 not the single hexadecimal character `'\xAB'`).
+— *end example*]
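Both the prefix rule and the "kept distinct" rule can be verified at compile time. The sketch below uses a hypothetical helper, `element_count`, and assumes an implementation on which `char` has at least 8 bits.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements in a string literal, including the null.
template <class T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// An unprefixed literal adopts the encoding-prefix of its neighbour: "a" u"b" means u"ab".
static_assert(element_count("a" u"b") == 3, "two characters plus the terminating null");
static_assert(("a" u"b")[0] == u'a' && ("a" u"b")[1] == u'b', "");

// "\xA" "B" stays two characters; it is not re-read as the single escape \xAB.
static_assert(element_count("\xA" "B") == 3, "two characters plus the terminating null");
static_assert(("\xA" "B")[0] == '\xA' && ("\xA" "B")[1] == 'B', "");
static_assert(element_count("\xAB") == 2, "one character plus the terminating null");
```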
 After any necessary concatenation, in translation phase 7 (
 [[lex.phases]]), `'\0'` is appended to every string literal so that
 programs that scan a string can find its end.
+Escape sequences and *universal-character-name*s in non-raw string
 literals have the same meaning as in character literals ([[lex.ccon]]),
 except that the single quote `'` is representable either by itself or by
 the escape sequence `\'`, and the double quote `"` shall be preceded by
+a `\`, and except that a *universal-character-name* in a `char16_t`
+string literal may yield a surrogate pair. In a narrow string literal, a
+*universal-character-name* may map to more than one `char` element due
+to *multibyte encoding*. The size of a `char32_t` or wide string literal
+is the total number of escape sequences, *universal-character-name*s,
+and other characters, plus one for the terminating `U'\0'` or `L'\0'`.
+The size of a `char16_t` string literal is the total number of escape
+sequences, *universal-character-name*s, and other characters, plus one
+for each character requiring a surrogate pair, plus one for the
+terminating `u'\0'`.
+[*Note 4*: The size of a `char16_t` string literal is the number of
+code units, not the number of characters. — *end note*]
+Within `char32_t` and `char16_t` string literals, any
+*universal-character-name*s shall be within the range `0x0` to
+`0x10FFFF`. The size of a narrow string literal is the total number of
+escape sequences and other characters, plus at least one for the
+multibyte encoding of each *universal-character-name*, plus one for the
 terminating `'\0'`.
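The size rules for `char16_t` and `char32_t` string literals can be illustrated with a character outside the Basic Multilingual Plane. This is a sketch, assuming `char16_t` strings are encoded as UTF-16 and `char32_t` strings as UTF-32; `literal_size` is a hypothetical helper.

```cpp
#include <cstddef>

// Hypothetical helper: the size n of a string literal, in elements.
template <class T, std::size_t N>
constexpr std::size_t literal_size(const T (&)[N]) { return N; }

// U+1F600 needs a surrogate pair in UTF-16 but a single code unit in UTF-32.
static_assert(literal_size(u"\U0001F600") == 3, "2 code units + u'\\0'");
static_assert(literal_size(U"\U0001F600") == 2, "1 code unit + U'\\0'");

// Characters inside the Basic Multilingual Plane take one char16_t each.
static_assert(literal_size(u"a\u00E9") == 3, "2 code units + u'\\0'");

// The pair for U+1F600: 0xD800 + (0xF600 >> 10) and 0xDC00 + (0xF600 & 0x3FF).
static_assert(u"\U0001F600"[0] == 0xD83D && u"\U0001F600"[1] == 0xDE00, "");
```

The narrow case is not asserted here because the number of `char` elements per *universal-character-name* depends on the execution character set.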
+Evaluating a *string-literal* results in a string literal object with
+static storage duration, initialized from the given characters as
+specified above. Whether all string literals are distinct (that is, are
+stored in nonoverlapping objects) and whether successive evaluations of
+a *string-literal* yield the same or a different object is unspecified.
+[*Note 5*:  The effect of attempting to modify a string literal is
+undefined. — *end note*]
 ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
 ``` bnf
 boolean-literal:
     'false'
 pointer-literal:
     'nullptr'
 ```
 The pointer literal is the keyword `nullptr`. It is a prvalue of type
+`std::nullptr_t`.
+[*Note 1*: `std::nullptr_t` is a distinct type that is neither a
 pointer type nor a pointer to member type; rather, a prvalue of this
 type is a null pointer constant and can be converted to a null pointer
+value or null member pointer value. See  [[conv.ptr]] and
+[[conv.mem]]. — *end note*]
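The properties stated in the note can be checked with standard type traits. A minimal sketch follows; `Widget` and the variable names are hypothetical.

```cpp
#include <cstddef>
#include <type_traits>

struct Widget { int value; };  // hypothetical type, used for the member-pointer case

// nullptr has type std::nullptr_t, which is neither a pointer type
// nor a pointer to member type.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value, "");
static_assert(!std::is_pointer<std::nullptr_t>::value, "");
static_assert(!std::is_member_pointer<std::nullptr_t>::value, "");

// It converts to a null pointer value or a null member pointer value.
constexpr int* null_object_pointer = nullptr;
constexpr int Widget::* null_member_pointer = nullptr;
static_assert(null_object_pointer == nullptr, "");
static_assert(null_member_pointer == nullptr, "");
```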
 ### User-defined literals <a id="lex.ext">[[lex.ext]]</a>
 ``` bnf
 user-defined-literal:
 ``` bnf
 user-defined-floating-literal:
     fractional-constant exponent-partₒₚₜ ud-suffix
     digit-sequence exponent-part ud-suffix
+    hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
+    hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
 ```
 ``` bnf
 user-defined-string-literal:
     string-literal ud-suffix
 ``` bnf
 ud-suffix:
     identifier
 ```
+If a token matches both *user-defined-literal* and another *literal*
+kind, it is treated as the latter.
+[*Example 1*:
+`123_km`
+is a *user-defined-literal*, but `12LL` is an *integer-literal*.
+— *end example*]
+The syntactic non-terminal preceding the *ud-suffix* in a
+*user-defined-literal* is taken to be the longest sequence of characters
+that could match that non-terminal.
 A *user-defined-literal* is treated as a call to a literal operator or
 literal operator template ([[over.literal]]). To determine the form of
 this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
 the *literal-operator-id* whose literal suffix identifier is *X* is
 ``` cpp
 operator "" X<'c₁', 'c₂', ... 'cₖ'>()
 ```
+where *n* is the source character sequence c₁c₂...cₖ.
+[*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
+basic source character set. — *end note*]
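A literal operator template receives the spelling of the literal as a pack of `char` template arguments. The following sketch uses a hypothetical suffix, `_digits`, that simply counts those characters; it assumes no other literal operator with that suffix is in scope.

```cpp
#include <cstddef>

// Hypothetical suffix: returns k, the number of characters c1...ck in the spelling.
template <char... Cs>
constexpr std::size_t operator""_digits() { return sizeof...(Cs); }

// 123_digits is treated as operator""_digits<'1', '2', '3'>().
static_assert(123_digits == 3, "");

// With only the template available, a floating literal takes the same route:
// 1.5_digits is treated as operator""_digits<'1', '.', '5'>().
static_assert(1.5_digits == 3, "");
```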
 If *L* is a *user-defined-floating-literal*, let *f* be the literal
 without its *ud-suffix*. If *S* contains a literal operator with
 parameter type `long double`, the literal *L* is treated as a call of
 the form
 ``` cpp
 operator "" X<'c₁', 'c₂', ... 'cₖ'>()
 ```
+where *f* is the source character sequence c₁c₂...cₖ.
+[*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
+basic source character set. — *end note*]
 If *L* is a *user-defined-string-literal*, let *str* be the literal
 without its *ud-suffix* and let *len* be the number of code units in
 *str* (i.e., its length excluding the terminating null character). The
 literal *L* is treated as a call of the form
 ``` cpp
+operator "" X(str, len)
 ```
 If *L* is a *user-defined-character-literal*, let *ch* be the literal
 without its *ud-suffix*. *S* shall contain a literal operator (
 [[over.literal]]) whose only parameter has the type of *ch* and the
 literal *L* is treated as a call of the form
 ``` cpp
+operator "" X(ch)
 ```
+[*Example 2*:
 ``` cpp
 long double operator "" _w(long double);
 std::string operator "" _w(const char16_t*, std::size_t);
 unsigned operator "" _w(const char*);
 int main() {
   12_w;       // calls operator "" _w("12")
   "two"_w;    // error: no applicable literal operator
 }
 ```
+— *end example*]
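The call forms described above can be seen side by side in a small sketch. All suffixes here (`_km`, `_raw`, `_len`, `_code`) are hypothetical, and the operators are `constexpr` only so that the results can be checked at compile time.

```cpp
#include <cstddef>

// Cooked forms: the literal's value is passed as unsigned long long or long double.
constexpr unsigned long long operator""_km(unsigned long long v) { return v * 1000ULL; }
constexpr long double operator""_km(long double v) { return v * 1000.0L; }

// Raw form: the spelling of the literal is passed as a null-terminated string.
constexpr std::size_t operator""_raw(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// String form: called as operator""_len(str, len).
constexpr std::size_t operator""_len(const char*, std::size_t len) { return len; }

// Character form: called as operator""_code(ch).
constexpr int operator""_code(char ch) { return ch; }

static_assert(123_km == 123000ULL, "cooked integer form");
static_assert(1.5_km == 1500.0L, "cooked floating form");
static_assert(12_raw == 2, "raw form sees the two characters 1 and 2");
static_assert("two"_len == 3, "len excludes the terminating null");
static_assert('A'_code == 'A', "character form");
```

A token such as `12LL` never reaches any of these operators, because it also matches *integer-literal* and is treated as one.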
 In translation phase 6 ([[lex.phases]]), adjacent string literals are
 concatenated and *user-defined-string-literal*s are considered string
 literals for that purpose. During concatenation, *ud-suffix*es are
 removed and ignored and the concatenation process occurs as described
 in  [[lex.string]]. At the end of phase 6, if a string literal is the
 result of a concatenation involving at least one
 *user-defined-string-literal*, all the participating
 *user-defined-string-literal*s shall have the same *ud-suffix* and that
 suffix is applied to the result of the concatenation.
+[*Example 3*:
 ``` cpp
 int main() {
   L"A" "B" "C"_x; // OK: same as L"ABC"_x
   "P"_x "Q" "R"_y;// error: two different ud-suffixes
 }
 ```
+— *end example*]
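For the example above to compile, a matching literal operator has to be declared. The sketch below supplies hypothetical `_x` operators that return the length they receive, which makes the effect of the concatenation visible.

```cpp
#include <cstddef>

// Hypothetical suffix _x for narrow and wide strings; each returns len.
constexpr std::size_t operator""_x(const char*, std::size_t len) { return len; }
constexpr std::size_t operator""_x(const wchar_t*, std::size_t len) { return len; }

// The suffix is removed, the pieces are concatenated, and the suffix is then
// applied to the whole result.
static_assert("A" "B" "C"_x == 3, "same as the narrow literal ABC with suffix _x");
static_assert(L"A" "B" "C"_x == 3, "same as the wide literal ABC with suffix _x");

// Several pieces may carry the suffix, provided they all carry the same one.
static_assert("P"_x "Q" "R"_x == 3, "");
```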
 <!-- Link reference definitions -->
 [basic.fundamental]: basic.md#basic.fundamental
 [basic.link]: basic.md#basic.link
 [basic.lookup.unqual]: basic.md#basic.lookup.unqual
 [basic.stc]: basic.md#basic.stc
 [basic.types]: basic.md#basic.types
 [conv.mem]: conv.md#conv.mem
 [conv.ptr]: conv.md#conv.ptr
 [cpp]: cpp.md#cpp
 [cpp.concat]: cpp.md#cpp.concat
 [cpp.cond]: cpp.md#cpp.cond
 [cpp.include]: cpp.md#cpp.include
 [cpp.stringize]: cpp.md#cpp.stringize
 [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
 [headers]: library.md#headers
 [lex]: #lex
 [lex.bool]: #lex.bool
 [lex.ccon]: #lex.ccon
 [lex.charset]: #lex.charset
 [lex.ppnumber]: #lex.ppnumber
 [lex.pptoken]: #lex.pptoken
 [lex.separate]: #lex.separate
 [lex.string]: #lex.string
 [lex.token]: #lex.token
 [over.literal]: over.md#over.literal
 [tab:alternative.representations]: #tab:alternative.representations
 [tab:alternative.tokens]: #tab:alternative.tokens
+[tab:charname.allowed]: #tab:charname.allowed
+[tab:charname.disallowed]: #tab:charname.disallowed
 [tab:escape.sequences]: #tab:escape.sequences
 [tab:identifiers.special]: #tab:identifiers.special
 [tab:keywords]: #tab:keywords
 [tab:lex.string.concat]: #tab:lex.string.concat
 [tab:lex.type.integer.literal]: #tab:lex.type.integer.literal
 [temp.explicit]: temp.md#temp.explicit
 [temp.names]: temp.md#temp.names
 [^1]: Implementations must behave as if these separate phases occur,
     although in practice different phases might be folded together.
 [^2]: A partial preprocessing token would arise from a source file
 [^4]: The glyphs for the members of the basic source character set are
     intended to identify characters from the subset of ISO/IEC 10646
     which corresponds to the ASCII character set. However, because the
     mapping from source file characters to the source character set
     (described in translation phase 1) is specified as
+ *implementation-defined*, an implementation is required to document
     how the basic source characters are represented in source files.
+[^5]: A sequence of characters resembling a *universal-character-name*
+ in an *r-char-sequence* ([[lex.string]]) does not form a
+ *universal-character-name*.
 [^6]:  These include “digraphs” and additional reserved words. The term
     “digraph” (token consisting of two characters) is not perfectly
     descriptive, since one of the alternative preprocessing-tokens is
     `%:%:` and of course several primary tokens contain two characters.
     might result in an error, be interpreted as the character
     corresponding to the escape sequence, or have a completely different
     meaning, depending on the implementation.
 [^10]: On systems in which linkers cannot accept extended characters, an
+    encoding of the *universal-character-name* may be used in forming
     valid external identifiers. For example, some otherwise unused
     character or sequence of characters may be used to encode the `\u`
+    in a *universal-character-name*. Extended characters may produce a
     long external identifier, but C++ does not place a translation limit
     on significant characters for external identifiers. In C++, upper-
     and lower-case letters are considered different for all identifiers,
     including external identifiers.
 [^12]: The digits `8` and `9` are not octal digits.
 [^13]: They are intended for character sets where a character does not
     fit into a single byte.
+[^14]: Using an escape sequence for a question mark is supported for
+ compatibility with ISO C++14 and ISO C.
