[lex.ccon] - C++14 → C++17


tmp/tmpfgmirpl7/{from.md → to.md} +74 -49


@@ -1,13 +1,15 @@
 ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
 ``` bnf
 character-literal:
-    ''' c-char-sequence '''
-    u''' c-char-sequence '''
-    U''' c-char-sequence '''
-    L''' c-char-sequence '''
 ```
 ``` bnf
 c-char-sequence:
     c-char
@@ -39,46 +41,64 @@ hexadecimal-escape-sequence:
     '\x' hexadecimal-digit
     hexadecimal-escape-sequence hexadecimal-digit
 ```
 A character literal is one or more characters enclosed in single quotes,
-as in `'x'`, optionally preceded by one of the letters `u`, `U`, or `L`,
-as in `u'y'`, `U'z'`, or `L'x'`, respectively. A character literal that
-does not begin with `u`, `U`, or `L` is an ordinary character literal,
-also referred to as a narrow-character literal. An ordinary character
-literal that contains a single *c-char* representable in the execution
-character set has type `char`, with value equal to the numerical value
-of the encoding of the *c-char* in the execution character set. An
-ordinary character literal that contains more than one *c-char* is a
-*multicharacter literal*. A multicharacter literal, or an ordinary
-character literal containing a single *c-char* not representable in the
-execution character set, is conditionally-supported, has type `int`, and
-has an *implementation-defined* value.
-A character literal that begins with the letter `u`, such as `u'y'`, is
 a character literal of type `char16_t`. The value of a `char16_t`
-literal containing a single *c-char* is equal to its ISO 10646 code
-point value, provided that the code point is representable with a single
-16-bit code unit. (That is, provided it is a basic multi-lingual plane
-code point.) If the value is not representable within 16 bits, the
-program is ill-formed. A `char16_t` literal containing multiple
-*c-char*s is ill-formed. A character literal that begins with the letter
-`U`, such as `U'z'`, is a character literal of type `char32_t`. The
-value of a `char32_t` literal containing a single *c-char* is equal to
-its ISO 10646 code point value. A `char32_t` literal containing multiple
-*c-char*s is ill-formed. A character literal that begins with the letter
-`L`, such as `L'x'`, is a wide-character literal. A wide-character
-literal has type `wchar_t`.[^13] The value of a wide-character literal
-containing a single *c-char* has value equal to the numerical value of
-the encoding of the *c-char* in the execution wide-character set, unless
-the *c-char* has no representation in the execution wide-character set,
-in which case the value is *implementation-defined*. The type `wchar_t`
-is able to represent all members of the execution wide-character set
-(see [[basic.fundamental]]). The value of a wide-character literal
-containing multiple *c-char*s is *implementation-defined*.
-Certain nongraphic characters, the single quote `'`, the double quote
 `"`, the question mark `?`,[^14] and the backslash `\`, can be
 represented according to Table  [[tab:escape.sequences]]. The double
 quote `"` and the question mark `?`, can be represented as themselves or
 by the escape sequences `\"` and `\?` respectively, but the single quote
 `'` and the backslash `\` shall be represented by the escape sequences
@@ -113,20 +133,25 @@ backslash followed by `x` followed by one or more hexadecimal digits
 that are taken to specify the value of the desired character. There is
 no limit to the number of digits in a hexadecimal sequence. A sequence
 of octal or hexadecimal digits is terminated by the first character that
 is not an octal digit or a hexadecimal digit, respectively. The value of
 a character literal is *implementation-defined* if it falls outside of
-the implementation-defined range defined for `char` (for literals with
-no prefix), `char16_t` (for literals prefixed by `'u'`), `char32_t` (for
-literals prefixed by `'U'`), or `wchar_t` (for literals prefixed by
-`'L'`).
-A universal-character-name is translated to the encoding, in the
 appropriate execution character set, of the character named. If there is
-no such encoding, the universal-character-name is translated to an
-*implementation-defined* encoding. In translation phase 1, a
-universal-character-name is introduced whenever an actual extended
-character is encountered in the source text. Therefore, all extended
-characters are described in terms of universal-character-names. However,
-the actual compiler implementation may use its own native character set,
-so long as the same results are obtained.

 ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
 ``` bnf
 character-literal:
+ encoding-prefixₒₚₜ ''' c-char-sequence '''
+```
+``` bnf
+encoding-prefix: one of
+    'u8' 'u' 'U' 'L'
 ```
 ``` bnf
 c-char-sequence:
     c-char
     '\x' hexadecimal-digit
     hexadecimal-escape-sequence hexadecimal-digit
 ```
 A character literal is one or more characters enclosed in single quotes,
+as in `'x'`, optionally preceded by `u8`, `u`, `U`, or `L`, as in
+`u8'w'`, `u'x'`, `U'y'`, or `L'z'`, respectively.
+A character literal that does not begin with `u8`, `u`, `U`, or `L` is
+an *ordinary character literal*. An ordinary character literal that
+contains a single *c-char* representable in the execution character set
+has type `char`, with value equal to the numerical value of the encoding
+of the *c-char* in the execution character set. An ordinary character
+literal that contains more than one *c-char* is a *multicharacter
+literal*. A multicharacter literal, or an ordinary character literal
+containing a single *c-char* not representable in the execution
+character set, is conditionally-supported, has type `int`, and has an
+*implementation-defined* value.
+A character literal that begins with `u8`, such as `u8'w'`, is a
+character literal of type `char`, known as a *UTF-8 character literal*.
+The value of a UTF-8 character literal is equal to its ISO 10646 code
+point value, provided that the code point value is representable with a
+single UTF-8 code unit (that is, provided it is in the C0 Controls and
+Basic Latin Unicode block). If the value is not representable with a
+single UTF-8 code unit, the program is ill-formed. A UTF-8 character
+literal containing multiple *c-char*s is ill-formed.
+A character literal that begins with the letter `u`, such as `u'x'`, is
 a character literal of type `char16_t`. The value of a `char16_t`
+character literal containing a single *c-char* is equal to its ISO 10646
+code point value, provided that the code point is representable with a
+single 16-bit code unit. (That is, provided it is a basic multi-lingual
+plane code point.) If the value is not representable within 16 bits, the
+program is ill-formed. A `char16_t` character literal containing
+multiple *c-char*s is ill-formed.
+A character literal that begins with the letter `U`, such as `U'y'`, is
+a character literal of type `char32_t`. The value of a `char32_t`
+character literal containing a single *c-char* is equal to its ISO 10646
+code point value. A `char32_t` character literal containing multiple
+*c-char*s is ill-formed.
+A character literal that begins with the letter `L`, such as `L'z'`, is
+a *wide-character literal*. A wide-character literal has type
+`wchar_t`.[^13] The value of a wide-character literal containing a
+single *c-char* has value equal to the numerical value of the encoding
+of the *c-char* in the execution wide-character set, unless the *c-char*
+has no representation in the execution wide-character set, in which case
+the value is *implementation-defined*.
+[*Note 1*: The type `wchar_t` is able to represent all members of the
+execution wide-character set (see
+[[basic.fundamental]]). — *end note*]
+The value of a wide-character literal containing multiple *c-char*s is
+*implementation-defined*.
+Certain non-graphic characters, the single quote `'`, the double quote
 `"`, the question mark `?`,[^14] and the backslash `\`, can be
 represented according to Table  [[tab:escape.sequences]]. The double
 quote `"` and the question mark `?`, can be represented as themselves or
 by the escape sequences `\"` and `\?` respectively, but the single quote
 `'` and the backslash `\` shall be represented by the escape sequences
 that are taken to specify the value of the desired character. There is
 no limit to the number of digits in a hexadecimal sequence. A sequence
 of octal or hexadecimal digits is terminated by the first character that
 is not an octal digit or a hexadecimal digit, respectively. The value of
 a character literal is *implementation-defined* if it falls outside of
+the *implementation-defined* range defined for `char` (for character
+literals with no prefix) or `wchar_t` (for character literals prefixed
+by `L`).
+[*Note 2*: If the value of a character literal prefixed by `u`, `u8`,
+or `U` is outside the range defined for its type, the program is
+ill-formed. — *end note*]
+A *universal-character-name* is translated to the encoding, in the
 appropriate execution character set, of the character named. If there is
+no such encoding, the *universal-character-name* is translated to an
+*implementation-defined* encoding.
+[*Note 3*: In translation phase 1, a *universal-character-name* is
+introduced whenever an actual extended character is encountered in the
+source text. Therefore, all extended characters are described in terms
+of *universal-character-name*s. However, the actual compiler
+implementation may use its own native character set, so long as the same
+results are obtained. — *end note*]
