[lex.ccon] - C++17 → C++20

tmp/tmp9ch4qk5m/{from.md → to.md} +64 -53

@@ -14,10 +14,17 @@ encoding-prefix: one of
 c-char-sequence:
     c-char
     c-char-sequence c-char
 ```
 ``` bnf
 escape-sequence:
     simple-escape-sequence
     octal-escape-sequence
     hexadecimal-escape-sequence
@@ -40,76 +47,80 @@ octal-escape-sequence:
 hexadecimal-escape-sequence:
     '\x' hexadecimal-digit
     hexadecimal-escape-sequence hexadecimal-digit
 ```
-A character literal is one or more characters enclosed in single quotes,
-as in `'x'`, optionally preceded by `u8`, `u`, `U`, or `L`, as in
-`u8'w'`, `u'x'`, `U'y'`, or `L'z'`, respectively.
-A character literal that does not begin with `u8`, `u`, `U`, or `L` is
 an *ordinary character literal*. An ordinary character literal that
 contains a single *c-char* representable in the execution character set
 has type `char`, with value equal to the numerical value of the encoding
 of the *c-char* in the execution character set. An ordinary character
-literal that contains more than one *c-char* is a *multicharacter
-literal*. A multicharacter literal, or an ordinary character literal
-containing a single *c-char* not representable in the execution
-character set, is conditionally-supported, has type `int`, and has an
-*implementation-defined* value.
-A character literal that begins with `u8`, such as `u8'w'`, is a
-character literal of type `char`, known as a *UTF-8 character literal*.
-The value of a UTF-8 character literal is equal to its ISO 10646 code
-point value, provided that the code point value is representable with a
-single UTF-8 code unit (that is, provided it is in the C0 Controls and
-Basic Latin Unicode block). If the value is not representable with a
-single UTF-8 code unit, the program is ill-formed. A UTF-8 character
-literal containing multiple *c-char*s is ill-formed.
-A character literal that begins with the letter `u`, such as `u'x'`, is
-a character literal of type `char16_t`. The value of a `char16_t`
-character literal containing a single *c-char* is equal to its ISO 10646
-code point value, provided that the code point is representable with a
-single 16-bit code unit. (That is, provided it is a basic multi-lingual
-plane code point.) If the value is not representable within 16 bits, the
-program is ill-formed. A `char16_t` character literal containing
-multiple *c-char*s is ill-formed.
-A character literal that begins with the letter `U`, such as `U'y'`, is
-a character literal of type `char32_t`. The value of a `char32_t`
-character literal containing a single *c-char* is equal to its ISO 10646
-code point value. A `char32_t` character literal containing multiple
 *c-char*s is ill-formed.
-A character literal that begins with the letter `L`, such as `L'z'`, is
-a *wide-character literal*. A wide-character literal has type
-`wchar_t`.[^13] The value of a wide-character literal containing a
 single *c-char* has value equal to the numerical value of the encoding
 of the *c-char* in the execution wide-character set, unless the *c-char*
 has no representation in the execution wide-character set, in which case
 the value is *implementation-defined*.
-[*Note 1*: The type `wchar_t` is able to represent all members of the
 execution wide-character set (see
 [[basic.fundamental]]). — *end note*]
 The value of a wide-character literal containing multiple *c-char*s is
 *implementation-defined*.
 Certain non-graphic characters, the single quote `'`, the double quote
-`"`, the question mark `?`,[^14] and the backslash `\`, can be
-represented according to Table  [[tab:escape.sequences]]. The double
-quote `"` and the question mark `?`, can be represented as themselves or
-by the escape sequences `\"` and `\?` respectively, but the single quote
-`'` and the backslash `\` shall be represented by the escape sequences
-`\'` and `\\` respectively. Escape sequences in which the character
-following the backslash is not listed in Table  [[tab:escape.sequences]]
-are conditionally-supported, with *implementation-defined* semantics. An
-escape sequence specifies a single character.
-**Table: Escape sequences** <a id="tab:escape.sequences">[tab:escape.sequences]</a>
 |                 |                |                    |
 | --------------- | -------------- | ------------------ |
 | new-line        | NL(LF)         | `\n`               |
 | horizontal tab  | HT             | `\t`               |
@@ -132,25 +143,25 @@ desired character. The escape `\x\numconst{hhh}` consists of the
 backslash followed by `x` followed by one or more hexadecimal digits
 that are taken to specify the value of the desired character. There is
 no limit to the number of digits in a hexadecimal sequence. A sequence
 of octal or hexadecimal digits is terminated by the first character that
 is not an octal digit or a hexadecimal digit, respectively. The value of
-a character literal is *implementation-defined* if it falls outside of
-the *implementation-defined* range defined for `char` (for character
-literals with no prefix) or `wchar_t` (for character literals prefixed
-by `L`).
-[*Note 2*: If the value of a character literal prefixed by `u`, `u8`,
 or `U` is outside the range defined for its type, the program is
 ill-formed. — *end note*]
 A *universal-character-name* is translated to the encoding, in the
 appropriate execution character set, of the character named. If there is
 no such encoding, the *universal-character-name* is translated to an
 *implementation-defined* encoding.
-[*Note 3*: In translation phase 1, a *universal-character-name* is
 introduced whenever an actual extended character is encountered in the
 source text. Therefore, all extended characters are described in terms
 of *universal-character-name*s. However, the actual compiler
 implementation may use its own native character set, so long as the same
 results are obtained. — *end note*]

 c-char-sequence:
     c-char
     c-char-sequence c-char
 ```
+``` bnf
+c-char:
+    any member of the basic source character set except the single-quote ''', backslash '\', or new-line character
+    escape-sequence
+    universal-character-name
+```
 ``` bnf
 escape-sequence:
     simple-escape-sequence
     octal-escape-sequence
     hexadecimal-escape-sequence
 hexadecimal-escape-sequence:
     '\x' hexadecimal-digit
     hexadecimal-escape-sequence hexadecimal-digit
 ```
+A *character-literal* that does not begin with `u8`, `u`, `U`, or `L` is
 an *ordinary character literal*. An ordinary character literal that
 contains a single *c-char* representable in the execution character set
 has type `char`, with value equal to the numerical value of the encoding
 of the *c-char* in the execution character set. An ordinary character
+literal that contains more than one *c-char* is a
+*multicharacter literal*. A multicharacter literal, or an ordinary
+character literal containing a single *c-char* not representable in the
+execution character set, is conditionally-supported, has type `int`, and
+has an *implementation-defined* value.
+A *character-literal* that begins with `u8`, such as `u8'w'`, is a
+*character-literal* of type `char8_t`, known as a *UTF-8 character
+literal*. The value of a UTF-8 character literal is equal to its ISO/IEC
+10646 code point value, provided that the code point value can be
+encoded as a single UTF-8 code unit.
+[*Note 1*: That is, provided the code point value is in the range
+[0, 7F] (hexadecimal). — *end note*]
+If the value is not representable with a single UTF-8 code unit, the
+program is ill-formed. A UTF-8 character literal containing multiple
+*c-char*s is ill-formed.
+A *character-literal* that begins with the letter `u`, such as `u'x'`,
+is a *character-literal* of type `char16_t`, known as a *UTF-16
+character literal*. The value of a UTF-16 character literal is equal to
+its ISO/IEC 10646 code point value, provided that the code point value
+is representable with a single 16-bit code unit.
+[*Note 2*: That is, provided the code point value is in the range
+[0, FFFF] (hexadecimal). — *end note*]
+If the value is not representable with a single 16-bit code unit, the
+program is ill-formed. A UTF-16 character literal containing multiple
 *c-char*s is ill-formed.
+A *character-literal* that begins with the letter `U`, such as `U'y'`,
+is a *character-literal* of type `char32_t`, known as a *UTF-32
+character literal*. The value of a UTF-32 character literal containing a
+single *c-char* is equal to its ISO/IEC 10646 code point value. A UTF-32
+character literal containing multiple *c-char*s is ill-formed.
+A *character-literal* that begins with the letter `L`, such as `L'z'`,
+is a *wide-character literal*. A wide-character literal has type
+`wchar_t`.[^12] The value of a wide-character literal containing a
 single *c-char* has value equal to the numerical value of the encoding
 of the *c-char* in the execution wide-character set, unless the *c-char*
 has no representation in the execution wide-character set, in which case
 the value is *implementation-defined*.
+[*Note 3*: The type `wchar_t` is able to represent all members of the
 execution wide-character set (see
 [[basic.fundamental]]). — *end note*]
 The value of a wide-character literal containing multiple *c-char*s is
 *implementation-defined*.
 Certain non-graphic characters, the single quote `'`, the double quote
+`"`, the question mark `?`,[^13] and the backslash `\`, can be
+represented according to [[lex.ccon.esc]]. The double quote `"` and the
+question mark `?`, can be represented as themselves or by the escape
+sequences `\"` and `\?` respectively, but the single quote `'` and the
+backslash `\` shall be represented by the escape sequences `\'` and `\\`
+respectively. Escape sequences in which the character following the
+backslash is not listed in [[lex.ccon.esc]] are conditionally-supported,
+with *implementation-defined* semantics. An escape sequence specifies a
+single character.
+**Table: Escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
 |                 |                |                    |
 | --------------- | -------------- | ------------------ |
 | new-line        | NL(LF)         | `\n`               |
 | horizontal tab  | HT             | `\t`               |
 backslash followed by `x` followed by one or more hexadecimal digits
 that are taken to specify the value of the desired character. There is
 no limit to the number of digits in a hexadecimal sequence. A sequence
 of octal or hexadecimal digits is terminated by the first character that
 is not an octal digit or a hexadecimal digit, respectively. The value of
+a *character-literal* is *implementation-defined* if it falls outside of
+the *implementation-defined* range defined for `char` (for
+*character-literal*s with no prefix) or `wchar_t` (for
+*character-literal*s prefixed by `L`).
+[*Note 4*: If the value of a *character-literal* prefixed by `u`, `u8`,
 or `U` is outside the range defined for its type, the program is
 ill-formed. — *end note*]
 A *universal-character-name* is translated to the encoding, in the
 appropriate execution character set, of the character named. If there is
 no such encoding, the *universal-character-name* is translated to an
 *implementation-defined* encoding.
+[*Note 5*: In translation phase 1, a *universal-character-name* is
 introduced whenever an actual extended character is encountered in the
 source text. Therefore, all extended characters are described in terms
 of *universal-character-name*s. However, the actual compiler
 implementation may use its own native character set, so long as the same
 results are obtained. — *end note*]
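
The changes in this diff can be checked at compile time. The following is a minimal sketch, not part of the standard text: it assumes a compiler that supports at least C++17 and uses the `__cpp_char8_t` feature-test macro to detect whether the C++20 rule for `u8` literals is in effect. The helper names `fits_in_one_utf8_code_unit` and `fits_in_one_utf16_code_unit` are hypothetical and only restate the ranges given in Notes 1 and 2 of the new wording.

```cpp
#include <type_traits>

// Ordinary and wide character literals keep their types in both revisions.
static_assert(std::is_same_v<decltype('x'), char>, "ordinary literal is char");
static_assert(std::is_same_v<decltype(L'z'), wchar_t>, "wide literal is wchar_t");

// UTF-16 and UTF-32 character literals (named as such in C++20).
static_assert(std::is_same_v<decltype(u'x'), char16_t>, "u prefix gives char16_t");
static_assert(std::is_same_v<decltype(U'y'), char32_t>, "U prefix gives char32_t");

// The type of a UTF-8 character literal changed from char to char8_t.
#if defined(__cpp_char8_t)
static_assert(std::is_same_v<decltype(u8'w'), char8_t>, "C++20: u8 gives char8_t");
#else
static_assert(std::is_same_v<decltype(u8'w'), char>, "C++17: u8 gives char");
#endif

// The value of a UTF-8/16/32 literal is its ISO/IEC 10646 code point value.
static_assert(u8'w' == 0x77, "code point of w");
static_assert(u'x' == 0x78, "code point of x");
static_assert(U'y' == 0x79, "code point of y");

// Hypothetical helpers restating the ranges from Note 1 and Note 2.
constexpr bool fits_in_one_utf8_code_unit(char32_t cp) {
    return cp <= 0x7F;    // [0, 7F]: a u8 character literal is well-formed
}

constexpr bool fits_in_one_utf16_code_unit(char32_t cp) {
    return cp <= 0xFFFF;  // [0, FFFF]: a u character literal is well-formed
}
```

A literal such as `u8'é'` (code point E9) falls outside the first range, so the program is ill-formed under either revision; the helpers only model that rule and do not reproduce the compiler's diagnostics.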
