[lex.ccon] - C++20 → C++23


tmp/tmp4mkv6h0u/{from.md → to.md} +130 -122


@@ -16,153 +16,161 @@ c-char-sequence:
     c-char-sequence c-char
 ```
 ``` bnf
 c-char:
- any member of the basic source character set except the single-quote ''', backslash '\', or new-line character
     escape-sequence
     universal-character-name
 ```
 ``` bnf
 escape-sequence:
     simple-escape-sequence
     octal-escape-sequence
     hexadecimal-escape-sequence
 ```
 ``` bnf
-simple-escape-sequence: one of
- '\'' '\"' '\?' '\\'
- '\a' '\b' '\f' '\n' '\r' '\t' '\v'
 ```
 ``` bnf
 octal-escape-sequence:
     '\' octal-digit
     '\' octal-digit octal-digit
     '\' octal-digit octal-digit octal-digit
 ```
 ``` bnf
 hexadecimal-escape-sequence:
-    '\x' hexadecimal-digit
-    hexadecimal-escape-sequence hexadecimal-digit
 ```
-A *character-literal* that does not begin with `u8`, `u`, `U`, or `L` is
-an *ordinary character literal*. An ordinary character literal that
-contains a single *c-char* representable in the execution character set
-has type `char`, with value equal to the numerical value of the encoding
-of the *c-char* in the execution character set. An ordinary character
-literal that contains more than one *c-char* is a
-*multicharacter literal*. A multicharacter literal, or an ordinary
-character literal containing a single *c-char* not representable in the
-execution character set, is conditionally-supported, has type `int`, and
-has an *implementation-defined* value.
-A *character-literal* that begins with `u8`, such as `u8'w'`, is a
-*character-literal* of type `char8_t`, known as a *UTF-8 character
-literal*. The value of a UTF-8 character literal is equal to its ISO/IEC
-10646 code point value, provided that the code point value can be
-encoded as a single UTF-8 code unit.
-[*Note 1*: That is, provided the code point value is in the range
-[0, 7F] (hexadecimal). — *end note*]
-If the value is not representable with a single UTF-8 code unit, the
-program is ill-formed. A UTF-8 character literal containing multiple
-*c-char*s is ill-formed.
-A *character-literal* that begins with the letter `u`, such as `u'x'`,
-is a *character-literal* of type `char16_t`, known as a *UTF-16
-character literal*. The value of a UTF-16 character literal is equal to
-its ISO/IEC 10646 code point value, provided that the code point value
-is representable with a single 16-bit code unit.
-[*Note 2*: That is, provided the code point value is in the range
-[0, FFFF] (hexadecimal). — *end note*]
-If the value is not representable with a single 16-bit code unit, the
-program is ill-formed. A UTF-16 character literal containing multiple
-*c-char*s is ill-formed.
-A *character-literal* that begins with the letter `U`, such as `U'y'`,
-is a *character-literal* of type `char32_t`, known as a *UTF-32
-character literal*. The value of a UTF-32 character literal containing a
-single *c-char* is equal to its ISO/IEC 10646 code point value. A UTF-32
-character literal containing multiple *c-char*s is ill-formed.
-A *character-literal* that begins with the letter `L`, such as `L'z'`,
-is a *wide-character literal*. A wide-character literal has type
-`wchar_t`.[^12] The value of a wide-character literal containing a
-single *c-char* has value equal to the numerical value of the encoding
-of the *c-char* in the execution wide-character set, unless the *c-char*
-has no representation in the execution wide-character set, in which case
-the value is *implementation-defined*.
-[*Note 3*: The type `wchar_t` is able to represent all members of the
-execution wide-character set (see
-[[basic.fundamental]]). — *end note*]
-The value of a wide-character literal containing multiple *c-char*s is
-*implementation-defined*.
-Certain non-graphic characters, the single quote `'`, the double quote
-`"`, the question mark `?`,[^13] and the backslash `\`, can be
-represented according to [[lex.ccon.esc]]. The double quote `"` and the
-question mark `?`, can be represented as themselves or by the escape
-sequences `\"` and `\?` respectively, but the single quote `'` and the
-backslash `\` shall be represented by the escape sequences `\'` and `\\`
-respectively. Escape sequences in which the character following the
-backslash is not listed in [[lex.ccon.esc]] are conditionally-supported,
-with *implementation-defined* semantics. An escape sequence specifies a
-single character.
-**Table: Escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
-|                 |                |                    |
-| --------------- | -------------- | ------------------ |
-| new-line        | NL(LF)         | `\n`               |
-| horizontal tab  | HT             | `\t`               |
-| vertical tab    | VT             | `\v`               |
-| backspace       | BS             | `\b`               |
-| carriage return | CR             | `\r`               |
-| form feed       | FF             | `\f`               |
-| alert           | BEL            | `\a`               |
-| backslash       | \              | `\\`               |
-| question mark   | ?              | `\?`               |
-| single quote    | `'`            | `\'`               |
-| double quote    | `"`            | `\"`               |
-| octal number    | *ooo*          | `\ooo`             |
-| hex number      | *hhh*          | `\xhhh`            |
-The escape `\ooo` consists of the backslash followed by one,
-two, or three octal digits that are taken to specify the value of the
-desired character. The escape `\xhhh` consists of the
-backslash followed by `x` followed by one or more hexadecimal digits
-that are taken to specify the value of the desired character. There is
-no limit to the number of digits in a hexadecimal sequence. A sequence
-of octal or hexadecimal digits is terminated by the first character that
-is not an octal digit or a hexadecimal digit, respectively. The value of
-a *character-literal* is *implementation-defined* if it falls outside of
-the *implementation-defined* range defined for `char` (for
-*character-literal*s with no prefix) or `wchar_t` (for
-*character-literal*s prefixed by `L`).
-[*Note 4*: If the value of a *character-literal* prefixed by `u`, `u8`,
-or `U` is outside the range defined for its type, the program is
-ill-formed. — *end note*]
-A *universal-character-name* is translated to the encoding, in the
-appropriate execution character set, of the character named. If there is
-no such encoding, the *universal-character-name* is translated to an
-*implementation-defined* encoding.
-[*Note 5*: In translation phase 1, a *universal-character-name* is
-introduced whenever an actual extended character is encountered in the
-source text. Therefore, all extended characters are described in terms
-of *universal-character-name*s. However, the actual compiler
-implementation may use its own native character set, so long as the same
-results are obtained. — *end note*]

     c-char-sequence c-char
 ```
 ``` bnf
 c-char:
+    basic-c-char
     escape-sequence
     universal-character-name
 ```
+``` bnf
+basic-c-char:
+    any member of the translation character set except the U+0027 (apostrophe),
+      U+005c (reverse solidus), or new-line character
+```
 ``` bnf
 escape-sequence:
     simple-escape-sequence
+    numeric-escape-sequence
+    conditional-escape-sequence
+```
+``` bnf
+simple-escape-sequence:
+    '\' simple-escape-sequence-char
+```
+``` bnf
+simple-escape-sequence-char: one of
+    ' " ? \ a b f n r t v
+```
+``` bnf
+numeric-escape-sequence:
     octal-escape-sequence
     hexadecimal-escape-sequence
 ```
 ``` bnf
+simple-octal-digit-sequence:
+    octal-digit
+    simple-octal-digit-sequence octal-digit
 ```
 ``` bnf
 octal-escape-sequence:
     '\' octal-digit
     '\' octal-digit octal-digit
     '\' octal-digit octal-digit octal-digit
+    '\o{' simple-octal-digit-sequence '}'
 ```
 ``` bnf
 hexadecimal-escape-sequence:
+    '\x' simple-hexadecimal-digit-sequence
+    '\x{' simple-hexadecimal-digit-sequence '}'
 ```
+``` bnf
+conditional-escape-sequence:
+    '\' conditional-escape-sequence-char
+```
+``` bnf
+conditional-escape-sequence-char:
+    any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters 'N', 'o', 'u', 'U', or 'x'
+```
+A *non-encodable character literal* is a *character-literal* whose
+*c-char-sequence* consists of a single *c-char* that is not a
+*numeric-escape-sequence* and that specifies a character that either
+lacks representation in the literal’s associated character encoding or
+that cannot be encoded as a single code unit. A *multicharacter literal*
+is a *character-literal* whose *c-char-sequence* consists of more than
+one *c-char*. The *encoding-prefix* of a non-encodable character literal
+or a multicharacter literal shall be absent. Such *character-literal*s
+are conditionally-supported.
+The kind of a *character-literal*, its type, and its associated
+character encoding [[lex.charset]] are determined by its
+*encoding-prefix* and its *c-char-sequence* as defined by
+[[lex.ccon.literal]]. The special cases for non-encodable character
+literals and multicharacter literals take precedence over the base kind.
+[*Note 1*: The associated character encoding for ordinary character
+literals determines encodability, but does not determine the value of
+non-encodable ordinary character literals or ordinary multicharacter
+literals. The examples in [[lex.ccon.literal]] for non-encodable
+ordinary character literals assume that the specified character lacks
+representation in the ordinary literal encoding or that encoding the
+character would require more than one code unit. — *end note*]
+**Table: Character literals** <a id="lex.ccon.literal">[lex.ccon.literal]</a>
+| Encoding prefix | Kind                       | Type       | Associated character encoding | Example |
+| --------------- | -------------------------- | ---------- | ----------------------------- | ------- |
+| none            | ordinary character literal | `char`     | ordinary literal encoding     | `'v'`   |
+| `L`             | wide character literal     | `wchar_t`  | wide literal encoding         | `L'w'`  |
+| `u8`            | UTF-8 character literal    | `char8_t`  | UTF-8                         | `u8'x'` |
+| `u`             | UTF-16 character literal   | `char16_t` | UTF-16                        | `u'y'`  |
+| `U`             | UTF-32 character literal   | `char32_t` | UTF-32                        | `U'z'`  |
+In translation phase 4, the value of a *character-literal* is determined
+using the range of representable values of the *character-literal*’s
+type in translation phase 7. A non-encodable character literal or a
+multicharacter literal has an *implementation-defined* value. The value
+of any other kind of *character-literal* is determined as follows:
+- A *character-literal* with a *c-char-sequence* consisting of a single
+  *basic-c-char*, *simple-escape-sequence*, or
+  *universal-character-name* is the code unit value of the specified
+  character as encoded in the literal’s associated character encoding.
+  [*Note 2*: If the specified character lacks representation in the
+  literal’s associated character encoding or if it cannot be encoded as
+  a single code unit, then the literal is a non-encodable character
+  literal. — *end note*]
+- A *character-literal* with a *c-char-sequence* consisting of a single
+  *numeric-escape-sequence* has a value as follows:
+  - Let v be the integer value represented by the octal number
+    comprising the sequence of *octal-digit*s in an
+    *octal-escape-sequence* or by the hexadecimal number comprising the
+    sequence of *hexadecimal-digit*s in a *hexadecimal-escape-sequence*.
+  - If v does not exceed the range of representable values of the
+    *character-literal*’s type, then the value is v.
+  - Otherwise, if the *character-literal*’s *encoding-prefix* is absent
+    or `L`, and v does not exceed the range of representable values of
+    the corresponding unsigned type for the underlying type of the
+    *character-literal*’s type, then the value is the unique value of
+    the *character-literal*’s type `T` that is congruent to v modulo 2ᴺ,
+    where N is the width of `T`.
+  - Otherwise, the *character-literal* is ill-formed.
+- A *character-literal* with a *c-char-sequence* consisting of a single
+  *conditional-escape-sequence* is conditionally-supported and has an
+  *implementation-defined* value.
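The rules above for a single *numeric-escape-sequence* can be illustrated with a few compile-time checks. This is a sketch under stated assumptions: `as_unsigned` is a hypothetical helper, and the last assertion assumes an 8-bit `char`. The delimited forms `\o{...}` and `\x{...}` are omitted because compiler support for them may vary.

```cpp
// A numeric escape denotes the integer v directly, so these checks do not
// depend on the ordinary literal encoding.
static_assert('\0' == 0);
static_assert('\101' == 65);             // octal 101
static_assert('\x41' == 65);             // hexadecimal 41
static_assert(u8'\x7f' == 0x7f);
static_assert(u'\xffff' == 0xffff);      // largest value of a 16-bit code unit
static_assert(U'\x10ffff' == 0x10ffff);

// Hypothetical helper: view a char value through unsigned char.
constexpr unsigned as_unsigned(char c) {
    return static_cast<unsigned char>(c);
}

// Assumes an 8-bit char. 0xff fits in unsigned char, so '\xff' is well-formed
// with no prefix. Where char is signed, its value is the one congruent to
// 255 modulo 2^8 (that is, -1); viewed as unsigned char it is 255 either way.
static_assert(as_unsigned('\xff') == 0xffu);
```

By the same rules, a value such as `u8'\x100'` would be ill-formed: it exceeds the range of `char8_t`, and the modular fallback applies only when the *encoding-prefix* is absent or `L`.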
+The character specified by a *simple-escape-sequence* is specified in
+[[lex.ccon.esc]].
+[*Note 3*: Using an escape sequence for a question mark is supported
+for compatibility with ISO C++14 and ISO C. — *end note*]
+**Table: Simple escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
+| character |                      | *simple-escape-sequence* |
+| --------- | -------------------- | ------------------------ |
+| `U+000a`  | line feed            | `\n`                     |
+| `U+0009`  | character tabulation | `\t`                     |
+| `U+000b`  | line tabulation      | `\v`                     |
+| `U+0008`  | backspace            | `\b`                     |
+| `U+000d`  | carriage return      | `\r`                     |
+| `U+000c`  | form feed            | `\f`                     |
+| `U+0007`  | alert                | `\a`                     |
+| `U+005c`  | reverse solidus      | `\\`                     |
+| `U+003f`  | question mark        | `\?`                     |
+| `U+0027`  | apostrophe           | `\'`                     |
+| `U+0022`  | quotation mark       | `\"`                     |
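The table can be cross-checked against a compiler by writing each *simple-escape-sequence* as a UTF-32 character literal, whose value is the code point itself. A minimal sketch follows; `EscapeRow`, `kEscapeRows`, and `all_rows_match` are hypothetical names introduced for this illustration.

```cpp
// One row of the table: the escape written as a UTF-32 literal, and the
// code point the table lists for it.
struct EscapeRow {
    char32_t literal;
    char32_t code_point;
};

constexpr EscapeRow kEscapeRows[] = {
    {U'\n', 0x000a}, {U'\t', 0x0009}, {U'\v', 0x000b}, {U'\b', 0x0008},
    {U'\r', 0x000d}, {U'\f', 0x000c}, {U'\a', 0x0007}, {U'\\', 0x005c},
    {U'\?', 0x003f}, {U'\'', 0x0027}, {U'\"', 0x0022},
};

// Returns true when every escape yields the code point listed beside it.
constexpr bool all_rows_match() {
    for (const EscapeRow& row : kEscapeRows) {
        if (row.literal != row.code_point) return false;
    }
    return true;
}

static_assert(all_rows_match());

// The question mark and quotation mark can also be written unescaped.
static_assert('\?' == '?' && '\"' == '"');
```

The UTF-32 prefix is used so that the comparison does not depend on the ordinary literal encoding.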
