tmp/tmp9ch4qk5m/{from.md → to.md}
RENAMED
|
@@ -14,10 +14,17 @@ encoding-prefix: one of
|
|
| 14 |
c-char-sequence:
|
| 15 |
c-char
|
| 16 |
c-char-sequence c-char
|
| 17 |
```
|
| 18 |
|
| 19 |
``` bnf
|
| 20 |
escape-sequence:
|
| 21 |
simple-escape-sequence
|
| 22 |
octal-escape-sequence
|
| 23 |
hexadecimal-escape-sequence
|
|
@@ -40,76 +47,80 @@ octal-escape-sequence:
|
|
| 40 |
hexadecimal-escape-sequence:
|
| 41 |
'\x' hexadecimal-digit
|
| 42 |
hexadecimal-escape-sequence hexadecimal-digit
|
| 43 |
```
|
| 44 |
|
| 45 |
-
A character
|
| 46 |
-
as in `'x'`, optionally preceded by `u8`, `u`, `U`, or `L`, as in
|
| 47 |
-
`u8'w'`, `u'x'`, `U'y'`, or `L'z'`, respectively.
|
| 48 |
-
|
| 49 |
-
A character literal that does not begin with `u8`, `u`, `U`, or `L` is
|
| 50 |
an *ordinary character literal*. An ordinary character literal that
|
| 51 |
contains a single *c-char* representable in the execution character set
|
| 52 |
has type `char`, with value equal to the numerical value of the encoding
|
| 53 |
of the *c-char* in the execution character set. An ordinary character
|
| 54 |
-
literal that contains more than one *c-char* is a
|
| 55 |
-
literal*. A multicharacter literal, or an ordinary
|
| 56 |
-
containing a single *c-char* not representable in the
|
| 57 |
-
character set, is conditionally-supported, has type `int`, and
|
| 58 |
-
*implementation-defined* value.
|
| 59 |
-
|
| 60 |
-
A character
|
| 61 |
-
character
|
| 62 |
-
The value of a UTF-8 character literal is equal to its ISO
|
| 63 |
-
point value, provided that the code point value
|
| 64 |
-
single UTF-8 code unit
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
*c-char*s is ill-formed.
|
| 83 |
|
| 84 |
-
A character
|
| 85 |
-
a *
|
| 86 |
-
|
| 87 |
single *c-char* has value equal to the numerical value of the encoding
|
| 88 |
of the *c-char* in the execution wide-character set, unless the *c-char*
|
| 89 |
has no representation in the execution wide-character set, in which case
|
| 90 |
the value is *implementation-defined*.
|
| 91 |
|
| 92 |
-
[*Note
|
| 93 |
execution wide-character set (see
|
| 94 |
[[basic.fundamental]]). — *end note*]
|
| 95 |
|
| 96 |
The value of a wide-character literal containing multiple *c-char*s is
|
| 97 |
*implementation-defined*.
|
| 98 |
|
| 99 |
Certain non-graphic characters, the single quote `'`, the double quote
|
| 100 |
-
`"`, the question mark `?`,[^
|
| 101 |
-
represented according to
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
|
| 110 |
-
**Table: Escape sequences** <a id="
|
| 111 |
|
| 112 |
| | | |
|
| 113 |
| --------------- | -------------- | ------------------ |
|
| 114 |
| new-line | NL(LF) | `\n` |
|
| 115 |
| horizontal tab | HT | `\t` |
|
|
@@ -132,25 +143,25 @@ desired character. The escape `\x\numconst{hhh}` consists of the
|
|
| 132 |
backslash followed by `x` followed by one or more hexadecimal digits
|
| 133 |
that are taken to specify the value of the desired character. There is
|
| 134 |
no limit to the number of digits in a hexadecimal sequence. A sequence
|
| 135 |
of octal or hexadecimal digits is terminated by the first character that
|
| 136 |
is not an octal digit or a hexadecimal digit, respectively. The value of
|
| 137 |
-
a character
|
| 138 |
-
the *implementation-defined* range defined for `char` (for
|
| 139 |
-
|
| 140 |
-
by `L`).
|
| 141 |
|
| 142 |
-
[*Note
|
| 143 |
or `U` is outside the range defined for its type, the program is
|
| 144 |
ill-formed. — *end note*]
|
| 145 |
|
| 146 |
A *universal-character-name* is translated to the encoding, in the
|
| 147 |
appropriate execution character set, of the character named. If there is
|
| 148 |
no such encoding, the *universal-character-name* is translated to an
|
| 149 |
*implementation-defined* encoding.
|
| 150 |
|
| 151 |
-
[*Note
|
| 152 |
introduced whenever an actual extended character is encountered in the
|
| 153 |
source text. Therefore, all extended characters are described in terms
|
| 154 |
of *universal-character-name*s. However, the actual compiler
|
| 155 |
implementation may use its own native character set, so long as the same
|
| 156 |
results are obtained. — *end note*]
|
|
|
|
| 14 |
c-char-sequence:
|
| 15 |
c-char
|
| 16 |
c-char-sequence c-char
|
| 17 |
```
|
| 18 |
|
| 19 |
+
``` bnf
|
| 20 |
+
c-char:
|
| 21 |
+
any member of the basic source character set except the single-quote ''', backslash '\', or new-line character
|
| 22 |
+
escape-sequence
|
| 23 |
+
universal-character-name
|
| 24 |
+
```
|
| 25 |
+
|
| 26 |
``` bnf
|
| 27 |
escape-sequence:
|
| 28 |
simple-escape-sequence
|
| 29 |
octal-escape-sequence
|
| 30 |
hexadecimal-escape-sequence
|
|
|
|
| 47 |
hexadecimal-escape-sequence:
|
| 48 |
'\x' hexadecimal-digit
|
| 49 |
hexadecimal-escape-sequence hexadecimal-digit
|
| 50 |
```
|
| 51 |
|
| 52 |
+
A *character-literal* that does not begin with `u8`, `u`, `U`, or `L` is
|
| 53 |
an *ordinary character literal*. An ordinary character literal that
|
| 54 |
contains a single *c-char* representable in the execution character set
|
| 55 |
has type `char`, with value equal to the numerical value of the encoding
|
| 56 |
of the *c-char* in the execution character set. An ordinary character
|
| 57 |
+
literal that contains more than one *c-char* is a
|
| 58 |
+
*multicharacter literal*. A multicharacter literal, or an ordinary
|
| 59 |
+
character literal containing a single *c-char* not representable in the
|
| 60 |
+
execution character set, is conditionally-supported, has type `int`, and
|
| 61 |
+
has an *implementation-defined* value.
|
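As a rough illustration of the ordinary character literal rule in the added text above, the following C++ sketch checks the type of a single-*c-char* literal at compile time. The helper `encoding_value` is a hypothetical name introduced only for this example, and a C++11-or-later compiler is assumed.

```cpp
#include <type_traits>

// An ordinary character literal with a single representable c-char has type char.
static_assert(std::is_same<decltype('x'), char>::value,
              "an ordinary character literal has type char");

// Hypothetical helper (not part of the standard text): exposes the numerical
// value of the encoding of c in the execution character set as an int.
constexpr int encoding_value(char c) { return static_cast<int>(c); }

// The decimal digits are encoded contiguously in every conforming execution
// character set, so this check does not assume ASCII.
static_assert(encoding_value('7') - encoding_value('0') == 7,
              "digit encodings are contiguous");
```

A multicharacter literal such as `'ab'` is deliberately left out: it is conditionally-supported, its value is implementation-defined, and many compilers warn about it.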
| 62 |
+
|
| 63 |
+
A *character-literal* that begins with `u8`, such as `u8'w'`, is a
|
| 64 |
+
*character-literal* of type `char8_t`, known as a *UTF-8 character
|
| 65 |
+
literal*. The value of a UTF-8 character literal is equal to its ISO/IEC
|
| 66 |
+
10646 code point value, provided that the code point value can be
|
| 67 |
+
encoded as a single UTF-8 code unit.
|
| 68 |
+
|
| 69 |
+
[*Note 1*: That is, provided the code point value is in the range
|
| 70 |
+
[0, 7F] (hexadecimal). — *end note*]
|
| 71 |
+
|
| 72 |
+
If the value is not representable with a single UTF-8 code unit, the
|
| 73 |
+
program is ill-formed. A UTF-8 character literal containing multiple
|
| 74 |
+
*c-char*s is ill-formed.
|
| 75 |
+
|
| 76 |
+
A *character-literal* that begins with the letter `u`, such as `u'x'`,
|
| 77 |
+
is a *character-literal* of type `char16_t`, known as a *UTF-16
|
| 78 |
+
character literal*. The value of a UTF-16 character literal is equal to
|
| 79 |
+
its ISO/IEC 10646 code point value, provided that the code point value
|
| 80 |
+
is representable with a single 16-bit code unit.
|
| 81 |
+
|
| 82 |
+
[*Note 2*: That is, provided the code point value is in the range
|
| 83 |
+
[0, FFFF] (hexadecimal). — *end note*]
|
| 84 |
+
|
| 85 |
+
If the value is not representable with a single 16-bit code unit, the
|
| 86 |
+
program is ill-formed. A UTF-16 character literal containing multiple
|
| 87 |
*c-char*s is ill-formed.
|
| 88 |
|
| 89 |
+
A *character-literal* that begins with the letter `U`, such as `U'y'`,
|
| 90 |
+
is a *character-literal* of type `char32_t`, known as a *UTF-32
|
| 91 |
+
character literal*. The value of a UTF-32 character literal containing a
|
| 92 |
+
single *c-char* is equal to its ISO/IEC 10646 code point value. A UTF-32
|
| 93 |
+
character literal containing multiple *c-char*s is ill-formed.
|
| 94 |
+
|
| 95 |
+
A *character-literal* that begins with the letter `L`, such as `L'z'`,
|
| 96 |
+
is a *wide-character literal*. A wide-character literal has type
|
| 97 |
+
`wchar_t`.[^12] The value of a wide-character literal containing a
|
| 98 |
single *c-char* has value equal to the numerical value of the encoding
|
| 99 |
of the *c-char* in the execution wide-character set, unless the *c-char*
|
| 100 |
has no representation in the execution wide-character set, in which case
|
| 101 |
the value is *implementation-defined*.
|
| 102 |
|
| 103 |
+
[*Note 3*: The type `wchar_t` is able to represent all members of the
|
| 104 |
execution wide-character set (see
|
| 105 |
[[basic.fundamental]]). — *end note*]
|
| 106 |
|
| 107 |
The value of a wide-character literal containing multiple *c-char*s is
|
| 108 |
*implementation-defined*.
|
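The prefix rules above can be sketched as a set of compile-time checks. This is an illustration under stated assumptions, not normative text: it assumes a C++17-or-later compiler, uses the `__cpp_char8_t` feature-test macro to decide which type a `u8` literal is expected to have, and introduces the hypothetical names `utf8_literal_type` and `code_point`.

```cpp
#include <type_traits>

// Hypothetical alias: the type a u8 character literal is expected to have.
// With char8_t support (C++20) that is char8_t; in C++17 it was char.
#if defined(__cpp_char8_t)
using utf8_literal_type = char8_t;
#else
using utf8_literal_type = char;
#endif

static_assert(std::is_same<decltype(u8'w'), utf8_literal_type>::value, "u8 prefix");
static_assert(std::is_same<decltype(u'x'), char16_t>::value, "u prefix");
static_assert(std::is_same<decltype(U'y'), char32_t>::value, "U prefix");
static_assert(std::is_same<decltype(L'z'), wchar_t>::value, "L prefix");

// Hypothetical helper: widens any character value for comparison.
template <class CharT>
constexpr unsigned long code_point(CharT c) {
    return static_cast<unsigned long>(c);
}

// For the UTF-8, UTF-16, and UTF-32 forms, the value is the ISO/IEC 10646
// code point of the character.
static_assert(code_point(u8'w') == 0x77, "code point of w");
static_assert(code_point(u'x') == 0x78, "code point of x");
static_assert(code_point(U'y') == 0x79, "code point of y");
```

No value check is made for `L'z'`, since the value of a wide-character literal depends on the execution wide-character set.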
| 109 |
|
| 110 |
Certain non-graphic characters, the single quote `'`, the double quote
|
| 111 |
+
`"`, the question mark `?`,[^13] and the backslash `\`, can be
|
| 112 |
+
represented according to [[lex.ccon.esc]]. The double quote `"` and the
|
| 113 |
+
question mark `?`, can be represented as themselves or by the escape
|
| 114 |
+
sequences `\"` and `\?` respectively, but the single quote `'` and the
|
| 115 |
+
backslash `\` shall be represented by the escape sequences `\'` and `\\`
|
| 116 |
+
respectively. Escape sequences in which the character following the
|
| 117 |
+
backslash is not listed in [[lex.ccon.esc]] are conditionally-supported,
|
| 118 |
+
with *implementation-defined* semantics. An escape sequence specifies a
|
| 119 |
+
single character.
|
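A minimal C++ sketch of the escaping rules in this paragraph follows. The constant names `single_quote` and `backslash` are hypothetical, chosen only for the example, and C++11 or later is assumed.

```cpp
// The double quote and the question mark may appear as themselves or be
// escaped; both spellings denote the same character.
static_assert('"' == '\"', "double quote, plain and escaped");
static_assert('?' == '\?', "question mark, plain and escaped");

// The single quote and the backslash have to be written as escape
// sequences inside a character literal.
constexpr char single_quote = '\'';
constexpr char backslash = '\\';

// Each escape sequence specifies a single character, so this string literal
// holds two characters plus the terminating null character.
static_assert(sizeof("\'\\") == 3, "two escape sequences, two characters");
```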
| 120 |
|
| 121 |
+
**Table: Escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
|
| 122 |
|
| 123 |
| | | |
|
| 124 |
| --------------- | -------------- | ------------------ |
|
| 125 |
| new-line | NL(LF) | `\n` |
|
| 126 |
| horizontal tab | HT | `\t` |
|
|
|
|
| 143 |
backslash followed by `x` followed by one or more hexadecimal digits
|
| 144 |
that are taken to specify the value of the desired character. There is
|
| 145 |
no limit to the number of digits in a hexadecimal sequence. A sequence
|
| 146 |
of octal or hexadecimal digits is terminated by the first character that
|
| 147 |
is not an octal digit or a hexadecimal digit, respectively. The value of
|
| 148 |
+
a *character-literal* is *implementation-defined* if it falls outside of
|
| 149 |
+
the *implementation-defined* range defined for `char` (for
|
| 150 |
+
*character-literal*s with no prefix) or `wchar_t` (for
|
| 151 |
+
*character-literal*s prefixed by `L`).
|
| 152 |
|
| 153 |
+
[*Note 4*: If the value of a *character-literal* prefixed by `u`, `u8`,
|
| 154 |
or `U` is outside the range defined for its type, the program is
|
| 155 |
ill-formed. — *end note*]
|
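The octal and hexadecimal escape rules can be illustrated with a few compile-time checks. This is a sketch only: the name `split_example` is hypothetical, and the string-literal cases are included because they show where a digit sequence ends, which a single character literal cannot show.

```cpp
// Octal and hexadecimal escapes specify the value directly, so these
// comparisons do not depend on the execution character set.
static_assert('\x41' == 65, "hexadecimal escape");
static_assert('\101' == 65, "octal escape");
static_assert('\x41' == '\101', "same value, two spellings");

// A hexadecimal sequence ends at the first character that is not a
// hexadecimal digit: "\x41g" is the value 0x41 followed by 'g' and a null.
static_assert(sizeof("\x41g") == 3, "g terminates the hexadecimal sequence");

// Adjacent string literals are one common way to end a hexadecimal sequence
// before a character that is itself a hexadecimal digit.
constexpr char split_example[] = "\x41" "b";
static_assert(sizeof(split_example) == 3, "0x41, b, and the null terminator");
```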
| 156 |
|
| 157 |
A *universal-character-name* is translated to the encoding, in the
|
| 158 |
appropriate execution character set, of the character named. If there is
|
| 159 |
no such encoding, the *universal-character-name* is translated to an
|
| 160 |
*implementation-defined* encoding.
|
| 161 |
|
| 162 |
+
[*Note 5*: In translation phase 1, a *universal-character-name* is
|
| 163 |
introduced whenever an actual extended character is encountered in the
|
| 164 |
source text. Therefore, all extended characters are described in terms
|
| 165 |
of *universal-character-name*s. However, the actual compiler
|
| 166 |
implementation may use its own native character set, so long as the same
|
| 167 |
results are obtained. — *end note*]
|
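For UTF-16 and UTF-32 character literals, the translation of a *universal-character-name* can be observed directly, because the literal's value is the code point named. The sketch below assumes C++11 or later; the constant names are hypothetical.

```cpp
// U+20AC (euro sign) fits in a single 16-bit code unit, so it is valid in
// both a UTF-16 and a UTF-32 character literal.
constexpr char16_t euro16 = u'\u20AC';
constexpr char32_t euro32 = U'\u20AC';
static_assert(euro16 == 0x20AC, "UTF-16 literal holds the code point");
static_assert(euro32 == 0x20AC, "UTF-32 literal holds the code point");

// A code point above U+FFFF needs the eight-digit \U form and is only
// representable in a UTF-32 character literal.
constexpr char32_t grinning_face = U'\U0001F600';
static_assert(grinning_face == 0x1F600, "code point beyond the 16-bit range");
```

For ordinary and wide-character literals the resulting value depends on the execution character set, so no comparable portable check is shown.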