tmp/tmpfgmirpl7/{from.md → to.md}
RENAMED
|
@@ -1,13 +1,15 @@
|
|
| 1 |
### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
|
| 2 |
|
| 3 |
``` bnf
|
| 4 |
character-literal:
|
| 5 |
-
''' c-char-sequence '''
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
```
|
| 10 |
|
| 11 |
``` bnf
|
| 12 |
c-char-sequence:
|
| 13 |
c-char
|
|
@@ -39,46 +41,64 @@ hexadecimal-escape-sequence:
|
|
| 39 |
'\x' hexadecimal-digit
|
| 40 |
hexadecimal-escape-sequence hexadecimal-digit
|
| 41 |
```
|
| 42 |
|
| 43 |
A character literal is one or more characters enclosed in single quotes,
|
| 44 |
-
as in `'x'`, optionally preceded by
|
| 45 |
-
|
| 46 |
-
does not begin with `u`, `U`, or `L` is an ordinary character literal,
|
| 47 |
-
also referred to as a narrow-character literal. An ordinary character
|
| 48 |
-
literal that contains a single *c-char* representable in the execution
|
| 49 |
-
character set has type `char`, with value equal to the numerical value
|
| 50 |
-
of the encoding of the *c-char* in the execution character set. An
|
| 51 |
-
ordinary character literal that contains more than one *c-char* is a
|
| 52 |
-
*multicharacter literal*. A multicharacter literal, or an ordinary
|
| 53 |
-
character literal containing a single *c-char* not representable in the
|
| 54 |
-
execution character set, is conditionally-supported, has type `int`, and
|
| 55 |
-
has an *implementation-defined* value.
|
| 56 |
|
| 57 |
-
A character literal that
|
| 58 |
a character literal of type `char16_t`. The value of a `char16_t`
|
| 59 |
-
literal containing a single *c-char* is equal to its ISO 10646 code
|
| 60 |
-
point value, provided that the code point is representable with a single
|
| 61 |
-
16-bit code unit. (That is, provided it is a basic multi-lingual plane
|
| 62 |
-
code point.) If the value is not representable within 16 bits, the
|
| 63 |
-
program is ill-formed. A `char16_t` literal containing multiple
|
| 64 |
-
*c-char*s is ill-formed. A character literal that begins with the letter
|
| 65 |
-
`U`, such as `U'z'`, is a character literal of type `char32_t`. The
|
| 66 |
-
value of a `char32_t` literal containing a single *c-char* is equal to
|
| 67 |
-
its ISO 10646 code point value. A `char32_t` literal containing multiple
|
| 68 |
-
*c-char*s is ill-formed. A character literal that begins with the letter
|
| 69 |
-
`L`, such as `L'x'`, is a wide-character literal. A wide-character
|
| 70 |
-
literal has type `wchar_t`.[^13] The value of a wide-character literal
|
| 71 |
-
containing a single *c-char* has value equal to the numerical value of
|
| 72 |
-
the encoding of the *c-char* in the execution wide-character set, unless
|
| 73 |
-
the *c-char* has no representation in the execution wide-character set,
|
| 74 |
-
in which case the value is *implementation-defined*. The type `wchar_t`
|
| 75 |
-
is able to represent all members of the execution wide-character set
|
| 76 |
-
(see [[basic.fundamental]]). The value of a wide-character literal
|
| 77 |
-
containing multiple *c-char*s is *implementation-defined*.
|
| 78 |
|
| 79 |
-
|
| 80 |
`"`, the question mark `?`,[^14] and the backslash `\`, can be
|
| 81 |
represented according to Table [[tab:escape.sequences]]. The double
|
| 82 |
quote `"` and the question mark `?`, can be represented as themselves or
|
| 83 |
by the escape sequences `\"` and `\?` respectively, but the single quote
|
| 84 |
`'` and the backslash `\` shall be represented by the escape sequences
|
|
@@ -113,20 +133,25 @@ backslash followed by `x` followed by one or more hexadecimal digits
|
|
| 113 |
that are taken to specify the value of the desired character. There is
|
| 114 |
no limit to the number of digits in a hexadecimal sequence. A sequence
|
| 115 |
of octal or hexadecimal digits is terminated by the first character that
|
| 116 |
is not an octal digit or a hexadecimal digit, respectively. The value of
|
| 117 |
a character literal is *implementation-defined* if it falls outside of
|
| 118 |
-
the implementation-defined range defined for `char` (for
|
| 119 |
-
no prefix)
|
| 120 |
-
|
| 121 |
-
`'L'`).
|
| 122 |
|
| 123 |
-
|
| 124 |
appropriate execution character set, of the character named. If there is
|
| 125 |
-
no such encoding, the universal-character-name is translated to an
|
| 126 |
-
*implementation-defined* encoding.
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
|
|
|
|
| 1 |
### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
|
| 2 |
|
| 3 |
``` bnf
|
| 4 |
character-literal:
|
| 5 |
+
encoding-prefixₒₚₜ ''' c-char-sequence '''
|
| 6 |
+
```
|
| 7 |
+
|
| 8 |
+
``` bnf
|
| 9 |
+
encoding-prefix: one of
|
| 10 |
+
'u8' 'u' 'U' 'L'
|
| 11 |
```
|
| 12 |
|
| 13 |
``` bnf
|
| 14 |
c-char-sequence:
|
| 15 |
c-char
|
|
|
|
| 41 |
'\x' hexadecimal-digit
|
| 42 |
hexadecimal-escape-sequence hexadecimal-digit
|
| 43 |
```
|
| 44 |
|
| 45 |
A character literal is one or more characters enclosed in single quotes,
|
| 46 |
+
as in `'x'`, optionally preceded by `u8`, `u`, `U`, or `L`, as in
|
| 47 |
+
`u8'w'`, `u'x'`, `U'y'`, or `L'z'`, respectively.
|
| 48 |
|
| 49 |
+
A character literal that does not begin with `u8`, `u`, `U`, or `L` is
|
| 50 |
+
an *ordinary character literal*. An ordinary character literal that
|
| 51 |
+
contains a single *c-char* representable in the execution character set
|
| 52 |
+
has type `char`, with value equal to the numerical value of the encoding
|
| 53 |
+
of the *c-char* in the execution character set. An ordinary character
|
| 54 |
+
literal that contains more than one *c-char* is a *multicharacter
|
| 55 |
+
literal*. A multicharacter literal, or an ordinary character literal
|
| 56 |
+
containing a single *c-char* not representable in the execution
|
| 57 |
+
character set, is conditionally-supported, has type `int`, and has an
|
| 58 |
+
*implementation-defined* value.
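
The rules for ordinary character literals can be checked at compile time. The sketch below is illustrative only: the helper names `ordinary_literal_size` and `ordinary_literal_value` are hypothetical, and multicharacter literals are left out because they are conditionally-supported and many compilers warn about them.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// An ordinary character literal holding a single representable c-char
// has type char.
static_assert(std::is_same<decltype('x'), char>::value,
              "'x' should have type char");

// Hypothetical helper: sizeof(char) is 1 by definition, so this returns 1.
constexpr std::size_t ordinary_literal_size() { return sizeof('x'); }

// Hypothetical helper: the literal's value is the encoding of 'x' in the
// execution character set, widened to int here for inspection.
constexpr int ordinary_literal_value() { return 'x'; }

// Runtime check that a char object initialised with the same character
// compares equal to the literal.
inline void check_ordinary_literal() {
    char c = 'x';
    assert(c == 'x');
    assert(ordinary_literal_value() == static_cast<int>(c));
}
```

The concrete number returned by `ordinary_literal_value` depends on the execution character set, so the sketch deliberately does not assert one.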
|
| 59 |
+
|
| 60 |
+
A character literal that begins with `u8`, such as `u8'w'`, is a
|
| 61 |
+
character literal of type `char`, known as a *UTF-8 character literal*.
|
| 62 |
+
The value of a UTF-8 character literal is equal to its ISO 10646 code
|
| 63 |
+
point value, provided that the code point value is representable with a
|
| 64 |
+
single UTF-8 code unit (that is, provided it is in the C0 Controls and
|
| 65 |
+
Basic Latin Unicode block). If the value is not representable with a
|
| 66 |
+
single UTF-8 code unit, the program is ill-formed. A UTF-8 character
|
| 67 |
+
literal containing multiple *c-char*s is ill-formed.
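
A short sketch of the UTF-8 character literal rules, assuming a compiler in C++17 mode or later. The alias `utf8_literal_type` and the helper `utf8_w_value` are hypothetical names. The `__cpp_char8_t` branch is an assumption that goes beyond the text quoted here: compilers that implement `char8_t` give `u8` literals that type instead of `char`.

```cpp
#include <cassert>
#include <type_traits>

// Assumption beyond the quoted text: when char8_t is available (signalled
// by the __cpp_char8_t feature-test macro), u8 literals use it.
#if defined(__cpp_char8_t)
using utf8_literal_type = char8_t;
#else
using utf8_literal_type = char;
#endif

static_assert(std::is_same<decltype(u8'w'), utf8_literal_type>::value,
              "unexpected type for a UTF-8 character literal");

// U+0077 fits in a single UTF-8 code unit, so the literal's value is the
// ISO 10646 code point value itself.
static_assert(u8'w' == 0x77, "u8'w' should equal U+0077");

// Hypothetical helper: the code point value as an unsigned integer.
constexpr unsigned utf8_w_value() { return static_cast<unsigned>(u8'w'); }

// Ill-formed under the rules above, so kept as comments only:
//   u8'\u00E9'  (U+00E9 needs two UTF-8 code units)
//   u8'ab'      (multiple c-chars)
```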
|
| 68 |
+
|
| 69 |
+
A character literal that begins with the letter `u`, such as `u'x'`, is
|
| 70 |
a character literal of type `char16_t`. The value of a `char16_t`
|
| 71 |
+
character literal containing a single *c-char* is equal to its ISO 10646
|
| 72 |
+
code point value, provided that the code point is representable with a
|
| 73 |
+
single 16-bit code unit. (That is, provided it is a basic multi-lingual
|
| 74 |
+
plane code point.) If the value is not representable within 16 bits, the
|
| 75 |
+
program is ill-formed. A `char16_t` character literal containing
|
| 76 |
+
multiple *c-char*s is ill-formed.
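
The `char16_t` rules can be sketched as follows. The predicate `fits_in_one_utf16_unit` is a hypothetical helper, and it is a simplification: it only tests the 16-bit limit and ignores the surrogate range.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same<decltype(u'x'), char16_t>::value,
              "u'x' should have type char16_t");

// The value is the ISO 10646 code point value: 'x' is U+0078.
static_assert(u'x' == 0x78, "u'x' should equal U+0078");

// U+20AC lies in the basic multi-lingual plane, so it fits in a single
// 16-bit code unit.
static_assert(u'\u20AC' == 0x20AC, "u'\\u20AC' should equal U+20AC");

// Hypothetical helper: true when a code point fits in 16 bits.
// Simplified: surrogate code points are not treated specially.
constexpr bool fits_in_one_utf16_unit(char32_t code_point) {
    return code_point <= 0xFFFF;
}

// Ill-formed under the rules above, so kept as a comment only:
//   u'\U0001F600'  (not representable within 16 bits)
```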
|
| 77 |
|
| 78 |
+
A character literal that begins with the letter `U`, such as `U'y'`, is
|
| 79 |
+
a character literal of type `char32_t`. The value of a `char32_t`
|
| 80 |
+
character literal containing a single *c-char* is equal to its ISO 10646
|
| 81 |
+
code point value. A `char32_t` character literal containing multiple
|
| 82 |
+
*c-char*s is ill-formed.
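
For `char32_t` there is no plane restriction, which the following sketch illustrates with a code point outside the basic multi-lingual plane. The helper `code_point_of` is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same<decltype(U'y'), char32_t>::value,
              "U'y' should have type char32_t");

// The value is the ISO 10646 code point value: 'y' is U+0079.
static_assert(U'y' == 0x79, "U'y' should equal U+0079");

// A supplementary-plane code point is still a single char32_t value.
static_assert(U'\U0001F600' == 0x1F600,
              "U'\\U0001F600' should equal U+1F600");

// Hypothetical helper: expose the code point as a plain integer.
constexpr unsigned long code_point_of(char32_t c) {
    return static_cast<unsigned long>(c);
}
```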
|
| 83 |
+
|
| 84 |
+
A character literal that begins with the letter `L`, such as `L'z'`, is
|
| 85 |
+
a *wide-character literal*. A wide-character literal has type
|
| 86 |
+
`wchar_t`.[^13] The value of a wide-character literal containing a
|
| 87 |
+
single *c-char* has value equal to the numerical value of the encoding
|
| 88 |
+
of the *c-char* in the execution wide-character set, unless the *c-char*
|
| 89 |
+
has no representation in the execution wide-character set, in which case
|
| 90 |
+
the value is *implementation-defined*.
|
| 91 |
+
|
| 92 |
+
[*Note 1*: The type `wchar_t` is able to represent all members of the
|
| 93 |
+
execution wide-character set (see
|
| 94 |
+
[[basic.fundamental]]). — *end note*]
|
| 95 |
+
|
| 96 |
+
The value of a wide-character literal containing multiple *c-char*s is
|
| 97 |
+
*implementation-defined*.
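
A minimal sketch for wide-character literals. Only the type is checked statically, because the value depends on the execution wide-character set. The helpers `wide_literal_bits` and `wide_value_round_trips` are hypothetical names, and the width of `wchar_t` differs between platforms.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

static_assert(std::is_same<decltype(L'z'), wchar_t>::value,
              "L'z' should have type wchar_t");

// Hypothetical helper: width of a wide-character literal in bits.
// The result is platform-dependent, so no specific value is assumed.
constexpr unsigned wide_literal_bits() {
    return static_cast<unsigned>(sizeof(L'z') * CHAR_BIT);
}

// Hypothetical helper: a wchar_t object initialised from the literal
// compares equal to the literal.
inline bool wide_value_round_trips() {
    wchar_t w = L'z';
    return w == L'z';
}
```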
|
| 98 |
+
|
| 99 |
+
Certain non-graphic characters, the single quote `'`, the double quote
|
| 100 |
`"`, the question mark `?`,[^14] and the backslash `\`, can be
|
| 101 |
represented according to Table [[tab:escape.sequences]]. The double
|
| 102 |
quote `"` and the question mark `?`, can be represented as themselves or
|
| 103 |
by the escape sequences `\"` and `\?` respectively, but the single quote
|
| 104 |
`'` and the backslash `\` shall be represented by the escape sequences
|
|
|
|
| 133 |
that are taken to specify the value of the desired character. There is
|
| 134 |
no limit to the number of digits in a hexadecimal sequence. A sequence
|
| 135 |
of octal or hexadecimal digits is terminated by the first character that
|
| 136 |
is not an octal digit or a hexadecimal digit, respectively. The value of
|
| 137 |
a character literal is *implementation-defined* if it falls outside of
|
| 138 |
+
the *implementation-defined* range defined for `char` (for character
|
| 139 |
+
literals with no prefix) or `wchar_t` (for character literals prefixed
|
| 140 |
+
by `L`).
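
The escape-sequence rules in this paragraph can be illustrated with a few compile-time checks. The sketch uses string literals to show where a digit sequence ends, which is an illustrative choice rather than something this section prescribes. The helper `hex_escape_value` is a hypothetical name.

```cpp
#include <cassert>

// Hexadecimal and octal escapes specify the character's value directly.
static_assert('\x41' == 65, "hex escape \\x41 should have value 65");
static_assert('\101' == 65, "octal escape \\101 should have value 65");

// 'G' is not a hexadecimal digit, so it ends the escape: the array holds
// '\x41', 'G' and the terminating null character.
static_assert(sizeof("\x41G") == 3, "\\x41 should end before 'G'");

// '8' is not an octal digit, so it ends the escape: the array holds
// '\101', '8' and the terminating null character.
static_assert(sizeof("\1018") == 3, "\\101 should end before '8'");

// Hypothetical helper: the escape's value widened to int.
constexpr int hex_escape_value() { return '\x41'; }

// Not shown: an escape such as '\x100' where char has 8 bits, whose value
// falls outside the range of char and is implementation-defined.
```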
|
|
|
|
| 141 |
|
| 142 |
+
[*Note 2*: If the value of a character literal prefixed by `u`, `u8`,
|
| 143 |
+
or `U` is outside the range defined for its type, the program is
|
| 144 |
+
ill-formed. — *end note*]
|
| 145 |
+
|
| 146 |
+
A *universal-character-name* is translated to the encoding, in the
|
| 147 |
appropriate execution character set, of the character named. If there is
|
| 148 |
+
no such encoding, the *universal-character-name* is translated to an
|
| 149 |
+
*implementation-defined* encoding.
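
To make the translation of a *universal-character-name* concrete, the sketch below counts the code units produced for the same name under different encoding prefixes. It uses string literals, which are outside this section, because a prefixed character literal is limited to a single code unit. The template `code_units` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal, not
// counting the terminating null character.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+00E9 is encoded as two code units in UTF-8.
static_assert(code_units(u8"\u00E9") == 2, "U+00E9 needs two UTF-8 units");

// U+1F600 is encoded as a surrogate pair in UTF-16.
static_assert(code_units(u"\U0001F600") == 2,
              "U+1F600 needs two UTF-16 units");

// U+1F600 is a single code unit in UTF-32.
static_assert(code_units(U"\U0001F600") == 1,
              "U+1F600 needs one UTF-32 unit");
```

Unprefixed and `L`-prefixed literals are not asserted here because their encodings depend on the execution character sets of the implementation.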
|
| 150 |
+
|
| 151 |
+
[*Note 3*: In translation phase 1, a *universal-character-name* is
|
| 152 |
+
introduced whenever an actual extended character is encountered in the
|
| 153 |
+
source text. Therefore, all extended characters are described in terms
|
| 154 |
+
of *universal-character-name*s. However, the actual compiler
|
| 155 |
+
implementation may use its own native character set, so long as the same
|
| 156 |
+
results are obtained. — *end note*]
|
| 157 |
|