- tmp/tmp0ohtbg44/{from.md → to.md} +108 -105
tmp/tmp0ohtbg44/{from.md → to.md}
RENAMED
|
@@ -12,15 +12,21 @@ s-char-sequence:
|
|
| 12 |
s-char-sequence s-char
|
| 13 |
```
|
| 14 |
|
| 15 |
``` bnf
|
| 16 |
s-char:
|
| 17 |
-
|
| 18 |
escape-sequence
|
| 19 |
universal-character-name
|
| 20 |
```
|
| 21 |
|
| 22 |
``` bnf
|
| 23 |
raw-string:
|
| 24 |
'"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
|
| 25 |
```
|
| 26 |
|
|
@@ -30,27 +36,43 @@ r-char-sequence:
|
|
| 30 |
r-char-sequence r-char
|
| 31 |
```
|
| 32 |
|
| 33 |
``` bnf
|
| 34 |
r-char:
|
| 35 |
-
any member of the
|
| 36 |
-
the initial *d-char-sequence* (which may be empty) followed by a
|
| 37 |
```
|
| 38 |
|
| 39 |
``` bnf
|
| 40 |
d-char-sequence:
|
| 41 |
d-char
|
| 42 |
d-char-sequence d-char
|
| 43 |
```
|
| 44 |
|
| 45 |
``` bnf
|
| 46 |
d-char:
|
| 47 |
-
any member of the basic
|
| 48 |
-
|
| 49 |
-
|
| 50 |
```
|
| 51 |
|
| 52 |
A *string-literal* that has an `R` in the prefix is a *raw string
|
| 53 |
literal*. The *d-char-sequence* serves as a delimiter. The terminating
|
| 54 |
*d-char-sequence* of a *raw-string* is the same sequence of characters
|
| 55 |
as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
|
| 56 |
at most 16 characters.
|
|
@@ -93,125 +115,106 @@ R"(x = "\"y\"")"
|
|
| 93 |
|
| 94 |
is equivalent to `"x = \"\\\"y\\\"\""`.
|
| 95 |
|
| 96 |
— *end example*]
|
| 97 |
|
| 98 |
-
After translation phase 6, a *string-literal* that does not begin with
|
| 99 |
-
an *encoding-prefix* is an *ordinary string literal*. An ordinary string
|
| 100 |
-
literal has type “array of *n* `const char`” where *n* is the size of
|
| 101 |
-
the string as defined below, has static storage duration [[basic.stc]],
|
| 102 |
-
and is initialized with the given characters.
|
| 103 |
-
|
| 104 |
-
A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
|
| 105 |
-
*UTF-8 string literal*. A UTF-8 string literal has type “array of *n*
|
| 106 |
-
`const char8_t`”, where *n* is the size of the string as defined below;
|
| 107 |
-
each successive element of the object representation [[basic.types]] has
|
| 108 |
-
the value of the corresponding code unit of the UTF-8 encoding of the
|
| 109 |
-
string.
|
| 110 |
-
|
| 111 |
Ordinary string literals and UTF-8 string literals are also referred to
|
| 112 |
as narrow string literals.
|
| 113 |
|
| 114 |
-
|
| 115 |
-
string
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
|
| 120 |
-
[*Note 3*: A
|
| 121 |
-
|
| 122 |
-
representation for a single code point as a sequence of two 16-bit code
|
| 123 |
-
units. — *end note*]
|
| 124 |
-
|
| 125 |
-
A *string-literal* that begins with `U`, such as `U"asdf"`, is a *UTF-32
|
| 126 |
-
string literal*. A UTF-32 string literal has type “array of *n*
|
| 127 |
-
`const char32_t`”, where *n* is the size of the string as defined below;
|
| 128 |
-
each successive element of the array has the value of the corresponding
|
| 129 |
-
code unit of the UTF-32 encoding of the string.
|
| 130 |
-
|
| 131 |
-
A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
|
| 132 |
-
string literal*. A wide string literal has type “array of *n* `const
|
| 133 |
-
wchar_t`”, where *n* is the size of the string as defined below; it is
|
| 134 |
-
initialized with the given characters.
|
| 135 |
|
| 136 |
In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
|
| 137 |
-
concatenated.
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
*
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
|
| 152 |
[[lex.string.concat]] has some examples of valid concatenations.
|
| 153 |
|
| 154 |
**Table: String literal concatenations** <a id="lex.string.concat">[lex.string.concat]</a>
|
| 155 |
|
| 156 |
| | | | | | |
|
| 157 |
| -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
|
| 158 |
| *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means |
|
| 159 |
| `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
|
| 160 |
| `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
|
| 161 |
| `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
|
| 162 |
|
| 163 |
|
| 164 |
-
Characters in concatenated strings are kept distinct.
|
| 165 |
-
|
| 166 |
-
[*Example 2*:
|
| 167 |
-
|
| 168 |
-
``` cpp
|
| 169 |
-
"\xA" "B"
|
| 170 |
-
```
|
| 171 |
-
|
| 172 |
-
contains the two characters `'\xA'` and `'B'` after concatenation (and
|
| 173 |
-
not the single hexadecimal character `'\xAB'`).
|
| 174 |
-
|
| 175 |
-
— *end example*]
|
| 176 |
-
|
| 177 |
-
After any necessary concatenation, in translation phase 7
|
| 178 |
-
[[lex.phases]], `'\0'` is appended to every *string-literal* so that
|
| 179 |
-
programs that scan a string can find its end.
|
| 180 |
-
|
| 181 |
-
Escape sequences and *universal-character-name*s in non-raw string
|
| 182 |
-
literals have the same meaning as in *character-literal*s [[lex.ccon]],
|
| 183 |
-
except that the single quote `'` is representable either by itself or by
|
| 184 |
-
the escape sequence `\'`, and the double quote `"` shall be preceded by
|
| 185 |
-
a `\`, and except that a *universal-character-name* in a UTF-16 string
|
| 186 |
-
literal may yield a surrogate pair. In a narrow string literal, a
|
| 187 |
-
*universal-character-name* may map to more than one `char` or `char8_t`
|
| 188 |
-
element due to *multibyte encoding*. The size of a `char32_t` or wide
|
| 189 |
-
string literal is the total number of escape sequences,
|
| 190 |
-
*universal-character-name*s, and other characters, plus one for the
|
| 191 |
-
terminating `U'\0'` or `L'\0'`. The size of a UTF-16 string literal is
|
| 192 |
-
the total number of escape sequences, *universal-character-name*s, and
|
| 193 |
-
other characters, plus one for each character requiring a surrogate
|
| 194 |
-
pair, plus one for the terminating `u'\0'`.
|
| 195 |
-
|
| 196 |
-
[*Note 5*: The size of a `char16_t` string literal is the number of
|
| 197 |
-
code units, not the number of characters. — *end note*]
|
| 198 |
-
|
| 199 |
-
[*Note 6*: Any *universal-character-name*s are required to correspond
|
| 200 |
-
to a code point in the range [0, D800) or [E000, 10FFFF] (hexadecimal)
|
| 201 |
-
[[lex.charset]]. — *end note*]
|
| 202 |
-
|
| 203 |
-
The size of a narrow string literal is the total number of escape
|
| 204 |
-
sequences and other characters, plus at least one for the multibyte
|
| 205 |
-
encoding of each *universal-character-name*, plus one for the
|
| 206 |
-
terminating `'\0'`.
|
| 207 |
-
|
| 208 |
Evaluating a *string-literal* results in a string literal object with
|
| 209 |
-
static storage duration
|
| 210 |
-
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
|
|
|
|
| 12 |
s-char-sequence s-char
|
| 13 |
```
|
| 14 |
|
| 15 |
``` bnf
|
| 16 |
s-char:
|
| 17 |
+
basic-s-char
|
| 18 |
escape-sequence
|
| 19 |
universal-character-name
|
| 20 |
```
|
| 21 |
|
| 22 |
+
``` bnf
|
| 23 |
+
basic-s-char:
|
| 24 |
+
any member of the translation character set except the U+0022 (quotation mark),
|
| 25 |
+
U+005c (reverse solidus), or new-line character
|
| 26 |
+
```
|
| 27 |
+
|
| 28 |
``` bnf
|
| 29 |
raw-string:
|
| 30 |
'"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
|
| 31 |
```
|
| 32 |
|
|
|
|
| 36 |
r-char-sequence r-char
|
| 37 |
```
|
| 38 |
|
| 39 |
``` bnf
|
| 40 |
r-char:
|
| 41 |
+
any member of the translation character set, except a U+0029 (right parenthesis) followed by
|
| 42 |
+
the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
|
| 43 |
```
|
| 44 |
|
| 45 |
``` bnf
|
| 46 |
d-char-sequence:
|
| 47 |
d-char
|
| 48 |
d-char-sequence d-char
|
| 49 |
```
|
| 50 |
|
| 51 |
``` bnf
|
| 52 |
d-char:
|
| 53 |
+
any member of the basic character set except:
|
| 54 |
+
U+0020 (space), U+0028 (left parenthesis), U+0029 (right parenthesis), U+005c (reverse solidus),
|
| 55 |
+
U+0009 (character tabulation), U+000b (line tabulation), U+000c (form feed), and new-line
|
| 56 |
```
|
| 57 |
|
| 58 |
+
The kind of a *string-literal*, its type, and its associated character
|
| 59 |
+
encoding [[lex.charset]] are determined by its encoding prefix and
|
| 60 |
+
sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
|
| 61 |
+
where n is the number of encoded code units as described below.
|
| 62 |
+
|
| 63 |
+
**Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
|
| 64 |
+
|
| 65 |
+
| | | | | |
|
| 66 |
+
| ---- | ----------------------- | ----------------------------- | ------------------------- | ---------------------------------------------- |
|
| 67 |
+
| none | ordinary string literal | array of $n$ `const char` | ordinary literal encoding | `"ordinary string"` `R"(ordinary raw string)"` |
|
| 68 |
+
| `L` | wide string literal | array of $n$ `const wchar_t` | wide literal encoding | `L"wide string"` `LR"w(wide raw string)w"` |
|
| 69 |
+
| `u8` | UTF-8 string literal | array of $n$ `const char8_t` | UTF-8 | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
|
| 70 |
+
| `u` | UTF-16 string literal | array of $n$ `const char16_t` | UTF-16 | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
|
| 71 |
+
| `U` | UTF-32 string literal | array of $n$ `const char32_t` | UTF-32 | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
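
The table rows above can be checked at compile time. The following is a minimal sketch rather than normative text: it assumes a C++11 or later compiler, and it guards the `u8` row with the `__cpp_char8_t` feature-test macro because `char8_t` only exists from C++20 on. Each two-character literal has n = 3 because of the terminating null.

```cpp
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the array type.
static_assert(std::is_same<decltype("ab"), const char (&)[3]>::value,
              "no prefix: array of n const char");
static_assert(std::is_same<decltype(L"ab"), const wchar_t (&)[3]>::value,
              "L prefix: array of n const wchar_t");
static_assert(std::is_same<decltype(u"ab"), const char16_t (&)[3]>::value,
              "u prefix: array of n const char16_t");
static_assert(std::is_same<decltype(U"ab"), const char32_t (&)[3]>::value,
              "U prefix: array of n const char32_t");
#ifdef __cpp_char8_t
static_assert(std::is_same<decltype(u8"ab"), const char8_t (&)[3]>::value,
              "u8 prefix: array of n const char8_t");
#endif

// Rawness does not change the type: a raw literal with the same prefix matches.
static_assert(std::is_same<decltype(UR"z(ab)z"), decltype(U"ab")>::value,
              "raw and non-raw literals share a type");
```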
|
| 72 |
+
|
| 73 |
+
|
| 74 |
A *string-literal* that has an `R` in the prefix is a *raw string
|
| 75 |
literal*. The *d-char-sequence* serves as a delimiter. The terminating
|
| 76 |
*d-char-sequence* of a *raw-string* is the same sequence of characters
|
| 77 |
as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
|
| 78 |
at most 16 characters.
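
As an illustration of the delimiter rule, the sketch below compares raw literals against their escaped spellings. The helper `same_text` and the constants `kRaw` and `kEscaped` are hypothetical names introduced only for this example; the delimiter `sep` is an arbitrary choice.

```cpp
// Hypothetical helper: compares two null-terminated strings at compile time.
constexpr bool same_text(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || same_text(a + 1, b + 1));
}

// With the delimiter "sep", only the sequence )sep" ends the literal,
// so the body may contain )" unescaped.
constexpr const char* kRaw = R"sep(a )" b)sep";
constexpr const char* kEscaped = "a )\" b";

static_assert(same_text(kRaw, kEscaped),
              "raw and escaped spellings denote the same characters");

// With an empty delimiter, backslashes and quotes are taken verbatim.
static_assert(same_text(R"(x = "\"y\"")", "x = \"\\\"y\\\"\""),
              "backslashes are not escape introducers in a raw string");
```

A non-empty delimiter is only needed when the body itself contains `)"`.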
|
|
|
|
| 115 |
|
| 116 |
is equivalent to `"x = \"\\\"y\\\"\""`.
|
| 117 |
|
| 118 |
— *end example*]
|
| 119 |
|
| 120 |
Ordinary string literals and UTF-8 string literals are also referred to
|
| 121 |
as narrow string literals.
|
| 122 |
|
| 123 |
+
The common *encoding-prefix* for a sequence of adjacent
|
| 124 |
+
*string-literal*s is determined pairwise as follows: If two
|
| 125 |
+
*string-literal*s have the same *encoding-prefix*, the common
|
| 126 |
+
*encoding-prefix* is that *encoding-prefix*. If one *string-literal* has
|
| 127 |
+
no *encoding-prefix*, the common *encoding-prefix* is that of the other
|
| 128 |
+
*string-literal*. Any other combinations are ill-formed.
|
| 129 |
|
| 130 |
+
[*Note 3*: A *string-literal*’s rawness has no effect on the
|
| 131 |
+
determination of the common *encoding-prefix*. — *end note*]
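
The pairwise rule can be observed through the type of the concatenated literal. This is a sketch assuming a C++11 or later compiler; the combinations are picked for illustration.

```cpp
#include <type_traits>

// Same prefix on both sides: the common prefix is that prefix.
static_assert(std::is_same<decltype(u"a" u"b"), const char16_t (&)[3]>::value,
              "u + u gives u");

// One side has no prefix: the common prefix comes from the other side.
static_assert(std::is_same<decltype("a" U"b"), const char32_t (&)[3]>::value,
              "none + U gives U");
static_assert(std::is_same<decltype(L"a" "b"), const wchar_t (&)[3]>::value,
              "L + none gives L");

// Rawness plays no part in the determination.
static_assert(std::is_same<decltype(R"(a)" L"b"), const wchar_t (&)[3]>::value,
              "raw none + L gives L");

// Two different prefixes, such as u"a" U"b", are ill-formed and are
// therefore left out of this sketch.
```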
|
| 132 |
|
| 133 |
In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
|
| 134 |
+
concatenated. The lexical structure and grouping of the contents of the
|
| 135 |
+
individual *string-literal*s is retained.
|
| 136 |
+
|
| 137 |
+
[*Example 2*:
|
| 138 |
+
|
| 139 |
+
``` cpp
|
| 140 |
+
"\xA" "B"
|
| 141 |
+
```
|
| 142 |
+
|
| 143 |
+
represents the code unit `'\xA'` and the character `'B'` after
|
| 144 |
+
concatenation (and not the single code unit `'\xAB'`). Similarly,
|
| 145 |
+
|
| 146 |
+
``` cpp
|
| 147 |
+
R"(\u00)" "41"
|
| 148 |
+
```
|
| 149 |
+
|
| 150 |
+
represents six characters, starting with a backslash and ending with the
|
| 151 |
+
digit `1` (and not the single character `'A'` specified by a
|
| 152 |
+
*universal-character-name*).
|
| 153 |
|
| 154 |
[[lex.string.concat]] has some examples of valid concatenations.
|
| 155 |
|
| 156 |
+
— *end example*]
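
The two concatenations in the example can be verified by counting code units. This is a sketch; the constant names are hypothetical, and each size includes the terminating null.

```cpp
#include <cstddef>

// "\xA" "B": the escape sequence ends at the literal boundary, giving the
// code unit 0x0A followed by 'B', plus the terminating null.
constexpr std::size_t kHexThenB = sizeof("\xA" "B");
static_assert(kHexThenB == 3, "two code units plus the null terminator");
static_assert(("\xA" "B")[0] == '\xA', "first element is the code unit 0x0A");
static_assert(("\xA" "B")[1] == 'B', "second element is the character B");

// R"(\u00)" "41": the raw part contributes the four characters \ u 0 0,
// and no universal-character-name is formed across the boundary.
constexpr std::size_t kRawThen41 = sizeof(R"(\u00)" "41");
static_assert(kRawThen41 == 7, "six characters plus the null terminator");
static_assert((R"(\u00)" "41")[0] == '\\', "starts with a backslash");
static_assert((R"(\u00)" "41")[5] == '1', "ends with the digit 1");
```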
|
| 157 |
+
|
| 158 |
**Table: String literal concatenations** <a id="lex.string.concat">[lex.string.concat]</a>
|
| 159 |
|
| 160 |
| | | | | | |
|
| 161 |
| -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
|
| 162 |
| *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means |
|
| 163 |
| `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
|
| 164 |
| `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
|
| 165 |
| `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
|
| 166 |
|
| 167 |
|
| 168 |
Evaluating a *string-literal* results in a string literal object with
|
| 169 |
+
static storage duration [[basic.stc]]. Whether all *string-literal*s are
|
| 170 |
+
distinct (that is, are stored in nonoverlapping objects) and whether
|
| 171 |
+
successive evaluations of a *string-literal* yield the same or a
|
| 172 |
+
different object is unspecified.
|
| 173 |
+
|
| 174 |
+
[*Note 4*: The effect of attempting to modify a string literal object
|
| 175 |
+
is undefined. — *end note*]
|
| 176 |
+
|
| 177 |
+
String literal objects are initialized with the sequence of code unit
|
| 178 |
+
values corresponding to the *string-literal*’s sequence of *s-char*s
|
| 179 |
+
(originally from non-raw string literals) and *r-char*s (originally from
|
| 180 |
+
raw string literals), plus a terminating U+0000 (null) character, in
|
| 181 |
+
order as follows:
|
| 182 |
+
|
| 183 |
+
- The sequence of characters denoted by each contiguous sequence of
|
| 184 |
+
*basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
|
| 185 |
+
and *universal-character-name*s [[lex.charset]] is encoded to a code
|
| 186 |
+
unit sequence using the *string-literal*’s associated character
|
| 187 |
+
encoding. If a character lacks representation in the associated
|
| 188 |
+
character encoding, then the *string-literal* is
|
| 189 |
+
conditionally-supported and an *implementation-defined* code unit
|
| 190 |
+
sequence is encoded. \[*Note 5*: No character lacks representation in
|
| 191 |
+
any Unicode encoding form. — *end note*] When encoding a stateful
|
| 192 |
+
character encoding, implementations should encode the first such
|
| 193 |
+
sequence beginning with the initial encoding state and encode
|
| 194 |
+
subsequent sequences beginning with the final encoding state of the
|
| 195 |
+
prior sequence. \[*Note 6*: The encoded code unit sequence can differ
|
| 196 |
+
from the sequence of code units that would be obtained by encoding
|
| 197 |
+
each character independently. — *end note*]
|
| 198 |
+
- Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
|
| 199 |
+
unit with a value as follows:
|
| 200 |
+
- Let v be the integer value represented by the octal number
|
| 201 |
+
comprising the sequence of *octal-digit*s in an
|
| 202 |
+
*octal-escape-sequence* or by the hexadecimal number comprising the
|
| 203 |
+
sequence of *hexadecimal-digit*s in a *hexadecimal-escape-sequence*.
|
| 204 |
+
- If v does not exceed the range of representable values of the
|
| 205 |
+
*string-literal*’s array element type, then the value is v.
|
| 206 |
+
- Otherwise, if the *string-literal*’s *encoding-prefix* is absent or
|
| 207 |
+
`L`, and v does not exceed the range of representable values of the
|
| 208 |
+
corresponding unsigned type for the underlying type of the
|
| 209 |
+
*string-literal*’s array element type, then the value is the unique
|
| 210 |
+
value of the *string-literal*’s array element type `T` that is
|
| 211 |
+
congruent to v modulo 2ᴺ, where N is the width of `T`.
|
| 212 |
+
- Otherwise, the *string-literal* is ill-formed.
|
| 213 |
+
|
| 214 |
+
When encoding a stateful character encoding, these sequences should
|
| 215 |
+
have no effect on encoding state.
|
| 216 |
+
- Each *conditional-escape-sequence* [[lex.ccon]] contributes an
|
| 217 |
+
*implementation-defined* code unit sequence. When encoding a stateful
|
| 218 |
+
character encoding, it is *implementation-defined* what effect these
|
| 219 |
+
sequences have on encoding state.
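
The initialization rules above can be sketched with a few compile-time checks. The helper `code_units` is a hypothetical name introduced for this example. The sketch sticks to Unicode-prefixed literals and numeric escapes so that it does not depend on the implementation's ordinary literal encoding, and it assumes 8-bit `char`.

```cpp
#include <cstddef>

// Hypothetical helper: number of array elements, terminating null included.
template <typename T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) { return N; }

// A universal-character-name is encoded with the associated encoding,
// so one character can occupy several code units.
static_assert(code_units(u8"\u00e9") == 3, "U+00E9 is two UTF-8 code units");
static_assert(static_cast<unsigned char>(u8"\u00e9"[0]) == 0xC3, "lead byte");
static_assert(static_cast<unsigned char>(u8"\u00e9"[1]) == 0xA9, "trail byte");
static_assert(code_units(u"\U0001F600") == 3, "UTF-16 uses a surrogate pair");
static_assert(code_units(U"\U0001F600") == 2, "UTF-32 uses one code unit");

// A numeric escape sequence always contributes exactly one code unit.
// For an unprefixed literal, 0xFF fits the unsigned counterpart of char,
// so the stored value is congruent to 255 modulo 2^8 whether or not
// plain char is signed.
static_assert(code_units("\xff") == 2, "one code unit plus the null");
static_assert(static_cast<unsigned char>("\xff"[0]) == 0xFF, "value mod 2^8");
```

A numeric escape whose value fits neither the element type nor, for unprefixed and `L` literals, its unsigned counterpart makes the literal ill-formed, so no such case appears here.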
|
| 220 |
|