tmp/tmptr277u8h/{from.md → to.md}
RENAMED
|
@@ -6,12 +6,11 @@ string-literal:
|
|
| 6 |
encoding-prefixₒₚₜ 'R' raw-string
|
| 7 |
```
|
| 8 |
|
| 9 |
``` bnf
|
| 10 |
s-char-sequence:
|
| 11 |
-
s-char
|
| 12 |
-
s-char-sequence s-char
|
| 13 |
```
|
| 14 |
|
| 15 |
``` bnf
|
| 16 |
s-char:
|
| 17 |
basic-s-char
|
|
@@ -30,24 +29,22 @@ raw-string:
|
|
| 30 |
'"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
|
| 31 |
```
|
| 32 |
|
| 33 |
``` bnf
|
| 34 |
r-char-sequence:
|
| 35 |
-
r-char
|
| 36 |
-
r-char-sequence r-char
|
| 37 |
```
|
| 38 |
|
| 39 |
``` bnf
|
| 40 |
r-char:
|
| 41 |
any member of the translation character set, except a U+0029 (right parenthesis) followed by
|
| 42 |
the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
|
| 43 |
```
|
| 44 |
|
| 45 |
``` bnf
|
| 46 |
d-char-sequence:
|
| 47 |
-
d-char
|
| 48 |
-
d-char-sequence d-char
|
| 49 |
```
|
| 50 |
|
| 51 |
``` bnf
|
| 52 |
d-char:
|
| 53 |
any member of the basic character set except:
|
|
@@ -56,16 +53,17 @@ d-char:
|
|
| 56 |
```
|
| 57 |
|
| 58 |
The kind of a *string-literal*, its type, and its associated character
|
| 59 |
encoding [[lex.charset]] are determined by its encoding prefix and
|
| 60 |
sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
|
| 61 |
-
where n is the number of encoded code units
|
|
|
|
| 62 |
|
| 63 |
**Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
|
| 64 |
|
| 65 |
-
|
|
| 66 |
-
| ---- | ----------------------- | ----------------------------- | ------------------------- | ---------------------------------------------- |
|
| 67 |
| none | ordinary string literal | array of $n$ `const char` | ordinary literal encoding | `"ordinary string"` `R"(ordinary raw string)"` |
|
| 68 |
| `L` | wide string literal | array of $n$ `const wchar_t` | wide literal encoding | `L"wide string"` `LR"w(wide raw string)w"` |
|
| 69 |
| `u8` | UTF-8 string literal | array of $n$ `const char8_t` | UTF-8 | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
|
| 70 |
| `u` | UTF-16 string literal | array of $n$ `const char16_t` | UTF-16 | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
|
| 71 |
| `U` | UTF-32 string literal | array of $n$ `const char32_t` | UTF-32 | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
|
|
@@ -75,12 +73,12 @@ A *string-literal* that has an `R` in the prefix is a *raw string
|
|
| 75 |
literal*. The *d-char-sequence* serves as a delimiter. The terminating
|
| 76 |
*d-char-sequence* of a *raw-string* is the same sequence of characters
|
| 77 |
as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
|
| 78 |
at most 16 characters.
|
| 79 |
|
| 80 |
-
[*Note 1*: The characters `'('` and `')'`
|
| 81 |
-
|
| 82 |
`"(a|b)"`. — *end note*]
|
| 83 |
|
| 84 |
[*Note 2*:
|
| 85 |
|
| 86 |
A source-file new-line in a raw string literal results in a new-line in
|
|
@@ -116,18 +114,15 @@ R"(x = "\"y\"")"
|
|
| 116 |
is equivalent to `"x = \"\\\"y\\\"\""`.
|
| 117 |
|
| 118 |
— *end example*]
|
| 119 |
|
| 120 |
Ordinary string literals and UTF-8 string literals are also referred to
|
| 121 |
-
as narrow string literals.
|
| 122 |
|
| 123 |
-
The
|
| 124 |
-
|
| 125 |
-
*
|
| 126 |
-
*encoding-prefix* is that *encoding-prefix*. If one *string-literal* has
|
| 127 |
-
no *encoding-prefix*, the common *encoding-prefix* is that of the other
|
| 128 |
-
*string-literal*. Any other combinations are ill-formed.
|
| 129 |
|
| 130 |
[*Note 3*: A *string-literal*’s rawness has no effect on the
|
| 131 |
determination of the common *encoding-prefix*. — *end note*]
|
| 132 |
|
| 133 |
In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
|
|
@@ -164,16 +159,17 @@ digit `1` (and not the single character `'A'` specified by a
|
|
| 164 |
| `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
|
| 165 |
| `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
|
| 166 |
|
| 167 |
|
| 168 |
Evaluating a *string-literal* results in a string literal object with
|
| 169 |
-
static storage duration [[basic.stc]].
|
| 170 |
-
distinct (that is, are stored in nonoverlapping objects) and whether
|
| 171 |
-
successive evaluations of a *string-literal* yield the same or a
|
| 172 |
-
different object is unspecified.
|
| 173 |
|
| 174 |
-
[*Note 4*:
|
|
|
|
|
|
|
|
|
|
|
|
|
| 175 |
is undefined. — *end note*]
|
| 176 |
|
| 177 |
String literal objects are initialized with the sequence of code unit
|
| 178 |
values corresponding to the *string-literal*’s sequence of *s-char*s
|
| 179 |
(originally from non-raw string literals) and *r-char*s (originally from
|
|
@@ -183,20 +179,19 @@ order as follows:
|
|
| 183 |
- The sequence of characters denoted by each contiguous sequence of
|
| 184 |
*basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
|
| 185 |
and *universal-character-name*s [[lex.charset]] is encoded to a code
|
| 186 |
unit sequence using the *string-literal*’s associated character
|
| 187 |
encoding. If a character lacks representation in the associated
|
| 188 |
-
character encoding, then the
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
each character independently. — *end note*]
|
| 198 |
- Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
|
| 199 |
unit with a value as follows:
|
| 200 |
- Let v be the integer value represented by the octal number
|
| 201 |
comprising the sequence of *octal-digit*s in an
|
| 202 |
*octal-escape-sequence* or by the hexadecimal number comprising the
|
|
@@ -207,11 +202,11 @@ order as follows:
|
|
| 207 |
`L`, and v does not exceed the range of representable values of the
|
| 208 |
corresponding unsigned type for the underlying type of the
|
| 209 |
*string-literal*’s array element type, then the value is the unique
|
| 210 |
value of the *string-literal*’s array element type `T` that is
|
| 211 |
congruent to v modulo 2ᴺ, where N is the width of `T`.
|
| 212 |
-
- Otherwise, the
|
| 213 |
|
| 214 |
When encoding a stateful character encoding, these sequences should
|
| 215 |
have no effect on encoding state.
|
| 216 |
- Each *conditional-escape-sequence* [[lex.ccon]] contributes an
|
| 217 |
*implementation-defined* code unit sequence. When encoding a stateful
|
|
|
|
| 6 |
encoding-prefixₒₚₜ 'R' raw-string
|
| 7 |
```
|
| 8 |
|
| 9 |
``` bnf
|
| 10 |
s-char-sequence:
|
| 11 |
+
s-char s-char-sequenceₒₚₜ
|
|
|
|
| 12 |
```
|
| 13 |
|
| 14 |
``` bnf
|
| 15 |
s-char:
|
| 16 |
basic-s-char
|
|
|
|
| 29 |
'"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
|
| 30 |
```
|
| 31 |
|
| 32 |
``` bnf
|
| 33 |
r-char-sequence:
|
| 34 |
+
r-char r-char-sequenceₒₚₜ
|
|
|
|
| 35 |
```
|
| 36 |
|
| 37 |
``` bnf
|
| 38 |
r-char:
|
| 39 |
any member of the translation character set, except a U+0029 (right parenthesis) followed by
|
| 40 |
the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
|
| 41 |
```
|
| 42 |
|
| 43 |
``` bnf
|
| 44 |
d-char-sequence:
|
| 45 |
+
d-char d-char-sequenceₒₚₜ
|
|
|
|
| 46 |
```
|
| 47 |
|
| 48 |
``` bnf
|
| 49 |
d-char:
|
| 50 |
any member of the basic character set except:
|
|
|
|
| 53 |
```
|
| 54 |
|
| 55 |
The kind of a *string-literal*, its type, and its associated character
|
| 56 |
encoding [[lex.charset]] are determined by its encoding prefix and
|
| 57 |
sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
|
| 58 |
+
where n is the number of encoded code units that would result from an
|
| 59 |
+
evaluation of the *string-literal* (see below).
|
| 60 |
|
| 61 |
**Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
|
| 62 |
|
| 63 |
+
| Encoding prefix   | Kind                    | Type                          | Associated character encoding | Examples                                       |
|
| 64 |
+
| ----------------- | ----------------------- | ----------------------------- | ----------------------------- | ---------------------------------------------- |
|
| 65 |
| none | ordinary string literal | array of $n$ `const char` | ordinary literal encoding | `"ordinary string"` `R"(ordinary raw string)"` |
|
| 66 |
| `L` | wide string literal | array of $n$ `const wchar_t` | wide literal encoding | `L"wide string"` `LR"w(wide raw string)w"` |
|
| 67 |
| `u8` | UTF-8 string literal | array of $n$ `const char8_t` | UTF-8 | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
|
| 68 |
| `u` | UTF-16 string literal | array of $n$ `const char16_t` | UTF-16 | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
|
| 69 |
| `U` | UTF-32 string literal | array of $n$ `const char32_t` | UTF-32 | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
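
The rows of the table can be checked at compile time. The following is a minimal sketch, assuming a C++17 compiler; the `u8` row is left out because `char8_t` requires C++20. The helper `code_units` is a hypothetical name introduced only for this illustration. Note that the element count n includes the terminating null code unit.

```cpp
#include <cstddef>
#include <type_traits>

// A string-literal is an lvalue, so decltype yields a reference to its array type.
static_assert(std::is_same_v<decltype("abc"), const char (&)[4]>);
static_assert(std::is_same_v<decltype(L"abc"), const wchar_t (&)[4]>);
static_assert(std::is_same_v<decltype(u"abc"), const char16_t (&)[4]>);
static_assert(std::is_same_v<decltype(U"abc"), const char32_t (&)[4]>);

// Rawness does not change the type.
static_assert(std::is_same_v<decltype(R"(abc)"), const char (&)[4]>);

// Hypothetical helper (not part of the standard): the array bound n of a literal.
template <class T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) { return N; }
```

For example, `code_units("abc")` is 4: three code units for the characters plus the terminating null.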
|
|
|
|
| 73 |
literal*. The *d-char-sequence* serves as a delimiter. The terminating
|
| 74 |
*d-char-sequence* of a *raw-string* is the same sequence of characters
|
| 75 |
as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
|
| 76 |
at most 16 characters.
|
| 77 |
|
| 78 |
+
[*Note 1*: The characters `'('` and `')'` can appear in a *raw-string*.
|
| 79 |
+
Thus, `R"delimiter((a|b))delimiter"` is equivalent to
|
| 80 |
`"(a|b)"`. — *end note*]
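
The equivalence stated in Note 1 can be verified directly. This is a small sketch, assuming C++17 for `std::string_view`; the variable names are illustrative only.

```cpp
#include <string_view>

// The delimiter "delimiter" brackets the raw content, so the inner
// parentheses are ordinary characters of the string.
constexpr std::string_view with_delimiter = R"delimiter((a|b))delimiter";
constexpr std::string_view ordinary = "(a|b)";
static_assert(with_delimiter == ordinary);

// An empty delimiter also works here, because the first right parenthesis
// in the content is not immediately followed by a quotation mark.
constexpr std::string_view empty_delimiter = R"((a|b))";

// Inside a raw string a backslash is an ordinary character, not an escape.
static_assert(std::string_view(R"(\n)").size() == 2);
```

A non-empty delimiter becomes necessary only when the content itself contains a right parenthesis followed by a quotation mark.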
|
| 81 |
|
| 82 |
[*Note 2*:
|
| 83 |
|
| 84 |
A source-file new-line in a raw string literal results in a new-line in
|
|
|
|
| 114 |
is equivalent to `"x = \"\\\"y\\\"\""`.
|
| 115 |
|
| 116 |
— *end example*]
|
| 117 |
|
| 118 |
Ordinary string literals and UTF-8 string literals are also referred to
|
| 119 |
+
as *narrow string literals*.
|
| 120 |
|
| 121 |
+
The *string-literal*s in any sequence of adjacent *string-literal*s
|
| 122 |
+
shall have at most one unique *encoding-prefix* among them. The common
|
| 123 |
+
*encoding-prefix* of the sequence is that *encoding-prefix*, if any.
|
|
|
|
|
|
|
|
|
|
| 124 |
|
| 125 |
[*Note 3*: A *string-literal*’s rawness has no effect on the
|
| 126 |
determination of the common *encoding-prefix*. — *end note*]
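
The rule for the common *encoding-prefix* can be illustrated as follows. This is a sketch under the assumption of a C++17 compiler; the name `joined` is introduced only for the example.

```cpp
#include <string_view>
#include <type_traits>

// One prefixed literal next to an unprefixed one: the common prefix is `u`.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::u16string_view(u"a" "b") == u"ab");
static_assert(std::u16string_view("a" u"b") == u"ab");

// Rawness has no effect on the common prefix (Note 3).
static_assert(std::wstring_view(LR"(a)" "b") == L"ab");
static_assert(std::wstring_view(L"a" R"(b)") == L"ab");

// Repeating the same prefix is fine: only one unique prefix is present.
constexpr std::u32string_view joined = U"a" "b" U"c";

// Two different prefixes in one sequence are ill-formed, for example:
// auto bad = u"a" U"b";
```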
|
| 127 |
|
| 128 |
In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
|
|
|
|
| 159 |
| `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
|
| 160 |
| `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
|
| 161 |
|
| 162 |
|
| 163 |
Evaluating a *string-literal* results in a string literal object with
|
| 164 |
+
static storage duration [[basic.stc]].
|
|
|
|
|
|
|
|
|
|
| 165 |
|
| 166 |
+
[*Note 4*: String literal objects are potentially non-unique
|
| 167 |
+
[[intro.object]]. Whether successive evaluations of a *string-literal*
|
| 168 |
+
yield the same or a different object is unspecified. — *end note*]
|
| 169 |
+
|
| 170 |
+
[*Note 5*: The effect of attempting to modify a string literal object
|
| 171 |
is undefined. — *end note*]
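
Notes 4 and 5 have a practical consequence, sketched below assuming C++14 or later. Both function names are hypothetical and exist only for this illustration.

```cpp
// Initializing an array from a string-literal copies the code units,
// so the copy can be modified without touching the string literal object.
constexpr char modified_copy_front() {
    char buffer[] = "abc";   // a distinct, writable array of 4 char
    buffer[0] = 'X';
    return buffer[0];
}

// Whether two evaluations of the same string-literal yield the same
// object is unspecified, so this comparison may return true or false.
inline bool literals_share_storage() {
    const char* first = "abc";
    const char* second = "abc";
    return first == second;
}
```

Code that needs a mutable string should therefore work on a copy such as `buffer`, and should not rely on the result of `literals_share_storage()`.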
|
| 172 |
|
| 173 |
String literal objects are initialized with the sequence of code unit
|
| 174 |
values corresponding to the *string-literal*’s sequence of *s-char*s
|
| 175 |
(originally from non-raw string literals) and *r-char*s (originally from
|
|
|
|
| 179 |
- The sequence of characters denoted by each contiguous sequence of
|
| 180 |
*basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
|
| 181 |
and *universal-character-name*s [[lex.charset]] is encoded to a code
|
| 182 |
unit sequence using the *string-literal*’s associated character
|
| 183 |
encoding. If a character lacks representation in the associated
|
| 184 |
+
character encoding, then the program is ill-formed. \[*Note 6*: No
|
| 185 |
+
character lacks representation in any Unicode encoding
|
| 186 |
+
form. — *end note*] When encoding a stateful character encoding,
|
| 187 |
+
implementations should encode the first such sequence beginning with
|
| 188 |
+
the initial encoding state and encode subsequent sequences beginning
|
| 189 |
+
with the final encoding state of the prior sequence. \[*Note 7*: The
|
| 190 |
+
encoded code unit sequence can differ from the sequence of code units
|
| 191 |
+
that would be obtained by encoding each character
|
| 192 |
+
independently. — *end note*]
|
|
|
|
| 193 |
- Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
|
| 194 |
unit with a value as follows:
|
| 195 |
- Let v be the integer value represented by the octal number
|
| 196 |
comprising the sequence of *octal-digit*s in an
|
| 197 |
*octal-escape-sequence* or by the hexadecimal number comprising the
|
|
|
|
| 202 |
`L`, and v does not exceed the range of representable values of the
|
| 203 |
corresponding unsigned type for the underlying type of the
|
| 204 |
*string-literal*’s array element type, then the value is the unique
|
| 205 |
value of the *string-literal*’s array element type `T` that is
|
| 206 |
congruent to v modulo 2ᴺ, where N is the width of `T`.
|
| 207 |
+
- Otherwise, the program is ill-formed.
|
| 208 |
|
| 209 |
When encoding a stateful character encoding, these sequences should
|
| 210 |
have no effect on encoding state.
|
| 211 |
- Each *conditional-escape-sequence* [[lex.ccon]] contributes an
|
| 212 |
*implementation-defined* code unit sequence. When encoding a stateful
|