From Jason Turner

[lex.string]

Files changed (1)
  1. tmp/tmptr277u8h/{from.md → to.md} +29 -34
tmp/tmptr277u8h/{from.md → to.md} RENAMED
@@ -6,12 +6,11 @@ string-literal:
6
  encoding-prefixₒₚₜ 'R' raw-string
7
  ```
8
 
9
  ``` bnf
10
  s-char-sequence:
11
- s-char
12
- s-char-sequence s-char
13
  ```
14
 
15
  ``` bnf
16
  s-char:
17
  basic-s-char
@@ -30,24 +29,22 @@ raw-string:
30
  '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
31
  ```
32
 
33
  ``` bnf
34
  r-char-sequence:
35
- r-char
36
- r-char-sequence r-char
37
  ```
38
 
39
  ``` bnf
40
  r-char:
41
  any member of the translation character set, except a U+0029 (right parenthesis) followed by
42
  the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
43
  ```
44
 
45
  ``` bnf
46
  d-char-sequence:
47
- d-char
48
- d-char-sequence d-char
49
  ```
50
 
51
  ``` bnf
52
  d-char:
53
  any member of the basic character set except:
@@ -56,16 +53,17 @@ d-char:
56
  ```
57
 
58
  The kind of a *string-literal*, its type, and its associated character
59
  encoding [[lex.charset]] are determined by its encoding prefix and
60
  sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
61
- where n is the number of encoded code units as described below.
 
62
 
63
  **Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
64
 
65
- | | | | | |
66
- | ---- | ----------------------- | ----------------------------- | ------------------------- | ---------------------------------------------- |
67
  | none | ordinary string literal | array of $n$ `const char` | ordinary literal encoding | `"ordinary string"` `R"(ordinary raw string)"` |
68
  | `L` | wide string literal | array of $n$ `const wchar_t` | wide literal encoding | `L"wide string"` `LR"w(wide raw string)w"` |
69
  | `u8` | UTF-8 string literal | array of $n$ `const char8_t` | UTF-8 | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
70
  | `u` | UTF-16 string literal | array of $n$ `const char16_t` | UTF-16 | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
71
  | `U` | UTF-32 string literal | array of $n$ `const char32_t` | UTF-32 | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
@@ -75,12 +73,12 @@ A *string-literal* that has an `R` in the prefix is a *raw string
75
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
76
  *d-char-sequence* of a *raw-string* is the same sequence of characters
77
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
78
  at most 16 characters.
79
 
80
- [*Note 1*: The characters `'('` and `')'` are permitted in a
81
- *raw-string*. Thus, `R"delimiter((a|b))delimiter"` is equivalent to
82
  `"(a|b)"`. — *end note*]
83
 
84
  [*Note 2*:
85
 
86
  A source-file new-line in a raw string literal results in a new-line in
@@ -116,18 +114,15 @@ R"(x = "\"y\"")"
116
  is equivalent to `"x = \"\\\"y\\\"\""`.
117
 
118
  — *end example*]
119
 
120
  Ordinary string literals and UTF-8 string literals are also referred to
121
- as narrow string literals.
122
 
123
- The common *encoding-prefix* for a sequence of adjacent
124
- *string-literal*s is determined pairwise as follows: If two
125
- *string-literal*s have the same *encoding-prefix*, the common
126
- *encoding-prefix* is that *encoding-prefix*. If one *string-literal* has
127
- no *encoding-prefix*, the common *encoding-prefix* is that of the other
128
- *string-literal*. Any other combinations are ill-formed.
129
 
130
  [*Note 3*: A *string-literal*’s rawness has no effect on the
131
  determination of the common *encoding-prefix*. — *end note*]
132
 
133
  In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
@@ -164,16 +159,17 @@ digit `1` (and not the single character `'A'` specified by a
164
  | `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
165
  | `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
166
 
167
 
168
  Evaluating a *string-literal* results in a string literal object with
169
- static storage duration [[basic.stc]]. Whether all *string-literal*s are
170
- distinct (that is, are stored in nonoverlapping objects) and whether
171
- successive evaluations of a *string-literal* yield the same or a
172
- different object is unspecified.
173
 
174
- [*Note 4*: The effect of attempting to modify a string literal object
 
 
 
 
175
  is undefined. — *end note*]
176
 
177
  String literal objects are initialized with the sequence of code unit
178
  values corresponding to the *string-literal*’s sequence of *s-char*s
179
  (originally from non-raw string literals) and *r-char*s (originally from
@@ -183,20 +179,19 @@ order as follows:
183
  - The sequence of characters denoted by each contiguous sequence of
184
  *basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
185
  and *universal-character-name*s [[lex.charset]] is encoded to a code
186
  unit sequence using the *string-literal*’s associated character
187
  encoding. If a character lacks representation in the associated
188
- character encoding, then the *string-literal* is
189
- conditionally-supported and an *implementation-defined* code unit
190
- sequence is encoded. \[*Note 5*: No character lacks representation in
191
- any Unicode encoding form. *end note*] When encoding a stateful
192
- character encoding, implementations should encode the first such
193
- sequence beginning with the initial encoding state and encode
194
- subsequent sequences beginning with the final encoding state of the
195
- prior sequence. \[*Note 6*: The encoded code unit sequence can differ
196
- from the sequence of code units that would be obtained by encoding
197
- each character independently. — *end note*]
198
  - Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
199
  unit with a value as follows:
200
  - Let v be the integer value represented by the octal number
201
  comprising the sequence of *octal-digit*s in an
202
  *octal-escape-sequence* or by the hexadecimal number comprising the
@@ -207,11 +202,11 @@ order as follows:
207
  `L`, and v does not exceed the range of representable values of the
208
  corresponding unsigned type for the underlying type of the
209
  *string-literal*’s array element type, then the value is the unique
210
  value of the *string-literal*’s array element type `T` that is
211
  congruent to v modulo 2ᴺ, where N is the width of `T`.
212
- - Otherwise, the *string-literal* is ill-formed.
213
 
214
  When encoding a stateful character encoding, these sequences should
215
  have no effect on encoding state.
216
  - Each *conditional-escape-sequence* [[lex.ccon]] contributes an
217
  *implementation-defined* code unit sequence. When encoding a stateful
 
6
  encoding-prefixₒₚₜ 'R' raw-string
7
  ```
8
 
9
  ``` bnf
10
  s-char-sequence:
11
+ s-char s-char-sequenceₒₚₜ
 
12
  ```
13
 
14
  ``` bnf
15
  s-char:
16
  basic-s-char
 
29
  '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
30
  ```
31
 
32
  ``` bnf
33
  r-char-sequence:
34
+ r-char r-char-sequenceₒₚₜ
 
35
  ```
36
 
37
  ``` bnf
38
  r-char:
39
  any member of the translation character set, except a U+0029 (right parenthesis) followed by
40
  the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
41
  ```
42
 
43
  ``` bnf
44
  d-char-sequence:
45
+ d-char d-char-sequenceₒₚₜ
 
46
  ```
47
 
48
  ``` bnf
49
  d-char:
50
  any member of the basic character set except:
 
53
  ```
54
 
55
  The kind of a *string-literal*, its type, and its associated character
56
  encoding [[lex.charset]] are determined by its encoding prefix and
57
  sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
58
+ where n is the number of encoded code units that would result from an
59
+ evaluation of the *string-literal* (see below).
60
 
61
  **Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
62
 
63
+ | Encoding prefix | Kind | Type | Associated character encoding | Examples |
64
+ | ----------------- | ----------------------- | ----------------------------- | ----------------------------- | ---------------------------------------------- |
65
  | none | ordinary string literal | array of $n$ `const char` | ordinary literal encoding | `"ordinary string"` `R"(ordinary raw string)"` |
66
  | `L` | wide string literal | array of $n$ `const wchar_t` | wide literal encoding | `L"wide string"` `LR"w(wide raw string)w"` |
67
  | `u8` | UTF-8 string literal | array of $n$ `const char8_t` | UTF-8 | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
68
  | `u` | UTF-16 string literal | array of $n$ `const char16_t` | UTF-16 | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
69
  | `U` | UTF-32 string literal | array of $n$ `const char32_t` | UTF-32 | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
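
As an illustration of the table (not part of the diffed text), the following C++ sketch checks the array types with `static_assert`. The helper name `code_units` is hypothetical, and the counts assume the terminating null code unit that is appended to every string literal, which this hunk does not show.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: yields n, the array bound of a string literal.
template <class T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) { return N; }

// A string literal is an lvalue, so decltype yields a reference to the array.
static_assert(std::is_same_v<decltype("abc"), const char (&)[4]>);
static_assert(std::is_same_v<decltype(L"abc"), const wchar_t (&)[4]>);
static_assert(std::is_same_v<decltype(u"abc"), const char16_t (&)[4]>);
static_assert(std::is_same_v<decltype(U"abc"), const char32_t (&)[4]>);
#ifdef __cpp_char8_t  // char8_t exists only from C++20 onward
static_assert(std::is_same_v<decltype(u8"abc"), const char8_t (&)[4]>);
#endif

// n counts code units, not characters: U+1F600 needs two UTF-16 code units
// but only one UTF-32 code unit (plus the terminating null in both cases).
static_assert(code_units(u"\U0001F600") == 3);
static_assert(code_units(U"\U0001F600") == 2);
static_assert(code_units(u8"\U0001F600") == 5);
```

The last three assertions show why the table speaks of n encoded code units: the same single character produces a different array bound under each encoding.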
 
73
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
74
  *d-char-sequence* of a *raw-string* is the same sequence of characters
75
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
76
  at most 16 characters.
77
 
78
+ [*Note 1*: The characters `'('` and `')'` can appear in a *raw-string*.
79
+ Thus, `R"delimiter((a|b))delimiter"` is equivalent to
80
  `"(a|b)"`. — *end note*]
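
The equivalence stated in Note 1 can be checked at compile time. This is a minimal sketch; the variable names `delimited` and `quoted` are hypothetical, and the second literal is the one used in the example later in this section.

```cpp
#include <string_view>

// Hypothetical names; both literals are taken from the surrounding text.
inline constexpr std::string_view delimited = R"delimiter((a|b))delimiter";
inline constexpr std::string_view quoted = R"(x = "\"y\"")";

// The delimiter and the outermost parentheses are not part of the contents.
static_assert(delimited == std::string_view("(a|b)"));

// Backslashes and quotation marks are taken literally inside a raw string.
static_assert(quoted == std::string_view("x = \"\\\"y\\\"\""));
```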
81
 
82
  [*Note 2*:
83
 
84
  A source-file new-line in a raw string literal results in a new-line in
 
114
  is equivalent to `"x = \"\\\"y\\\"\""`.
115
 
116
  — *end example*]
117
 
118
  Ordinary string literals and UTF-8 string literals are also referred to
119
+ as *narrow string literals*.
120
 
121
+ The *string-literal*s in any sequence of adjacent *string-literal*s
122
+ shall have at most one unique *encoding-prefix* among them. The common
123
+ *encoding-prefix* of the sequence is that *encoding-prefix*, if any.
 
 
 
124
 
125
  [*Note 3*: A *string-literal*’s rawness has no effect on the
126
  determination of the common *encoding-prefix*. — *end note*]
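
A short sketch of how the common *encoding-prefix* plays out, assuming a C++17 or later compiler. The name `joined` is hypothetical, and the array bound 3 assumes the terminating null code unit.

```cpp
#include <string_view>
#include <type_traits>

// One literal carries a prefix and the other has none: the prefix applies to
// the whole concatenated literal (2 characters plus the terminating null).
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" U"b"), const char32_t (&)[3]>);
static_assert(std::is_same_v<decltype(L"a" "b"), const wchar_t (&)[3]>);

// Rawness does not matter (Note 3): only the L prefix is considered.
static_assert(std::is_same_v<decltype(LR"(a)" "b"), const wchar_t (&)[3]>);

// Hypothetical variable holding a concatenated UTF-16 literal.
inline constexpr std::u16string_view joined = u"a" "b";

// Two different prefixes in one sequence violate the rule:
// auto bad = u"a" U"b";  // ill-formed under the wording above
```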
127
 
128
  In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
 
159
  | `u"a"` | `"b"` | `u"ab"` | `U"a"` | `"b"` | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
160
  | `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
161
 
162
 
163
  Evaluating a *string-literal* results in a string literal object with
164
+ static storage duration [[basic.stc]].
 
 
 
165
 
166
+ [*Note 4*: String literal objects are potentially non-unique
167
+ [[intro.object]]. Whether successive evaluations of a *string-literal*
168
+ yield the same or a different object is unspecified. — *end note*]
169
+
170
+ [*Note 5*: The effect of attempting to modify a string literal object
171
  is undefined. — *end note*]
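
The two notes above can be illustrated with a small sketch. The names `first`, `second`, and `first_char_after_edit` are hypothetical; the code only relies on guarantees stated in the text and deliberately avoids comparing addresses.

```cpp
#include <string_view>

inline constexpr std::string_view first = "hello";
inline constexpr std::string_view second = "hello";

// The contents are guaranteed to be equal...
static_assert(first == second);
// ...but whether first.data() == second.data() holds is unspecified, because
// the two literals may or may not share storage.

// Modifying a string literal object is undefined, so copy it into an array
// first; the array is an ordinary, writable object.
constexpr char first_char_after_edit() {
    char copy[] = "hello";  // initialized from the literal, 6 elements
    copy[0] = 'j';
    return copy[0];
}
static_assert(first_char_after_edit() == 'j');
```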
172
 
173
  String literal objects are initialized with the sequence of code unit
174
  values corresponding to the *string-literal*’s sequence of *s-char*s
175
  (originally from non-raw string literals) and *r-char*s (originally from
 
179
  - The sequence of characters denoted by each contiguous sequence of
180
  *basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
181
  and *universal-character-name*s [[lex.charset]] is encoded to a code
182
  unit sequence using the *string-literal*’s associated character
183
  encoding. If a character lacks representation in the associated
184
+ character encoding, then the program is ill-formed. \[*Note 6*: No
185
+ character lacks representation in any Unicode encoding
186
+ form. *end note*] When encoding a stateful character encoding,
187
+ implementations should encode the first such sequence beginning with
188
+ the initial encoding state and encode subsequent sequences beginning
189
+ with the final encoding state of the prior sequence. \[*Note 7*: The
190
+ encoded code unit sequence can differ from the sequence of code units
191
+ that would be obtained by encoding each character
192
+ independently. *end note*]
 
193
  - Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
194
  unit with a value as follows:
195
  - Let v be the integer value represented by the octal number
196
  comprising the sequence of *octal-digit*s in an
197
  *octal-escape-sequence* or by the hexadecimal number comprising the
 
202
  `L`, and v does not exceed the range of representable values of the
203
  corresponding unsigned type for the underlying type of the
204
  *string-literal*’s array element type, then the value is the unique
205
  value of the *string-literal*’s array element type `T` that is
206
  congruent to v modulo 2ᴺ, where N is the width of `T`.
207
+ - Otherwise, the program is ill-formed.
208
 
209
  When encoding a stateful character encoding, these sequences should
210
  have no effect on encoding state.
211
  - Each *conditional-escape-sequence* [[lex.ccon]] contributes an
212
  *implementation-defined* code unit sequence. When encoding a stateful