From Jason Turner

[lex.string]

Files changed (1)
  1. tmp/tmpnwvrki3f/{from.md → to.md} +78 -65
tmp/tmpnwvrki3f/{from.md → to.md} RENAMED
@@ -10,10 +10,17 @@ string-literal:
10
  s-char-sequence:
11
  s-char
12
  s-char-sequence s-char
13
  ```
14
 
 
 
 
 
 
 
 
15
  ``` bnf
16
  raw-string:
17
  '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
18
  ```
19
 
@@ -21,21 +28,28 @@ raw-string:
21
  r-char-sequence:
22
  r-char
23
  r-char-sequence r-char
24
  ```
25
 
 
 
 
 
 
 
26
  ``` bnf
27
  d-char-sequence:
28
  d-char
29
  d-char-sequence d-char
30
  ```
31
 
32
- A *string-literal* is a sequence of characters (as defined in 
33
- [[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
34
- `u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
35
- `R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
36
- `U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
 
37
 
38
  A *string-literal* that has an `R` in the prefix is a *raw string
39
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
40
  *d-char-sequence* of a *raw-string* is the same sequence of characters
41
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
@@ -72,78 +86,74 @@ a"
72
  ```
73
 
74
  is equivalent to `"\n)\\\na\"\n"`. The raw string
75
 
76
  ``` cpp
77
- R"(??)"
78
  ```
79
 
80
- is equivalent to `"\?\?"`. The raw string
81
-
82
- ``` cpp
83
- R"#(
84
- )??="
85
- )#"
86
- ```
87
-
88
- is equivalent to `"\n)\?\?=\"\n"`.
89
 
90
  — *end example*]
91
 
92
  After translation phase 6, a *string-literal* that does not begin with
93
- an *encoding-prefix* is an *ordinary string literal*, and is initialized
94
- with the given characters.
 
 
95
 
96
  A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
97
- *UTF-8 string literal*.
 
 
 
 
98
 
99
  Ordinary string literals and UTF-8 string literals are also referred to
100
- as narrow string literals. A narrow string literal has type “array of
101
- *n* `const char`”, where *n* is the size of the string as defined below,
102
- and has static storage duration ([[basic.stc]]).
103
 
104
- For a UTF-8 string literal, each successive element of the object
105
- representation ([[basic.types]]) has the value of the corresponding
106
- code unit of the UTF-8 encoding of the string.
 
 
107
 
108
- A *string-literal* that begins with `u`, such as `u"asdf"`, is a
109
- `char16_t` string literal. A `char16_t` string literal has type “array
110
- of *n* `const char16_t`”, where *n* is the size of the string as defined
111
- below; it is initialized with the given characters. A single *c-char*
112
- may produce more than one `char16_t` character in the form of surrogate
113
- pairs.
114
 
115
- A *string-literal* that begins with `U`, such as `U"asdf"`, is a
116
- `char32_t` string literal. A `char32_t` string literal has type “array
117
- of *n* `const char32_t`”, where *n* is the size of the string as defined
118
- below; it is initialized with the given characters.
 
119
 
120
  A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
121
  string literal*. A wide string literal has type “array of *n* `const
122
  wchar_t`”, where *n* is the size of the string as defined below; it is
123
  initialized with the given characters.
124
 
125
- In translation phase 6 ([[lex.phases]]), adjacent *string-literal*s are
126
  concatenated. If both *string-literal*s have the same *encoding-prefix*,
127
- the resulting concatenated string literal has that *encoding-prefix*. If
128
- one *string-literal* has no *encoding-prefix*, it is treated as a
129
  *string-literal* of the same *encoding-prefix* as the other operand. If
130
  a UTF-8 string literal token is adjacent to a wide string literal token,
131
  the program is ill-formed. Any other concatenations are
132
  conditionally-supported with *implementation-defined* behavior.
133
 
134
- [*Note 3*: This concatenation is an interpretation, not a conversion.
135
  Because the interpretation happens in translation phase 6 (after each
136
- character from a string literal has been translated into a value from
137
  the appropriate character set), a *string-literal*’s initial rawness has
138
  no effect on the interpretation or well-formedness of the
139
  concatenation. — *end note*]
140
 
141
- Table  [[tab:lex.string.concat]] has some examples of valid
142
- concatenations.
143
 
144
- **Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
145
 
146
  | | | | | | |
147
  | -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
148
  | *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means |
149
  | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
@@ -162,43 +172,46 @@ Characters in concatenated strings are kept distinct.
162
  contains the two characters `'\xA'` and `'B'` after concatenation (and
163
  not the single hexadecimal character `'\xAB'`).
164
 
165
  — *end example*]
166
 
167
- After any necessary concatenation, in translation phase 7 (
168
- [[lex.phases]]), `'\0'` is appended to every string literal so that
169
  programs that scan a string can find its end.
170
 
171
  Escape sequences and *universal-character-name*s in non-raw string
172
- literals have the same meaning as in character literals ([[lex.ccon]]),
173
  except that the single quote `'` is representable either by itself or by
174
  the escape sequence `\'`, and the double quote `"` shall be preceded by
175
- a `\`, and except that a *universal-character-name* in a `char16_t`
176
- string literal may yield a surrogate pair. In a narrow string literal, a
177
- *universal-character-name* may map to more than one `char` element due
178
- to *multibyte encoding*. The size of a `char32_t` or wide string literal
179
- is the total number of escape sequences, *universal-character-name*s,
180
- and other characters, plus one for the terminating `U'\0'` or `L'\0'`.
181
- The size of a `char16_t` string literal is the total number of escape
182
- sequences, *universal-character-name*s, and other characters, plus one
183
- for each character requiring a surrogate pair, plus one for the
184
- terminating `u'\0'`.
185
 
186
- [*Note 4*: The size of a `char16_t` string literal is the number of
187
  code units, not the number of characters. — *end note*]
188
 
189
- Within `char32_t` and `char16_t` string literals, any
190
- *universal-character-name*s shall be within the range `0x0` to
191
- `0x10FFFF`. The size of a narrow string literal is the total number of
192
- escape sequences and other characters, plus at least one for the
193
- multibyte encoding of each *universal-character-name*, plus one for the
 
 
194
  terminating `'\0'`.
195
 
196
  Evaluating a *string-literal* results in a string literal object with
197
  static storage duration, initialized from the given characters as
198
- specified above. Whether all string literals are distinct (that is, are
199
- stored in nonoverlapping objects) and whether successive evaluations of
200
- a *string-literal* yield the same or a different object is unspecified.
 
201
 
202
- [*Note 5*: The effect of attempting to modify a string literal is
203
  undefined. — *end note*]
204
 
 
10
  s-char-sequence:
11
  s-char
12
  s-char-sequence s-char
13
  ```
14
 
15
+ ``` bnf
16
+ s-char:
17
+ any member of the basic source character set except the double-quote '"', backslash '\', or new-line character
18
+ escape-sequence
19
+ universal-character-name
20
+ ```
21
+
22
  ``` bnf
23
  raw-string:
24
  '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
25
  ```
26
 
 
28
  r-char-sequence:
29
  r-char
30
  r-char-sequence r-char
31
  ```
32
 
33
+ ``` bnf
34
+ r-char:
35
+ any member of the source character set, except a right parenthesis ')' followed by
36
+ the initial *d-char-sequence* (which may be empty) followed by a double quote '"'.
37
+ ```
38
+
39
  ``` bnf
40
  d-char-sequence:
41
  d-char
42
  d-char-sequence d-char
43
  ```
44
 
45
+ ``` bnf
46
+ d-char:
47
+ any member of the basic source character set except:
48
+ space, the left parenthesis '(', the right parenthesis ')', the backslash '\', and the control characters
49
+ representing horizontal tab, vertical tab, form feed, and newline.
50
+ ```
51
 
52
  A *string-literal* that has an `R` in the prefix is a *raw string
53
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
54
  *d-char-sequence* of a *raw-string* is the same sequence of characters
55
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 
86
  ```
87
 
88
  is equivalent to `"\n)\\\na\"\n"`. The raw string
89
 
90
  ``` cpp
91
+ R"(x = "\"y\"")"
92
  ```
93
 
94
+ is equivalent to `"x = \"\\\"y\\\"\""`.
 
 
 
 
 
 
 
 
95
 
96
  — *end example*]
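
The equivalence stated in the added example can be checked at compile time. This is a minimal sketch, assuming a C++17 compiler; the helper `same_text` and the variable names are hypothetical and do not come from the standard text.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper (not part of the standard text): compares two
// literals by their character content.
constexpr bool same_text(std::string_view a, std::string_view b) {
    return a == b;
}

// In the raw string, backslashes and double quotes are kept verbatim.
constexpr std::string_view raw = R"(x = "\"y\"")";

// In the non-raw spelling, each backslash and double quote is escaped.
constexpr std::string_view escaped = "x = \"\\\"y\\\"\"";

static_assert(same_text(raw, escaped),
              "both spellings denote the same characters");
```

Both spellings denote the same 11 characters; the raw form only avoids the escaping.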
97
 
98
  After translation phase 6, a *string-literal* that does not begin with
99
+ an *encoding-prefix* is an *ordinary string literal*. An ordinary string
100
+ literal has type “array of *n* `const char`” where *n* is the size of
101
+ the string as defined below, has static storage duration [[basic.stc]],
102
+ and is initialized with the given characters.
103
 
104
  A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
105
+ *UTF-8 string literal*. A UTF-8 string literal has type “array of *n*
106
+ `const char8_t`”, where *n* is the size of the string as defined below;
107
+ each successive element of the object representation [[basic.types]] has
108
+ the value of the corresponding code unit of the UTF-8 encoding of the
109
+ string.
110
 
111
  Ordinary string literals and UTF-8 string literals are also referred to
112
+ as narrow string literals.
 
 
113
 
114
+ A *string-literal* that begins with `u`, such as `u"asdf"`, is a *UTF-16
115
+ string literal*. A UTF-16 string literal has type “array of *n*
116
+ `const char16_t`”, where *n* is the size of the string as defined below;
117
+ each successive element of the array has the value of the corresponding
118
+ code unit of the UTF-16 encoding of the string.
119
 
120
+ [*Note 3*: A single *c-char* may produce more than one `char16_t`
121
+ character in the form of surrogate pairs. A surrogate pair is a
122
+ representation for a single code point as a sequence of two 16-bit code
123
+ units. *end note*]
 
 
124
 
125
+ A *string-literal* that begins with `U`, such as `U"asdf"`, is a *UTF-32
126
+ string literal*. A UTF-32 string literal has type “array of *n*
127
+ `const char32_t`”, where *n* is the size of the string as defined below;
128
+ each successive element of the array has the value of the corresponding
129
+ code unit of the UTF-32 encoding of the string.
130
 
131
  A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
132
  string literal*. A wide string literal has type “array of *n* `const
133
  wchar_t`”, where *n* is the size of the string as defined below; it is
134
  initialized with the given characters.
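
The element types described in the paragraphs above can be observed with `decltype`. The following is a sketch, assuming C++17 for `std::is_same_v`; the `u8` case is guarded by the `__cpp_char8_t` feature-test macro because `char8_t` only exists where that feature is enabled. The helper `element_count` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to its
// array type; "asdf" has four characters plus the terminating null.
static_assert(std::is_same_v<decltype("asdf"), const char (&)[5]>);
static_assert(std::is_same_v<decltype(u"asdf"), const char16_t (&)[5]>);
static_assert(std::is_same_v<decltype(U"asdf"), const char32_t (&)[5]>);
static_assert(std::is_same_v<decltype(L"asdf"), const wchar_t (&)[5]>);

#if defined(__cpp_char8_t)
// With char8_t enabled, a UTF-8 string literal is an array of const char8_t.
static_assert(std::is_same_v<decltype(u8"asdf"), const char8_t (&)[5]>);
#else
// Without char8_t, a UTF-8 string literal is an array of const char.
static_assert(std::is_same_v<decltype(u8"asdf"), const char (&)[5]>);
#endif

// Hypothetical helper: returns n for an "array of n const T".
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) {
    return N;
}
```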
135
 
136
+ In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
137
  concatenated. If both *string-literal*s have the same *encoding-prefix*,
138
+ the resulting concatenated *string-literal* has that *encoding-prefix*.
139
+ If one *string-literal* has no *encoding-prefix*, it is treated as a
140
  *string-literal* of the same *encoding-prefix* as the other operand. If
141
  a UTF-8 string literal token is adjacent to a wide string literal token,
142
  the program is ill-formed. Any other concatenations are
143
  conditionally-supported with *implementation-defined* behavior.
144
 
145
+ [*Note 4*: This concatenation is an interpretation, not a conversion.
146
  Because the interpretation happens in translation phase 6 (after each
147
+ character from a *string-literal* has been translated into a value from
148
  the appropriate character set), a *string-literal*’s initial rawness has
149
  no effect on the interpretation or well-formedness of the
150
  concatenation. — *end note*]
151
 
152
+ [[lex.string.concat]] has some examples of valid concatenations.
 
153
 
154
+ **Table: String literal concatenations** <a id="lex.string.concat">[lex.string.concat]</a>
155
 
156
  | | | | | | |
157
  | -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
158
  | *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means |
159
  | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
 
172
  contains the two characters `'\xA'` and `'B'` after concatenation (and
173
  not the single hexadecimal character `'\xAB'`).
174
 
175
  — *end example*]
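
The concatenation rules and the `'\xA'`/`'B'` example can be illustrated with a few compile-time checks. This is a sketch under the assumption of a C++17 compiler; `joined`, `single`, and `joined_length` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Escape sequences are interpreted per literal before concatenation,
// so this array holds '\xA', 'B', and the terminating '\0'.
constexpr char joined[] = "\xA" "B";
static_assert(sizeof(joined) == 3);
static_assert(joined[0] == '\xA' && joined[1] == 'B' && joined[2] == '\0');

// Written as a single literal, the hexadecimal escape takes both digits.
constexpr char single[] = "\xAB";
static_assert(sizeof(single) == 2);

// Same encoding-prefix on both sides: the result keeps that prefix.
static_assert(std::is_same_v<decltype(u"a" u"b"), const char16_t (&)[3]>);

// One side without a prefix: it is treated as having the other's prefix.
static_assert(std::is_same_v<decltype(L"a" "b"), const wchar_t (&)[3]>);

// Hypothetical helper: length of the concatenated literal without the null.
inline std::size_t joined_length() {
    return sizeof(joined) - 1;
}
```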
176
 
177
+ After any necessary concatenation, in translation phase 7
178
+ [[lex.phases]], `'\0'` is appended to every *string-literal* so that
179
  programs that scan a string can find its end.
180
 
181
  Escape sequences and *universal-character-name*s in non-raw string
182
+ literals have the same meaning as in *character-literal*s [[lex.ccon]],
183
  except that the single quote `'` is representable either by itself or by
184
  the escape sequence `\'`, and the double quote `"` shall be preceded by
185
+ a `\`, and except that a *universal-character-name* in a UTF-16 string
186
+ literal may yield a surrogate pair. In a narrow string literal, a
187
+ *universal-character-name* may map to more than one `char` or `char8_t`
188
+ element due to *multibyte encoding*. The size of a `char32_t` or wide
189
+ string literal is the total number of escape sequences,
190
+ *universal-character-name*s, and other characters, plus one for the
191
+ terminating `U'\0'` or `L'\0'`. The size of a UTF-16 string literal is
192
+ the total number of escape sequences, *universal-character-name*s, and
193
+ other characters, plus one for each character requiring a surrogate
194
+ pair, plus one for the terminating `u'\0'`.
195
 
196
+ [*Note 5*: The size of a `char16_t` string literal is the number of
197
  code units, not the number of characters. — *end note*]
198
 
199
+ [*Note 6*: Any *universal-character-name*s are required to correspond
200
+ to a code point in the range [0, D800) or [E000, 10FFFF] (hexadecimal)
201
+ [[lex.charset]]. *end note*]
202
+
203
+ The size of a narrow string literal is the total number of escape
204
+ sequences and other characters, plus at least one for the multibyte
205
+ encoding of each *universal-character-name*, plus one for the
206
  terminating `'\0'`.
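
The size rules above count code units, which a short sketch can make concrete. It assumes a C++17 compiler; `code_units` is a hypothetical helper, and U+1F600 and U+00E9 are arbitrary code points picked for illustration (the first lies outside the Basic Multilingual Plane, the second needs two UTF-8 code units).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of elements excluding the terminating null.
template <typename T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) {
    return N - 1;
}

// UTF-32: one code unit per universal-character-name.
static_assert(code_units(U"\U0001F600") == 1);

// UTF-16: a code point above U+FFFF requires a surrogate pair.
static_assert(code_units(u"\U0001F600") == 2);

// UTF-8: U+00E9 is encoded as two code units.
static_assert(code_units(u8"\u00E9") == 2);

// Escape sequences and ordinary characters count as one element each.
static_assert(code_units(u"a\n\u00E9") == 3);
```

Ordinary (unprefixed) literals are left out of the sketch because the number of `char` elements produced by a *universal-character-name* depends on the implementation's encoding.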
207
 
208
  Evaluating a *string-literal* results in a string literal object with
209
  static storage duration, initialized from the given characters as
210
+ specified above. Whether all *string-literal*s are distinct (that is,
211
+ are stored in nonoverlapping objects) and whether successive evaluations
212
+ of a *string-literal* yield the same or a different object is
213
+ unspecified.
214
 
215
+ [*Note 7*: The effect of attempting to modify a *string-literal* is
216
  undefined. — *end note*]
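
A consequence of the unspecified storage rule is that code should compare string literals by content, not by address. The sketch below uses hypothetical function names to contrast the two; the result of `same_object` may differ between implementations and optimization levels.

```cpp
#include <cassert>
#include <cstring>

// Whether two identical literals occupy the same object is unspecified,
// so this function may return either true or false.
inline bool same_object() {
    const char* first = "hello";
    const char* second = "hello";
    return first == second;
}

// Comparing the characters gives a well-defined answer.
inline bool same_contents() {
    const char* first = "hello";
    const char* second = "hello";
    return std::strcmp(first, second) == 0;
}
```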
217