From Jason Turner

[lex.string]

  1. tmp/tmp6mv2vn1g/{from.md → to.md} +79 -61
tmp/tmp6mv2vn1g/{from.md → to.md} RENAMED
@@ -4,18 +4,10 @@
4
  string-literal:
5
  encoding-prefixₒₚₜ '"' s-char-sequenceₒₚₜ '"'
6
  encoding-prefixₒₚₜ 'R' raw-string
7
  ```
8
 
9
- ``` bnf
10
- encoding-prefix:
11
- 'u8'
12
- 'u'
13
- 'U'
14
- 'L'
15
- ```
16
-
17
  ``` bnf
18
  s-char-sequence:
19
  s-char
20
  s-char-sequence s-char
21
  ```
@@ -35,36 +27,43 @@ r-char-sequence:
35
  d-char-sequence:
36
  d-char
37
  d-char-sequence d-char
38
  ```
39
 
40
- A string literal is a sequence of characters (as defined in 
41
  [[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
42
  `u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
43
  `R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
44
  `U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
45
 
46
- A string literal that has an `R` in the prefix is a *raw string
47
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
48
  *d-char-sequence* of a *raw-string* is the same sequence of characters
49
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
50
  at most 16 characters.
51
 
52
- The characters `'('` and `')'` are permitted in a *raw-string*. Thus,
53
- `R"delimiter((a|b))delimiter"` is equivalent to `"(a|b)"`.
 
 
 
54
 
55
  A source-file new-line in a raw string literal results in a new-line in
56
- the resulting execution *string-literal*. Assuming no whitespace at the
57
  beginning of lines in the following example, the assert will succeed:
58
 
59
  ``` cpp
60
  const char* p = R"(a\
61
  b
62
  c)";
63
  assert(std::strcmp(p, "a\\\nb\nc") == 0);
64
  ```
65
 
 
 
 
 
66
  The raw string
67
 
68
  ``` cpp
69
  R"a(
70
  )\
@@ -86,62 +85,63 @@ R"#(
86
  )#"
87
  ```
88
 
89
  is equivalent to `"\n)\?\?=\"\n"`.
90
 
91
- After translation phase 6, a string literal that does not begin with an
92
- *encoding-prefix* is an ordinary string literal, and is initialized with
93
- the given characters.
94
 
95
- A string literal that begins with `u8`, such as `u8"asdf"`, is a UTF-8
96
- string literal.
 
 
 
 
97
 
98
  Ordinary string literals and UTF-8 string literals are also referred to
99
  as narrow string literals. A narrow string literal has type “array of
100
  *n* `const char`”, where *n* is the size of the string as defined below,
101
  and has static storage duration ([[basic.stc]]).
102
 
103
  For a UTF-8 string literal, each successive element of the object
104
  representation ([[basic.types]]) has the value of the corresponding
105
  code unit of the UTF-8 encoding of the string.
106
 
107
- A string literal that begins with `u`, such as `u"asdf"`, is a
108
  `char16_t` string literal. A `char16_t` string literal has type “array
109
  of *n* `const char16_t`”, where *n* is the size of the string as defined
110
- below; it has static storage duration and is initialized with the given
111
- characters. A single *c-char* may produce more than one `char16_t`
112
- character in the form of surrogate pairs.
113
 
114
- A string literal that begins with `U`, such as `U"asdf"`, is a
115
  `char32_t` string literal. A `char32_t` string literal has type “array
116
  of *n* `const char32_t`”, where *n* is the size of the string as defined
117
- below; it has static storage duration and is initialized with the given
118
- characters.
119
 
120
- A string literal that begins with `L`, such as `L"asdf"`, is a wide
121
- string literal. A wide string literal has type “array of *n* `const
122
- wchar_t`”, where *n* is the size of the string as defined below; it has
123
- static storage duration and is initialized with the given characters.
124
 
125
- Whether all string literals are distinct (that is, are stored in
126
- nonoverlapping objects) is *implementation-defined*. The effect of
127
- attempting to modify a string literal is undefined.
128
-
129
- In translation phase 6 ([[lex.phases]]), adjacent string literals are
130
- concatenated. If both string literals have the same *encoding-prefix*,
131
  the resulting concatenated string literal has that *encoding-prefix*. If
132
- one string literal has no *encoding-prefix*, it is treated as a string
133
- literal of the same *encoding-prefix* as the other operand. If a UTF-8
134
- string literal token is adjacent to a wide string literal token, the
135
- program is ill-formed. Any other concatenations are
136
- conditionally-supported with *implementation-defined* behavior. This
137
- concatenation is an interpretation, not a conversion. Because the
138
- interpretation happens in translation phase 6 (after each character from
139
- a literal has been translated into a value from the appropriate
140
- character set), a string literal’s initial rawness has no effect on the
141
- interpretation or well-formedness of the concatenation. Table 
142
- [[tab:lex.string.concat]] has some examples of valid concatenations.
 
 
 
 
143
 
144
  **Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
145
 
146
  | | | | | | |
147
  | -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
@@ -151,36 +151,54 @@ interpretation or well-formedness of the concatenation. Table 
151
  | `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
152
 
153
 
154
  Characters in concatenated strings are kept distinct.
155
 
 
 
156
  ``` cpp
157
  "\xA" "B"
158
  ```
159
 
160
  contains the two characters `'\xA'` and `'B'` after concatenation (and
161
  not the single hexadecimal character `'\xAB'`).
162
 
 
 
163
  After any necessary concatenation, in translation phase 7 (
164
  [[lex.phases]]), `'\0'` is appended to every string literal so that
165
  programs that scan a string can find its end.
166
 
167
- Escape sequences and universal-character-names in non-raw string
168
  literals have the same meaning as in character literals ([[lex.ccon]]),
169
  except that the single quote `'` is representable either by itself or by
170
  the escape sequence `\'`, and the double quote `"` shall be preceded by
171
- a `\`. In a narrow string literal, a universal-character-name may map to
172
- more than one `char` element due to *multibyte encoding*. The size of a
173
- `char32_t` or wide string literal is the total number of escape
174
- sequences, universal-character-names, and other characters, plus one for
175
- the terminating `U'\0'` or `L'\0'`. The size of a `char16_t` string
176
- literal is the total number of escape sequences,
177
- universal-character-names, and other characters, plus one for each
178
- character requiring a surrogate pair, plus one for the terminating
179
- `u'\0'`. The size of a `char16_t` string literal is the number of code
180
- units, not the number of characters. Within `char32_t` and `char16_t`
181
- literals, any universal-character-names shall be within the range `0x0`
182
- to `0x10FFFF`. The size of a narrow string literal is the total number
183
- of escape sequences and other characters, plus at least one for the
184
- multibyte encoding of each universal-character-name, plus one for the
 
 
 
 
 
185
  terminating `'\0'`.
186
 
 
 
 
 
 
 
 
 
 
 
4
  string-literal:
5
  encoding-prefixₒₚₜ '"' s-char-sequenceₒₚₜ '"'
6
  encoding-prefixₒₚₜ 'R' raw-string
7
  ```
8
 
 
 
 
 
 
 
 
 
9
  ``` bnf
10
  s-char-sequence:
11
  s-char
12
  s-char-sequence s-char
13
  ```
 
27
  d-char-sequence:
28
  d-char
29
  d-char-sequence d-char
30
  ```
31
 
32
+ A *string-literal* is a sequence of characters (as defined in 
33
  [[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
34
  `u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
35
  `R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
36
  `U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
37
 
38
+ A *string-literal* that has an `R` in the prefix is a *raw string
39
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
40
  *d-char-sequence* of a *raw-string* is the same sequence of characters
41
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
42
  at most 16 characters.
43
 
44
+ [*Note 1*: The characters `'('` and `')'` are permitted in a
45
+ *raw-string*. Thus, `R"delimiter((a|b))delimiter"` is equivalent to
46
+ `"(a|b)"`. — *end note*]
47
+
48
+ [*Note 2*:
49
 
50
  A source-file new-line in a raw string literal results in a new-line in
51
+ the resulting execution string literal. Assuming no whitespace at the
52
  beginning of lines in the following example, the assert will succeed:
53
 
54
  ``` cpp
55
  const char* p = R"(a\
56
  b
57
  c)";
58
  assert(std::strcmp(p, "a\\\nb\nc") == 0);
59
  ```
60
 
61
+ — *end note*]
62
+
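The delimiter and escaping rules for raw strings can be checked with a short sketch. The helper names `raw_with_delimiter`, `raw_backslash`, and `raw_examples_hold` are hypothetical and exist only for this illustration; the literals follow the text above.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: the d-char-sequence "delimiter" brackets the raw body,
// so the inner parentheses are ordinary characters of the string.
inline const char* raw_with_delimiter() { return R"delimiter((a|b))delimiter"; }

// Hypothetical helper: inside a raw string a backslash is not an escape
// introducer, so this literal holds the four characters a, \, n, b.
inline const char* raw_backslash() { return R"(a\nb)"; }

// Compares each raw string with its non-raw spelling.
inline bool raw_examples_hold() {
    return std::strcmp(raw_with_delimiter(), "(a|b)") == 0
        && std::strcmp(raw_backslash(), "a\\nb") == 0
        && std::strlen(raw_backslash()) == 4;
}
```

The sketch avoids source-file new-lines inside the raw strings so that it does not depend on how the snippet is indented.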
63
+ [*Example 1*:
64
+
65
  The raw string
66
 
67
  ``` cpp
68
  R"a(
69
  )\
 
85
  )#"
86
  ```
87
 
88
  is equivalent to `"\n)\?\?=\"\n"`.
89
 
90
+ — *end example*]
 
 
91
 
92
+ After translation phase 6, a *string-literal* that does not begin with
93
+ an *encoding-prefix* is an *ordinary string literal*, and is initialized
94
+ with the given characters.
95
+
96
+ A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
97
+ *UTF-8 string literal*.
98
 
99
  Ordinary string literals and UTF-8 string literals are also referred to
100
  as narrow string literals. A narrow string literal has type “array of
101
  *n* `const char`”, where *n* is the size of the string as defined below,
102
  and has static storage duration ([[basic.stc]]).
103
 
104
  For a UTF-8 string literal, each successive element of the object
105
  representation ([[basic.types]]) has the value of the corresponding
106
  code unit of the UTF-8 encoding of the string.
107
 
108
+ A *string-literal* that begins with `u`, such as `u"asdf"`, is a
109
  `char16_t` string literal. A `char16_t` string literal has type “array
110
  of *n* `const char16_t`”, where *n* is the size of the string as defined
111
+ below; it is initialized with the given characters. A single *c-char*
112
+ may produce more than one `char16_t` character in the form of surrogate
113
+ pairs.
114
 
115
+ A *string-literal* that begins with `U`, such as `U"asdf"`, is a
116
  `char32_t` string literal. A `char32_t` string literal has type “array
117
  of *n* `const char32_t`”, where *n* is the size of the string as defined
118
+ below; it is initialized with the given characters.
 
119
 
120
+ A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
121
+ string literal*. A wide string literal has type “array of *n* `const
122
+ wchar_t`”, where *n* is the size of the string as defined below; it is
123
+ initialized with the given characters.
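As a rough illustration of the array types described above, the following sketch checks them at compile time. It assumes a C++11 or later compiler. The UTF-8 case is left out on purpose, because the element type of `u8` literals differs between revisions of the standard. `check_terminators` is a hypothetical helper name.

```cpp
#include <cassert>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the
// array type; "asdf" has four characters plus the terminating null.
static_assert(std::is_same<decltype("asdf"),  const char (&)[5]>::value,     "ordinary");
static_assert(std::is_same<decltype(u"asdf"), const char16_t (&)[5]>::value, "char16_t");
static_assert(std::is_same<decltype(U"asdf"), const char32_t (&)[5]>::value, "char32_t");
static_assert(std::is_same<decltype(L"asdf"), const wchar_t (&)[5]>::value,  "wide");

// Hypothetical helper: the last element of each array is the null
// character of the matching character type.
inline void check_terminators() {
    assert("asdf"[4]  == '\0');
    assert(u"asdf"[4] == u'\0');
    assert(U"asdf"[4] == U'\0');
    assert(L"asdf"[4] == L'\0');
}
```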
124
 
125
+ In translation phase 6 ([[lex.phases]]), adjacent *string-literal*s are
126
+ concatenated. If both *string-literal*s have the same *encoding-prefix*,
 
 
 
 
127
  the resulting concatenated string literal has that *encoding-prefix*. If
128
+ one *string-literal* has no *encoding-prefix*, it is treated as a
129
+ *string-literal* of the same *encoding-prefix* as the other operand. If
130
+ a UTF-8 string literal token is adjacent to a wide string literal token,
131
+ the program is ill-formed. Any other concatenations are
132
+ conditionally-supported with *implementation-defined* behavior.
133
+
134
+ [*Note 3*: This concatenation is an interpretation, not a conversion.
135
+ Because the interpretation happens in translation phase 6 (after each
136
+ character from a string literal has been translated into a value from
137
+ the appropriate character set), a *string-literal*’s initial rawness has
138
+ no effect on the interpretation or well-formedness of the
139
+ concatenation. — *end note*]
140
+
141
+ Table [[tab:lex.string.concat]] has some examples of valid
142
+ concatenations.
143
 
144
  **Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
145
 
146
  | | | | | | |
147
  | -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
 
151
  | `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
152
 
153
 
154
  Characters in concatenated strings are kept distinct.
155
 
156
+ [*Example 2*:
157
+
158
  ``` cpp
159
  "\xA" "B"
160
  ```
161
 
162
  contains the two characters `'\xA'` and `'B'` after concatenation (and
163
  not the single hexadecimal character `'\xAB'`).
164
 
165
+ — *end example*]
166
+
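The concatenation rules can be sketched with compile-time checks. This is an illustration under the assumption of a C++11 or later compiler, not normative text; `concatenation_keeps_characters_distinct` is a hypothetical helper name.

```cpp
#include <cassert>
#include <type_traits>

// An unprefixed literal is treated as having the encoding-prefix of its
// neighbor, so each concatenation below has the prefixed element type.
static_assert(std::is_same<decltype("a" u"b"), const char16_t (&)[3]>::value, "u");
static_assert(std::is_same<decltype(U"a" "b"), const char32_t (&)[3]>::value, "U");
static_assert(std::is_same<decltype("a" L"b"), const wchar_t (&)[3]>::value,  "L");

// The escape sequence \xA is resolved before concatenation, so the result
// has two characters plus the terminator, unlike the single-character "\xAB".
static_assert(sizeof("\xA" "B") == 3, "two characters and a null");
static_assert(sizeof("\xAB") == 2, "one character and a null");

// Hypothetical helper: inspects the concatenated characters at run time.
inline bool concatenation_keeps_characters_distinct() {
    const char* s = "\xA" "B";
    return s[0] == '\xA' && s[1] == 'B' && s[2] == '\0';
}
```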
167
  After any necessary concatenation, in translation phase 7 (
168
  [[lex.phases]]), `'\0'` is appended to every string literal so that
169
  programs that scan a string can find its end.
170
 
171
+ Escape sequences and *universal-character-name*s in non-raw string
172
  literals have the same meaning as in character literals ([[lex.ccon]]),
173
  except that the single quote `'` is representable either by itself or by
174
  the escape sequence `\'`, and the double quote `"` shall be preceded by
175
+ a `\`, and except that a *universal-character-name* in a `char16_t`
176
+ string literal may yield a surrogate pair. In a narrow string literal, a
177
+ *universal-character-name* may map to more than one `char` element due
178
+ to *multibyte encoding*. The size of a `char32_t` or wide string literal
179
+ is the total number of escape sequences, *universal-character-name*s,
180
+ and other characters, plus one for the terminating `U'\0'` or `L'\0'`.
181
+ The size of a `char16_t` string literal is the total number of escape
182
+ sequences, *universal-character-name*s, and other characters, plus one
183
+ for each character requiring a surrogate pair, plus one for the
184
+ terminating `u'\0'`.
185
+
186
+ [*Note 4*: The size of a `char16_t` string literal is the number of
187
+ code units, not the number of characters. — *end note*]
188
+
189
+ Within `char32_t` and `char16_t` string literals, any
190
+ *universal-character-name*s shall be within the range `0x0` to
191
+ `0x10FFFF`. The size of a narrow string literal is the total number of
192
+ escape sequences and other characters, plus at least one for the
193
+ multibyte encoding of each *universal-character-name*, plus one for the
194
  terminating `'\0'`.
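The size rules can be made concrete with a small sketch. `literal_size` is a hypothetical helper that returns the array bound of a literal. The narrow case assumes an execution character set that can represent U+00E9 (for example UTF-8), so only the lower bound stated in the text is checked.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: deduces the number of elements in a string literal.
template <typename T, std::size_t N>
constexpr std::size_t literal_size(const T (&)[N]) { return N; }

// One element per character or escape sequence, plus the terminator.
static_assert(literal_size("a\tb") == 4, "a, tab, b, null");
static_assert(literal_size(L"ab") == 3, "a, b, null");
static_assert(literal_size(U"a\U0001F600") == 3, "char32_t needs no surrogates");

// char16_t: U+00E9 fits in one code unit; U+1F600 needs a surrogate pair,
// which adds one extra element.
static_assert(literal_size(u"a\u00E9") == 3, "a, U+00E9, null");
static_assert(literal_size(u"a\U0001F600") == 4, "a, two surrogates, null");

// Narrow: at least one element for the universal-character-name plus the
// terminator; the exact count depends on the multibyte encoding.
static_assert(literal_size("\u00E9") >= 2, "lower bound only");
```

Using `literal_size` rather than `sizeof` keeps the counts in elements, which matches the way the text defines the size of a string literal.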
195
 
196
+ Evaluating a *string-literal* results in a string literal object with
197
+ static storage duration, initialized from the given characters as
198
+ specified above. Whether all string literals are distinct (that is, are
199
+ stored in nonoverlapping objects) and whether successive evaluations of
200
+ a *string-literal* yield the same or a different object is unspecified.
201
+
202
+ [*Note 5*: The effect of attempting to modify a string literal is
203
+ undefined. — *end note*]
204
+
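A common way to respect these rules in practice is to modify a copy rather than the literal, and to compare contents rather than addresses. The following sketch assumes nothing beyond the standard library; `modify_copy_not_literal` and `same_contents` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: writes to an array initialized from a literal,
// never to the string literal object itself.
inline bool modify_copy_not_literal() {
    const char* lit = "hello";  // refers to the string literal object; treated as read-only
    char buf[] = "hello";       // separate, writable array initialized from the literal
    buf[0] = 'H';
    return std::strcmp(buf, "Hello") == 0 && std::strcmp(lit, "hello") == 0;
}

// Hypothetical helper: whether two identical literals share storage is not
// something a program can rely on, so compare characters, not pointers.
inline bool same_contents(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```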