[lex.string] - C++17 → C++20


tmp/tmpnwvrki3f/{from.md → to.md} +78 -65


@@ -10,10 +10,17 @@ string-literal:
 s-char-sequence:
     s-char
     s-char-sequence s-char
 ```
 ``` bnf
 raw-string:
     '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
 ```
@@ -21,21 +28,28 @@ raw-string:
 r-char-sequence:
     r-char
     r-char-sequence r-char
 ```
 ``` bnf
 d-char-sequence:
     d-char
     d-char-sequence d-char
 ```
-A *string-literal* is a sequence of characters (as defined in
-[[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
-`u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
-`R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
-`U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
 A *string-literal* that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
@@ -72,78 +86,74 @@ a"
 ```
 is equivalent to `"\n)\\\na\"\n"`. The raw string
 ``` cpp
-R"(??)"
 ```
-is equivalent to `"\?\?"`. The raw string
-``` cpp
-R"#(
-)??="
-)#"
-```
-is equivalent to `"\n)\?\?=\"\n"`.
 — *end example*]
 After translation phase 6, a *string-literal* that does not begin with
-an *encoding-prefix* is an *ordinary string literal*, and is initialized
-with the given characters.
 A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
-*UTF-8 string literal*.
 Ordinary string literals and UTF-8 string literals are also referred to
-as narrow string literals. A narrow string literal has type “array of
-*n* `const char`”, where *n* is the size of the string as defined below,
-and has static storage duration ([[basic.stc]]).
-For a UTF-8 string literal, each successive element of the object
-representation ([[basic.types]]) has the value of the corresponding
-code unit of the UTF-8 encoding of the string.
-A *string-literal* that begins with `u`, such as `u"asdf"`, is a
-`char16_t` string literal. A `char16_t` string literal has type “array
-of *n* `const char16_t`”, where *n* is the size of the string as defined
-below; it is initialized with the given characters. A single *c-char*
-may produce more than one `char16_t` character in the form of surrogate
-pairs.
-A *string-literal* that begins with `U`, such as `U"asdf"`, is a
-`char32_t` string literal. A `char32_t` string literal has type “array
-of *n* `const char32_t`”, where *n* is the size of the string as defined
-below; it is initialized with the given characters.
 A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
 string literal*. A wide string literal has type “array of *n* `const
 wchar_t`”, where *n* is the size of the string as defined below; it is
 initialized with the given characters.
-In translation phase 6 ([[lex.phases]]), adjacent *string-literal*s are
 concatenated. If both *string-literal*s have the same *encoding-prefix*,
-the resulting concatenated string literal has that *encoding-prefix*. If
-one *string-literal* has no *encoding-prefix*, it is treated as a
 *string-literal* of the same *encoding-prefix* as the other operand. If
 a UTF-8 string literal token is adjacent to a wide string literal token,
 the program is ill-formed. Any other concatenations are
 conditionally-supported with *implementation-defined* behavior.
-[*Note 3*: This concatenation is an interpretation, not a conversion.
 Because the interpretation happens in translation phase 6 (after each
-character from a string literal has been translated into a value from
 the appropriate character set), a *string-literal*’s initial rawness has
 no effect on the interpretation or well-formedness of the
 concatenation. — *end note*]
-Table  [[tab:lex.string.concat]] has some examples of valid
-concatenations.
-**Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
 | Source |        | Means   | Source |        | Means   | Source |        | Means   |
 | ------ | ------ | ------- | ------ | ------ | ------- | ------ | ------ | ------- |
 | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
@@ -162,43 +172,46 @@ Characters in concatenated strings are kept distinct.
 contains the two characters `'\xA'` and `'B'` after concatenation (and
 not the single hexadecimal character `'\xAB'`).
 — *end example*]
-After any necessary concatenation, in translation phase 7 (
-[[lex.phases]]), `'\0'` is appended to every string literal so that
 programs that scan a string can find its end.
 Escape sequences and *universal-character-name*s in non-raw string
-literals have the same meaning as in character literals ([[lex.ccon]]),
 except that the single quote `'` is representable either by itself or by
 the escape sequence `\'`, and the double quote `"` shall be preceded by
-a `\`, and except that a *universal-character-name* in a `char16_t`
-string literal may yield a surrogate pair. In a narrow string literal, a
-*universal-character-name* may map to more than one `char` element due
-to *multibyte encoding*. The size of a `char32_t` or wide string literal
-is the total number of escape sequences, *universal-character-name*s,
-and other characters, plus one for the terminating `U'\0'` or `L'\0'`.
-The size of a `char16_t` string literal is the total number of escape
-sequences, *universal-character-name*s, and other characters, plus one
-for each character requiring a surrogate pair, plus one for the
-terminating `u'\0'`.
-[*Note 4*: The size of a `char16_t` string literal is the number of
 code units, not the number of characters. — *end note*]
-Within `char32_t` and `char16_t` string literals, any
-*universal-character-name*s shall be within the range `0x0` to
-`0x10FFFF`. The size of a narrow string literal is the total number of
-escape sequences and other characters, plus at least one for the
-multibyte encoding of each *universal-character-name*, plus one for the
 terminating `'\0'`.
 Evaluating a *string-literal* results in a string literal object with
 static storage duration, initialized from the given characters as
-specified above. Whether all string literals are distinct (that is, are
-stored in nonoverlapping objects) and whether successive evaluations of
-a *string-literal* yield the same or a different object is unspecified.
-[*Note 5*:  The effect of attempting to modify a string literal is
 undefined. — *end note*]

 s-char-sequence:
     s-char
     s-char-sequence s-char
 ```
+``` bnf
+s-char:
+    any member of the basic source character set except the double-quote '"', backslash '\', or new-line character
+    escape-sequence
+    universal-character-name
+```
 ``` bnf
 raw-string:
     '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
 ```
 r-char-sequence:
     r-char
     r-char-sequence r-char
 ```
+``` bnf
+r-char:
+    any member of the source character set, except a right parenthesis ')' followed by
+       the initial *d-char-sequence* (which may be empty) followed by a double quote '"'.
+```
 ``` bnf
 d-char-sequence:
     d-char
     d-char-sequence d-char
 ```
+``` bnf
+d-char:
+    any member of the basic source character set except:
+       space, the left parenthesis '(', the right parenthesis ')', the backslash '\', and the control characters
+       representing horizontal tab, vertical tab, form feed, and newline.
+```
 A *string-literal* that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 ```
 is equivalent to `"\n)\\\na\"\n"`. The raw string
 ``` cpp
+R"(x = "\"y\"")"
 ```
+is equivalent to `"x = \"\\\"y\\\"\""`.
 — *end example*]
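The equivalence stated in the example can be checked at compile time. The following is a minimal sketch, assuming a C++17 or later compiler; the variable names and the `eos` delimiter are illustrative and do not come from the standard text.

```cpp
#include <cassert>
#include <string_view>

// The raw literal from the example and its escaped, non-raw equivalent.
constexpr std::string_view raw_form     = R"(x = "\"y\"")";
constexpr std::string_view escaped_form = "x = \"\\\"y\\\"\"";

// With a non-empty d-char-sequence (here the hypothetical "eos"), the body
// may itself contain a right parenthesis followed by a double quote.
constexpr std::string_view delimited = R"eos(f("))eos";

static_assert(raw_form == escaped_form, "raw and escaped spellings denote the same characters");
static_assert(delimited == "f(\")", "only )eos\" terminates the raw string");
```

Inside the raw string, backslashes and double quotes are taken literally, which is why each of them needs an escape in the non-raw spelling.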
 After translation phase 6, a *string-literal* that does not begin with
+an *encoding-prefix* is an *ordinary string literal*. An ordinary string
+literal has type “array of *n* `const char`” where *n* is the size of
+the string as defined below, has static storage duration [[basic.stc]],
+and is initialized with the given characters.
 A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
+*UTF-8 string literal*. A UTF-8 string literal has type “array of *n*
+`const char8_t`”, where *n* is the size of the string as defined below;
+each successive element of the object representation [[basic.types]] has
+the value of the corresponding code unit of the UTF-8 encoding of the
+string.
 Ordinary string literals and UTF-8 string literals are also referred to
+as narrow string literals.
+A *string-literal* that begins with `u`, such as `u"asdf"`, is a *UTF-16
+string literal*. A UTF-16 string literal has type “array of *n*
+`const char16_t`”, where *n* is the size of the string as defined below;
+each successive element of the array has the value of the corresponding
+code unit of the UTF-16 encoding of the string.
+[*Note 3*: A single *c-char* may produce more than one `char16_t`
+character in the form of surrogate pairs. A surrogate pair is a
+representation for a single code point as a sequence of two 16-bit code
+units. — *end note*]
+A *string-literal* that begins with `U`, such as `U"asdf"`, is a *UTF-32
+string literal*. A UTF-32 string literal has type “array of *n*
+`const char32_t`”, where *n* is the size of the string as defined below;
+each successive element of the array has the value of the corresponding
+code unit of the UTF-32 encoding of the string.
 A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
 string literal*. A wide string literal has type “array of *n* `const
 wchar_t`”, where *n* is the size of the string as defined below; it is
 initialized with the given characters.
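The element types listed above can be observed with `decltype`. This is a sketch, not normative text: the alias `utf8_unit` and the helper `element_count` are hypothetical names, and the `__cpp_char8_t` feature-test macro is used so that the same code also compiles as C++17, where `u8` literals still have element type `char`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// char8_t is only available from C++20; fall back to char otherwise.
#if defined(__cpp_char8_t)
using utf8_unit = char8_t;
#else
using utf8_unit = char;
#endif

// A string literal is an lvalue, so decltype yields a reference to the array.
static_assert(std::is_same<decltype("asdf"),   const char(&)[5]>::value,      "ordinary");
static_assert(std::is_same<decltype(u8"asdf"), const utf8_unit(&)[5]>::value, "UTF-8");
static_assert(std::is_same<decltype(u"asdf"),  const char16_t(&)[5]>::value,  "UTF-16");
static_assert(std::is_same<decltype(U"asdf"),  const char32_t(&)[5]>::value,  "UTF-32");
static_assert(std::is_same<decltype(L"asdf"),  const wchar_t(&)[5]>::value,   "wide");

// Hypothetical helper: the n in "array of n const T".
template <class T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }
```

In each case *n* is 5: four characters plus the terminating null element.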
+In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
 concatenated. If both *string-literal*s have the same *encoding-prefix*,
+the resulting concatenated *string-literal* has that *encoding-prefix*.
+If one *string-literal* has no *encoding-prefix*, it is treated as a
 *string-literal* of the same *encoding-prefix* as the other operand. If
 a UTF-8 string literal token is adjacent to a wide string literal token,
 the program is ill-formed. Any other concatenations are
 conditionally-supported with *implementation-defined* behavior.
+[*Note 4*: This concatenation is an interpretation, not a conversion.
 Because the interpretation happens in translation phase 6 (after each
+character from a *string-literal* has been translated into a value from
 the appropriate character set), a *string-literal*’s initial rawness has
 no effect on the interpretation or well-formedness of the
 concatenation. — *end note*]
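A short sketch of the concatenation rules, assuming C++17 or later for `std::basic_string_view`; the variable names are illustrative only. It covers the cases the wording guarantees and leaves out the conditionally-supported mixed-prefix combinations.

```cpp
#include <cassert>
#include <string_view>

// Both operands carry the same encoding-prefix: the result keeps it.
constexpr std::u16string_view both = u"a" u"b";

// One operand has no encoding-prefix: it is treated as having the other's.
constexpr std::u16string_view left  = u"a" "b";
constexpr std::u16string_view right = "a" u"b";

// Rawness has no effect on the concatenation, as the note says.
constexpr std::wstring_view wide = LR"(a)" "b";

static_assert(both == u"ab" && left == u"ab" && right == u"ab", "all three mean u\"ab\"");
static_assert(wide == L"ab", "raw and non-raw operands concatenate normally");
```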
+[[lex.string.concat]] has some examples of valid concatenations.
+**Table: String literal concatenations** <a id="lex.string.concat">[lex.string.concat]</a>
 | Source |        | Means   | Source |        | Means   | Source |        | Means   |
 | ------ | ------ | ------- | ------ | ------ | ------- | ------ | ------ | ------- |
 | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
 contains the two characters `'\xA'` and `'B'` after concatenation (and
 not the single hexadecimal character `'\xAB'`).
 — *end example*]
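The example can be verified directly. In this sketch the array name `joined` is an assumption made for illustration.

```cpp
#include <cassert>

// "\xA" and "B" are translated separately before being concatenated, so the
// hexadecimal escape does not absorb the following 'B'.
constexpr char joined[] = "\xA" "B";

static_assert(sizeof(joined) == 3, "'\\xA', 'B', and the terminating '\\0'");
static_assert(joined[0] == '\xA' && joined[1] == 'B' && joined[2] == '\0',
              "the two characters stay distinct");
```

Written as a single literal, `"\xAB"` would instead contain one hexadecimal escape followed by the terminator.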
+After any necessary concatenation, in translation phase 7
+[[lex.phases]], `'\0'` is appended to every *string-literal* so that
 programs that scan a string can find its end.
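A minimal sketch of why the appended `'\0'` matters, assuming C++14 or later for the loop in a `constexpr` function; `scan_length` is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>

// Walks the characters until it reaches the '\0' appended in phase 7.
constexpr std::size_t scan_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

static_assert(scan_length("abc") == 3, "three characters before the terminator");
static_assert(sizeof("abc") == 4, "the array also stores the terminator");
```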
 Escape sequences and *universal-character-name*s in non-raw string
+literals have the same meaning as in *character-literal*s [[lex.ccon]],
 except that the single quote `'` is representable either by itself or by
 the escape sequence `\'`, and the double quote `"` shall be preceded by
+a `\`, and except that a *universal-character-name* in a UTF-16 string
+literal may yield a surrogate pair. In a narrow string literal, a
+*universal-character-name* may map to more than one `char` or `char8_t`
+element due to *multibyte encoding*. The size of a `char32_t` or wide
+string literal is the total number of escape sequences,
+*universal-character-name*s, and other characters, plus one for the
+terminating `U'\0'` or `L'\0'`. The size of a UTF-16 string literal is
+the total number of escape sequences, *universal-character-name*s, and
+other characters, plus one for each character requiring a surrogate
+pair, plus one for the terminating `u'\0'`.
+[*Note 5*: The size of a `char16_t` string literal is the number of
 code units, not the number of characters. — *end note*]
+[*Note 6*: Any *universal-character-name*s are required to correspond
+to a code point in the range [0, D800) or [E000, 10FFFF] (hexadecimal)
+[[lex.charset]]. — *end note*]
+The size of a narrow string literal is the total number of escape
+sequences and other characters, plus at least one for the multibyte
+encoding of each *universal-character-name*, plus one for the
 terminating `'\0'`.
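The size rules can be illustrated with a character outside the Basic Multilingual Plane. This is a sketch under stated assumptions: `element_count` is a hypothetical helper, U+1F600 and U+00E9 are arbitrary sample code points, and only UTF-8, UTF-16, and UTF-32 literals are used because their encodings do not depend on the implementation's execution character set.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of array elements, terminator included.
template <class T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// UTF-32: one code unit for the character, plus U'\0'.
static_assert(element_count(U"\U0001F600") == 2, "UTF-32");

// UTF-16: the character needs a surrogate pair (two code units), plus u'\0'.
static_assert(element_count(u"\U0001F600") == 3, "UTF-16");

// UTF-8: U+00E9 takes two code units and U+1F600 takes four, plus '\0'.
static_assert(element_count(u8"\u00E9") == 3, "UTF-8, two-byte sequence");
static_assert(element_count(u8"\U0001F600") == 5, "UTF-8, four-byte sequence");
```

In every case the size counts code units rather than characters, which is the point of the note on UTF-16 sizes.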
 Evaluating a *string-literal* results in a string literal object with
 static storage duration, initialized from the given characters as
+specified above. Whether all *string-literal*s are distinct (that is,
+are stored in nonoverlapping objects) and whether successive evaluations
+of a *string-literal* yield the same or a different object is
+unspecified.
+[*Note 7*:  The effect of attempting to modify a *string-literal* is
 undefined. — *end note*]
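A sketch of what these last two rules mean in practice; the function names are hypothetical. Code should compare string contents rather than addresses, and should modify a copy rather than the literal object itself.

```cpp
#include <cassert>
#include <cstring>

// Whether two evaluations of "hello" refer to the same object is unspecified,
// so p == q may be true or false. Comparing contents is always reliable.
inline bool same_contents(const char* p, const char* q) {
    return std::strcmp(p, q) == 0;
}

// Initializing an array from a literal makes a writable copy; the string
// literal object itself is never modified.
inline char first_after_edit() {
    char copy[] = "hello";
    copy[0] = 'J';
    return copy[0];
}
```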
