[lex.string] - C++14 → C++17

Files changed (1)

tmp/tmp6mv2vn1g/{from.md → to.md} +79 -61

@@ -4,18 +4,10 @@
 string-literal:
     encoding-prefixₒₚₜ '"' s-char-sequenceₒₚₜ '"'
     encoding-prefixₒₚₜ 'R' raw-string
 ```
-``` bnf
-encoding-prefix:
-  'u8'
-  'u'
-  'U'
-  'L'
-```
 ``` bnf
 s-char-sequence:
     s-char
     s-char-sequence s-char
 ```
@@ -35,36 +27,43 @@ r-char-sequence:
 d-char-sequence:
     d-char
     d-char-sequence d-char
 ```
-A string literal is a sequence of characters (as defined in
 [[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
 `u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
 `R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
 `U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
-A string literal that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 at most 16 characters.
-The characters `'('` and `')'` are permitted in a *raw-string*. Thus,
-`R"delimiter((a|b))delimiter"` is equivalent to `"(a|b)"`.
 A source-file new-line in a raw string literal results in a new-line in
-the resulting execution *string-literal*. Assuming no whitespace at the
 beginning of lines in the following example, the assert will succeed:
 ``` cpp
 const char* p = R"(a\
 b
 c)";
 assert(std::strcmp(p, "a\\\nb\nc") == 0);
 ```
 The raw string
 ``` cpp
 R"a(
 )\
@@ -86,62 +85,63 @@ R"#(
 )#"
 ```
 is equivalent to `"\n)\?\?=\"\n"`.
-After translation phase 6, a string literal that does not begin with an
-*encoding-prefix* is an ordinary string literal, and is initialized with
-the given characters.
-A string literal that begins with `u8`, such as `u8"asdf"`, is a UTF-8
-string literal.
 Ordinary string literals and UTF-8 string literals are also referred to
 as narrow string literals. A narrow string literal has type “array of
 *n* `const char`”, where *n* is the size of the string as defined below,
 and has static storage duration ([[basic.stc]]).
 For a UTF-8 string literal, each successive element of the object
 representation ([[basic.types]]) has the value of the corresponding
 code unit of the UTF-8 encoding of the string.
-A string literal that begins with `u`, such as `u"asdf"`, is a
 `char16_t` string literal. A `char16_t` string literal has type “array
 of *n* `const char16_t`”, where *n* is the size of the string as defined
-below; it has static storage duration and is initialized with the given
-characters. A single *c-char* may produce more than one `char16_t`
-character in the form of surrogate pairs.
-A string literal that begins with `U`, such as `U"asdf"`, is a
 `char32_t` string literal. A `char32_t` string literal has type “array
 of *n* `const char32_t`”, where *n* is the size of the string as defined
-below; it has static storage duration and is initialized with the given
-characters.
-A string literal that begins with `L`, such as `L"asdf"`, is a wide
-string literal. A wide string literal has type “array of *n* `const
-wchar_t`”, where *n* is the size of the string as defined below; it has
-static storage duration and is initialized with the given characters.
-Whether all string literals are distinct (that is, are stored in
-nonoverlapping objects) is *implementation-defined*. The effect of
-attempting to modify a string literal is undefined.
-In translation phase 6 ([[lex.phases]]), adjacent string literals are
-concatenated. If both string literals have the same *encoding-prefix*,
 the resulting concatenated string literal has that *encoding-prefix*. If
-one string literal has no *encoding-prefix*, it is treated as a string
-literal of the same *encoding-prefix* as the other operand. If a UTF-8
-string literal token is adjacent to a wide string literal token, the
-program is ill-formed. Any other concatenations are
-conditionally-supported with *implementation-defined* behavior. This
-concatenation is an interpretation, not a conversion. Because the
-interpretation happens in translation phase 6 (after each character from
-a literal has been translated into a value from the appropriate
-character set), a string literal’s initial rawness has no effect on the
-interpretation or well-formedness of the concatenation. Table
-[[tab:lex.string.concat]] has some examples of valid concatenations.
 **Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
 |                            |       |                            |       |                            |       |
 | -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
@@ -151,36 +151,54 @@ interpretation or well-formedness of the concatenation. Table
 | `"a"`                      | `u"b"` | `u"ab"`                    | `"a"` | `U"b"`                     | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
 Characters in concatenated strings are kept distinct.
 ``` cpp
 "\xA" "B"
 ```
 contains the two characters `'\xA'` and `'B'` after concatenation (and
 not the single hexadecimal character `'\xAB'`).
 After any necessary concatenation, in translation phase 7 (
 [[lex.phases]]), `'\0'` is appended to every string literal so that
 programs that scan a string can find its end.
-Escape sequences and universal-character-names in non-raw string
 literals have the same meaning as in character literals ([[lex.ccon]]),
 except that the single quote `'` is representable either by itself or by
 the escape sequence `\'`, and the double quote `"` shall be preceded by
-a `\`. In a narrow string literal, a universal-character-name may map to
-more than one `char` element due to *multibyte encoding*. The size of a
-`char32_t` or wide string literal is the total number of escape
-sequences, universal-character-names, and other characters, plus one for
-the terminating `U'\0'` or `L'\0'`. The size of a `char16_t` string
-literal is the total number of escape sequences,
-universal-character-names, and other characters, plus one for each
-character requiring a surrogate pair, plus one for the terminating
-`u'\0'`. The size of a `char16_t` string literal is the number of code
-units, not the number of characters. Within `char32_t` and `char16_t`
-literals, any universal-character-names shall be within the range `0x0`
-to `0x10FFFF`. The size of a narrow string literal is the total number
-of escape sequences and other characters, plus at least one for the
-multibyte encoding of each universal-character-name, plus one for the
 terminating `'\0'`.

@@ -4,18 +4,10 @@
 string-literal:
     encoding-prefixₒₚₜ '"' s-char-sequenceₒₚₜ '"'
     encoding-prefixₒₚₜ 'R' raw-string
 ```
 ``` bnf
 s-char-sequence:
     s-char
     s-char-sequence s-char
 ```
@@ -35,36 +27,43 @@ r-char-sequence:
 d-char-sequence:
     d-char
     d-char-sequence d-char
 ```
+A *string-literal* is a sequence of characters (as defined in
 [[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
 `u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
 `R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
 `U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
+A *string-literal* that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 at most 16 characters.
+[*Note 1*: The characters `'('` and `')'` are permitted in a
+*raw-string*. Thus, `R"delimiter((a|b))delimiter"` is equivalent to
+`"(a|b)"`. — *end note*]
+[*Note 2*:
 A source-file new-line in a raw string literal results in a new-line in
+the resulting execution string literal. Assuming no whitespace at the
 beginning of lines in the following example, the assert will succeed:
 ``` cpp
 const char* p = R"(a\
 b
 c)";
 assert(std::strcmp(p, "a\\\nb\nc") == 0);
 ```
+— *end note*]
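Notes 1 and 2 can be checked directly. The following is a minimal sketch, not part of the standard text; the function names are hypothetical, and the raw string in the second function assumes its continuation lines start in column 0.

```cpp
#include <cassert>
#include <cstring>

// Note 1: parentheses inside a raw string are ordinary characters, because
// the literal only ends at the full closing sequence )delimiter".
inline bool raw_parens_match() {
    return std::strcmp(R"delimiter((a|b))delimiter", "(a|b)") == 0;
}

// Note 2: inside a raw string a backslash is not an escape character and
// each source-file new-line becomes '\n' in the resulting string.
inline bool raw_newlines_match() {
    const char* p = R"(a\
b
c)";
    return std::strcmp(p, "a\\\nb\nc") == 0;
}

// Hypothetical helper that runs both checks.
inline void check_raw_string_notes() {
    assert(raw_parens_match());
    assert(raw_newlines_match());
}
```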
+[*Example 1*:
 The raw string
 ``` cpp
 R"a(
 )\
@@ -86,62 +85,63 @@ R"#(
 )#"
 ```
 is equivalent to `"\n)\?\?=\"\n"`.
+— *end example*]
+After translation phase 6, a *string-literal* that does not begin with
+an *encoding-prefix* is an *ordinary string literal*, and is initialized
+with the given characters.
+A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
+*UTF-8 string literal*.
 Ordinary string literals and UTF-8 string literals are also referred to
 as narrow string literals. A narrow string literal has type “array of
 *n* `const char`”, where *n* is the size of the string as defined below,
 and has static storage duration ([[basic.stc]]).
 For a UTF-8 string literal, each successive element of the object
 representation ([[basic.types]]) has the value of the corresponding
 code unit of the UTF-8 encoding of the string.
+A *string-literal* that begins with `u`, such as `u"asdf"`, is a
 `char16_t` string literal. A `char16_t` string literal has type “array
 of *n* `const char16_t`”, where *n* is the size of the string as defined
+below; it is initialized with the given characters. A single *c-char*
+may produce more than one `char16_t` character in the form of surrogate
+pairs.
+A *string-literal* that begins with `U`, such as `U"asdf"`, is a
 `char32_t` string literal. A `char32_t` string literal has type “array
 of *n* `const char32_t`”, where *n* is the size of the string as defined
+below; it is initialized with the given characters.
+A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
+string literal*. A wide string literal has type “array of *n* `const
+wchar_t`”, where *n* is the size of the string as defined below; it is
+initialized with the given characters.
+In translation phase 6 ([[lex.phases]]), adjacent *string-literal*s are
+concatenated. If both *string-literal*s have the same *encoding-prefix*,
 the resulting concatenated string literal has that *encoding-prefix*. If
+one *string-literal* has no *encoding-prefix*, it is treated as a
+*string-literal* of the same *encoding-prefix* as the other operand. If
+a UTF-8 string literal token is adjacent to a wide string literal token,
+the program is ill-formed. Any other concatenations are
+conditionally-supported with *implementation-defined* behavior.
+[*Note 3*: This concatenation is an interpretation, not a conversion.
+Because the interpretation happens in translation phase 6 (after each
+character from a string literal has been translated into a value from
+the appropriate character set), a *string-literal*’s initial rawness has
+no effect on the interpretation or well-formedness of the
+concatenation. — *end note*]
+Table [[tab:lex.string.concat]] has some examples of valid
+concatenations.
 **Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
 |                            |       |                            |       |                            |       |
 | -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
@@ -151,36 +151,54 @@ interpretation or well-formedness of the concatenation. Table
 | `"a"`                      | `u"b"` | `u"ab"`                    | `"a"` | `U"b"`                     | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
 Characters in concatenated strings are kept distinct.
+[*Example 2*:
 ``` cpp
 "\xA" "B"
 ```
 contains the two characters `'\xA'` and `'B'` after concatenation (and
 not the single hexadecimal character `'\xAB'`).
+— *end example*]
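The concatenation rules above can be expressed as compile-time checks. This is an illustrative sketch only: `kept_distinct_length` and `check_concatenation` are hypothetical names, and only the unprefixed-plus-prefixed combinations from the table are exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// An unprefixed literal is treated as having the encoding-prefix of its
// neighbour, so each concatenation below has the prefixed element type.
static_assert(std::is_same<decltype(u"a" "b"), const char16_t (&)[3]>::value,
              "behaves like u\"ab\"");
static_assert(std::is_same<decltype("a" U"b"), const char32_t (&)[3]>::value,
              "behaves like U\"ab\"");
static_assert(std::is_same<decltype("a" L"b"), const wchar_t (&)[3]>::value,
              "behaves like L\"ab\"");

// Characters are kept distinct: "\xA" "B" holds '\xA' and 'B', not '\xAB'.
// The length below excludes the terminating '\0'.
constexpr std::size_t kept_distinct_length = sizeof("\xA" "B") - 1;
static_assert(kept_distinct_length == 2, "two characters, not one");

// Hypothetical runtime check of the same property.
inline void check_concatenation() {
    const char s[] = "\xA" "B";
    assert(s[0] == '\xA' && s[1] == 'B' && s[2] == '\0');
}
```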
 After any necessary concatenation, in translation phase 7 (
 [[lex.phases]]), `'\0'` is appended to every string literal so that
 programs that scan a string can find its end.
+Escape sequences and *universal-character-name*s in non-raw string
 literals have the same meaning as in character literals ([[lex.ccon]]),
 except that the single quote `'` is representable either by itself or by
 the escape sequence `\'`, and the double quote `"` shall be preceded by
+a `\`, and except that a *universal-character-name* in a `char16_t`
+string literal may yield a surrogate pair. In a narrow string literal, a
+*universal-character-name* may map to more than one `char` element due
+to *multibyte encoding*. The size of a `char32_t` or wide string literal
+is the total number of escape sequences, *universal-character-name*s,
+and other characters, plus one for the terminating `U'\0'` or `L'\0'`.
+The size of a `char16_t` string literal is the total number of escape
+sequences, *universal-character-name*s, and other characters, plus one
+for each character requiring a surrogate pair, plus one for the
+terminating `u'\0'`.
+[*Note 4*: The size of a `char16_t` string literal is the number of
+code units, not the number of characters. — *end note*]
+Within `char32_t` and `char16_t` string literals, any
+*universal-character-name*s shall be within the range `0x0` to
+`0x10FFFF`. The size of a narrow string literal is the total number of
+escape sequences and other characters, plus at least one for the
+multibyte encoding of each *universal-character-name*, plus one for the
 terminating `'\0'`.
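The size rules can be illustrated with a small helper that reports the array bound of a literal. This is a sketch under stated assumptions: `literal_size` and `check_literal_sizes` are hypothetical names, and the code points U+00E9 and U+1F600 were chosen only as examples of a two-code-unit UTF-8 character and a character outside the basic multilingual plane.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements in a string literal, including the terminating null.
template <class T, std::size_t N>
constexpr std::size_t literal_size(const T (&)[N]) { return N; }

// char32_t: one element per universal-character-name, plus U'\0'.
static_assert(literal_size(U"\U0001F600") == 2, "one code unit plus null");

// char16_t: U+1F600 needs a surrogate pair, so it counts twice, plus u'\0'.
static_assert(literal_size(u"\U0001F600") == 3, "surrogate pair plus null");
static_assert(literal_size(u"\u00E9") == 2, "single code unit plus null");

// UTF-8: U+00E9 is encoded as two code units, plus '\0'.
static_assert(literal_size(u8"\u00E9") == 3, "two code units plus null");

// Escape sequences count as one character each.
static_assert(literal_size("ab\n") == 4, "three characters plus null");

// Hypothetical runtime check for a wide string literal.
inline void check_literal_sizes() {
    assert(literal_size(L"ab") == 3);
}
```

Because the helper deduces the element type, it compiles whether `u8` literals have element type `char` (as in this C++17 text) or another character type.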
+Evaluating a *string-literal* results in a string literal object with
+static storage duration, initialized from the given characters as
+specified above. Whether all string literals are distinct (that is, are
+stored in nonoverlapping objects) and whether successive evaluations of
+a *string-literal* yield the same or a different object is unspecified.
+[*Note 5*: The effect of attempting to modify a string literal is
+undefined. — *end note*]
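A short sketch of what the final paragraph means in practice follows; the function names are hypothetical. Comparing the addresses of two identical literals may give either result, and code that needs modifiable text can copy the literal into an array rather than writing through a pointer to the literal.

```cpp
#include <cassert>
#include <cstring>

// Whether identical literals share storage is unspecified, so either
// return value is conforming; portable code should not rely on it.
inline bool literals_share_storage() {
    const char* a = "asdf";
    const char* b = "asdf";
    return a == b;
}

// Modifying the literal itself is undefined; modifying a copy is fine.
inline bool modify_copy() {
    char buf[] = "asdf";  // array of 5 char initialized from the literal
    buf[0] = 'A';
    return std::strcmp(buf, "Asdf") == 0;
}

// Hypothetical helper that exercises both functions.
inline void check_literal_objects() {
    (void)literals_share_storage();  // result intentionally ignored
    assert(modify_copy());
}
```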
