[lex.string] - C++23 → Trunk


tmp/tmptr277u8h/{from.md → to.md} +29 -34

@@ -6,12 +6,11 @@ string-literal:
     encoding-prefixₒₚₜ 'R' raw-string
 ```
 ``` bnf
 s-char-sequence:
-    s-char
-    s-char-sequence s-char
 ```
 ``` bnf
 s-char:
     basic-s-char
@@ -30,24 +29,22 @@ raw-string:
     '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
 ```
 ``` bnf
 r-char-sequence:
-    r-char
-    r-char-sequence r-char
 ```
 ``` bnf
 r-char:
     any member of the translation character set, except a U+0029 (right parenthesis) followed by
        the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
 ```
 ``` bnf
 d-char-sequence:
-    d-char
-    d-char-sequence d-char
 ```
 ``` bnf
 d-char:
     any member of the basic character set except:
@@ -56,16 +53,17 @@ d-char:
 ```
 The kind of a *string-literal*, its type, and its associated character
 encoding [[lex.charset]] are determined by its encoding prefix and
 sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
-where n is the number of encoded code units as described below.
 **Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
-| | | |                           |                                                |
-| ---- | ----------------------- | ----------------------------- | ------------------------- | ---------------------------------------------- |
 | none              | ordinary string literal | array of $n$ `const char`     | ordinary literal encoding     | `"ordinary string"` `R"(ordinary raw string)"` |
 | `L`               | wide string literal     | array of $n$ `const wchar_t`  | wide literal encoding         | `L"wide string"` `LR"w(wide raw string)w"`     |
 | `u8`              | UTF-8 string literal    | array of $n$ `const char8_t`  | UTF-8                         | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
 | `u`               | UTF-16 string literal   | array of $n$ `const char16_t` | UTF-16                        | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
 | `U`               | UTF-32 string literal   | array of $n$ `const char32_t` | UTF-32                        | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
@@ -75,12 +73,12 @@ A *string-literal* that has an `R` in the prefix is a *raw string
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 at most 16 characters.
-[*Note 1*: The characters `'('` and `')'` are permitted in a
-*raw-string*. Thus, `R"delimiter((a|b))delimiter"` is equivalent to
 `"(a|b)"`. — *end note*]
 [*Note 2*:
 A source-file new-line in a raw string literal results in a new-line in
@@ -116,18 +114,15 @@ R"(x = "\"y\"")"
 is equivalent to `"x = \"\\\"y\\\"\""`.
 — *end example*]
 Ordinary string literals and UTF-8 string literals are also referred to
-as narrow string literals.
-The common *encoding-prefix* for a sequence of adjacent
-*string-literal*s is determined pairwise as follows: If two
-*string-literal*s have the same *encoding-prefix*, the common
-*encoding-prefix* is that *encoding-prefix*. If one *string-literal* has
-no *encoding-prefix*, the common *encoding-prefix* is that of the other
-*string-literal*. Any other combinations are ill-formed.
 [*Note 3*: A *string-literal*’s rawness has no effect on the
 determination of the common *encoding-prefix*. — *end note*]
 In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
@@ -164,16 +159,17 @@ digit `1` (and not the single character `'A'` specified by a
 | `u"a"`                     | `"b"` | `u"ab"`                    | `U"a"` | `"b"`                      | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
 | `"a"`                      | `u"b"` | `u"ab"`                    | `"a"` | `U"b"`                     | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
 Evaluating a *string-literal* results in a string literal object with
-static storage duration [[basic.stc]]. Whether all *string-literal*s are
-distinct (that is, are stored in nonoverlapping objects) and whether
-successive evaluations of a *string-literal* yield the same or a
-different object is unspecified.
-[*Note 4*:  The effect of attempting to modify a string literal object
 is undefined. — *end note*]
 String literal objects are initialized with the sequence of code unit
 values corresponding to the *string-literal*’s sequence of *s-char*s
 (originally from non-raw string literals) and *r-char*s (originally from
@@ -183,20 +179,19 @@ order as follows:
 - The sequence of characters denoted by each contiguous sequence of
   *basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
   and *universal-character-name*s [[lex.charset]] is encoded to a code
   unit sequence using the *string-literal*’s associated character
   encoding. If a character lacks representation in the associated
-  character encoding, then the *string-literal* is
- conditionally-supported and an *implementation-defined* code unit
- sequence is encoded. \[*Note 5*: No character lacks representation in
- any Unicode encoding form. — *end note*] When encoding a stateful
- character encoding, implementations should encode the first such
- sequence beginning with the initial encoding state and encode
- subsequent sequences beginning with the final encoding state of the
- prior sequence. \[*Note 6*: The encoded code unit sequence can differ
- from the sequence of code units that would be obtained by encoding
-  each character independently. — *end note*]
 - Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
   unit with a value as follows:
   - Let v be the integer value represented by the octal number
     comprising the sequence of *octal-digit*s in an
     *octal-escape-sequence* or by the hexadecimal number comprising the
@@ -207,11 +202,11 @@ order as follows:
     `L`, and v does not exceed the range of representable values of the
     corresponding unsigned type for the underlying type of the
     *string-literal*’s array element type, then the value is the unique
     value of the *string-literal*’s array element type `T` that is
     congruent to v modulo 2ᴺ, where N is the width of `T`.
-  - Otherwise, the *string-literal* is ill-formed.
   When encoding a stateful character encoding, these sequences should
   have no effect on encoding state.
 - Each *conditional-escape-sequence* [[lex.ccon]] contributes an
   *implementation-defined* code unit sequence. When encoding a stateful

     encoding-prefixₒₚₜ 'R' raw-string
 ```
 ``` bnf
 s-char-sequence:
+    s-char s-char-sequenceₒₚₜ
 ```
 ``` bnf
 s-char:
     basic-s-char
     '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
 ```
 ``` bnf
 r-char-sequence:
+    r-char r-char-sequenceₒₚₜ
 ```
 ``` bnf
 r-char:
     any member of the translation character set, except a U+0029 (right parenthesis) followed by
        the initial *d-char-sequence* (which may be empty) followed by a U+0022 (quotation mark)
 ```
 ``` bnf
 d-char-sequence:
+    d-char d-char-sequenceₒₚₜ
 ```
 ``` bnf
 d-char:
     any member of the basic character set except:
 ```
 The kind of a *string-literal*, its type, and its associated character
 encoding [[lex.charset]] are determined by its encoding prefix and
 sequence of *s-char*s or *r-char*s as defined by [[lex.string.literal]]
+where n is the number of encoded code units that would result from an
+evaluation of the *string-literal* (see below).
 **Table: String literals** <a id="lex.string.literal">[lex.string.literal]</a>
+| Encoding prefix   | Kind                    | Type                          | Associated character encoding | Examples                                       |
+| ----------------- | ----------------------- | ----------------------------- | ----------------------------- | ---------------------------------------------- |
 | none              | ordinary string literal | array of $n$ `const char`     | ordinary literal encoding     | `"ordinary string"` `R"(ordinary raw string)"` |
 | `L`               | wide string literal     | array of $n$ `const wchar_t`  | wide literal encoding         | `L"wide string"` `LR"w(wide raw string)w"`     |
 | `u8`              | UTF-8 string literal    | array of $n$ `const char8_t`  | UTF-8                         | `u8"UTF-8 string"` `u8R"x(UTF-8 raw string)x"` |
 | `u`               | UTF-16 string literal   | array of $n$ `const char16_t` | UTF-16                        | `u"UTF-16 string"` `uR"y(UTF-16 raw string)y"` |
 | `U`               | UTF-32 string literal   | array of $n$ `const char32_t` | UTF-32                        | `U"UTF-32 string"` `UR"z(UTF-32 raw string)z"` |
 literal*. The *d-char-sequence* serves as a delimiter. The terminating
 *d-char-sequence* of a *raw-string* is the same sequence of characters
 as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 at most 16 characters.
+[*Note 1*: The characters `'('` and `')'` can appear in a *raw-string*.
+Thus, `R"delimiter((a|b))delimiter"` is equivalent to
 `"(a|b)"`. — *end note*]
 [*Note 2*:
 A source-file new-line in a raw string literal results in a new-line in
 is equivalent to `"x = \"\\\"y\\\"\""`.
 — *end example*]
 Ordinary string literals and UTF-8 string literals are also referred to
+as *narrow string literals*.
+The *string-literal*s in any sequence of adjacent *string-literal*s
+shall have at most one unique *encoding-prefix* among them. The common
+*encoding-prefix* of the sequence is that *encoding-prefix*, if any.
 [*Note 3*: A *string-literal*’s rawness has no effect on the
 determination of the common *encoding-prefix*. — *end note*]
 In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
 | `u"a"`                     | `"b"` | `u"ab"`                    | `U"a"` | `"b"`                      | `U"ab"` | `L"a"` | `"b"` | `L"ab"` |
 | `"a"`                      | `u"b"` | `u"ab"`                    | `"a"` | `U"b"`                     | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
 Evaluating a *string-literal* results in a string literal object with
+static storage duration [[basic.stc]].
+[*Note 4*: String literal objects are potentially non-unique
+[[intro.object]]. Whether successive evaluations of a *string-literal*
+yield the same or a different object is unspecified. — *end note*]
+[*Note 5*:  The effect of attempting to modify a string literal object
 is undefined. — *end note*]
 String literal objects are initialized with the sequence of code unit
 values corresponding to the *string-literal*’s sequence of *s-char*s
 (originally from non-raw string literals) and *r-char*s (originally from
 - The sequence of characters denoted by each contiguous sequence of
   *basic-s-char*s, *r-char*s, *simple-escape-sequence*s [[lex.ccon]],
   and *universal-character-name*s [[lex.charset]] is encoded to a code
   unit sequence using the *string-literal*’s associated character
   encoding. If a character lacks representation in the associated
+  character encoding, then the program is ill-formed. \[*Note 6*: No
+ character lacks representation in any Unicode encoding
+ form. — *end note*] When encoding a stateful character encoding,
+ implementations should encode the first such sequence beginning with
+ the initial encoding state and encode subsequent sequences beginning
+  with the final encoding state of the prior sequence. \[*Note 7*: The
+ encoded code unit sequence can differ from the sequence of code units
+ that would be obtained by encoding each character
+ independently. — *end note*]
 - Each *numeric-escape-sequence* [[lex.ccon]] contributes a single code
   unit with a value as follows:
   - Let v be the integer value represented by the octal number
     comprising the sequence of *octal-digit*s in an
     *octal-escape-sequence* or by the hexadecimal number comprising the
     `L`, and v does not exceed the range of representable values of the
     corresponding unsigned type for the underlying type of the
     *string-literal*’s array element type, then the value is the unique
     value of the *string-literal*’s array element type `T` that is
     congruent to v modulo 2ᴺ, where N is the width of `T`.
+  - Otherwise, the program is ill-formed.
   When encoding a stateful character encoding, these sequences should
   have no effect on encoding state.
 - Each *conditional-escape-sequence* [[lex.ccon]] contributes an
   *implementation-defined* code unit sequence. When encoding a stateful

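Several of the rules touched by this diff can be spot-checked at compile time. The following is a minimal sketch, assuming a C++17 or later compiler; `code_units` is a hypothetical helper introduced only for illustration. The array bounds below also count the terminating null code unit, which is specified in a part of [lex.string] that this diff does not show.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical helper (not part of the standard): yields n, the array bound
// of a string literal, which includes the terminating null code unit.
template <class T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) { return N; }

// Table "String literals": the encoding prefix selects the element type.
static_assert(std::is_same_v<decltype("abc"), const char (&)[4]>);
static_assert(std::is_same_v<decltype(L"abc"), const wchar_t (&)[4]>);
static_assert(std::is_same_v<decltype(u"abc"), const char16_t (&)[4]>);
static_assert(std::is_same_v<decltype(U"abc"), const char32_t (&)[4]>);

// Note 1: '(' and ')' can appear in a raw-string.
static_assert(std::string_view(R"delimiter((a|b))delimiter") == "(a|b)");

// Example from the text: backslashes and quotes are not escapes in a raw string.
static_assert(std::string_view(R"(x = "\"y\"")") == "x = \"\\\"y\\\"\"");

// Adjacent literals: an unprefixed literal takes the common encoding-prefix.
static_assert(code_units(u"a" "b") == 3);
static_assert(std::u16string_view("a" u"b") == u"ab");
static_assert(std::wstring_view(L"a" "b") == L"ab");

// Note 3: rawness has no effect on the common encoding-prefix.
static_assert(std::u32string_view(U"a" R"(b)") == U"ab");

// Ill-formed under both the old and the new wording, because the sequence
// has more than one unique encoding-prefix:
//   u"a" U"b"

// A numeric-escape-sequence contributes exactly one code unit.
static_assert(code_units("\x41") == 2);
static_assert("\x41"[0] == 0x41);
// 0xFF fits the unsigned type corresponding to char, so the code unit is the
// char value congruent to 255 modulo 2^N (which may be negative).
static_assert(static_cast<unsigned char>("\xFF"[0]) == 0xFF);
```

The checks use `static_assert` because every property shown here is fixed during translation; a compiler that accepts the file has confirmed them. The mixed-prefix case is left as a comment since a conforming implementation is expected to reject it.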