[format.string.escaped] - C++20 → C++23

#### Formatting escaped characters and strings <a id="format.string.escaped">[[format.string.escaped]]</a>

A character or string can be formatted as *escaped* to make it more
suitable for debugging or for logging.

The escaped string representation *E* of a string *S* is constructed by
encoding a sequence of characters as follows. The associated character
encoding *CE* for `charT` ([[lex.string.literal]]) is used both to
interpret *S* and to construct *E*.

- U+0022 (quotation mark) (`"`) is appended to *E*.
- For each code unit sequence *X* in *S* that either encodes a single
  character, is a shift sequence, or is a sequence of ill-formed code
  units, processing is in order as follows:
  - If *X* encodes a single character *C*, then:
    - If *C* is one of the characters in [[format.escape.sequences]],
      then the two characters shown as the corresponding escape sequence
      are appended to *E*.
    - Otherwise, if *C* is not U+0020 (space) and
      - *CE* is UTF-8, UTF-16, or UTF-32 and *C* corresponds to a
        Unicode scalar value whose Unicode property `General_Category`
        has a value in the groups `Separator` (`Z`) or `Other` (`C`), as
        described by UAX \#44 of the Unicode Standard, or
      - *CE* is UTF-8, UTF-16, or UTF-32 and *C* corresponds to a
        Unicode scalar value with the Unicode property
        `Grapheme_Extend=Yes` as described by UAX \#44 of the Unicode
        Standard and *C* is not immediately preceded in *S* by a
        character *P* appended to *E* without translation to an escape
        sequence, or
      - *CE* is neither UTF-8, UTF-16, nor UTF-32 and *C* is one of an
        implementation-defined set of separator or non-printable
        characters

      then the sequence `\u{hex-digit-sequence}` is appended to *E*,
      where `hex-digit-sequence` is the shortest hexadecimal
      representation of *C* using lower-case hexadecimal digits.
    - Otherwise, *C* is appended to *E*.
  - Otherwise, if *X* is a shift sequence, the effect on *E* and further
    decoding of *S* is unspecified. *Recommended practice:* A shift
    sequence should be represented in *E* such that the original code
    unit sequence of *S* can be reconstructed.
  - Otherwise (*X* is a sequence of ill-formed code units), each code
    unit *U* is appended to *E* in order as the sequence
    `\x{hex-digit-sequence}`, where `hex-digit-sequence` is the shortest
    hexadecimal representation of *U* using lower-case hexadecimal
    digits.
- Finally, U+0022 (quotation mark) (`"`) is appended to *E*.
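
The string rules above can be sketched in C++ for UTF-8 input. This is a minimal, non-authoritative sketch: the helper names (`hex`, `decode_utf8`, `is_separator_or_other`, `is_grapheme_extend`, `escape_string`) are hypothetical, and because the standard library exposes no Unicode property tables, the two predicates are deliberately incomplete stand-ins for "`General_Category` in `Z` or `C`" and "`Grapheme_Extend=Yes`" that recognize only a handful of code points. Ill-formed input is escaped one code unit at a time, which matches the rule that every code unit *U* of an ill-formed sequence becomes `\x{hex-digit-sequence}`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <string_view>

// Shortest lower-case hexadecimal representation of v
// (std::hex prints lower-case digits without leading zeros).
std::string hex(std::uint32_t v) {
    std::ostringstream os;
    os << std::hex << v;
    return os.str();
}

// Decodes one well-formed UTF-8 character starting at s[i]. Returns its length
// in code units and stores the scalar value in cp, or returns 0 if the code
// unit at s[i] does not begin a well-formed sequence.
std::size_t decode_utf8(std::string_view s, std::size_t i, char32_t& cp) {
    const auto b0 = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF;  // valid range of the second byte
    if (b0 < 0x80) { cp = b0; return 1; }
    if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; cp = b0 & 0x1F; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3; cp = b0 & 0x0F;
        if (b0 == 0xE0) lo = 0xA0;  // reject overlong forms
        if (b0 == 0xED) hi = 0x9F;  // reject surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4; cp = b0 & 0x07;
        if (b0 == 0xF0) lo = 0x90;  // reject overlong forms
        if (b0 == 0xF4) hi = 0x8F;  // reject values above U+10FFFF
    } else {
        return 0;  // continuation byte or invalid lead byte
    }
    if (i + len > s.size()) return 0;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const auto b = static_cast<unsigned char>(s[i + k]);
        if (b < (k == 1 ? lo : 0x80) || b > (k == 1 ? hi : 0xBF)) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    return len;
}

// HYPOTHETICAL, incomplete stand-in for General_Category in Separator (Z) or
// Other (C); a real implementation needs the Unicode Character Database.
bool is_separator_or_other(char32_t c) {
    return c < 0x20 || (c >= 0x7F && c <= 0x9F)          // Cc: controls
        || c == 0xA0 || (c >= 0x2000 && c <= 0x200A)     // some Zs
        || c == 0x2028 || c == 0x2029                    // Zl, Zp
        || (c >= 0x200B && c <= 0x200F) || c == 0xFEFF   // some Cf
        || (c >= 0xE000 && c <= 0xF8FF);                 // Co: private use
}

// HYPOTHETICAL stand-in for Grapheme_Extend=Yes: only the Combining
// Diacritical Marks block (U+0300..U+036F) is recognized.
bool is_grapheme_extend(char32_t c) { return c >= 0x300 && c <= 0x36F; }

// Escaped string representation E of a UTF-8 string S (sketch).
std::string escape_string(std::string_view s) {
    std::string e = "\"";        // opening U+0022
    bool prev_verbatim = false;  // was the preceding character appended untranslated?
    for (std::size_t i = 0; i < s.size();) {
        char32_t c = 0;
        const std::size_t n = decode_utf8(s, i, c);
        if (n == 0) {  // ill-formed code unit U: append \x{hex}
            e += "\\x{" + hex(static_cast<unsigned char>(s[i])) + "}";
            prev_verbatim = false;
            ++i;
            continue;
        }
        bool verbatim = false;
        switch (c) {  // the fixed escape sequences from the table
            case U'\t': e += "\\t"; break;
            case U'\n': e += "\\n"; break;
            case U'\r': e += "\\r"; break;
            case U'"':  e += "\\\""; break;
            case U'\\': e += "\\\\"; break;
            default:
                if (c != U' ' && (is_separator_or_other(c) ||
                                  (is_grapheme_extend(c) && !prev_verbatim))) {
                    e += "\\u{" + hex(c) + "}";
                } else {
                    e.append(s.data() + i, n);  // C is appended unchanged
                    verbatim = true;
                }
        }
        prev_verbatim = verbatim;
        i += n;
    }
    e += '"';  // closing U+0022
    return e;
}
```

With these stand-ins, `escape_string("\xc3\x28")` produces `"\x{c3}("` and `escape_string("\\\xcc\x81")` produces `"\\\u{301}"`, in line with *s5* and *s8* in Example 1. A production version would replace both predicates with lookups generated from the Unicode Character Database.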

**Table: Mapping of characters to escape sequences** <a id="format.escape.sequences">[format.escape.sequences]</a>

| Character                     | Escape sequence |
| ----------------------------- | --------------- |
| U+0009 (character tabulation) | `\t`            |
| U+000A (line feed)            | `\n`            |
| U+000D (carriage return)      | `\r`            |
| U+0022 (quotation mark)       | `\"`            |
| U+005C (reverse solidus)      | `\\`            |
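
The table is a fixed lookup that is consulted before any of the Unicode-property rules. A minimal sketch of it as a `constexpr` function; the name `table_escape` is hypothetical and not part of the standard library.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: returns the two-character escape sequence that the
// table assigns to c, or an empty view if c is not one of the listed characters.
constexpr std::string_view table_escape(char32_t c) {
    switch (c) {
        case U'\t': return "\\t";   // U+0009 character tabulation
        case U'\n': return "\\n";   // U+000A line feed
        case U'\r': return "\\r";   // U+000D carriage return
        case U'"':  return "\\\"";  // U+0022 quotation mark
        case U'\\': return "\\\\";  // U+005C reverse solidus
        default:    return {};
    }
}

// Every entry is a backslash followed by exactly one more character.
static_assert(table_escape(U'\\').size() == 2);
static_assert(table_escape(U'a').empty());
```

In the character form, U+0022 is the one entry that is not used: a quotation mark inside a character literal is appended unchanged.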

The escaped string representation of a character *C* is equivalent to
the escaped string representation of a string of *C*, except that:

- the result starts and ends with U+0027 (apostrophe) (`'`) instead of
  U+0022 (quotation mark) (`"`), and
- if *C* is U+0027 (apostrophe), the two characters `\'` are appended to
  *E*, and
- if *C* is U+0022 (quotation mark), then *C* is appended unchanged.
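
For a single `char` under UTF-8, these rules reduce to a short case analysis. A lone code unit either encodes an ASCII character or, if its value is 0x80 or above, is an ill-formed sequence on its own; apart from U+0020 (space), the only ASCII characters in `Separator` or `Other` are the controls U+0000 to U+001F and U+007F, and no ASCII character has `Grapheme_Extend=Yes`. The sketch below assumes UTF-8 as the literal encoding; `escape_char` is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Sketch: escaped representation of a single char, assuming UTF-8.
std::string escape_char(char ch) {
    const auto u = static_cast<unsigned char>(ch);
    std::ostringstream os;
    os << std::hex << '\'';  // opening U+0027 instead of U+0022
    switch (u) {
        case '\t': os << "\\t"; break;
        case '\n': os << "\\n"; break;
        case '\r': os << "\\r"; break;
        case '\\': os << "\\\\"; break;
        case '\'': os << "\\'"; break;  // apostrophe is escaped in the character form
        case '"':  os << '"'; break;    // quotation mark is appended unchanged
        default:
            if (u >= 0x80)                   // lone code unit >= 0x80 is ill-formed
                os << "\\x{" << static_cast<unsigned>(u) << '}';
            else if (u < 0x20 || u == 0x7F)  // Cc controls (General_Category Other)
                os << "\\u{" << static_cast<unsigned>(u) << '}';
            else
                os << ch;                    // printable ASCII, including U+0020 space
    }
    os << '\'';  // closing U+0027
    return os.str();
}
```

For example, `escape_char('\'')` yields `'\''` and `escape_char('"')` yields `'"'`, which is the pair shown for *s3* in Example 1.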

[*Example 1*:

``` cpp
string s0 = format("[{}]", "h\tllo");               // s0 has value: [h    llo] (the tab is output unescaped)
string s1 = format("[{:?}]", "h\tllo");             // s1 has value: ["h\tllo"]
string s3 = format("[{:?}, {:?}]", '\'', '"');      // s3 has value: ['\'', '"']

// The following examples assume use of the UTF-8 encoding
string s4 = format("[{:?}]", string("\0 \n \t \x02 \x1b", 9));
                                                    // s4 has value: ["\u{0} \n \t \u{2} \u{1b}"]
string s5 = format("[{:?}]", "\xc3\x28");           // invalid UTF-8, s5 has value: ["\x{c3}("]
string s7 = format("[{:?}]", "\u0301");             // s7 has value: ["\u{301}"]
string s8 = format("[{:?}]", "\\\u0301");           // s8 has value: ["\\\u{301}"]
```

— *end example*]
