From Jason Turner

[format.string.escaped]

Files changed (1)
  1. tmp/tmpxmcqamjp/{from.md → to.md} +85 -0
tmp/tmpxmcqamjp/{from.md → to.md} RENAMED
@@ -0,0 +1,85 @@
+ #### Formatting escaped characters and strings <a id="format.string.escaped">[[format.string.escaped]]</a>
+
+ A character or string can be formatted as *escaped* to make it more
+ suitable for debugging or for logging.
+
+ The escaped string *E* representation of a string *S* is constructed by
+ encoding a sequence of characters as follows. The associated character
+ encoding *CE* for `charT` ([[lex.string.literal]]) is used to both
+ interpret *S* and construct *E*.
+
+ - U+0022 (quotation mark) (`"`) is appended to *E*.
+ - For each code unit sequence *X* in *S* that either encodes a single
+   character, is a shift sequence, or is a sequence of ill-formed code
+   units, processing is in order as follows:
+   - If *X* encodes a single character *C*, then:
+     - If *C* is one of the characters in [[format.escape.sequences]],
+       then the two characters shown as the corresponding escape sequence
+       are appended to *E*.
+     - Otherwise, if *C* is not U+0020 (space) and
+       - *CE* is UTF-8, UTF-16, or UTF-32 and *C* corresponds to a
+         Unicode scalar value whose Unicode property `General_Category`
+         has a value in the groups `Separator` (`Z`) or `Other` (`C`), as
+         described by UAX \#44 of the Unicode Standard, or
+       - *CE* is UTF-8, UTF-16, or UTF-32 and *C* corresponds to a
+         Unicode scalar value with the Unicode property
+         `Grapheme_Extend=Yes` as described by UAX \#44 of the Unicode
+         Standard and *C* is not immediately preceded in *S* by a
+         character *P* appended to *E* without translation to an escape
+         sequence, or
+       - *CE* is neither UTF-8, UTF-16, nor UTF-32 and *C* is one of an
+         implementation-defined set of separator or non-printable
+         characters
+
+       then the sequence `\u{hex-digit-sequence}` is appended to *E*,
+       where `hex-digit-sequence` is the shortest hexadecimal
+       representation of *C* using lower-case hexadecimal digits.
+     - Otherwise, *C* is appended to *E*.
+   - Otherwise, if *X* is a shift sequence, the effect on *E* and further
+     decoding of *S* is unspecified. *Recommended practice:* A shift
+     sequence should be represented in *E* such that the original code
+     unit sequence of *S* can be reconstructed.
+   - Otherwise (*X* is a sequence of ill-formed code units), each code
+     unit *U* is appended to *E* in order as the sequence
+     `\x{hex-digit-sequence}`, where `hex-digit-sequence` is the shortest
+     hexadecimal representation of *U* using lower-case hexadecimal
+     digits.
+ - Finally, U+0022 (quotation mark) (`"`) is appended to *E*.
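
The construction of *E* can be illustrated with a minimal sketch. It is not the library's implementation: `shortest_hex` and `escape_string_sketch` are hypothetical names, *CE* is assumed to be UTF-8, and only single-byte (ASCII) characters are handled exactly. Every code unit of 0x80 or above is treated as ill-formed, which is only correct for input that really is ill-formed UTF-8; a conforming implementation would decode multi-byte sequences and consult the `General_Category` and `Grapheme_Extend` properties instead.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not a standard library function): shortest
// lower-case hexadecimal representation of v, e.g. 0x1b -> "1b", 0 -> "0".
inline std::string shortest_hex(unsigned v) {
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    do {
        out.insert(out.begin(), digits[v % 16]);
        v /= 16;
    } while (v != 0);
    return out;
}

// Hypothetical sketch of the escaped string representation.
// Assumptions: CE is UTF-8; only ASCII characters are classified exactly.
inline std::string escape_string_sketch(std::string_view s) {
    std::string e;
    e += '"';  // opening U+0022
    for (char ch : s) {
        const unsigned char u = static_cast<unsigned char>(ch);
        // Characters listed in the escape sequence table.
        switch (u) {
        case '\t': e += "\\t";  continue;
        case '\n': e += "\\n";  continue;
        case '\r': e += "\\r";  continue;
        case '"':  e += "\\\""; continue;
        case '\\': e += "\\\\"; continue;
        default:   break;
        }
        if (u < 0x20 || u == 0x7f) {
            // ASCII control characters have General_Category Cc (group Other).
            e += "\\u{" + shortest_hex(u) + "}";
        } else if (u >= 0x80) {
            // Simplification: every non-ASCII code unit is treated as
            // ill-formed. Valid multi-byte UTF-8 is NOT handled correctly.
            e += "\\x{" + shortest_hex(u) + "}";
        } else {
            e += ch;  // printable ASCII, including U+0020 (space)
        }
    }
    e += '"';  // closing U+0022
    return e;
}
```

Under these assumptions, `escape_string_sketch("h\tllo")` yields `"h\tllo"` with the tab written as the two characters `\t`, and the ill-formed input `"\xc3\x28"` yields `"\x{c3}("`.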
+
+ **Table: Mapping of characters to escape sequences** <a id="format.escape.sequences">[format.escape.sequences]</a>
+
+ | Character                     | Escape sequence |
+ | ----------------------------- | --------------- |
+ | U+0009 (character tabulation) | `\t`            |
+ | U+000a (line feed)            | `\n`            |
+ | U+000d (carriage return)      | `\r`            |
+ | U+0022 (quotation mark)       | `\"`            |
+ | U+005c (reverse solidus)      | `\\`            |
+
+ The escaped string representation of a character *C* is equivalent to
+ the escaped string representation of a string of *C*, except that:
+
+ - the result starts and ends with U+0027 (apostrophe) (`'`) instead of
+   U+0022 (quotation mark) (`"`), and
+ - if *C* is U+0027 (apostrophe), the two characters `\'` are appended to
+   *E*, and
+ - if *C* is U+0022 (quotation mark), then *C* is appended unchanged.
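
The differences for a single character can be sketched in the same spirit. `escape_char_sketch` is a hypothetical name, not a standard library function, and the sketch assumes its argument is either a printable ASCII character or one of the characters in the escape sequence table. Control characters and non-ASCII code units would additionally need the `\u{...}` and `\x{...}` handling described for strings.

```cpp
#include <string>

// Hypothetical sketch of the escaped representation of one character.
// Precondition (assumption): c is printable ASCII or one of
// U+0009, U+000a, U+000d, U+0022, U+0027, U+005c.
inline std::string escape_char_sketch(char c) {
    std::string e;
    e += '\'';  // the result starts with U+0027 instead of U+0022
    switch (c) {
    case '\t': e += "\\t";  break;
    case '\n': e += "\\n";  break;
    case '\r': e += "\\r";  break;
    case '\\': e += "\\\\"; break;
    case '\'': e += "\\'";  break;  // apostrophe becomes the two characters \'
    case '"':  e += '"';    break;  // quotation mark is appended unchanged
    default:   e += c;      break;  // printable ASCII is appended as-is
    }
    e += '\'';  // the result ends with U+0027
    return e;
}
```

With this sketch, an apostrophe produces `'\''` and a quotation mark produces `'"'`.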
+
+ [*Example 1*:
+
+ ``` cpp
+ string s0 = format("[{}]", "h\tllo");                  // s0 has value: [h    llo]
+ string s1 = format("[{:?}]", "h\tllo");                // s1 has value: ["h\tllo"]
+ string s3 = format("[{:?}, {:?}]", '\'', '"');         // s3 has value: ['\'', '"']
+
+ // The following examples assume use of the UTF-8 encoding
+ string s4 = format("[{:?}]", string("\0 \n \t \x02 \x1b", 9));
+                                                        // s4 has value: ["\u{0} \n \t \u{2} \u{1b}"]
+ string s5 = format("[{:?}]", "\xc3\x28");              // invalid UTF-8, s5 has value: ["\x{c3}("]
+ string s7 = format("[{:?}]", "\u0301");                // s7 has value: ["\u{301}"]
+ string s8 = format("[{:?}]", "\\\u0301");              // s8 has value: ["\\\u{301}"]
+ ```
+
+ — *end example*]
+