From Jason Turner

[lex.ccon]

Files changed (1)
  1. tmp/tmp4mkv6h0u/{from.md → to.md} +130 -122
tmp/tmp4mkv6h0u/{from.md → to.md} RENAMED
@@ -16,153 +16,161 @@ c-char-sequence:
16
  c-char-sequence c-char
17
  ```
18
 
19
  ``` bnf
20
  c-char:
21
- any member of the basic source character set except the single-quote ''', backslash '\', or new-line character
22
  escape-sequence
23
  universal-character-name
24
  ```
25
 
26
  ``` bnf
27
  escape-sequence:
28
  simple-escape-sequence
29
  octal-escape-sequence
30
  hexadecimal-escape-sequence
31
  ```
32
 
33
  ``` bnf
34
- simple-escape-sequence: one of
35
- '\'' '\"' '\?' '\\'
36
- '\a' '\b' '\f' '\n' '\r' '\t' '\v'
37
  ```
38
 
39
  ``` bnf
40
  octal-escape-sequence:
41
  '\' octal-digit
42
  '\' octal-digit octal-digit
43
  '\' octal-digit octal-digit octal-digit
 
44
  ```
45
 
46
  ``` bnf
47
  hexadecimal-escape-sequence:
48
- '\x' hexadecimal-digit
49
- hexadecimal-escape-sequence hexadecimal-digit
50
  ```
51
 
52
- A *character-literal* that does not begin with `u8`, `u`, `U`, or `L` is
53
- an *ordinary character literal*. An ordinary character literal that
54
- contains a single *c-char* representable in the execution character set
55
- has type `char`, with value equal to the numerical value of the encoding
56
- of the *c-char* in the execution character set. An ordinary character
57
- literal that contains more than one *c-char* is a
58
- *multicharacter literal*. A multicharacter literal, or an ordinary
59
- character literal containing a single *c-char* not representable in the
60
- execution character set, is conditionally-supported, has type `int`, and
61
- has an *implementation-defined* value.
62
-
63
- A *character-literal* that begins with `u8`, such as `u8'w'`, is a
64
- *character-literal* of type `char8_t`, known as a *UTF-8 character
65
- literal*. The value of a UTF-8 character literal is equal to its ISO/IEC
66
- 10646 code point value, provided that the code point value can be
67
- encoded as a single UTF-8 code unit.
68
-
69
- [*Note 1*: That is, provided the code point value is in the range
70
- [0, 7F] (hexadecimal). — *end note*]
71
-
72
- If the value is not representable with a single UTF-8 code unit, the
73
- program is ill-formed. A UTF-8 character literal containing multiple
74
- *c-char*s is ill-formed.
75
-
76
- A *character-literal* that begins with the letter `u`, such as `u'x'`,
77
- is a *character-literal* of type `char16_t`, known as a *UTF-16
78
- character literal*. The value of a UTF-16 character literal is equal to
79
- its ISO/IEC 10646 code point value, provided that the code point value
80
- is representable with a single 16-bit code unit.
81
-
82
- [*Note 2*: That is, provided the code point value is in the range
83
- [0, FFFF] (hexadecimal). — *end note*]
84
-
85
- If the value is not representable with a single 16-bit code unit, the
86
- program is ill-formed. A UTF-16 character literal containing multiple
87
- *c-char*s is ill-formed.
88
-
89
- A *character-literal* that begins with the letter `U`, such as `U'y'`,
90
- is a *character-literal* of type `char32_t`, known as a *UTF-32
91
- character literal*. The value of a UTF-32 character literal containing a
92
- single *c-char* is equal to its ISO/IEC 10646 code point value. A UTF-32
93
- character literal containing multiple *c-char*s is ill-formed.
94
-
95
- A *character-literal* that begins with the letter `L`, such as `L'z'`,
96
- is a *wide-character literal*. A wide-character literal has type
97
- `wchar_t`.[^12] The value of a wide-character literal containing a
98
- single *c-char* has value equal to the numerical value of the encoding
99
- of the *c-char* in the execution wide-character set, unless the *c-char*
100
- has no representation in the execution wide-character set, in which case
101
- the value is *implementation-defined*.
102
-
103
- [*Note 3*: The type `wchar_t` is able to represent all members of the
104
- execution wide-character set (see 
105
- [[basic.fundamental]]). — *end note*]
106
-
107
- The value of a wide-character literal containing multiple *c-char*s is
108
- *implementation-defined*.
109
-
110
- Certain non-graphic characters, the single quote `'`, the double quote
111
- `"`, the question mark `?`,[^13] and the backslash `\`, can be
112
- represented according to [[lex.ccon.esc]]. The double quote `"` and the
113
- question mark `?`, can be represented as themselves or by the escape
114
- sequences `\"` and `\?` respectively, but the single quote `'` and the
115
- backslash `\` shall be represented by the escape sequences `\'` and `\\`
116
- respectively. Escape sequences in which the character following the
117
- backslash is not listed in [[lex.ccon.esc]] are conditionally-supported,
118
- with *implementation-defined* semantics. An escape sequence specifies a
119
- single character.
120
-
121
- **Table: Escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
122
-
123
- | | | |
124
- | --------------- | -------------- | ------------------ |
125
- | new-line | NL(LF) | `\n` |
126
- | horizontal tab | HT | `\t` |
127
- | vertical tab | VT | `\v` |
128
- | backspace | BS | `\b` |
129
- | carriage return | CR | `\r` |
130
- | form feed | FF | `\f` |
131
- | alert | BEL | `\a` |
132
- | backslash | \ | `\\` |
133
- | question mark | ? | `\?` |
134
- | single quote | `'` | `\'` |
135
- | double quote | `"` | `\"` |
136
- | octal number | *ooo* | `\ooo` |
137
- | hex number | *hhh* | `\xhhh` |
138
-
139
-
140
- The escape `\ooo` consists of the backslash followed by one,
141
- two, or three octal digits that are taken to specify the value of the
142
- desired character. The escape `\xhhh` consists of the
143
- backslash followed by `x` followed by one or more hexadecimal digits
144
- that are taken to specify the value of the desired character. There is
145
- no limit to the number of digits in a hexadecimal sequence. A sequence
146
- of octal or hexadecimal digits is terminated by the first character that
147
- is not an octal digit or a hexadecimal digit, respectively. The value of
148
- a *character-literal* is *implementation-defined* if it falls outside of
149
- the *implementation-defined* range defined for `char` (for
150
- *character-literal*s with no prefix) or `wchar_t` (for
151
- *character-literal*s prefixed by `L`).
152
-
153
- [*Note 4*: If the value of a *character-literal* prefixed by `u`, `u8`,
154
- or `U` is outside the range defined for its type, the program is
155
- ill-formed. — *end note*]
156
-
157
- A *universal-character-name* is translated to the encoding, in the
158
- appropriate execution character set, of the character named. If there is
159
- no such encoding, the *universal-character-name* is translated to an
160
- *implementation-defined* encoding.
161
-
162
- [*Note 5*: In translation phase 1, a *universal-character-name* is
163
- introduced whenever an actual extended character is encountered in the
164
- source text. Therefore, all extended characters are described in terms
165
- of *universal-character-name*s. However, the actual compiler
166
- implementation may use its own native character set, so long as the same
167
- results are obtained. — *end note*]
168
 
 
16
  c-char-sequence c-char
17
  ```
18
 
19
  ``` bnf
20
  c-char:
21
+ basic-c-char
22
  escape-sequence
23
  universal-character-name
24
  ```
25
 
26
+ ``` bnf
27
+ basic-c-char:
28
+ any member of the translation character set except the U+0027 (apostrophe),
29
+ U+005c (reverse solidus), or new-line character
30
+ ```
31
+
32
  ``` bnf
33
  escape-sequence:
34
  simple-escape-sequence
35
+ numeric-escape-sequence
36
+ conditional-escape-sequence
37
+ ```
38
+
39
+ ``` bnf
40
+ simple-escape-sequence:
41
+ '\' simple-escape-sequence-char
42
+ ```
43
+
44
+ ``` bnf
45
+ simple-escape-sequence-char: one of
46
+ ''' '"' '?' '\' 'a' 'b' 'f' 'n' 'r' 't' 'v'
47
+ ```
48
+
49
+ ``` bnf
50
+ numeric-escape-sequence:
51
  octal-escape-sequence
52
  hexadecimal-escape-sequence
53
  ```
54
 
55
  ``` bnf
56
+ simple-octal-digit-sequence:
57
+ octal-digit
58
+ simple-octal-digit-sequence octal-digit
59
  ```
60
 
61
  ``` bnf
62
  octal-escape-sequence:
63
  '\' octal-digit
64
  '\' octal-digit octal-digit
65
  '\' octal-digit octal-digit octal-digit
66
+ '\o{' simple-octal-digit-sequence '}'
67
  ```
68
 
69
  ``` bnf
70
  hexadecimal-escape-sequence:
71
+ '\x' simple-hexadecimal-digit-sequence
72
+ '\x{' simple-hexadecimal-digit-sequence '}'
73
  ```
74
 
75
+ ``` bnf
76
+ conditional-escape-sequence:
77
+ '\' conditional-escape-sequence-char
78
+ ```
79
+
80
+ ``` bnf
81
+ conditional-escape-sequence-char:
82
+ any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters 'N', 'o', 'u', 'U', or 'x'
83
+ ```
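
The new grammar splits the characters that may follow a backslash into three groups. As a rough illustration only, the classification can be sketched as below. The helper names are hypothetical (they do not come from the standard or the diff), and the sketch assumes its argument is already a member of the basic character set.

```cpp
// Hypothetical helpers for illustration; none of these names appear in the standard.

// octal-digit: one of 0 1 2 3 4 5 6 7
constexpr bool is_octal_digit(char c) { return c >= '0' && c <= '7'; }

// simple-escape-sequence-char: one of ' " ? \ a b f n r t v
constexpr bool is_simple_escape_char(char c) {
    switch (c) {
        case '\'': case '"': case '?': case '\\':
        case 'a': case 'b': case 'f': case 'n':
        case 'r': case 't': case 'v':
            return true;
        default:
            return false;
    }
}

// conditional-escape-sequence-char: anything else in the basic character set,
// except N, o, u, U and x. Assumption: c is in the basic character set.
constexpr bool is_conditional_escape_char(char c) {
    return !is_octal_digit(c) && !is_simple_escape_char(c) &&
           c != 'N' && c != 'o' && c != 'u' && c != 'U' && c != 'x';
}

static_assert(is_conditional_escape_char('q'), "\\q is a conditional-escape-sequence");
static_assert(!is_conditional_escape_char('n'), "\\n is a simple-escape-sequence");
```

Under this reading, `\8` is a conditional escape sequence because `8` is not an octal digit, while `\x` is excluded because it introduces a hexadecimal escape sequence.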
84
+
85
+ A *non-encodable character literal* is a *character-literal* whose
86
+ *c-char-sequence* consists of a single *c-char* that is not a
87
+ *numeric-escape-sequence* and that specifies a character that either
88
+ lacks representation in the literal’s associated character encoding or
89
+ that cannot be encoded as a single code unit. A *multicharacter literal*
90
+ is a *character-literal* whose *c-char-sequence* consists of more than
91
+ one *c-char*. The *encoding-prefix* of a non-encodable character literal
92
+ or a multicharacter literal shall be absent. Such *character-literal*s
93
+ are conditionally-supported.
94
+
95
+ The kind of a *character-literal*, its type, and its associated
96
+ character encoding [[lex.charset]] are determined by its
97
+ *encoding-prefix* and its *c-char-sequence* as defined by
98
+ [[lex.ccon.literal]]. The special cases for non-encodable character
99
+ literals and multicharacter literals take precedence over the base kind.
100
+
101
+ [*Note 1*: The associated character encoding for ordinary character
102
+ literals determines encodability, but does not determine the value of
103
+ non-encodable ordinary character literals or ordinary multicharacter
104
+ literals. The examples in [[lex.ccon.literal]] for non-encodable
105
+ ordinary character literals assume that the specified character lacks
106
+ representation in the ordinary literal encoding or that encoding the
107
+ character would require more than one code unit. — *end note*]
108
+
109
+ **Table: Character literals** <a id="lex.ccon.literal">[lex.ccon.literal]</a>
110
+
111
+ | Encoding prefix | Kind | Type | Associated character encoding | Example |
112
+ | ---- | -------------------------- | ---------- | ------------ | ------- |
113
+ | none | ordinary character literal | `char` | ordinary literal encoding | `'v'` |
114
+ | `L` | wide character literal | `wchar_t` | wide literal encoding | `L'w'` |
116
+ | `u8` | UTF-8 character literal | `char8_t` | UTF-8 | `u8'x'` |
117
+ | `u` | UTF-16 character literal | `char16_t` | UTF-16 | `u'y'` |
118
+ | `U` | UTF-32 character literal | `char32_t` | UTF-32 | `U'z'` |
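
The type column of the table can be checked at compile time. The following is a minimal sketch, not part of the proposed wording. The helper `code_unit_size` is hypothetical, and the `char8_t` check is guarded because that type only exists when the compiler defines `__cpp_char8_t` (C++20 mode).

```cpp
#include <cstddef>
#include <type_traits>

// Each encoding-prefix selects the type of the literal, as listed in the table.
static_assert(std::is_same<decltype('v'),  char>::value,     "no prefix -> char");
static_assert(std::is_same<decltype(L'w'), wchar_t>::value,  "L -> wchar_t");
static_assert(std::is_same<decltype(u'y'), char16_t>::value, "u -> char16_t");
static_assert(std::is_same<decltype(U'z'), char32_t>::value, "U -> char32_t");
#ifdef __cpp_char8_t
static_assert(std::is_same<decltype(u8'x'), char8_t>::value, "u8 -> char8_t");
#endif

// Hypothetical helper (not from the source): size in bytes of a literal's type.
template <class T>
constexpr std::size_t code_unit_size(T) { return sizeof(T); }
```

For instance, `code_unit_size('v')` is always 1, while `code_unit_size(U'z')` equals `sizeof(char32_t)`.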
119
+
120
+
121
+ In translation phase 4, the value of a *character-literal* is determined
122
+ using the range of representable values of the *character-literal*’s
123
+ type in translation phase 7. A non-encodable character literal or a
124
+ multicharacter literal has an *implementation-defined* value. The value
125
+ of any other kind of *character-literal* is determined as follows:
126
+
127
+ - A *character-literal* with a *c-char-sequence* consisting of a single
128
+ *basic-c-char*, *simple-escape-sequence*, or
129
+ *universal-character-name* is the code unit value of the specified
130
+ character as encoded in the literal’s associated character encoding.
131
+ \[*Note 2*: If the specified character lacks representation in the
132
+ literal’s associated character encoding or if it cannot be encoded as
133
+ a single code unit, then the literal is a non-encodable character
134
+ literal. — *end note*]
135
+ - A *character-literal* with a *c-char-sequence* consisting of a single
136
+ *numeric-escape-sequence* has a value as follows:
137
+ - Let v be the integer value represented by the octal number
138
+ comprising the sequence of *octal-digit*s in an
139
+ *octal-escape-sequence* or by the hexadecimal number comprising the
140
+ sequence of *hexadecimal-digit*s in a *hexadecimal-escape-sequence*.
141
+ - If v does not exceed the range of representable values of the
142
+ *character-literal*’s type, then the value is v.
143
+ - Otherwise, if the *character-literal*’s *encoding-prefix* is absent
144
+ or `L`, and v does not exceed the range of representable values of
145
+ the corresponding unsigned type for the underlying type of the
146
+ *character-literal*’s type, then the value is the unique value of
147
+ the *character-literal*’s type `T` that is congruent to v modulo 2ᴺ,
148
+ where N is the width of `T`.
149
+ - Otherwise, the *character-literal* is ill-formed.
150
+ - A *character-literal* with a *c-char-sequence* consisting of a single
151
+ *conditional-escape-sequence* is conditionally-supported and has an
152
+ *implementation-defined* value.
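
The numeric-escape-sequence rules above can be illustrated with a few compile-time checks. This is a sketch under stated assumptions: `low_byte` is a hypothetical helper, and the `'\xff'` check assumes an 8-bit `char`, whose signedness is left to the implementation.

```cpp
// Hypothetical helper (not from the source): the value of a char reduced
// modulo 2^N, where N is the width of char, viewed as an unsigned number.
constexpr unsigned low_byte(char c) { return static_cast<unsigned char>(c); }

// v is within the range of the literal's type, so the value is v itself.
static_assert('\x41' == 65, "hexadecimal 41 is 65");
static_assert('\101' == 65, "octal 101 is 65");
static_assert(u'\xffff' == 0xffff, "largest value representable in char16_t");
static_assert(U'\x10ffff' == 0x10ffff, "fits in char32_t");

// v = 255 may exceed the range of a signed 8-bit char. With no prefix, the
// value is then the char value congruent to 255 modulo 2^8 (that is, -1 when
// char is signed). Viewing it through unsigned char gives 255 either way.
static_assert(low_byte('\xff') == 0xffu, "assumes an 8-bit char");
```

The same wrap-around is not available to `u8`, `u`, or `U` literals: there, a value outside the range of the type makes the literal ill-formed.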
153
+
154
+ The character specified by a *simple-escape-sequence* is specified in
155
+ [[lex.ccon.esc]].
156
+
157
+ [*Note 3*: Using an escape sequence for a question mark is supported
158
+ for compatibility with ISO C++14 and ISO C. — *end note*]
159
+
160
+ **Table: Simple escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
161
+
162
+ | character | | *simple-escape-sequence* |
163
+ | --------- | -------------------- | ------------------------ |
164
+ | `U+000a` | line feed | `\n` |
165
+ | `U+0009` | character tabulation | `\t` |
166
+ | `U+000b` | line tabulation | `\v` |
167
+ | `U+0008` | backspace | `\b` |
168
+ | `U+000d` | carriage return | `\r` |
169
+ | `U+000c` | form feed | `\f` |
170
+ | `U+0007` | alert | `\a` |
171
+ | `U+005c` | reverse solidus | `\\` |
172
+ | `U+003f` | question mark | `\?` |
173
+ | `U+0027` | apostrophe | `\'` |
174
+ | `U+0022` | quotation mark | `\"` |
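
The table maps each *simple-escape-sequence-char* to a code point. As a small sketch of that mapping, the lookup below returns the code point for the character that follows the backslash. The function name and the sentinel value are hypothetical. The checks use UTF-32 literals so that they do not depend on the ordinary literal encoding.

```cpp
// Hypothetical sentinel for characters that are not simple-escape-sequence-chars.
constexpr char32_t kNotSimpleEscape = 0xFFFFFFFF;

// Hypothetical lookup: code point named by '\' followed by c, per the table.
constexpr char32_t simple_escape_code_point(char c) {
    switch (c) {
        case 'n':  return 0x000a;  // line feed
        case 't':  return 0x0009;  // character tabulation
        case 'v':  return 0x000b;  // line tabulation
        case 'b':  return 0x0008;  // backspace
        case 'r':  return 0x000d;  // carriage return
        case 'f':  return 0x000c;  // form feed
        case 'a':  return 0x0007;  // alert
        case '\\': return 0x005c;  // reverse solidus
        case '?':  return 0x003f;  // question mark
        case '\'': return 0x0027;  // apostrophe
        case '"':  return 0x0022;  // quotation mark
        default:   return kNotSimpleEscape;
    }
}

// The compiler's own handling of the escapes agrees with the table.
static_assert(U'\n' == simple_escape_code_point('n'), "line feed");
static_assert(U'\\' == simple_escape_code_point('\\'), "reverse solidus");
static_assert(U'\?' == U'?', "escaped and unescaped question mark are the same");
```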
175
+
176