From Jason Turner

[lex.ccon]

Files changed (1)
  1. tmp/tmp9ch4qk5m/{from.md → to.md} +64 -53
tmp/tmp9ch4qk5m/{from.md → to.md} RENAMED
@@ -14,10 +14,17 @@ encoding-prefix: one of
14
  c-char-sequence:
15
  c-char
16
  c-char-sequence c-char
17
  ```
18
 
19
  ``` bnf
20
  escape-sequence:
21
  simple-escape-sequence
22
  octal-escape-sequence
23
  hexadecimal-escape-sequence
@@ -40,76 +47,80 @@ octal-escape-sequence:
40
  hexadecimal-escape-sequence:
41
  '\x' hexadecimal-digit
42
  hexadecimal-escape-sequence hexadecimal-digit
43
  ```
44
 
45
- A character literal is one or more characters enclosed in single quotes,
46
- as in `'x'`, optionally preceded by `u8`, `u`, `U`, or `L`, as in
47
- `u8'w'`, `u'x'`, `U'y'`, or `L'z'`, respectively.
48
-
49
- A character literal that does not begin with `u8`, `u`, `U`, or `L` is
50
  an *ordinary character literal*. An ordinary character literal that
51
  contains a single *c-char* representable in the execution character set
52
  has type `char`, with value equal to the numerical value of the encoding
53
  of the *c-char* in the execution character set. An ordinary character
54
- literal that contains more than one *c-char* is a *multicharacter
55
- literal*. A multicharacter literal, or an ordinary character literal
56
- containing a single *c-char* not representable in the execution
57
- character set, is conditionally-supported, has type `int`, and has an
58
- *implementation-defined* value.
59
-
60
- A character literal that begins with `u8`, such as `u8'w'`, is a
61
- character literal of type `char`, known as a *UTF-8 character literal*.
62
- The value of a UTF-8 character literal is equal to its ISO 10646 code
63
- point value, provided that the code point value is representable with a
64
- single UTF-8 code unit (that is, provided it is in the C0 Controls and
65
- Basic Latin Unicode block). If the value is not representable with a
66
- single UTF-8 code unit, the program is ill-formed. A UTF-8 character
67
- literal containing multiple *c-char*s is ill-formed.
68
-
69
- A character literal that begins with the letter `u`, such as `u'x'`, is
70
- a character literal of type `char16_t`. The value of a `char16_t`
71
- character literal containing a single *c-char* is equal to its ISO 10646
72
- code point value, provided that the code point is representable with a
73
- single 16-bit code unit. (That is, provided it is a basic multi-lingual
74
- plane code point.) If the value is not representable within 16 bits, the
75
- program is ill-formed. A `char16_t` character literal containing
76
- multiple *c-char*s is ill-formed.
77
-
78
- A character literal that begins with the letter `U`, such as `U'y'`, is
79
- a character literal of type `char32_t`. The value of a `char32_t`
80
- character literal containing a single *c-char* is equal to its ISO 10646
81
- code point value. A `char32_t` character literal containing multiple
82
  *c-char*s is ill-formed.
83
 
84
- A character literal that begins with the letter `L`, such as `L'z'`, is
85
- a *wide-character literal*. A wide-character literal has type
86
- `wchar_t`.[^13] The value of a wide-character literal containing a
87
  single *c-char* has value equal to the numerical value of the encoding
88
  of the *c-char* in the execution wide-character set, unless the *c-char*
89
  has no representation in the execution wide-character set, in which case
90
  the value is *implementation-defined*.
91
 
92
- [*Note 1*: The type `wchar_t` is able to represent all members of the
93
  execution wide-character set (see 
94
  [[basic.fundamental]]). — *end note*]
95
 
96
  The value of a wide-character literal containing multiple *c-char*s is
97
  *implementation-defined*.
98
 
99
  Certain non-graphic characters, the single quote `'`, the double quote
100
- `"`, the question mark `?`,[^14] and the backslash `\`, can be
101
- represented according to Table  [[tab:escape.sequences]]. The double
102
- quote `"` and the question mark `?`, can be represented as themselves or
103
- by the escape sequences `\"` and `\?` respectively, but the single quote
104
- `'` and the backslash `\` shall be represented by the escape sequences
105
- `\'` and `\\` respectively. Escape sequences in which the character
106
- following the backslash is not listed in Table  [[tab:escape.sequences]]
107
- are conditionally-supported, with *implementation-defined* semantics. An
108
- escape sequence specifies a single character.
109
 
110
- **Table: Escape sequences** <a id="tab:escape.sequences">[tab:escape.sequences]</a>
111
 
112
  | | | |
113
  | --------------- | -------------- | ------------------ |
114
  | new-line | NL(LF) | `\n` |
115
  | horizontal tab | HT | `\t` |
@@ -132,25 +143,25 @@ desired character. The escape `\xhhh` consists of the
132
  backslash followed by `x` followed by one or more hexadecimal digits
133
  that are taken to specify the value of the desired character. There is
134
  no limit to the number of digits in a hexadecimal sequence. A sequence
135
  of octal or hexadecimal digits is terminated by the first character that
136
  is not an octal digit or a hexadecimal digit, respectively. The value of
137
- a character literal is *implementation-defined* if it falls outside of
138
- the *implementation-defined* range defined for `char` (for character
139
- literals with no prefix) or `wchar_t` (for character literals prefixed
140
- by `L`).
141
 
142
- [*Note 2*: If the value of a character literal prefixed by `u`, `u8`,
143
  or `U` is outside the range defined for its type, the program is
144
  ill-formed. — *end note*]
145
 
146
  A *universal-character-name* is translated to the encoding, in the
147
  appropriate execution character set, of the character named. If there is
148
  no such encoding, the *universal-character-name* is translated to an
149
  *implementation-defined* encoding.
150
 
151
- [*Note 3*: In translation phase 1, a *universal-character-name* is
152
  introduced whenever an actual extended character is encountered in the
153
  source text. Therefore, all extended characters are described in terms
154
  of *universal-character-name*s. However, the actual compiler
155
  implementation may use its own native character set, so long as the same
156
  results are obtained. — *end note*]
 
14
  c-char-sequence:
15
  c-char
16
  c-char-sequence c-char
17
  ```
18
 
19
+ ``` bnf
20
+ c-char:
21
+ any member of the basic source character set except the single-quote ''', backslash '\', or new-line character
22
+ escape-sequence
23
+ universal-character-name
24
+ ```
25
+
26
  ``` bnf
27
  escape-sequence:
28
  simple-escape-sequence
29
  octal-escape-sequence
30
  hexadecimal-escape-sequence
 
47
  hexadecimal-escape-sequence:
48
  '\x' hexadecimal-digit
49
  hexadecimal-escape-sequence hexadecimal-digit
50
  ```
51
 
52
+ A *character-literal* that does not begin with `u8`, `u`, `U`, or `L` is
53
  an *ordinary character literal*. An ordinary character literal that
54
  contains a single *c-char* representable in the execution character set
55
  has type `char`, with value equal to the numerical value of the encoding
56
  of the *c-char* in the execution character set. An ordinary character
57
+ literal that contains more than one *c-char* is a
58
+ *multicharacter literal*. A multicharacter literal, or an ordinary
59
+ character literal containing a single *c-char* not representable in the
60
+ execution character set, is conditionally-supported, has type `int`, and
61
+ has an *implementation-defined* value.
62
+
63
+ A *character-literal* that begins with `u8`, such as `u8'w'`, is a
64
+ *character-literal* of type `char8_t`, known as a *UTF-8 character
65
+ literal*. The value of a UTF-8 character literal is equal to its ISO/IEC
66
+ 10646 code point value, provided that the code point value can be
67
+ encoded as a single UTF-8 code unit.
68
+
69
+ [*Note 1*: That is, provided the code point value is in the range
70
+ [0, 7F] (hexadecimal). — *end note*]
71
+
72
+ If the value is not representable with a single UTF-8 code unit, the
73
+ program is ill-formed. A UTF-8 character literal containing multiple
74
+ *c-char*s is ill-formed.
75
+
76
+ A *character-literal* that begins with the letter `u`, such as `u'x'`,
77
+ is a *character-literal* of type `char16_t`, known as a *UTF-16
78
+ character literal*. The value of a UTF-16 character literal is equal to
79
+ its ISO/IEC 10646 code point value, provided that the code point value
80
+ is representable with a single 16-bit code unit.
81
+
82
+ [*Note 2*: That is, provided the code point value is in the range
83
+ [0, FFFF] (hexadecimal). — *end note*]
84
+
85
+ If the value is not representable with a single 16-bit code unit, the
86
+ program is ill-formed. A UTF-16 character literal containing multiple
87
  *c-char*s is ill-formed.
88
 
89
+ A *character-literal* that begins with the letter `U`, such as `U'y'`,
90
+ is a *character-literal* of type `char32_t`, known as a *UTF-32
91
+ character literal*. The value of a UTF-32 character literal containing a
92
+ single *c-char* is equal to its ISO/IEC 10646 code point value. A UTF-32
93
+ character literal containing multiple *c-char*s is ill-formed.
94
+
95
+ A *character-literal* that begins with the letter `L`, such as `L'z'`,
96
+ is a *wide-character literal*. A wide-character literal has type
97
+ `wchar_t`.[^12] The value of a wide-character literal containing a
98
  single *c-char* has value equal to the numerical value of the encoding
99
  of the *c-char* in the execution wide-character set, unless the *c-char*
100
  has no representation in the execution wide-character set, in which case
101
  the value is *implementation-defined*.
102
 
103
+ [*Note 3*: The type `wchar_t` is able to represent all members of the
104
  execution wide-character set (see 
105
  [[basic.fundamental]]). — *end note*]
106
 
107
  The value of a wide-character literal containing multiple *c-char*s is
108
  *implementation-defined*.
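
Reviewer's aside, not part of the diff: the type each prefix yields under the wording above can be checked at compile time. This is a minimal sketch that assumes a C++17 or later compiler. It relies on the feature-test macro `__cpp_char8_t` to choose the expected type of `u8'w'`, so it reflects the old wording (`char`) or the new wording (`char8_t`) depending on the language mode. The names `utf8_char_type`, `fits_one_utf8_code_unit`, and `fits_one_utf16_code_unit` are hypothetical and exist only for illustration.

```cpp
#include <type_traits>

// With char8_t support (C++20), a u8 character literal has type char8_t,
// as in the new wording; without it, the type is char, as in the old wording.
#if defined(__cpp_char8_t)
using utf8_char_type = char8_t;
#else
using utf8_char_type = char;
#endif

// One literal per prefix, using the examples from the text.
static_assert(std::is_same_v<decltype('x'), char>);              // ordinary
static_assert(std::is_same_v<decltype(u8'w'), utf8_char_type>);  // UTF-8
static_assert(std::is_same_v<decltype(u'x'), char16_t>);         // UTF-16
static_assert(std::is_same_v<decltype(U'y'), char32_t>);         // UTF-32
static_assert(std::is_same_v<decltype(L'z'), wchar_t>);          // wide

// The value of a UTF-8, UTF-16, or UTF-32 literal is its code point.
static_assert(u8'w' == 0x77);
static_assert(u'x' == 0x78);
static_assert(U'y' == 0x79);

// Hypothetical helpers mirroring the ranges given in Note 1 and Note 2.
constexpr bool fits_one_utf8_code_unit(char32_t cp) { return cp <= 0x7F; }
constexpr bool fits_one_utf16_code_unit(char32_t cp) { return cp <= 0xFFFF; }
```

By the same ranges, `u8'\u00E9'` and `u'\U0001F600'` are ill-formed under the wording above, because neither code point fits in a single code unit of its encoding.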
109
 
110
  Certain non-graphic characters, the single quote `'`, the double quote
111
+ `"`, the question mark `?`,[^13] and the backslash `\`, can be
112
+ represented according to [[lex.ccon.esc]]. The double quote `"` and the
113
+ question mark `?`, can be represented as themselves or by the escape
114
+ sequences `\"` and `\?` respectively, but the single quote `'` and the
115
+ backslash `\` shall be represented by the escape sequences `\'` and `\\`
116
+ respectively. Escape sequences in which the character following the
117
+ backslash is not listed in [[lex.ccon.esc]] are conditionally-supported,
118
+ with *implementation-defined* semantics. An escape sequence specifies a
119
+ single character.
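
Reviewer's aside, not part of the diff: the rule about which characters may appear as themselves and which must be escaped can be sketched as follows. The variable names are hypothetical, and the checks deliberately avoid assuming any particular execution character set.

```cpp
// The double quote and the question mark can be written as themselves
// or through their escape sequences; both spellings name the same character.
constexpr char double_quote_plain = '"';
constexpr char double_quote_escaped = '\"';
constexpr char question_plain = '?';
constexpr char question_escaped = '\?';

// Inside a character literal, the single quote and the backslash
// have to be written as escape sequences.
constexpr char single_quote = '\'';
constexpr char backslash = '\\';

static_assert(double_quote_plain == double_quote_escaped);
static_assert(question_plain == question_escaped);
static_assert(single_quote != backslash);
```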
120
 
121
+ **Table: Escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
122
 
123
  | | | |
124
  | --------------- | -------------- | ------------------ |
125
  | new-line | NL(LF) | `\n` |
126
  | horizontal tab | HT | `\t` |
 
143
  backslash followed by `x` followed by one or more hexadecimal digits
144
  that are taken to specify the value of the desired character. There is
145
  no limit to the number of digits in a hexadecimal sequence. A sequence
146
  of octal or hexadecimal digits is terminated by the first character that
147
  is not an octal digit or a hexadecimal digit, respectively. The value of
148
+ a *character-literal* is *implementation-defined* if it falls outside of
149
+ the *implementation-defined* range defined for `char` (for
150
+ *character-literal*s with no prefix) or `wchar_t` (for
151
+ *character-literal*s prefixed by `L`).
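
Reviewer's aside, not part of the diff: the termination rule for hexadecimal escape sequences can be illustrated with a small compile-time helper. `hex_escape_value` is a hypothetical function written for this sketch; it takes the characters that follow `\x` and stops at the first one that is not a hexadecimal digit. A C++17 or later compiler is assumed.

```cpp
// Hypothetical helper: computes the value named by the digits that follow
// "\x", stopping at the first character that is not a hexadecimal digit.
constexpr unsigned long hex_escape_value(const char* digits) {
    unsigned long value = 0;
    for (; *digits != '\0'; ++digits) {
        const char c = *digits;
        unsigned long digit = 0;
        if (c >= '0' && c <= '9') {
            digit = static_cast<unsigned long>(c - '0');
        } else if (c >= 'a' && c <= 'f') {
            digit = static_cast<unsigned long>(c - 'a') + 10;
        } else if (c >= 'A' && c <= 'F') {
            digit = static_cast<unsigned long>(c - 'A') + 10;
        } else {
            break;  // not a hexadecimal digit: the sequence ends here
        }
        value = value * 16 + digit;
    }
    return value;
}

// The helper agrees with what the compiler computes for real literals.
static_assert(static_cast<unsigned long>('\x41') == hex_escape_value("41"));
static_assert(static_cast<unsigned long>(U'\x1F600') == hex_escape_value("1F600"));

// An octal escape names the same value with octal digits (101 octal is 65).
static_assert('\101' == '\x41');
```

The helper does not model the range check described above: whether a given value fits the range of `char` or `wchar_t` is implementation-defined, so the sketch only uses values that fit on mainstream implementations.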
152
 
153
+ [*Note 4*: If the value of a *character-literal* prefixed by `u`, `u8`,
154
  or `U` is outside the range defined for its type, the program is
155
  ill-formed. — *end note*]
156
 
157
  A *universal-character-name* is translated to the encoding, in the
158
  appropriate execution character set, of the character named. If there is
159
  no such encoding, the *universal-character-name* is translated to an
160
  *implementation-defined* encoding.
161
 
162
+ [*Note 5*: In translation phase 1, a *universal-character-name* is
163
  introduced whenever an actual extended character is encountered in the
164
  source text. Therefore, all extended characters are described in terms
165
  of *universal-character-name*s. However, the actual compiler
166
  implementation may use its own native character set, so long as the same
167
  results are obtained. — *end note*]