From Jason Turner

[lex.ccon]

Files changed (1)
  1. tmp/tmpfgmirpl7/{from.md → to.md} +74 -49
tmp/tmpfgmirpl7/{from.md → to.md} RENAMED
@@ -1,13 +1,15 @@
  ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>

  ``` bnf
  character-literal:
- ''' c-char-sequence '''
- u''' c-char-sequence '''
- U''' c-char-sequence '''
- L''' c-char-sequence '''
  ```

  ``` bnf
  c-char-sequence:
  c-char
@@ -39,46 +41,64 @@ hexadecimal-escape-sequence:
  '\x' hexadecimal-digit
  hexadecimal-escape-sequence hexadecimal-digit
  ```

  A character literal is one or more characters enclosed in single quotes,
- as in `'x'`, optionally preceded by one of the letters `u`, `U`, or `L`,
- as in `u'y'`, `U'z'`, or `L'x'`, respectively. A character literal that
- does not begin with `u`, `U`, or `L` is an ordinary character literal,
- also referred to as a narrow-character literal. An ordinary character
- literal that contains a single *c-char* representable in the execution
- character set has type `char`, with value equal to the numerical value
- of the encoding of the *c-char* in the execution character set. An
- ordinary character literal that contains more than one *c-char* is a
- *multicharacter literal*. A multicharacter literal, or an ordinary
- character literal containing a single *c-char* not representable in the
- execution character set, is conditionally-supported, has type `int`, and
- has an *implementation-defined* value.

- A character literal that begins with the letter `u`, such as `u'y'`, is
  a character literal of type `char16_t`. The value of a `char16_t`
- literal containing a single *c-char* is equal to its ISO 10646 code
- point value, provided that the code point is representable with a single
- 16-bit code unit. (That is, provided it is a basic multi-lingual plane
- code point.) If the value is not representable within 16 bits, the
- program is ill-formed. A `char16_t` literal containing multiple
- *c-char*s is ill-formed. A character literal that begins with the letter
- `U`, such as `U'z'`, is a character literal of type `char32_t`. The
- value of a `char32_t` literal containing a single *c-char* is equal to
- its ISO 10646 code point value. A `char32_t` literal containing multiple
- *c-char*s is ill-formed. A character literal that begins with the letter
- `L`, such as `L'x'`, is a wide-character literal. A wide-character
- literal has type `wchar_t`.[^13] The value of a wide-character literal
- containing a single *c-char* has value equal to the numerical value of
- the encoding of the *c-char* in the execution wide-character set, unless
- the *c-char* has no representation in the execution wide-character set,
- in which case the value is *implementation-defined*. The type `wchar_t`
- is able to represent all members of the execution wide-character set
- (see  [[basic.fundamental]]). . The value of a wide-character literal
- containing multiple *c-char*s is *implementation-defined*.

- Certain nongraphic characters, the single quote `'`, the double quote
  `"`, the question mark `?`,[^14] and the backslash `\`, can be
  represented according to Table  [[tab:escape.sequences]]. The double
  quote `"` and the question mark `?`, can be represented as themselves or
  by the escape sequences `\"` and `\?` respectively, but the single quote
  `'` and the backslash `\` shall be represented by the escape sequences
@@ -113,20 +133,25 @@ backslash followed by `x` followed by one or more hexadecimal digits
  that are taken to specify the value of the desired character. There is
  no limit to the number of digits in a hexadecimal sequence. A sequence
  of octal or hexadecimal digits is terminated by the first character that
  is not an octal digit or a hexadecimal digit, respectively. The value of
  a character literal is *implementation-defined* if it falls outside of
- the implementation-defined range defined for `char` (for literals with
- no prefix), `char16_t` (for literals prefixed by `'u'`), `char32_t` (for
- literals prefixed by `'U'`), or `wchar_t` (for literals prefixed by
- `'L'`).

- A universal-character-name is translated to the encoding, in the
  appropriate execution character set, of the character named. If there is
- no such encoding, the universal-character-name is translated to an
- *implementation-defined* encoding. In translation phase 1, a
- universal-character-name is introduced whenever an actual extended
- character is encountered in the source text. Therefore, all extended
- characters are described in terms of universal-character-names. However,
- the actual compiler implementation may use its own native character set,
- so long as the same results are obtained.
 
  ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>

  ``` bnf
  character-literal:
+ encoding-prefixₒₚₜ ''' c-char-sequence '''
+ ```
+
+ ``` bnf
+ encoding-prefix: one of
+ 'u8' 'u' 'U' 'L'
  ```

  ``` bnf
  c-char-sequence:
  c-char
 
  '\x' hexadecimal-digit
  hexadecimal-escape-sequence hexadecimal-digit
  ```

  A character literal is one or more characters enclosed in single quotes,
+ as in `'x'`, optionally preceded by `u8`, `u`, `U`, or `L`, as in
+ `u8'w'`, `u'x'`, `U'y'`, or `L'z'`, respectively.

+ A character literal that does not begin with `u8`, `u`, `U`, or `L` is
+ an *ordinary character literal*. An ordinary character literal that
+ contains a single *c-char* representable in the execution character set
+ has type `char`, with value equal to the numerical value of the encoding
+ of the *c-char* in the execution character set. An ordinary character
+ literal that contains more than one *c-char* is a *multicharacter
+ literal*. A multicharacter literal, or an ordinary character literal
+ containing a single *c-char* not representable in the execution
+ character set, is conditionally-supported, has type `int`, and has an
+ *implementation-defined* value.
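
The ordinary-literal rule in the added paragraph can be checked at compile time. This is only a sketch: the name `value_of_x` is an illustrative assumption and does not come from the standard text.

```cpp
#include <type_traits>

// An ordinary character literal holding one representable c-char has type char.
static_assert(std::is_same<decltype('x'), char>::value,
              "'x' should have type char");

// Its value is the encoding of the c-char in the execution character set,
// so the concrete number depends on the implementation's encoding
// (0x78 under ASCII-compatible encodings). Hypothetical name.
constexpr int value_of_x = 'x';

// A multicharacter literal such as 'ab' is conditionally-supported; where an
// implementation accepts it, it has type int and an implementation-defined
// value, so it is deliberately left out of the checks here.
```

The sketch does not assert a specific numeric value, because the paragraph ties the value to the execution character set.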
+
+ A character literal that begins with `u8`, such as `u8'w'`, is a
+ character literal of type `char`, known as a *UTF-8 character literal*.
+ The value of a UTF-8 character literal is equal to its ISO 10646 code
+ point value, provided that the code point value is representable with a
+ single UTF-8 code unit (that is, provided it is in the C0 Controls and
+ Basic Latin Unicode block). If the value is not representable with a
+ single UTF-8 code unit, the program is ill-formed. A UTF-8 character
+ literal containing multiple *c-char*s is ill-formed.
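
A short sketch of the UTF-8 character literal rule follows. It needs a compiler mode that accepts the `u8` prefix on character literals (C++17 or later). The `char8_t` branch is an assumption that goes beyond the quoted text: later revisions of the language change the type, and the feature-test macro `__cpp_char8_t` is used to detect that. The name `utf8_w` is hypothetical.

```cpp
#include <type_traits>

#ifdef __cpp_char8_t
// Assumption beyond the quoted text: from C++20 the type is char8_t.
static_assert(std::is_same<decltype(u8'w'), char8_t>::value,
              "u8'w' should have type char8_t when char8_t is enabled");
#else
// The rule as worded in the paragraph above.
static_assert(std::is_same<decltype(u8'w'), char>::value,
              "u8'w' should have type char");
#endif

// U+0077 fits in a single UTF-8 code unit, so the value is the code point.
constexpr unsigned utf8_w = static_cast<unsigned>(u8'w');

// u8'\u00E9' would be ill-formed: U+00E9 needs two UTF-8 code units.
```

Unlike an ordinary literal, the value here does not depend on the execution character set.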
+
+ A character literal that begins with the letter `u`, such as `u'x'`, is
  a character literal of type `char16_t`. The value of a `char16_t`
+ character literal containing a single *c-char* is equal to its ISO 10646
+ code point value, provided that the code point is representable with a
+ single 16-bit code unit. (That is, provided it is a basic multi-lingual
+ plane code point.) If the value is not representable within 16 bits, the
+ program is ill-formed. A `char16_t` character literal containing
+ multiple *c-char*s is ill-formed.

+ A character literal that begins with the letter `U`, such as `U'y'`, is
+ a character literal of type `char32_t`. The value of a `char32_t`
+ character literal containing a single *c-char* is equal to its ISO 10646
+ code point value. A `char32_t` character literal containing multiple
+ *c-char*s is ill-formed.
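
The `char16_t` and `char32_t` rules above can be illustrated with two code points, one inside the basic multilingual plane and one outside it. The names `euro16` and `grin32` are hypothetical, chosen only for this sketch.

```cpp
#include <type_traits>

static_assert(std::is_same<decltype(u'x'), char16_t>::value,
              "u'x' should have type char16_t");
static_assert(std::is_same<decltype(U'y'), char32_t>::value,
              "U'y' should have type char32_t");

// U+20AC (euro sign) lies in the basic multilingual plane, so it fits in
// a single 16-bit code unit and the literal's value is the code point.
constexpr char16_t euro16 = u'\u20AC';

// U+1F600 lies outside the basic multilingual plane, so only the
// char32_t form is well-formed.
constexpr char32_t grin32 = U'\U0001F600';

// u'\U0001F600' would be ill-formed, as would u'ab' or U'ab'.
```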
+
+ A character literal that begins with the letter `L`, such as `L'z'`, is
+ a *wide-character literal*. A wide-character literal has type
+ `wchar_t`.[^13] The value of a wide-character literal containing a
+ single *c-char* has value equal to the numerical value of the encoding
+ of the *c-char* in the execution wide-character set, unless the *c-char*
+ has no representation in the execution wide-character set, in which case
+ the value is *implementation-defined*.
+
+ [*Note 1*: The type `wchar_t` is able to represent all members of the
+ execution wide-character set (see 
+ [[basic.fundamental]]). — *end note*]
+
+ The value of a wide-character literal containing multiple *c-char*s is
+ *implementation-defined*.
+
+ Certain non-graphic characters, the single quote `'`, the double quote
  `"`, the question mark `?`,[^14] and the backslash `\`, can be
  represented according to Table  [[tab:escape.sequences]]. The double
  quote `"` and the question mark `?`, can be represented as themselves or
  by the escape sequences `\"` and `\?` respectively, but the single quote
  `'` and the backslash `\` shall be represented by the escape sequences
 
  that are taken to specify the value of the desired character. There is
  no limit to the number of digits in a hexadecimal sequence. A sequence
  of octal or hexadecimal digits is terminated by the first character that
  is not an octal digit or a hexadecimal digit, respectively. The value of
  a character literal is *implementation-defined* if it falls outside of
+ the *implementation-defined* range defined for `char` (for character
+ literals with no prefix) or `wchar_t` (for character literals prefixed
+ by `L`).
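
As a rough illustration of the escape rules in this paragraph, hexadecimal and octal escapes state the value of the character directly. The names below are hypothetical and exist only for this sketch.

```cpp
// Hexadecimal and octal escapes specify the value itself, independent of
// how the execution character set encodes any particular letter.
constexpr char from_hex = '\x41';    // hexadecimal 41 is decimal 65
constexpr char from_octal = '\101';  // octal 101 is also decimal 65

// The same escape works with the L prefix and yields a wchar_t.
constexpr wchar_t wide_from_hex = L'\x41';

// A value such as '\xFF' may fall outside the range of char when char is a
// signed 8-bit type; the paragraph above makes that value
// implementation-defined, so it is not checked here.
```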
 
+ [*Note 2*: If the value of a character literal prefixed by `u`, `u8`,
+ or `U` is outside the range defined for its type, the program is
+ ill-formed. — *end note*]
+
+ A *universal-character-name* is translated to the encoding, in the
  appropriate execution character set, of the character named. If there is
+ no such encoding, the *universal-character-name* is translated to an
+ *implementation-defined* encoding.
+
+ [*Note 3*: In translation phase 1, a *universal-character-name* is
+ introduced whenever an actual extended character is encountered in the
+ source text. Therefore, all extended characters are described in terms
+ of *universal-character-name*s. However, the actual compiler
+ implementation may use its own native character set, so long as the same
+ results are obtained. — *end note*]