From Jason Turner

[lex.charset]

Files changed (1)
  1. tmp/tmpvrqdruyf/{from.md → to.md} +18 -12
tmp/tmpvrqdruyf/{from.md → to.md} RENAMED
@@ -24,22 +24,28 @@ hex-quad:
  universal-character-name:
  '\u' hex-quad
  '\U' hex-quad hex-quad
  ```
 
- The character designated by the *universal-character-name* `\UNNNNNNNN`
- is that character whose character short name in ISO/IEC 10646 is
- `NNNNNNNN`; the character designated by the *universal-character-name*
- `\uNNNN` is that character whose character short name in ISO/IEC 10646
- is `0000NNNN`. If the hexadecimal value for a *universal-character-name*
- corresponds to a surrogate code point (in the range 0xD800–0xDFFF,
- inclusive), the program is ill-formed. Additionally, if the hexadecimal
- value for a *universal-character-name* outside the *c-char-sequence*,
- *s-char-sequence*, or *r-char-sequence* of a character or string literal
- corresponds to a control character (in either of the ranges 0x00–0x1F or
- 0x7F–0x9F, both inclusive) or to a character in the basic source
- character set, the program is ill-formed.[^5]
+ A *universal-character-name* designates the character in ISO/IEC 10646
+ (if any) whose code point is the hexadecimal number represented by the
+ sequence of *hexadecimal-digit*s in the *universal-character-name*. The
+ program is ill-formed if that number is not a code point or if it is a
+ surrogate code point. Noncharacter code points and reserved code points
+ are considered to designate separate characters distinct from any
+ ISO/IEC 10646 character. If a *universal-character-name* outside the
+ *c-char-sequence*, *s-char-sequence*, or *r-char-sequence* of a
+ *character-literal* or *string-literal* (in either case, including
+ within a *user-defined-literal*) corresponds to a control character or
+ to a character in the basic source character set, the program is
+ ill-formed.[^5]
+
+ [*Note 1*: ISO/IEC 10646 code points are integers in the range
+ [0, 10FFFF] (hexadecimal). A surrogate code point is a value in the
+ range [D800, DFFF] (hexadecimal). A control character is a character
+ whose code point is in either of the ranges [0, 1F] or [7F, 9F]
+ (hexadecimal). — *end note*]
 
  The *basic execution character set* and the *basic execution
  wide-character set* shall each contain all the members of the basic
  source character set, plus control characters representing alert,
  backspace, and carriage return, plus a *null character* (respectively,
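
As a rough illustration of the checks described by the added wording in this diff, here is a minimal C++ sketch. The names `UcnStatus`, `is_control`, `is_basic_source`, and `check_ucn` are hypothetical and do not come from the standard or from any compiler. The basic source character set test is an assumption: it takes the 96-character set of the drafts of that era (printable ASCII minus `$`, `@`, and the backtick, plus the four whitespace controls and new-line) and assumes ASCII-compatible code points.

```cpp
#include <cstdint>

// Hypothetical result of validating the number named by a universal-character-name.
enum class UcnStatus {
    Ok,
    NotACodePoint,              // value lies outside [0, 10FFFF]
    Surrogate,                  // value lies in [D800, DFFF]
    ControlOutsideLiteral,      // control character outside a literal
    BasicSourceOutsideLiteral   // basic source character outside a literal
};

// Control characters per Note 1: [0, 1F] or [7F, 9F] (hexadecimal).
constexpr bool is_control(std::uint32_t cp) {
    return cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
}

// Assumption: basic source character set = HT, LF, VT, FF, and printable
// ASCII (including space) except '$' (0x24), '@' (0x40), and '`' (0x60).
constexpr bool is_basic_source(std::uint32_t cp) {
    if (cp == 0x09 || cp == 0x0A || cp == 0x0B || cp == 0x0C) return true;
    if (cp < 0x20 || cp > 0x7E) return false;
    return cp != 0x24 && cp != 0x40 && cp != 0x60;
}

// inside_literal: true when the universal-character-name appears in the
// c-char-sequence, s-char-sequence, or r-char-sequence of a literal.
constexpr UcnStatus check_ucn(std::uint32_t value, bool inside_literal) {
    if (value > 0x10FFFF) return UcnStatus::NotACodePoint;
    if (value >= 0xD800 && value <= 0xDFFF) return UcnStatus::Surrogate;
    if (!inside_literal) {
        if (is_control(value)) return UcnStatus::ControlOutsideLiteral;
        if (is_basic_source(value)) return UcnStatus::BasicSourceOutsideLiteral;
    }
    // Noncharacter and reserved code points are accepted: the wording treats
    // them as designating their own distinct characters.
    return UcnStatus::Ok;
}

// Compile-time spot checks of the sketch.
static_assert(check_ucn(0x00E9, false) == UcnStatus::Ok);
static_assert(check_ucn(0xD800, true) == UcnStatus::Surrogate);
static_assert(check_ucn(0x110000, true) == UcnStatus::NotACodePoint);
static_assert(check_ucn(0x0041, false) == UcnStatus::BasicSourceOutsideLiteral);
```

Under these assumptions, `\u0041` is rejected in an identifier but accepted inside a string literal, while a noncharacter such as `\uFFFF` passes in both positions.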