From Jason Turner

[lex.universal.char]

Files changed (1)
  1. tmp/tmp6h9bn631/{from.md → to.md} +64 -0
tmp/tmp6h9bn631/{from.md → to.md} RENAMED
@@ -0,0 +1,64 @@
+ ### Universal character names <a id="lex.universal.char">[[lex.universal.char]]</a>
+
+ ``` bnf
+ n-char:
+     any member of the translation character set except the U+007d (right curly bracket) or new-line character
+ ```
+
+ ``` bnf
+ n-char-sequence:
+     n-char n-char-sequenceₒₚₜ
+ ```
+
+ ``` bnf
+ named-universal-character:
+     '\N{' n-char-sequence '}'
+ ```
+
+ ``` bnf
+ hex-quad:
+     hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
+ ```
+
+ ``` bnf
+ simple-hexadecimal-digit-sequence:
+     hexadecimal-digit simple-hexadecimal-digit-sequenceₒₚₜ
+ ```
+
+ ``` bnf
+ universal-character-name:
+     '\u' hex-quad
+     '\U' hex-quad hex-quad
+     '\u{' simple-hexadecimal-digit-sequence '}'
+     named-universal-character
+ ```
+
+ The *universal-character-name* construct provides a way to name any
+ element in the translation character set using just the basic character
+ set. If a *universal-character-name* outside the *c-char-sequence*,
+ *s-char-sequence*, or *r-char-sequence* of a *character-literal* or
+ *string-literal* (in either case, including within a
+ *user-defined-literal*) corresponds to a control character or to a
+ character in the basic character set, the program is ill-formed.
+
+ [*Note 1*: A sequence of characters resembling a
+ *universal-character-name* in an *r-char-sequence* [[lex.string]] does
+ not form a *universal-character-name*. — *end note*]
+
+ A *universal-character-name* of the form `\u` *hex-quad*, `\U`
+ *hex-quad* *hex-quad*, or `\u{simple-hexadecimal-digit-sequence}`
+ designates the character in the translation character set whose Unicode
+ scalar value is the hexadecimal number represented by the sequence of
+ *hexadecimal-digit*s in the *universal-character-name*. The program is
+ ill-formed if that number is not a Unicode scalar value.
+
+ A *universal-character-name* that is a *named-universal-character*
+ designates the corresponding character in the Unicode Standard (chapter
+ 4.8 Name) if the *n-char-sequence* is equal to its character name or to
+ one of its character name aliases of type “control”, “correction”, or
+ “alternate”; otherwise, the program is ill-formed.
+
+ [*Note 2*: These aliases are listed in the Unicode Character Database’s
+ `NameAliases.txt`. None of these names or aliases have leading or
+ trailing spaces. — *end note*]
+
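
The three numeric forms in the hunk above (`\u` *hex-quad*, `\U` *hex-quad* *hex-quad*, and `\u{...}`) and the scalar-value rule can be illustrated with a small C++17 checker. This is only a sketch and is not part of the diff or of any compiler: every function name in it (`is_unicode_scalar_value`, `hex_digit_value`, `parse_hex`, `parse_numeric_ucn`, `check_examples`) is hypothetical, and it assumes the input holds exactly one candidate *universal-character-name* with nothing before or after it. The `\N{...}` form is deliberately left out because resolving it needs the Unicode name and alias tables, and the rule about control and basic characters outside literals is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// A Unicode scalar value is any code point in [0, 0x10FFFF] that is not a
// surrogate code point (0xD800 through 0xDFFF).
constexpr bool is_unicode_scalar_value(std::uint32_t v) {
    return v <= 0x10FFFF && !(v >= 0xD800 && v <= 0xDFFF);
}

// Value of one hexadecimal-digit, or -1 if c is not one.
constexpr int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: accumulate a run of hexadecimal digits.
// Returns nullopt for an empty run, a non-digit, or a value above 0x10FFFF.
std::optional<std::uint32_t> parse_hex(std::string_view digits) {
    if (digits.empty()) return std::nullopt;
    std::uint32_t value = 0;
    for (char c : digits) {
        const int d = hex_digit_value(c);
        if (d < 0) return std::nullopt;
        value = value * 16 + static_cast<std::uint32_t>(d);
        if (value > 0x10FFFF) return std::nullopt;  // also keeps value from overflowing
    }
    return value;
}

// Hypothetical helper: interpret s as one numeric universal-character-name.
// Accepts the 4-digit form, the 8-digit form, and the braced form; returns
// nullopt where the quoted text would make the program ill-formed or where
// s does not match the grammar at all.
std::optional<char32_t> parse_numeric_ucn(std::string_view s) {
    if (s.size() < 3 || s[0] != '\\') return std::nullopt;
    std::string_view digits;
    if (s[1] == 'u' && s[2] == '{') {
        if (s.back() != '}') return std::nullopt;
        digits = s.substr(3, s.size() - 4);  // text between the braces
    } else if (s[1] == 'u') {
        if (s.size() != 6) return std::nullopt;   // exactly one hex-quad
        digits = s.substr(2);
    } else if (s[1] == 'U') {
        if (s.size() != 10) return std::nullopt;  // exactly two hex-quads
        digits = s.substr(2);
    } else {
        return std::nullopt;
    }
    const std::optional<std::uint32_t> value = parse_hex(digits);
    if (!value || !is_unicode_scalar_value(*value)) return std::nullopt;
    return static_cast<char32_t>(*value);
}

// Usage sketch: the escapes are written with a doubled backslash so that the
// checker, not the compiler, sees the spelling.
void check_examples() {
    assert(parse_numeric_ucn("\\u00E9") == U'\u00E9');
    assert(parse_numeric_ucn("\\U0001F600") == U'\U0001F600');
    assert(parse_numeric_ucn("\\u{1F600}") == U'\U0001F600');
    assert(parse_numeric_ucn("\\uD800") == std::nullopt);      // surrogate
    assert(parse_numeric_ucn("\\U00110000") == std::nullopt);  // above 0x10FFFF
    assert(parse_numeric_ucn("\\u{}") == std::nullopt);        // no digits
    assert(parse_numeric_ucn("\\u12") == std::nullopt);        // not a hex-quad
}
```

Returning `std::optional` keeps "designates a character" and "ill-formed or not a match" apart without exceptions. A real lexer would instead consume the longest matching prefix of its input and issue a diagnostic.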