[lex.name] - C++20 → C++23

Files changed (1)

tmp/tmpuz57668h/{from.md → to.md} +25 -33


@@ -1,18 +1,24 @@
 ## Identifiers <a id="lex.name">[[lex.name]]</a>
 ``` bnf
 identifier:
-    identifier-nondigit
-    identifier identifier-nondigit
-    identifier digit
 ```
 ``` bnf
-identifier-nondigit:
     nondigit
-    universal-character-name
 ```
 ``` bnf
 nondigit: one of
     'a b c d e f g h i j k l m'
@@ -24,51 +30,37 @@ nondigit: one of
 ``` bnf
 digit: one of
     '0 1 2 3 4 5 6 7 8 9'
 ```
-An identifier is an arbitrarily long sequence of letters and digits.
-Each *universal-character-name* in an identifier shall designate a
-character whose encoding in ISO/IEC 10646 falls into one of the ranges
-specified in [[lex.name.allowed]]. The initial element shall not be a
-*universal-character-name* designating a character whose encoding falls
-into one of the ranges specified in [[lex.name.disallowed]]. Upper- and
-lower-case letters are different. All characters are significant.[^10]
-**Table: Ranges of characters allowed** <a id="lex.name.allowed">[lex.name.allowed]</a>
-|               |               |               |               |               |
-| ------------- | ------------- | ------------- | ------------- | ------------- |
-| `00A8`        | `00AA`        | `00AD`        | `00AF`        | `00B2-00B5`   |
-| `00B7-00BA`   | `00BC-00BE`   | `00C0-00D6`   | `00D8-00F6`   | `00F8-00FF`   |
-| `0100-167F`   | `1681-180D`   | `180F-1FFF`   |               |               |
-| `200B-200D`   | `202A-202E`   | `203F-2040`   | `2054`        | `2060-206F`   |
-| `2070-218F`   | `2460-24FF`   | `2776-2793`   | `2C00-2DFF`   | `2E80-2FFF`   |
-| `3004-3007`   | `3021-302F`   | `3031-D7FF`   |               |               |
-| `F900-FD3D`   | `FD40-FDCF`   | `FDF0-FE44`   | `FE47-FFFD`   |               |
-| `10000-1FFFD` | `20000-2FFFD` | `30000-3FFFD` | `40000-4FFFD` | `50000-5FFFD` |
-| `60000-6FFFD` | `70000-7FFFD` | `80000-8FFFD` | `90000-9FFFD` | `A0000-AFFFD` |
-| `B0000-BFFFD` | `C0000-CFFFD` | `D0000-DFFFD` | `E0000-EFFFD` |               |
-**Table: Ranges of characters disallowed initially (combining characters)** <a id="lex.name.disallowed">[lex.name.disallowed]</a>
-|             |                                                |             |             |
-| ----------- | ---------------------------------------------- | ----------- | ----------- |
-| `0300-036F` | `1DC0-1DFF` | `20D0-20FF` | `FE20-FE2F` |
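For comparison with the C++23 rules, the C++20 range checks removed here can be sketched as a lookup over the two tables above. This is a minimal sketch, assuming code points have already been decoded to `char32_t`. The names `CodePointRange`, `in_any`, `ucn_allowed_cpp20`, and `ucn_allowed_initially_cpp20` are hypothetical, and the check covers only *universal-character-name*s, not the basic `nondigit` and `digit` characters.

```cpp
#include <cstddef>

// Hypothetical sketch of the C++20 [lex.name] UCN rules; not a compiler's implementation.
struct CodePointRange {
    char32_t first;
    char32_t last;  // inclusive upper bound
};

// C++20 [lex.name.allowed]: ranges of characters allowed in an identifier.
constexpr CodePointRange kAllowed[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF}, {0x00B2, 0x00B5},
    {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
    {0x0100, 0x167F}, {0x1681, 0x180D}, {0x180F, 0x1FFF},
    {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040}, {0x2054, 0x2054}, {0x2060, 0x206F},
    {0x2070, 0x218F}, {0x2460, 0x24FF}, {0x2776, 0x2793}, {0x2C00, 0x2DFF}, {0x2E80, 0x2FFF},
    {0x3004, 0x3007}, {0x3021, 0x302F}, {0x3031, 0xD7FF},
    {0xF900, 0xFD3D}, {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD},
    {0x10000, 0x1FFFD}, {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD}, {0x40000, 0x4FFFD},
    {0x50000, 0x5FFFD}, {0x60000, 0x6FFFD}, {0x70000, 0x7FFFD}, {0x80000, 0x8FFFD},
    {0x90000, 0x9FFFD}, {0xA0000, 0xAFFFD}, {0xB0000, 0xBFFFD}, {0xC0000, 0xCFFFD},
    {0xD0000, 0xDFFFD}, {0xE0000, 0xEFFFD},
};

// C++20 [lex.name.disallowed]: combining characters that may not start an identifier.
constexpr CodePointRange kDisallowedInitially[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F},
};

// True if cp falls inside any of the inclusive ranges.
template <std::size_t N>
constexpr bool in_any(char32_t cp, const CodePointRange (&ranges)[N]) {
    for (const auto& r : ranges) {
        if (cp >= r.first && cp <= r.last) return true;
    }
    return false;
}

// May a UCN designating cp appear in a C++20 identifier at all?
constexpr bool ucn_allowed_cpp20(char32_t cp) { return in_any(cp, kAllowed); }

// May a UCN designating cp be the initial element of a C++20 identifier?
constexpr bool ucn_allowed_initially_cpp20(char32_t cp) {
    return ucn_allowed_cpp20(cp) && !in_any(cp, kDisallowedInitially);
}
```

For example, U+0301 COMBINING ACUTE ACCENT passes the first check but fails the second, so under these C++20 tables it could follow, but not start, an identifier.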
 The identifiers in [[lex.name.special]] have a special meaning when
 appearing in a certain context. When referred to in the grammar, these
 identifiers are used explicitly rather than using the *identifier*
 grammar production. Unless otherwise specified, any ambiguity as to
 whether a given *identifier* has a special meaning is resolved to
 interpret the token as a regular *identifier*.
-In addition, some identifiers are reserved for use by C++
-implementations and shall not be used otherwise; no diagnostic is
-required.
 - Each identifier that contains a double underscore `__` or begins with
   an underscore followed by an uppercase letter is reserved to the
   implementation for any use.
 - Each identifier that begins with an underscore is reserved to the

 ## Identifiers <a id="lex.name">[[lex.name]]</a>
 ``` bnf
 identifier:
+    identifier-start
+    identifier identifier-continue
 ```
 ``` bnf
+identifier-start:
     nondigit
+    an element of the translation character set with the Unicode property XID_Start
+```
+``` bnf
+identifier-continue:
+    digit
+    nondigit
+    an element of the translation character set with the Unicode property XID_Continue
 ```
 ``` bnf
 nondigit: one of
     'a b c d e f g h i j k l m'
@@ -24,51 +30,37 @@ nondigit: one of
 ``` bnf
 digit: one of
     '0 1 2 3 4 5 6 7 8 9'
 ```
+[*Note 1*:
+The character properties XID_Start and XID_Continue are Derived Core
+Properties as described by UAX \#44 of the Unicode Standard.[^7]
+— *end note*]
+The program is ill-formed if an *identifier* does not conform to
+Normalization Form C as specified in the Unicode Standard.
+[*Note 2*: Identifiers are case-sensitive. — *end note*]
+[*Note 3*: In translation phase 4, *identifier* also includes those
+*preprocessing-token*s [[lex.pptoken]] differentiated as keywords
+[[lex.key]] in the later translation phase 7
+[[lex.token]]. — *end note*]
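The basic-character part of the new *identifier-start* / *identifier-continue* grammar can be sketched as a small validator. This is a minimal sketch, assuming ASCII input: the XID_Start and XID_Continue alternatives and the Normalization Form C requirement would need Unicode property data and are deliberately left out, and keywords are not rejected because, per Note 3, they still match *identifier* in phase 4. `is_nondigit`, `is_digit`, and `is_basic_identifier` are hypothetical names.

```cpp
#include <string_view>

// Hypothetical helpers covering only the nondigit/digit alternatives (ASCII assumed).
constexpr bool is_nondigit(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

// identifier: identifier-start followed by any number of identifier-continue.
constexpr bool is_basic_identifier(std::string_view s) {
    if (s.empty() || !is_nondigit(s.front())) return false;  // identifier-start
    for (char c : s.substr(1)) {
        if (!is_nondigit(c) && !is_digit(c)) return false;   // identifier-continue
    }
    return true;
}
```

Because no case folding is done, `Value` and `value` are accepted as two distinct spellings, matching Note 2.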
 The identifiers in [[lex.name.special]] have a special meaning when
 appearing in a certain context. When referred to in the grammar, these
 identifiers are used explicitly rather than using the *identifier*
 grammar production. Unless otherwise specified, any ambiguity as to
 whether a given *identifier* has a special meaning is resolved to
 interpret the token as a regular *identifier*.
+In addition, some identifiers appearing as a *token* or
+*preprocessing-token* are reserved for use by C++ implementations and
+shall not be used otherwise; no diagnostic is required.
 - Each identifier that contains a double underscore `__` or begins with
   an underscore followed by an uppercase letter is reserved to the
   implementation for any use.
 - Each identifier that begins with an underscore is reserved to the

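The first reservation bullet lends itself to a mechanical check. This is a minimal sketch, assuming ASCII identifiers so that "uppercase letter" means `A` to `Z`. `is_reserved_anywhere` is a hypothetical name, and the leading-underscore bullet, which this view truncates, is not modeled.

```cpp
#include <string_view>

// Hypothetical helper: true if the identifier is reserved to the implementation
// "for any use" under the first bullet (double underscore, or _ + uppercase letter).
constexpr bool is_reserved_anywhere(std::string_view id) {
    if (id.find("__") != std::string_view::npos) return true;  // contains "__" anywhere
    return id.size() >= 2 && id[0] == '_' && id[1] >= 'A' && id[1] <= 'Z';
}
```

Since no diagnostic is required, a checker like this is useful as a lint rule rather than something a compiler must enforce.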