[lex.pptoken] - C++23 → Trunk

Files changed (1)

tmp/tmp74xuroty/{from.md → to.md} +29 -19

tmp/tmp74xuroty/{from.md → to.md} RENAMED

@@ -14,27 +14,22 @@ preprocessing-token:
     user-defined-string-literal
     preprocessing-op-or-punc
     each non-whitespace character that cannot be one of the above
 ```
-Each preprocessing token that is converted to a token [[lex.token]]
-shall have the lexical form of a keyword, an identifier, a literal, or
-an operator or punctuator.
 A preprocessing token is the minimal lexical element of the language in
 translation phases 3 through 6. In this document, glyphs are used to
 identify elements of the basic character set [[lex.charset]]. The
 categories of preprocessing token are: header names, placeholder tokens
 produced by preprocessing `import` and `module` directives
 (*import-keyword*, *module-keyword*, and *export-keyword*), identifiers,
 preprocessing numbers, character literals (including user-defined
 character literals), string literals (including user-defined string
 literals), preprocessing operators and punctuators, and single
 non-whitespace characters that do not lexically match the other
-preprocessing token categories. If a U+0027 (apostrophe) or a
-U+0022 (quotation mark) character matches the last category, the
-behavior is undefined. If any character not in the basic character set
 matches the last category, the program is ill-formed. Preprocessing
 tokens can be separated by whitespace; this consists of comments
 [[lex.comment]], or whitespace characters (U+0020 (space),
 U+0009 (character tabulation), new-line, U+000b (line tabulation), and
 U+000c (form feed)), or both. As described in [[cpp]], in certain
@@ -42,10 +37,21 @@ circumstances during translation phase 4, whitespace (or the absence
 thereof) serves as more than preprocessing token separation. Whitespace
 can appear within a preprocessing token only as part of a header name or
 between the quotation characters in a character literal or string
 literal.
 If the input stream has been parsed into preprocessing tokens up to a
 given character:
 - If the next character begins a sequence of characters that could be
   the prefix and initial double quote of a raw string literal, such as
@@ -61,34 +67,38 @@ given character:
   ```
 - Otherwise, if the next three characters are `<::` and the subsequent
   character is neither `:` nor `>`, the `<` is treated as a
   preprocessing token by itself and not as the first character of the
   alternative token `<:`.
 - Otherwise, the next preprocessing token is the longest sequence of
   characters that could constitute a preprocessing token, even if that
-  would cause further lexical analysis to fail, except that a
-  *header-name* [[lex.header]] is only formed
-  - after the `include` or `import` preprocessing token in an `#include`
-    [[cpp.include]] or `import` [[cpp.import]] directive, or
-  - within a *has-include-expression*.
 [*Example 1*:
 ``` cpp
 #define R "x"
 const char* s = R"y";           // ill-formed raw string, not "x" "y"
 ```
 — *end example*]
-The *import-keyword* is produced by processing an `import` directive
-[[cpp.import]], the *module-keyword* is produced by preprocessing a
-`module` directive [[cpp.module]], and the *export-keyword* is produced
-by preprocessing either of the previous two directives.
-[*Note 1*: None has any observable spelling. — *end note*]
 [*Example 2*: The program fragment `0xe+foo` is parsed as a
 preprocessing number token (one that is not a valid *integer-literal* or
 *floating-point-literal* token), even though a parse as three
 preprocessing tokens `0xe`, `+`, and `foo` can produce a valid
 expression (for example, if `foo` is a macro defined as `1`). Similarly,

     user-defined-string-literal
     preprocessing-op-or-punc
     each non-whitespace character that cannot be one of the above
 ```
 A preprocessing token is the minimal lexical element of the language in
 translation phases 3 through 6. In this document, glyphs are used to
 identify elements of the basic character set [[lex.charset]]. The
 categories of preprocessing token are: header names, placeholder tokens
 produced by preprocessing `import` and `module` directives
 (*import-keyword*, *module-keyword*, and *export-keyword*), identifiers,
 preprocessing numbers, character literals (including user-defined
 character literals), string literals (including user-defined string
 literals), preprocessing operators and punctuators, and single
 non-whitespace characters that do not lexically match the other
+preprocessing token categories. If a U+0027 (apostrophe), a
+U+0022 (quotation mark), or any character not in the basic character set
 matches the last category, the program is ill-formed. Preprocessing
 tokens can be separated by whitespace; this consists of comments
 [[lex.comment]], or whitespace characters (U+0020 (space),
 U+0009 (character tabulation), new-line, U+000b (line tabulation), and
 U+000c (form feed)), or both. As described in [[cpp]], in certain
 thereof) serves as more than preprocessing token separation. Whitespace
 can appear within a preprocessing token only as part of a header name or
 between the quotation characters in a character literal or string
 literal.
+Each preprocessing token that is converted to a token [[lex.token]]
+shall have the lexical form of a keyword, an identifier, a literal, or
+an operator or punctuator.
+The *import-keyword* is produced by processing an `import` directive
+[[cpp.import]], the *module-keyword* is produced by preprocessing a
+`module` directive [[cpp.module]], and the *export-keyword* is produced
+by preprocessing either of the previous two directives.
+[*Note 1*: None has any observable spelling. — *end note*]
 If the input stream has been parsed into preprocessing tokens up to a
 given character:
 - If the next character begins a sequence of characters that could be
   the prefix and initial double quote of a raw string literal, such as
   ```
 - Otherwise, if the next three characters are `<::` and the subsequent
   character is neither `:` nor `>`, the `<` is treated as a
   preprocessing token by itself and not as the first character of the
   alternative token `<:`.
+- Otherwise, if the next three characters are `[::` and the subsequent
+  character is not `:`, or if the next three characters are `[:>`, the
+  `[` is treated as a preprocessing token by itself and not as the first
+  character of the preprocessing token `[:`. \[*Note 2*: The tokens `[:`
+  and `:]` cannot be composed from digraphs. — *end note*]
 - Otherwise, the next preprocessing token is the longest sequence of
   characters that could constitute a preprocessing token, even if that
+  would cause further lexical analysis to fail, except that
+  - a *string-literal* token is never formed when a *header-name* token
+    can be formed, and
+  - a *header-name* [[lex.header]] is only formed
+    - immediately after the `include`, `embed`, or `import`
+      preprocessing token in a `#include` [[cpp.include]], `#embed`
+      [[cpp.embed]], or `import` [[cpp.import]] directive, respectively,
+      or
+    - immediately after a preprocessing token sequence of
+      `__has_include` or `__has_embed` immediately followed by `(` in a
+      `#if`, `#elif`, or `#embed` directive [[cpp.cond]], [[cpp.embed]].
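A minimal sketch, not taken from the standard's wording, of where a *header-name* is and is not formed under the rules listed above. The macro `HAVE_VECTOR_HEADER` and the function `empty_vector_size` are hypothetical names introduced for illustration; `<vector>` is used only because hosted implementations provide it.

```cpp
#include <cassert>
#include <cstddef>

// After `__has_include` immediately followed by `(` in a `#if` directive,
// `<vector>` is lexed as a single header-name token.
#if defined(__has_include)
#  if __has_include(<vector>)
#    define HAVE_VECTOR_HEADER 1
#  endif
#endif
#ifndef HAVE_VECTOR_HEADER
#  define HAVE_VECTOR_HEADER 0
#endif

// Immediately after the `include` token of a `#include` directive,
// `<vector>` is again a single header-name token.
#include <vector>

// Anywhere else the same shape is never a header-name:
// `<int>` below is the three tokens `<`, `int`, `>`.
inline std::size_t empty_vector_size() {
    std::vector<int> values;
    return values.size();
}
```

Calling `empty_vector_size()` from any function exercises the non-header-name case; the value of `HAVE_VECTOR_HEADER` reflects whether the `__has_include` branch was taken on the compiler in use.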
 [*Example 1*:
 ``` cpp
 #define R "x"
 const char* s = R"y";           // ill-formed raw string, not "x" "y"
 ```
 — *end example*]
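The `<::` special case and the longest-match rule from the list above can also be observed in ordinary code. The following is an illustrative sketch; the function names `count_names`, `munch`, and `digraph_second` are hypothetical and do not appear in the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// `<::` followed by a character that is neither `:` nor `>`:
// the `<` is a token by itself, so this reads as vector< ::std::string >
// rather than starting with the alternative token `<:`.
inline std::size_t count_names() {
    std::vector<::std::string> names{"a", "b"};
    return names.size();
}

// Longest match: `i+++j` is tokenized as `i`, `++`, `+`, `j`.
inline int munch(int i, int j) {
    int sum = i+++j;        // (i++) + j
    return sum * 10 + i;    // encodes the sum and the incremented i
}

// `<:` and `:>` are the alternative tokens for `[` and `]`.
inline int digraph_second() {
    int a<:2:> = {7, 9};
    return a<:1:>;
}
```

With these definitions, `munch(1, 2)` returns `32` (sum 3, `i` incremented to 2), which would not be the result if the fragment were tokenized as `i + ++j`.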
 [*Example 2*: The program fragment `0xe+foo` is parsed as a
 preprocessing number token (one that is not a valid *integer-literal* or
 *floating-point-literal* token), even though a parse as three
 preprocessing tokens `0xe`, `+`, and `foo` can produce a valid
 expression (for example, if `foo` is a macro defined as `1`). Similarly,

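The `0xe+foo` case from Example 2 can be made visible with stringizing, since a macro name buried inside a preprocessing number is never seen as an identifier and so is not expanded. This is a hedged sketch: the macros `STR` and `XSTR` and the function names are hypothetical helpers, and `foo` is defined as `1` to match the example.

```cpp
#include <cassert>
#include <string>

#define foo 1
#define STR(x) #x
#define XSTR(x) STR(x)   // macro-expands x before stringizing it

// `0xe+foo` is one preprocessing number, so `foo` is not expanded.
inline std::string single_pp_number() { return XSTR(0xe+foo); }

// `0xd+foo` is `0xd`, `+`, `foo`: within a preprocessing number only
// `e`, `E`, `p`, or `P` may be followed by a sign, so `foo` is a separate
// identifier here and is replaced by `1`.
inline std::string three_tokens() { return XSTR(0xd+foo); }
```

`single_pp_number()` yields the text `0xe+foo` unchanged, while the string from `three_tokens()` contains `1` where `foo` was written.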