[lex.pptoken] - C++20 → C++23

Files changed (1)

tmp/tmpj4_1oqdi/{from.md → to.md} +34 -29

@@ -11,48 +11,53 @@ preprocessing-token:
     character-literal
     user-defined-character-literal
     string-literal
     user-defined-string-literal
     preprocessing-op-or-punc
-    each non-white-space character that cannot be one of the above
+    each non-whitespace character that cannot be one of the above
 ```
 Each preprocessing token that is converted to a token [[lex.token]]
 shall have the lexical form of a keyword, an identifier, a literal, or
 an operator or punctuator.
 A preprocessing token is the minimal lexical element of the language in
-translation phases 3 through 6. The categories of preprocessing token
-are: header names, placeholder tokens produced by preprocessing `import`
-and `module` directives (*import-keyword*, *module-keyword*, and
-*export-keyword*), identifiers, preprocessing numbers, character
-literals (including user-defined character literals), string literals
-(including user-defined string literals), preprocessing operators and
-punctuators, and single non-white-space characters that do not lexically
-match the other preprocessing token categories. If a `'` or a `"`
-character matches the last category, the behavior is undefined.
-Preprocessing tokens can be separated by white space; this consists of
-comments [[lex.comment]], or white-space characters (space, horizontal
-tab, new-line, vertical tab, and form-feed), or both. As described in
-[[cpp]], in certain circumstances during translation phase 4, white
-space (or the absence thereof) serves as more than preprocessing token
-separation. White space can appear within a preprocessing token only as
-part of a header name or between the quotation characters in a character
-literal or string literal.
+translation phases 3 through 6. In this document, glyphs are used to
+identify elements of the basic character set [[lex.charset]]. The
+categories of preprocessing token are: header names, placeholder tokens
+produced by preprocessing `import` and `module` directives
+(*import-keyword*, *module-keyword*, and *export-keyword*), identifiers,
+preprocessing numbers, character literals (including user-defined
+character literals), string literals (including user-defined string
+literals), preprocessing operators and punctuators, and single
+non-whitespace characters that do not lexically match the other
+preprocessing token categories. If a U+0027 (apostrophe) or a
+U+0022 (quotation mark) character matches the last category, the
+behavior is undefined. If any character not in the basic character set
+matches the last category, the program is ill-formed. Preprocessing
+tokens can be separated by whitespace; this consists of comments
+[[lex.comment]], or whitespace characters (U+0020 (space),
+U+0009 (character tabulation), new-line, U+000b (line tabulation), and
+U+000c (form feed)), or both. As described in [[cpp]], in certain
+circumstances during translation phase 4, whitespace (or the absence
+thereof) serves as more than preprocessing token separation. Whitespace
+can appear within a preprocessing token only as part of a header name or
+between the quotation characters in a character literal or string
+literal.
 If the input stream has been parsed into preprocessing tokens up to a
 given character:
 - If the next character begins a sequence of characters that could be
   the prefix and initial double quote of a raw string literal, such as
   `R"`, the next preprocessing token shall be a raw string literal.
   Between the initial and final double quote characters of the raw
-  string, any transformations performed in phases 1 and 2
- (*universal-character-name*s and line splicing) are reverted; this
- reversion shall apply before any *d-char*, *r-char*, or delimiting
- parenthesis is identified. The raw string literal is defined as the
- shortest sequence of characters that matches the raw-string pattern
+  string, any transformations performed in phase 2 (line splicing) are
+ reverted; this reversion shall apply before any *d-char*, *r-char*, or
+ delimiting parenthesis is identified. The raw string literal is
+ defined as the shortest sequence of characters that matches the
+  raw-string pattern
   ``` bnf
   encoding-prefixₒₚₜ 'R' raw-string
   ```
 - Otherwise, if the next three characters are `<::` and the subsequent
   character is neither `:` nor `>`, the `<` is treated as a
@@ -83,16 +88,16 @@ by preprocessing either of the previous two directives.
 [*Note 1*: None has any observable spelling. — *end note*]
 [*Example 2*: The program fragment `0xe+foo` is parsed as a
 preprocessing number token (one that is not a valid *integer-literal* or
 *floating-point-literal* token), even though a parse as three
-preprocessing tokens `0xe`, `+`, and `foo` might produce a valid
-expression (for example, if `foo` were a macro defined as `1`).
-Similarly, the program fragment `1E1` is parsed as a preprocessing
-number (one that is a valid *floating-point-literal* token), whether or
-not `E` is a macro name. — *end example*]
+preprocessing tokens `0xe`, `+`, and `foo` can produce a valid
+expression (for example, if `foo` is a macro defined as `1`). Similarly,
+the program fragment `1E1` is parsed as a preprocessing number (one that
+is a valid *floating-point-literal* token), whether or not `E` is a
+macro name. — *end example*]
 [*Example 3*: The program fragment `x+++++y` is parsed as `x
 ++ ++ + y`, which, if `x` and `y` have integral types, violates a
 constraint on increment operators, even though the parse `x ++ + ++ y`
-might yield a correct expression. — *end example*]
+can yield a correct expression. — *end example*]
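The lexing rules touched by this diff can be exercised in a small C++ translation unit. This is a sketch and assumes a C++11-or-later compiler. Every name in it (`raw_spliced`, `ordinary_spliced`, `names`, `increment_sum`, `hex_plus`, `ten`) is a hypothetical illustration and is not defined by the standard. The spellings that would be ill-formed as written (`x+++++y`, `0xe+foo`) appear only in comments; the code adds whitespace to force the alternative tokenization. The raw-string case follows the C++23 wording, where only phase-2 line splicing is reverted inside the literal. A backslash-new-line inside a raw string is also preserved under the C++20 wording, because C++20 reverted phases 1 and 2.

```cpp
#include <cassert>  // assert, for callers that check the values below
#include <cstring>
#include <string>
#include <vector>

// Raw string literal: the phase-2 line splice is reverted between the
// quotes, so the backslash and the new-line both remain in the literal.
// The continuation line starts in column 0 so no indentation is added.
const char* raw_spliced = R"(a\
b)";

// Ordinary string literal: the backslash-new-line pair is removed in
// phase 2, so the two physical lines join into "ab".
const char* ordinary_spliced = "a\
b";

// `<::` followed by neither `:` nor `>`: the `<` is lexed on its own
// instead of as the alternative token `<:` (which means `[`), so this
// declares a vector of the global-scope ::std::string.
std::vector<::std::string> names{"alpha", "beta"};

// Maximal munch would lex `x+++++y` as `x ++ ++ + y`, which does not
// compile; spacing the tokens yields the valid parse `x ++ + ++ y`.
int increment_sum(int x, int y) {
    return x++ + ++y;  // old value of x plus incremented y
}

// `0xe+foo` is one (invalid) preprocessing number; whitespace splits it
// into the three tokens `0xe`, `+`, and `foo`.
int hex_plus(int foo) {
    return 0xe + foo;  // 0xe is 14
}

// `1E1` is a single preprocessing number that is also a valid
// floating-point literal, whether or not `E` is a macro name.
constexpr double ten = 1E1;
static_assert(ten == 10.0, "1E1 is the floating-point literal ten");
```

Under these rules, `std::strlen(raw_spliced)` is 4 (`a`, backslash, new-line, `b`), and `ordinary_spliced` compares equal to `"ab"`. Also, `increment_sum(1, 2)` returns 4 and `hex_plus(1)` returns 15.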
