From Jason Turner

[lex.pptoken]

Diff to HTML by rtfpessoa

Files changed (1)
  1. tmp/tmp74xuroty/{from.md → to.md} +29 -19
tmp/tmp74xuroty/{from.md → to.md} RENAMED
@@ -14,27 +14,22 @@ preprocessing-token:
14
  user-defined-string-literal
15
  preprocessing-op-or-punc
16
  each non-whitespace character that cannot be one of the above
17
  ```
18
 
19
- Each preprocessing token that is converted to a token [[lex.token]]
20
- shall have the lexical form of a keyword, an identifier, a literal, or
21
- an operator or punctuator.
22
-
23
  A preprocessing token is the minimal lexical element of the language in
24
  translation phases 3 through 6. In this document, glyphs are used to
25
  identify elements of the basic character set [[lex.charset]]. The
26
  categories of preprocessing token are: header names, placeholder tokens
27
  produced by preprocessing `import` and `module` directives
28
  (*import-keyword*, *module-keyword*, and *export-keyword*), identifiers,
29
  preprocessing numbers, character literals (including user-defined
30
  character literals), string literals (including user-defined string
31
  literals), preprocessing operators and punctuators, and single
32
  non-whitespace characters that do not lexically match the other
33
- preprocessing token categories. If a U+0027 (apostrophe) or a
34
- U+0022 (quotation mark) character matches the last category, the
35
- behavior is undefined. If any character not in the basic character set
36
  matches the last category, the program is ill-formed. Preprocessing
37
  tokens can be separated by whitespace; this consists of comments
38
  [[lex.comment]], or whitespace characters (U+0020 (space),
39
  U+0009 (character tabulation), new-line, U+000b (line tabulation), and
40
  U+000c (form feed)), or both. As described in [[cpp]], in certain
@@ -42,10 +37,21 @@ circumstances during translation phase 4, whitespace (or the absence
42
  thereof) serves as more than preprocessing token separation. Whitespace
43
  can appear within a preprocessing token only as part of a header name or
44
  between the quotation characters in a character literal or string
45
  literal.
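A minimal sketch, assuming any C++14-or-later compiler (the function names are hypothetical), of why whitespace as a separator matters. Without it, the longest-sequence rule groups `+` characters greedily into `++` tokens.

``` cpp
// Hypothetical helpers showing how whitespace changes the token split.

// "x+++y" is split greedily as  x ++ + y,  that is, (x++) + y.
constexpr int greedy_split() {
    int x = 1, y = 2;
    int r = x+++y;      // r = 1 + 2 = 3, and x becomes 2
    return r * 10 + x;  // packs both values into one result: 32
}

// "x+++++y" would split as  x ++ ++ + y,  which is ill-formed because
// x++ is not a modifiable lvalue. Whitespace forces the intended split.
constexpr int separated() {
    int x = 1, y = 2;
    return x++ + ++y;   // 1 + 3 = 4
}
```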
46
 
47
  If the input stream has been parsed into preprocessing tokens up to a
48
  given character:
49
 
50
  - If the next character begins a sequence of characters that could be
51
  the prefix and initial double quote of a raw string literal, such as
@@ -61,34 +67,38 @@ given character:
61
  ```
62
  - Otherwise, if the next three characters are `<::` and the subsequent
63
  character is neither `:` nor `>`, the `<` is treated as a
64
  preprocessing token by itself and not as the first character of the
65
  alternative token `<:`.
66
  - Otherwise, the next preprocessing token is the longest sequence of
67
  characters that could constitute a preprocessing token, even if that
68
- would cause further lexical analysis to fail, except that a
69
- *header-name* [[lex.header]] is only formed
70
- - after the `include` or `import` preprocessing token in an `#include`
71
- [[cpp.include]] or `import` [[cpp.import]] directive, or
72
- - within a *has-include-expression*.
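A minimal sketch of the header-name exception, assuming a C++17-or-later compiler that provides `__has_include`; whether the C++20 `<version>` header exists is left open. `HAVE_VERSION_HEADER` and `angle_compare` are hypothetical names. The characters `<version>` form a single *header-name* only inside the has-include expression; elsewhere `<` and `>` are ordinary operators.

``` cpp
// Inside a has-include-expression, "<version>" is lexed as one
// header-name preprocessing token.
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#    define HAVE_VERSION_HEADER 1
#  endif
#endif
#ifndef HAVE_VERSION_HEADER
#  define HAVE_VERSION_HEADER 0
#endif

// Outside those contexts no header-name is formed: "1 < 2 > 0" is the
// five tokens 1 < 2 > 0, parsed as (1 < 2) > 0.
constexpr bool angle_compare = 1 < 2 > 0;
```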
73
 
74
  [*Example 1*:
75
 
76
  ``` cpp
77
  #define R "x"
78
  const char* s = R"y"; // ill-formed raw string, not "x" "y"
79
  ```
80
 
81
  — *end example*]
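For contrast with Example 1, a sketch assuming any C++11-or-later compiler (`s` and `t` are hypothetical names). Once whitespace separates `R` from the quote, `R` is an ordinary identifier: the macro expands and phase 6 concatenates the adjacent string literals. Without whitespace, the raw-string rule applies in phase 3, before macro replacement can see `R`.

``` cpp
#define R "x"

// Whitespace after R: R is an identifier, so it is macro-replaced in phase 4
// and the adjacent literals "x" "y" are concatenated in phase 6 to "xy".
constexpr char s[] = R "y";

// No whitespace: R"(y)" is lexed as a raw string literal in phase 3,
// even though R is a macro, so the result is just "y".
constexpr char t[] = R"(y)";

#undef R
```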
82
 
83
- The *import-keyword* is produced by processing an `import` directive
84
- [[cpp.import]], the *module-keyword* is produced by preprocessing a
85
- `module` directive [[cpp.module]], and the *export-keyword* is produced
86
- by preprocessing either of the previous two directives.
87
-
88
- [*Note 1*: None has any observable spelling. — *end note*]
89
-
90
  [*Example 2*: The program fragment `0xe+foo` is parsed as a
91
  preprocessing number token (one that is not a valid *integer-literal* or
92
  *floating-point-literal* token), even though a parse as three
93
  preprocessing tokens `0xe`, `+`, and `foo` can produce a valid
94
  expression (for example, if `foo` is a macro defined as `1`). Similarly,
 
14
  user-defined-string-literal
15
  preprocessing-op-or-punc
16
  each non-whitespace character that cannot be one of the above
17
  ```
18
 
19
  A preprocessing token is the minimal lexical element of the language in
20
  translation phases 3 through 6. In this document, glyphs are used to
21
  identify elements of the basic character set [[lex.charset]]. The
22
  categories of preprocessing token are: header names, placeholder tokens
23
  produced by preprocessing `import` and `module` directives
24
  (*import-keyword*, *module-keyword*, and *export-keyword*), identifiers,
25
  preprocessing numbers, character literals (including user-defined
26
  character literals), string literals (including user-defined string
27
  literals), preprocessing operators and punctuators, and single
28
  non-whitespace characters that do not lexically match the other
29
+ preprocessing token categories. If a U+0027 (apostrophe), a
30
+ U+0022 (quotation mark), or any character not in the basic character set
31
  matches the last category, the program is ill-formed. Preprocessing
32
  tokens can be separated by whitespace; this consists of comments
33
  [[lex.comment]], or whitespace characters (U+0020 (space),
34
  U+0009 (character tabulation), new-line, U+000b (line tabulation), and
35
  U+000c (form feed)), or both. As described in [[cpp]], in certain
 
37
  thereof) serves as more than preprocessing token separation. Whitespace
38
  can appear within a preprocessing token only as part of a header name or
39
  between the quotation characters in a character literal or string
40
  literal.
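A minimal sketch, assuming any C++11-or-later compiler (the variable names are hypothetical), contrasting whitespace inside a literal, where it belongs to the token, with whitespace between literals, where it only separates tokens.

``` cpp
// Whitespace inside quotes is part of a single literal token.
constexpr char space = ' ';          // one character-literal token
constexpr char one_token[] = "a b";  // one string-literal token: 'a', ' ', 'b', '\0'

// Between tokens, whitespace only separates: "a" "b" are two string-literal
// tokens, which translation phase 6 concatenates into "ab".
constexpr char two_tokens[] = "a" "b";
```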
41
 
42
+ Each preprocessing token that is converted to a token [[lex.token]]
43
+ shall have the lexical form of a keyword, an identifier, a literal, or
44
+ an operator or punctuator.
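This conversion rule is why the single preprocessing number `0xe+foo` in Example 2 of this section cannot become a token. A sketch assuming a C++17-or-later compiler (needed for the hexadecimal floating literal); the macro `foo` mirrors Example 2 and the variable names are hypothetical.

``` cpp
#define foo 1

// Spaces split the fragment into three preprocessing tokens 0xe, +, foo,
// each of which converts to a valid token: 0xe is 14, so the sum is 15.
constexpr int sum = 0xe + foo;

// A sign directly after e, E, p, or P belongs to the preprocessing number;
// here the whole pp-number 0x1p+1 is also a valid floating-point-literal.
constexpr double two = 0x1p+1;  // 1 * 2^1

#undef foo
```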
45
+
46
+ The *import-keyword* is produced by processing an `import` directive
47
+ [[cpp.import]], the *module-keyword* is produced by preprocessing a
48
+ `module` directive [[cpp.module]], and the *export-keyword* is produced
49
+ by preprocessing either of the previous two directives.
50
+
51
+ [*Note 1*: None has any observable spelling. — *end note*]
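Because these placeholder tokens arise only from preprocessing the directives, the words `import` and `module` remain ordinary identifiers elsewhere. A sketch, assuming a C++20 compiler; the variable names are hypothetical, and each line deliberately begins with `constexpr` so that no directive can be recognized. `export`, by contrast, is a keyword and cannot be used this way.

``` cpp
// `import` and `module` are identifiers with special meaning, not keywords.
// No line below begins with them, so no import or module directive is
// recognized and no placeholder token is produced.
constexpr int import = 1;
constexpr int module = 2;
constexpr int total = import + module;
```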
52
+
53
  If the input stream has been parsed into preprocessing tokens up to a
54
  given character:
55
 
56
  - If the next character begins a sequence of characters that could be
57
  the prefix and initial double quote of a raw string literal, such as
 
67
  ```
68
  - Otherwise, if the next three characters are `<::` and the subsequent
69
  character is neither `:` nor `>`, the `<` is treated as a
70
  preprocessing token by itself and not as the first character of the
71
  alternative token `<:`.
72
+ - Otherwise, if the next three characters are `[::` and the subsequent
73
+ character is not `:`, or if the next three characters are `[:>`, the
74
+ `[` is treated as a preprocessing token by itself and not as the first
75
+ character of the preprocessing token `[:`. \[*Note 2*: The tokens `[:`
76
+ and `:]` cannot be composed from digraphs. — *end note*]
77
  - Otherwise, the next preprocessing token is the longest sequence of
78
  characters that could constitute a preprocessing token, even if that
79
+ would cause further lexical analysis to fail, except that
80
+ - a *string-literal* token is never formed when a *header-name* token
81
+ can be formed, and
82
+ - a *header-name* [[lex.header]] is only formed
83
+ - immediately after the `include`, `embed`, or `import`
84
+ preprocessing token in a `#include` [[cpp.include]], `#embed`
85
+ [[cpp.embed]], or `import` [[cpp.import]] directive, respectively,
86
+ or
87
+ - immediately after a preprocessing token sequence of
88
+ `__has_include` or `__has_embed` immediately followed by `(` in a
89
+ `#if`, `#elif`, or `#embed` directive [[cpp.cond]], [[cpp.embed]].
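A sketch of the `<::` and `[::` special cases, assuming a C++11-or-later compiler; some compilers may need a conformance mode to accept digraphs such as `<:` and `:>`. All names are hypothetical. Each line lexes the same way whether or not the compiler knows the `[:` token, so the code should also compile with current compilers.

``` cpp
struct Point { int x, y; };
template <class T> struct Wrap { T value; };

// "<::" followed by 'P': '<' is its own token, so this is Wrap<::Point>,
// not the digraph "<:" (an alternative spelling of '[').
constexpr Wrap<::Point> w{{1, 2}};

// "<::>" is followed by '>', so the exception does not apply and maximal
// munch forms the digraphs "<:" and ":>": this declares digits[].
constexpr int digits<::> = {3, 4, 5};

constexpr int idx = 1;
// "[::" followed by 'i': '[' stands alone, giving digits[::idx]
// rather than the preprocessing token "[:".
constexpr int picked = digits[::idx];

// "[:>": '[' stands alone and ":>" is the digraph for ']',
// so this declares two_items[].
constexpr int two_items[:> = {7, 8};
```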
90
 
91
  [*Example 1*:
92
 
93
  ``` cpp
94
  #define R "x"
95
  const char* s = R"y"; // ill-formed raw string, not "x" "y"
96
  ```
97
 
98
  — *end example*]
99
 
100
  [*Example 2*: The program fragment `0xe+foo` is parsed as a
101
  preprocessing number token (one that is not a valid *integer-literal* or
102
  *floating-point-literal* token), even though a parse as three
103
  preprocessing tokens `0xe`, `+`, and `foo` can produce a valid
104
  expression (for example, if `foo` is a macro defined as `1`). Similarly,