From Jason Turner

[lex.phases]

Diff to HTML by rtfpessoa

Files changed (1)

tmp/tmpohig911a/{from.md → to.md} RENAMED +130 −78
@@ -10,94 +10,146 @@ following phases.[^1]
  *implementation-defined* manner that includes a means of designating
  input files as UTF-8 files, independent of their content.
  \[*Note 1*: In other words, recognizing the U+feff (byte order mark)
  is not sufficient. — *end note*] If an input file is determined to
  be a UTF-8 file, then it shall be a well-formed UTF-8 code unit
- sequence and it is decoded to produce a sequence of Unicode scalar
- values. A sequence of translation character set elements is then
- formed by mapping each Unicode scalar value to the corresponding
- translation character set element. In the resulting sequence, each
- pair of characters in the input sequence consisting of
- U+000d (carriage return) followed by U+000a (line feed), as well as
- each U+000d (carriage return) not immediately followed by a
- U+000a (line feed), is replaced by a single new-line character. For
- any other kind of input file supported by the implementation,
- characters are mapped, in an *implementation-defined* manner, to a
- sequence of translation character set elements [[lex.charset]],
- representing end-of-line indicators as new-line characters.
+ sequence and it is decoded to produce a sequence of Unicode[^2]
+ scalar values. A sequence of translation character set elements
+ [[lex.charset]] is then formed by mapping each Unicode scalar value
+ to the corresponding translation character set element. In the
+ resulting sequence, each pair of characters in the input sequence
+ consisting of U+000d (carriage return) followed by
+ U+000a (line feed), as well as each U+000d (carriage return) not
+ immediately followed by a U+000a (line feed), is replaced by a
+ single new-line character. For any other kind of input file
+ supported by the implementation, characters are mapped, in an
+ *implementation-defined* manner, to a sequence of translation
+ character set elements, representing end-of-line indicators as
+ new-line characters.
  2. If the first translation character is U+feff (byte order mark), it
- is deleted. Each sequence of a backslash character (\\) immediately
- followed by zero or more whitespace characters other than new-line
- followed by a new-line character is deleted, splicing physical
- source lines to form logical source lines. Only the last backslash
- on any physical source line shall be eligible for being part of such
- a splice. Except for splices reverted in a raw string literal, if a
- splice results in a character sequence that matches the syntax of a
- *universal-character-name*, the behavior is undefined. A source file
- that is not empty and that does not end in a new-line character, or
- that ends in a splice, shall be processed as if an additional
- new-line character were appended to the file.
+ is deleted. Each sequence comprising a backslash character (\\)
+ immediately followed by zero or more whitespace characters other
+ than new-line followed by a new-line character is deleted, splicing
+ physical source lines to form *logical source lines*. Only the last
+ backslash on any physical source line shall be eligible for being
+ part of such a splice. \[*Note 2*: Line splicing can form a
+ *universal-character-name* [[lex.charset]]. — *end note*] A source
+ file that is not empty and that (after splicing) does not end in a
+ new-line character shall be processed as if an additional new-line
+ character were appended to the file.
  3. The source file is decomposed into preprocessing tokens
  [[lex.pptoken]] and sequences of whitespace characters (including
  comments). A source file shall not end in a partial preprocessing
- token or in a partial comment.[^2] Each comment is replaced by one
- space character. New-line characters are retained. Whether each
- nonempty sequence of whitespace characters other than new-line is
- retained or replaced by one space character is unspecified. As
- characters from the source file are consumed to form the next
- preprocessing token (i.e., not being consumed as part of a comment
- or other forms of whitespace), except when matching a
- *c-char-sequence*, *s-char-sequence*, *r-char-sequence*,
- *h-char-sequence*, or *q-char-sequence*, *universal-character-name*s
- are recognized and replaced by the designated element of the
- translation character set. The process of dividing a source file’s
+ token or in a partial comment.[^3] Each comment [[lex.comment]] is
+ replaced by one U+0020 (space) character. New-line characters are
+ retained. Whether each nonempty sequence of whitespace characters
+ other than new-line is retained or replaced by one U+0020 (space)
+ character is unspecified. As characters from the source file are
+ consumed to form the next preprocessing token (i.e., not being
+ consumed as part of a comment or other forms of whitespace), except
+ when matching a *c-char-sequence*, *s-char-sequence*,
+ *r-char-sequence*, *h-char-sequence*, or *q-char-sequence*,
+ *universal-character-name*s are recognized [[lex.universal.char]]
+ and replaced by the designated element of the translation character
+ set [[lex.charset]]. The process of dividing a source file’s
  characters into preprocessing tokens is context-dependent.
  \[*Example 1*: See the handling of `<` within a `#include`
- preprocessing directive. — *end example*]
- 4. Preprocessing directives are executed, macro invocations are
- expanded, and `_Pragma` unary operator expressions are executed. A
- `#include` preprocessing directive causes the named header or source
- file to be processed from phase 1 through phase 4, recursively. All
- preprocessing directives are then deleted.
- 5. For a sequence of two or more adjacent *string-literal* tokens, a
- common *encoding-prefix* is determined as specified in
- [[lex.string]]. Each such *string-literal* token is then considered
- to have that common *encoding-prefix*.
- 6. Adjacent *string-literal* tokens are concatenated [[lex.string]].
- 7. Whitespace characters separating tokens are no longer significant.
- Each preprocessing token is converted into a token [[lex.token]].
+ preprocessing directive
+ [[lex.header]], [[cpp.include]]. — *end example*]
+ 4. The source file is analyzed as a *preprocessing-file* [[cpp.pre]].
+ Preprocessing directives [[cpp]] are executed, macro invocations are
+ expanded [[cpp.replace]], and `_Pragma` unary operator expressions
+ are executed [[cpp.pragma.op]]. A `#include` preprocessing directive
+ [[cpp.include]] causes the named header or source file to be
+ processed from phase 1 through phase 4, recursively. All
+ preprocessing directives are then deleted. Whitespace characters
+ separating preprocessing tokens are no longer significant.
+ 5. For a sequence of two or more adjacent *string-literal*
+ preprocessing tokens, a common *encoding-prefix* is determined as
+ specified in [[lex.string]]. Each such *string-literal*
+ preprocessing token is then considered to have that common
+ *encoding-prefix*.
+ 6. Adjacent *string-literal* preprocessing tokens are concatenated
+ [[lex.string]].
+ 7. Each preprocessing token is converted into a token [[lex.token]].
  The resulting tokens constitute a *translation unit* and are
- syntactically and semantically analyzed and translated.
- \[*Note 2*: The process of analyzing and translating the tokens can
+ syntactically and semantically analyzed as a *translation-unit*
+ [[basic.link]] and translated.
+ \[*Note 3*: The process of analyzing and translating the tokens can
  occasionally result in one token being replaced by a sequence of
- other tokens [[temp.names]]. — *end note*] It is
- *implementation-defined* whether the sources for module units and
- header units on which the current translation unit has an interface
- dependency [[module.unit]], [[module.import]] are required to be
- available. \[*Note 3*: Source files, translation units and
- translated translation units need not necessarily be stored as
- files, nor need there be any one-to-one correspondence between these
- entities and any external representation. The description is
- conceptual only, and does not specify any particular
- implementation. — *end note*]
- 8. Translated translation units and instantiation units are combined as
- follows: \[*Note 4*: Some or all of these can be supplied from a
- library. — *end note*] Each translated translation unit is examined
- to produce a list of required instantiations. \[*Note 5*: This can
- include instantiations which have been explicitly requested
- [[temp.explicit]]. — *end note*] The definitions of the required
- templates are located. It is *implementation-defined* whether the
- source of the translation units containing these definitions is
- required to be available. \[*Note 6*: An implementation can choose
- to encode sufficient information into the translated translation
- unit so as to ensure the source is not required here. — *end note*]
- All the required instantiations are performed to produce
- *instantiation units*. \[*Note 7*: These are similar to translated
- translation units, but contain no references to uninstantiated
- templates and no template definitions. — *end note*] The program is
+ other tokens [[temp.names]]. — *end note*]
+ It is *implementation-defined* whether the sources for module units
+ and header units on which the current translation unit has an
+ interface dependency [[module.unit]], [[module.import]] are required
+ to be available.
+ \[*Note 4*: Source files, translation units and translated
+ translation units need not necessarily be stored as files, nor need
+ there be any one-to-one correspondence between these entities and
+ any external representation. The description is conceptual only, and
+ does not specify any particular implementation. — *end note*]
+ \[*Note 5*: Previously translated translation units can be preserved
+ individually or in libraries. The separate translation units of a
+ program communicate [[basic.link]] by (for example) calls to
+ functions whose names have external or module linkage, manipulation
+ of variables whose names have external or module linkage, or
+ manipulation of data files. — *end note*]
+ While the tokens constituting translation units are being analyzed
+ and translated, required instantiations are performed.
+ \[*Note 6*: This can include instantiations which have been
+ explicitly requested [[temp.explicit]]. — *end note*]
+ The contexts from which instantiations may be performed are
+ determined by their respective points of instantiation
+ [[temp.point]].
+ \[*Note 7*: Other requirements in this document can further
+ constrain the context from which an instantiation can be performed.
+ For example, a constexpr function template specialization might have
+ a point of instantiation at the end of a translation unit, but its
+ use in certain constant expressions could require that it be
+ instantiated at an earlier point [[temp.inst]]. — *end note*]
+ Each instantiation results in new program constructs. The program is
  ill-formed if any instantiation fails.
- 9. All external entity references are resolved. Library components are
- linked to satisfy external references to entities not defined in the
- current translation. All such translator output is collected into a
- program image which contains information needed for execution in its
+ During the analysis and translation of tokens, certain expressions
+ are evaluated [[expr.const]]. Constructs appearing at a program
+ point P are analyzed in a context where each side effect of
+ evaluating an expression E as a full-expression is complete if and
+ only if
+ - E is the expression corresponding to a
+   *consteval-block-declaration* [[dcl.pre]], and
+ - either that *consteval-block-declaration* or the template
+   definition from which it is instantiated is reachable from
+   [[module.reach]]
+   - P, or
+   - the point immediately following the *class-specifier* of the
+     outermost class for which P is in a complete-class context
+     [[class.mem.general]].
+
+ \[*Example 2*:
+ ``` cpp
+ class S {
+   class Incomplete;
+
+   class Inner {
+     void fn() {
+       /* p₁ */ Incomplete i;  // OK
+     }
+   };  /* p₂ */
+
+   consteval {
+     define_aggregate(^^Incomplete, {});
+   }
+ };  /* p₃ */
+ ```
+
+ Constructs at p₁ are analyzed in a context where the side effect of
+ the call to `define_aggregate` is evaluated because
+ - E is the expression corresponding to a consteval block, and
+ - p₁ is in a complete-class context of `S` and the consteval block
+   is reachable from p₃.
+
+ — *end example*]
+ 8. Translated translation units are combined, and all external entity
+ references are resolved. Library components are linked to satisfy
+ external references to entities not defined in the current
+ translation. All such translator output is collected into a program
+ image which contains information needed for execution in its
  execution environment.