From Jason Turner

[lex.phases]

Diff to HTML by rtfpessoa

Files changed (1)

tmp/tmpohig911a/{from.md → to.md} RENAMED +130 −78
@@ -10,94 +10,146 @@ following phases.[^1]
  *implementation-defined* manner that includes a means of designating
  input files as UTF-8 files, independent of their content.
  \[*Note 1*: In other words, recognizing the U+feff (byte order mark)
  is not sufficient. — *end note*] If an input file is determined to
  be a UTF-8 file, then it shall be a well-formed UTF-8 code unit
- sequence and it is decoded to produce a sequence of Unicode scalar
- values. A sequence of translation character set elements is then
- formed by mapping each Unicode scalar value to the corresponding
- translation character set element. In the resulting sequence, each
- pair of characters in the input sequence consisting of
- U+000d (carriage return) followed by U+000a (line feed), as well as
- each U+000d (carriage return) not immediately followed by a
- U+000a (line feed), is replaced by a single new-line character. For
- any other kind of input file supported by the implementation,
- characters are mapped, in an *implementation-defined* manner, to a
- sequence of translation character set elements [[lex.charset]],
- representing end-of-line indicators as new-line characters.
+ sequence and it is decoded to produce a sequence of Unicode[^2]
+ scalar values. A sequence of translation character set elements
+ [[lex.charset]] is then formed by mapping each Unicode scalar value
+ to the corresponding translation character set element. In the
+ resulting sequence, each pair of characters in the input sequence
+ consisting of U+000d (carriage return) followed by
+ U+000a (line feed), as well as each U+000d (carriage return) not
+ immediately followed by a U+000a (line feed), is replaced by a
+ single new-line character. For any other kind of input file
+ supported by the implementation, characters are mapped, in an
+ *implementation-defined* manner, to a sequence of translation
+ character set elements, representing end-of-line indicators as
+ new-line characters.
  2. If the first translation character is U+feff (byte order mark), it
- is deleted. Each sequence of a backslash character (\\) immediately
- followed by zero or more whitespace characters other than new-line
- followed by a new-line character is deleted, splicing physical
- source lines to form logical source lines. Only the last backslash
- on any physical source line shall be eligible for being part of such
- a splice. Except for splices reverted in a raw string literal, if a
- splice results in a character sequence that matches the syntax of a
- *universal-character-name*, the behavior is undefined. A source file
- that is not empty and that does not end in a new-line character, or
- that ends in a splice, shall be processed as if an additional
- new-line character were appended to the file.
+ is deleted. Each sequence comprising a backslash character (\\)
+ immediately followed by zero or more whitespace characters other
+ than new-line followed by a new-line character is deleted, splicing
+ physical source lines to form *logical source lines*. Only the last
+ backslash on any physical source line shall be eligible for being
+ part of such a splice. \[*Note 2*: Line splicing can form a
+ *universal-character-name* [[lex.charset]]. — *end note*] A source
+ file that is not empty and that (after splicing) does not end in a
+ new-line character shall be processed as if an additional new-line
+ character were appended to the file.
  3. The source file is decomposed into preprocessing tokens
  [[lex.pptoken]] and sequences of whitespace characters (including
  comments). A source file shall not end in a partial preprocessing
- token or in a partial comment.[^2] Each comment is replaced by one
- space character. New-line characters are retained. Whether each
- nonempty sequence of whitespace characters other than new-line is
- retained or replaced by one space character is unspecified. As
- characters from the source file are consumed to form the next
- preprocessing token (i.e., not being consumed as part of a comment
- or other forms of whitespace), except when matching a
- *c-char-sequence*, *s-char-sequence*, *r-char-sequence*,
- *h-char-sequence*, or *q-char-sequence*, *universal-character-name*s
- are recognized and replaced by the designated element of the
- translation character set. The process of dividing a source file’s
+ token or in a partial comment.[^3] Each comment [[lex.comment]] is
+ replaced by one U+0020 (space) character. New-line characters are
+ retained. Whether each nonempty sequence of whitespace characters
+ other than new-line is retained or replaced by one U+0020 (space)
+ character is unspecified. As characters from the source file are
+ consumed to form the next preprocessing token (i.e., not being
+ consumed as part of a comment or other forms of whitespace), except
+ when matching a *c-char-sequence*, *s-char-sequence*,
+ *r-char-sequence*, *h-char-sequence*, or *q-char-sequence*,
+ *universal-character-name*s are recognized [[lex.universal.char]]
+ and replaced by the designated element of the translation character
+ set [[lex.charset]]. The process of dividing a source file’s
  characters into preprocessing tokens is context-dependent.
  \[*Example 1*: See the handling of `<` within a `#include`
- preprocessing directive. — *end example*]
- 4. Preprocessing directives are executed, macro invocations are
- expanded, and `_Pragma` unary operator expressions are executed. A
- `#include` preprocessing directive causes the named header or source
- file to be processed from phase 1 through phase 4, recursively. All
- preprocessing directives are then deleted.
- 5. For a sequence of two or more adjacent *string-literal* tokens, a
- common *encoding-prefix* is determined as specified in
- [[lex.string]]. Each such *string-literal* token is then considered
- to have that common *encoding-prefix*.
- 6. Adjacent *string-literal* tokens are concatenated [[lex.string]].
- 7. Whitespace characters separating tokens are no longer significant.
- Each preprocessing token is converted into a token [[lex.token]].
+ preprocessing directive
+ [[lex.header]], [[cpp.include]]. — *end example*]
+ 4. The source file is analyzed as a *preprocessing-file* [[cpp.pre]].
+ Preprocessing directives [[cpp]] are executed, macro invocations are
+ expanded [[cpp.replace]], and `_Pragma` unary operator expressions
+ are executed [[cpp.pragma.op]]. A `#include` preprocessing directive
+ [[cpp.include]] causes the named header or source file to be
+ processed from phase 1 through phase 4, recursively. All
+ preprocessing directives are then deleted. Whitespace characters
+ separating preprocessing tokens are no longer significant.
+ 5. For a sequence of two or more adjacent *string-literal*
+ preprocessing tokens, a common *encoding-prefix* is determined as
+ specified in [[lex.string]]. Each such *string-literal*
+ preprocessing token is then considered to have that common
+ *encoding-prefix*.
+ 6. Adjacent *string-literal* preprocessing tokens are concatenated
+ [[lex.string]].
+ 7. Each preprocessing token is converted into a token [[lex.token]].
  The resulting tokens constitute a *translation unit* and are
- syntactically and semantically analyzed and translated.
- \[*Note 2*: The process of analyzing and translating the tokens can
+ syntactically and semantically analyzed as a *translation-unit*
+ [[basic.link]] and translated.
+ \[*Note 3*: The process of analyzing and translating the tokens can
  occasionally result in one token being replaced by a sequence of
- other tokens [[temp.names]]. — *end note*] It is
- *implementation-defined* whether the sources for module units and
- header units on which the current translation unit has an interface
- dependency [[module.unit]], [[module.import]] are required to be
- available. \[*Note 3*: Source files, translation units and
- translated translation units need not necessarily be stored as
- files, nor need there be any one-to-one correspondence between these
- entities and any external representation. The description is
- conceptual only, and does not specify any particular
- implementation. — *end note*]
- 8. Translated translation units and instantiation units are combined as
- follows: \[*Note 4*: Some or all of these can be supplied from a
- library. — *end note*] Each translated translation unit is examined
- to produce a list of required instantiations. \[*Note 5*: This can
- include instantiations which have been explicitly requested
- [[temp.explicit]]. — *end note*] The definitions of the required
- templates are located. It is *implementation-defined* whether the
- source of the translation units containing these definitions is
- required to be available. \[*Note 6*: An implementation can choose
- to encode sufficient information into the translated translation
- unit so as to ensure the source is not required here. — *end note*]
- All the required instantiations are performed to produce
- *instantiation units*. \[*Note 7*: These are similar to translated
- translation units, but contain no references to uninstantiated
- templates and no template definitions. — *end note*] The program is
+ other tokens [[temp.names]]. — *end note*]
+ It is *implementation-defined* whether the sources for module units
+ and header units on which the current translation unit has an
+ interface dependency [[module.unit]], [[module.import]] are required
+ to be available.
+ \[*Note 4*: Source files, translation units and translated
+ translation units need not necessarily be stored as files, nor need
+ there be any one-to-one correspondence between these entities and
+ any external representation. The description is conceptual only, and
+ does not specify any particular implementation. — *end note*]
+ \[*Note 5*: Previously translated translation units can be preserved
+ individually or in libraries. The separate translation units of a
+ program communicate [[basic.link]] by (for example) calls to
+ functions whose names have external or module linkage, manipulation
+ of variables whose names have external or module linkage, or
+ manipulation of data files. — *end note*]
+ While the tokens constituting translation units are being analyzed
+ and translated, required instantiations are performed.
+ \[*Note 6*: This can include instantiations which have been
+ explicitly requested [[temp.explicit]]. — *end note*]
+ The contexts from which instantiations may be performed are
+ determined by their respective points of instantiation
+ [[temp.point]].
+ \[*Note 7*: Other requirements in this document can further
+ constrain the context from which an instantiation can be performed.
+ For example, a constexpr function template specialization might have
+ a point of instantiation at the end of a translation unit, but its
+ use in certain constant expressions could require that it be
+ instantiated at an earlier point [[temp.inst]]. — *end note*]
+ Each instantiation results in new program constructs. The program is
  ill-formed if any instantiation fails.
- 9. All external entity references are resolved. Library components are
- linked to satisfy external references to entities not defined in the
- current translation. All such translator output is collected into a
- program image which contains information needed for execution in its
+ During the analysis and translation of tokens, certain expressions
+ are evaluated [[expr.const]]. Constructs appearing at a program
+ point P are analyzed in a context where each side effect of
+ evaluating an expression E as a full-expression is complete if and
+ only if
+ - E is the expression corresponding to a
+   *consteval-block-declaration* [[dcl.pre]], and
+ - either that *consteval-block-declaration* or the template
+   definition from which it is instantiated is reachable from
+   [[module.reach]]
+   - P, or
+   - the point immediately following the *class-specifier* of the
+     outermost class for which P is in a complete-class context
+     [[class.mem.general]].
+
+ \[*Example 2*:
+ ``` cpp
+ class S {
+   class Incomplete;
+
+   class Inner {
+     void fn() {
+       /* p₁ */ Incomplete i;  // OK
+     }
+   };  /* p₂ */
+
+   consteval {
+     define_aggregate(^^Incomplete, {});
+   }
+ };  /* p₃ */
+ ```
+
+ Constructs at p₁ are analyzed in a context where the side effect of
+ the call to `define_aggregate` is evaluated because
+ - E is the expression corresponding to a consteval block, and
+ - p₁ is in a complete-class context of `S` and the consteval block
+   is reachable from p₃.
+
+ — *end example*]
+ 8. Translated translation units are combined, and all external entity
+ references are resolved. Library components are linked to satisfy
+ external references to entities not defined in the current
+ translation. All such translator output is collected into a program
+ image which contains information needed for execution in its
  execution environment.