[lex.phases] - C++17 → C++20

Files changed (1)

tmp/tmpyl6tczpz/{from.md → to.md} +24 -21

@@ -6,18 +6,18 @@ following phases.[^1]
 1.  Physical source file characters are mapped, in an
     *implementation-defined* manner, to the basic source character set
     (introducing new-line characters for end-of-line indicators) if
     necessary. The set of physical source file characters accepted is
     *implementation-defined*. Any source file character not in the basic
-    source character set ([[lex.charset]]) is replaced by the
     *universal-character-name* that designates that character. An
     implementation may use any internal encoding, so long as an actual
     extended character encountered in the source file, and the same
     extended character expressed in the source file as a
     *universal-character-name* (e.g., using the `\uXXXX` notation), are
-    handled equivalently except where this replacement is reverted (
-    [[lex.pptoken]]) in a raw string literal.
 2.  Each instance of a backslash character (\\) immediately followed by a
     new-line character is deleted, splicing physical source lines to
     form logical source lines. Only the last backslash on any physical
     source line shall be eligible for being part of such a splice.
     Except for splices reverted in a raw string literal, if a splice
@@ -26,55 +26,58 @@ following phases.[^1]
     that is not empty and that does not end in a new-line character, or
     that ends in a new-line character immediately preceded by a
     backslash character before any such splicing takes place, shall be
     processed as if an additional new-line character were appended to
     the file.
-3.  The source file is decomposed into preprocessing tokens (
-    [[lex.pptoken]]) and sequences of white-space characters (including
     comments). A source file shall not end in a partial preprocessing
     token or in a partial comment.[^2] Each comment is replaced by one
     space character. New-line characters are retained. Whether each
     nonempty sequence of white-space characters other than new-line is
     retained or replaced by one space character is unspecified. The
     process of dividing a source file’s characters into preprocessing
-    tokens is context-dependent. \[*Example 1*: see the handling of `<`
     within a `#include` preprocessing directive. — *end example*]
 4.  Preprocessing directives are executed, macro invocations are
     expanded, and `_Pragma` unary operator expressions are executed. If
     a character sequence that matches the syntax of a
-    *universal-character-name* is produced by token concatenation (
-    [[cpp.concat]]), the behavior is undefined. A `#include`
     preprocessing directive causes the named header or source file to be
     processed from phase 1 through phase 4, recursively. All
     preprocessing directives are then deleted.
-5.  Each source character set member in a character literal or a string
-    literal, as well as each escape sequence and
-    *universal-character-name* in a character literal or a non-raw
     string literal, is converted to the corresponding member of the
     execution character set ([[lex.ccon]], [[lex.string]]); if there is
     no corresponding member, it is converted to an
     *implementation-defined* member other than the null (wide)
     character.[^3]
 6.  Adjacent string literal tokens are concatenated.
 7.  White-space characters separating tokens are no longer significant.
-    Each preprocessing token is converted into a token ([[lex.token]]).
     The resulting tokens are syntactically and semantically analyzed and
     translated as a translation unit. \[*Note 1*: The process of
     analyzing and translating the tokens may occasionally result in one
-    token being replaced by a sequence of other tokens (
-    [[temp.names]]). — *end note*] \[*Note 2*: Source files,
-    translation units and translated translation units need not
-    necessarily be stored as files, nor need there be any one-to-one
-    correspondence between these entities and any external
-    representation. The description is conceptual only, and does not
-    specify any particular implementation. — *end note*]
 8.  Translated translation units and instantiation units are combined as
     follows: \[*Note 3*: Some or all of these may be supplied from a
     library. — *end note*] Each translated translation unit is examined
     to produce a list of required instantiations. \[*Note 4*: This may
-    include instantiations which have been explicitly requested (
-    [[temp.explicit]]). — *end note*] The definitions of the required
     templates are located. It is *implementation-defined* whether the
     source of the translation units containing these definitions is
     required to be available. \[*Note 5*: An implementation could encode
     sufficient information into the translated translation unit so as to
     ensure the source is not required here. — *end note*] All the

 1.  Physical source file characters are mapped, in an
     *implementation-defined* manner, to the basic source character set
     (introducing new-line characters for end-of-line indicators) if
     necessary. The set of physical source file characters accepted is
     *implementation-defined*. Any source file character not in the basic
+    source character set [[lex.charset]] is replaced by the
     *universal-character-name* that designates that character. An
     implementation may use any internal encoding, so long as an actual
     extended character encountered in the source file, and the same
     extended character expressed in the source file as a
     *universal-character-name* (e.g., using the `\uXXXX` notation), are
+    handled equivalently except where this replacement is reverted
+    [[lex.pptoken]] in a raw string literal.
 2.  Each instance of a backslash character (\\) immediately followed by a
     new-line character is deleted, splicing physical source lines to
     form logical source lines. Only the last backslash on any physical
     source line shall be eligible for being part of such a splice.
     Except for splices reverted in a raw string literal, if a splice
@@ -26,55 +26,58 @@ following phases.[^1]
     that is not empty and that does not end in a new-line character, or
     that ends in a new-line character immediately preceded by a
     backslash character before any such splicing takes place, shall be
     processed as if an additional new-line character were appended to
     the file.
+3.  The source file is decomposed into preprocessing tokens
+    [[lex.pptoken]] and sequences of white-space characters (including
     comments). A source file shall not end in a partial preprocessing
     token or in a partial comment.[^2] Each comment is replaced by one
     space character. New-line characters are retained. Whether each
     nonempty sequence of white-space characters other than new-line is
     retained or replaced by one space character is unspecified. The
     process of dividing a source file’s characters into preprocessing
+    tokens is context-dependent. \[*Example 1*: See the handling of `<`
     within a `#include` preprocessing directive. — *end example*]
 4.  Preprocessing directives are executed, macro invocations are
     expanded, and `_Pragma` unary operator expressions are executed. If
     a character sequence that matches the syntax of a
+    *universal-character-name* is produced by token concatenation
+    [[cpp.concat]], the behavior is undefined. A `#include`
     preprocessing directive causes the named header or source file to be
     processed from phase 1 through phase 4, recursively. All
     preprocessing directives are then deleted.
+5.  Each basic source character set member in a *character-literal* or a
+    *string-literal*, as well as each escape sequence and
+    *universal-character-name* in a *character-literal* or a non-raw
     string literal, is converted to the corresponding member of the
     execution character set ([[lex.ccon]], [[lex.string]]); if there is
     no corresponding member, it is converted to an
     *implementation-defined* member other than the null (wide)
     character.[^3]
 6.  Adjacent string literal tokens are concatenated.
 7.  White-space characters separating tokens are no longer significant.
+    Each preprocessing token is converted into a token [[lex.token]].
     The resulting tokens are syntactically and semantically analyzed and
     translated as a translation unit. \[*Note 1*: The process of
     analyzing and translating the tokens may occasionally result in one
+    token being replaced by a sequence of other tokens
+    [[temp.names]]. — *end note*] It is *implementation-defined*
+    whether the sources for module units and header units on which the
+    current translation unit has an interface dependency (
+    [[module.unit]], [[module.import]]) are required to be available.
+    \[*Note 2*: Source files, translation units and translated
+    translation units need not necessarily be stored as files, nor need
+    there be any one-to-one correspondence between these entities and
+    any external representation. The description is conceptual only, and
+    does not specify any particular implementation. — *end note*]
 8.  Translated translation units and instantiation units are combined as
     follows: \[*Note 3*: Some or all of these may be supplied from a
     library. — *end note*] Each translated translation unit is examined
     to produce a list of required instantiations. \[*Note 4*: This may
+    include instantiations which have been explicitly requested
+    [[temp.explicit]]. — *end note*] The definitions of the required
     templates are located. It is *implementation-defined* whether the
     source of the translation units containing these definitions is
     required to be available. \[*Note 5*: An implementation could encode
     sufficient information into the translated translation unit so as to
     ensure the source is not required here. — *end note*] All the
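
Phases 2 and 6 read the same in both revisions, and their effect can be observed from ordinary code. The following is a minimal sketch, not part of the standard text; the names `ADD`, `spliced`, `raw`, and `joined` are hypothetical and chosen only for illustration. It assumes the file uses plain line-feed end-of-line indicators.

```cpp
#include <cassert>
#include <cstring>

// Phase 2: the backslash-newline pair is deleted, so the macro
// definition below forms a single logical source line.
#define ADD(a, b) \
    ((a) + (b))

// Phase 2 also applies inside an ordinary string literal:
// the two physical lines are spliced into "abcd".
inline const char* spliced() {
    return "ab\
cd";
}

// In a raw string literal the splice is reverted, so the backslash
// and the new-line both remain part of the string.
inline const char* raw() {
    return R"(ab\
cd)";
}

// Phase 6: adjacent string literal tokens are concatenated.
inline const char* joined() {
    return "hello, " "world";
}
```

Comparing `spliced()` with `raw()` shows the one place where a splice is undone: the raw string keeps six characters (`a`, `b`, backslash, new-line, `c`, `d`), while the ordinary literal keeps four.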

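Phases 3, 4, and 7 can be illustrated in the same spirit. This is a hedged sketch under the assumption of a conforming C++11-or-later compiler; the identifiers `STR`, `CAT`, `counter`, `grid`, `answer`, and `text` are hypothetical and do not come from the standard.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Phase 4: macro invocations are expanded. `#` turns its argument into
// a string literal and `##` concatenates two preprocessing tokens.
#define STR(x) #x
#define CAT(a, b) a##b

// After expansion this declares a variable named `counter`.
int CAT(coun, ter) = 3;

// Phase 3 forms `>>` as one preprocessing token. During phase 7 it is
// treated as two `>` tokens closing the nested template-argument lists
// (the replacement mentioned in Note 1).
std::vector<std::vector<int>> grid(2, std::vector<int>(3, 0));

// Phase 3 replaces each comment by one space, so `int` and `answer`
// remain two separate tokens.
int/**/answer = 42;

// Stringizing keeps a single space between the argument's tokens.
inline const char* text() {
    return STR(a + b);
}
```

The `CAT` example stays within defined behavior because the concatenation yields an ordinary identifier; per phase 4, producing something that matches the syntax of a *universal-character-name* this way would be undefined.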