From Jason Turner

[lex.phases]


Files changed (1)
  1. tmp/tmpyl6tczpz/{from.md → to.md} +24 -21
tmp/tmpyl6tczpz/{from.md → to.md} RENAMED
@@ -6,18 +6,18 @@ following phases.[^1]
6
  1. Physical source file characters are mapped, in an
7
  *implementation-defined* manner, to the basic source character set
8
  (introducing new-line characters for end-of-line indicators) if
9
  necessary. The set of physical source file characters accepted is
10
  *implementation-defined*. Any source file character not in the basic
11
- source character set ([[lex.charset]]) is replaced by the
12
  *universal-character-name* that designates that character. An
13
  implementation may use any internal encoding, so long as an actual
14
  extended character encountered in the source file, and the same
15
  extended character expressed in the source file as a
16
  *universal-character-name* (e.g., using the `\uXXXX` notation), are
17
- handled equivalently except where this replacement is reverted (
18
- [[lex.pptoken]]) in a raw string literal.
19
  2. Each instance of a backslash character (\\) immediately followed by a
20
  new-line character is deleted, splicing physical source lines to
21
  form logical source lines. Only the last backslash on any physical
22
  source line shall be eligible for being part of such a splice.
23
  Except for splices reverted in a raw string literal, if a splice
@@ -26,55 +26,58 @@ following phases.[^1]
26
  that is not empty and that does not end in a new-line character, or
27
  that ends in a new-line character immediately preceded by a
28
  backslash character before any such splicing takes place, shall be
29
  processed as if an additional new-line character were appended to
30
  the file.
31
- 3. The source file is decomposed into preprocessing tokens (
32
- [[lex.pptoken]]) and sequences of white-space characters (including
33
  comments). A source file shall not end in a partial preprocessing
34
  token or in a partial comment.[^2] Each comment is replaced by one
35
  space character. New-line characters are retained. Whether each
36
  nonempty sequence of white-space characters other than new-line is
37
  retained or replaced by one space character is unspecified. The
38
  process of dividing a source file’s characters into preprocessing
39
- tokens is context-dependent. \[*Example 1*: see the handling of `<`
40
  within a `#include` preprocessing directive. — *end example*]
41
  4. Preprocessing directives are executed, macro invocations are
42
  expanded, and `_Pragma` unary operator expressions are executed. If
43
  a character sequence that matches the syntax of a
44
- *universal-character-name* is produced by token concatenation (
45
- [[cpp.concat]]), the behavior is undefined. A `#include`
46
  preprocessing directive causes the named header or source file to be
47
  processed from phase 1 through phase 4, recursively. All
48
  preprocessing directives are then deleted.
49
- 5. Each source character set member in a character literal or a string
50
- literal, as well as each escape sequence and
51
- *universal-character-name* in a character literal or a non-raw
52
  string literal, is converted to the corresponding member of the
53
  execution character set ([[lex.ccon]], [[lex.string]]); if there is
54
  no corresponding member, it is converted to an
55
  *implementation-defined* member other than the null (wide)
56
  character.[^3]
57
  6. Adjacent string literal tokens are concatenated.
58
  7. White-space characters separating tokens are no longer significant.
59
- Each preprocessing token is converted into a token ([[lex.token]]).
60
  The resulting tokens are syntactically and semantically analyzed and
61
  translated as a translation unit. \[*Note 1*: The process of
62
  analyzing and translating the tokens may occasionally result in one
63
- token being replaced by a sequence of other tokens (
64
- [[temp.names]]). — *end note*] \[*Note 2*: Source files,
65
- translation units and translated translation units need not
66
- necessarily be stored as files, nor need there be any one-to-one
67
- correspondence between these entities and any external
68
- representation. The description is conceptual only, and does not
69
- specify any particular implementation. *end note*]
 
 
 
70
  8. Translated translation units and instantiation units are combined as
71
  follows: \[*Note 3*: Some or all of these may be supplied from a
72
  library. — *end note*] Each translated translation unit is examined
73
  to produce a list of required instantiations. \[*Note 4*: This may
74
- include instantiations which have been explicitly requested (
75
- [[temp.explicit]]). — *end note*] The definitions of the required
76
  templates are located. It is *implementation-defined* whether the
77
  source of the translation units containing these definitions is
78
  required to be available. \[*Note 5*: An implementation could encode
79
  sufficient information into the translated translation unit so as to
80
  ensure the source is not required here. — *end note*] All the
 
6
  1. Physical source file characters are mapped, in an
7
  *implementation-defined* manner, to the basic source character set
8
  (introducing new-line characters for end-of-line indicators) if
9
  necessary. The set of physical source file characters accepted is
10
  *implementation-defined*. Any source file character not in the basic
11
+ source character set [[lex.charset]] is replaced by the
12
  *universal-character-name* that designates that character. An
13
  implementation may use any internal encoding, so long as an actual
14
  extended character encountered in the source file, and the same
15
  extended character expressed in the source file as a
16
  *universal-character-name* (e.g., using the `\uXXXX` notation), are
17
+ handled equivalently except where this replacement is reverted
18
+ [[lex.pptoken]] in a raw string literal.
19
  2. Each instance of a backslash character (\\) immediately followed by a
20
  new-line character is deleted, splicing physical source lines to
21
  form logical source lines. Only the last backslash on any physical
22
  source line shall be eligible for being part of such a splice.
23
  Except for splices reverted in a raw string literal, if a splice
 
26
  that is not empty and that does not end in a new-line character, or
27
  that ends in a new-line character immediately preceded by a
28
  backslash character before any such splicing takes place, shall be
29
  processed as if an additional new-line character were appended to
30
  the file.
31
+ 3. The source file is decomposed into preprocessing tokens
32
+ [[lex.pptoken]] and sequences of white-space characters (including
33
  comments). A source file shall not end in a partial preprocessing
34
  token or in a partial comment.[^2] Each comment is replaced by one
35
  space character. New-line characters are retained. Whether each
36
  nonempty sequence of white-space characters other than new-line is
37
  retained or replaced by one space character is unspecified. The
38
  process of dividing a source file’s characters into preprocessing
39
+ tokens is context-dependent. \[*Example 1*: See the handling of `<`
40
  within a `#include` preprocessing directive. — *end example*]
41
  4. Preprocessing directives are executed, macro invocations are
42
  expanded, and `_Pragma` unary operator expressions are executed. If
43
  a character sequence that matches the syntax of a
44
+ *universal-character-name* is produced by token concatenation
45
+ [[cpp.concat]], the behavior is undefined. A `#include`
46
  preprocessing directive causes the named header or source file to be
47
  processed from phase 1 through phase 4, recursively. All
48
  preprocessing directives are then deleted.
49
+ 5. Each basic source character set member in a *character-literal* or a
50
+ *string-literal*, as well as each escape sequence and
51
+ *universal-character-name* in a *character-literal* or a non-raw
52
  string literal, is converted to the corresponding member of the
53
  execution character set ([[lex.ccon]], [[lex.string]]); if there is
54
  no corresponding member, it is converted to an
55
  *implementation-defined* member other than the null (wide)
56
  character.[^3]
57
  6. Adjacent string literal tokens are concatenated.
58
  7. White-space characters separating tokens are no longer significant.
59
+ Each preprocessing token is converted into a token [[lex.token]].
60
  The resulting tokens are syntactically and semantically analyzed and
61
  translated as a translation unit. \[*Note 1*: The process of
62
  analyzing and translating the tokens may occasionally result in one
63
+ token being replaced by a sequence of other tokens
64
+ [[temp.names]]. — *end note*] It is *implementation-defined*
65
+ whether the sources for module units and header units on which the
66
+ current translation unit has an interface dependency (
67
+ [[module.unit]], [[module.import]]) are required to be available.
68
+ \[*Note 2*: Source files, translation units and translated
69
+ translation units need not necessarily be stored as files, nor need
70
+ there be any one-to-one correspondence between these entities and
71
+ any external representation. The description is conceptual only, and
72
+ does not specify any particular implementation. — *end note*]
73
  8. Translated translation units and instantiation units are combined as
74
  follows: \[*Note 3*: Some or all of these may be supplied from a
75
  library. — *end note*] Each translated translation unit is examined
76
  to produce a list of required instantiations. \[*Note 4*: This may
77
+ include instantiations which have been explicitly requested
78
+ [[temp.explicit]]. — *end note*] The definitions of the required
79
  templates are located. It is *implementation-defined* whether the
80
  source of the translation units containing these definitions is
81
  required to be available. \[*Note 5*: An implementation could encode
82
  sufficient information into the translated translation unit so as to
83
  ensure the source is not required here. — *end note*] All the
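
As a rough illustration of phases 2, 3, 6 and 7 as quoted in the diff above, the following sketch shows effects that are observable from ordinary code. It is not part of the standard text or of the diff; the names `SPLICED_SUM`, `greeting` and `nested_size` are hypothetical and chosen only for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Phase 2: the backslash-newline pair in the macro below is deleted, so the
// definition occupies two physical source lines but one logical source line.
#define SPLICED_SUM(a, b) \
    ((a) + (b))

// Phase 3 replaces the comment between the literals with one space character.
// Phase 6 then concatenates the three adjacent string literal tokens.
inline std::string greeting() {
    return "trans" "lation " /* replaced by one space in phase 3 */ "phases";
}

// Phase 7, Note 1: inside the template argument list the single >> token is
// treated as two consecutive > tokens rather than a right-shift operator.
inline std::size_t nested_size() {
    std::vector<std::vector<int>> grid(2, std::vector<int>(3));
    return grid.size() * grid[0].size();
}
```

Under these assumptions, `SPLICED_SUM(2, 3)` evaluates to `5`, `greeting()` returns the single string `translation phases`, and `nested_size()` returns `6`.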