From Jason Turner

[lex.phases]

Files changed (1)
  1. tmp/tmpi8vy37oi/{from.md → to.md} +77 -68
tmp/tmpi8vy37oi/{from.md → to.md} RENAMED
@@ -1,93 +1,102 @@
1
  ## Phases of translation <a id="lex.phases">[[lex.phases]]</a>
2
 
3
  The precedence among the syntax rules of translation is specified by the
4
  following phases.[^1]
5
 
6
- 1. Physical source file characters are mapped, in an
7
- *implementation-defined* manner, to the basic source character set
8
- (introducing new-line characters for end-of-line indicators) if
9
- necessary. The set of physical source file characters accepted is
10
- *implementation-defined*. Any source file character not in the basic
11
- source character set [[lex.charset]] is replaced by the
12
- *universal-character-name* that designates that character. An
13
- implementation may use any internal encoding, so long as an actual
14
- extended character encountered in the source file, and the same
15
- extended character expressed in the source file as a
16
- *universal-character-name* (e.g., using the `\uXXXX` notation), are
17
- handled equivalently except where this replacement is reverted
18
- [[lex.pptoken]] in a raw string literal.
19
- 2. Each instance of a backslash character (\\) immediately followed by a
20
- new-line character is deleted, splicing physical source lines to
21
- form logical source lines. Only the last backslash on any physical
22
- source line shall be eligible for being part of such a splice.
23
- Except for splices reverted in a raw string literal, if a splice
24
- results in a character sequence that matches the syntax of a
25
  *universal-character-name*, the behavior is undefined. A source file
26
  that is not empty and that does not end in a new-line character, or
27
- that ends in a new-line character immediately preceded by a
28
- backslash character before any such splicing takes place, shall be
29
- processed as if an additional new-line character were appended to
30
- the file.
31
  3. The source file is decomposed into preprocessing tokens
32
- [[lex.pptoken]] and sequences of white-space characters (including
33
  comments). A source file shall not end in a partial preprocessing
34
  token or in a partial comment.[^2] Each comment is replaced by one
35
  space character. New-line characters are retained. Whether each
36
- nonempty sequence of white-space characters other than new-line is
37
- retained or replaced by one space character is unspecified. The
38
- process of dividing a source file’s characters into preprocessing
39
- tokens is context-dependent. \[*Example 1*: See the handling of `<`
40
- within a `#include` preprocessing directive. *end example*]
41
  4. Preprocessing directives are executed, macro invocations are
42
- expanded, and `_Pragma` unary operator expressions are executed. If
43
- a character sequence that matches the syntax of a
44
- *universal-character-name* is produced by token concatenation
45
- [[cpp.concat]], the behavior is undefined. A `#include`
46
- preprocessing directive causes the named header or source file to be
47
- processed from phase 1 through phase 4, recursively. All
48
  preprocessing directives are then deleted.
49
- 5. Each basic source character set member in a *character-literal* or a
50
- *string-literal*, as well as each escape sequence and
51
- *universal-character-name* in a *character-literal* or a non-raw
52
- string literal, is converted to the corresponding member of the
53
- execution character set ([[lex.ccon]], [[lex.string]]); if there is
54
- no corresponding member, it is converted to an
55
- *implementation-defined* member other than the null (wide)
56
- character.[^3]
57
- 6. Adjacent string literal tokens are concatenated.
58
- 7. White-space characters separating tokens are no longer significant.
59
  Each preprocessing token is converted into a token [[lex.token]].
60
- The resulting tokens are syntactically and semantically analyzed and
61
- translated as a translation unit. \[*Note 1*: The process of
62
- analyzing and translating the tokens may occasionally result in one
63
- token being replaced by a sequence of other tokens
64
- [[temp.names]]. — *end note*] It is *implementation-defined*
65
- whether the sources for module units and header units on which the
66
- current translation unit has an interface dependency (
67
- [[module.unit]], [[module.import]]) are required to be available.
68
- \[*Note 2*: Source files, translation units and translated
69
- translation units need not necessarily be stored as files, nor need
70
- there be any one-to-one correspondence between these entities and
71
- any external representation. The description is conceptual only, and
72
- does not specify any particular implementation. — *end note*]
73
  8. Translated translation units and instantiation units are combined as
74
- follows: \[*Note 3*: Some or all of these may be supplied from a
75
  library. — *end note*] Each translated translation unit is examined
76
- to produce a list of required instantiations. \[*Note 4*: This may
77
  include instantiations which have been explicitly requested
78
  [[temp.explicit]]. — *end note*] The definitions of the required
79
  templates are located. It is *implementation-defined* whether the
80
  source of the translation units containing these definitions is
81
- required to be available. \[*Note 5*: An implementation could encode
82
- sufficient information into the translated translation unit so as to
83
- ensure the source is not required here. — *end note*] All the
84
- required instantiations are performed to produce *instantiation
85
- units*. \[*Note 6*: These are similar to translated translation
86
- units, but contain no references to uninstantiated templates and no
87
- template definitions. — *end note*] The program is ill-formed if
88
- any instantiation fails.
89
  9. All external entity references are resolved. Library components are
90
  linked to satisfy external references to entities not defined in the
91
  current translation. All such translator output is collected into a
92
  program image which contains information needed for execution in its
93
  execution environment.
 
1
  ## Phases of translation <a id="lex.phases">[[lex.phases]]</a>
2
 
3
  The precedence among the syntax rules of translation is specified by the
4
  following phases.[^1]
5
 
6
+ 1. An implementation shall support input files that are a sequence of
7
+ UTF-8 code units (UTF-8 files). It may also support an
8
+ *implementation-defined* set of other kinds of input files, and, if
9
+ so, the kind of an input file is determined in an
10
+ *implementation-defined* manner that includes a means of designating
11
+ input files as UTF-8 files, independent of their content.
12
+ \[*Note 1*: In other words, recognizing the U+feff (byte order mark)
13
+ is not sufficient. *end note*] If an input file is determined to
14
+ be a UTF-8 file, then it shall be a well-formed UTF-8 code unit
15
+ sequence and it is decoded to produce a sequence of Unicode scalar
16
+ values. A sequence of translation character set elements is then
17
+ formed by mapping each Unicode scalar value to the corresponding
18
+ translation character set element. In the resulting sequence, each
19
+ pair of characters in the input sequence consisting of
20
+ U+000d (carriage return) followed by U+000a (line feed), as well as
21
+ each U+000d (carriage return) not immediately followed by a
22
+ U+000a (line feed), is replaced by a single new-line character. For
23
+ any other kind of input file supported by the implementation,
24
+ characters are mapped, in an *implementation-defined* manner, to a
25
+ sequence of translation character set elements [[lex.charset]],
26
+ representing end-of-line indicators as new-line characters.
27
+ 2. If the first translation character is U+feff (byte order mark), it
28
+ is deleted. Each sequence of a backslash character (\\) immediately
29
+ followed by zero or more whitespace characters other than new-line
30
+ followed by a new-line character is deleted, splicing physical
31
+ source lines to form logical source lines. Only the last backslash
32
+ on any physical source line shall be eligible for being part of such
33
+ a splice. Except for splices reverted in a raw string literal, if a
34
+ splice results in a character sequence that matches the syntax of a
35
  *universal-character-name*, the behavior is undefined. A source file
36
  that is not empty and that does not end in a new-line character, or
37
+ that ends in a splice, shall be processed as if an additional
38
+ new-line character were appended to the file.
39
  3. The source file is decomposed into preprocessing tokens
40
+ [[lex.pptoken]] and sequences of whitespace characters (including
41
  comments). A source file shall not end in a partial preprocessing
42
  token or in a partial comment.[^2] Each comment is replaced by one
43
  space character. New-line characters are retained. Whether each
44
+ nonempty sequence of whitespace characters other than new-line is
45
+ retained or replaced by one space character is unspecified. As
46
+ characters from the source file are consumed to form the next
47
+ preprocessing token (i.e., not being consumed as part of a comment
48
+ or other forms of whitespace), except when matching a
49
+ *c-char-sequence*, *s-char-sequence*, *r-char-sequence*,
50
+ *h-char-sequence*, or *q-char-sequence*, *universal-character-name*s
51
+ are recognized and replaced by the designated element of the
52
+ translation character set. The process of dividing a source file’s
53
+ characters into preprocessing tokens is context-dependent.
54
+ \[*Example 1*: See the handling of `<` within a `#include`
55
+ preprocessing directive. — *end example*]
56
  4. Preprocessing directives are executed, macro invocations are
57
+ expanded, and `_Pragma` unary operator expressions are executed. A
58
+ `#include` preprocessing directive causes the named header or source
59
+ file to be processed from phase 1 through phase 4, recursively. All
60
  preprocessing directives are then deleted.
61
+ 5. For a sequence of two or more adjacent *string-literal* tokens, a
62
+ common *encoding-prefix* is determined as specified in
63
+ [[lex.string]]. Each such *string-literal* token is then considered
64
+ to have that common *encoding-prefix*.
65
+ 6. Adjacent *string-literal* tokens are concatenated [[lex.string]].
66
+ 7. Whitespace characters separating tokens are no longer significant.
67
  Each preprocessing token is converted into a token [[lex.token]].
68
+ The resulting tokens constitute a *translation unit* and are
69
+ syntactically and semantically analyzed and translated.
70
+ \[*Note 2*: The process of analyzing and translating the tokens can
71
+ occasionally result in one token being replaced by a sequence of
72
+ other tokens [[temp.names]]. — *end note*] It is
73
+ *implementation-defined* whether the sources for module units and
74
+ header units on which the current translation unit has an interface
75
+ dependency [[module.unit]], [[module.import]] are required to be
76
+ available. \[*Note 3*: Source files, translation units and
77
+ translated translation units need not necessarily be stored as
78
+ files, nor need there be any one-to-one correspondence between these
79
+ entities and any external representation. The description is
80
+ conceptual only, and does not specify any particular
81
+ implementation. — *end note*]
82
  8. Translated translation units and instantiation units are combined as
83
+ follows: \[*Note 4*: Some or all of these can be supplied from a
84
  library. — *end note*] Each translated translation unit is examined
85
+ to produce a list of required instantiations. \[*Note 5*: This can
86
  include instantiations which have been explicitly requested
87
  [[temp.explicit]]. — *end note*] The definitions of the required
88
  templates are located. It is *implementation-defined* whether the
89
  source of the translation units containing these definitions is
90
+ required to be available. \[*Note 6*: An implementation can choose
91
+ to encode sufficient information into the translated translation
92
+ unit so as to ensure the source is not required here. — *end note*]
93
+ All the required instantiations are performed to produce
94
+ *instantiation units*. \[*Note 7*: These are similar to translated
95
+ translation units, but contain no references to uninstantiated
96
+ templates and no template definitions. — *end note*] The program is
97
+ ill-formed if any instantiation fails.
98
  9. All external entity references are resolved. Library components are
99
  linked to satisfy external references to entities not defined in the
100
  current translation. All such translator output is collected into a
101
  program image which contains information needed for execution in its
102
  execution environment.
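
As a rough illustration of how the phase ordering shows up in ordinary code, the following C++17 translation unit is a minimal sketch and not part of the standard text. The names `SUM3`, `WHO`, `kPlus`, and `kGreeting` are hypothetical and exist only for this example. Every check is a `static_assert`, so the file has no `main` and a successful compile is the whole demonstration.

```cpp
#include <string_view>

// Phase 2: each backslash that is immediately followed by a new-line is
// deleted together with that new-line, so the three physical lines of
// this macro definition become one logical line before phase 4 runs.
#define SUM3(a, b, c) \
    ((a) +            \
     (b) + (c))

static_assert(SUM3(1, 2, 3) == 6);

// Phase 3: a comment is replaced by one space character, so the two '+'
// characters below remain separate tokens (binary plus, then unary plus)
// rather than being lexed as a single '++' token.
constexpr int kPlus = 1 +/* comment */+ 2;
static_assert(kPlus == 3);

// Phase 4 runs before phase 6: the macro WHO is expanded to a
// string-literal first, and only then are the two adjacent
// string-literal tokens concatenated.
#define WHO "world"
constexpr std::string_view kGreeting = "Hello, " WHO;
static_assert(kGreeting == "Hello, world");

// Phases 5 and 6: adjacent string-literal tokens take a common
// encoding-prefix before concatenation, so L"a" "b" is a wide literal
// holding two characters plus the terminating null.
static_assert(sizeof(L"a" "b") == 3 * sizeof(wchar_t));
```

The macro case is the one most likely to matter in practice: because concatenation happens after macro expansion, a string-literal produced by a macro can be joined with its neighbours, whereas a splice is resolved long before any macro is seen.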