From Jason Turner

[lex]

Files changed (1)
  1. tmp/tmp6rxk53sa/{from.md → to.md} +441 -370
tmp/tmp6rxk53sa/{from.md → to.md} RENAMED
@@ -1,27 +1,27 @@
  # Lexical conventions <a id="lex">[[lex]]</a>
 
  ## Separate translation <a id="lex.separate">[[lex.separate]]</a>
 
  The text of the program is kept in units called *source files* in this
- International Standard. A source file together with all the headers (
- [[headers]]) and source files included ([[cpp.include]]) via the
- preprocessing directive `#include`, less any source lines skipped by any
- of the conditional inclusion ([[cpp.cond]]) preprocessing directives,
- is called a *translation unit*.
 
  [*Note 1*: A C++ program need not all be translated at the same
  time. — *end note*]
 
  [*Note 2*: Previously translated translation units and instantiation
  units can be preserved individually or in libraries. The separate
- translation units of a program communicate ([[basic.link]]) by (for
- example) calls to functions whose identifiers have external linkage,
- manipulation of objects whose identifiers have external linkage, or
- manipulation of data files. Translation units can be separately
- translated and then later linked to produce an executable program (
- [[basic.link]]). — *end note*]
 
  ## Phases of translation <a id="lex.phases">[[lex.phases]]</a>
 
  The precedence among the syntax rules of translation is specified by the
  following phases.[^1]
@@ -29,18 +29,18 @@ following phases.[^1]
  1. Physical source file characters are mapped, in an
  *implementation-defined* manner, to the basic source character set
  (introducing new-line characters for end-of-line indicators) if
  necessary. The set of physical source file characters accepted is
  *implementation-defined*. Any source file character not in the basic
- source character set ([[lex.charset]]) is replaced by the
  *universal-character-name* that designates that character. An
  implementation may use any internal encoding, so long as an actual
  extended character encountered in the source file, and the same
  extended character expressed in the source file as a
  *universal-character-name* (e.g., using the `\uXXXX` notation), are
- handled equivalently except where this replacement is reverted (
- [[lex.pptoken]]) in a raw string literal.
  2. Each instance of a backslash character (`\`) immediately followed by a
  new-line character is deleted, splicing physical source lines to
  form logical source lines. Only the last backslash on any physical
  source line shall be eligible for being part of such a splice.
  Except for splices reverted in a raw string literal, if a splice
@@ -49,55 +49,58 @@ following phases.[^1]
  that is not empty and that does not end in a new-line character, or
  that ends in a new-line character immediately preceded by a
  backslash character before any such splicing takes place, shall be
  processed as if an additional new-line character were appended to
  the file.
- 3. The source file is decomposed into preprocessing tokens (
- [[lex.pptoken]]) and sequences of white-space characters (including
  comments). A source file shall not end in a partial preprocessing
  token or in a partial comment.[^2] Each comment is replaced by one
  space character. New-line characters are retained. Whether each
  nonempty sequence of white-space characters other than new-line is
  retained or replaced by one space character is unspecified. The
  process of dividing a source file’s characters into preprocessing
- tokens is context-dependent. \[*Example 1*: see the handling of `<`
  within a `#include` preprocessing directive. — *end example*]
  4. Preprocessing directives are executed, macro invocations are
  expanded, and `_Pragma` unary operator expressions are executed. If
  a character sequence that matches the syntax of a
- *universal-character-name* is produced by token concatenation (
- [[cpp.concat]]), the behavior is undefined. A `#include`
  preprocessing directive causes the named header or source file to be
  processed from phase 1 through phase 4, recursively. All
  preprocessing directives are then deleted.
- 5. Each source character set member in a character literal or a string
- literal, as well as each escape sequence and
- *universal-character-name* in a character literal or a non-raw
  string literal, is converted to the corresponding member of the
  execution character set ([[lex.ccon]], [[lex.string]]); if there is
  no corresponding member, it is converted to an
  *implementation-defined* member other than the null (wide)
  character.[^3]
  6. Adjacent string literal tokens are concatenated.
  7. White-space characters separating tokens are no longer significant.
- Each preprocessing token is converted into a token ([[lex.token]]).
  The resulting tokens are syntactically and semantically analyzed and
  translated as a translation unit. \[*Note 1*: The process of
  analyzing and translating the tokens may occasionally result in one
- token being replaced by a sequence of other tokens (
- [[temp.names]]). — *end note*] \[*Note 2*: Source files,
- translation units and translated translation units need not
- necessarily be stored as files, nor need there be any one-to-one
- correspondence between these entities and any external
- representation. The description is conceptual only, and does not
- specify any particular implementation. *end note*]
  8. Translated translation units and instantiation units are combined as
  follows: \[*Note 3*: Some or all of these may be supplied from a
  library. — *end note*] Each translated translation unit is examined
  to produce a list of required instantiations. \[*Note 4*: This may
- include instantiations which have been explicitly requested (
- [[temp.explicit]]). — *end note*] The definitions of the required
  templates are located. It is *implementation-defined* whether the
  source of the translation units containing these definitions is
  required to be available. \[*Note 5*: An implementation could encode
  sufficient information into the translated translation unit so as to
  ensure the source is not required here. — *end note*] All the
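The translation phases excerpted in the hunks above can be observed in ordinary code. The sketch below is illustrative only (the names `greeting` and `greeting_length` are hypothetical): a backslash-newline splice joins two physical lines into one declaration (phase 2), a comment is replaced by a single space (phase 3), and two adjacent string literals are concatenated (phase 6). It assumes a C++11-or-later compiler and that no whitespace follows the backslash.

```cpp
#include <cassert>
#include <cstring>

// Phase 2: a backslash immediately followed by a new-line is deleted, so the
// next two physical lines form one logical line declaring `greeting`.
// Phase 6: the adjacent string literals become the single literal "Hello, world".
constexpr const char* gree\
ting = "Hello, " "world";

// Phase 3: the comment is replaced by one space, so this declares
// `int greeting_length()`.
int/**/greeting_length() {
    return static_cast<int>(std::strlen(greeting));
}

// 12 characters plus the terminating null character.
static_assert(sizeof("Hello, " "world") == 13, "concatenated in phase 6");
```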
@@ -138,22 +141,28 @@ hex-quad:
  universal-character-name:
  '\u' hex-quad
  '\U' hex-quad hex-quad
  ```
 
- The character designated by the *universal-character-name* `\UNNNNNNNN`
- is that character whose character short name in ISO/IEC 10646 is
- `NNNNNNNN`; the character designated by the *universal-character-name*
- `\uNNNN` is that character whose character short name in ISO/IEC 10646
- is `0000NNNN`. If the hexadecimal value for a *universal-character-name*
- corresponds to a surrogate code point (in the range 0xD800–0xDFFF,
- inclusive), the program is ill-formed. Additionally, if the hexadecimal
- value for a *universal-character-name* outside the *c-char-sequence*,
- *s-char-sequence*, or *r-char-sequence* of a character or string literal
- corresponds to a control character (in either of the ranges 0x00–0x1F or
- 0x7F–0x9F, both inclusive) or to a character in the basic source
- character set, the program is ill-formed.[^5]
 
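A compile-time check of the *universal-character-name* rule in this hunk, as a sketch: `\u00E9` and `\U000000E9` both designate U+00E9 (LATIN SMALL LETTER E WITH ACUTE). `char32_t` literals are used so that the value is the ISO/IEC 10646 code point regardless of the execution character set; the function name `e_acute` is hypothetical.

```cpp
#include <cassert>

// \uNNNN designates the character whose short name is 0000NNNN, so both
// spellings below designate U+00E9.
static_assert(U'\u00E9' == U'\U000000E9', "one character, two spellings");

// In a char32_t literal the value is the ISO/IEC 10646 code point.
static_assert(U'\u00E9' == 0xE9, "code point value");

char32_t e_acute() { return U'\U000000E9'; }
```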
  The *basic execution character set* and the *basic execution
  wide-character set* shall each contain all the members of the basic
  source character set, plus control characters representing alert,
  backspace, and carriage return, plus a *null character* (respectively,
@@ -171,40 +180,45 @@ members are locale-specific.
  ## Preprocessing tokens <a id="lex.pptoken">[[lex.pptoken]]</a>
 
  ``` bnf
  preprocessing-token:
  header-name
  identifier
  pp-number
  character-literal
  user-defined-character-literal
  string-literal
  user-defined-string-literal
  preprocessing-op-or-punc
  each non-white-space character that cannot be one of the above
  ```
 
- Each preprocessing token that is converted to a token ([[lex.token]])
- shall have the lexical form of a keyword, an identifier, a literal, an
- operator, or a punctuator.
 
  A preprocessing token is the minimal lexical element of the language in
  translation phases 3 through 6. The categories of preprocessing token
- are: header names, identifiers, preprocessing numbers, character
  literals (including user-defined character literals), string literals
  (including user-defined string literals), preprocessing operators and
  punctuators, and single non-white-space characters that do not lexically
  match the other preprocessing token categories. If a `'` or a `"`
  character matches the last category, the behavior is undefined.
  Preprocessing tokens can be separated by white space; this consists of
- comments ([[lex.comment]]), or white-space characters (space,
- horizontal tab, new-line, vertical tab, and form-feed), or both. As
- described in Clause [[cpp]], in certain circumstances during
- translation phase 4, white space (or the absence thereof) serves as more
- than preprocessing token separation. White space can appear within a
- preprocessing token only as part of a header name or between the
- quotation characters in a character literal or string literal.
 
  If the input stream has been parsed into preprocessing tokens up to a
  given character:
 
  - If the next character begins a sequence of characters that could be
@@ -224,29 +238,39 @@ given character:
  preprocessing token by itself and not as the first character of the
  alternative token `<:`.
  - Otherwise, the next preprocessing token is the longest sequence of
  characters that could constitute a preprocessing token, even if that
  would cause further lexical analysis to fail, except that a
- *header-name* ([[lex.header]]) is only formed within a `#include`
- directive ([[cpp.include]]).
 
  [*Example 1*:
 
  ``` cpp
  #define R "x"
  const char* s = R"y"; // ill-formed raw string, not "x" "y"
  ```
 
  — *end example*]
  [*Example 2*: The program fragment `0xe+foo` is parsed as a
- preprocessing number token (one that is not a valid floating or integer
- literal token), even though a parse as three preprocessing tokens `0xe`,
- `+`, and `foo` might produce a valid expression (for example, if `foo`
- were a macro defined as `1`). Similarly, the program fragment `1E1` is
- parsed as a preprocessing number (one that is a valid floating literal
- token), whether or not `E` is a macro name. *end example*]
 
  [*Example 3*: The program fragment `x+++++y` is parsed as `x
  ++ ++ + y`, which, if `x` and `y` have integral types, violates a
  constraint on increment operators, even though the parse `x ++ + ++ y`
  might yield a correct expression. — *end example*]
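A sketch of the longest-match rule and its `<::` exception (function names are hypothetical, C++11 or later assumed). `a+++b` is split as `a ++ + b`, never `a + ++b`; and in `std::vector<::std::string>` the character after `<::` is neither `:` nor `>`, so `<` is lexed on its own instead of starting the alternative token `<:`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Longest match: `a+++b` is tokenized as `a ++ + b`, so `a` is
// post-incremented and its old value is added to `b`.
int munch(int& a, int b) {
    return a+++b;
}

// `<::` followed by `s`: `<` is a token by itself, so this is
// std::vector<::std::string>, not the digraph `<:` (which would mean `[`).
std::vector<::std::string> make_names() {
    return {"x", "y"};
}
```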
@@ -256,22 +280,20 @@ might yield a correct expression. — *end example*]
  Alternative token representations are provided for some operators and
  punctuators.[^6]
 
  In all respects of the language, each alternative token behaves the
  same, respectively, as its primary token, except for its spelling.[^7]
- The set of alternative tokens is defined in Table
- [[tab:alternative.tokens]].
 
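To show that an alternative token differs from its primary token only in spelling, this sketch writes `#`, `{`, `}`, `[` and `]` as the digraphs `%:`, `<%`, `%>`, `<:` and `:>`. It is a lexical curiosity rather than recommended style, and `sum_digraph_array` is a hypothetical name; the authoritative list of alternative tokens is the table referenced above.

```cpp
%:include <cassert>

// `%:` is `#`, `<%` `%>` are `{` `}`, and `<:` `:>` are `[` `]`.
int sum_digraph_array() <%
    int values<:3:> = <%1, 2, 3%>;
    int total = 0;
    for (int i = 0; i < 3; ++i) <%
        total += values<:i:>;
    %>
    return total;
%>
```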
  ## Tokens <a id="lex.token">[[lex.token]]</a>
 
  ``` bnf
  token:
  identifier
  keyword
  literal
- operator
- punctuator
  ```
 
  There are five kinds of tokens: identifiers, keywords, literals,[^8]
  operators, and other separators. Blanks, horizontal and vertical tabs,
  newlines, formfeeds, and comments (collectively, “white space”), as
@@ -324,11 +346,12 @@ q-char-sequence:
  q-char:
  any member of the source character set except new-line and '"'
  ```
 
  [*Note 1*: Header name preprocessing tokens only appear within a
- `#include` preprocessing directive (see
  [[lex.pptoken]]). — *end note*]
 
  The sequences in both forms of *header-name*s are mapped in an
  *implementation-defined* manner to headers or to external source file
  names as specified in [[cpp.include]].
@@ -354,16 +377,17 @@ pp-number:
  pp-number 'p' sign
  pp-number 'P' sign
  pp-number '.'
  ```
 
- Preprocessing number tokens lexically include all integer literal
- tokens ([[lex.icon]]) and all floating literal tokens ([[lex.fcon]]).
 
  A preprocessing number does not have a type or a value; it acquires both
- after a successful conversion to an integer literal token or a floating
- literal token.
 
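The pp-number grammar explains why spacing matters when a sign follows `e`, `E`, `p` or `P`. In this sketch (names hypothetical), `0xe + 1` is three tokens, whereas `0xe+1` would be lexed as one pp-number that converts to neither an integer nor a floating literal; `1E1` is a single pp-number that converts to the floating literal `10.0`.

```cpp
#include <cassert>

// With spaces this is three tokens: 0xe (fourteen), +, and 1.
// Without spaces, `0xe+1` would be one pp-number that is not a valid
// integer or floating literal, making the program ill-formed.
constexpr int hex_plus_one = 0xe + 1;
static_assert(hex_plus_one == 15, "0xe is fourteen");

// `1E1` is one pp-number that converts to a floating literal equal to 10.0.
constexpr double one_e_one = 1E1;
static_assert(one_e_one == 10.0, "1E1 means 1 times 10 to the power 1");
```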
  ## Identifiers <a id="lex.name">[[lex.name]]</a>
 
  ``` bnf
  identifier:
@@ -391,18 +415,17 @@ digit: one of
  '0 1 2 3 4 5 6 7 8 9'
  ```
 
  An identifier is an arbitrarily long sequence of letters and digits.
  Each *universal-character-name* in an identifier shall designate a
- character whose encoding in ISO 10646 falls into one of the ranges
- specified in Table [[tab:charname.allowed]]. The initial element shall
- not be a *universal-character-name* designating a character whose
- encoding falls into one of the ranges specified in Table
- [[tab:charname.disallowed]]. Upper- and lower-case letters are
- different. All characters are significant.[^10]
 
- **Table: Ranges of characters allowed** <a id="tab:charname.allowed">[tab:charname.allowed]</a>
 
  | | | | | |
  | ------------- | ------------- | ------------- | ------------- | ------------- |
  | `00A8` | `00AA` | `00AD` | `00AF` | `00B2-00B5` |
  | `00B7-00BA` | `00BC-00BE` | `00C0-00D6` | `00D8-00F6` | `00F8-00FF` |
@@ -414,30 +437,23 @@ different. All characters are significant.[^10]
  | `10000-1FFFD` | `20000-2FFFD` | `30000-3FFFD` | `40000-4FFFD` | `50000-5FFFD` |
  | `60000-6FFFD` | `70000-7FFFD` | `80000-8FFFD` | `90000-9FFFD` | `A0000-AFFFD` |
  | `B0000-BFFFD` | `C0000-CFFFD` | `D0000-DFFFD` | `E0000-EFFFD` | |
 
 
- **Table: Ranges of characters disallowed initially (combining characters)** <a id="tab:charname.disallowed">[tab:charname.disallowed]</a>
 
  | | | | |
  | ----------- | ---------------------------------------------- | ----------- | ----------- |
  | `0300-036F` | `1DC0-1DFF` | `20D0-20FF` | `FE20-FE2F` |
 
 
- The identifiers in Table [[tab:identifiers.special]] have a special
- meaning when appearing in a certain context. When referred to in the
- grammar, these identifiers are used explicitly rather than using the
- *identifier* grammar production. Unless otherwise specified, any
- ambiguity as to whether a given *identifier* has a special meaning is
- resolved to interpret the token as a regular *identifier*.
-
- **Table: Identifiers with special meaning** <a id="tab:identifiers.special">[tab:identifiers.special]</a>
-
- | | |
- | ---------- | ------- |
- | `override` | `final` |
-
 
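A sketch of the identifier rules above (all names hypothetical). It assumes a C++11-or-later compiler that accepts *universal-character-name*s in identifiers, which not every toolchain does by default: both UCN spellings name the same variable, case is significant, and `final` and `override` have special meaning only where the grammar names them, so elsewhere they are ordinary identifiers.

```cpp
#include <cassert>

// U+00E9 is in the allowed ranges; \u00E9 and \U000000E9 designate the
// same character, so both spellings name this one variable.
int caf\u00E9 = 1;
int read_cafe() { return caf\U000000E9; }

// Upper- and lower-case letters are different: two distinct variables.
int value = 2;
int Value = 3;

struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 0; }
};

// Here `final` and `override` have their special meaning...
struct Derived final : Base {
    int id() const override { return 1; }
};

// ...while in other contexts they are ordinary identifiers.
int final = 4;
int override = 5;
```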
  In addition, some identifiers are reserved for use by C++
  implementations and shall not be used otherwise; no diagnostic is
  required.
 
@@ -447,58 +463,69 @@ required.
  - Each identifier that begins with an underscore is reserved to the
  implementation for use as a name in the global namespace.
 
  ## Keywords <a id="lex.key">[[lex.key]]</a>
 
- The identifiers shown in Table [[tab:keywords]] are reserved for use as
- keywords (that is, they are unconditionally treated as keywords in phase
- 7) except in an *attribute-token* ([[dcl.attr.grammar]]):
 
- **Table: Keywords** <a id="tab:keywords">[tab:keywords]</a>
 
- | | | | | |
- | ------------ | -------------- | ----------- | ------------------ | ---------- |
- | `alignas` | `continue` | `friend` | `register` | `true` |
- | `alignof` | `decltype` | `goto` | `reinterpret_cast` | `try` |
- | `asm` | `default` | `if` | `return` | `typedef` |
- | `auto` | `delete` | `inline` | `short` | `typeid` |
- | `bool` | `do` | `int` | `signed` | `typename` |
- | `break` | `double` | `long` | `sizeof` | `union` |
- | `case` | `dynamic_cast` | `mutable` | `static` | `unsigned` |
- | `catch` | `else` | `namespace` | `static_assert` | `using` |
- | `char` | `enum` | `new` | `static_cast` | `virtual` |
- | `char16_t` | `explicit` | `noexcept` | `struct` | `void` |
- | `char32_t` | `export` | `nullptr` | `switch` | `volatile` |
- | `class` | `extern` | `operator` | `template` | `wchar_t` |
- | `const` | `false` | `private` | `this` | `while` |
- | `constexpr` | `float` | `protected` | `thread_local` | |
- | `const_cast` | `for` | `public` | `throw` | |
 
 
- [*Note 1*: The `export` and `register` keywords are unused but are
- reserved for future use. — *end note*]
-
- Furthermore, the alternative representations shown in Table
- [[tab:alternative.representations]] for certain operators and
- punctuators ([[lex.digraph]]) are reserved and shall not be used
- otherwise:
-
- **Table: Alternative representations** <a id="tab:alternative.representations">[tab:alternative.representations]</a>
 
  | | | | | | |
  | -------- | -------- | -------- | ------- | -------- | ----- |
  | `and` | `and_eq` | `bitand` | `bitor` | `compl` | `not` |
  | `not_eq` | `or` | `or_eq` | `xor` | `xor_eq` | |
 
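Because the names in the table above are reserved alternative representations in C++, they work as operators without any header (in C they are macros from `<iso646.h>`). A sketch with hypothetical function names; each use is annotated with the primary token it stands for.

```cpp
#include <cassert>

// Each alternative representation behaves exactly like its primary token.
bool both(bool p, bool q) { return p and q; }       // &&
bool either(bool p, bool q) { return p or q; }      // ||
bool negate(bool p) { return not p; }               // !
bool differ(bool p, bool q) { return p not_eq q; }  // !=

unsigned mask_ops(unsigned x) {
    unsigned y = x bitand 0xF0u;  // &
    y = y bitor 0x01u;            // |
    y xor_eq 0x03u;               // ^=
    return compl compl y;         // ~~y, which equals y
}
```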
  ## Operators and punctuators <a id="lex.operators">[[lex.operators]]</a>
 
  The lexical representation of C++ programs includes a number of
- preprocessing tokens which are used in the syntax of the preprocessor or
  are converted into tokens for operators and punctuators:
 
- Each *preprocessing-op-or-punc* is converted to a single token in
- translation phase 7 ([[lex.phases]]).
 
  ## Literals <a id="lex.literal">[[lex.literal]]</a>
 
  ### Kinds of literals <a id="lex.literal.kinds">[[lex.literal.kinds]]</a>
 
@@ -506,11 +533,11 @@ There are several kinds of literals.[^11]
 
  ``` bnf
  literal:
  integer-literal
  character-literal
- floating-literal
  string-literal
  boolean-literal
  pointer-literal
  user-defined-literal
  ```
@@ -548,13 +575,12 @@ decimal-literal:
  hexadecimal-literal:
  hexadecimal-prefix hexadecimal-digit-sequence
  ```
 
  ``` bnf
- binary-digit:
- '0'
- '1'
  ```
 
  ``` bnf
  octal-digit: one of
  '0 1 2 3 4 5 6 7'
@@ -604,38 +630,44 @@ long-suffix: one of
  ``` bnf
  long-long-suffix: one of
  'll LL'
  ```
 
- An *integer literal* is a sequence of digits that has no period or
- exponent part, with optional separating single quotes that are ignored
- when determining its value. An integer literal may have a prefix that
- specifies its base and a suffix that specifies its type. The lexically
- first digit of the sequence of digits is the most significant. A *binary
- integer literal* (base two) begins with `0b` or `0B` and consists of a
- sequence of binary digits. An *octal integer literal* (base eight)
- begins with the digit `0` and consists of a sequence of octal
- digits.[^12] A *decimal integer literal* (base ten) begins with a digit
- other than `0` and consists of a sequence of decimal digits. A
- *hexadecimal integer literal* (base sixteen) begins with `0x` or `0X`
- and consists of a sequence of hexadecimal digits, which include the
- decimal digits and the letters `a` through `f` and `A` through `F` with
  decimal values ten through fifteen.
 
  [*Example 1*: The number twelve can be written `12`, `014`, `0XC`, or
- `0b1100`. The integer literals `1048576`, `1'048'576`, `0X100000`,
  `0x10'0000`, and `0'004'000'000` all have the same
  value. — *end example*]
 
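Example 1 can be restated as compile-time checks. This sketch assumes C++14 or later, which is when binary literals and digit separators were added; `twelve_in_binary` is a hypothetical name.

```cpp
#include <cassert>

// Twelve in decimal, octal (leading 0), hexadecimal (0X) and binary (0b).
static_assert(12 == 014 && 12 == 0XC && 12 == 0b1100, "twelve");

// Digit separators are ignored when determining the value.
static_assert(1048576 == 1'048'576, "decimal");
static_assert(1048576 == 0X100000 && 1048576 == 0x10'0000, "hexadecimal");
static_assert(1048576 == 0'004'000'000, "octal");

constexpr int twelve_in_binary = 0b1100;
```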
- The type of an integer literal is the first of the corresponding list in
- Table [[tab:lex.type.integer.literal]] in which its value can be
- represented.
 
- **Table: Types of integer literals** <a id="tab:lex.type.integer.literal">[tab:lex.type.integer.literal]</a>
 
- | | | |
- | ---------------- | ------------------------ | ------------------------ |
  | none | `int` | `int` |
  | | `long int` | `unsigned int` |
  | | `long long int` | `long int` |
  | | | `unsigned long int` |
  | | | `long long int` |
@@ -653,20 +685,20 @@ represented.
  | | | `unsigned long long int` |
  | Both `u` or `U` | `unsigned long long int` | `unsigned long long int` |
  | and `ll` or `LL` | | |
 
 
- If an integer literal cannot be represented by any type in its list and
- an extended integer type ([[basic.fundamental]]) can represent its
  value, it may have that extended integer type. If all of the types in
- the list for the integer literal are signed, the extended integer type
- shall be signed. If all of the types in the list for the integer literal
- are unsigned, the extended integer type shall be unsigned. If the list
- contains both signed and unsigned types, the extended integer type may
- be signed or unsigned. A program is ill-formed if one of its translation
- units contains an integer literal that cannot be represented by any of
- the allowed types.
 
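Which type an integer literal receives follows the table above: the first entry in the list for its suffix and base that can represent the value. The checks below depend on type widths, so the sketch assumes a 32-bit `int` (true on common LP64 and LLP64 platforms) and guards that assumption with a `static_assert`; `hex_literal_is_unsigned` is a hypothetical name.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Assumption: a 32-bit int; the checks below depend on it.
static_assert(INT_MAX == 2147483647, "this sketch assumes a 32-bit int");

// No suffix and the value fits: int. Suffixes change the list.
static_assert(std::is_same<decltype(42), int>::value, "no suffix");
static_assert(std::is_same<decltype(42u), unsigned int>::value, "u");
static_assert(std::is_same<decltype(42L), long>::value, "L");
static_assert(std::is_same<decltype(42ULL), unsigned long long>::value, "ULL");

// Too big for int: a hexadecimal literal may continue to unsigned int...
static_assert(std::is_same<decltype(0xFFFFFFFF), unsigned int>::value, "hex");

// ...but an unsuffixed decimal literal's list has only signed types, so this
// one is long or long long depending on the platform, never unsigned.
static_assert(std::is_signed<decltype(4294967295)>::value, "decimal");

bool hex_literal_is_unsigned() {
    return std::is_unsigned<decltype(0xFFFFFFFF)>::value;
}
```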
  ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
 
  ``` bnf
  character-literal:
@@ -682,10 +714,17 @@ encoding-prefix: one of
  c-char-sequence:
  c-char
  c-char-sequence c-char
  ```
 
  ``` bnf
  escape-sequence:
  simple-escape-sequence
  octal-escape-sequence
  hexadecimal-escape-sequence
@@ -708,76 +747,80 @@ octal-escape-sequence:
  hexadecimal-escape-sequence:
  '\x' hexadecimal-digit
  hexadecimal-escape-sequence hexadecimal-digit
  ```
 
- A character literal is one or more characters enclosed in single quotes,
- as in `'x'`, optionally preceded by `u8`, `u`, `U`, or `L`, as in
- `u8'w'`, `u'x'`, `U'y'`, or `L'z'`, respectively.
-
- A character literal that does not begin with `u8`, `u`, `U`, or `L` is
  an *ordinary character literal*. An ordinary character literal that
  contains a single *c-char* representable in the execution character set
  has type `char`, with value equal to the numerical value of the encoding
  of the *c-char* in the execution character set. An ordinary character
- literal that contains more than one *c-char* is a *multicharacter
- literal*. A multicharacter literal, or an ordinary character literal
- containing a single *c-char* not representable in the execution
- character set, is conditionally-supported, has type `int`, and has an
- *implementation-defined* value.
-
- A character literal that begins with `u8`, such as `u8'w'`, is a
- character literal of type `char`, known as a *UTF-8 character literal*.
- The value of a UTF-8 character literal is equal to its ISO 10646 code
- point value, provided that the code point value is representable with a
- single UTF-8 code unit (that is, provided it is in the C0 Controls and
- Basic Latin Unicode block). If the value is not representable with a
- single UTF-8 code unit, the program is ill-formed. A UTF-8 character
- literal containing multiple *c-char*s is ill-formed.
-
- A character literal that begins with the letter `u`, such as `u'x'`, is
- a character literal of type `char16_t`. The value of a `char16_t`
- character literal containing a single *c-char* is equal to its ISO 10646
- code point value, provided that the code point is representable with a
- single 16-bit code unit. (That is, provided it is a basic multi-lingual
- plane code point.) If the value is not representable within 16 bits, the
- program is ill-formed. A `char16_t` character literal containing
- multiple *c-char*s is ill-formed.
-
- A character literal that begins with the letter `U`, such as `U'y'`, is
- a character literal of type `char32_t`. The value of a `char32_t`
- character literal containing a single *c-char* is equal to its ISO 10646
- code point value. A `char32_t` character literal containing multiple
  *c-char*s is ill-formed.
 
- A character literal that begins with the letter `L`, such as `L'z'`, is
- a *wide-character literal*. A wide-character literal has type
- `wchar_t`.[^13] The value of a wide-character literal containing a
  single *c-char* has value equal to the numerical value of the encoding
  of the *c-char* in the execution wide-character set, unless the *c-char*
  has no representation in the execution wide-character set, in which case
  the value is *implementation-defined*.
 
- [*Note 1*: The type `wchar_t` is able to represent all members of the
  execution wide-character set (see
  [[basic.fundamental]]). — *end note*]
 
  The value of a wide-character literal containing multiple *c-char*s is
  *implementation-defined*.
 
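The prefix-to-type mapping above can be spot-checked with `decltype`. The sketch deliberately omits `u8` character literals, whose type is `char` in the text shown here but `char8_t` from C++20 on, and multicharacter literals, which are conditionally-supported with an implementation-defined value. `supplementary_code_point` is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

// The encoding prefix selects the type of a single-c-char literal.
static_assert(std::is_same<decltype('x'), char>::value, "no prefix");
static_assert(std::is_same<decltype(u'x'), char16_t>::value, "u");
static_assert(std::is_same<decltype(U'y'), char32_t>::value, "U");
static_assert(std::is_same<decltype(L'z'), wchar_t>::value, "L");

// u and U literals hold the ISO 10646 code point. U+00E9 fits in one
// 16-bit code unit; U+1F600 does not, so it needs a U literal.
static_assert(u'\u00E9' == 0x00E9, "BMP code point");

char32_t supplementary_code_point() { return U'\U0001F600'; }
```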
  Certain non-graphic characters, the single quote `'`, the double quote
- `"`, the question mark `?`,[^14] and the backslash `\`, can be
- represented according to Table [[tab:escape.sequences]]. The double
- quote `"` and the question mark `?`, can be represented as themselves or
- by the escape sequences `\"` and `\?` respectively, but the single quote
- `'` and the backslash `\` shall be represented by the escape sequences
- `\'` and `\\` respectively. Escape sequences in which the character
- following the backslash is not listed in Table [[tab:escape.sequences]]
- are conditionally-supported, with *implementation-defined* semantics. An
- escape sequence specifies a single character.
 
- **Table: Escape sequences** <a id="tab:escape.sequences">[tab:escape.sequences]</a>
 
  | | | |
  | --------------- | -------------- | ------------------ |
  | new-line | NL(LF) | `\n` |
  | horizontal tab | HT | `\t` |
@@ -800,49 +843,49 @@ desired character. The escape `\xhhh` consists of the
  backslash followed by `x` followed by one or more hexadecimal digits
  that are taken to specify the value of the desired character. There is
  no limit to the number of digits in a hexadecimal sequence. A sequence
  of octal or hexadecimal digits is terminated by the first character that
  is not an octal digit or a hexadecimal digit, respectively. The value of
- a character literal is *implementation-defined* if it falls outside of
- the *implementation-defined* range defined for `char` (for character
- literals with no prefix) or `wchar_t` (for character literals prefixed
- by `L`).
 
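Because a hexadecimal escape consumes every hexadecimal digit that follows it, a common way to end one early is to split the literal and let phase 6 join the pieces again. A sketch (the variable name is hypothetical); the checks compare numeric values so they do not depend on the execution character set.

```cpp
#include <cassert>
#include <cstring>

// \101 (octal) and \x41 (hexadecimal) both specify the value 0x41.
static_assert('\101' == 0x41 && '\x41' == 0x41, "same value");

// An octal escape takes at most three digits, so "\1012" holds the
// character with value 0x41 followed by '2'.
static_assert(sizeof("\1012") == 3, "two characters plus the null");

// A hexadecimal escape has no digit limit: "\x41B" would be a single
// character with value 0x41B. Splitting the literal ends the escape, and
// phase 6 concatenates "\x41" and "B" afterwards.
const char* split_escape = "\x41" "B";
```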
- [*Note 2*: If the value of a character literal prefixed by `u`, `u8`,
  or `U` is outside the range defined for its type, the program is
  ill-formed. — *end note*]
 
  A *universal-character-name* is translated to the encoding, in the
  appropriate execution character set, of the character named. If there is
  no such encoding, the *universal-character-name* is translated to an
  *implementation-defined* encoding.
 
- [*Note 3*: In translation phase 1, a *universal-character-name* is
  introduced whenever an actual extended character is encountered in the
  source text. Therefore, all extended characters are described in terms
  of *universal-character-name*s. However, the actual compiler
  implementation may use its own native character set, so long as the same
  results are obtained. — *end note*]
 
- ### Floating literals <a id="lex.fcon">[[lex.fcon]]</a>
 
  ``` bnf
- floating-literal:
- decimal-floating-literal
- hexadecimal-floating-literal
  ```
 
  ``` bnf
- decimal-floating-literal:
- fractional-constant exponent-partₒₚₜ floating-suffixₒₚₜ
- digit-sequence exponent-part floating-suffixₒₚₜ
  ```
 
  ``` bnf
- hexadecimal-floating-literal:
- hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-suffixₒₚₜ
- hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-suffixₒₚₜ
  ```
 
  ``` bnf
  fractional-constant:
  digit-sequenceₒₚₜ '.' digit-sequence
@@ -877,50 +920,55 @@ digit-sequence:
  digit
  digit-sequence '''ₒₚₜ digit
  ```
 
  ``` bnf
- floating-suffix: one of
  'f l F L'
  ```
 
- A floating literal consists of an optional prefix specifying a base, an
- integer part, a radix point, a fraction part, an `e`, `E`, `p` or `P`,
- an optionally signed integer exponent, and an optional type suffix. The
- integer and fraction parts both consist of a sequence of decimal (base
- ten) digits if there is no prefix, or hexadecimal (base sixteen) digits
- if the prefix is `0x` or `0X`. The floating literal is a *decimal
- floating literal* in the former case and a *hexadecimal floating
- literal* in the latter case. Optional separating single quotes in a
- *digit-sequence* or *hexadecimal-digit-sequence* are ignored when
- determining its value.
-
- [*Example 1*: The floating literals `1.602'176'565e-19` and
- `1.602176565e-19` have the same value. — *end example*]
-
- Either the integer part or the fraction part (not both) can be omitted.
- Either the radix point or the letter `e` or `E` and the exponent (not
- both) can be omitted from a decimal floating literal. The radix point
- (but not the exponent) can be omitted from a hexadecimal floating
- literal. The integer part, the optional radix point, and the optional
- fraction part, form the *significand* of the floating literal. In a
- decimal floating literal, the exponent, if present, indicates the power
- of 10 by which the significand is to be scaled. In a hexadecimal
- floating literal, the exponent indicates the power of 2 by which the
- significand is to be scaled.
-
- [*Example 2*: The floating literals `49.625` and `0xC.68p+2` have the
- same value. *end example*]
-
- If the scaled value is in the range of representable values for its
- type, the result is the scaled value if representable, else the larger
- or smaller representable value nearest the scaled value, chosen in an
- *implementation-defined* manner. The type of a floating literal is
- `double` unless explicitly specified by a suffix. The suffixes `f` and
- `F` specify `float`, the suffixes `l` and `L` specify `long` `double`.
  If the scaled value is not in the range of representable values for its
- type, the program is ill-formed.
 
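A few compile-time checks of the floating-literal rules above (C++17 or later for the hexadecimal form; `hex_half` is a hypothetical name). The values were chosen to be exactly representable in binary, so comparing with `==` is safe: `0xC.68` is 12 + 6/16 + 8/256 = 12.40625, and scaling by 2^2 gives 49.625.

```cpp
#include <cassert>
#include <type_traits>

// Digit separators are ignored: both literals denote the same double.
static_assert(1.602'176'565e-19 == 1.602176565e-19, "separators ignored");

// Hexadecimal: significand 0xC.68 = 12.40625, scaled by 2 to the power 2.
static_assert(0xC.68p+2 == 49.625, "hexadecimal floating literal");

// Either the integer part or the fraction part may be omitted.
static_assert(1. == 1.0 && .5 == 0.5, "omitted parts");

// The type is double unless a suffix selects float or long double.
static_assert(std::is_same<decltype(1.0), double>::value, "no suffix");
static_assert(std::is_same<decltype(1.0f), float>::value, "f");
static_assert(std::is_same<decltype(1.0L), long double>::value, "L");

double hex_half() { return 0x1p-1; }  // 1 times 2 to the power -1
```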
  ### String literals <a id="lex.string">[[lex.string]]</a>
 
  ``` bnf
  string-literal:
@@ -932,10 +980,17 @@ string-literal:
  s-char-sequence:
  s-char
  s-char-sequence s-char
  ```
 
  ``` bnf
  raw-string:
  '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
  ```
 
@@ -943,21 +998,28 @@ raw-string:
943
  r-char-sequence:
944
  r-char
945
  r-char-sequence r-char
946
  ```
947
 
948
  ``` bnf
949
  d-char-sequence:
950
  d-char
951
  d-char-sequence d-char
952
  ```
953
 
954
- A *string-literal* is a sequence of characters (as defined in 
955
- [[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
956
- `u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
957
- `R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
958
- `U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
 
959
 
960
  A *string-literal* that has an `R` in the prefix is a *raw string
961
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
962
  *d-char-sequence* of a *raw-string* is the same sequence of characters
963
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
@@ -994,78 +1056,74 @@ a"
994
  ```
995
 
996
  is equivalent to `"\n)\\\na\"\n"`. The raw string
997
 
998
  ``` cpp
999
- R"(??)"
1000
  ```
1001
 
1002
- is equivalent to `"\?\?"`. The raw string
1003
-
1004
- ``` cpp
1005
- R"#(
1006
- )??="
1007
- )#"
1008
- ```
1009
-
1010
- is equivalent to `"\n)\?\?=\"\n"`.
1011
 
1012
  — *end example*]
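This is a small C++17 sketch of raw string literals. It uses `std::string_view` for the comparisons. Inside `R"(...)"`, backslashes and quotes are ordinary characters. A *d-char-sequence* lets the body contain `)"` without ending the literal. The delimiter `xyz` is an arbitrary choice.

```cpp
#include <string_view>

// Backslashes and quotes in a raw string are taken literally.
constexpr std::string_view raw = R"(C:\path\"file")";
constexpr std::string_view cooked = "C:\\path\\\"file\"";
static_assert(raw == cooked);

// The delimiter xyz means only )xyz" terminates, so )" can appear in the body.
constexpr std::string_view delimited = R"xyz(a)"b)xyz";
static_assert(delimited == "a)\"b");
```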
1013
 
1014
  After translation phase 6, a *string-literal* that does not begin with
1015
- an *encoding-prefix* is an *ordinary string literal*, and is initialized
1016
- with the given characters.
1017
 
1018
  A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
1019
- *UTF-8 string literal*.
1020
 
1021
  Ordinary string literals and UTF-8 string literals are also referred to
1022
- as narrow string literals. A narrow string literal has type “array of
1023
- *n* `const char`”, where *n* is the size of the string as defined below,
1024
- and has static storage duration ([[basic.stc]]).
1025
 
1026
- For a UTF-8 string literal, each successive element of the object
1027
- representation ([[basic.types]]) has the value of the corresponding
1028
- code unit of the UTF-8 encoding of the string.
1029
 
1030
- A *string-literal* that begins with `u`, such as `u"asdf"`, is a
1031
- `char16_t` string literal. A `char16_t` string literal has type “array
1032
- of *n* `const char16_t`”, where *n* is the size of the string as defined
1033
- below; it is initialized with the given characters. A single *c-char*
1034
- may produce more than one `char16_t` character in the form of surrogate
1035
- pairs.
1036
 
1037
- A *string-literal* that begins with `U`, such as `U"asdf"`, is a
1038
- `char32_t` string literal. A `char32_t` string literal has type “array
1039
- of *n* `const char32_t`”, where *n* is the size of the string as defined
1040
- below; it is initialized with the given characters.
 
1041
 
1042
  A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
1043
  string literal*. A wide string literal has type “array of *n* `const
1044
  wchar_t`”, where *n* is the size of the string as defined below; it is
1045
  initialized with the given characters.
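The types described above can be confirmed with `decltype`. This sketch assumes C++17. A string literal is an lvalue, so `decltype` yields a reference to the array type. `u8` literals are left out on purpose: C++20 changed their element type to `const char8_t`, so the result depends on the language mode.

```cpp
#include <type_traits>

// n counts the four characters plus the terminating null.
static_assert(std::is_same_v<decltype("asdf"), const char (&)[5]>);
static_assert(std::is_same_v<decltype(u"asdf"), const char16_t (&)[5]>);
static_assert(std::is_same_v<decltype(U"asdf"), const char32_t (&)[5]>);
static_assert(std::is_same_v<decltype(L"asdf"), const wchar_t (&)[5]>);
```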
1046
 
1047
- In translation phase 6 ([[lex.phases]]), adjacent *string-literal*s are
1048
  concatenated. If both *string-literal*s have the same *encoding-prefix*,
1049
- the resulting concatenated string literal has that *encoding-prefix*. If
1050
- one *string-literal* has no *encoding-prefix*, it is treated as a
1051
  *string-literal* of the same *encoding-prefix* as the other operand. If
1052
  a UTF-8 string literal token is adjacent to a wide string literal token,
1053
  the program is ill-formed. Any other concatenations are
1054
  conditionally-supported with *implementation-defined* behavior.
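This is a sketch of the concatenation rule, assuming C++17. An unprefixed literal adopts the prefix of its neighbor. The terminating null is appended only once, after concatenation. The ill-formed UTF-8/wide case appears only as a comment.

```cpp
#include <type_traits>

// The unprefixed operand takes the other operand's encoding-prefix.
static_assert(std::is_same_v<decltype("a" u"b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype(U"a" "b"), const char32_t (&)[3]>);

// Only one '\0' remains after "ab" "cd" becomes "abcd".
static_assert(sizeof("ab" "cd") == 5);

// u8"a" L"b" would be ill-formed: UTF-8 adjacent to wide.
```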
1055
 
1056
- [*Note 3*: This concatenation is an interpretation, not a conversion.
1057
  Because the interpretation happens in translation phase 6 (after each
1058
- character from a string literal has been translated into a value from
1059
  the appropriate character set), a *string-literal*’s initial rawness has
1060
  no effect on the interpretation or well-formedness of the
1061
  concatenation. — *end note*]
1062
 
1063
- Table [[tab:lex.string.concat]] has some examples of valid
1064
- concatenations.
1065
 
1066
- **Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
1067
 
1068
  | | | | | | |
1069
  | -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
1070
  | *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means | *[spans 2 columns]* Source | Means |
1071
  | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
@@ -1084,46 +1142,49 @@ Characters in concatenated strings are kept distinct.
1084
  contains the two characters `'\xA'` and `'B'` after concatenation (and
1085
  not the single hexadecimal character `'\xAB'`).
1086
 
1087
  — *end example*]
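The two-character result can be observed directly. `two_chars` is only an illustrative name.

```cpp
// Escape sequences are interpreted (phase 5) before concatenation (phase 6),
// so "\xA" "B" holds '\xA' followed by 'B', not the single character '\xAB'.
constexpr char two_chars[] = "\xA" "B";
static_assert(sizeof(two_chars) == 3);  // '\xA', 'B', '\0'
static_assert(two_chars[0] == '\xA' && two_chars[1] == 'B' && two_chars[2] == '\0');
```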
1088
 
1089
- After any necessary concatenation, in translation phase 7
1090
- ([[lex.phases]]), `'\0'` is appended to every string literal so that
1091
  programs that scan a string can find its end.
1092
 
1093
  Escape sequences and *universal-character-name*s in non-raw string
1094
- literals have the same meaning as in character literals ([[lex.ccon]]),
1095
  except that the single quote `'` is representable either by itself or by
1096
  the escape sequence `\'`, and the double quote `"` shall be preceded by
1097
- a `\`, and except that a *universal-character-name* in a `char16_t`
1098
- string literal may yield a surrogate pair. In a narrow string literal, a
1099
- *universal-character-name* may map to more than one `char` element due
1100
- to *multibyte encoding*. The size of a `char32_t` or wide string literal
1101
- is the total number of escape sequences, *universal-character-name*s,
1102
- and other characters, plus one for the terminating `U'\0'` or `L'\0'`.
1103
- The size of a `char16_t` string literal is the total number of escape
1104
- sequences, *universal-character-name*s, and other characters, plus one
1105
- for each character requiring a surrogate pair, plus one for the
1106
- terminating `u'\0'`.
1107
 
1108
- [*Note 4*: The size of a `char16_t` string literal is the number of
1109
  code units, not the number of characters. — *end note*]
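This sketch illustrates the size rules with two characters. U+1F600 lies outside the Basic Multilingual Plane, so it needs a surrogate pair in UTF-16. U+00E9 needs two bytes in UTF-8. The `u8` check uses only `sizeof`, so it holds whether the element type is `char` (C++17) or `char8_t` (C++20).

```cpp
// char32_t: one code unit plus the terminator.
static_assert(sizeof(U"\U0001F600") / sizeof(char32_t) == 2);

// char16_t: a surrogate pair (two code units) plus the terminator.
static_assert(sizeof(u"\U0001F600") / sizeof(char16_t) == 3);
static_assert(u"\U0001F600"[0] == 0xD83D && u"\U0001F600"[1] == 0xDE00);

// UTF-8: U+00E9 encodes as 0xC3 0xA9, plus the terminator.
static_assert(sizeof(u8"\u00E9") == 3);
```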
1110
 
1111
- Within `char32_t` and `char16_t` string literals, any
1112
- *universal-character-name*s shall be within the range `0x0` to
1113
- `0x10FFFF`. The size of a narrow string literal is the total number of
1114
- escape sequences and other characters, plus at least one for the
1115
- multibyte encoding of each *universal-character-name*, plus one for the
1116
  terminating `'\0'`.
1117
 
1118
  Evaluating a *string-literal* results in a string literal object with
1119
  static storage duration, initialized from the given characters as
1120
- specified above. Whether all string literals are distinct (that is, are
1121
- stored in nonoverlapping objects) and whether successive evaluations of
1122
- a *string-literal* yield the same or a different object is unspecified.
 
1123
 
1124
- [*Note 5*: The effect of attempting to modify a string literal is
1125
  undefined. — *end note*]
1126
 
1127
  ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
1128
 
1129
  ``` bnf
@@ -1144,21 +1205,21 @@ pointer-literal:
1144
 
1145
  The pointer literal is the keyword `nullptr`. It is a prvalue of type
1146
  `std::nullptr_t`.
1147
 
1148
  [*Note 1*: `std::nullptr_t` is a distinct type that is neither a
1149
- pointer type nor a pointer to member type; rather, a prvalue of this
1150
  type is a null pointer constant and can be converted to a null pointer
1151
  value or null member pointer value. See [[conv.ptr]] and
1152
  [[conv.mem]]. — *end note*]
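This is a compile-time sketch of the note. It assumes C++17 `<type_traits>` variable templates. `S` and the variable names are illustrative.

```cpp
#include <cstddef>
#include <type_traits>

struct S { int member; };

// std::nullptr_t is neither a pointer type nor a pointer-to-member type...
static_assert(!std::is_pointer_v<std::nullptr_t>);
static_assert(!std::is_member_pointer_v<std::nullptr_t>);
static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);

// ...yet nullptr converts to both a null pointer and a null member pointer.
constexpr int* null_int_ptr = nullptr;
constexpr int S::*null_member_ptr = nullptr;
static_assert(null_int_ptr == nullptr && null_member_ptr == nullptr);
```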
1153
 
1154
  ### User-defined literals <a id="lex.ext">[[lex.ext]]</a>
1155
 
1156
  ``` bnf
1157
  user-defined-literal:
1158
  user-defined-integer-literal
1159
- user-defined-floating-literal
1160
  user-defined-string-literal
1161
  user-defined-character-literal
1162
  ```
1163
 
1164
  ``` bnf
@@ -1168,11 +1229,11 @@ user-defined-integer-literal:
1168
  hexadecimal-literal ud-suffix
1169
  binary-literal ud-suffix
1170
  ```
1171
 
1172
  ``` bnf
1173
- user-defined-floating-literal:
1174
  fractional-constant exponent-partₒₚₜ ud-suffix
1175
  digit-sequence exponent-part ud-suffix
1176
  hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
1177
  hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
1178
  ```
@@ -1206,65 +1267,65 @@ is a *user-defined-literal*, but `12LL` is an *integer-literal*.
1206
  The syntactic non-terminal preceding the *ud-suffix* in a
1207
  *user-defined-literal* is taken to be the longest sequence of characters
1208
  that could match that non-terminal.
1209
 
1210
  A *user-defined-literal* is treated as a call to a literal operator or
1211
- literal operator template ([[over.literal]]). To determine the form of
1212
  this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
1213
  the *literal-operator-id* whose literal suffix identifier is *X* is
1214
  looked up in the context of *L* using the rules for unqualified name
1215
- lookup ([[basic.lookup.unqual]]). Let *S* be the set of declarations
1216
- found by this lookup. *S* shall not be empty.
1217
 
1218
  If *L* is a *user-defined-integer-literal*, let *n* be the literal
1219
  without its *ud-suffix*. If *S* contains a literal operator with
1220
  parameter type `unsigned long long`, the literal *L* is treated as a
1221
  call of the form
1222
 
1223
  ``` cpp
1224
  operator "" X(nULL)
1225
  ```
1226
 
1227
- Otherwise, *S* shall contain a raw literal operator or a literal
1228
- operator template ([[over.literal]]) but not both. If *S* contains a
1229
- raw literal operator, the literal *L* is treated as a call of the form
1230
 
1231
  ``` cpp
1232
  operator "" X("n{"})
1233
  ```
1234
 
1235
- Otherwise (*S* contains a literal operator template), *L* is treated as
1236
- a call of the form
1237
 
1238
  ``` cpp
1239
  operator "" X<'c₁', 'c₂', ... 'cₖ'>()
1240
  ```
1241
 
1242
  where *n* is the source character sequence c₁c₂...cₖ.
1243
 
1244
  [*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
1245
  basic source character set. — *end note*]
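The three forms for integer literals can be sketched as follows. The suffixes `_kib`, `_len`, and `_count` are hypothetical names. The sketch uses the no-space spelling `operator""_X` because the spaced form shown above is deprecated as of C++23. Each suffix has exactly one kind of literal operator, so the lookup set *S* selects a single form.

```cpp
#include <cstddef>

// Cooked form: the parameter type unsigned long long receives the value.
constexpr unsigned long long operator""_kib(unsigned long long n) { return n * 1024; }

// Raw form: receives the source spelling as a null-terminated string.
constexpr std::size_t operator""_len(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Template form: receives the spelling as a pack of characters.
template <char... Cs>
constexpr std::size_t operator""_count() { return sizeof...(Cs); }

static_assert(4_kib == 4096);
static_assert(0x1F_len == 4);  // treated as operator""_len("0x1F")
static_assert(123_count == 3); // treated as operator""_count<'1', '2', '3'>()
```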
1246
 
1247
- If *L* is a *user-defined-floating-literal*, let *f* be the literal
1248
- without its *ud-suffix*. If *S* contains a literal operator with
1249
  parameter type `long double`, the literal *L* is treated as a call of
1250
  the form
1251
 
1252
  ``` cpp
1253
  operator "" X(fL)
1254
  ```
1255
 
1256
- Otherwise, *S* shall contain a raw literal operator or a literal
1257
- operator template ([[over.literal]]) but not both. If *S* contains a
1258
- raw literal operator, the *literal* *L* is treated as a call of the form
1259
 
1260
  ``` cpp
1261
  operator "" X("f{"})
1262
  ```
1263
 
1264
- Otherwise (*S* contains a literal operator template), *L* is treated as
1265
- a call of the form
1266
 
1267
  ``` cpp
1268
  operator "" X<'c₁', 'c₂', ... 'cₖ'>()
1269
  ```
1270
 
@@ -1273,20 +1334,28 @@ where *f* is the source character sequence c₁c₂...cₖ.
1273
  [*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
1274
  basic source character set. — *end note*]
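The floating case follows the same pattern. `_deg` and `_digits` are hypothetical suffixes. The raw form here simply counts the decimal digits in the spelling it receives.

```cpp
// Cooked form: the parameter type long double receives the value.
constexpr long double operator""_deg(long double deg) {
    return deg * 3.14159265358979323846L / 180.0L;
}

// Raw form: receives the spelling of the floating literal.
constexpr int operator""_digits(const char* s) {
    int n = 0;
    for (; *s != '\0'; ++s)
        if (*s >= '0' && *s <= '9') ++n;
    return n;
}

static_assert(180.0_deg > 3.14L && 180.0_deg < 3.15L);
static_assert(1.25e3_digits == 4);  // treated as operator""_digits("1.25e3")
```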
1275
 
1276
  If *L* is a *user-defined-string-literal*, let *str* be the literal
1277
  without its *ud-suffix* and let *len* be the number of code units in
1278
- *str* (i.e., its length excluding the terminating null character). The
1279
  literal *L* is treated as a call of the form
1280
 
1281
  ``` cpp
1282
  operator "" X(str, len)
1283
  ```
1284
 
1285
  If *L* is a *user-defined-character-literal*, let *ch* be the literal
1286
- without its *ud-suffix*. *S* shall contain a literal operator
1287
- ([[over.literal]]) whose only parameter has the type of *ch* and the
1288
  literal *L* is treated as a call of the form
1289
 
1290
  ``` cpp
1291
  operator "" X(ch)
1292
  ```
@@ -1305,16 +1374,16 @@ int main() {
1305
  }
1306
  ```
1307
 
1308
  — *end example*]
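This is a sketch of the string and character forms, assuming C++17 `std::string_view`. The suffix names are hypothetical. `len` counts code units, so a surrogate pair contributes two.

```cpp
#include <cstddef>
#include <string_view>

// String form: str points to the characters; len excludes the terminating null.
constexpr std::string_view operator""_view(const char* str, std::size_t len) {
    return std::string_view(str, len);
}
constexpr std::size_t operator""_units(const char16_t*, std::size_t len) {
    return len;
}

// Character form: the only parameter has the type of the character literal.
constexpr char32_t operator""_widen(char16_t ch) { return ch; }

static_assert("abc"_view.size() == 3);
static_assert(u"\U0001F600"_units == 2);  // one surrogate pair, two code units
static_assert(u'x'_widen == U'x');
```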
1309
 
1310
- In translation phase 6 ([[lex.phases]]), adjacent string literals are
1311
- concatenated and *user-defined-string-literal*s are considered string
1312
- literals for that purpose. During concatenation, *ud-suffix*es are
1313
- removed and ignored and the concatenation process occurs as described
1314
- in [[lex.string]]. At the end of phase 6, if a string literal is the
1315
- result of a concatenation involving at least one
1316
  *user-defined-string-literal*, all the participating
1317
  *user-defined-string-literal*s shall have the same *ud-suffix* and that
1318
  suffix is applied to the result of the concatenation.
1319
 
1320
  [*Example 3*:
@@ -1332,51 +1401,55 @@ int main() {
1332
  [basic.fundamental]: basic.md#basic.fundamental
1333
  [basic.link]: basic.md#basic.link
1334
  [basic.lookup.unqual]: basic.md#basic.lookup.unqual
1335
  [basic.stc]: basic.md#basic.stc
1336
  [basic.types]: basic.md#basic.types
1337
- [conv.mem]: conv.md#conv.mem
1338
- [conv.ptr]: conv.md#conv.ptr
1339
  [cpp]: cpp.md#cpp
1340
  [cpp.concat]: cpp.md#cpp.concat
1341
  [cpp.cond]: cpp.md#cpp.cond
 
1342
  [cpp.include]: cpp.md#cpp.include
 
1343
  [cpp.stringize]: cpp.md#cpp.stringize
1344
  [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
1345
  [headers]: library.md#headers
1346
  [lex]: #lex
1347
  [lex.bool]: #lex.bool
1348
  [lex.ccon]: #lex.ccon
 
1349
  [lex.charset]: #lex.charset
1350
  [lex.comment]: #lex.comment
1351
  [lex.digraph]: #lex.digraph
1352
  [lex.ext]: #lex.ext
1353
  [lex.fcon]: #lex.fcon
 
1354
  [lex.header]: #lex.header
1355
  [lex.icon]: #lex.icon
1356
  [lex.key]: #lex.key
 
1357
  [lex.literal]: #lex.literal
1358
  [lex.literal.kinds]: #lex.literal.kinds
1359
  [lex.name]: #lex.name
1360
  [lex.nullptr]: #lex.nullptr
1361
  [lex.operators]: #lex.operators
1362
  [lex.phases]: #lex.phases
1363
  [lex.ppnumber]: #lex.ppnumber
1364
  [lex.pptoken]: #lex.pptoken
1365
  [lex.separate]: #lex.separate
1366
  [lex.string]: #lex.string
 
1367
  [lex.token]: #lex.token
1368
  [over.literal]: over.md#over.literal
1369
- [tab:alternative.representations]: #tab:alternative.representations
1370
- [tab:alternative.tokens]: #tab:alternative.tokens
1371
- [tab:charname.allowed]: #tab:charname.allowed
1372
- [tab:charname.disallowed]: #tab:charname.disallowed
1373
- [tab:escape.sequences]: #tab:escape.sequences
1374
- [tab:identifiers.special]: #tab:identifiers.special
1375
- [tab:keywords]: #tab:keywords
1376
- [tab:lex.string.concat]: #tab:lex.string.concat
1377
- [tab:lex.type.integer.literal]: #tab:lex.type.integer.literal
1378
  [temp.explicit]: temp.md#temp.explicit
1379
  [temp.names]: temp.md#temp.names
1380
 
1381
  [^1]: Implementations must behave as if these separate phases occur,
1382
  although in practice different phases might be folded together.
@@ -1397,21 +1470,21 @@ int main() {
1397
  (described in translation phase 1) is specified as
1398
  *implementation-defined*, an implementation is required to document
1399
  how the basic source characters are represented in source files.
1400
 
1401
  [^5]: A sequence of characters resembling a *universal-character-name*
1402
- in an *r-char-sequence* ([[lex.string]]) does not form a
1403
  *universal-character-name*.
1404
 
1405
  [^6]: These include “digraphs” and additional reserved words. The term
1406
  “digraph” (token consisting of two characters) is not perfectly
1407
- descriptive, since one of the alternative preprocessing-tokens is
1408
  `%:%:` and of course several primary tokens contain two characters.
1409
  Nonetheless, those alternative tokens that aren’t lexical keywords
1410
  are colloquially known as “digraphs”.
1411
 
1412
- [^7]: Thus the “stringized” values ([[cpp.stringize]]) of `[` and `<:`
1413
  will be different, maintaining the source spelling, but the tokens
1414
  can otherwise be freely interchanged.
1415
 
1416
  [^8]: Literals include strings and character and numeric literals.
1417
 
@@ -1428,15 +1501,13 @@ int main() {
1428
  long external identifier, but C++ does not place a translation limit
1429
  on significant characters for external identifiers. In C++, upper-
1430
  and lower-case letters are considered different for all identifiers,
1431
  including external identifiers.
1432
 
1433
- [^11]: The term “literal” generally designates, in this International
1434
- Standard, those tokens that are called “constants” in ISO C.
1435
 
1436
- [^12]: The digits `8` and `9` are not octal digits.
1437
-
1438
- [^13]: They are intended for character sets where a character does not
1439
  fit into a single byte.
1440
 
1441
- [^14]: Using an escape sequence for a question mark is supported for
1442
  compatibility with ISO C++14 and ISO C.
 
1
  # Lexical conventions <a id="lex">[[lex]]</a>
2
 
3
  ## Separate translation <a id="lex.separate">[[lex.separate]]</a>
4
 
5
  The text of the program is kept in units called *source files* in this
6
+ document. A source file together with all the headers [[headers]] and
7
+ source files included [[cpp.include]] via the preprocessing directive
8
+ `#include`, less any source lines skipped by any of the conditional
9
+ inclusion [[cpp.cond]] preprocessing directives, is called a
10
+ *translation unit*.
11
 
12
  [*Note 1*: A C++ program need not all be translated at the same
13
  time. — *end note*]
14
 
15
  [*Note 2*: Previously translated translation units and instantiation
16
  units can be preserved individually or in libraries. The separate
17
+ translation units of a program communicate [[basic.link]] by (for
18
+ example) calls to functions whose identifiers have external or module
19
+ linkage, manipulation of objects whose identifiers have external or
20
+ module linkage, or manipulation of data files. Translation units can be
21
+ separately translated and then later linked to produce an executable
22
+ program [[basic.link]]. — *end note*]
23
 
24
  ## Phases of translation <a id="lex.phases">[[lex.phases]]</a>
25
 
26
  The precedence among the syntax rules of translation is specified by the
27
  following phases.[^1]
 
29
  1. Physical source file characters are mapped, in an
30
  *implementation-defined* manner, to the basic source character set
31
  (introducing new-line characters for end-of-line indicators) if
32
  necessary. The set of physical source file characters accepted is
33
  *implementation-defined*. Any source file character not in the basic
34
+ source character set [[lex.charset]] is replaced by the
35
  *universal-character-name* that designates that character. An
36
  implementation may use any internal encoding, so long as an actual
37
  extended character encountered in the source file, and the same
38
  extended character expressed in the source file as a
39
  *universal-character-name* (e.g., using the `\uXXXX` notation), are
40
+ handled equivalently except where this replacement is reverted
41
+ [[lex.pptoken]] in a raw string literal.
42
  2. Each instance of a backslash character (\\) immediately followed by a
43
  new-line character is deleted, splicing physical source lines to
44
  form logical source lines. Only the last backslash on any physical
45
  source line shall be eligible for being part of such a splice.
46
  Except for splices reverted in a raw string literal, if a splice
 
49
  that is not empty and that does not end in a new-line character, or
50
  that ends in a new-line character immediately preceded by a
51
  backslash character before any such splicing takes place, shall be
52
  processed as if an additional new-line character were appended to
53
  the file.
54
+ 3. The source file is decomposed into preprocessing tokens
55
+ [[lex.pptoken]] and sequences of white-space characters (including
56
  comments). A source file shall not end in a partial preprocessing
57
  token or in a partial comment.[^2] Each comment is replaced by one
58
  space character. New-line characters are retained. Whether each
59
  nonempty sequence of white-space characters other than new-line is
60
  retained or replaced by one space character is unspecified. The
61
  process of dividing a source file’s characters into preprocessing
62
+ tokens is context-dependent. \[*Example 1*: See the handling of `<`
63
  within a `#include` preprocessing directive. — *end example*]
64
  4. Preprocessing directives are executed, macro invocations are
65
  expanded, and `_Pragma` unary operator expressions are executed. If
66
  a character sequence that matches the syntax of a
67
+ *universal-character-name* is produced by token concatenation
68
+ [[cpp.concat]], the behavior is undefined. A `#include`
69
  preprocessing directive causes the named header or source file to be
70
  processed from phase 1 through phase 4, recursively. All
71
  preprocessing directives are then deleted.
72
+ 5. Each basic source character set member in a *character-literal* or a
73
+ *string-literal*, as well as each escape sequence and
74
+ *universal-character-name* in a *character-literal* or a non-raw
75
  string literal, is converted to the corresponding member of the
76
  execution character set ([[lex.ccon]], [[lex.string]]); if there is
77
  no corresponding member, it is converted to an
78
  *implementation-defined* member other than the null (wide)
79
  character.[^3]
80
  6. Adjacent string literal tokens are concatenated.
81
  7. White-space characters separating tokens are no longer significant.
82
+ Each preprocessing token is converted into a token [[lex.token]].
83
  The resulting tokens are syntactically and semantically analyzed and
84
  translated as a translation unit. \[*Note 1*: The process of
85
  analyzing and translating the tokens may occasionally result in one
86
+ token being replaced by a sequence of other tokens
87
+ [[temp.names]]. — *end note*] It is *implementation-defined*
88
+ whether the sources for module units and header units on which the
89
+ current translation unit has an interface dependency
90
+ ([[module.unit]], [[module.import]]) are required to be available.
91
+ \[*Note 2*: Source files, translation units and translated
92
+ translation units need not necessarily be stored as files, nor need
93
+ there be any one-to-one correspondence between these entities and
94
+ any external representation. The description is conceptual only, and
95
+ does not specify any particular implementation. — *end note*]
96
  8. Translated translation units and instantiation units are combined as
97
  follows: \[*Note 3*: Some or all of these may be supplied from a
98
  library. — *end note*] Each translated translation unit is examined
99
  to produce a list of required instantiations. \[*Note 4*: This may
100
+ include instantiations which have been explicitly requested
101
+ [[temp.explicit]]. — *end note*] The definitions of the required
102
  templates are located. It is *implementation-defined* whether the
103
  source of the translation units containing these definitions is
104
  required to be available. \[*Note 5*: An implementation could encode
105
  sufficient information into the translated translation unit so as to
106
  ensure the source is not required here. — *end note*] All the
 
141
  universal-character-name:
142
  '\u' hex-quad
143
  '\U' hex-quad hex-quad
144
  ```
145
 
146
+ A *universal-character-name* designates the character in ISO/IEC 10646
147
+ (if any) whose code point is the hexadecimal number represented by the
148
+ sequence of *hexadecimal-digit*s in the *universal-character-name*. The
149
+ program is ill-formed if that number is not a code point or if it is a
150
+ surrogate code point. Noncharacter code points and reserved code points
151
+ are considered to designate separate characters distinct from any
152
+ ISO/IEC 10646 character. If a *universal-character-name* outside the
153
+ *c-char-sequence*, *s-char-sequence*, or *r-char-sequence* of a
154
+ *character-literal* or *string-literal* (in either case, including
155
+ within a *user-defined-literal*) corresponds to a control character or
156
+ to a character in the basic source character set, the program is
157
+ ill-formed.[^5]
158
+
159
+ [*Note 1*: ISO/IEC 10646 code points are integers in the range
160
+ [0, 10FFFF] (hexadecimal). A surrogate code point is a value in the
161
+ range [D800, DFFF] (hexadecimal). A control character is a character
162
+ whose code point is in either of the ranges [0, 1F] or [7F, 9F]
163
+ (hexadecimal). — *end note*]
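This is a compile-time sketch of the designation rule. The ill-formed cases are shown only as comments because they would stop compilation.

```cpp
// A universal-character-name designates the code point named by its hex digits.
static_assert(U'\u00E9' == 0xE9);        // LATIN SMALL LETTER E WITH ACUTE
static_assert(U'\U0001F600' == 0x1F600);

// Each of these would be ill-formed:
//   U'\uD800'      a surrogate code point
//   U'\U00110000'  above 10FFFF, so not a code point
//   int \u0041;    names basic source character 'A' outside a literal
```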
164
 
165
  The *basic execution character set* and the *basic execution
166
  wide-character set* shall each contain all the members of the basic
167
  source character set, plus control characters representing alert,
168
  backspace, and carriage return, plus a *null character* (respectively,
 
180
  ## Preprocessing tokens <a id="lex.pptoken">[[lex.pptoken]]</a>
181
 
182
  ``` bnf
183
  preprocessing-token:
184
  header-name
185
+ import-keyword
186
+ module-keyword
187
+ export-keyword
188
  identifier
189
  pp-number
190
  character-literal
191
  user-defined-character-literal
192
  string-literal
193
  user-defined-string-literal
194
  preprocessing-op-or-punc
195
  each non-white-space character that cannot be one of the above
196
  ```
197
 
198
+ Each preprocessing token that is converted to a token [[lex.token]]
199
+ shall have the lexical form of a keyword, an identifier, a literal, or
200
+ an operator or punctuator.
201
 
202
  A preprocessing token is the minimal lexical element of the language in
203
  translation phases 3 through 6. The categories of preprocessing token
204
+ are: header names, placeholder tokens produced by preprocessing `import`
205
+ and `module` directives (*import-keyword*, *module-keyword*, and
206
+ *export-keyword*), identifiers, preprocessing numbers, character
207
  literals (including user-defined character literals), string literals
208
  (including user-defined string literals), preprocessing operators and
209
  punctuators, and single non-white-space characters that do not lexically
210
  match the other preprocessing token categories. If a `'` or a `"`
211
  character matches the last category, the behavior is undefined.
212
  Preprocessing tokens can be separated by white space; this consists of
213
+ comments [[lex.comment]], or white-space characters (space, horizontal
214
+ tab, new-line, vertical tab, and form-feed), or both. As described in
215
+ [[cpp]], in certain circumstances during translation phase 4, white
216
+ space (or the absence thereof) serves as more than preprocessing token
217
+ separation. White space can appear within a preprocessing token only as
218
+ part of a header name or between the quotation characters in a character
219
+ literal or string literal.
220
 
221
  If the input stream has been parsed into preprocessing tokens up to a
222
  given character:
223
 
224
  - If the next character begins a sequence of characters that could be
 
238
  preprocessing token by itself and not as the first character of the
239
  alternative token `<:`.
240
  - Otherwise, the next preprocessing token is the longest sequence of
241
  characters that could constitute a preprocessing token, even if that
242
  would cause further lexical analysis to fail, except that a
243
+ *header-name* [[lex.header]] is only formed
244
+ - after the `include` or `import` preprocessing token in an `#include`
245
+ [[cpp.include]] or `import` [[cpp.import]] directive, or
246
+ - within a *has-include-expression*.
247
 
248
  [*Example 1*:
249
 
250
  ``` cpp
251
  #define R "x"
252
  const char* s = R"y"; // ill-formed raw string, not "x" "y"
253
  ```
254
 
255
  — *end example*]
256
 
257
+ The *import-keyword* is produced by processing an `import` directive
258
+ [[cpp.import]], the *module-keyword* is produced by preprocessing a
259
+ `module` directive [[cpp.module]], and the *export-keyword* is produced
260
+ by preprocessing either of the previous two directives.
261
+
262
+ [*Note 1*: None has any observable spelling. — *end note*]
263
+
264
  [*Example 2*: The program fragment `0xe+foo` is parsed as a
265
+ preprocessing number token (one that is not a valid *integer-literal* or
266
+ *floating-point-literal* token), even though a parse as three
267
+ preprocessing tokens `0xe`, `+`, and `foo` might produce a valid
268
+ expression (for example, if `foo` were a macro defined as `1`).
269
+ Similarly, the program fragment `1E1` is parsed as a preprocessing
270
+ number (one that is a valid *floating-point-literal* token), whether or
271
+ not `E` is a macro name. — *end example*]
272
 
273
  [*Example 3*: The program fragment `x+++++y` is parsed as `x
274
  ++ ++ + y`, which, if `x` and `y` have integral types, violates a
275
  constraint on increment operators, even though the parse `x ++ + ++ y`
276
  might yield a correct expression. — *end example*]
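The maximal-munch behavior in Examples 2 and 3 means spacing can matter. This sketch requires C++14 or later, because the `constexpr` function modifies local variables. It shows spaced forms that parse as intended. `munch_demo` is an illustrative name.

```cpp
// x+++++y would lex as x ++ ++ + y; spaces force x ++ + ++ y.
constexpr int munch_demo() {
    int x = 1, y = 2;
    int z = x++ + ++y;  // 1 + 3; afterwards x is 2
    return z + x;       // 4 + 2
}
static_assert(munch_demo() == 6);

// 0xe+1 is one (invalid) pp-number; with spaces it is 0xe + 1.
static_assert(0xe + 1 == 15);
```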
 
280
  Alternative token representations are provided for some operators and
281
  punctuators.[^6]
282
 
283
  In all respects of the language, each alternative token behaves the
284
  same, respectively, as its primary token, except for its spelling.[^7]
285
+ The set of alternative tokens is defined in [[lex.digraph]].
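This sketch shows that alternative tokens are interchangeable with their primary tokens. The names are illustrative. Some compilers, for example MSVC without `/permissive-`, may not accept `and`, `or`, and `compl` by default.

```cpp
constexpr bool both(bool a, bool b) { return a and b; }    // &&
constexpr bool either(bool a, bool b) { return a or b; }   // ||
constexpr int flip(int v) { return compl v; }              // ~

// Equivalent to: constexpr int values[3] = {1, 2, 3};
constexpr int values<:3:> = <%1, 2, 3%>;

static_assert(both(true, true) and not either(false, false));
static_assert(values<:2:> == 3);
static_assert(flip(0) == -1);
```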
 
286
 
287
  ## Tokens <a id="lex.token">[[lex.token]]</a>
288
 
289
  ``` bnf
290
  token:
291
  identifier
292
  keyword
293
  literal
294
+ operator-or-punctuator
 
295
  ```
296
 
297
  There are five kinds of tokens: identifiers, keywords, literals,[^8]
298
  operators, and other separators. Blanks, horizontal and vertical tabs,
299
  newlines, formfeeds, and comments (collectively, “white space”), as
 
346
  q-char:
347
  any member of the source character set except new-line and '"'
348
  ```
349
 
350
  [*Note 1*: Header name preprocessing tokens only appear within a
351
+ `#include` preprocessing directive, a `__has_include` preprocessing
352
+ expression, or after certain occurrences of an `import` token (see 
353
  [[lex.pptoken]]). — *end note*]
354
 
355
  The sequences in both forms of *header-name*s are mapped in an
356
  *implementation-defined* manner to headers or to external source file
357
  names as specified in [[cpp.include]].
 
377
  pp-number 'p' sign
378
  pp-number 'P' sign
379
  pp-number '.'
380
  ```
381
 
382
+ Preprocessing number tokens lexically include all *integer-literal*
383
+ tokens [[lex.icon]] and all *floating-point-literal* tokens
384
+ [[lex.fcon]].
385
 
386
  A preprocessing number does not have a type or a value; it acquires both
387
+ after a successful conversion to an *integer-literal* token or a
388
+ *floating-point-literal* token.
389
 
390
  ## Identifiers <a id="lex.name">[[lex.name]]</a>
391
 
392
  ``` bnf
393
  identifier:
 
415
  '0 1 2 3 4 5 6 7 8 9'
416
  ```
417
 
418
  An identifier is an arbitrarily long sequence of letters and digits.
419
  Each *universal-character-name* in an identifier shall designate a
420
+ character whose encoding in ISO/IEC 10646 falls into one of the ranges
421
+ specified in [[lex.name.allowed]]. The initial element shall not be a
422
+ *universal-character-name* designating a character whose encoding falls
423
+ into one of the ranges specified in [[lex.name.disallowed]]. Upper- and
424
+ lower-case letters are different. All characters are significant.[^10]
 
**Table: Ranges of characters allowed** <a id="lex.name.allowed">[lex.name.allowed]</a>

| | | | | |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| `00A8` | `00AA` | `00AD` | `00AF` | `00B2-00B5` |
| `00B7-00BA` | `00BC-00BE` | `00C0-00D6` | `00D8-00F6` | `00F8-00FF` |

| `10000-1FFFD` | `20000-2FFFD` | `30000-3FFFD` | `40000-4FFFD` | `50000-5FFFD` |
| `60000-6FFFD` | `70000-7FFFD` | `80000-8FFFD` | `90000-9FFFD` | `A0000-AFFFD` |
| `B0000-BFFFD` | `C0000-CFFFD` | `D0000-DFFFD` | `E0000-EFFFD` | |

**Table: Ranges of characters disallowed initially (combining characters)** <a id="lex.name.disallowed">[lex.name.disallowed]</a>

| | | | |
| ----------- | ----------- | ----------- | ----------- |
| `0300-036F` | `1DC0-1DFF` | `20D0-20FF` | `FE20-FE2F` |
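
A sketch of these rules, assuming a C++17 or later compiler that accepts *universal-character-name*s in identifiers; the variable names are illustrative only.

```cpp
// Upper- and lower-case letters are different, so these are two names.
constexpr int counter = 1;
constexpr int Counter = 2;

// U+00E9 falls in the allowed range 00D8-00F6, so its
// universal-character-name may appear in an identifier.
constexpr int caf\u00E9 = 3;

// U+0301 falls in 0300-036F, so a name may not begin with it; a
// declaration whose name started with that character would be ill-formed.

static_assert(counter != Counter);
static_assert(caf\u00E9 == 3);
```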

The identifiers in [[lex.name.special]] have a special meaning when
appearing in a certain context. When referred to in the grammar, these
identifiers are used explicitly rather than using the *identifier*
grammar production. Unless otherwise specified, any ambiguity as to
whether a given *identifier* has a special meaning is resolved to
interpret the token as a regular *identifier*.
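
As an illustration, assuming C++17 or later, `final` and `override` keep their special meaning in a class definition yet remain usable as ordinary names elsewhere. The class and variable names are hypothetical.

```cpp
struct Shape {
    virtual int sides() const { return 0; }
    virtual ~Shape() = default;
};

// In these positions `final` and `override` have their special meaning.
struct Square final : Shape {
    int sides() const override { return 4; }
};

// Elsewhere they are ordinary identifiers and can name variables.
constexpr int final = 1;
constexpr int override = 2;

static_assert(final + override == 3);
```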
 
 
 
 
 
 
 
In addition, some identifiers are reserved for use by C++
implementations and shall not be used otherwise; no diagnostic is
required.

- Each identifier that begins with an underscore is reserved to the
  implementation for use as a name in the global namespace.

## Keywords <a id="lex.key">[[lex.key]]</a>

``` bnf
keyword:
any identifier listed in [[lex.key]]
import-keyword
module-keyword
export-keyword
```

The identifiers shown in [[lex.key]] are reserved for use as keywords
(that is, they are unconditionally treated as keywords in phase 7)
except in an *attribute-token* [[dcl.attr.grammar]].

[*Note 1*: The `register` keyword is unused but is reserved for future
use. — *end note*]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Furthermore, the alternative representations shown in
[[lex.key.digraph]] for certain operators and punctuators
[[lex.digraph]] are reserved and shall not be used otherwise.

**Table: Alternative representations** <a id="lex.key.digraph">[lex.key.digraph]</a>

| | | | | | |
| -------- | -------- | -------- | ------- | -------- | ----- |
| `and` | `and_eq` | `bitand` | `bitor` | `compl` | `not` |
| `not_eq` | `or` | `or_eq` | `xor` | `xor_eq` | |
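
The alternative representations behave exactly like the tokens they replace. This sketch assumes a conforming C++17 or later compiler; some compilers only recognize these keywords in a standards-conformance mode.

```cpp
// Each alternative representation is the same token as its primary
// spelling: and is &&, bitand is &, compl is ~, not is !, not_eq is !=.
constexpr bool both(bool a, bool b) { return a and b; }

static_assert(both(true, true));
static_assert((6 bitand 3) == 2);
static_assert(compl 0 == -1);
static_assert(not false);
static_assert(1 not_eq 2);
```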

## Operators and punctuators <a id="lex.operators">[[lex.operators]]</a>

The lexical representation of C++ programs includes a number of
preprocessing tokens that are used in the syntax of the preprocessor or
are converted into tokens for operators and punctuators:

``` bnf
preprocessing-op-or-punc:
preprocessing-operator
operator-or-punctuator
```

``` bnf
preprocessing-operator: one of
'# ## %: %:%:'
```

``` bnf
operator-or-punctuator: one of
'{ } [ ] ( )'
'<: :> <% %> ; : ...'
'? :: . .* -> ->* ~'
'! + - * / % ^ & |'
'= += -= *= /= %= ^= &= |='
'== != < > <= >= <=> && ||'
'<< >> <<= >>= ++ -- ,'
'and or xor not bitand bitor compl'
'and_eq or_eq xor_eq not_eq'
```

Each *operator-or-punctuator* is converted to a single token in
translation phase 7 [[lex.phases]].
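
A minimal sketch of the digraph spellings, assuming a compiler that accepts them (GCC and Clang do by default); `TWICE` and `values` are hypothetical names.

```cpp
%:define TWICE(x) ((x) * 2)  // %: is an alternative spelling of #

// <: :> and <% %> are alternative spellings of [ ] and { }.
constexpr int values<:3:> = <%1, 2, 3%>;

static_assert(values<:1:> == 2);
static_assert(TWICE(values[2]) == 6);
```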

## Literals <a id="lex.literal">[[lex.literal]]</a>

### Kinds of literals <a id="lex.literal.kinds">[[lex.literal.kinds]]</a>

``` bnf
literal:
integer-literal
character-literal
floating-point-literal
string-literal
boolean-literal
pointer-literal
user-defined-literal
```

``` bnf
hexadecimal-literal:
hexadecimal-prefix hexadecimal-digit-sequence
```

``` bnf
binary-digit: one of
'0 1'
```

``` bnf
octal-digit: one of
'0 1 2 3 4 5 6 7'
```
 
``` bnf
long-long-suffix: one of
'll LL'
```

In an *integer-literal*, the sequence of *binary-digit*s,
*octal-digit*s, *digit*s, or *hexadecimal-digit*s is interpreted as a
base N integer as shown in [[lex.icon.base]]; the lexically first
digit of the sequence of digits is the most significant.

[*Note 1*: The prefix and any optional separating single quotes are
ignored when determining the value. — *end note*]

**Table: Base of *integer-literal*s** <a id="lex.icon.base">[lex.icon.base]</a>

| Kind of *integer-literal* | base N |
| ------------------------- | ------ |
| *binary-literal* | 2 |
| *octal-literal* | 8 |
| *decimal-literal* | 10 |
| *hexadecimal-literal* | 16 |

The *hexadecimal-digit*s `a` through `f` and `A` through `F` have
decimal values ten through fifteen.

[*Example 1*: The number twelve can be written `12`, `014`, `0XC`, or
`0b1100`. The *integer-literal*s `1048576`, `1'048'576`, `0X100000`,
`0x10'0000`, and `0'004'000'000` all have the same
value. — *end example*]
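
To make the base N interpretation concrete, here is a sketch of a hypothetical helper, `digit_sequence_value`, that evaluates a digit sequence with its prefix already removed. It assumes C++17, valid input, ASCII-compatible letter codes, and no overflow; it illustrates the rule and is not how a compiler is required to implement it.

```cpp
#include <string_view>

// Hypothetical helper: value of a digit sequence interpreted in base N.
// Separating single quotes are skipped; the first digit is the most significant.
constexpr unsigned long long digit_sequence_value(std::string_view digits, unsigned base) {
    unsigned long long value = 0;
    for (char c : digits) {
        if (c == '\'') {
            continue;  // digit separator: ignored when determining the value
        }
        unsigned d = 0;
        if (c >= '0' && c <= '9') {
            d = static_cast<unsigned>(c - '0');
        } else if (c >= 'a' && c <= 'f') {
            d = static_cast<unsigned>(c - 'a' + 10);  // a..f are ten..fifteen
        } else {
            d = static_cast<unsigned>(c - 'A' + 10);  // assumes A..F
        }
        value = value * base + d;
    }
    return value;
}

// Twelve in each base, matching 0b1100, 014, 12, and 0XC.
static_assert(digit_sequence_value("1100", 2) == 0b1100);
static_assert(digit_sequence_value("14", 8) == 014);
static_assert(digit_sequence_value("12", 10) == 12);
static_assert(digit_sequence_value("C", 16) == 0XC);
static_assert(digit_sequence_value("1'048'576", 10) == 1048576);
```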

The type of an *integer-literal* is the first type, in the list in
[[lex.icon.type]] that corresponds to its optional *integer-suffix*, in
which its value can be represented. An *integer-literal* is a prvalue.

**Table: Types of *integer-literal*s** <a id="lex.icon.type">[lex.icon.type]</a>

| *integer-suffix* | *decimal-literal* | *integer-literal* other than *decimal-literal* |
| ---------------- | ------------------------ | ---------------------------------------------- |
| none | `int` | `int` |
| | `long int` | `unsigned int` |
| | `long long int` | `long int` |
| | | `unsigned long int` |
| | | `long long int` |

| | | `unsigned long long int` |
| Both `u` or `U` and `ll` or `LL` | `unsigned long long int` | `unsigned long long int` |

If an *integer-literal* cannot be represented by any type in its list
and an extended integer type [[basic.fundamental]] can represent its
value, it may have that extended integer type. If all of the types in
the list for the *integer-literal* are signed, the extended integer type
shall be signed. If all of the types in the list for the
*integer-literal* are unsigned, the extended integer type shall be
unsigned. If the list contains both signed and unsigned types, the
extended integer type may be signed or unsigned. A program is ill-formed
if one of its translation units contains an *integer-literal* that
cannot be represented by any of the allowed types.
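
The type selection can be observed with `decltype`. A minimal sketch assuming C++17; the checks for literals near the 32-bit limit are only compiled when `int` is 32 bits, which the preprocessor condition makes explicit.

```cpp
#include <climits>
#include <type_traits>

// The integer-suffix selects which list of candidate types is searched.
static_assert(std::is_same_v<decltype(42), int>);
static_assert(std::is_same_v<decltype(42u), unsigned int>);
static_assert(std::is_same_v<decltype(42L), long>);
static_assert(std::is_same_v<decltype(42ULL), unsigned long long>);

#if INT_MAX == 0x7FFFFFFF && UINT_MAX == 0xFFFFFFFF
// With a 32-bit int, the hexadecimal literal falls through to unsigned int,
// while the decimal literal, whose list has no unsigned types, becomes
// long or long long.
static_assert(std::is_same_v<decltype(0xFFFFFFFF), unsigned int>);
static_assert(std::is_signed_v<decltype(4294967295)>);
#endif
```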

### Character literals <a id="lex.ccon">[[lex.ccon]]</a>

``` bnf
character-literal:

c-char-sequence:
c-char
c-char-sequence c-char
```

``` bnf
c-char:
any member of the basic source character set except the single-quote ''', backslash '\', or new-line character
escape-sequence
universal-character-name
```

``` bnf
escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence

hexadecimal-escape-sequence:
'\x' hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
```

A *character-literal* that does not begin with `u8`, `u`, `U`, or `L` is
an *ordinary character literal*. An ordinary character literal that
contains a single *c-char* representable in the execution character set
has type `char`, with value equal to the numerical value of the encoding
of the *c-char* in the execution character set. An ordinary character
literal that contains more than one *c-char* is a
*multicharacter literal*. A multicharacter literal, or an ordinary
character literal containing a single *c-char* not representable in the
execution character set, is conditionally-supported, has type `int`, and
has an *implementation-defined* value.

A *character-literal* that begins with `u8`, such as `u8'w'`, is a
*character-literal* of type `char8_t`, known as a *UTF-8 character
literal*. The value of a UTF-8 character literal is equal to its ISO/IEC
10646 code point value, provided that the code point value can be
encoded as a single UTF-8 code unit.

[*Note 1*: That is, provided the code point value is in the range
[0, 7F] (hexadecimal). — *end note*]

If the value is not representable with a single UTF-8 code unit, the
program is ill-formed. A UTF-8 character literal containing multiple
*c-char*s is ill-formed.

A *character-literal* that begins with the letter `u`, such as `u'x'`,
is a *character-literal* of type `char16_t`, known as a *UTF-16
character literal*. The value of a UTF-16 character literal is equal to
its ISO/IEC 10646 code point value, provided that the code point value
is representable with a single 16-bit code unit.

[*Note 2*: That is, provided the code point value is in the range
[0, FFFF] (hexadecimal). — *end note*]

If the value is not representable with a single 16-bit code unit, the
program is ill-formed. A UTF-16 character literal containing multiple
*c-char*s is ill-formed.

A *character-literal* that begins with the letter `U`, such as `U'y'`,
is a *character-literal* of type `char32_t`, known as a *UTF-32
character literal*. The value of a UTF-32 character literal containing a
single *c-char* is equal to its ISO/IEC 10646 code point value. A UTF-32
character literal containing multiple *c-char*s is ill-formed.

A *character-literal* that begins with the letter `L`, such as `L'z'`,
is a *wide-character literal*. A wide-character literal has type
`wchar_t`.[^12] The value of a wide-character literal containing a
single *c-char* is equal to the numerical value of the encoding
of the *c-char* in the execution wide-character set, unless the *c-char*
has no representation in the execution wide-character set, in which case
the value is *implementation-defined*.

[*Note 3*: The type `wchar_t` is able to represent all members of the
execution wide-character set (see [[basic.fundamental]]). — *end note*]

The value of a wide-character literal containing multiple *c-char*s is
*implementation-defined*.
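
The encoding prefixes and the resulting types can be checked at compile time. This sketch assumes C++17 or later; the `char8_t` checks are guarded by the `__cpp_char8_t` feature-test macro because `u8` character literals have type `char` before C++20.

```cpp
#include <type_traits>

// Each encoding prefix selects the type of the character-literal.
static_assert(std::is_same_v<decltype('a'), char>);
static_assert(std::is_same_v<decltype(u'x'), char16_t>);
static_assert(std::is_same_v<decltype(U'y'), char32_t>);
static_assert(std::is_same_v<decltype(L'z'), wchar_t>);

// UTF-16 and UTF-32 character literals have their code point value.
static_assert(u'\u20AC' == 0x20AC);       // U+20AC fits in one 16-bit code unit
static_assert(U'\U0001F600' == 0x1F600);  // U+1F600 needs a UTF-32 literal

#if defined(__cpp_char8_t)
// Only when char8_t is available: u8'w' has type char8_t and value U+0077.
static_assert(std::is_same_v<decltype(u8'w'), char8_t>);
static_assert(u8'w' == 0x77);
#endif
```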

Certain non-graphic characters, the single quote `'`, the double quote
`"`, the question mark `?`,[^13] and the backslash `\` can be
represented according to [[lex.ccon.esc]]. The double quote `"` and the
question mark `?` can be represented as themselves or by the escape
sequences `\"` and `\?` respectively, but the single quote `'` and the
backslash `\` shall be represented by the escape sequences `\'` and `\\`
respectively. Escape sequences in which the character following the
backslash is not listed in [[lex.ccon.esc]] are conditionally-supported,
with *implementation-defined* semantics. An escape sequence specifies a
single character.

**Table: Escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>

| | | |
| --------------- | -------------- | ------------------ |
| new-line | NL(LF) | `\n` |
| horizontal tab | HT | `\t` |

backslash followed by `x` followed by one or more hexadecimal digits
that are taken to specify the value of the desired character. There is
no limit to the number of digits in a hexadecimal sequence. A sequence
of octal or hexadecimal digits is terminated by the first character that
is not an octal digit or a hexadecimal digit, respectively. The value of
a *character-literal* is *implementation-defined* if it falls outside
the *implementation-defined* range defined for `char` (for
*character-literal*s with no prefix) or `wchar_t` (for
*character-literal*s prefixed by `L`).
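
A compile-time sketch of how escape sequences are delimited, assuming C++17 or later. It relies on octal escapes taking at most three digits, while hexadecimal escapes, as stated above, have no digit limit.

```cpp
// An octal escape takes at most three digits: "\1234" holds '\123', '4',
// and the terminating '\0'.
static_assert(sizeof("\1234") == 3);

// A hexadecimal escape has no digit limit: "\x0041" holds a single
// character followed by '\0'.
static_assert(sizeof("\x0041") == 2);

// The same value written with hexadecimal and octal escapes.
static_assert('\x41' == '\101');

// " and ? may appear as themselves or escaped; ' and \ must be escaped.
static_assert('"' == '\"');
static_assert('?' == '\?');
static_assert('\'' != '\\');
```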

[*Note 4*: If the value of a *character-literal* prefixed by `u`, `u8`,
or `U` is outside the range defined for its type, the program is
ill-formed. — *end note*]

A *universal-character-name* is translated to the encoding, in the
appropriate execution character set, of the character named. If there is
no such encoding, the *universal-character-name* is translated to an
*implementation-defined* encoding.

[*Note 5*: In translation phase 1, a *universal-character-name* is
introduced whenever an actual extended character is encountered in the
source text. Therefore, all extended characters are described in terms
of *universal-character-name*s. However, the actual compiler
implementation may use its own native character set, so long as the same
results are obtained. — *end note*]

### Floating-point literals <a id="lex.fcon">[[lex.fcon]]</a>

``` bnf
floating-point-literal:
decimal-floating-point-literal
hexadecimal-floating-point-literal
```

``` bnf
decimal-floating-point-literal:
fractional-constant exponent-partₒₚₜ floating-point-suffixₒₚₜ
digit-sequence exponent-part floating-point-suffixₒₚₜ
```

``` bnf
hexadecimal-floating-point-literal:
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-point-suffixₒₚₜ
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-point-suffixₒₚₜ
```

``` bnf
fractional-constant:
digit-sequenceₒₚₜ '.' digit-sequence

digit
digit-sequence '''ₒₚₜ digit
```

``` bnf
floating-point-suffix: one of
'f l F L'
```

The type of a *floating-point-literal* is determined by its
*floating-point-suffix* as specified in [[lex.fcon.type]].

**Table: Types of *floating-point-literal*s** <a id="lex.fcon.type">[lex.fcon.type]</a>

| *floating-point-suffix* | type |
| ----------------------- | --------------- |
| none | `double` |
| `f` or `F` | `float` |
| `l` or `L` | `long double` |

The *significand* of a *floating-point-literal* is the
*fractional-constant* or *digit-sequence* of a
*decimal-floating-point-literal* or the
*hexadecimal-fractional-constant* or *hexadecimal-digit-sequence* of a
*hexadecimal-floating-point-literal*. In the significand, the sequence
of *digit*s or *hexadecimal-digit*s and optional period is interpreted
as a base N real number s, where N is 10 for a
*decimal-floating-point-literal* and 16 for a
*hexadecimal-floating-point-literal*.

[*Note 1*: Any optional separating single quotes are ignored when
determining the value. — *end note*]

If an *exponent-part* or *binary-exponent-part* is present, the exponent
e of the *floating-point-literal* is the result of interpreting the
sequence of an optional *sign* and the *digit*s as a base 10 integer.
Otherwise, the exponent e is 0. The scaled value of the literal is
s × 10ᵉ for a *decimal-floating-point-literal* and s × 2ᵉ for a
*hexadecimal-floating-point-literal*.

[*Example 1*: The *floating-point-literal*s `49.625` and `0xC.68p+2`
have the same value. The *floating-point-literal*s `1.602'176'565e-19`
and `1.602176565e-19` have the same value. — *end example*]
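
The example can be verified directly, along with the suffix table. This is a sketch assuming C++17 or later, which is required for hexadecimal floating-point literals.

```cpp
#include <type_traits>

// The floating-point-suffix selects the type.
static_assert(std::is_same_v<decltype(1.0), double>);
static_assert(std::is_same_v<decltype(1.0f), float>);
static_assert(std::is_same_v<decltype(1.0L), long double>);

// The hexadecimal significand 0xC.68 is 12 + 6/16 + 8/256 = 12.40625, and
// the binary exponent p+2 scales it by 2^2, giving 49.625.
static_assert(0xC.68p+2 == 49.625);
static_assert(0xC.68p+2 == 12.40625 * 4);

// Separating single quotes do not affect the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19);
```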

If the scaled value is not in the range of representable values for its
type, the program is ill-formed. Otherwise, the value of a
*floating-point-literal* is the scaled value if representable, else the
larger or smaller representable value nearest the scaled value, chosen
in an *implementation-defined* manner.

### String literals <a id="lex.string">[[lex.string]]</a>

``` bnf
string-literal:

s-char-sequence:
s-char
s-char-sequence s-char
```

``` bnf
s-char:
any member of the basic source character set except the double-quote '"', backslash '\', or new-line character
escape-sequence
universal-character-name
```

``` bnf
raw-string:
'"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
```

``` bnf
r-char-sequence:
r-char
r-char-sequence r-char
```

``` bnf
r-char:
any member of the source character set, except a right parenthesis ')' followed by
the initial d-char-sequence (which may be empty) followed by a double quote '"'.
```

``` bnf
d-char-sequence:
d-char
d-char-sequence d-char
```

``` bnf
d-char:
any member of the basic source character set except:
space, the left parenthesis '(', the right parenthesis ')', the backslash '\', and the control characters
representing horizontal tab, vertical tab, form feed, and newline.
```

A *string-literal* that has an `R` in the prefix is a *raw string
literal*. The *d-char-sequence* serves as a delimiter. The terminating
*d-char-sequence* of a *raw-string* is the same sequence of characters
as the initial *d-char-sequence*. A *d-char-sequence* shall consist of

``` cpp
```

is equivalent to `"\n)\\\na\"\n"`. The raw string

``` cpp
R"(x = "\"y\"")"
```

is equivalent to `"x = \"\\\"y\\\"\""`.

— *end example*]
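
The raw-string example above can be checked with `std::string_view`. This sketch assumes C++17 or later; the delimiter `delim` is an arbitrary choice.

```cpp
#include <string_view>

// The raw string from the example and its non-raw equivalent.
constexpr std::string_view raw = R"(x = "\"y\"")";
constexpr std::string_view cooked = "x = \"\\\"y\\\"\"";
static_assert(raw == cooked);

// A d-char-sequence lets the content contain )" without ending the literal.
constexpr std::string_view delimited = R"delim(a)" b)delim";
static_assert(delimited == "a)\" b");

// Backslashes are ordinary characters in a raw string: \ and n, plus '\0'.
static_assert(sizeof(R"(\n)") == 3);
```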

After translation phase 6, a *string-literal* that does not begin with
an *encoding-prefix* is an *ordinary string literal*. An ordinary string
literal has type “array of *n* `const char`”, where *n* is the size of
the string as defined below, has static storage duration [[basic.stc]],
and is initialized with the given characters.

A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
*UTF-8 string literal*. A UTF-8 string literal has type “array of *n*
`const char8_t`”, where *n* is the size of the string as defined below;
each successive element of the object representation [[basic.types]] has
the value of the corresponding code unit of the UTF-8 encoding of the
string.

Ordinary string literals and UTF-8 string literals are also referred to
as narrow string literals.

A *string-literal* that begins with `u`, such as `u"asdf"`, is a *UTF-16
string literal*. A UTF-16 string literal has type “array of *n*
`const char16_t`”, where *n* is the size of the string as defined below;
each successive element of the array has the value of the corresponding
code unit of the UTF-16 encoding of the string.

[*Note 3*: A single *c-char* may produce more than one `char16_t`
character in the form of surrogate pairs. A surrogate pair is a
representation for a single code point as a sequence of two 16-bit code
units. — *end note*]

A *string-literal* that begins with `U`, such as `U"asdf"`, is a *UTF-32
string literal*. A UTF-32 string literal has type “array of *n*
`const char32_t`”, where *n* is the size of the string as defined below;
each successive element of the array has the value of the corresponding
code unit of the UTF-32 encoding of the string.

A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
string literal*. A wide string literal has type “array of *n*
`const wchar_t`”, where *n* is the size of the string as defined below;
it is initialized with the given characters.
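
The array types, including the extra element for the terminator, show up through `decltype`, which yields a reference because a *string-literal* is an lvalue. This is a sketch assuming C++17 or later; the `char8_t` line is guarded because `u8` string literals have type array of `const char` before C++20.

```cpp
#include <type_traits>

// n counts the characters plus the terminating null: "ab" has 3 elements.
static_assert(std::is_same_v<decltype("ab"), const char (&)[3]>);
static_assert(std::is_same_v<decltype(u"ab"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype(U"ab"), const char32_t (&)[3]>);
static_assert(std::is_same_v<decltype(L"ab"), const wchar_t (&)[3]>);

#if defined(__cpp_char8_t)
static_assert(std::is_same_v<decltype(u8"ab"), const char8_t (&)[3]>);
#endif
```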

In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
concatenated. If both *string-literal*s have the same *encoding-prefix*,
the resulting concatenated *string-literal* has that *encoding-prefix*.
If one *string-literal* has no *encoding-prefix*, it is treated as a
*string-literal* of the same *encoding-prefix* as the other operand. If
a UTF-8 string literal token is adjacent to a wide string literal token,
the program is ill-formed. Any other concatenations are
conditionally-supported with *implementation-defined* behavior.

[*Note 4*: This concatenation is an interpretation, not a conversion.
Because the interpretation happens in translation phase 6 (after each
character from a *string-literal* has been translated into a value from
the appropriate character set), a *string-literal*’s initial rawness has
no effect on the interpretation or well-formedness of the
concatenation. — *end note*]

[[lex.string.concat]] has some examples of valid concatenations.

**Table: String literal concatenations** <a id="lex.string.concat">[lex.string.concat]</a>

| Source | | Means | Source | | Means | Source | | Means |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |

contains the two characters `'\xA'` and `'B'` after concatenation (and
not the single hexadecimal character `'\xAB'`).

— *end example*]
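
The prefix rule and the escape-before-concatenation rule can both be observed at compile time. This is a minimal sketch assuming C++17 or later.

```cpp
#include <type_traits>

// A piece without an encoding-prefix takes the other piece's prefix.
static_assert(std::is_same_v<decltype(u"a" "b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" U"b"), const char32_t (&)[3]>);

// Escapes are interpreted before concatenation, so "\xA" "B" holds two
// characters plus '\0', whereas "\xAB" holds one character plus '\0'.
static_assert(sizeof("\xA" "B") == 3);
static_assert(sizeof("\xAB") == 2);
```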

After any necessary concatenation, in translation phase 7
[[lex.phases]], `'\0'` is appended to every *string-literal* so that
programs that scan a string can find its end.

Escape sequences and *universal-character-name*s in non-raw string
literals have the same meaning as in *character-literal*s [[lex.ccon]],
except that the single quote `'` is representable either by itself or by
the escape sequence `\'`, the double quote `"` shall be preceded by a
`\`, and a *universal-character-name* in a UTF-16 string literal may
yield a surrogate pair. In a narrow string literal, a
*universal-character-name* may map to more than one `char` or `char8_t`
element due to *multibyte encoding*. The size of a `char32_t` or wide
string literal is the total number of escape sequences,
*universal-character-name*s, and other characters, plus one for the
terminating `U'\0'` or `L'\0'`. The size of a UTF-16 string literal is
the total number of escape sequences, *universal-character-name*s, and
other characters, plus one for each character requiring a surrogate
pair, plus one for the terminating `u'\0'`.

[*Note 5*: The size of a `char16_t` string literal is the number of
code units, not the number of characters. — *end note*]

[*Note 6*: Any *universal-character-name*s are required to correspond
to a code point in the range [0, D800) or [E000, 10FFFF] (hexadecimal)
[[lex.charset]]. — *end note*]

The size of a narrow string literal is the total number of escape
sequences and other characters, plus at least one for the multibyte
encoding of each *universal-character-name*, plus one for the
terminating `'\0'`.
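
The size rules can be checked with `sizeof`, which counts code units including the terminator. This is a sketch assuming C++17 or later; U+1F600 and U+00E9 are arbitrary sample code points.

```cpp
// U+1F600 lies outside the BMP: UTF-16 needs a surrogate pair (two code
// units), UTF-32 needs one. Each size includes the terminating null.
static_assert(sizeof(u"\U0001F600") / sizeof(char16_t) == 3);
static_assert(sizeof(U"\U0001F600") / sizeof(char32_t) == 2);

// In a UTF-8 string literal, U+00E9 is encoded as two code units.
static_assert(sizeof(u8"\u00E9") == 3);

// Each escape sequence counts as one character.
static_assert(sizeof("a\tb") == 4);
```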

Evaluating a *string-literal* results in a string literal object with
static storage duration, initialized from the given characters as
specified above. Whether all *string-literal*s are distinct (that is,
are stored in nonoverlapping objects) and whether successive evaluations
of a *string-literal* yield the same or a different object are
unspecified.

[*Note 7*: The effect of attempting to modify a *string-literal* is
undefined. — *end note*]

### Boolean literals <a id="lex.bool">[[lex.bool]]</a>

``` bnf
```

The pointer literal is the keyword `nullptr`. It is a prvalue of type
`std::nullptr_t`.

[*Note 1*: `std::nullptr_t` is a distinct type that is neither a
pointer type nor a pointer-to-member type; rather, a prvalue of this
type is a null pointer constant and can be converted to a null pointer
value or null member pointer value. See [[conv.ptr]] and
[[conv.mem]]. — *end note*]
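
A short sketch of these properties, assuming C++17 or later; `Point` is a hypothetical type used only to form a pointer to member.

```cpp
#include <cstddef>
#include <type_traits>

static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);

// std::nullptr_t is neither a pointer type nor a pointer-to-member type...
static_assert(!std::is_pointer_v<std::nullptr_t>);
static_assert(!std::is_member_pointer_v<std::nullptr_t>);

// ...but nullptr converts to a null pointer value and a null member pointer value.
struct Point { int x; };
constexpr int* p = nullptr;
constexpr int Point::* pm = nullptr;
static_assert(p == nullptr && pm == nullptr);
```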

### User-defined literals <a id="lex.ext">[[lex.ext]]</a>

``` bnf
user-defined-literal:
user-defined-integer-literal
user-defined-floating-point-literal
user-defined-string-literal
user-defined-character-literal
```

``` bnf

hexadecimal-literal ud-suffix
binary-literal ud-suffix
```

``` bnf
user-defined-floating-point-literal:
fractional-constant exponent-partₒₚₜ ud-suffix
digit-sequence exponent-part ud-suffix
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
```

The syntactic non-terminal preceding the *ud-suffix* in a
*user-defined-literal* is taken to be the longest sequence of characters
that could match that non-terminal.

A *user-defined-literal* is treated as a call to a literal operator or
literal operator template [[over.literal]]. To determine the form of
this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
the *literal-operator-id* whose literal suffix identifier is *X* is
looked up in the context of *L* using the rules for unqualified name
lookup [[basic.lookup.unqual]]. Let *S* be the set of declarations found
by this lookup. *S* shall not be empty.

If *L* is a *user-defined-integer-literal*, let *n* be the literal
without its *ud-suffix*. If *S* contains a literal operator with
parameter type `unsigned long long`, the literal *L* is treated as a
call of the form

``` cpp
operator "" X(nULL)
```

Otherwise, *S* shall contain a raw literal operator or a numeric literal
operator template [[over.literal]] but not both. If *S* contains a raw
literal operator, the literal *L* is treated as a call of the form

``` cpp
operator "" X("n")
```

Otherwise (*S* contains a numeric literal operator template), *L* is
treated as a call of the form

``` cpp
operator "" X<'c₁', 'c₂', ... 'cₖ'>()
```

where *n* is the source character sequence c₁c₂...cₖ.

[*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
basic source character set. — *end note*]
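
The three lookup outcomes can be seen side by side. This is a sketch under stated assumptions: C++17 or later, and hypothetical suffixes `_kib`, `_len`, and `_count` declared in a namespace `demo`. The declarations use the `operator""_x` spelling without a space, since the spaced form is deprecated as of C++23.

```cpp
#include <cstddef>

namespace demo {
// Cooked form: 4_kib is treated as operator""_kib(4ULL).
constexpr unsigned long long operator""_kib(unsigned long long n) { return n * 1024; }

// Raw literal operator: receives the literal's characters as a
// null-terminated string, e.g. "0x1F" for 0x1F_len.
constexpr std::size_t operator""_len(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// Numeric literal operator template: receives the characters as a
// template argument pack, e.g. '1', '0', '0', '0' for 1000_count.
template <char... Cs>
constexpr std::size_t operator""_count() { return sizeof...(Cs); }
}  // namespace demo

using namespace demo;

static_assert(4_kib == 4096);
static_assert(0x1F_len == 4);
static_assert(1000_count == 4);
```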

If *L* is a *user-defined-floating-point-literal*, let *f* be the
literal without its *ud-suffix*. If *S* contains a literal operator with
parameter type `long double`, the literal *L* is treated as a call of
the form

``` cpp
operator "" X(fL)
```

Otherwise, *S* shall contain a raw literal operator or a numeric literal
operator template [[over.literal]] but not both. If *S* contains a raw
literal operator, the literal *L* is treated as a call of the form

``` cpp
operator "" X("f")
```

Otherwise (*S* contains a numeric literal operator template), *L* is
treated as a call of the form

``` cpp
operator "" X<'c₁', 'c₂', ... 'cₖ'>()
```

[*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
basic source character set. — *end note*]
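
The same choice for floating-point literals, sketched with the hypothetical suffixes `_half` and `_has_exp` (assuming C++17 or later). The raw form can inspect the spelling, which the `long double` form cannot; the check for `e` is only meaningful for decimal literals, since `e` is a digit in hexadecimal ones.

```cpp
namespace demo_fp {
// Cooked form: the value arrives as a long double.
constexpr long double operator""_half(long double x) { return x / 2; }

// Raw form: the characters arrive as a null-terminated string, so the
// operator can test whether the spelling has a decimal exponent-part.
constexpr bool operator""_has_exp(const char* s) {
    for (; *s != '\0'; ++s) {
        if (*s == 'e' || *s == 'E') {
            return true;
        }
    }
    return false;
}
}  // namespace demo_fp

using namespace demo_fp;

static_assert(3.0_half == 1.5L);
static_assert(1.5e3_has_exp);
static_assert(!1.5_has_exp);
```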

If *L* is a *user-defined-string-literal*, let *str* be the literal
without its *ud-suffix* and let *len* be the number of code units in
*str* (i.e., its length excluding the terminating null character). If
*S* contains a literal operator template with a non-type template
parameter for which *str* is a well-formed *template-argument*, the
literal *L* is treated as a call of the form

``` cpp
operator "" X<str>()
```

Otherwise, the literal *L* is treated as a call of the form

``` cpp
operator "" X(str, len)
```
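
A minimal sketch of the `(str, len)` form, assuming C++17 or later and a hypothetical suffix `_view`. Because *len* counts code units rather than stopping at the first null, an embedded `\0` is included.

```cpp
#include <cstddef>
#include <string_view>

namespace demo_str {
// Called as operator""_view(str, len); len excludes the terminating null.
constexpr std::string_view operator""_view(const char* str, std::size_t len) {
    return std::string_view(str, len);
}
}  // namespace demo_str

using namespace demo_str;

static_assert("abc"_view.size() == 3);
static_assert("ab\0c"_view.size() == 4);  // the embedded null is counted
```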

If *L* is a *user-defined-character-literal*, let *ch* be the literal
without its *ud-suffix*. *S* shall contain a literal operator
[[over.literal]] whose only parameter has the type of *ch*, and the
literal *L* is treated as a call of the form

``` cpp
operator "" X(ch)
```

``` cpp
}
```

— *end example*]

In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
concatenated and *user-defined-string-literal*s are considered
*string-literal*s for that purpose. During concatenation, *ud-suffix*es
are removed and ignored and the concatenation process occurs as
described in [[lex.string]]. At the end of phase 6, if a
*string-literal* is the result of a concatenation involving at least one
*user-defined-string-literal*, all the participating
*user-defined-string-literal*s shall have the same *ud-suffix* and that
suffix is applied to the result of the concatenation.
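
Concatenation with a *ud-suffix* can be checked with a hypothetical `_view` operator (assuming C++17 or later): the suffix may appear on one or all pieces, as long as every suffix present is the same.

```cpp
#include <cstddef>
#include <string_view>

namespace demo_cat {
constexpr std::string_view operator""_view(const char* str, std::size_t len) {
    return std::string_view(str, len);
}
}  // namespace demo_cat

using namespace demo_cat;

// The ud-suffixes are removed, "ab" and "cd" are concatenated, and the
// common suffix is applied once to the result "abcd".
static_assert(("ab"_view "cd").size() == 4);
static_assert(("ab"_view "cd"_view) == "abcd");
```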

[*Example 3*:

[basic.fundamental]: basic.md#basic.fundamental
[basic.link]: basic.md#basic.link
[basic.lookup.unqual]: basic.md#basic.lookup.unqual
[basic.stc]: basic.md#basic.stc
[basic.types]: basic.md#basic.types
[conv.mem]: expr.md#conv.mem
[conv.ptr]: expr.md#conv.ptr
[cpp]: cpp.md#cpp
[cpp.concat]: cpp.md#cpp.concat
[cpp.cond]: cpp.md#cpp.cond
[cpp.import]: cpp.md#cpp.import
[cpp.include]: cpp.md#cpp.include
[cpp.module]: cpp.md#cpp.module
[cpp.stringize]: cpp.md#cpp.stringize
[dcl.attr.grammar]: dcl.md#dcl.attr.grammar
[headers]: library.md#headers
[lex]: #lex
[lex.bool]: #lex.bool
[lex.ccon]: #lex.ccon
[lex.ccon.esc]: #lex.ccon.esc
[lex.charset]: #lex.charset
[lex.comment]: #lex.comment
[lex.digraph]: #lex.digraph
[lex.ext]: #lex.ext
[lex.fcon]: #lex.fcon
[lex.fcon.type]: #lex.fcon.type
[lex.header]: #lex.header
[lex.icon]: #lex.icon
[lex.icon.base]: #lex.icon.base
[lex.icon.type]: #lex.icon.type
[lex.key]: #lex.key
[lex.key.digraph]: #lex.key.digraph
[lex.literal]: #lex.literal
[lex.literal.kinds]: #lex.literal.kinds
[lex.name]: #lex.name
[lex.name.allowed]: #lex.name.allowed
[lex.name.disallowed]: #lex.name.disallowed
[lex.name.special]: #lex.name.special
[lex.nullptr]: #lex.nullptr
[lex.operators]: #lex.operators
[lex.phases]: #lex.phases
[lex.ppnumber]: #lex.ppnumber
[lex.pptoken]: #lex.pptoken
[lex.separate]: #lex.separate
[lex.string]: #lex.string
[lex.string.concat]: #lex.string.concat
[lex.token]: #lex.token
[module.import]: module.md#module.import
[module.unit]: module.md#module.unit
[over.literal]: over.md#over.literal
[temp.explicit]: temp.md#temp.explicit
[temp.names]: temp.md#temp.names

[^1]: Implementations must behave as if these separate phases occur,
although in practice different phases might be folded together.

(described in translation phase 1) is specified as
*implementation-defined*, an implementation is required to document
how the basic source characters are represented in source files.

[^5]: A sequence of characters resembling a *universal-character-name*
in an *r-char-sequence* [[lex.string]] does not form a
*universal-character-name*.

[^6]: These include “digraphs” and additional reserved words. The term
“digraph” (token consisting of two characters) is not perfectly
descriptive, since one of the alternative *preprocessing-token*s is
`%:%:` and of course several primary tokens contain two characters.
Nonetheless, those alternative tokens that aren’t lexical keywords
are colloquially known as “digraphs”.

[^7]: Thus the “stringized” values [[cpp.stringize]] of `[` and `<:`
will be different, maintaining the source spelling, but the tokens
can otherwise be freely interchanged.

[^8]: Literals include strings and character and numeric literals.

long external identifier, but C++ does not place a translation limit
on significant characters for external identifiers. In C++, upper-
and lower-case letters are considered different for all identifiers,
including external identifiers.

[^11]: The term “literal” generally designates, in this document, those
tokens that are called “constants” in ISO C.

[^12]: They are intended for character sets where a character does not
fit into a single byte.

[^13]: Using an escape sequence for a question mark is supported for
compatibility with ISO C++14 and ISO C.