From Jason Turner

[lex.literal]

Files changed (1)
  1. tmp/tmpyqm3pry3/{from.md → to.md} +293 -181
tmp/tmpyqm3pry3/{from.md → to.md} RENAMED
@@ -44,13 +44,11 @@ decimal-literal:
44
  decimal-literal '''ₒₚₜ digit
45
  ```
46
 
47
  ``` bnf
48
  hexadecimal-literal:
49
- '0x' hexadecimal-digit
50
- '0X' hexadecimal-digit
51
- hexadecimal-literal '''ₒₚₜ hexadecimal-digit
52
  ```
53
 
54
  ``` bnf
55
  binary-digit:
56
  '0'
@@ -65,10 +63,21 @@ octal-digit: one of
65
  ``` bnf
66
  nonzero-digit: one of
67
  '1 2 3 4 5 6 7 8 9'
68
  ```
69
 
 
 
 
 
 
 
 
 
 
 
 
70
  ``` bnf
71
  hexadecimal-digit: one of
72
  '0 1 2 3 4 5 6 7 8 9'
73
  'a b c d e f'
74
  'A B C D E F'
@@ -99,22 +108,25 @@ long-long-suffix: one of
99
 
100
  An *integer literal* is a sequence of digits that has no period or
101
  exponent part, with optional separating single quotes that are ignored
102
  when determining its value. An integer literal may have a prefix that
103
  specifies its base and a suffix that specifies its type. The lexically
104
- first digit of the sequence of digits is the most significant. A
105
- *binary* integer literal (base two) begins with `0b` or `0B` and
106
- consists of a sequence of binary digits. An *octal* integer literal
107
- (base eight) begins with the digit `0` and consists of a sequence of
108
- octal digits.[^12] A *decimal* integer literal (base ten) begins with a
109
- digit other than `0` and consists of a sequence of decimal digits. A
110
- *hexadecimal* integer literal (base sixteen) begins with `0x` or `0X`
111
  and consists of a sequence of hexadecimal digits, which include the
112
  decimal digits and the letters `a` through `f` and `A` through `F` with
113
- decimal values ten through fifteen. The number twelve can be written
114
- `12`, `014`, `0XC`, or `0b1100`. The literals `1048576`, `1'048'576`,
115
- `0X100000`, `0x10'0000`, and `0'004'000'000` all have the same value.
 
 
 
116
 
117
  The type of an integer literal is the first of the corresponding list in
118
  Table  [[tab:lex.type.integer.literal]] in which its value can be
119
  represented.
120
 
@@ -144,26 +156,28 @@ represented.
144
 
145
 
146
  If an integer literal cannot be represented by any type in its list and
147
  an extended integer type ([[basic.fundamental]]) can represent its
148
  value, it may have that extended integer type. If all of the types in
149
- the list for the literal are signed, the extended integer type shall be
150
- signed. If all of the types in the list for the literal are unsigned,
151
- the extended integer type shall be unsigned. If the list contains both
152
- signed and unsigned types, the extended integer type may be signed or
153
- unsigned. A program is ill-formed if one of its translation units
154
- contains an integer literal that cannot be represented by any of the
155
- allowed types.
156
 
157
  ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
158
 
159
  ``` bnf
160
  character-literal:
161
- ''' c-char-sequence '''
162
- u''' c-char-sequence '''
163
- U''' c-char-sequence '''
164
- L''' c-char-sequence '''
 
 
165
  ```
166
 
167
  ``` bnf
168
  c-char-sequence:
169
  c-char
@@ -195,46 +209,64 @@ hexadecimal-escape-sequence:
195
  '\x' hexadecimal-digit
196
  hexadecimal-escape-sequence hexadecimal-digit
197
  ```
198
 
199
  A character literal is one or more characters enclosed in single quotes,
200
- as in `'x'`, optionally preceded by one of the letters `u`, `U`, or `L`,
201
- as in `u'y'`, `U'z'`, or `L'x'`, respectively. A character literal that
202
- does not begin with `u`, `U`, or `L` is an ordinary character literal,
203
- also referred to as a narrow-character literal. An ordinary character
204
- literal that contains a single *c-char* representable in the execution
205
- character set has type `char`, with value equal to the numerical value
206
- of the encoding of the *c-char* in the execution character set. An
207
- ordinary character literal that contains more than one *c-char* is a
208
- *multicharacter literal*. A multicharacter literal, or an ordinary
209
- character literal containing a single *c-char* not representable in the
210
- execution character set, is conditionally-supported, has type `int`, and
211
- has an *implementation-defined* value.
212
 
213
- A character literal that begins with the letter `u`, such as `u'y'`, is
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
214
  a character literal of type `char16_t`. The value of a `char16_t`
215
- literal containing a single *c-char* is equal to its ISO 10646 code
216
- point value, provided that the code point is representable with a single
217
- 16-bit code unit. (That is, provided it is a basic multi-lingual plane
218
- code point.) If the value is not representable within 16 bits, the
219
- program is ill-formed. A `char16_t` literal containing multiple
220
- *c-char*s is ill-formed. A character literal that begins with the letter
221
- `U`, such as `U'z'`, is a character literal of type `char32_t`. The
222
- value of a `char32_t` literal containing a single *c-char* is equal to
223
- its ISO 10646 code point value. A `char32_t` literal containing multiple
224
- *c-char*s is ill-formed. A character literal that begins with the letter
225
- `L`, such as `L'x'`, is a wide-character literal. A wide-character
226
- literal has type `wchar_t`.[^13] The value of a wide-character literal
227
- containing a single *c-char* has value equal to the numerical value of
228
- the encoding of the *c-char* in the execution wide-character set, unless
229
- the *c-char* has no representation in the execution wide-character set,
230
- in which case the value is *implementation-defined*. The type `wchar_t`
231
- is able to represent all members of the execution wide-character set
232
- (see  [[basic.fundamental]]). . The value of a wide-character literal
233
- containing multiple *c-char*s is *implementation-defined*.
234
 
235
- Certain nongraphic characters, the single quote `'`, the double quote
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
236
  `"`, the question mark `?`,[^14] and the backslash `\`, can be
237
  represented according to Table  [[tab:escape.sequences]]. The double
238
  quote `"` and the question mark `?`, can be represented as themselves or
239
  by the escape sequences `\"` and `\?` respectively, but the single quote
240
  `'` and the backslash `\` shall be represented by the escape sequences
@@ -269,45 +301,74 @@ backslash followed by `x` followed by one or more hexadecimal digits
269
  that are taken to specify the value of the desired character. There is
270
  no limit to the number of digits in a hexadecimal sequence. A sequence
271
  of octal or hexadecimal digits is terminated by the first character that
272
  is not an octal digit or a hexadecimal digit, respectively. The value of
273
  a character literal is *implementation-defined* if it falls outside of
274
- the implementation-defined range defined for `char` (for literals with
275
- no prefix), `char16_t` (for literals prefixed by `'u'`), `char32_t` (for
276
- literals prefixed by `'U'`), or `wchar_t` (for literals prefixed by
277
- `'L'`).
278
 
279
- A universal-character-name is translated to the encoding, in the
 
 
 
 
280
  appropriate execution character set, of the character named. If there is
281
- no such encoding, the universal-character-name is translated to an
282
- *implementation-defined* encoding. In translation phase 1, a
283
- universal-character-name is introduced whenever an actual extended
284
- character is encountered in the source text. Therefore, all extended
285
- characters are described in terms of universal-character-names. However,
286
- the actual compiler implementation may use its own native character set,
287
- so long as the same results are obtained.
 
 
288
 
289
  ### Floating literals <a id="lex.fcon">[[lex.fcon]]</a>
290
 
291
  ``` bnf
292
  floating-literal:
 
 
 
 
 
 
293
  fractional-constant exponent-partₒₚₜ floating-suffixₒₚₜ
294
  digit-sequence exponent-part floating-suffixₒₚₜ
295
  ```
296
 
 
 
 
 
 
 
297
  ``` bnf
298
  fractional-constant:
299
  digit-sequenceₒₚₜ '.' digit-sequence
300
  digit-sequence '.'
301
  ```
302
 
 
 
 
 
 
 
303
  ``` bnf
304
  exponent-part:
305
  'e' signₒₚₜ digit-sequence
306
  'E' signₒₚₜ digit-sequence
307
  ```
308
 
 
 
 
 
 
 
309
  ``` bnf
310
  sign: one of
311
  '+ -'
312
  ```
313
 
@@ -320,46 +381,55 @@ digit-sequence:
320
  ``` bnf
321
  floating-suffix: one of
322
  'f l F L'
323
  ```
324
 
325
- A floating literal consists of an integer part, a decimal point, a
326
- fraction part, an `e` or `E`, an optionally signed integer exponent, and
327
- an optional type suffix. The integer and fraction parts both consist of
328
- a sequence of decimal (base ten) digits. Optional separating single
329
- quotes in a *digit-sequence* are ignored when determining its value. The
330
- literals `1.602'176'565e-19` and `1.602176565e-19` have the same value.
331
- Either the integer part or the fraction part (not both) can be omitted;
332
- either the decimal point or the letter `e` (or `E` ) and the exponent
333
- (not both) can be omitted. The integer part, the optional decimal point
334
- and the optional fraction part form the *significant part* of the
335
- floating literal. The exponent, if present, indicates the power of 10 by
336
- which the significant part is to be scaled. If the scaled value is in
337
- the range of representable values for its type, the result is the scaled
338
- value if representable, else the larger or smaller representable value
339
- nearest the scaled value, chosen in an *implementation-defined* manner.
340
- The type of a floating literal is `double` unless explicitly specified
341
- by a suffix. The suffixes `f` and `F` specify `float`, the suffixes `l`
342
- and `L` specify `long` `double`. If the scaled value is not in the range
343
- of representable values for its type, the program is ill-formed.
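
The suffix and digit-separator rules in this paragraph can be checked at compile time. The following is a minimal sketch, assuming a C++14 or later compiler; the helper name `scaled` is hypothetical and not part of the standard text.

```cpp
#include <cassert>
#include <type_traits>

// Without a suffix a floating literal has type double.
static_assert(std::is_same<decltype(1.5), double>::value, "no suffix: double");
// f/F select float, l/L select long double.
static_assert(std::is_same<decltype(1.5f), float>::value, "f suffix: float");
static_assert(std::is_same<decltype(1.5L), long double>::value, "L suffix: long double");

// Separating single quotes are ignored when determining the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19, "separators do not change the value");

// Either the integer part or the fraction part may be omitted.
static_assert(1e3 == 1000. && .5 == 0.5, "omitted parts");

// Hypothetical helper: the exponent scales the significant part by a power of 10.
constexpr double scaled() { return 25e-1; }
```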
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
344
 
345
  ### String literals <a id="lex.string">[[lex.string]]</a>
346
 
347
  ``` bnf
348
  string-literal:
349
  encoding-prefixₒₚₜ '"' s-char-sequenceₒₚₜ '"'
350
  encoding-prefixₒₚₜ 'R' raw-string
351
  ```
352
 
353
- ``` bnf
354
- encoding-prefix:
355
- 'u8'
356
- 'u'
357
- 'U'
358
- 'L'
359
- ```
360
-
361
  ``` bnf
362
  s-char-sequence:
363
  s-char
364
  s-char-sequence s-char
365
  ```
@@ -379,36 +449,43 @@ r-char-sequence:
379
  d-char-sequence:
380
  d-char
381
  d-char-sequence d-char
382
  ```
383
 
384
- A string literal is a sequence of characters (as defined in 
385
  [[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
386
  `u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
387
  `R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
388
  `U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
389
 
390
- A string literal that has an `R` in the prefix is a *raw string
391
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
392
  *d-char-sequence* of a *raw-string* is the same sequence of characters
393
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
394
  at most 16 characters.
395
 
396
- The characters `'('` and `')'` are permitted in a *raw-string*. Thus,
397
- `R"delimiter((a|b))delimiter"` is equivalent to `"(a|b)"`.
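
The equivalence stated above can be confirmed directly. This is an illustrative sketch, assuming a C++11 or later compiler; `raw_matches_ordinary` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: the delimiter lets the raw string contain
// parentheses without ending the literal early.
inline bool raw_matches_ordinary() {
    return std::strcmp(R"delimiter((a|b))delimiter", "(a|b)") == 0;
}

// Escape sequences are not processed in a raw string: this literal holds
// a backslash, the letter n, and the terminating null character.
static_assert(sizeof(R"(\n)") == 3, "backslash, n, terminator");
```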
 
 
 
398
 
399
  A source-file new-line in a raw string literal results in a new-line in
400
- the resulting execution *string-literal*. Assuming no whitespace at the
401
  beginning of lines in the following example, the assert will succeed:
402
 
403
  ``` cpp
404
  const char* p = R"(a\
405
  b
406
  c)";
407
  assert(std::strcmp(p, "a\\\nb\nc") == 0);
408
  ```
409
 
 
 
 
 
410
  The raw string
411
 
412
  ``` cpp
413
  R"a(
414
  )\
@@ -430,62 +507,63 @@ R"#(
430
  )#"
431
  ```
432
 
433
  is equivalent to `"\n)\?\?=\"\n"`.
434
 
435
- After translation phase 6, a string literal that does not begin with an
436
- *encoding-prefix* is an ordinary string literal, and is initialized with
437
- the given characters.
438
 
439
- A string literal that begins with `u8`, such as `u8"asdf"`, is a UTF-8
440
- string literal.
 
 
 
 
441
 
442
  Ordinary string literals and UTF-8 string literals are also referred to
443
  as narrow string literals. A narrow string literal has type “array of
444
  *n* `const char`”, where *n* is the size of the string as defined below,
445
  and has static storage duration ([[basic.stc]]).
446
 
447
  For a UTF-8 string literal, each successive element of the object
448
  representation ([[basic.types]]) has the value of the corresponding
449
  code unit of the UTF-8 encoding of the string.
450
 
451
- A string literal that begins with `u`, such as `u"asdf"`, is a
452
  `char16_t` string literal. A `char16_t` string literal has type “array
453
  of *n* `const char16_t`”, where *n* is the size of the string as defined
454
- below; it has static storage duration and is initialized with the given
455
- characters. A single *c-char* may produce more than one `char16_t`
456
- character in the form of surrogate pairs.
457
 
458
- A string literal that begins with `U`, such as `U"asdf"`, is a
459
  `char32_t` string literal. A `char32_t` string literal has type “array
460
  of *n* `const char32_t`”, where *n* is the size of the string as defined
461
- below; it has static storage duration and is initialized with the given
462
- characters.
463
 
464
- A string literal that begins with `L`, such as `L"asdf"`, is a wide
465
- string literal. A wide string literal has type “array of *n* `const
466
- wchar_t`”, where *n* is the size of the string as defined below; it has
467
- static storage duration and is initialized with the given characters.
468
 
469
- Whether all string literals are distinct (that is, are stored in
470
- nonoverlapping objects) is *implementation-defined*. The effect of
471
- attempting to modify a string literal is undefined.
472
-
473
- In translation phase 6 ([[lex.phases]]), adjacent string literals are
474
- concatenated. If both string literals have the same *encoding-prefix*,
475
  the resulting concatenated string literal has that *encoding-prefix*. If
476
- one string literal has no *encoding-prefix*, it is treated as a string
477
- literal of the same *encoding-prefix* as the other operand. If a UTF-8
478
- string literal token is adjacent to a wide string literal token, the
479
- program is ill-formed. Any other concatenations are
480
- conditionally-supported with *implementation-defined* behavior. This
481
- concatenation is an interpretation, not a conversion. Because the
482
- interpretation happens in translation phase 6 (after each character from
483
- a literal has been translated into a value from the appropriate
484
- character set), a string literal’s initial rawness has no effect on the
485
- interpretation or well-formedness of the concatenation. Table 
486
- [[tab:lex.string.concat]] has some examples of valid concatenations.
 
 
 
 
487
 
488
  **Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
489
 
490
  | | | | | | |
491
  | -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
@@ -495,41 +573,59 @@ interpretation or well-formedness of the concatenation. Table 
495
  | `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
496
 
497
 
498
  Characters in concatenated strings are kept distinct.
499
 
 
 
500
  ``` cpp
501
  "\xA" "B"
502
  ```
503
 
504
  contains the two characters `'\xA'` and `'B'` after concatenation (and
505
  not the single hexadecimal character `'\xAB'`).
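
A compile-time check of the same example, as a minimal sketch; the helper names `first_char` and `second_char` are hypothetical.

```cpp
#include <cassert>

// Two characters plus the terminating null character: the hexadecimal
// escape ends at the literal boundary and does not absorb the 'B'.
static_assert(sizeof("\xA" "B") == 3, "two characters and a terminator");

// Hypothetical helpers that read the concatenated literal element by element.
constexpr char first_char() { return ("\xA" "B")[0]; }
constexpr char second_char() { return ("\xA" "B")[1]; }
```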
506
 
 
 
507
  After any necessary concatenation, in translation phase 7 (
508
  [[lex.phases]]), `'\0'` is appended to every string literal so that
509
  programs that scan a string can find its end.
510
 
511
- Escape sequences and universal-character-names in non-raw string
512
  literals have the same meaning as in character literals ([[lex.ccon]]),
513
  except that the single quote `'` is representable either by itself or by
514
  the escape sequence `\'`, and the double quote `"` shall be preceded by
515
- a `\`. In a narrow string literal, a universal-character-name may map to
516
- more than one `char` element due to *multibyte encoding*. The size of a
517
- `char32_t` or wide string literal is the total number of escape
518
- sequences, universal-character-names, and other characters, plus one for
519
- the terminating `U'\0'` or `L'\0'`. The size of a `char16_t` string
520
- literal is the total number of escape sequences,
521
- universal-character-names, and other characters, plus one for each
522
- character requiring a surrogate pair, plus one for the terminating
523
- `u'\0'`. The size of a `char16_t` string literal is the number of code
524
- units, not the number of characters. Within `char32_t` and `char16_t`
525
- literals, any universal-character-names shall be within the range `0x0`
526
- to `0x10FFFF`. The size of a narrow string literal is the total number
527
- of escape sequences and other characters, plus at least one for the
528
- multibyte encoding of each universal-character-name, plus one for the
 
 
 
 
 
529
  terminating `'\0'`.
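
The size rules can be observed with `sizeof`. The sketch below assumes a C++11 or later compiler and uses U+1F600 as an arbitrary code point outside the basic multilingual plane; `element_count` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of array elements in a string literal,
// including the terminating null character.
template <class T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// Narrow string literal: three characters plus the terminating '\0'.
static_assert(sizeof("abc") == 4, "3 characters + terminator");

// char32_t: one element per universal-character-name, plus U'\0'.
static_assert(element_count(U"\U0001F600") == 2, "1 code unit + terminator");

// char16_t: the same character needs a surrogate pair, so the size counts
// code units rather than characters.
static_assert(element_count(u"\U0001F600") == 3, "2 code units + terminator");

// Wide string literal: two characters plus L'\0'.
static_assert(element_count(L"ab") == 3, "2 characters + terminator");
```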
530
 
 
 
 
 
 
 
 
 
 
531
  ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
532
 
533
  ``` bnf
534
  boolean-literal:
535
  'false'
@@ -545,14 +641,17 @@ are prvalues and have type `bool`.
545
  pointer-literal:
546
  'nullptr'
547
  ```
548
 
549
  The pointer literal is the keyword `nullptr`. It is a prvalue of type
550
- `std::nullptr_t`. `std::nullptr_t` is a distinct type that is neither a
 
 
551
  pointer type nor a pointer to member type; rather, a prvalue of this
552
  type is a null pointer constant and can be converted to a null pointer
553
- value or null member pointer value. See  [[conv.ptr]] and  [[conv.mem]].
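
As an illustration of the paragraph above, the following sketch checks that `std::nullptr_t` is neither a pointer nor a pointer-to-member type, yet converts to both kinds of null value. The struct `S` and the helper `converts_to_null_values` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct S { int m; };  // hypothetical class, used only for the member pointer

static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr has type std::nullptr_t");
static_assert(!std::is_pointer<std::nullptr_t>::value, "not a pointer type");
static_assert(!std::is_member_pointer<std::nullptr_t>::value,
              "not a pointer to member type");

// nullptr converts to a null pointer value and to a null member pointer value.
inline bool converts_to_null_values() {
    int* p = nullptr;
    int S::*pm = nullptr;
    return p == nullptr && pm == nullptr;
}
```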
 
554
 
555
  ### User-defined literals <a id="lex.ext">[[lex.ext]]</a>
556
 
557
  ``` bnf
558
  user-defined-literal:
@@ -572,10 +671,12 @@ user-defined-integer-literal:
572
 
573
  ``` bnf
574
  user-defined-floating-literal:
575
  fractional-constant exponent-partₒₚₜ ud-suffix
576
  digit-sequence exponent-part ud-suffix
 
 
577
  ```
578
 
579
  ``` bnf
580
  user-defined-string-literal:
581
  string-literal ud-suffix
@@ -589,15 +690,24 @@ user-defined-character-literal:
589
  ``` bnf
590
  ud-suffix:
591
  identifier
592
  ```
593
 
594
- If a token matches both *user-defined-literal* and another literal kind,
595
- it is treated as the latter. `123_km` is a *user-defined-literal*, but
596
- `12LL` is an *integer-literal*. The syntactic non-terminal preceding the
597
- *ud-suffix* in a *user-defined-literal* is taken to be the longest
598
- sequence of characters that could match that non-terminal.
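
The distinction between `123_km` and `12LL` can be sketched as follows, assuming a C++14 or later compiler. The definition of `operator""_km` (a conversion to metres) and the helper `metres` are assumptions made for illustration; the source only names the suffix.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical literal operator: interprets the value as kilometres
// and returns metres.
constexpr unsigned long long operator""_km(unsigned long long v) {
    return v * 1000;
}

// 123_km is a user-defined-literal and calls the operator above.
static_assert(123_km == 123000, "user-defined literal");

// 12LL also matches the user-defined-literal grammar, but it is treated
// as an ordinary integer-literal with a long-long-suffix.
static_assert(std::is_same<decltype(12LL), long long>::value, "integer-literal");

constexpr unsigned long long metres() { return 123_km; }
```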
 
 
 
 
 
 
 
 
 
599
 
600
  A *user-defined-literal* is treated as a call to a literal operator or
601
  literal operator template ([[over.literal]]). To determine the form of
602
  this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
603
  the *literal-operator-id* whose literal suffix identifier is *X* is
@@ -627,13 +737,14 @@ a call of the form
627
 
628
  ``` cpp
629
  operator "" X<'c₁', 'c₂', ... 'cₖ'>()
630
  ```
631
 
632
- where *n* is the source character sequence c₁c₂...cₖ. The sequence
633
- c₁c₂...cₖ can only contain characters from the basic source character
634
- set.
 
635
 
636
  If *L* is a *user-defined-floating-literal*, let *f* be the literal
637
  without its *ud-suffix*. If *S* contains a literal operator with
638
  parameter type `long double`, the literal *L* is treated as a call of
639
  the form
@@ -655,32 +766,35 @@ a call of the form
655
 
656
  ``` cpp
657
  operator "" X<'c₁', 'c₂', ... 'cₖ'>()
658
  ```
659
 
660
- where *f* is the source character sequence c₁c₂...cₖ. The sequence
661
- c₁c₂...cₖ can only contain characters from the basic source character
662
- set.
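
The literal operator template form receives the spelling of the literal, one character per template argument. A minimal sketch, assuming a C++14 or later compiler; the suffix `_len` and the helper `spelled_length` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical literal operator template: returns the number of source
// characters in the literal, not its numeric value.
template <char... Cs>
constexpr std::size_t operator""_len() {
    return sizeof...(Cs);
}

static_assert(123_len == 3, "'1', '2', '3'");
static_assert(1.5_len == 3, "'1', '.', '5'");
static_assert(0x1F_len == 4, "'0', 'x', '1', 'F'");

constexpr std::size_t spelled_length() { return 42_len; }
```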
 
663
 
664
  If *L* is a *user-defined-string-literal*, let *str* be the literal
665
  without its *ud-suffix* and let *len* be the number of code units in
666
  *str* (i.e., its length excluding the terminating null character). The
667
  literal *L* is treated as a call of the form
668
 
669
  ``` cpp
670
- operator "" X(str{}, len{})
671
  ```
672
 
673
  If *L* is a *user-defined-character-literal*, let *ch* be the literal
674
  without its *ud-suffix*. *S* shall contain a literal operator (
675
  [[over.literal]]) whose only parameter has the type of *ch* and the
676
  literal *L* is treated as a call of the form
677
 
678
  ``` cpp
679
- operator "" X(ch{})
680
  ```
681
 
 
 
682
  ``` cpp
683
  long double operator "" _w(long double);
684
  std::string operator "" _w(const char16_t*, std::size_t);
685
  unsigned operator "" _w(const char*);
686
  int main() {
@@ -689,48 +803,47 @@ int main() {
689
  12_w; // calls operator "" _w("12")
690
  "two"_w; // error: no applicable literal operator
691
  }
692
  ```
693
 
 
 
694
  In translation phase 6 ([[lex.phases]]), adjacent string literals are
695
  concatenated and *user-defined-string-literal*s are considered string
696
  literals for that purpose. During concatenation, *ud-suffix*es are
697
  removed and ignored and the concatenation process occurs as described
698
  in  [[lex.string]]. At the end of phase 6, if a string literal is the
699
  result of a concatenation involving at least one
700
  *user-defined-string-literal*, all the participating
701
  *user-defined-string-literal*s shall have the same *ud-suffix* and that
702
  suffix is applied to the result of the concatenation.
703
 
 
 
704
  ``` cpp
705
  int main() {
706
  L"A" "B" "C"_x; // OK: same as L"ABC"_x
707
  "P"_x "Q" "R"_y;// error: two different ud-suffixes
708
  }
709
  ```
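
A compilable variant of the well-formed case, as a sketch: the definition of `operator""_x` (returning the length) is an assumption, since the example above leaves it undeclared, and `concatenated_length` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical literal operator: returns the number of code units it receives.
constexpr std::size_t operator""_x(const char*, std::size_t len) {
    return len;
}

// The suffix is applied once, to the result of the concatenation.
static_assert("A" "B" "C"_x == 3, "same as \"ABC\"_x");

// Several participating literals may carry the suffix, provided it is the same one.
static_assert("P"_x "Q" "R"_x == 3, "same as \"PQR\"_x");

constexpr std::size_t concatenated_length() { return "A" "B" "C"_x; }
```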
710
 
711
- Some *identifier*s appearing as *ud-suffix*es are reserved for future
712
- standardization ([[usrlit.suffix]]). A program containing such a
713
- *ud-suffix* is ill-formed, no diagnostic required.
714
 
715
  <!-- Link reference definitions -->
716
  [basic.fundamental]: basic.md#basic.fundamental
717
  [basic.link]: basic.md#basic.link
718
  [basic.lookup.unqual]: basic.md#basic.lookup.unqual
719
  [basic.stc]: basic.md#basic.stc
720
  [basic.types]: basic.md#basic.types
721
- [charname.allowed]: charname.md#charname.allowed
722
- [charname.disallowed]: charname.md#charname.disallowed
723
  [conv.mem]: conv.md#conv.mem
724
  [conv.ptr]: conv.md#conv.ptr
725
  [cpp]: cpp.md#cpp
726
  [cpp.concat]: cpp.md#cpp.concat
727
  [cpp.cond]: cpp.md#cpp.cond
728
  [cpp.include]: cpp.md#cpp.include
729
  [cpp.stringize]: cpp.md#cpp.stringize
730
  [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
731
- [global.names]: library.md#global.names
732
  [headers]: library.md#headers
733
  [lex]: #lex
734
  [lex.bool]: #lex.bool
735
  [lex.ccon]: #lex.ccon
736
  [lex.charset]: #lex.charset
@@ -750,23 +863,22 @@ standardization ([[usrlit.suffix]]). A program containing such a
750
  [lex.ppnumber]: #lex.ppnumber
751
  [lex.pptoken]: #lex.pptoken
752
  [lex.separate]: #lex.separate
753
  [lex.string]: #lex.string
754
  [lex.token]: #lex.token
755
- [lex.trigraph]: #lex.trigraph
756
  [over.literal]: over.md#over.literal
757
  [tab:alternative.representations]: #tab:alternative.representations
758
  [tab:alternative.tokens]: #tab:alternative.tokens
 
 
759
  [tab:escape.sequences]: #tab:escape.sequences
760
  [tab:identifiers.special]: #tab:identifiers.special
761
  [tab:keywords]: #tab:keywords
762
  [tab:lex.string.concat]: #tab:lex.string.concat
763
  [tab:lex.type.integer.literal]: #tab:lex.type.integer.literal
764
- [tab:trigraph.sequences]: #tab:trigraph.sequences
765
  [temp.explicit]: temp.md#temp.explicit
766
  [temp.names]: temp.md#temp.names
767
- [usrlit.suffix]: library.md#usrlit.suffix
768
 
769
  [^1]: Implementations must behave as if these separate phases occur,
770
  although in practice different phases might be folded together.
771
 
772
  [^2]: A partial preprocessing token would arise from a source file
@@ -781,16 +893,16 @@ standardization ([[usrlit.suffix]]). A program containing such a
781
  [^4]: The glyphs for the members of the basic source character set are
782
  intended to identify characters from the subset of ISO/IEC 10646
783
  which corresponds to the ASCII character set. However, because the
784
  mapping from source file characters to the source character set
785
  (described in translation phase 1) is specified as
786
- implementation-defined, an implementation is required to document
787
  how the basic source characters are represented in source files.
788
 
789
- [^5]: A sequence of characters resembling a universal-character-name in
790
- an *r-char-sequence* ([[lex.string]]) does not form a
791
- universal-character-name.
792
 
793
  [^6]: These include “digraphs” and additional reserved words. The term
794
  “digraph” (token consisting of two characters) is not perfectly
795
  descriptive, since one of the alternative preprocessing-tokens is
796
  `%:%:` and of course several primary tokens contain two characters.
@@ -807,14 +919,14 @@ standardization ([[usrlit.suffix]]). A program containing such a
807
  might result in an error, be interpreted as the character
808
  corresponding to the escape sequence, or have a completely different
809
  meaning, depending on the implementation.
810
 
811
  [^10]: On systems in which linkers cannot accept extended characters, an
812
- encoding of the universal-character-name may be used in forming
813
  valid external identifiers. For example, some otherwise unused
814
  character or sequence of characters may be used to encode the `\u`
815
- in a universal-character-name. Extended characters may produce a
816
  long external identifier, but C++ does not place a translation limit
817
  on significant characters for external identifiers. In C++, upper-
818
  and lower-case letters are considered different for all identifiers,
819
  including external identifiers.
820
 
@@ -824,7 +936,7 @@ standardization ([[usrlit.suffix]]). A program containing such a
824
  [^12]: The digits `8` and `9` are not octal digits.
825
 
826
  [^13]: They are intended for character sets where a character does not
827
  fit into a single byte.
828
 
829
- [^14]: Using an escape sequence for a question mark can avoid
830
- accidentally creating a trigraph.
 
44
  decimal-literal '''ₒₚₜ digit
45
  ```
46
 
47
  ``` bnf
48
  hexadecimal-literal:
49
+ hexadecimal-prefix hexadecimal-digit-sequence
 
 
50
  ```
51
 
52
  ``` bnf
53
  binary-digit:
54
  '0'
 
63
  ``` bnf
64
  nonzero-digit: one of
65
  '1 2 3 4 5 6 7 8 9'
66
  ```
67
 
68
+ ``` bnf
69
+ hexadecimal-prefix: one of
70
+ '0x 0X'
71
+ ```
72
+
73
+ ``` bnf
74
+ hexadecimal-digit-sequence:
75
+ hexadecimal-digit
76
+ hexadecimal-digit-sequence '''ₒₚₜ hexadecimal-digit
77
+ ```
78
+
79
  ``` bnf
80
  hexadecimal-digit: one of
81
  '0 1 2 3 4 5 6 7 8 9'
82
  'a b c d e f'
83
  'A B C D E F'
 
108
 
109
  An *integer literal* is a sequence of digits that has no period or
110
  exponent part, with optional separating single quotes that are ignored
111
  when determining its value. An integer literal may have a prefix that
112
  specifies its base and a suffix that specifies its type. The lexically
113
+ first digit of the sequence of digits is the most significant. A *binary
114
+ integer literal* (base two) begins with `0b` or `0B` and consists of a
115
+ sequence of binary digits. An *octal integer literal* (base eight)
116
+ begins with the digit `0` and consists of a sequence of octal
117
+ digits.[^12] A *decimal integer literal* (base ten) begins with a digit
118
+ other than `0` and consists of a sequence of decimal digits. A
119
+ *hexadecimal integer literal* (base sixteen) begins with `0x` or `0X`
120
  and consists of a sequence of hexadecimal digits, which include the
121
  decimal digits and the letters `a` through `f` and `A` through `F` with
122
+ decimal values ten through fifteen.
123
+
124
+ [*Example 1*: The number twelve can be written `12`, `014`, `0XC`, or
125
+ `0b1100`. The integer literals `1048576`, `1'048'576`, `0X100000`,
126
+ `0x10'0000`, and `0'004'000'000` all have the same
127
+ value. — *end example*]
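
The claims in Example 1 can be verified by the compiler. A minimal sketch, assuming a C++14 or later compiler (for binary literals and digit separators); `twelve_in_octal` is a hypothetical helper.

```cpp
#include <cassert>

// Decimal, octal, hexadecimal, and binary spellings of twelve.
static_assert(12 == 014 && 014 == 0XC && 0XC == 0b1100,
              "same value in four bases");

// Separating single quotes are ignored when determining the value.
static_assert(1048576 == 1'048'576, "decimal with separators");
static_assert(0X100000 == 0x10'0000, "hexadecimal with separators");
static_assert(0'004'000'000 == 1048576, "octal with separators");

constexpr int twelve_in_octal() { return 014; }
```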
128
 
129
  The type of an integer literal is the first of the corresponding list in
130
  Table  [[tab:lex.type.integer.literal]] in which its value can be
131
  represented.
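
For small values the first type in each list is always selected, which makes the effect of the suffix easy to observe. A minimal sketch, assuming a C++11 or later compiler; `literal_is_long_long` is a hypothetical helper.

```cpp
#include <cassert>
#include <type_traits>

// No suffix, decimal: the list starts with int.
static_assert(std::is_same<decltype(12), int>::value, "int");
// u suffix: the list starts with unsigned int.
static_assert(std::is_same<decltype(12u), unsigned int>::value, "unsigned int");
// L suffix: the list starts with long.
static_assert(std::is_same<decltype(12L), long>::value, "long");
// Both suffixes: unsigned long long.
static_assert(std::is_same<decltype(12ull), unsigned long long>::value,
              "unsigned long long");

constexpr bool literal_is_long_long() {
    return std::is_same<decltype(12LL), long long>::value;
}
```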
132
 
 
156
 
157
 
158
  If an integer literal cannot be represented by any type in its list and
159
  an extended integer type ([[basic.fundamental]]) can represent its
160
  value, it may have that extended integer type. If all of the types in
161
+ the list for the integer literal are signed, the extended integer type
162
+ shall be signed. If all of the types in the list for the integer literal
163
+ are unsigned, the extended integer type shall be unsigned. If the list
164
+ contains both signed and unsigned types, the extended integer type may
165
+ be signed or unsigned. A program is ill-formed if one of its translation
166
+ units contains an integer literal that cannot be represented by any of
167
+ the allowed types.
168
 
169
  ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
170
 
171
  ``` bnf
172
  character-literal:
173
+ encoding-prefixₒₚₜ ''' c-char-sequence '''
174
+ ```
175
+
176
+ ``` bnf
177
+ encoding-prefix: one of
178
+ 'u8' 'u' 'U' 'L'
179
  ```
180
 
181
  ``` bnf
182
  c-char-sequence:
183
  c-char
 
209
  '\x' hexadecimal-digit
210
  hexadecimal-escape-sequence hexadecimal-digit
211
  ```
212
 
213
  A character literal is one or more characters enclosed in single quotes,
214
+ as in `'x'`, optionally preceded by `u8`, `u`, `U`, or `L`, as in
215
+ `u8'w'`, `u'x'`, `U'y'`, or `L'z'`, respectively.
 
 
 
 
 
 
 
 
 
 
216
 
217
+ A character literal that does not begin with `u8`, `u`, `U`, or `L` is
218
+ an *ordinary character literal*. An ordinary character literal that
219
+ contains a single *c-char* representable in the execution character set
220
+ has type `char`, with value equal to the numerical value of the encoding
221
+ of the *c-char* in the execution character set. An ordinary character
222
+ literal that contains more than one *c-char* is a *multicharacter
223
+ literal*. A multicharacter literal, or an ordinary character literal
224
+ containing a single *c-char* not representable in the execution
225
+ character set, is conditionally-supported, has type `int`, and has an
226
+ *implementation-defined* value.
227
+
228
+ A character literal that begins with `u8`, such as `u8'w'`, is a
229
+ character literal of type `char`, known as a *UTF-8 character literal*.
230
+ The value of a UTF-8 character literal is equal to its ISO 10646 code
231
+ point value, provided that the code point value is representable with a
232
+ single UTF-8 code unit (that is, provided it is in the C0 Controls and
233
+ Basic Latin Unicode block). If the value is not representable with a
234
+ single UTF-8 code unit, the program is ill-formed. A UTF-8 character
235
+ literal containing multiple *c-char*s is ill-formed.
236
+
237
+ A character literal that begins with the letter `u`, such as `u'x'`, is
238
  a character literal of type `char16_t`. The value of a `char16_t`
239
+ character literal containing a single *c-char* is equal to its ISO 10646
240
+ code point value, provided that the code point is representable with a
241
+ single 16-bit code unit. (That is, provided it is a basic multi-lingual
242
+ plane code point.) If the value is not representable within 16 bits, the
243
+ program is ill-formed. A `char16_t` character literal containing
244
+ multiple *c-char*s is ill-formed.
 
 
 
 
 
 
 
 
 
 
 
 
 
245
 
246
+ A character literal that begins with the letter `U`, such as `U'y'`, is
247
+ a character literal of type `char32_t`. The value of a `char32_t`
248
+ character literal containing a single *c-char* is equal to its ISO 10646
249
+ code point value. A `char32_t` character literal containing multiple
250
+ *c-char*s is ill-formed.
251
+
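A short sketch of the `u` and `U` rules above, assuming a C++17 compiler. The helper `code_point` and the function `check_unicode_character_literals` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same_v<decltype(u'x'), char16_t>);
static_assert(std::is_same_v<decltype(U'y'), char32_t>);

// The value is the ISO 10646 code point of the single c-char.
static_assert(u'\u00E9' == 0x00E9);       // representable in one 16-bit code unit
static_assert(U'\U0001F600' == 0x1F600);  // beyond 16 bits, so it needs the U prefix

// Illustrative helper: the code point of a char32_t value as an integer.
constexpr unsigned long code_point(char32_t c) {
    return static_cast<unsigned long>(c);
}

inline void check_unicode_character_literals() {
    assert(code_point(U'y') == 0x79);
    assert(code_point(U'\U0001F600') == 0x1F600);
}
```

Writing `u'\U0001F600'` instead would be ill-formed, because that code point does not fit in a single 16-bit code unit.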
252
+ A character literal that begins with the letter `L`, such as `L'z'`, is
253
+ a *wide-character literal*. A wide-character literal has type
254
+ `wchar_t`.[^13] A wide-character literal containing a
255
+ single *c-char* has value equal to the numerical value of the encoding
256
+ of the *c-char* in the execution wide-character set, unless the *c-char*
257
+ has no representation in the execution wide-character set, in which case
258
+ the value is *implementation-defined*.
259
+
260
+ [*Note 1*: The type `wchar_t` is able to represent all members of the
261
+ execution wide-character set (see 
262
+ [[basic.fundamental]]). — *end note*]
263
+
264
+ The value of a wide-character literal containing multiple *c-char*s is
265
+ *implementation-defined*.
266
+
267
+ Certain non-graphic characters, the single quote `'`, the double quote
268
  `"`, the question mark `?`,[^14] and the backslash `\`, can be
269
  represented according to Table  [[tab:escape.sequences]]. The double
270
  quote `"` and the question mark `?` can be represented as themselves or
271
  by the escape sequences `\"` and `\?` respectively, but the single quote
272
  `'` and the backslash `\` shall be represented by the escape sequences
 
301
  that are taken to specify the value of the desired character. There is
302
  no limit to the number of digits in a hexadecimal sequence. A sequence
303
  of octal or hexadecimal digits is terminated by the first character that
304
  is not an octal digit or a hexadecimal digit, respectively. The value of
305
  a character literal is *implementation-defined* if it falls outside of
306
+ the *implementation-defined* range defined for `char` (for character
307
+ literals with no prefix) or `wchar_t` (for character literals prefixed
308
+ by `L`).
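The termination rule for numeric escape sequences is easy to get wrong, so here is a small sketch assuming a C++17 compiler. The function name `check_escape_sequences` is illustrative only.

```cpp
#include <cassert>

// A hexadecimal and an octal escape that specify the same value (65).
static_assert('\x41' == 0x41);
static_assert('\101' == 0101);
static_assert('\x41' == '\101');

// A hexadecimal escape ends at the first character that is not a hexadecimal
// digit, so "\x41g" holds '\x41', 'g', and the terminating '\0'.
static_assert(sizeof("\x41g") == 3);

inline void check_escape_sequences() {
    const char s[] = "\x41g";
    assert(s[0] == 0x41);
    assert(s[1] == 'g');
    assert(s[2] == '\0');
}
```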
 
309
 
310
+ [*Note 2*: If the value of a character literal prefixed by `u`, `u8`,
311
+ or `U` is outside the range defined for its type, the program is
312
+ ill-formed. — *end note*]
313
+
314
+ A *universal-character-name* is translated to the encoding, in the
315
  appropriate execution character set, of the character named. If there is
316
+ no such encoding, the *universal-character-name* is translated to an
317
+ *implementation-defined* encoding.
318
+
319
+ [*Note 3*: In translation phase 1, a *universal-character-name* is
320
+ introduced whenever an actual extended character is encountered in the
321
+ source text. Therefore, all extended characters are described in terms
322
+ of *universal-character-name*s. However, the actual compiler
323
+ implementation may use its own native character set, so long as the same
324
+ results are obtained. — *end note*]
325
 
326
  ### Floating literals <a id="lex.fcon">[[lex.fcon]]</a>
327
 
328
  ``` bnf
329
  floating-literal:
330
+ decimal-floating-literal
331
+ hexadecimal-floating-literal
332
+ ```
333
+
334
+ ``` bnf
335
+ decimal-floating-literal:
336
  fractional-constant exponent-partₒₚₜ floating-suffixₒₚₜ
337
  digit-sequence exponent-part floating-suffixₒₚₜ
338
  ```
339
 
340
+ ``` bnf
341
+ hexadecimal-floating-literal:
342
+ hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-suffixₒₚₜ
343
+ hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-suffixₒₚₜ
344
+ ```
345
+
346
  ``` bnf
347
  fractional-constant:
348
  digit-sequenceₒₚₜ '.' digit-sequence
349
  digit-sequence '.'
350
  ```
351
 
352
+ ``` bnf
353
+ hexadecimal-fractional-constant:
354
+ hexadecimal-digit-sequenceₒₚₜ '.' hexadecimal-digit-sequence
355
+ hexadecimal-digit-sequence '.'
356
+ ```
357
+
358
  ``` bnf
359
  exponent-part:
360
  'e' signₒₚₜ digit-sequence
361
  'E' signₒₚₜ digit-sequence
362
  ```
363
 
364
+ ``` bnf
365
+ binary-exponent-part:
366
+ 'p' signₒₚₜ digit-sequence
367
+ 'P' signₒₚₜ digit-sequence
368
+ ```
369
+
370
  ``` bnf
371
  sign: one of
372
  '+ -'
373
  ```
374
 
 
381
  ``` bnf
382
  floating-suffix: one of
383
  'f l F L'
384
  ```
385
 
386
+ A floating literal consists of an optional prefix specifying a base, an
387
+ integer part, a radix point, a fraction part, an `e`, `E`, `p` or `P`,
388
+ an optionally signed integer exponent, and an optional type suffix. The
389
+ integer and fraction parts both consist of a sequence of decimal (base
390
+ ten) digits if there is no prefix, or hexadecimal (base sixteen) digits
391
+ if the prefix is `0x` or `0X`. The floating literal is a *decimal
392
+ floating literal* in the former case and a *hexadecimal floating
393
+ literal* in the latter case. Optional separating single quotes in a
394
+ *digit-sequence* or *hexadecimal-digit-sequence* are ignored when
395
+ determining its value.
396
+
397
+ [*Example 1*: The floating literals `1.602'176'565e-19` and
398
+ `1.602176565e-19` have the same value. *end example*]
399
+
400
+ Either the integer part or the fraction part (not both) can be omitted.
401
+ Either the radix point or the letter `e` or `E` and the exponent (not
402
+ both) can be omitted from a decimal floating literal. The radix point
403
+ (but not the exponent) can be omitted from a hexadecimal floating
404
+ literal. The integer part, the optional radix point, and the optional
405
+ fraction part, form the *significand* of the floating literal. In a
406
+ decimal floating literal, the exponent, if present, indicates the power
407
+ of 10 by which the significand is to be scaled. In a hexadecimal
408
+ floating literal, the exponent indicates the power of 2 by which the
409
+ significand is to be scaled.
410
+
411
+ [*Example 2*: The floating literals `49.625` and `0xC.68p+2` have the
412
+ same value. — *end example*]
413
+
414
+ If the scaled value is in the range of representable values for its
415
+ type, the result is the scaled value if representable, else the larger
416
+ or smaller representable value nearest the scaled value, chosen in an
417
+ *implementation-defined* manner. The type of a floating literal is
418
+ `double` unless explicitly specified by a suffix. The suffixes `f` and
419
+ `F` specify `float`; the suffixes `l` and `L` specify `long double`.
420
+ If the scaled value is not in the range of representable values for its
421
+ type, the program is ill-formed.
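The scaling and suffix rules above can be verified directly. This is a minimal sketch, assuming a C++17 compiler (hexadecimal floating literals and digit separators are required); `check_floating_literals` is an illustrative name.

```cpp
#include <cassert>
#include <type_traits>

// The suffix selects the type; no suffix means double.
static_assert(std::is_same_v<decltype(1.5), double>);
static_assert(std::is_same_v<decltype(1.5f), float>);
static_assert(std::is_same_v<decltype(1.5L), long double>);

// 0xC.68p+2 = (12 + 6/16 + 8/256) * 2^2 = 12.40625 * 4 = 49.625
static_assert(0xC.68p+2 == 49.625);

// The radix point may be omitted from a hexadecimal floating literal.
static_assert(0x1p-1 == 0.5);

// Separating single quotes do not affect the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19);

inline void check_floating_literals() {
    assert(1e2 == 100.0);  // radix point omitted, exponent present
    assert(.5 == 0.5);     // integer part omitted
    assert(0xC.68p+2 == 49.625);
}
```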
422
 
423
  ### String literals <a id="lex.string">[[lex.string]]</a>
424
 
425
  ``` bnf
426
  string-literal:
427
  encoding-prefixₒₚₜ '"' s-char-sequenceₒₚₜ '"'
428
  encoding-prefixₒₚₜ 'R' raw-string
429
  ```
430
 
 
 
 
 
 
 
 
 
431
  ``` bnf
432
  s-char-sequence:
433
  s-char
434
  s-char-sequence s-char
435
  ```
 
449
  d-char-sequence:
450
  d-char
451
  d-char-sequence d-char
452
  ```
453
 
454
+ A *string-literal* is a sequence of characters (as defined in 
455
  [[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
456
  `u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
457
  `R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
458
  `U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
459
 
460
+ A *string-literal* that has an `R` in the prefix is a *raw string
461
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
462
  *d-char-sequence* of a *raw-string* is the same sequence of characters
463
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
464
  at most 16 characters.
465
 
466
+ [*Note 1*: The characters `'('` and `')'` are permitted in a
467
+ *raw-string*. Thus, `R"delimiter((a|b))delimiter"` is equivalent to
468
+ `"(a|b)"`. — *end note*]
469
+
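The equivalence in the note can be checked mechanically. The sketch below assumes a C++17 compiler with a `constexpr`-capable `std::string_view`; the function name `check_raw_strings` is illustrative.

```cpp
#include <cassert>
#include <string_view>

// Parentheses inside the raw string are ordinary characters; only the
// sequence )delimiter" ends the literal.
static_assert(std::string_view(R"delimiter((a|b))delimiter") == "(a|b)");

// A backslash is not an escape character inside a raw string.
static_assert(std::string_view(R"(\n)") == "\\n");
static_assert(sizeof(R"(\n)") == 3);  // '\\', 'n', and the terminating '\0'

inline void check_raw_strings() {
    // With a delimiter, a double quote can appear unescaped.
    assert(std::string_view(R"x(a"b)x") == "a\"b");
}
```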
470
+ [*Note 2*:
471
 
472
  A source-file new-line in a raw string literal results in a new-line in
473
+ the resulting execution string literal. Assuming no whitespace at the
474
  beginning of lines in the following example, the assert will succeed:
475
 
476
  ``` cpp
477
  const char* p = R"(a\
478
  b
479
  c)";
480
  assert(std::strcmp(p, "a\\\nb\nc") == 0);
481
  ```
482
 
483
+ — *end note*]
484
+
485
+ [*Example 1*:
486
+
487
  The raw string
488
 
489
  ``` cpp
490
  R"a(
491
  )\
 
507
  )#"
508
  ```
509
 
510
  is equivalent to `"\n)\?\?=\"\n"`.
511
 
512
+ *end example*]
 
 
513
 
514
+ After translation phase 6, a *string-literal* that does not begin with
515
+ an *encoding-prefix* is an *ordinary string literal*, and is initialized
516
+ with the given characters.
517
+
518
+ A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
519
+ *UTF-8 string literal*.
520
 
521
  Ordinary string literals and UTF-8 string literals are also referred to
522
  as narrow string literals. A narrow string literal has type “array of
523
  *n* `const char`”, where *n* is the size of the string as defined below,
524
  and has static storage duration ([[basic.stc]]).
525
 
526
  For a UTF-8 string literal, each successive element of the object
527
  representation ([[basic.types]]) has the value of the corresponding
528
  code unit of the UTF-8 encoding of the string.
529
 
530
+ A *string-literal* that begins with `u`, such as `u"asdf"`, is a
531
  `char16_t` string literal. A `char16_t` string literal has type “array
532
  of *n* `const char16_t`”, where *n* is the size of the string as defined
533
+ below; it is initialized with the given characters. A single *c-char*
534
+ may produce more than one `char16_t` character in the form of surrogate
535
+ pairs.
536
 
537
+ A *string-literal* that begins with `U`, such as `U"asdf"`, is a
538
  `char32_t` string literal. A `char32_t` string literal has type “array
539
  of *n* `const char32_t`”, where *n* is the size of the string as defined
540
+ below; it is initialized with the given characters.
 
541
 
542
+ A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
543
+ string literal*. A wide string literal has type “array of *n* `const
544
+ wchar_t`”, where *n* is the size of the string as defined below; it is
545
+ initialized with the given characters.
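The array types described above can be observed with `decltype`. This is a sketch assuming a C++17 compiler; `element_count` and `check_string_literal_types` are hypothetical helper names. The element type of a `u8` literal is deliberately not asserted here, since only its size is needed for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the array.
static_assert(std::is_same_v<decltype("asdf"),  const char (&)[5]>);
static_assert(std::is_same_v<decltype(u"asdf"), const char16_t (&)[5]>);
static_assert(std::is_same_v<decltype(U"asdf"), const char32_t (&)[5]>);
static_assert(std::is_same_v<decltype(L"asdf"), const wchar_t (&)[5]>);

// Illustrative helper: number of array elements, terminator included.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

inline void check_string_literal_types() {
    assert(element_count("asdf") == 5);
    assert(element_count(u8"asdf") == 5);
    assert(element_count(L"asdf") == 5);
}
```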
546
 
547
+ In translation phase 6 ([[lex.phases]]), adjacent *string-literal*s are
548
+ concatenated. If both *string-literal*s have the same *encoding-prefix*,
 
 
 
 
549
  the resulting concatenated string literal has that *encoding-prefix*. If
550
+ one *string-literal* has no *encoding-prefix*, it is treated as a
551
+ *string-literal* of the same *encoding-prefix* as the other operand. If
552
+ a UTF-8 string literal token is adjacent to a wide string literal token,
553
+ the program is ill-formed. Any other concatenations are
554
+ conditionally-supported with *implementation-defined* behavior.
555
+
556
+ [*Note 3*: This concatenation is an interpretation, not a conversion.
557
+ Because the interpretation happens in translation phase 6 (after each
558
+ character from a string literal has been translated into a value from
559
+ the appropriate character set), a *string-literal*’s initial rawness has
560
+ no effect on the interpretation or well-formedness of the
561
+ concatenation. — *end note*]
562
+
563
+ Table  [[tab:lex.string.concat]] has some examples of valid
564
+ concatenations.
565
 
566
  **Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
567
 
568
  | | | | | | |
569
  | -------------------------- | ----- | -------------------------- | ----- | -------------------------- | ----- |
 
573
  | `"a"` | `u"b"` | `u"ab"` | `"a"` | `U"b"` | `U"ab"` | `"a"` | `L"b"` | `L"ab"` |
574
 
575
 
576
  Characters in concatenated strings are kept distinct.
577
 
578
+ [*Example 2*:
579
+
580
  ``` cpp
581
  "\xA" "B"
582
  ```
583
 
584
  contains the two characters `'\xA'` and `'B'` after concatenation (and
585
  not the single hexadecimal character `'\xAB'`).
586
 
587
+ — *end example*]
588
+
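The concatenation rules can be confirmed at compile time. The following sketch assumes a C++17 compiler; `check_concatenation` is an illustrative name.

```cpp
#include <cassert>
#include <type_traits>

// An unprefixed literal takes the encoding-prefix of its neighbor.
static_assert(std::is_same_v<decltype("a" u"b"), const char16_t (&)[3]>);
static_assert(std::is_same_v<decltype(U"a" "b"), const char32_t (&)[3]>);
static_assert(std::is_same_v<decltype("a" L"b"), const wchar_t (&)[3]>);

// Characters are kept distinct: '\xA', 'B', and the terminating '\0'.
static_assert(sizeof("\xA" "B") == 3);

inline void check_concatenation() {
    const char s[] = "\xA" "B";
    assert(s[0] == '\xA');
    assert(s[1] == 'B');
    assert(s[2] == '\0');
}
```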
589
  After any necessary concatenation, in translation phase 7 (
590
  [[lex.phases]]), `'\0'` is appended to every string literal so that
591
  programs that scan a string can find its end.
592
 
593
+ Escape sequences and *universal-character-name*s in non-raw string
594
  literals have the same meaning as in character literals ([[lex.ccon]]),
595
  except that the single quote `'` is representable either by itself or by
596
  the escape sequence `\'`, and the double quote `"` shall be preceded by
597
+ a `\`, and except that a *universal-character-name* in a `char16_t`
598
+ string literal may yield a surrogate pair. In a narrow string literal, a
599
+ *universal-character-name* may map to more than one `char` element due
600
+ to *multibyte encoding*. The size of a `char32_t` or wide string literal
601
+ is the total number of escape sequences, *universal-character-name*s,
602
+ and other characters, plus one for the terminating `U'\0'` or `L'\0'`.
603
+ The size of a `char16_t` string literal is the total number of escape
604
+ sequences, *universal-character-name*s, and other characters, plus one
605
+ for each character requiring a surrogate pair, plus one for the
606
+ terminating `u'\0'`.
607
+
608
+ [*Note 4*: The size of a `char16_t` string literal is the number of
609
+ code units, not the number of characters. *end note*]
610
+
611
+ Within `char32_t` and `char16_t` string literals, any
612
+ *universal-character-name*s shall be within the range `0x0` to
613
+ `0x10FFFF`. The size of a narrow string literal is the total number of
614
+ escape sequences and other characters, plus at least one for the
615
+ multibyte encoding of each *universal-character-name*, plus one for the
616
  terminating `'\0'`.
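A worked illustration of the size rules, assuming a C++17 compiler whose narrow execution character set can encode U+00E9. The helper `element_count` and the function `check_string_literal_sizes` are hypothetical names. The narrow case is checked only against a lower bound, because the exact size depends on the execution character set.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative helper: number of array elements, terminator included.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// char32_t: one element per character, plus the terminator.
static_assert(element_count(U"a\U0001F600") == 3);

// char16_t: U+1F600 requires a surrogate pair, so it adds two code units.
static_assert(element_count(u"a\U0001F600") == 4);

// Narrow: at least one element for the universal-character-name,
// plus the terminating '\0'.
static_assert(element_count("\u00E9") >= 2);

inline void check_string_literal_sizes() {
    assert(element_count(u"\U0001F600") == 3);
    assert(element_count(U"\U0001F600") == 2);
}
```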
617
 
618
+ Evaluating a *string-literal* results in a string literal object with
619
+ static storage duration, initialized from the given characters as
620
+ specified above. Whether all string literals are distinct (that is, are
621
+ stored in nonoverlapping objects) and whether successive evaluations of
622
+ a *string-literal* yield the same or a different object is unspecified.
623
+
624
+ [*Note 5*: The effect of attempting to modify a string literal is
625
+ undefined. — *end note*]
626
+
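Because modifying a string literal is undefined, code that needs a mutable buffer usually copies the literal into an array first. A minimal sketch, with `check_modifiable_copy` as an illustrative name:

```cpp
#include <cassert>
#include <cstring>

inline void check_modifiable_copy() {
    const char* literal = "abc";  // refers to the string literal object; never write through it
    char copy[] = "abc";          // a separate array initialized from the literal
    copy[0] = 'x';                // fine: only the copy is modified
    assert(std::strcmp(copy, "xbc") == 0);
    assert(std::strcmp(literal, "abc") == 0);
}
```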
627
  ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
628
 
629
  ``` bnf
630
  boolean-literal:
631
  'false'
 
641
  pointer-literal:
642
  'nullptr'
643
  ```
644
 
645
  The pointer literal is the keyword `nullptr`. It is a prvalue of type
646
+ `std::nullptr_t`.
647
+
648
+ [*Note 1*: `std::nullptr_t` is a distinct type that is neither a
649
  pointer type nor a pointer to member type; rather, a prvalue of this
650
  type is a null pointer constant and can be converted to a null pointer
651
+ value or null member pointer value. See  [[conv.ptr]] and 
652
+ [[conv.mem]]. — *end note*]
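The distinction drawn in the note shows up in type traits and overload resolution. This sketch assumes a C++17 compiler; `Widget`, `which`, and `check_nullptr` are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

static_assert(std::is_same_v<decltype(nullptr), std::nullptr_t>);
static_assert(!std::is_pointer_v<std::nullptr_t>);
static_assert(!std::is_member_pointer_v<std::nullptr_t>);

struct Widget { int value; };  // hypothetical type for the member-pointer case

// Two overloads that tell a pointer argument apart from an integer one.
constexpr int which(int)  { return 1; }
constexpr int which(int*) { return 2; }

inline void check_nullptr() {
    int* p = nullptr;           // converts to a null pointer value
    int Widget::* m = nullptr;  // converts to a null member pointer value
    assert(p == nullptr);
    assert(m == nullptr);
    assert(which(nullptr) == 2);  // nullptr does not convert to int
    assert(which(0) == 1);
}
```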
653
 
654
  ### User-defined literals <a id="lex.ext">[[lex.ext]]</a>
655
 
656
  ``` bnf
657
  user-defined-literal:
 
671
 
672
  ``` bnf
673
  user-defined-floating-literal:
674
  fractional-constant exponent-partₒₚₜ ud-suffix
675
  digit-sequence exponent-part ud-suffix
676
+ hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
677
+ hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
678
  ```
679
 
680
  ``` bnf
681
  user-defined-string-literal:
682
  string-literal ud-suffix
 
690
  ``` bnf
691
  ud-suffix:
692
  identifier
693
  ```
694
 
695
+ If a token matches both *user-defined-literal* and another *literal*
696
+ kind, it is treated as the latter.
697
+
698
+ [*Example 1*:
699
+
700
+ `123_km`
701
+
702
+ is a *user-defined-literal*, but `12LL` is an *integer-literal*.
703
+
704
+ — *end example*]
705
+
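To make the example concrete, the sketch below declares a literal operator for the suffix `_km`. It assumes a C++17 compiler; the type `Kilometers` and the function `check_user_defined_integer_literal` are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

struct Kilometers { unsigned long long value; };  // hypothetical unit type

constexpr Kilometers operator""_km(unsigned long long v) {
    return Kilometers{v};
}

// 123_km matches only user-defined-literal, so the operator is called.
static_assert(std::is_same_v<decltype(123_km), Kilometers>);

// 12LL also matches integer-literal, which takes precedence.
static_assert(std::is_same_v<decltype(12LL), long long>);

inline void check_user_defined_integer_literal() {
    assert((123_km).value == 123);
}
```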
706
+ The syntactic non-terminal preceding the *ud-suffix* in a
707
+ *user-defined-literal* is taken to be the longest sequence of characters
708
+ that could match that non-terminal.
709
 
710
  A *user-defined-literal* is treated as a call to a literal operator or
711
  literal operator template ([[over.literal]]). To determine the form of
712
  this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
713
  the *literal-operator-id* whose literal suffix identifier is *X* is
 
737
 
738
  ``` cpp
739
  operator "" X<'c₁', 'c₂', ... 'cₖ'>()
740
  ```
741
 
742
+ where *n* is the source character sequence c₁c₂...cₖ.
743
+
744
+ [*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
745
+ basic source character set. — *end note*]
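A literal operator template receives the spelling of the literal one character at a time. The sketch below, assuming a C++17 compiler (fold expressions are used), interprets the digits as binary. The suffix `_b` and the function `check_literal_operator_template` are illustrative choices.

```cpp
#include <cassert>

// 101_b is treated as the call operator""_b<'1', '0', '1'>().
template <char... Cs>
constexpr unsigned operator""_b() {
    static_assert(((Cs == '0' || Cs == '1') && ...),
                  "only binary digits are accepted");
    unsigned value = 0;
    // Fold over the comma operator: digits are processed left to right.
    ((value = value * 2 + static_cast<unsigned>(Cs - '0')), ...);
    return value;
}

static_assert(101_b == 5);

inline void check_literal_operator_template() {
    assert(1111_b == 15);
    assert(0_b == 0);
}
```

Because the digits arrive as template arguments, a literal such as `102_b` is rejected at compile time by the `static_assert`.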
746
 
747
  If *L* is a *user-defined-floating-literal*, let *f* be the literal
748
  without its *ud-suffix*. If *S* contains a literal operator with
749
  parameter type `long double`, the literal *L* is treated as a call of
750
  the form
 
766
 
767
  ``` cpp
768
  operator "" X<'c₁', 'c₂', ... 'cₖ'>()
769
  ```
770
 
771
+ where *f* is the source character sequence c₁c₂...cₖ.
772
+
773
+ [*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
774
+ basic source character set. — *end note*]
775
 
776
  If *L* is a *user-defined-string-literal*, let *str* be the literal
777
  without its *ud-suffix* and let *len* be the number of code units in
778
  *str* (i.e., its length excluding the terminating null character). The
779
  literal *L* is treated as a call of the form
780
 
781
  ``` cpp
782
+ operator "" X(str, len)
783
  ```
784
 
785
  If *L* is a *user-defined-character-literal*, let *ch* be the literal
786
  without its *ud-suffix*. *S* shall contain a literal operator (
787
  [[over.literal]]) whose only parameter has the type of *ch* and the
788
  literal *L* is treated as a call of the form
789
 
790
  ``` cpp
791
+ operator "" X(ch)
792
  ```
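The string and character forms can be sketched as follows, assuming a C++17 compiler. The suffixes `_len` and `_code` and the function `check_string_and_character_literals` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// String form: receives the string and its length without the terminator.
constexpr std::size_t operator""_len(const char*, std::size_t len) {
    return len;
}

// Character form: the only parameter has the type of the literal.
constexpr int operator""_code(char ch) {
    return static_cast<int>(ch);
}

static_assert("hello"_len == 5);  // operator""_len("hello", 5)
static_assert('A'_code == 'A');   // operator""_code('A')

inline void check_string_and_character_literals() {
    assert("two"_len == 3);
    assert(""_len == 0);
}
```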
793
 
794
+ [*Example 2*:
795
+
796
  ``` cpp
797
  long double operator "" _w(long double);
798
  std::string operator "" _w(const char16_t*, std::size_t);
799
  unsigned operator "" _w(const char*);
800
  int main() {
 
803
  12_w; // calls operator "" _w("12")
804
  "two"_w; // error: no applicable literal operator
805
  }
806
  ```
807
 
808
+ — *end example*]
809
+
810
  In translation phase 6 ([[lex.phases]]), adjacent string literals are
811
  concatenated and *user-defined-string-literal*s are considered string
812
  literals for that purpose. During concatenation, *ud-suffix*es are
813
  removed and ignored and the concatenation process occurs as described
814
  in  [[lex.string]]. At the end of phase 6, if a string literal is the
815
  result of a concatenation involving at least one
816
  *user-defined-string-literal*, all the participating
817
  *user-defined-string-literal*s shall have the same *ud-suffix* and that
818
  suffix is applied to the result of the concatenation.
819
 
820
+ [*Example 3*:
821
+
822
  ``` cpp
823
  int main() {
824
  L"A" "B" "C"_x; // OK: same as L"ABC"_x
825
  "P"_x "Q" "R"_y; // error: two different ud-suffixes
826
  }
827
  ```
828
 
829
+ *end example*]
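A compilable variant of the rule above, assuming a C++17 compiler. The suffix `_len` and the function `check_udl_concatenation` are illustrative names. The suffix is applied once, to the concatenated result.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t operator""_len(const char*, std::size_t len) {
    return len;
}
constexpr std::size_t operator""_len(const wchar_t*, std::size_t len) {
    return len;
}

static_assert("ab" "cd"_len == 4);      // same as "abcd"_len
static_assert("ab"_len "cd"_len == 4);  // both suffixes are the same
static_assert(L"A" "B" "C"_len == 3);   // same as L"ABC"_len

inline void check_udl_concatenation() {
    assert("x" "yz"_len == 3);
}
```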
 
 
830
 
831
  <!-- Link reference definitions -->
832
  [basic.fundamental]: basic.md#basic.fundamental
833
  [basic.link]: basic.md#basic.link
834
  [basic.lookup.unqual]: basic.md#basic.lookup.unqual
835
  [basic.stc]: basic.md#basic.stc
836
  [basic.types]: basic.md#basic.types
 
 
837
  [conv.mem]: conv.md#conv.mem
838
  [conv.ptr]: conv.md#conv.ptr
839
  [cpp]: cpp.md#cpp
840
  [cpp.concat]: cpp.md#cpp.concat
841
  [cpp.cond]: cpp.md#cpp.cond
842
  [cpp.include]: cpp.md#cpp.include
843
  [cpp.stringize]: cpp.md#cpp.stringize
844
  [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
 
845
  [headers]: library.md#headers
846
  [lex]: #lex
847
  [lex.bool]: #lex.bool
848
  [lex.ccon]: #lex.ccon
849
  [lex.charset]: #lex.charset
 
863
  [lex.ppnumber]: #lex.ppnumber
864
  [lex.pptoken]: #lex.pptoken
865
  [lex.separate]: #lex.separate
866
  [lex.string]: #lex.string
867
  [lex.token]: #lex.token
 
868
  [over.literal]: over.md#over.literal
869
  [tab:alternative.representations]: #tab:alternative.representations
870
  [tab:alternative.tokens]: #tab:alternative.tokens
871
+ [tab:charname.allowed]: #tab:charname.allowed
872
+ [tab:charname.disallowed]: #tab:charname.disallowed
873
  [tab:escape.sequences]: #tab:escape.sequences
874
  [tab:identifiers.special]: #tab:identifiers.special
875
  [tab:keywords]: #tab:keywords
876
  [tab:lex.string.concat]: #tab:lex.string.concat
877
  [tab:lex.type.integer.literal]: #tab:lex.type.integer.literal
 
878
  [temp.explicit]: temp.md#temp.explicit
879
  [temp.names]: temp.md#temp.names
 
880
 
881
  [^1]: Implementations must behave as if these separate phases occur,
882
  although in practice different phases might be folded together.
883
 
884
  [^2]: A partial preprocessing token would arise from a source file
 
893
  [^4]: The glyphs for the members of the basic source character set are
894
  intended to identify characters from the subset of ISO/IEC 10646
895
  which corresponds to the ASCII character set. However, because the
896
  mapping from source file characters to the source character set
897
  (described in translation phase 1) is specified as
898
+ *implementation-defined*, an implementation is required to document
899
  how the basic source characters are represented in source files.
900
 
901
+ [^5]: A sequence of characters resembling a *universal-character-name*
902
+ in an *r-char-sequence* ([[lex.string]]) does not form a
903
+ *universal-character-name*.
904
 
905
  [^6]: These include “digraphs” and additional reserved words. The term
906
  “digraph” (token consisting of two characters) is not perfectly
907
  descriptive, since one of the alternative preprocessing-tokens is
908
  `%:%:` and of course several primary tokens contain two characters.
 
919
  might result in an error, be interpreted as the character
920
  corresponding to the escape sequence, or have a completely different
921
  meaning, depending on the implementation.
922
 
923
  [^10]: On systems in which linkers cannot accept extended characters, an
924
+ encoding of the *universal-character-name* may be used in forming
925
  valid external identifiers. For example, some otherwise unused
926
  character or sequence of characters may be used to encode the `\u`
927
+ in a *universal-character-name*. Extended characters may produce a
928
  long external identifier, but C++ does not place a translation limit
929
  on significant characters for external identifiers. In C++, upper-
930
  and lower-case letters are considered different for all identifiers,
931
  including external identifiers.
932
 
 
936
  [^12]: The digits `8` and `9` are not octal digits.
937
 
938
  [^13]: Wide-character literals are intended for character sets where a character does not
939
  fit into a single byte.
940
 
941
+ [^14]: Using an escape sequence for a question mark is supported for
942
+ compatibility with ISO C++14 and ISO C.