From Jason Turner

[lex.literal]

Files changed (1)
  1. tmp/tmpud1anggv/{from.md → to.md} +288 -244
tmp/tmpud1anggv/{from.md → to.md} RENAMED
@@ -6,11 +6,11 @@ There are several kinds of literals.[^11]
6
 
7
  ``` bnf
8
  literal:
9
  integer-literal
10
  character-literal
11
- floating-literal
12
  string-literal
13
  boolean-literal
14
  pointer-literal
15
  user-defined-literal
16
  ```
@@ -48,13 +48,12 @@ decimal-literal:
48
  hexadecimal-literal:
49
  hexadecimal-prefix hexadecimal-digit-sequence
50
  ```
51
 
52
  ``` bnf
53
- binary-digit:
54
- '0'
55
- '1'
56
  ```
57
 
58
  ``` bnf
59
  octal-digit: one of
60
  '0 1 2 3 4 5 6 7'
@@ -104,38 +103,44 @@ long-suffix: one of
104
  ``` bnf
105
  long-long-suffix: one of
106
  'll LL'
107
  ```
108
 
109
- An *integer literal* is a sequence of digits that has no period or
110
- exponent part, with optional separating single quotes that are ignored
111
- when determining its value. An integer literal may have a prefix that
112
- specifies its base and a suffix that specifies its type. The lexically
113
- first digit of the sequence of digits is the most significant. A *binary
114
- integer literal* (base two) begins with `0b` or `0B` and consists of a
115
- sequence of binary digits. An *octal integer literal* (base eight)
116
- begins with the digit `0` and consists of a sequence of octal
117
- digits.[^12] A *decimal integer literal* (base ten) begins with a digit
118
- other than `0` and consists of a sequence of decimal digits. A
119
- *hexadecimal integer literal* (base sixteen) begins with `0x` or `0X`
120
- and consists of a sequence of hexadecimal digits, which include the
121
- decimal digits and the letters `a` through `f` and `A` through `F` with
 
 
 
 
 
 
122
  decimal values ten through fifteen.
123
 
124
  [*Example 1*: The number twelve can be written `12`, `014`, `0XC`, or
125
- `0b1100`. The integer literals `1048576`, `1'048'576`, `0X100000`,
126
  `0x10'0000`, and `0'004'000'000` all have the same
127
  value. — *end example*]
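
The equalities claimed in Example 1 can be checked at compile time. The following is a minimal sketch, not part of the quoted text; the variable names are hypothetical, and it assumes a compiler with C++14 binary literals and digit separators.

```cpp
// Four spellings of twelve, one per base.
constexpr int twelve_decimal = 12;
constexpr int twelve_octal   = 014;
constexpr int twelve_hex     = 0XC;
constexpr int twelve_binary  = 0b1100;

static_assert(twelve_decimal == twelve_octal,  "octal 014 is twelve");
static_assert(twelve_decimal == twelve_hex,    "hexadecimal 0XC is twelve");
static_assert(twelve_decimal == twelve_binary, "binary 0b1100 is twelve");

// Single quotes are separators only; they do not change the value.
// long is used because int is only guaranteed to hold 16 bits.
constexpr long mega_plain     = 1048576;
constexpr long mega_separated = 1'048'576;
constexpr long mega_hex       = 0X100000;
constexpr long mega_hex_sep   = 0x10'0000;
constexpr long mega_octal     = 0'004'000'000;

static_assert(mega_plain == mega_separated, "separators are ignored");
static_assert(mega_plain == mega_hex,       "0X100000 is 2^20");
static_assert(mega_plain == mega_hex_sep,   "separators are ignored in hex");
static_assert(mega_plain == mega_octal,     "04000000 octal is 2^20");
```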
128
 
129
- The type of an integer literal is the first of the corresponding list in
130
- Table  [[tab:lex.type.integer.literal]] in which its value can be
131
- represented.
132
 
133
- **Table: Types of integer literals** <a id="tab:lex.type.integer.literal">[tab:lex.type.integer.literal]</a>
134
 
135
- | | | |
136
- | ---------------- | ------------------------ | ------------------------ |
137
  | none | `int` | `int` |
138
  | | `long int` | `unsigned int` |
139
  | | `long long int` | `long int` |
140
  | | | `unsigned long int` |
141
  | | | `long long int` |
@@ -153,20 +158,20 @@ represented.
153
  | | | `unsigned long long int` |
154
  | Both `u` or `U` | `unsigned long long int` | `unsigned long long int` |
155
  | and `ll` or `LL` | | |
156
 
157
 
158
- If an integer literal cannot be represented by any type in its list and
159
- an extended integer type ([[basic.fundamental]]) can represent its
160
  value, it may have that extended integer type. If all of the types in
161
- the list for the integer literal are signed, the extended integer type
162
- shall be signed. If all of the types in the list for the integer literal
163
- are unsigned, the extended integer type shall be unsigned. If the list
164
- contains both signed and unsigned types, the extended integer type may
165
- be signed or unsigned. A program is ill-formed if one of its translation
166
- units contains an integer literal that cannot be represented by any of
167
- the allowed types.
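
As an illustration of the type rule above, the sketch below inspects the type of a few literals with `decltype`. The helper `same_type` and the variable names are assumptions made for this example. The check on `0xFFFFFFFF` is guarded because the result depends on the width of `int`.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical helper for this sketch (C++14 variable template).
template <typename T, typename U>
constexpr bool same_type = std::is_same<T, U>::value;

// Small values take the first type in the list selected by the suffix.
constexpr bool unsuffixed_is_int         = same_type<decltype(0), int>;
constexpr bool u_is_unsigned_int         = same_type<decltype(0u), unsigned int>;
constexpr bool l_is_long                 = same_type<decltype(0L), long>;
constexpr bool ul_is_unsigned_long       = same_type<decltype(0uL), unsigned long>;
constexpr bool ll_is_long_long           = same_type<decltype(0LL), long long>;
constexpr bool ull_is_unsigned_long_long = same_type<decltype(0uLL), unsigned long long>;

// An unsuffixed decimal literal only ever gets a signed type.
constexpr bool large_decimal_is_signed =
    std::is_signed<decltype(4294967295)>::value;

// An unsuffixed hexadecimal literal may pick an unsigned type. Where int has
// 31 value bits and unsigned int has 32, 0xFFFFFFFF is an unsigned int.
constexpr bool int_is_32_bit =
    std::numeric_limits<int>::digits == 31 &&
    std::numeric_limits<unsigned int>::digits == 32;
constexpr bool hex_picks_unsigned_int =
    !int_is_32_bit || same_type<decltype(0xFFFFFFFF), unsigned int>;
```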
168
 
169
  ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
170
 
171
  ``` bnf
172
  character-literal:
@@ -182,10 +187,17 @@ encoding-prefix: one of
182
  c-char-sequence:
183
  c-char
184
  c-char-sequence c-char
185
  ```
186
 
 
 
 
 
 
 
 
187
  ``` bnf
188
  escape-sequence:
189
  simple-escape-sequence
190
  octal-escape-sequence
191
  hexadecimal-escape-sequence
@@ -208,76 +220,80 @@ octal-escape-sequence:
208
  hexadecimal-escape-sequence:
209
  '\x' hexadecimal-digit
210
  hexadecimal-escape-sequence hexadecimal-digit
211
  ```
212
 
213
- A character literal is one or more characters enclosed in single quotes,
214
- as in `'x'`, optionally preceded by `u8`, `u`, `U`, or `L`, as in
215
- `u8'w'`, `u'x'`, `U'y'`, or `L'z'`, respectively.
216
-
217
- A character literal that does not begin with `u8`, `u`, `U`, or `L` is
218
  an *ordinary character literal*. An ordinary character literal that
219
  contains a single *c-char* representable in the execution character set
220
  has type `char`, with value equal to the numerical value of the encoding
221
  of the *c-char* in the execution character set. An ordinary character
222
- literal that contains more than one *c-char* is a *multicharacter
223
- literal*. A multicharacter literal, or an ordinary character literal
224
- containing a single *c-char* not representable in the execution
225
- character set, is conditionally-supported, has type `int`, and has an
226
- *implementation-defined* value.
227
-
228
- A character literal that begins with `u8`, such as `u8'w'`, is a
229
- character literal of type `char`, known as a *UTF-8 character literal*.
230
- The value of a UTF-8 character literal is equal to its ISO 10646 code
231
- point value, provided that the code point value is representable with a
232
- single UTF-8 code unit (that is, provided it is in the C0 Controls and
233
- Basic Latin Unicode block). If the value is not representable with a
234
- single UTF-8 code unit, the program is ill-formed. A UTF-8 character
235
- literal containing multiple *c-char*s is ill-formed.
236
-
237
- A character literal that begins with the letter `u`, such as `u'x'`, is
238
- a character literal of type `char16_t`. The value of a `char16_t`
239
- character literal containing a single *c-char* is equal to its ISO 10646
240
- code point value, provided that the code point is representable with a
241
- single 16-bit code unit. (That is, provided it is a basic multi-lingual
242
- plane code point.) If the value is not representable within 16 bits, the
243
- program is ill-formed. A `char16_t` character literal containing
244
- multiple *c-char*s is ill-formed.
245
-
246
- A character literal that begins with the letter `U`, such as `U'y'`, is
247
- a character literal of type `char32_t`. The value of a `char32_t`
248
- character literal containing a single *c-char* is equal to its ISO 10646
249
- code point value. A `char32_t` character literal containing multiple
 
 
250
  *c-char*s is ill-formed.
251
 
252
- A character literal that begins with the letter `L`, such as `L'z'`, is
253
- a *wide-character literal*. A wide-character literal has type
254
- `wchar_t`.[^13] The value of a wide-character literal containing a
 
 
 
 
 
 
255
  single *c-char* has value equal to the numerical value of the encoding
256
  of the *c-char* in the execution wide-character set, unless the *c-char*
257
  has no representation in the execution wide-character set, in which case
258
  the value is *implementation-defined*.
259
 
260
- [*Note 1*: The type `wchar_t` is able to represent all members of the
261
  execution wide-character set (see 
262
  [[basic.fundamental]]). — *end note*]
263
 
264
  The value of a wide-character literal containing multiple *c-char*s is
265
  *implementation-defined*.
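
The prefix-to-type mapping described above can be checked with `decltype`. This is a sketch with hypothetical variable names. It leaves out `u8'w'` on purpose: the text here gives it type `char`, but C++20 changed it to `char8_t`, so the result depends on the language mode.

```cpp
#include <type_traits>

// The type of a character literal follows from its prefix.
constexpr bool ordinary_is_char = std::is_same<decltype('x'),  char>::value;
constexpr bool u_is_char16      = std::is_same<decltype(u'x'), char16_t>::value;
constexpr bool U_is_char32      = std::is_same<decltype(U'y'), char32_t>::value;
constexpr bool L_is_wchar       = std::is_same<decltype(L'z'), wchar_t>::value;

// For char16_t and char32_t literals the value is the ISO 10646 code point.
constexpr char16_t e_acute_16 = u'\u00E9';
constexpr char32_t e_acute_32 = U'\u00E9';
constexpr bool values_are_code_points =
    e_acute_16 == 0xE9 && e_acute_32 == 0xE9;
```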
266
 
267
  Certain non-graphic characters, the single quote `'`, the double quote
268
- `"`, the question mark `?`,[^14] and the backslash `\`, can be
269
- represented according to Table  [[tab:escape.sequences]]. The double
270
- quote `"` and the question mark `?`, can be represented as themselves or
271
- by the escape sequences `\"` and `\?` respectively, but the single quote
272
- `'` and the backslash `\` shall be represented by the escape sequences
273
- `\'` and `\\` respectively. Escape sequences in which the character
274
- following the backslash is not listed in Table  [[tab:escape.sequences]]
275
- are conditionally-supported, with *implementation-defined* semantics. An
276
- escape sequence specifies a single character.
277
 
278
- **Table: Escape sequences** <a id="tab:escape.sequences">[tab:escape.sequences]</a>
279
 
280
  | | | |
281
  | --------------- | -------------- | ------------------ |
282
  | new-line | NL(LF) | `\n` |
283
  | horizontal tab | HT | `\t` |
@@ -300,49 +316,49 @@ desired character. The escape `\xhhh` consists of the
300
  backslash followed by `x` followed by one or more hexadecimal digits
301
  that are taken to specify the value of the desired character. There is
302
  no limit to the number of digits in a hexadecimal sequence. A sequence
303
  of octal or hexadecimal digits is terminated by the first character that
304
  is not an octal digit or a hexadecimal digit, respectively. The value of
305
- a character literal is *implementation-defined* if it falls outside of
306
- the *implementation-defined* range defined for `char` (for character
307
- literals with no prefix) or `wchar_t` (for character literals prefixed
308
- by `L`).
309
 
310
- [*Note 2*: If the value of a character literal prefixed by `u`, `u8`,
311
  or `U` is outside the range defined for its type, the program is
312
  ill-formed. — *end note*]
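
The escape-sequence rules above can be exercised at compile time. The sketch below uses hypothetical names and is not part of the quoted text. The values chosen fit in `char`, so the *implementation-defined* out-of-range case does not arise.

```cpp
#include <cstddef>

// Octal and hexadecimal escapes give the character's value directly.
constexpr char from_hex   = '\x41';
constexpr char from_octal = '\101';
constexpr bool same_value = from_hex == 0x41 && from_octal == 0101;

// The double quote and the question mark may be written with or without
// a backslash inside a character literal.
constexpr bool question_mark_matches = '\?' == '?';
constexpr bool double_quote_matches  = '\"' == '"';

// A digit sequence ends at the first character that is not a digit of the
// relevant base: 'G' is not a hexadecimal digit and '8' is not an octal
// digit. Each string therefore holds two characters plus the terminator.
constexpr std::size_t hex_then_G_size   = sizeof("\x41G");
constexpr std::size_t octal_then_8_size = sizeof("\18");
```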
313
 
314
  A *universal-character-name* is translated to the encoding, in the
315
  appropriate execution character set, of the character named. If there is
316
  no such encoding, the *universal-character-name* is translated to an
317
  *implementation-defined* encoding.
318
 
319
- [*Note 3*: In translation phase 1, a *universal-character-name* is
320
  introduced whenever an actual extended character is encountered in the
321
  source text. Therefore, all extended characters are described in terms
322
  of *universal-character-name*s. However, the actual compiler
323
  implementation may use its own native character set, so long as the same
324
  results are obtained. — *end note*]
325
 
326
- ### Floating literals <a id="lex.fcon">[[lex.fcon]]</a>
327
 
328
  ``` bnf
329
- floating-literal:
330
- decimal-floating-literal
331
- hexadecimal-floating-literal
332
  ```
333
 
334
  ``` bnf
335
- decimal-floating-literal:
336
- fractional-constant exponent-partₒₚₜ floating-suffixₒₚₜ
337
- digit-sequence exponent-part floating-suffixₒₚₜ
338
  ```
339
 
340
  ``` bnf
341
- hexadecimal-floating-literal:
342
- hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-suffixₒₚₜ
343
- hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-suffixₒₚₜ
344
  ```
345
 
346
  ``` bnf
347
  fractional-constant:
348
  digit-sequenceₒₚₜ '.' digit-sequence
@@ -377,50 +393,55 @@ digit-sequence:
377
  digit
378
  digit-sequence '''ₒₚₜ digit
379
  ```
380
 
381
  ``` bnf
382
- floating-suffix: one of
383
  'f l F L'
384
  ```
385
 
386
- A floating literal consists of an optional prefix specifying a base, an
387
- integer part, a radix point, a fraction part, an `e`, `E`, `p` or `P`,
388
- an optionally signed integer exponent, and an optional type suffix. The
389
- integer and fraction parts both consist of a sequence of decimal (base
390
- ten) digits if there is no prefix, or hexadecimal (base sixteen) digits
391
- if the prefix is `0x` or `0X`. The floating literal is a *decimal
392
- floating literal* in the former case and a *hexadecimal floating
393
- literal* in the latter case. Optional separating single quotes in a
394
- *digit-sequence* or *hexadecimal-digit-sequence* are ignored when
395
- determining its value.
396
-
397
- [*Example 1*: The floating literals `1.602'176'565e-19` and
398
- `1.602176565e-19` have the same value. — *end example*]
399
-
400
- Either the integer part or the fraction part (not both) can be omitted.
401
- Either the radix point or the letter `e` or `E` and the exponent (not
402
- both) can be omitted from a decimal floating literal. The radix point
403
- (but not the exponent) can be omitted from a hexadecimal floating
404
- literal. The integer part, the optional radix point, and the optional
405
- fraction part, form the *significand* of the floating literal. In a
406
- decimal floating literal, the exponent, if present, indicates the power
407
- of 10 by which the significand is to be scaled. In a hexadecimal
408
- floating literal, the exponent indicates the power of 2 by which the
409
- significand is to be scaled.
410
-
411
- [*Example 2*: The floating literals `49.625` and `0xC.68p+2` have the
412
- same value. *end example*]
413
-
414
- If the scaled value is in the range of representable values for its
415
- type, the result is the scaled value if representable, else the larger
416
- or smaller representable value nearest the scaled value, chosen in an
417
- *implementation-defined* manner. The type of a floating literal is
418
- `double` unless explicitly specified by a suffix. The suffixes `f` and
419
- `F` specify `float`, the suffixes `l` and `L` specify `long double`.
 
 
420
  If the scaled value is not in the range of representable values for its
421
- type, the program is ill-formed.
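
The two examples and the suffix rule above can be checked as follows. This is a sketch with hypothetical names. Hexadecimal floating literals were standardized in C++17, so it assumes C++17 or a compiler that accepts them as an extension.

```cpp
#include <type_traits>

// Digit separators do not affect the value (Example 1).
constexpr bool separators_ignored = 1.602'176'565e-19 == 1.602176565e-19;

// 0xC.68 is 12 + 6/16 + 8/256 = 12.40625; scaled by 2^2 this is 49.625,
// which is exactly representable, so the comparison is exact (Example 2).
constexpr bool hex_matches_decimal = 49.625 == 0xC.68p+2;

// The suffix, or its absence, selects the type.
constexpr bool unsuffixed_is_double = std::is_same<decltype(1.0),  double>::value;
constexpr bool f_is_float           = std::is_same<decltype(1.0f), float>::value;
constexpr bool l_is_long_double     = std::is_same<decltype(1.0L), long double>::value;
```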
 
 
 
422
 
423
  ### String literals <a id="lex.string">[[lex.string]]</a>
424
 
425
  ``` bnf
426
  string-literal:
@@ -432,10 +453,17 @@ string-literal:
432
  s-char-sequence:
433
  s-char
434
  s-char-sequence s-char
435
  ```
436
 
 
 
 
 
 
 
 
437
  ``` bnf
438
  raw-string:
439
  '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
440
  ```
441
 
@@ -443,21 +471,28 @@ raw-string:
443
  r-char-sequence:
444
  r-char
445
  r-char-sequence r-char
446
  ```
447
 
 
 
 
 
 
 
448
  ``` bnf
449
  d-char-sequence:
450
  d-char
451
  d-char-sequence d-char
452
  ```
453
 
454
- A *string-literal* is a sequence of characters (as defined in 
455
- [[lex.ccon]]) surrounded by double quotes, optionally prefixed by `R`,
456
- `u8`, `u8R`, `u`, `uR`, `U`, `UR`, `L`, or `LR`, as in `"..."`,
457
- `R"(...)"`, `u8"..."`, `u8R"**(...)**"`, `u"..."`, `uR"*~(...)*~"`,
458
- `U"..."`, `UR"zzz(...)zzz"`, `L"..."`, or `LR"(...)"`, respectively.
 
459
 
460
  A *string-literal* that has an `R` in the prefix is a *raw string
461
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
462
  *d-char-sequence* of a *raw-string* is the same sequence of characters
463
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
@@ -494,78 +529,74 @@ a"
494
  ```
495
 
496
  is equivalent to `"\n)\\\na\"\n"`. The raw string
497
 
498
  ``` cpp
499
- R"(??)"
500
  ```
501
 
502
- is equivalent to `"\?\?"`. The raw string
503
-
504
- ``` cpp
505
- R"#(
506
- )??="
507
- )#"
508
- ```
509
-
510
- is equivalent to `"\n)\?\?=\"\n"`.
511
 
512
  — *end example*]
513
 
514
  After translation phase 6, a *string-literal* that does not begin with
515
- an *encoding-prefix* is an *ordinary string literal*, and is initialized
516
- with the given characters.
 
 
517
 
518
  A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
519
- *UTF-8 string literal*.
 
 
 
 
520
 
521
  Ordinary string literals and UTF-8 string literals are also referred to
522
- as narrow string literals. A narrow string literal has type “array of
523
- *n* `const char`”, where *n* is the size of the string as defined below,
524
- and has static storage duration ([[basic.stc]]).
525
 
526
- For a UTF-8 string literal, each successive element of the object
527
- representation ([[basic.types]]) has the value of the corresponding
528
- code unit of the UTF-8 encoding of the string.
 
 
529
 
530
- A *string-literal* that begins with `u`, such as `u"asdf"`, is a
531
- `char16_t` string literal. A `char16_t` string literal has type “array
532
- of *n* `const char16_t`”, where *n* is the size of the string as defined
533
- below; it is initialized with the given characters. A single *c-char*
534
- may produce more than one `char16_t` character in the form of surrogate
535
- pairs.
536
 
537
- A *string-literal* that begins with `U`, such as `U"asdf"`, is a
538
- `char32_t` string literal. A `char32_t` string literal has type “array
539
- of *n* `const char32_t`”, where *n* is the size of the string as defined
540
- below; it is initialized with the given characters.
 
541
 
542
  A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
543
  string literal*. A wide string literal has type “array of *n* `const
544
  wchar_t`”, where *n* is the size of the string as defined below; it is
545
  initialized with the given characters.
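
The array types named above can be observed with `decltype`. A string literal is an lvalue, so `decltype` yields a reference to the array and the sketch strips it first. The helper `has_array_type` and the variable names are assumptions made for this example. UTF-8 literals are left out because their element type changed to `char8_t` in C++20.

```cpp
#include <type_traits>

// Hypothetical helper: compares the referenced array type with Array.
template <typename Literal, typename Array>
constexpr bool has_array_type =
    std::is_same<typename std::remove_reference<Literal>::type, Array>::value;

// "asdf" has four characters plus the terminating null, so n is 5.
constexpr bool ordinary_is_const_char = has_array_type<decltype("asdf"),  const char[5]>;
constexpr bool u_is_const_char16      = has_array_type<decltype(u"asdf"), const char16_t[5]>;
constexpr bool U_is_const_char32      = has_array_type<decltype(U"asdf"), const char32_t[5]>;
constexpr bool L_is_const_wchar       = has_array_type<decltype(L"asdf"), const wchar_t[5]>;
```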
546
 
547
- In translation phase 6 ([[lex.phases]]), adjacent *string-literal*s are
548
  concatenated. If both *string-literal*s have the same *encoding-prefix*,
549
- the resulting concatenated string literal has that *encoding-prefix*. If
550
- one *string-literal* has no *encoding-prefix*, it is treated as a
551
  *string-literal* of the same *encoding-prefix* as the other operand. If
552
  a UTF-8 string literal token is adjacent to a wide string literal token,
553
  the program is ill-formed. Any other concatenations are
554
  conditionally-supported with *implementation-defined* behavior.
555
 
556
- [*Note 3*: This concatenation is an interpretation, not a conversion.
557
  Because the interpretation happens in translation phase 6 (after each
558
- character from a string literal has been translated into a value from
559
  the appropriate character set), a *string-literal*’s initial rawness has
560
  no effect on the interpretation or well-formedness of the
561
  concatenation. — *end note*]
562
 
563
- Table  [[tab:lex.string.concat]] has some examples of valid
564
- concatenations.
565
 
566
- **Table: String literal concatenations** <a id="tab:lex.string.concat">[tab:lex.string.concat]</a>
567
 
568
  | | | | | | | | | |
569
  | -------------------------- | ----- | ----- | -------------------------- | ----- | ----- | -------------------------- | ----- | ----- |
570
  | *[spans 2 columns]* Source | | Means | *[spans 2 columns]* Source | | Means | *[spans 2 columns]* Source | | Means |
571
  | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
@@ -584,46 +615,49 @@ Characters in concatenated strings are kept distinct.
584
  contains the two characters `'\xA'` and `'B'` after concatenation (and
585
  not the single hexadecimal character `'\xAB'`).
586
 
587
  — *end example*]
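
The concatenation behaviour in this example can be confirmed at compile time. The following is a sketch with hypothetical names. The contrasting `"\xAB"` case assumes the implementation accepts that value in a `char`, which common compilers do.

```cpp
#include <cstddef>
#include <type_traits>

// "\xA" "B" keeps two distinct characters, plus the terminating null.
constexpr std::size_t concatenated_size = sizeof("\xA" "B");
constexpr char first_char  = ("\xA" "B")[0];
constexpr char second_char = ("\xA" "B")[1];

// Written as one literal, the hexadecimal escape absorbs the B.
constexpr std::size_t single_escape_size = sizeof("\xAB");

// An unprefixed literal adopts the encoding-prefix of its neighbour.
constexpr bool prefix_propagates =
    std::is_same<std::remove_reference<decltype(u"a" "b")>::type,
                 const char16_t[3]>::value;
```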
588
 
589
- After any necessary concatenation, in translation phase 7 (
590
- [[lex.phases]]), `'\0'` is appended to every string literal so that
591
  programs that scan a string can find its end.
592
 
593
  Escape sequences and *universal-character-name*s in non-raw string
594
- literals have the same meaning as in character literals ([[lex.ccon]]),
595
  except that the single quote `'` is representable either by itself or by
596
  the escape sequence `\'`, and the double quote `"` shall be preceded by
597
- a `\`, and except that a *universal-character-name* in a `char16_t`
598
- string literal may yield a surrogate pair. In a narrow string literal, a
599
- *universal-character-name* may map to more than one `char` element due
600
- to *multibyte encoding*. The size of a `char32_t` or wide string literal
601
- is the total number of escape sequences, *universal-character-name*s,
602
- and other characters, plus one for the terminating `U'\0'` or `L'\0'`.
603
- The size of a `char16_t` string literal is the total number of escape
604
- sequences, *universal-character-name*s, and other characters, plus one
605
- for each character requiring a surrogate pair, plus one for the
606
- terminating `u'\0'`.
607
 
608
- [*Note 4*: The size of a `char16_t` string literal is the number of
609
  code units, not the number of characters. — *end note*]
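
To illustrate the size rules, the sketch below counts array elements for a character outside the basic multilingual plane (U+1F600) and for one inside it (U+00E9). The helper `code_units` and the variable names are assumptions made for this example.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements in the array, including the
// terminating null character.
template <typename T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) { return N; }

// U+1F600 needs a surrogate pair in a char16_t string: 2 code units + 1.
constexpr std::size_t utf16_astral = code_units(u"\U0001F600");

// The same character is a single element in a char32_t string: 1 + 1.
constexpr std::size_t utf32_astral = code_units(U"\U0001F600");

// U+00E9 fits in one 16-bit code unit: 1 + 1.
constexpr std::size_t utf16_bmp = code_units(u"\u00E9");
```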
610
 
611
- Within `char32_t` and `char16_t` string literals, any
612
- *universal-character-name*s shall be within the range `0x0` to
613
- `0x10FFFF`. The size of a narrow string literal is the total number of
614
- escape sequences and other characters, plus at least one for the
615
- multibyte encoding of each *universal-character-name*, plus one for the
 
 
616
  terminating `'\0'`.
617
 
618
  Evaluating a *string-literal* results in a string literal object with
619
  static storage duration, initialized from the given characters as
620
- specified above. Whether all string literals are distinct (that is, are
621
- stored in nonoverlapping objects) and whether successive evaluations of
622
- a *string-literal* yield the same or a different object is unspecified.
 
623
 
624
- [*Note 5*: The effect of attempting to modify a string literal is
625
  undefined. — *end note*]
626
 
627
  ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
628
 
629
  ``` bnf
@@ -644,21 +678,21 @@ pointer-literal:
644
 
645
  The pointer literal is the keyword `nullptr`. It is a prvalue of type
646
  `std::nullptr_t`.
647
 
648
  [*Note 1*: `std::nullptr_t` is a distinct type that is neither a
649
- pointer type nor a pointer to member type; rather, a prvalue of this
650
  type is a null pointer constant and can be converted to a null pointer
651
  value or null member pointer value. See  [[conv.ptr]] and 
652
  [[conv.mem]]. — *end note*]
653
 
654
  ### User-defined literals <a id="lex.ext">[[lex.ext]]</a>
655
 
656
  ``` bnf
657
  user-defined-literal:
658
  user-defined-integer-literal
659
- user-defined-floating-literal
660
  user-defined-string-literal
661
  user-defined-character-literal
662
  ```
663
 
664
  ``` bnf
@@ -668,11 +702,11 @@ user-defined-integer-literal:
668
  hexadecimal-literal ud-suffix
669
  binary-literal ud-suffix
670
  ```
671
 
672
  ``` bnf
673
- user-defined-floating-literal:
674
  fractional-constant exponent-partₒₚₜ ud-suffix
675
  digit-sequence exponent-part ud-suffix
676
  hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
677
  hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
678
  ```
@@ -706,65 +740,65 @@ is a *user-defined-literal*, but `12LL` is an *integer-literal*.
706
  The syntactic non-terminal preceding the *ud-suffix* in a
707
  *user-defined-literal* is taken to be the longest sequence of characters
708
  that could match that non-terminal.
709
 
710
  A *user-defined-literal* is treated as a call to a literal operator or
711
- literal operator template ([[over.literal]]). To determine the form of
712
  this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
713
  the *literal-operator-id* whose literal suffix identifier is *X* is
714
  looked up in the context of *L* using the rules for unqualified name
715
- lookup ([[basic.lookup.unqual]]). Let *S* be the set of declarations
716
- found by this lookup. *S* shall not be empty.
717
 
718
  If *L* is a *user-defined-integer-literal*, let *n* be the literal
719
  without its *ud-suffix*. If *S* contains a literal operator with
720
  parameter type `unsigned long long`, the literal *L* is treated as a
721
  call of the form
722
 
723
  ``` cpp
724
  operator "" X(nULL)
725
  ```
726
 
727
- Otherwise, *S* shall contain a raw literal operator or a literal
728
- operator template ([[over.literal]]) but not both. If *S* contains a
729
- raw literal operator, the literal *L* is treated as a call of the form
730
 
731
  ``` cpp
732
  operator "" X("n")
733
  ```
734
 
735
- Otherwise (*S* contains a literal operator template), *L* is treated as
736
- a call of the form
737
 
738
  ``` cpp
739
  operator "" X<'c₁', 'c₂', ... 'cₖ'>()
740
  ```
741
 
742
  where *n* is the source character sequence c₁c₂...cₖ.
743
 
744
  [*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
745
  basic source character set. — *end note*]
746
 
747
- If *L* is a *user-defined-floating-literal*, let *f* be the literal
748
- without its *ud-suffix*. If *S* contains a literal operator with
749
  parameter type `long double`, the literal *L* is treated as a call of
750
  the form
751
 
752
  ``` cpp
753
  operator "" X(fL)
754
  ```
755
 
756
- Otherwise, *S* shall contain a raw literal operator or a literal
757
- operator template ([[over.literal]]) but not both. If *S* contains a
758
- raw literal operator, the *literal* *L* is treated as a call of the form
759
 
760
  ``` cpp
761
  operator "" X("f")
762
  ```
763
 
764
- Otherwise (*S* contains a literal operator template), *L* is treated as
765
- a call of the form
766
 
767
  ``` cpp
768
  operator "" X<'c₁', 'c₂', ... 'cₖ'>()
769
  ```
770
 
@@ -773,20 +807,28 @@ where *f* is the source character sequence c₁c₂...cₖ.
773
  [*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
774
  basic source character set. — *end note*]
775
 
776
  If *L* is a *user-defined-string-literal*, let *str* be the literal
777
  without its *ud-suffix* and let *len* be the number of code units in
778
- *str* (i.e., its length excluding the terminating null character). The
 
 
779
  literal *L* is treated as a call of the form
780
 
 
 
 
 
 
 
781
  ``` cpp
782
  operator "" X(str, len)
783
  ```
784
 
785
  If *L* is a *user-defined-character-literal*, let *ch* be the literal
786
- without its *ud-suffix*. *S* shall contain a literal operator (
787
- [[over.literal]]) whose only parameter has the type of *ch* and the
788
  literal *L* is treated as a call of the form
789
 
790
  ``` cpp
791
  operator "" X(ch)
792
  ```
@@ -805,16 +847,16 @@ int main() {
805
  }
806
  ```
807
 
808
  — *end example*]
809
 
810
- In translation phase 6 ([[lex.phases]]), adjacent string literals are
811
- concatenated and *user-defined-string-literal*s are considered string
812
- literals for that purpose. During concatenation, *ud-suffix*es are
813
- removed and ignored and the concatenation process occurs as described
814
- in  [[lex.string]]. At the end of phase 6, if a string literal is the
815
- result of a concatenation involving at least one
816
  *user-defined-string-literal*, all the participating
817
  *user-defined-string-literal*s shall have the same *ud-suffix* and that
818
  suffix is applied to the result of the concatenation.
819
 
820
  [*Example 3*:
@@ -832,51 +874,55 @@ int main() {
832
  [basic.fundamental]: basic.md#basic.fundamental
833
  [basic.link]: basic.md#basic.link
834
  [basic.lookup.unqual]: basic.md#basic.lookup.unqual
835
  [basic.stc]: basic.md#basic.stc
836
  [basic.types]: basic.md#basic.types
837
- [conv.mem]: conv.md#conv.mem
838
- [conv.ptr]: conv.md#conv.ptr
839
  [cpp]: cpp.md#cpp
840
  [cpp.concat]: cpp.md#cpp.concat
841
  [cpp.cond]: cpp.md#cpp.cond
 
842
  [cpp.include]: cpp.md#cpp.include
 
843
  [cpp.stringize]: cpp.md#cpp.stringize
844
  [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
845
  [headers]: library.md#headers
846
  [lex]: #lex
847
  [lex.bool]: #lex.bool
848
  [lex.ccon]: #lex.ccon
 
849
  [lex.charset]: #lex.charset
850
  [lex.comment]: #lex.comment
851
  [lex.digraph]: #lex.digraph
852
  [lex.ext]: #lex.ext
853
  [lex.fcon]: #lex.fcon
 
854
  [lex.header]: #lex.header
855
  [lex.icon]: #lex.icon
 
 
856
  [lex.key]: #lex.key
 
857
  [lex.literal]: #lex.literal
858
  [lex.literal.kinds]: #lex.literal.kinds
859
  [lex.name]: #lex.name
 
 
 
860
  [lex.nullptr]: #lex.nullptr
861
  [lex.operators]: #lex.operators
862
  [lex.phases]: #lex.phases
863
  [lex.ppnumber]: #lex.ppnumber
864
  [lex.pptoken]: #lex.pptoken
865
  [lex.separate]: #lex.separate
866
  [lex.string]: #lex.string
 
867
  [lex.token]: #lex.token
 
 
868
  [over.literal]: over.md#over.literal
869
- [tab:alternative.representations]: #tab:alternative.representations
870
- [tab:alternative.tokens]: #tab:alternative.tokens
871
- [tab:charname.allowed]: #tab:charname.allowed
872
- [tab:charname.disallowed]: #tab:charname.disallowed
873
- [tab:escape.sequences]: #tab:escape.sequences
874
- [tab:identifiers.special]: #tab:identifiers.special
875
- [tab:keywords]: #tab:keywords
876
- [tab:lex.string.concat]: #tab:lex.string.concat
877
- [tab:lex.type.integer.literal]: #tab:lex.type.integer.literal
878
  [temp.explicit]: temp.md#temp.explicit
879
  [temp.names]: temp.md#temp.names
880
 
881
  [^1]: Implementations must behave as if these separate phases occur,
882
  although in practice different phases might be folded together.
@@ -897,21 +943,21 @@ int main() {
897
  (described in translation phase 1) is specified as
898
  *implementation-defined*, an implementation is required to document
899
  how the basic source characters are represented in source files.
900
 
901
  [^5]: A sequence of characters resembling a *universal-character-name*
902
- in an *r-char-sequence* ([[lex.string]]) does not form a
903
  *universal-character-name*.
904
 
905
  [^6]: These include “digraphs” and additional reserved words. The term
906
  “digraph” (token consisting of two characters) is not perfectly
907
- descriptive, since one of the alternative preprocessing-tokens is
908
  `%:%:` and of course several primary tokens contain two characters.
909
  Nonetheless, those alternative tokens that aren’t lexical keywords
910
  are colloquially known as “digraphs”.
911
 
912
- [^7]: Thus the “stringized” values ([[cpp.stringize]]) of `[` and `<:`
913
  will be different, maintaining the source spelling, but the tokens
914
  can otherwise be freely interchanged.
915
 
916
  [^8]: Literals include strings and character and numeric literals.
917
 
@@ -928,15 +974,13 @@ int main() {
928
  long external identifier, but C++ does not place a translation limit
929
  on significant characters for external identifiers. In C++, upper-
930
  and lower-case letters are considered different for all identifiers,
931
  including external identifiers.
932
 
933
- [^11]: The term “literal” generally designates, in this International
934
- Standard, those tokens that are called “constants” in ISO C.
935
 
936
- [^12]: The digits `8` and `9` are not octal digits.
937
-
938
- [^13]: They are intended for character sets where a character does not
939
  fit into a single byte.
940
 
941
- [^14]: Using an escape sequence for a question mark is supported for
942
  compatibility with ISO C++14 and ISO C.
 
6
 
7
  ``` bnf
8
  literal:
9
  integer-literal
10
  character-literal
11
+ floating-point-literal
12
  string-literal
13
  boolean-literal
14
  pointer-literal
15
  user-defined-literal
16
  ```
 
48
  hexadecimal-literal:
49
  hexadecimal-prefix hexadecimal-digit-sequence
50
  ```
51
 
52
  ``` bnf
53
+ binary-digit: one of
54
+ '0 1'
 
55
  ```
56
 
57
  ``` bnf
58
  octal-digit: one of
59
  '0 1 2 3 4 5 6 7'
 
103
  ``` bnf
104
  long-long-suffix: one of
105
  'll LL'
106
  ```
107
 
108
+ In an *integer-literal*, the sequence of *binary-digit*s,
109
+ *octal-digit*s, *digit*s, or *hexadecimal-digit*s is interpreted as a
110
+ base N integer as shown in table [[lex.icon.base]]; the lexically first
111
+ digit of the sequence of digits is the most significant.
112
+
113
+ [*Note 1*: The prefix and any optional separating single quotes are
114
+ ignored when determining the value. *end note*]
115
+
116
+ **Table: Base of *integer-literal*s** <a id="lex.icon.base">[lex.icon.base]</a>
117
+
118
+ | Kind of *integer-literal* | base $N$ |
119
+ | ------------------------- | -------- |
120
+ | *binary-literal* | 2 |
121
+ | *octal-literal* | 8 |
122
+ | *decimal-literal* | 10 |
123
+ | *hexadecimal-literal* | 16 |
124
+
125
+
126
+ The *hexadecimal-digit*s `a` through `f` and `A` through `F` have
127
  decimal values ten through fifteen.
128
 
129
  [*Example 1*: The number twelve can be written `12`, `014`, `0XC`, or
130
+ `0b1100`. The *integer-literal*s `1048576`, `1'048'576`, `0X100000`,
131
  `0x10'0000`, and `0'004'000'000` all have the same
132
  value. — *end example*]
133
 
134
+ The type of an *integer-literal* is the first type in the list in
135
+ [[lex.icon.type]] corresponding to its optional *integer-suffix* in
136
+ which its value can be represented. An *integer-literal* is a prvalue.
137
 
138
+ **Table: Types of *integer-literal*s** <a id="lex.icon.type">[lex.icon.type]</a>
139
 
140
+ | *integer-suffix* | *decimal-literal* | *integer-literal* other than *decimal-literal* |
141
+ | ---------------- | ------------------------ | ---------------------------------------------- |
142
  | none | `int` | `int` |
143
  | | `long int` | `unsigned int` |
144
  | | `long long int` | `long int` |
145
  | | | `unsigned long int` |
146
  | | | `long long int` |
 
158
  | | | `unsigned long long int` |
159
  | Both `u` or `U` | `unsigned long long int` | `unsigned long long int` |
160
  | and `ll` or `LL` | | |
161
 
162
 
163
+ If an *integer-literal* cannot be represented by any type in its list
164
+ and an extended integer type [[basic.fundamental]] can represent its
165
  value, it may have that extended integer type. If all of the types in
166
+ the list for the *integer-literal* are signed, the extended integer type
167
+ shall be signed. If all of the types in the list for the
168
+ *integer-literal* are unsigned, the extended integer type shall be
169
+ unsigned. If the list contains both signed and unsigned types, the
170
+ extended integer type may be signed or unsigned. A program is ill-formed
171
+ if one of its translation units contains an *integer-literal* that
172
+ cannot be represented by any of the allowed types.
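The type selection described by [lex.icon.type] can be observed with `decltype`. This is a sketch only: the suffix checks hold on any conforming implementation, while the last check is guarded by an assumption about the width of `int` (the name `int_has_32_bits` is made up for this example).

```cpp
#include <limits>
#include <type_traits>

// With a small value, the suffix alone selects the first type in the list.
static_assert(std::is_same<decltype(1), int>::value, "no suffix");
static_assert(std::is_same<decltype(1u), unsigned int>::value, "u suffix");
static_assert(std::is_same<decltype(1L), long>::value, "L suffix");
static_assert(std::is_same<decltype(1uL), unsigned long>::value, "uL suffix");
static_assert(std::is_same<decltype(1LL), long long>::value, "LL suffix");
static_assert(std::is_same<decltype(1uLL), unsigned long long>::value, "uLL suffix");

// An unsuffixed decimal-literal has only signed types in its list.
static_assert(std::is_signed<decltype(4294967295)>::value,
              "unsuffixed decimal literals stay signed");

// Assumption: this check is only meaningful where int has 31 value bits and
// unsigned int has 32. There, 0xFFFFFFFF does not fit in int, and the next
// type in the non-decimal list is unsigned int.
constexpr bool int_has_32_bits =
    std::numeric_limits<int>::digits == 31 &&
    std::numeric_limits<unsigned int>::digits == 32;
static_assert(!int_has_32_bits ||
                  std::is_same<decltype(0xFFFFFFFF), unsigned int>::value,
              "hexadecimal literals may select an unsigned type");
```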
173
 
174
  ### Character literals <a id="lex.ccon">[[lex.ccon]]</a>
175
 
176
  ``` bnf
177
  character-literal:
 
187
  c-char-sequence:
188
  c-char
189
  c-char-sequence c-char
190
  ```
191
 
192
+ ``` bnf
193
+ c-char:
194
+ any member of the basic source character set except the single-quote ''', backslash '\', or new-line character
195
+ escape-sequence
196
+ universal-character-name
197
+ ```
198
+
199
  ``` bnf
200
  escape-sequence:
201
  simple-escape-sequence
202
  octal-escape-sequence
203
  hexadecimal-escape-sequence
 
220
  hexadecimal-escape-sequence:
221
  '\x' hexadecimal-digit
222
  hexadecimal-escape-sequence hexadecimal-digit
223
  ```
224
 
225
+ A *character-literal* that does not begin with `u8`, `u`, `U`, or `L` is
 
 
 
 
226
  an *ordinary character literal*. An ordinary character literal that
227
  contains a single *c-char* representable in the execution character set
228
  has type `char`, with value equal to the numerical value of the encoding
229
  of the *c-char* in the execution character set. An ordinary character
230
+ literal that contains more than one *c-char* is a
231
+ *multicharacter literal*. A multicharacter literal, or an ordinary
232
+ character literal containing a single *c-char* not representable in the
233
+ execution character set, is conditionally-supported, has type `int`, and
234
+ has an *implementation-defined* value.
235
+
236
+ A *character-literal* that begins with `u8`, such as `u8'w'`, is a
237
+ *character-literal* of type `char8_t`, known as a *UTF-8 character
238
+ literal*. The value of a UTF-8 character literal is equal to its ISO/IEC
239
+ 10646 code point value, provided that the code point value can be
240
+ encoded as a single UTF-8 code unit.
241
+
242
+ [*Note 1*: That is, provided the code point value is in the range
243
+ [0, 7F] (hexadecimal). — *end note*]
244
+
245
+ If the value is not representable with a single UTF-8 code unit, the
246
+ program is ill-formed. A UTF-8 character literal containing multiple
247
+ *c-char*s is ill-formed.
248
+
249
+ A *character-literal* that begins with the letter `u`, such as `u'x'`,
250
+ is a *character-literal* of type `char16_t`, known as a *UTF-16
251
+ character literal*. The value of a UTF-16 character literal is equal to
252
+ its ISO/IEC 10646 code point value, provided that the code point value
253
+ is representable with a single 16-bit code unit.
254
+
255
+ [*Note 2*: That is, provided the code point value is in the range
256
+ [0, FFFF] (hexadecimal). — *end note*]
257
+
258
+ If the value is not representable with a single 16-bit code unit, the
259
+ program is ill-formed. A UTF-16 character literal containing multiple
260
  *c-char*s is ill-formed.
261
 
262
+ A *character-literal* that begins with the letter `U`, such as `U'y'`,
263
+ is a *character-literal* of type `char32_t`, known as a *UTF-32
264
+ character literal*. The value of a UTF-32 character literal containing a
265
+ single *c-char* is equal to its ISO/IEC 10646 code point value. A UTF-32
266
+ character literal containing multiple *c-char*s is ill-formed.
267
+
268
+ A *character-literal* that begins with the letter `L`, such as `L'z'`,
269
+ is a *wide-character literal*. A wide-character literal has type
270
+ `wchar_t`.[^12] The value of a wide-character literal containing a
271
  single *c-char* has value equal to the numerical value of the encoding
272
  of the *c-char* in the execution wide-character set, unless the *c-char*
273
  has no representation in the execution wide-character set, in which case
274
  the value is *implementation-defined*.
275
 
276
+ [*Note 3*: The type `wchar_t` is able to represent all members of the
277
  execution wide-character set (see 
278
  [[basic.fundamental]]). — *end note*]
279
 
280
  The value of a wide-character literal containing multiple *c-char*s is
281
  *implementation-defined*.
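As an illustration of the prefix rules above, the following sketch checks the type and, where it is fixed by the text, the value of each kind of *character-literal*. The `u8` prefix is left out on purpose: its type is `char8_t` only under the wording in this diff, so a check would depend on the language mode used to compile the example.

```cpp
#include <type_traits>

// The encoding prefix determines the type of the literal.
static_assert(std::is_same<decltype('a'), char>::value, "ordinary character literal");
static_assert(std::is_same<decltype(u'x'), char16_t>::value, "UTF-16 character literal");
static_assert(std::is_same<decltype(U'y'), char32_t>::value, "UTF-32 character literal");
static_assert(std::is_same<decltype(L'z'), wchar_t>::value, "wide-character literal");

// UTF-16 and UTF-32 character literals hold the ISO/IEC 10646 code point.
static_assert(u'x' == 0x78, "code point of x");
static_assert(U'y' == 0x79, "code point of y");
static_assert(U'\U0001F600' == 0x1F600, "code point outside the 16-bit range");
```

The value of `'a'` is not checked because it depends on the execution character set.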
282
 
283
  Certain non-graphic characters, the single quote `'`, the double quote
284
+ `"`, the question mark `?`,[^13] and the backslash `\`, can be
285
+ represented according to [[lex.ccon.esc]]. The double quote `"` and the
286
+ question mark `?` can be represented as themselves or by the escape
287
+ sequences `\"` and `\?` respectively, but the single quote `'` and the
288
+ backslash `\` shall be represented by the escape sequences `\'` and `\\`
289
+ respectively. Escape sequences in which the character following the
290
+ backslash is not listed in [[lex.ccon.esc]] are conditionally-supported,
291
+ with *implementation-defined* semantics. An escape sequence specifies a
292
+ single character.
293
 
294
+ **Table: Escape sequences** <a id="lex.ccon.esc">[lex.ccon.esc]</a>
295
 
296
  | | | |
297
  | --------------- | -------------- | ------------------ |
298
  | new-line | NL(LF) | `\n` |
299
  | horizontal tab | HT | `\t` |
 
316
  backslash followed by `x` followed by one or more hexadecimal digits
317
  that are taken to specify the value of the desired character. There is
318
  no limit to the number of digits in a hexadecimal sequence. A sequence
319
  of octal or hexadecimal digits is terminated by the first character that
320
  is not an octal digit or a hexadecimal digit, respectively. The value of
321
+ a *character-literal* is *implementation-defined* if it falls outside of
322
+ the *implementation-defined* range defined for `char` (for
323
+ *character-literal*s with no prefix) or `wchar_t` (for
324
+ *character-literal*s prefixed by `L`).
325
 
326
+ [*Note 4*: If the value of a *character-literal* prefixed by `u`, `u8`,
327
  or `U` is outside the range defined for its type, the program is
328
  ill-formed. — *end note*]
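A short sketch of the escape-sequence rules: octal and hexadecimal escapes give the value directly, and the double quote and question mark may be written either way. The constant name `letter_from_hex` is illustrative.

```cpp
// Octal and hexadecimal escape sequences specify the value directly.
constexpr char letter_from_hex = '\x41';  // hexadecimal 41 is 65
static_assert(letter_from_hex == 65, "hexadecimal escape");
static_assert('\101' == 65, "octal escape: 101 in base 8 is 65");
static_assert('\0' == 0, "null character");

// The double quote and the question mark may be escaped or written as themselves.
static_assert('"' == '\"', "double quote");
static_assert('?' == '\?', "question mark");

// The same escapes work with an encoding prefix.
static_assert(u'\x41' == u'A', "U+0041 is LATIN CAPITAL LETTER A");
```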
329
 
330
  A *universal-character-name* is translated to the encoding, in the
331
  appropriate execution character set, of the character named. If there is
332
  no such encoding, the *universal-character-name* is translated to an
333
  *implementation-defined* encoding.
334
 
335
+ [*Note 5*: In translation phase 1, a *universal-character-name* is
336
  introduced whenever an actual extended character is encountered in the
337
  source text. Therefore, all extended characters are described in terms
338
  of *universal-character-name*s. However, the actual compiler
339
  implementation may use its own native character set, so long as the same
340
  results are obtained. — *end note*]
341
 
342
+ ### Floating-point literals <a id="lex.fcon">[[lex.fcon]]</a>
343
 
344
  ``` bnf
345
+ floating-point-literal:
346
+ decimal-floating-point-literal
347
+ hexadecimal-floating-point-literal
348
  ```
349
 
350
  ``` bnf
351
+ decimal-floating-point-literal:
352
+ fractional-constant exponent-partₒₚₜ floating-point-suffixₒₚₜ
353
+ digit-sequence exponent-part floating-point-suffixₒₚₜ
354
  ```
355
 
356
  ``` bnf
357
+ hexadecimal-floating-point-literal:
358
+ hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-point-suffixₒₚₜ
359
+ hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-point-suffixₒₚₜ
360
  ```
361
 
362
  ``` bnf
363
  fractional-constant:
364
  digit-sequenceₒₚₜ '.' digit-sequence
 
393
  digit
394
  digit-sequence '''ₒₚₜ digit
395
  ```
396
 
397
  ``` bnf
398
+ floating-point-suffix: one of
399
  'f l F L'
400
  ```
401
 
402
+ The type of a *floating-point-literal* is determined by its
403
+ *floating-point-suffix* as specified in [[lex.fcon.type]].
404
+
405
+ **Table: Types of *floating-point-literal*s** <a id="lex.fcon.type">[lex.fcon.type]</a>
406
+
407
+ | *floating-point-suffix* | type |
408
+ | ----------------------- | --------------- |
409
+ | none | `double` |
410
+ | `f` or `F` | `float` |
411
+ | `l` or `L` | `long double` |
412
+
413
+
414
+ The *significand* of a *floating-point-literal* is the
415
+ *fractional-constant* or *digit-sequence* of a
416
+ *decimal-floating-point-literal* or the
417
+ *hexadecimal-fractional-constant* or *hexadecimal-digit-sequence* of a
418
+ *hexadecimal-floating-point-literal*. In the significand, the sequence
419
+ of *digit*s or *hexadecimal-digit*s and optional period are interpreted
420
+ as a base $N$ real number $s$, where $N$ is 10 for a
421
+ *decimal-floating-point-literal* and 16 for a
422
+ *hexadecimal-floating-point-literal*.
423
+
424
+ [*Note 1*: Any optional separating single quotes are ignored when
425
+ determining the value. — *end note*]
426
+
427
+ If an *exponent-part* or *binary-exponent-part* is present, the exponent
428
+ $e$ of the *floating-point-literal* is the result of interpreting the
429
+ sequence of an optional *sign* and the *digit*s as a base 10 integer.
430
+ Otherwise, the exponent $e$ is 0. The scaled value of the literal is
431
+ $s \times 10^e$ for a *decimal-floating-point-literal* and $s \times 2^e$ for a
432
+ *hexadecimal-floating-point-literal*.
433
+
434
+ [*Example 1*: The *floating-point-literal*s `49.625` and `0xC.68p+2`
435
+ have the same value. The *floating-point-literal*s `1.602'176'565e-19`
436
+ and `1.602176565e-19` have the same value. — *end example*]
437
+
438
  If the scaled value is not in the range of representable values for its
439
+ type, the program is ill-formed. Otherwise, the value of a
440
+ *floating-point-literal* is the scaled value if representable, else the
441
+ larger or smaller representable value nearest the scaled value, chosen
442
+ in an *implementation-defined* manner.
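The suffix table and the scaled-value rule can be sketched as compile-time checks. The hexadecimal forms are guarded because hexadecimal floating-point literals are only standard from C++17 onward; all values used here are exactly representable, so the comparisons do not depend on rounding.

```cpp
#include <type_traits>

// The floating-point-suffix selects the type.
static_assert(std::is_same<decltype(1.5), double>::value, "no suffix");
static_assert(std::is_same<decltype(1.5f), float>::value, "f suffix");
static_assert(std::is_same<decltype(1.5L), long double>::value, "L suffix");

// Decimal form: the significand is scaled by a power of 10.
static_assert(1.5e2 == 150.0, "1.5 times 10 squared");

// Digit separators are ignored when determining the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19, "separators changed the value");

#if __cplusplus >= 201703L
// Hexadecimal form: the base 16 significand is scaled by a power of 2.
// 0xC.68 is 12 + 6/16 + 8/256 = 12.40625, and 12.40625 * 4 = 49.625.
static_assert(0xC.68p+2 == 49.625, "hexadecimal and decimal spellings differ");
static_assert(0x1p-1 == 0.5, "1 times 2 to the power -1");
#endif
```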
443
 
444
  ### String literals <a id="lex.string">[[lex.string]]</a>
445
 
446
  ``` bnf
447
  string-literal:
 
453
  s-char-sequence:
454
  s-char
455
  s-char-sequence s-char
456
  ```
457
 
458
+ ``` bnf
459
+ s-char:
460
+ any member of the basic source character set except the double-quote '"', backslash '\', or new-line character
461
+ escape-sequence
462
+ universal-character-name
463
+ ```
464
+
465
  ``` bnf
466
  raw-string:
467
  '"' d-char-sequenceₒₚₜ '(' r-char-sequenceₒₚₜ ')' d-char-sequenceₒₚₜ '"'
468
  ```
469
 
 
471
  r-char-sequence:
472
  r-char
473
  r-char-sequence r-char
474
  ```
475
 
476
+ ``` bnf
477
+ r-char:
478
+ any member of the source character set, except a right parenthesis ')' followed by
479
+ the initial *d-char-sequence* (which may be empty) followed by a double quote '"'.
480
+ ```
481
+
482
  ``` bnf
483
  d-char-sequence:
484
  d-char
485
  d-char-sequence d-char
486
  ```
487
 
488
+ ``` bnf
489
+ d-char:
490
+ any member of the basic source character set except:
491
+ space, the left parenthesis '(', the right parenthesis ')', the backslash '\', and the control characters
492
+ representing horizontal tab, vertical tab, form feed, and newline.
493
+ ```
494
 
495
  A *string-literal* that has an `R` in the prefix is a *raw string
496
  literal*. The *d-char-sequence* serves as a delimiter. The terminating
497
  *d-char-sequence* of a *raw-string* is the same sequence of characters
498
  as the initial *d-char-sequence*. A *d-char-sequence* shall consist of
 
529
  ```
530
 
531
  is equivalent to `"\n)\\\na\"\n"`. The raw string
532
 
533
  ``` cpp
534
+ R"(x = "\"y\"")"
535
  ```
536
 
537
+ is equivalent to `"x = \"\\\"y\\\"\""`.
 
 
 
 
 
 
 
 
538
 
539
  — *end example*]
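The raw-string equivalences can be confirmed with a small compile-time comparison. This is a sketch; `same_text` is a helper written for this example, not a library function.

```cpp
// Hypothetical helper: compares two null-terminated strings at compile time.
constexpr bool same_text(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || same_text(a + 1, b + 1));
}

// Backslashes and double quotes inside a raw string stand for themselves.
static_assert(same_text(R"(x = "\"y\"")", "x = \"\\\"y\\\"\""),
              "raw and escaped spellings differ");
static_assert(same_text(R"(\n)", "\\n"), "backslash-n is two characters in a raw string");

// A d-char-sequence delimiter lets the body contain a right parenthesis.
static_assert(same_text(R"delim(a)b)delim", "a)b"), "delimiter handling");
```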
540
 
541
  After translation phase 6, a *string-literal* that does not begin with
542
+ an *encoding-prefix* is an *ordinary string literal*. An ordinary string
543
+ literal has type “array of *n* `const char`” where *n* is the size of
544
+ the string as defined below, has static storage duration [[basic.stc]],
545
+ and is initialized with the given characters.
546
 
547
  A *string-literal* that begins with `u8`, such as `u8"asdf"`, is a
548
+ *UTF-8 string literal*. A UTF-8 string literal has type “array of *n*
549
+ `const char8_t`”, where *n* is the size of the string as defined below;
550
+ each successive element of the object representation [[basic.types]] has
551
+ the value of the corresponding code unit of the UTF-8 encoding of the
552
+ string.
553
 
554
  Ordinary string literals and UTF-8 string literals are also referred to
555
+ as narrow string literals.
 
 
556
 
557
+ A *string-literal* that begins with `u`, such as `u"asdf"`, is a *UTF-16
558
+ string literal*. A UTF-16 string literal has type “array of *n*
559
+ `const char16_t`”, where *n* is the size of the string as defined below;
560
+ each successive element of the array has the value of the corresponding
561
+ code unit of the UTF-16 encoding of the string.
562
 
563
+ [*Note 3*: A single *c-char* may produce more than one `char16_t`
564
+ character in the form of surrogate pairs. A surrogate pair is a
565
+ representation for a single code point as a sequence of two 16-bit code
566
+ units. — *end note*]
 
 
567
 
568
+ A *string-literal* that begins with `U`, such as `U"asdf"`, is a *UTF-32
569
+ string literal*. A UTF-32 string literal has type “array of *n*
570
+ `const char32_t`”, where *n* is the size of the string as defined below;
571
+ each successive element of the array has the value of the corresponding
572
+ code unit of the UTF-32 encoding of the string.
573
 
574
  A *string-literal* that begins with `L`, such as `L"asdf"`, is a *wide
575
  string literal*. A wide string literal has type “array of *n* `const
576
  wchar_t`”, where *n* is the size of the string as defined below; it is
577
  initialized with the given characters.
578
 
579
+ In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
580
  concatenated. If both *string-literal*s have the same *encoding-prefix*,
581
+ the resulting concatenated *string-literal* has that *encoding-prefix*.
582
+ If one *string-literal* has no *encoding-prefix*, it is treated as a
583
  *string-literal* of the same *encoding-prefix* as the other operand. If
584
  a UTF-8 string literal token is adjacent to a wide string literal token,
585
  the program is ill-formed. Any other concatenations are
586
  conditionally-supported with *implementation-defined* behavior.
587
 
588
+ [*Note 4*: This concatenation is an interpretation, not a conversion.
589
  Because the interpretation happens in translation phase 6 (after each
590
+ character from a *string-literal* has been translated into a value from
591
  the appropriate character set), a *string-literal*’s initial rawness has
592
  no effect on the interpretation or well-formedness of the
593
  concatenation. — *end note*]
594
 
595
+ [[lex.string.concat]] has some examples of valid concatenations.
 
596
 
597
+ **Table: String literal concatenations** <a id="lex.string.concat">[lex.string.concat]</a>
598
 
599
  | | | | | | | | | |
600
  | -------- | -------- | --------- | -------- | -------- | --------- | -------- | -------- | --------- |
601
  | Source | | Means | Source | | Means | Source | | Means |
602
  | `u"a"` | `u"b"` | `u"ab"` | `U"a"` | `U"b"` | `U"ab"` | `L"a"` | `L"b"` | `L"ab"` |
 
615
  contains the two characters `'\xA'` and `'B'` after concatenation (and
616
  not the single hexadecimal character `'\xAB'`).
617
 
618
  — *end example*]
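A sketch of the concatenation rules, including the `"\xA" "B"` case from the example. The helper `units_in` is invented here to report the number of array elements in a literal.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: number of elements in a string literal's array,
// including the terminating null character.
template <typename T, std::size_t N>
constexpr std::size_t units_in(const T (&)[N]) { return N; }

// An operand without an encoding-prefix adopts the prefix of the other operand.
static_assert(std::is_same<decltype(u"a" "b"), const char16_t (&)[3]>::value, "UTF-16 result");
static_assert(std::is_same<decltype("a" U"b"), const char32_t (&)[3]>::value, "UTF-32 result");
static_assert(std::is_same<decltype(L"a" "b"), const wchar_t (&)[3]>::value, "wide result");

// Escape sequences are translated before concatenation, so "\xA" "B" holds
// '\xA', 'B', and the terminating '\0', not the single character '\xAB'.
static_assert(units_in("\xA" "B") == 3, "two characters plus the terminator");
static_assert(("\xA" "B")[0] == '\xA' && ("\xA" "B")[1] == 'B', "separate characters");
```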
619
 
620
+ After any necessary concatenation, in translation phase 7
621
+ [[lex.phases]], `'\0'` is appended to every *string-literal* so that
622
  programs that scan a string can find its end.
623
 
624
  Escape sequences and *universal-character-name*s in non-raw string
625
+ literals have the same meaning as in *character-literal*s [[lex.ccon]],
626
  except that the single quote `'` is representable either by itself or by
627
  the escape sequence `\'`, and the double quote `"` shall be preceded by
628
+ a `\`, and except that a *universal-character-name* in a UTF-16 string
629
+ literal may yield a surrogate pair. In a narrow string literal, a
630
+ *universal-character-name* may map to more than one `char` or `char8_t`
631
+ element due to *multibyte encoding*. The size of a `char32_t` or wide
632
+ string literal is the total number of escape sequences,
633
+ *universal-character-name*s, and other characters, plus one for the
634
+ terminating `U'\0'` or `L'\0'`. The size of a UTF-16 string literal is
635
+ the total number of escape sequences, *universal-character-name*s, and
636
+ other characters, plus one for each character requiring a surrogate
637
+ pair, plus one for the terminating `u'\0'`.
638
 
639
+ [*Note 5*: The size of a `char16_t` string literal is the number of
640
  code units, not the number of characters. — *end note*]
641
 
642
+ [*Note 6*: Any *universal-character-name*s are required to correspond
643
+ to a code point in the range [0, D800) or [E000, 10FFFF] (hexadecimal)
644
+ [[lex.charset]]. — *end note*]
645
+
646
+ The size of a narrow string literal is the total number of escape
647
+ sequences and other characters, plus at least one for the multibyte
648
+ encoding of each *universal-character-name*, plus one for the
649
  terminating `'\0'`.
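The size rules can be illustrated by counting array elements. In this sketch, `element_count` is a helper introduced for the example; the narrow ordinary string containing a *universal-character-name* is not checked because its size depends on the execution character set.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: number of elements in a string literal's array.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// A string literal is an lvalue of type "array of n const char".
static_assert(std::is_same<decltype("asdf"), const char (&)[5]>::value, "four characters plus '\\0'");
static_assert(element_count("asdf") == 5, "ordinary string literal");
static_assert(element_count(L"asdf") == 5, "wide string literal");

// UTF-32: one element per character, plus the terminator.
static_assert(element_count(U"\U0001F600") == 2, "one code unit plus the terminator");

// UTF-16: a character outside [0, FFFF] needs a surrogate pair.
static_assert(element_count(u"\U0001F600") == 3, "surrogate pair plus the terminator");

// UTF-8: U+00E9 is encoded as two code units.
static_assert(element_count(u8"\u00E9") == 3, "two code units plus the terminator");
```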
650
 
651
  Evaluating a *string-literal* results in a string literal object with
652
  static storage duration, initialized from the given characters as
653
+ specified above. Whether all *string-literal*s are distinct (that is,
654
+ are stored in nonoverlapping objects) and whether successive evaluations
655
+ of a *string-literal* yield the same or a different object is
656
+ unspecified.
657
 
658
+ [*Note 7*: The effect of attempting to modify a *string-literal* is
659
  undefined. — *end note*]
660
 
661
  ### Boolean literals <a id="lex.bool">[[lex.bool]]</a>
662
 
663
  ``` bnf
 
678
 
679
  The pointer literal is the keyword `nullptr`. It is a prvalue of type
680
  `std::nullptr_t`.
681
 
682
  [*Note 1*: `std::nullptr_t` is a distinct type that is neither a
683
+ pointer type nor a pointer-to-member type; rather, a prvalue of this
684
  type is a null pointer constant and can be converted to a null pointer
685
  value or null member pointer value. See  [[conv.ptr]] and 
686
  [[conv.mem]]. — *end note*]
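The properties stated in the note can be checked with standard type traits. This is a sketch; the struct `Widget` and the variable names are hypothetical.

```cpp
#include <cstddef>
#include <type_traits>

// nullptr has its own type, which is neither a pointer type nor a
// pointer-to-member type.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value, "type of nullptr");
static_assert(!std::is_pointer<std::nullptr_t>::value, "not a pointer type");
static_assert(!std::is_member_pointer<std::nullptr_t>::value, "not a pointer-to-member type");

// It converts to a null pointer value or a null member pointer value.
struct Widget { int size; };  // hypothetical type for illustration
constexpr int* null_object_pointer = nullptr;
constexpr int Widget::* null_member_pointer = nullptr;
static_assert(null_object_pointer == nullptr, "null pointer value");
static_assert(null_member_pointer == nullptr, "null member pointer value");
```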
687
 
688
  ### User-defined literals <a id="lex.ext">[[lex.ext]]</a>
689
 
690
  ``` bnf
691
  user-defined-literal:
692
  user-defined-integer-literal
693
+ user-defined-floating-point-literal
694
  user-defined-string-literal
695
  user-defined-character-literal
696
  ```
697
 
698
  ``` bnf
 
702
  hexadecimal-literal ud-suffix
703
  binary-literal ud-suffix
704
  ```
705
 
706
  ``` bnf
707
+ user-defined-floating-point-literal:
708
  fractional-constant exponent-partₒₚₜ ud-suffix
709
  digit-sequence exponent-part ud-suffix
710
  hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
711
  hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
712
  ```
 
740
  The syntactic non-terminal preceding the *ud-suffix* in a
741
  *user-defined-literal* is taken to be the longest sequence of characters
742
  that could match that non-terminal.
743
 
744
  A *user-defined-literal* is treated as a call to a literal operator or
745
+ literal operator template [[over.literal]]. To determine the form of
746
  this call for a given *user-defined-literal* *L* with *ud-suffix* *X*,
747
  the *literal-operator-id* whose literal suffix identifier is *X* is
748
  looked up in the context of *L* using the rules for unqualified name
749
+ lookup [[basic.lookup.unqual]]. Let *S* be the set of declarations found
750
+ by this lookup. *S* shall not be empty.
751
 
752
  If *L* is a *user-defined-integer-literal*, let *n* be the literal
753
  without its *ud-suffix*. If *S* contains a literal operator with
754
  parameter type `unsigned long long`, the literal *L* is treated as a
755
  call of the form
756
 
757
  ``` cpp
758
  operator "" X(nULL)
759
  ```
760
 
761
+ Otherwise, *S* shall contain a raw literal operator or a numeric literal
762
+ operator template [[over.literal]] but not both. If *S* contains a raw
763
+ literal operator, the literal *L* is treated as a call of the form
764
 
765
  ``` cpp
766
  operator "" X("n")
767
  ```
768
 
769
+ Otherwise (*S* contains a numeric literal operator template), *L* is
770
+ treated as a call of the form
771
 
772
  ``` cpp
773
  operator "" X<'c₁', 'c₂', ... 'cₖ'>()
774
  ```
775
 
776
  where *n* is the source character sequence c₁c₂...cₖ.
777
 
778
  [*Note 1*: The sequence c₁c₂...cₖ can only contain characters from the
779
  basic source character set. — *end note*]
780
 
781
+ If *L* is a *user-defined-floating-point-literal*, let *f* be the
782
+ literal without its *ud-suffix*. If *S* contains a literal operator with
783
  parameter type `long double`, the literal *L* is treated as a call of
784
  the form
785
 
786
  ``` cpp
787
  operator "" X(fL)
788
  ```
789
 
790
+ Otherwise, *S* shall contain a raw literal operator or a numeric literal
791
+ operator template [[over.literal]] but not both. If *S* contains a raw
792
+ literal operator, the *literal* *L* is treated as a call of the form
793
 
794
  ``` cpp
795
  operator "" X("f")
796
  ```
797
 
798
+ Otherwise (*S* contains a numeric literal operator template), *L* is
799
+ treated as a call of the form
800
 
801
  ``` cpp
802
  operator "" X<'c₁', 'c₂', ... 'cₖ'>()
803
  ```
804
 
 
807
  [*Note 2*: The sequence c₁c₂...cₖ can only contain characters from the
808
  basic source character set. — *end note*]
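The three call forms for numeric *user-defined-literal*s can be sketched as follows, assuming a C++14 compiler. All suffix names (`_cooked`, `_spelling_length`, `_char_count`, `_pick`) are invented for this example.

```cpp
#include <cstddef>

// Cooked form: called as operator""_cooked(nULL).
constexpr unsigned long long operator""_cooked(unsigned long long n) { return n; }

// Raw form: called with the spelling of the literal as a string.
constexpr std::size_t operator""_spelling_length(const char* spelling) {
    std::size_t length = 0;
    while (spelling[length] != '\0') ++length;
    return length;
}

// Template form: called with one template argument per source character.
template <char... Cs>
constexpr std::size_t operator""_char_count() { return sizeof...(Cs); }

// One suffix with a cooked integer operator and a raw operator. Integer
// literals use the cooked operator; floating-point literals find no
// long double operator and fall back to the raw one.
constexpr int operator""_pick(unsigned long long) { return 1; }
constexpr int operator""_pick(const char*) { return 2; }

static_assert(0x10_cooked == 16, "cooked operator receives the value");
static_assert(0x10_spelling_length == 4, "raw operator receives the spelling 0x10");
static_assert(123_char_count == 3, "template receives '1', '2', '3'");
static_assert(42_pick == 1, "integer literal prefers the cooked operator");
static_assert(4.2_pick == 2, "floating-point literal falls back to the raw operator");
```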
809
 
810
  If *L* is a *user-defined-string-literal*, let *str* be the literal
811
  without its *ud-suffix* and let *len* be the number of code units in
812
+ *str* (i.e., its length excluding the terminating null character). If
813
+ *S* contains a literal operator template with a non-type template
814
+ parameter for which *str* is a well-formed *template-argument*, the
815
  literal *L* is treated as a call of the form
816
 
817
+ ``` cpp
818
+ operator "" X<str>()
819
+ ```
820
+
821
+ Otherwise, the literal *L* is treated as a call of the form
822
+
823
  ``` cpp
824
  operator "" X(str, len)
825
  ```
826
 
827
  If *L* is a *user-defined-character-literal*, let *ch* be the literal
828
+ without its *ud-suffix*. *S* shall contain a literal operator
829
+ [[over.literal]] whose only parameter has the type of *ch* and the
830
  literal *L* is treated as a call of the form
831
 
832
  ``` cpp
833
  operator "" X(ch)
834
  ```
 
847
  }
848
  ```
849
 
850
  — *end example*]
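For string and character forms, a minimal sketch (suffix names `_code_units` and `_value` are made up) shows that *len* counts code units without the terminating null character and that the character operator is selected by the type of *ch*.

```cpp
#include <cstddef>

// String form: called as operator""_code_units(str, len).
constexpr std::size_t operator""_code_units(const char*, std::size_t len) { return len; }
constexpr std::size_t operator""_code_units(const char16_t*, std::size_t len) { return len; }

// Character form: the operator's only parameter has the type of the literal.
constexpr long operator""_value(char ch) { return ch; }
constexpr long operator""_value(char32_t ch) { return static_cast<long>(ch); }

static_assert("hello"_code_units == 5, "terminating null is not counted");
static_assert(u"\U0001F600"_code_units == 2, "a surrogate pair is two code units");
static_assert('\x41'_value == 0x41, "char overload");
static_assert(U'\x41'_value == 0x41, "char32_t overload");
```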
851
 
852
+ In translation phase 6 [[lex.phases]], adjacent *string-literal*s are
853
+ concatenated and *user-defined-string-literal*s are considered
854
+ *string-literal*s for that purpose. During concatenation, *ud-suffix*es
855
+ are removed and ignored and the concatenation process occurs as
856
+ described in  [[lex.string]]. At the end of phase 6, if a
857
+ *string-literal* is the result of a concatenation involving at least one
858
  *user-defined-string-literal*, all the participating
859
  *user-defined-string-literal*s shall have the same *ud-suffix* and that
860
  suffix is applied to the result of the concatenation.
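The concatenation rule for *user-defined-string-literal*s can be sketched with a hypothetical `_length` suffix: the suffix is applied once, to the concatenated result.

```cpp
#include <cstddef>

// Hypothetical suffix: returns the number of code units in the string.
constexpr std::size_t operator""_length(const char*, std::size_t len) { return len; }

// Both operands carry the same ud-suffix.
static_assert("ab"_length "cd"_length == 4, "suffix applies to \"abcd\"");

// Only one operand carries a ud-suffix; it still applies to the whole result.
static_assert("ab" "cd"_length == 4, "suffix on the last operand");
static_assert("ab"_length "cd" == 4, "suffix on the first operand");
```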
861
 
862
  [*Example 3*:
 
874
  [basic.fundamental]: basic.md#basic.fundamental
875
  [basic.link]: basic.md#basic.link
876
  [basic.lookup.unqual]: basic.md#basic.lookup.unqual
877
  [basic.stc]: basic.md#basic.stc
878
  [basic.types]: basic.md#basic.types
879
+ [conv.mem]: expr.md#conv.mem
880
+ [conv.ptr]: expr.md#conv.ptr
881
  [cpp]: cpp.md#cpp
882
  [cpp.concat]: cpp.md#cpp.concat
883
  [cpp.cond]: cpp.md#cpp.cond
884
+ [cpp.import]: cpp.md#cpp.import
885
  [cpp.include]: cpp.md#cpp.include
886
+ [cpp.module]: cpp.md#cpp.module
887
  [cpp.stringize]: cpp.md#cpp.stringize
888
  [dcl.attr.grammar]: dcl.md#dcl.attr.grammar
889
  [headers]: library.md#headers
890
  [lex]: #lex
891
  [lex.bool]: #lex.bool
892
  [lex.ccon]: #lex.ccon
893
+ [lex.ccon.esc]: #lex.ccon.esc
894
  [lex.charset]: #lex.charset
895
  [lex.comment]: #lex.comment
896
  [lex.digraph]: #lex.digraph
897
  [lex.ext]: #lex.ext
898
  [lex.fcon]: #lex.fcon
899
+ [lex.fcon.type]: #lex.fcon.type
900
  [lex.header]: #lex.header
901
  [lex.icon]: #lex.icon
902
+ [lex.icon.base]: #lex.icon.base
903
+ [lex.icon.type]: #lex.icon.type
904
  [lex.key]: #lex.key
905
+ [lex.key.digraph]: #lex.key.digraph
906
  [lex.literal]: #lex.literal
907
  [lex.literal.kinds]: #lex.literal.kinds
908
  [lex.name]: #lex.name
909
+ [lex.name.allowed]: #lex.name.allowed
910
+ [lex.name.disallowed]: #lex.name.disallowed
911
+ [lex.name.special]: #lex.name.special
912
  [lex.nullptr]: #lex.nullptr
913
  [lex.operators]: #lex.operators
914
  [lex.phases]: #lex.phases
915
  [lex.ppnumber]: #lex.ppnumber
916
  [lex.pptoken]: #lex.pptoken
917
  [lex.separate]: #lex.separate
918
  [lex.string]: #lex.string
919
+ [lex.string.concat]: #lex.string.concat
920
  [lex.token]: #lex.token
921
+ [module.import]: module.md#module.import
922
+ [module.unit]: module.md#module.unit
923
  [over.literal]: over.md#over.literal
 
 
 
 
 
 
 
 
 
924
  [temp.explicit]: temp.md#temp.explicit
925
  [temp.names]: temp.md#temp.names
926
 
927
  [^1]: Implementations must behave as if these separate phases occur,
928
  although in practice different phases might be folded together.
 
943
  (described in translation phase 1) is specified as
944
  *implementation-defined*, an implementation is required to document
945
  how the basic source characters are represented in source files.
946
 
947
  [^5]: A sequence of characters resembling a *universal-character-name*
948
+ in an *r-char-sequence* [[lex.string]] does not form a
949
  *universal-character-name*.
950
 
951
  [^6]: These include “digraphs” and additional reserved words. The term
952
  “digraph” (token consisting of two characters) is not perfectly
953
+ descriptive, since one of the alternative *preprocessing-token*s is
954
  `%:%:` and of course several primary tokens contain two characters.
955
  Nonetheless, those alternative tokens that aren’t lexical keywords
956
  are colloquially known as “digraphs”.
957
 
958
+ [^7]: Thus the “stringized” values [[cpp.stringize]] of `[` and `<:`
959
  will be different, maintaining the source spelling, but the tokens
960
  can otherwise be freely interchanged.
961
 
962
  [^8]: Literals include strings and character and numeric literals.
963
 
 
974
  long external identifier, but C++ does not place a translation limit
975
  on significant characters for external identifiers. In C++, upper-
976
  and lower-case letters are considered different for all identifiers,
977
  including external identifiers.
978
 
979
+ [^11]: The term “literal” generally designates, in this document, those
980
+ tokens that are called “constants” in ISO C.
981
 
982
+ [^12]: They are intended for character sets where a character does not
 
 
983
  fit into a single byte.
984
 
985
+ [^13]: Using an escape sequence for a question mark is supported for
986
  compatibility with ISO C++14 and ISO C.